The Flax Genome 3031160606, 9783031160608

The Flax Genome is a comprehensive compilation of most recent studies focused on reference genome, genetic resources and

English Pages 300 [301] Year 2023


Table of contents :
Preface to the Series
Contents
1 Reference Genome Sequence of Flax
1.1 Introduction
1.2 Flax Genome Assemblies
1.2.1 The First Version of the Flax Genome Assembly for CDC Bethune
1.2.2 The Second Version of the Flax Genome Assembly for CDC Bethune: Chromosome-Level Pseudomolecules
1.2.3 Recent Assemblies Expanding the Representation to Both Morphotypes and to the Closest Wild Relative of Cultivated Flax
1.2.4 Quality Examination of Flax Genome Assemblies
1.3 Repeat Sequence
1.4 Gene Annotation
1.5 Non-coding RNAs
1.6 Chloroplast Genome
1.7 Concluding Remarks
Acknowledgements
References
2 Repeat DNA Sequences in Flax Genomes
2.1 Introduction
2.2 Types and Distribution of Repetitive DNA Sequences
2.3 Challenges in the Identification of Repeats Per Se
2.4 Repetitive Elements in Flax and Other Crop Species
2.5 Tools for the Identification of Repetitive Elements
2.5.1 RepeatMasker
2.5.2 RepeatModeler
2.5.3 LTR_Finder
2.5.4 LTRharvest
2.5.5 LTRAnnotator
2.5.6 LTR_Retriever
2.5.7 SINE_Scan
2.5.8 HelitronScanner
2.5.9 Miniature Inverted-Repeat Transposable Elements (MITE Tracker)
2.6 Case Study: A Comparative Analysis of Flax Genome TEs
2.7 Conclusion and Future Perspectives
References
3 Pale Flax (Linum bienne): an Underexplored Flax Wild Relative
3.1 Introduction
3.2 Taxonomy
3.3 Biology
3.4 Domestication
3.5 Genetics
3.6 Genomics
3.7 Utilization
3.8 Conservation
3.9 Future Research
Acknowledgements
References
4 Flax Breeding
4.1 Taxonomy, Origin, Domestication, and Use of Flax
4.2 Flax Production in the World
4.3 Flax Harvested Area in the World
4.4 Flax Seed Yield Per Unit Area
4.5 Genetic Diversity
4.6 Breeding Strategy
4.6.1 Mass Selection Method
4.6.2 Pure-Line Selection Method
4.6.3 Pedigree Method
4.6.4 Bulk Method
4.6.5 Single-Seed Descent Method
4.6.6 Marker-Assisted Breeding
4.7 Conclusion and Future Perspectives
References
5 QTL Mapping: Strategy, Progress, and Prospects in Flax
5.1 Introduction
5.2 Linkage Map-Based QTL Mapping (LM)
5.2.1 Bi-Parental Populations
5.2.2 Linkage Map Construction
5.2.3 Statistical Models for Linkage Map-Based QTL Mapping
5.3 Association Mapping
5.3.1 Genetic Populations
5.3.2 Statistical Models for Association Mapping
5.3.3 Selection of Threshold for Significant Marker-Trait Associations
5.3.4 Post-QTN Identification Analysis
5.3.4.1 Haplotype Blocks (HBs)
5.3.4.2 QTN Effects
5.3.4.3 Favorable Alleles of QTNs
5.4 Meta-Analysis of GWAS and QTLs
5.4.1 Meta-GWAS
5.4.2 Meta-QTL Analysis
5.5 Candidate Gene Identification
5.6 QTL Mapping in Flax
5.6.1 Yield and Agronomic Traits
5.6.1.1 Yield
5.6.1.2 Seed Size
5.6.1.3 Flowering and Maturity
5.6.2 Fiber Traits
5.6.3 Seed Quality Traits
5.6.3.1 Protein and Oil Content
5.6.3.2 Iodine Value and Linolenic Acid Content
5.6.3.3 Seed Mucilage and Hull Content
5.6.3.4 Seed Coat Color
5.6.4 Abiotic Traits
5.6.4.1 Drought Tolerance
5.6.4.2 Salt Tolerance
5.6.5 Biotic Traits
5.6.5.1 Pasmo
5.6.5.2 Powdery Mildew (PM)
5.6.5.3 Fusarium Wilt (FW)
5.7 Perspectives
Acknowledgements
References
6 Genetics of Abiotic Stress in Flax
6.1 Introduction
6.2 Genetics and QTL Identification Studies in Flax
6.2.1 Increasing Flax Genome Resources and Their Role in Abiotic Stress Study
6.2.1.1 Drought Stress
6.2.1.2 Salinity Stress
6.2.1.3 Heavy Metal Stress
6.3 Conclusions and Future Directions
References
7 QTL and Candidate Genes for Flax Disease Resistance
7.1 Introduction
7.2 Phenotypic Performance of Disease Resistance
7.2.1 Field Nurseries for Disease Resistance Phenotyping
7.2.2 Criteria for Field Nurseries for Disease Resistance Phenotyping
7.2.3 Broad-Sense Heritability of Flax Disease Traits
7.2.4 Flax Genetic Populations and Their Disease Resistance
7.3 Quantitative Trait Loci (QTLs) Associated with Flax Disease Resistance
7.3.1 Quantitative Trait Loci (QTLs) Associated with Pasmo Resistance
7.3.2 Quantitative Trait Loci (QTLs) Associated with Powdery Mildew Resistance
7.3.3 Quantitative Trait Loci (QTLs) Associated with Fusarium Wilt Resistance
7.4 Candidate Genes Co-localized with QTLs
7.4.1 Candidate Genes Associated with Pasmo (PAS) Resistance
7.4.2 Candidate Genes Associated with Powdery Mildew (PM) Resistance
7.4.3 Candidate Genes Associated with Fusarium Wilt Resistance
7.5 Conclusions
Acknowledgements
References
8 Key Stages of Flax Bast Fiber Development Through the Prism of Transcriptomics
8.1 Introduction
8.2 Intrusive Elongation
8.2.1 The Significance of Fiber Intrusive Elongation
8.2.2 Approaches to Reveal the Genes for Proteins Especially Important for a Fiber at Intrusive Growth Stage of Development
8.2.3 General Cell Physiology as Revealed by Transcriptomic Data
8.2.4 Cell Wall Rearrangement
8.2.5 Cytoskeleton
8.2.6 Regulation by Hormones, Transcription Factors, Kinases, and Other Regulatory Proteins
8.2.7 Other Genes Specifically Upregulated in Intrusively Growing Fibers
8.3 Tertiary Cell Wall Formation
8.3.1 Expression of CESA Genes and Genes for Putative Cofactors in Fibers Forming the Tertiary Cell Wall
8.3.2 Genes Encoding Proteins Associated with Rhamnogalacturonan I Metabolism
8.3.3 Other Genes Specifically Upregulated in Fibers with TCW
8.3.4 Transcription Factors Potentially Involved in TCW Formation
8.4 Pipeline Through Transcriptomics to Get Molecular Keys for Targeted Fiber Crop Improvement
8.5 MiRNA—The Potential Regulators of Gene Expression
8.6 Conclusions and Future Perspective
Acknowledgements
References
9 Metabolomics and Transcriptomics-Based Tools for Linseed Improvement
9.1 Introduction
9.2 Flax Metabolites and Metabolomics
9.2.1 Primary Metabolites
9.2.1.1 Lipids
9.2.1.2 Proteins
9.2.1.3 Starch
9.2.1.4 Cellulose and Lignin
9.2.2 Secondary Metabolites
9.2.2.1 Lignans
9.2.2.2 Cyanogenic Glucoside
9.2.3 Metabolomics Tools
9.2.3.1 Targeted Metabolomics
9.2.3.2 Untargeted Metabolomics
9.2.3.3 Analytical Chromatography Platforms and Tools
9.2.3.4 Spectroscopic Tools
Nuclear Magnetic Resonance (NMR) Spectroscopy
Synchrotron Light Source Spectroscopy
9.2.4 Metabolite Biosynthesis
9.2.4.1 Fatty Acids
9.2.4.2 Flax Lignans
9.2.4.3 Cyanogenic Glucosides
9.2.4.4 Interplay Between the Primary and Secondary Metabolite Pathways Leading to FA, Lignans, and CGs
9.3 Transcriptomics and Pathway Regulation
9.3.1 Transcriptomic Resources and Tools
9.3.1.1 Genomic Resources
9.3.1.2 Transcriptomic Platforms
9.3.2 Flax Transcriptomic
9.3.3 From Genome to Transcriptome to Metabolome to Phenotype
9.4 Concluding Remarks
Acknowledgements
References
10 Genome-Wide Prediction of Disease Resistance Gene Analogs in Flax
10.1 Introduction
10.2 Classification of Plant Resistance Gene Analogs
10.3 Experimental Methods for Resistance Gene Identification
10.3.1 Gene Cloning
10.3.2 QTL Mapping for Resistance Genes
10.4 Computational Methods for RGA Identification
10.4.1 RGAugury
10.4.1.1 Support for Docker and PodMan
10.4.1.2 Support for the RNL (RPW8)
10.4.2 Machine Learning Based RGA Annotation Pipelines
10.4.3 Comparison of RGA Identification Tools
10.5 RGA Profile of Flax
10.6 Conclusion and Perspective
References
11 Genome-Editing Tools for Flax Genetic Improvement
11.1 Introduction
11.2 Gene Editing
11.2.1 Definition and History
11.2.1.1 Definition
11.2.1.2 History
11.2.2 Gene-Editing Tools and Processes
11.2.2.1 Gene-Editing Tools
ZFNs
TALENs
CRISPR
11.2.2.2 CRISPR Gene-Editing Process
SgRNA Design
SgRNA and Cas9 Delivery
Cargo Systems Used with CRISPR
Delivery Methods Used with CRISPR
Gene-Editing Validation
11.2.3 Gene Editing in Plants
11.3 Gene Editing in Flax
11.3.1 Success Stories in Flax Genetic Transformation and Gene-Editing
11.3.2 Tools Paving the Road for Flax Gene Editing
11.3.3 Flax Genetic Resources
11.3.4 Flax Traits of Interest for Gene Editing
11.3.4.1 Agronomic Plant Traits
Seed Traits
Fiber Traits
Flax and Biotic and Abiotic Stresses
Flax Gene-Edited Traits Regulatory Frameworks
11.4 Concluding Remarks
Acknowledgements
References
12 Genomics Assisted Breeding Strategy in Flax
12.1 Introduction
12.2 Factors Affecting Genomic Prediction (GP) Efficiency
12.2.1 Genomic Prediction (GP) Models
12.2.2 Training Populations (TRPs) and Relatedness to the Test Populations (TPs)
12.2.3 Markers
12.3 Improving Predictive Ability by QTL Markers
12.3.1 QTL Identification by Single- and Multi-locus GWAS
12.3.2 Genomic Heritability (h²)
12.3.3 A Case Study for Drought and Root Traits
12.4 A QTL-Based Genomic Selection (GS) Strategy
12.4.1 GS Modeling
12.4.2 Test Population (TP) and Genomic Selection (GS)
12.4.3 GS Evaluation
12.4.4 Cost of GS
12.5 Parent Evaluation and Cross-Prediction
12.5.1 Genomic Cross-Prediction for Flax Improvement
12.5.2 Future Based: Integrated Flax Breeding Improvement Strategy
12.6 Conclusions and Future Prospects
References
13 Flax Genomic Resources and Databases
13.1 Introduction
13.2 Flax Genomic Resources
13.2.1 Sequences
13.2.2 Molecular Markers
13.2.3 Genetic and Physical Maps
13.2.4 Genome Assemblies
13.2.5 Quantitative Trait Loci (QTLs)/Nucleotides (QTNs)
13.3 Phenotypic Evaluation of Flax Genetic Populations
13.4 Genomics and Breeding Databases
13.4.1 NCBI
13.4.2 Phytozome
13.4.3 Flax TILLING Platform
13.4.4 Canadian National Gene Bank Information System—GRIN-Global-CA
13.4.5 Flax Variety Databases
13.4.6 International Flax Database (IFDB)
13.4.7 FlaxDB: A Flax Genome and Breeding Database
13.5 Perspectives
Acknowledgements
References

Compendium of Plant Genomes

Frank M. You and Bourlaye Fofana, Editors

The Flax Genome

Compendium of Plant Genomes
Series Editor: Chittaranjan Kole, President, International Climate Resilient Crop Genomics Consortium (ICRCGC), President, International Phytomedomics and Nutriomics Consortium (IPNC) and President, Genome India International (GII), Kolkata, India

Whole-genome sequencing is at the cutting edge of life sciences in the new millennium. Since the first genome sequencing of the model plant Arabidopsis thaliana in 2000, whole genomes of about 100 plant species have been sequenced, and genome sequences of several other plants are in the pipeline. Research publications on these genome initiatives are scattered across dedicated web sites and journals, with all too brief descriptions. The individual volumes elucidate the background history of the national and international genome initiatives; the public and private partners involved; the strategies and genomic resources and tools utilized; the enumeration of the sequences and their assembly; repetitive sequences; gene annotation; and genome duplication. In addition, synteny with other sequences, comparison of gene families, and, most importantly, the potential of the genome sequence information for gene pool characterization and genetic improvement of crop plants are described.


Editors

Frank M. You
Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON, Canada

Bourlaye Fofana
Charlottetown Research and Development Centre, Agriculture and Agri-Food Canada, Charlottetown, PE, Canada

ISSN 2199-4781
ISSN 2199-479X (electronic)
Compendium of Plant Genomes
ISBN 978-3-031-16060-8
ISBN 978-3-031-16061-5 (eBook)
https://doi.org/10.1007/978-3-031-16061-5

© His Majesty the King in Right of Canada, as represented by the Minister of Agriculture and Agri-Food 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

This book series is dedicated to my wife Phullara and our children Sourav and Devleena

Chittaranjan Kole

Preface to the Series

Genome sequencing has emerged as the leading discipline in the plant sciences, coinciding with the start of the new century. For much of the twentieth century, plant geneticists were only successful in delineating putative chromosomal location, function, and changes in genes indirectly through the use of a number of “markers” physically linked to them. These included visible or morphological, cytological, protein, and molecular or DNA markers. Among them, the first DNA markers, the RFLPs, introduced a revolutionary change in plant genetics and breeding in the mid-1980s, mainly because of their infinite number and thus potential to cover maximum chromosomal regions, phenotypic neutrality, absence of epistasis, and codominant nature. An array of other hybridization-based markers, PCR-based markers, and markers based on both facilitated construction of genetic linkage maps, mapping of genes controlling simply inherited traits, and even gene clusters (QTLs) controlling polygenic traits in a large number of model and crop plants. During this period, a number of new mapping populations beyond F2 were utilized and a number of computer programs were developed for map construction, mapping of genes, and mapping of polygenic clusters or QTLs. Molecular markers were also used in the studies of evolution and phylogenetic relationship, genetic diversity, DNA fingerprinting, and map-based cloning. Markers tightly linked to the genes were used in crop improvement by employing the so-called marker-assisted selection. These strategies of molecular genetic mapping and molecular breeding made a spectacular impact during the last one and a half decades of the twentieth century. But still, they remained “indirect” approaches for elucidation and utilization of plant genomes since much of the chromosomes remained unknown and the complete chemical depiction of them was yet to be unraveled.
Physical mapping of genomes was the obvious consequence that facilitated the development of the “genomic resources” including BAC and YAC libraries to develop physical maps in some plant genomes. Subsequently, integrated genetic–physical maps were also developed in many plants. This led to the concept of structural genomics. Later on, emphasis was laid on EST and transcriptome analysis to decipher the function of the active gene sequences leading to another concept defined as functional genomics. The advent of techniques of bacteriophage gene and DNA sequencing in the 1970s was extended to facilitate the sequencing of these genomic resources in the last decade of the twentieth century.

As expected, sequencing of chromosomal regions would have produced far more data than the then-available computer software could store, characterize, and utilize. But the development of information technology made the life of biologists easier by leading to a swift and sweet marriage of biology and informatics, and a new subject was born: bioinformatics. Thus, the evolution of the concepts, strategies, and tools of sequencing and bioinformatics reinforced the subject of genomics, both structural and functional. Today, genome sequencing has traveled much beyond biology and involves biophysics, biochemistry, and bioinformatics! Thanks to the efforts of both public and private agencies, genome sequencing strategies are evolving very fast, leading to cheaper, quicker, and automated techniques right from clone-by-clone and whole-genome shotgun approaches to a succession of second-generation sequencing methods. The development of software for different generations facilitated this genome sequencing. At the same time, newer concepts and strategies were emerging to handle sequencing of the complex genomes, particularly the polyploids. It became a reality to chemically, and so directly, define plant genomes, popularly called whole-genome sequencing or simply genome sequencing. The history of plant genome sequencing will always cite the sequencing of the genome of the model plant Arabidopsis thaliana in 2000 that was followed by sequencing the genome of the crop and model plant rice in 2002. Since then, the number of sequenced genomes of higher plants has been increasing exponentially, mainly due to the development of cheaper and quicker genomic techniques and, most importantly, the development of collaborative platforms such as national and international consortia involving partners from public and/or private agencies.
As I write this preface for the first volume of the new series “Compendium of Plant Genomes”, a net search tells me that complete or nearly complete whole-genome sequencing of 45 crop plants, eight crop and model plants, eight model plants, 15 crop progenitors and relatives, and three basal plants is accomplished, the majority of which are in the public domain. This means that we nowadays know many of our model and crop plants chemically, i.e., directly, and we may depict them and utilize them precisely better than ever. Genome sequencing has covered all groups of crop plants. Hence, information on the precise depiction of plant genomes and the scope of their utilization are growing rapidly every day. However, the information is scattered in research articles and review papers in journals and dedicated web pages of the consortia and databases. There is no compilation of plant genomes and the opportunity of using the information in sequence-assisted breeding or further genomic studies. This is the underlying rationale for starting this book series, with each volume dedicated to a particular plant. Plant genome science has emerged as an important subject in academia, and the present compendium of plant genomes will be highly useful to both students and teaching faculties. Most importantly, research scientists involved in genomics research will have access to systematic deliberations on the plant genomes of their interest. Elucidation of plant genomes is of interest not only for the geneticists and breeders but also for practitioners of an array of plant science disciplines, such as taxonomy, evolution, cytology, physiology, pathology, entomology, nematology, crop production, biochemistry, and obviously bioinformatics. It must be mentioned that information regarding each plant genome is ever-growing. The contents of the volumes of this compendium are, therefore, focusing on the basic aspects of the genomes and their utility. They include information on the academic and/or economic importance of the plants, a description of their genomes from a molecular genetic and cytogenetic point of view, and the genomic resources developed. Detailed deliberations focus on the background history of the national and international genome initiatives, public and private partners involved, strategies and genomic resources and tools utilized, enumeration of the sequences and their assembly, repetitive sequences, gene annotation, and genome duplication. In addition, synteny with other sequences, comparison of gene families, and, most importantly, the potential of the genome sequence information for gene pool characterization through genotyping by sequencing (GBS) and genetic improvement of crop plants have been described. As expected, there is a lot of variation of these topics in the volumes based on the information available on the crop, model, or reference plants. I must confess that as the series editor, it has been a daunting task for me to work on such a huge and broad knowledge base that spans so many diverse plant species. However, pioneering scientists with lifetime experience and expertise in the particular crops did excellent jobs editing the respective volumes. I myself have been a small science worker on plant genomes since the mid-1980s and that provided me the opportunity to personally know several stalwarts of plant genomics from all over the globe. Most, if not all, of the volume editors are my longtime friends and colleagues. It has been highly comfortable and enriching for me to work with them on this book series.
To be honest, while working on this series, I have been and will remain a student first, a science worker second, and a series editor last. And I must express my gratitude to the volume editors and the chapter authors for providing me the opportunity to work with them on this compendium. I also wish to mention here my thanks and gratitude to the Springer staff, particularly Dr. Christina Eckey and Dr. Jutta Lindenborn for the earlier set of volumes and presently Ing. Zuzana Bernhart for all their timely help and support. I always had to set aside additional hours to edit books besides my professional and personal commitments, hours I could and should have given to my wife, Phullara, and our kids, Sourav and Devleena. I must mention that they not only allowed me the freedom to take away those hours from them but also offered their support in the editing job itself. I am really not sure whether my dedication of this compendium to them will suffice to do justice to their sacrifices for the interest of science and the science community.

New Delhi, India
Chittaranjan Kole

Contents

1 Reference Genome Sequence of Flax
Frank M. You, Ismael Moumen, Nadeem Khan, and Sylvie Cloutier

2 Repeat DNA Sequences in Flax Genomes
Nadeem Khan, Hamna Shazadee, Frank M. You, and Sylvie Cloutier

3 Pale Flax (Linum bienne): an Underexplored Flax Wild Relative
Yong-Bi Fu

4 Flax Breeding
Mukhlesur Rahman and Ahasanul Hoque

5 QTL Mapping: Strategy, Progress, and Prospects in Flax
Frank M. You, Nadeem Khan, Hamna Shazadee, and Sylvie Cloutier

6 Genetics of Abiotic Stress in Flax
Bijendra Khadka and Sylvie Cloutier

7 QTL and Candidate Genes for Flax Disease Resistance
Chunfang Zheng, Khalid Y. Rashid, Sylvie Cloutier, and Frank M. You

8 Key Stages of Flax Bast Fiber Development Through the Prism of Transcriptomics
Tatyana Gorshkova, Natalia Mokshina, Nobutaka Mitsuda, and Oleg Gorshkov

9 Metabolomics and Transcriptomics-Based Tools for Linseed Improvement
Ashok Somalraju and Bourlaye Fofana

10 Genome-Wide Prediction of Disease Resistance Gene Analogs in Flax
Pingchuan Li and Frank M. You

11 Genome-Editing Tools for Flax Genetic Improvement
Vanessa Clemis, Mohsin Zaidi, and Bourlaye Fofana

12 Genomics Assisted Breeding Strategy in Flax
Nadeem Khan, Hamna Shazadee, Sylvie Cloutier, and Frank M. You

13 Flax Genomic Resources and Databases
Pingchuan Li, Ismael Moumen, Sylvie Cloutier, and Frank M. You

1 Reference Genome Sequence of Flax

Frank M. You, Ismael Moumen, Nadeem Khan, and Sylvie Cloutier

1.1 Introduction

Flax (Linum usitatissimum L., 2n = 2x = 30), also called common flax or linseed, is a self-pollinating crop belonging to the Linaceae family (Singh et al. 2011). Its domestication began about 8,000 to 10,000 years ago in the Near-Middle East during the Neolithic period. It then spread to the Nile Valley, Europe, and finally to the rest of the world (Fu 2011). To meet the growing industry demand, flax is one of the few crops that is cultivated as two main morphotypes: fibre and linseed (Liu et al. 2011). The linseed-type flax is the oilseed type, also known as flaxseed. These two morphotypes differ in morphology and agronomic traits. The fibre-type accessions are generally taller and have fewer branches, greater straw strength, and fewer and smaller seeds than the linseed-type accessions, which are comparatively shorter and more branched and have greater seed weight, oil content, and seed yield (Diederichsen and Ulrich 2009; You et al. 2017). Fibre flax was one of the top three fibre crops used in the textile industry, whereas linseed flax ranked fifth among the oilseed crops of the world (Ottai et al. 2011). Flax has been widely cultivated in broad geographical regions (Fig. 1.1). In the last 25 years, the main fibre production regions were Europe (73.8%) and Asia (24.5%), while the Americas (41.2%), Asia (35.4%), and Europe (18.5%) were the leading producers of linseed (Fig. 1.2). Fibre flax is mainly grown in Western Europe, Russia, and China, while linseed flax is primarily cultivated in Canada, USA, China, India, Western Europe, Russia, and Kazakhstan (Foulk et al. 2004; You et al. 2016; Soni 2021). Fluctuations in linseed and fibre production in the main producing regions of the world from 1994 to 2019 are presented in Fig. 1.3. In recent years, France has led fibre production, while Kazakhstan has become the top flax seed producer.

Recent advancements in flax research have improved our knowledge of this crop. Specifically, genomic studies have produced large amounts of genomic data, providing the resources required to enhance flax genetic improvement using genomics-based technologies and strategies. One of the major achievements of the past decade was the release of the first flax reference genome sequence of the Canadian cultivar CDC Bethune (Wang et al. 2012), followed by its first version of chromosome-scale pseudomolecules (You et al. 2018). Recently, five more flax genotypes have been sequenced (Dmitriev et al. 2020; Zhang et al. 2020; Sa et al. 2021), providing additional genome sequences for the flax research community. These include the Chinese linseed cultivar Longya-10, the Chinese fibre cultivars Heiya-14 and Yiya-5, and the Russian fibre cultivar Atlant, as well as one accession of pale flax (L. bienne), which is the closest wild relative of cultivated flax. This chapter briefly reviews the major advances in flax genome sequencing, assembly, and annotation, and the comparative analysis of these genome sequences.

Fig. 1.1 Average production of flax fibre and tow (a) and linseed (b) in the world from 1994–2020. Source FAOSTAT

Fig. 1.2 Average production of flax fibre and tow (a) and linseed (b) by region from 1994–2020. Source FAOSTAT

F. M. You (✉), I. Moumen, N. Khan, S. Cloutier: Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, 960 Carling Avenue, Ottawa, ON K1A 0C6, Canada

© His Majesty the King in Right of Canada, as represented by the Minister of Agriculture and Agri-Food 2023. F. M. You and B. Fofana (eds.), The Flax Genome, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-16061-5_1

1.2 Flax Genome Assemblies

A reference genome developed from a genotype of a species represents a standard genome sequence that delivers both the nucleotide sequence of its chromosomes and the associated structural information for genomics studies, providing a basis for comparison with other genotypes within or between species. Since the first plant genome assembly, that of Arabidopsis thaliana, was published in 2000 (Arabidopsis Genome Initiative 2000) and the first draft human reference genome was released in 2001 (Lander et al. 2001), hundreds of plant species have been sequenced (Marks et al. 2021). The rapid evolution of sequencing technologies, the development of new genome scaffolding techniques, and the improvement of genome assembly algorithms and software tools (Ghurye and Pop 2019; Wee et al. 2019) have led to comprehensive and high-quality genome assemblies covering long repeat sequence gaps (Zimin et al. 2017a; Nurk et al. 2022). The third generation of sequencing technologies includes PacBio single-molecule high fidelity (HiFi) sequencing (Hon et al. 2020) and ultra-long-read Oxford Nanopore technology (ONT) (Jain et al. 2016; Bocklandt et al. 2019). The new genome scaffolding techniques include optical mapping, such as BioNano genome mapping (Lam et al. 2012), and Hi-C sequencing (Belton et al. 2012; Burton et al. 2013).


Fig. 1.3 Fibre and tow (a) and linseed (b) production by the main world producing countries from 1994–2019. Source FAOSTAT

1.2.1 The First Version of the Flax Genome Assembly for CDC Bethune

CDC Bethune (Rowland et al. 2002), a Canadian high-yielding, medium-late-maturing linseed flax cultivar developed by the Crop Development Centre at the University of Saskatchewan, was selected for the development of the first flax reference genome proposed in the Genome Canada research project entitled “Total Utilization Flax Genomics (TUFGEN)”, which ran from 2009 to 2014 (https://genomecanada.ca/project/total-utilization-flax-genomics/). The genome size of CDC Bethune was estimated at 368 Mb based on the bacterial artificial chromosome (BAC)-based physical map (Ragupathy et al. 2011) and at nearly 373 Mb based on flow cytometry (Wang et al. 2012). This pioneering CDC Bethune sequencing project used a whole-genome sequencing (WGS) strategy based on the Illumina sequencing platform. A total of 25.88 Gb of Illumina short reads, corresponding to 94X genome coverage from seven paired-end and mate-pair libraries with insert sizes of 300 bp to 10 Kb, were generated. De novo assembly was performed using SOAPdenovo (Li et al. 2009). This led to an assembly consisting of 116,602 contigs (302 Mb) or 88,384 supercontigs (scaffolds) (318 Mb), covering approximately 81% of the flax genome, estimated at 370 Mb (Wang et al. 2012). This assembly was the first flax reference genome sequence and opened a new era in flax genomic studies despite its large number of short scaffolds, with a scaffold N50 of only ~693 Kb. The sequences and genome annotation information are available for download from Phytozome (Table 1.1).

Table 1.1 Statistics of the first version (Wang et al. 2012) of the flax genome assembly and its annotation as deposited and summarized in Phytozome

Item                                      Statistics
Annotation version                        v1.0
Total scaffold length (bp)                318,250,901
Number of scaffolds                       88,420
Scaffold L50                              132
Scaffold N50 (bp)                         693,492
Total contig length (bp)                  302,186,967
Number of contigs                         116,824
Contig L50                                4427
Contig N50 (bp)                           20,125
Number of protein-coding transcripts      43,484
Number of protein-coding genes            43,471
Percentage of eukaryote BUSCO genes       97.7
Percentage of embryophyte BUSCO genes     92.1

L50: the minimum number of scaffolds or contigs containing half of the assembly; N50: the length of the shortest scaffold or contig from the L50 set
Source https://phytozome-next.jgi.doe.gov/
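The L50 and N50 definitions given with Table 1.1 can be made concrete with a short calculation. The following is a minimal sketch, assuming only a list of scaffold or contig lengths in bp; the function name `n50_l50` and the toy lengths are illustrative and do not come from the flax assembly or from any tool cited in the chapter.

```python
from typing import Sequence, Tuple


def n50_l50(lengths: Sequence[int]) -> Tuple[int, int]:
    """Return (N50, L50) for scaffold or contig lengths given in bp.

    L50 is the smallest number of sequences, taken from longest to
    shortest, whose summed length reaches half of the assembly.
    N50 is the length of the shortest sequence in that set.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)   # longest first
    half = sum(ordered) / 2                   # half of the total assembly size
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise AssertionError("unreachable for non-empty input")


# Hypothetical toy assembly (not flax data): total 310 bp, half = 155 bp.
toy_lengths = [90, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy_lengths)
print(f"N50 = {n50} bp, L50 = {l50}")  # N50 = 70 bp, L50 = 2
```

In the toy case the two longest sequences (90 + 70 = 160 bp) are enough to reach half of the 310 bp total, so L50 is 2 and N50 is 70 bp. Read the same way, the scaffold L50 of 132 in Table 1.1 means that the 132 longest scaffolds hold half of the assembled sequence.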

1.2.2 The Second Version of the Flax Genome Assembly for CDC Bethune: Chromosome-Level Pseudomolecules

A pseudomolecule refers to the DNA sequence assembly representing a biological chromosome or full genome. To build pseudomolecules, the assembled scaffolds or contigs are sorted and assigned to individual chromosomes with the aid of consensus genetic maps (Cloutier et al. 2012), BAC-based physical maps (Luo et al. 2010), and more recent scaffolding technologies, including optical mapping, such as BioNano genome mapping (Hastie et al. 2013; Stankova et al. 2016), and Hi-C sequencing (Belton et al. 2012).

The first chromosome-level pseudomolecules of CDC Bethune (v2.0) (You et al. 2018) were constructed by integrating information from the BAC-based physical map (Ragupathy et al. 2011), the simple sequence repeat (SSR) marker-based consensus genetic map (Cloutier et al. 2012), and the BioNano genome optical maps (You et al. 2018). The long scaffolds in the assembly v1.0 (Wang et al. 2012) were sorted and assigned to the 15 chromosomes of flax. This new 316.2 Mb assembly represented the 15 chromosomes of flax with sizes ranging from 15.6 Mb for chromosome (Chr) 15 to 29.4 Mb for Chr 1 (You et al. 2018) (Table 1.2, Fig. 1.4). The pseudomolecules contain ~47 Mb of gaps, comprising gaps within the original scaffolds (generated by de novo assembly and scaffolding with mate-pair sequences) and gaps between sorted scaffolds (estimated with BioNano genome maps). The 15 chromosome sequences were deposited in the NCBI database (CP027619–CP027633). This chromosome-scale reference sequence represents a significant improvement over the first version of the draft flax genome reference sequence (Wang et al. 2012), benefiting genome-wide SNP discovery, QTL identification, genome-wide association studies, and comparative genome analyses.
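The idea of sorting scaffolds and assigning them to chromosomes can be sketched in a few lines. This is only a toy illustration, not the procedure used by You et al. (2018). It assumes each scaffold already carries one anchor (chromosome and map position) from a genetic or physical map, it ignores scaffold orientation, and it joins neighbours with a fixed placeholder gap of 100 Ns, whereas the published pseudomolecules used gap sizes estimated from BioNano maps. All names and sequences are hypothetical.

```python
from collections import defaultdict

GAP = "N" * 100  # assumed fixed-size placeholder gap between scaffolds


def build_pseudomolecules(scaffolds, anchors):
    """Order scaffolds along chromosomes and join them with gaps.

    scaffolds: {scaffold_name: sequence}
    anchors:   {scaffold_name: (chromosome, map_position)}
    Scaffolds without an anchor stay unplaced and are not returned.
    """
    by_chromosome = defaultdict(list)
    for name, (chromosome, position) in anchors.items():
        by_chromosome[chromosome].append((position, name))

    pseudomolecules = {}
    for chromosome, placed in by_chromosome.items():
        ordered_names = [name for _, name in sorted(placed)]  # sort by map position
        pseudomolecules[chromosome] = GAP.join(scaffolds[n] for n in ordered_names)
    return pseudomolecules


# Hypothetical toy input
scaffolds = {"s1": "ACGT", "s2": "GGCC", "s3": "TTAA", "s4": "CCCC"}
anchors = {"s2": ("Chr1", 12.5), "s1": ("Chr1", 3.0), "s3": ("Chr2", 7.1)}
pseudo = build_pseudomolecules(scaffolds, anchors)
print({chrom: len(seq) for chrom, seq in pseudo.items()})
```

The Ns inserted between scaffolds are what appear as gap lengths in chromosome-level statistics, which is why an assembly size can be reported both with and without gaps.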


Table 1.2 Chromosome-scale pseudomolecules of the 15 chromosomes of the linseed cultivars CDC Bethune (You et al. 2018) and Longya-10 (Zhang et al. 2020) and the fibre cultivar Yiya-5 (Sa et al. 2021)

| Chr | CDC Bethune NCBI accession | CDC Bethune size (Mb) | CDC Bethune gap (Mb) | Longya-10 NCBI accession | Longya-10 size (Mb) | Longya-10 gap (Mb) | Yiya-5 (a) size (Mb) | Yiya-5 (a) gap (Kb) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | CP027619.1 | 29.43 | 5.46 | CM036262.1 | 22.66 | 0.36 | 31.79 | 0.82 |
| 2 | CP027626.1 | 25.73 | 5.09 | CM036263.1 | 22.13 | 0.26 | 29.85 | 0.60 |
| 3 | CP027627.1 | 26.64 | 3.81 | CM036264.1 | 21.91 | 0.36 | 26.36 | 0.40 |
| 4 | CP027628.1 | 19.93 | 2.66 | CM036265.1 | 21.42 | 0.31 | 24.86 | 0.40 |
| 5 | CP027629.1 | 17.70 | 1.95 | CM036266.1 | 20.60 | 0.35 | 24.52 | 0.60 |
| 6 | CP027630.1 | 18.08 | 1.99 | CM036267.1 | 19.56 | 0.33 | 28.49 | 0.70 |
| 7 | CP027631.1 | 18.30 | 2.63 | CM036268.1 | 18.79 | 0.38 | 17.72 | 0.60 |
| 8 | CP027632.1 | 23.79 | 3.77 | CM036269.1 | 18.16 | 0.38 | 31.05 | 0.40 |
| 9 | CP027633.1 | 22.09 | 3.85 | CM036270.1 | 17.74 | 0.21 | 32.04 | 0.90 |
| 10 | CP027620.1 | 18.20 | 1.79 | CM036271.1 | 17.30 | 0.29 | 33.28 | 0.50 |
| 11 | CP027621.1 | 19.89 | 2.42 | CM036272.1 | 16.85 | 0.36 | 31.41 | 0.20 |
| 12 | CP027622.1 | 20.89 | 3.66 | CM036273.1 | 17.15 | 0.24 | 19.88 | 0.40 |
| 13 | CP027623.1 | 20.48 | 2.14 | CM036274.1 | 16.35 | 0.24 | 21.42 | 0.20 |
| 14 | CP027624.1 | 19.39 | 2.86 | CM036275.1 | 16.86 | 0.26 | 39.91 | 0.40 |
| 15 | CP027625.1 | 15.64 | 2.50 | CM036276.1 | 14.75 | 0.25 | 30.52 | 0.40 |
| Total | | 316.17 | 46.58 | | 282.23 | 4.57 | 423.10 | 7.52 |

(a) NCBI accession numbers are not available for Yiya-5 because its assembly and annotation files are deposited in Zenodo (https://doi.org/10.5281/zenodo.4872893)

1.2.3 Recent Assemblies Expanding the Representation to Both Morphotypes and to the Closest Wild Relative of Cultivated Flax

In recent years, five other flax genotypes, including one linseed and three fibre cultivars, as well as one wild relative of flax (pale flax, L. bienne), have also been sequenced and assembled. Zhang et al. (2020) performed WGS of three flax genotypes: the linseed-type cultivar Longya-10, the fibre-type cultivar Heiya-14, and a pale flax accession (Fig. 1.5). High-quality Illumina paired-end reads totalling 68.2, 73.5, and 49.1 Gbp, corresponding to 133, 142, and 93X genome coverage, were generated for the three genotypes, respectively. De novo assemblies were performed using ALLPATHS-LG (Gnerre et al. 2011), scaffolding with mate-pair information was conducted using SSPACE (Boetzer et al. 2011), and gap-filling was performed using GapCloser from the SOAPdenovo2 package (Luo et al. 2012). As a result, genome assemblies of 306.0, 303.7, and 293.5 Mb with scaffold N50 values of 1,235 Kb, 700 Kb, and 384 Kb were obtained for Longya-10, Heiya-14, and pale flax, respectively. Gaps in the assemblies were estimated at 5.8 Mb for Longya-10, 2.8 Mb for Heiya-14, and 5.6 Mb for the pale flax genome. Hi-C data and a genetic map were used to enhance the Longya-10 genome assembly, leading to 434 scaffolds totalling 295.7 Mb in length for the chromosome-level assembly. The longest scaffolds, corresponding to the 15 chromosomes, have a total length of 282.23 Mb (Table 1.2).

Fig. 1.4 Circos map illustrating the 15 chromosomes of CDC Bethune with Track A, scaffolds integrated in the pseudomolecules; Track B, BioNano contigs mapped to scaffolds; Track C, bacterial artificial chromosome (BAC)-based contigs mapped to scaffolds and BioNano contigs; Track D, frequency distribution of long terminal repeats (LTRs) on chromosomes with bin sizes of 0.1 Mb; Track E, frequency distribution of resistance gene analogues (RGAs) on chromosomes with bin sizes of 0.1 Mb; Track F, heat map of genes with bin sizes of 0.1 Mb; and Track G, the central region showing whole-genome duplication (WGD). Source: You et al. (2018)

Around the same time, Dmitriev et al. (2020) released the genome sequence of the Russian fibre cultivar Atlant using both ONT and Illumina platforms. A total of 8.4 Gb of ONT long reads with an N50 of 12 Kb, corresponding to 23X flax genome coverage, and 22.6 million 250 bp paired-end reads, corresponding to 30X genome coverage, were generated. The ONT reads were assembled separately by several assemblers, including Canu 2.0 (Koren et al. 2017), Flye 2.7 (Kolmogorov et al. 2019), Shasta 0.5.0 (Shafin et al. 2020), and wtdbg2 2.5 (Ruan and Li 2020). To improve accuracy, contigs were polished using Racon (Vaser et al. 2017), Medaka (https://github.com/nanoporetech/medaka), and, with the Illumina reads, POLCA from the assembler MaSuRCA (Zimin et al. 2017b). A comparison of the assemblies generated by the different assemblers and polishing tools indicated that the most complete and accurate assembly was obtained using Canu combined with the polishing tools Racon + Medaka + POLCA. This assembly was 361.7 Mb in length, but the N50 was only 350 Kb (Table 1.3).

The Chinese fibre-type genotype Yiya-5, a high fibre-yielding cultivar bred by the Xinjiang Yili Institute of Agricultural Sciences, China, has also been sequenced using the PacBio HiFi sequencing technology. A total of 21.80 Gb of circular consensus sequence (CCS) reads were generated with an N50 of 12,191 bp. The reads were assembled using Hifiasm (v0.13-r308) (Chen et al. 2020), generating a draft assembly of 1,632 contigs totalling 537.51 Mb. Removal of the redundant haplotigs yielded a refined assembly (v1.0) of 336 contigs with an N50 of 9.61 Mb totalling 454.95 Mb. Using 58.61 Gb of high-quality Hi-C data, the contigs in the assembly v1.0 were further scaffolded, resulting in 15 chromosome-length scaffolds totalling 423 Mb (v2.0) (Table 1.2) and covering 93.0% of the sequences in the assembly v1.0.

Fig. 1.5 Plant (a) and seed (b) morphology of pale flax, Longya-10, and Heiya-14. Source: Zhang et al. (2020)

For the chromosome-scale pseudomolecules, Longya-10 has a smaller genome assembly size with gaps (282.23 Mb) than CDC Bethune (316.17 Mb), but a similar genome assembly size without gaps (277.7 Mb) compared to CDC Bethune (269.6 Mb) (Table 1.2). The similarity in pseudomolecule sizes between CDC Bethune and Longya-10 could be because a similar Illumina-based sequencing strategy was used in both projects. At 423 Mb, the Yiya-5 v2.0 pseudomolecules yielded a larger genome assembly size than CDC Bethune and Longya-10, with chromosomes ranging from 17.72 Mb (Chr 7) to 39.91 Mb (Chr 14) and only 7.5 Kb of gaps. The Yiya-5 v2.0 genome assembly is highly collinear with the CDC Bethune v2.0 genome assembly except for the central regions of chromosomes, which likely contain the centromeric repeat sequences resolved in the Yiya-5 assembly but missing in the CDC Bethune v2.0 assembly (Sa et al. 2021) (Fig. 1.6). This suggests that PacBio HiFi sequencing scaffolded with Hi-C data has significantly improved the genome assembly by providing a more complete assembly of the repeat sequences.
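The gap-free assembly sizes quoted above follow directly from the totals in Table 1.2 (total pseudomolecule size minus total gap length). The short sketch below redoes that subtraction. The only inputs are the table totals, and the one point needing care is that the Yiya-5 gaps are reported in Kb rather than Mb. The function and variable names are illustrative.

```python
# Totals from Table 1.2: (pseudomolecule size in Mb, gap length, gap unit)
totals = {
    "CDC Bethune": (316.17, 46.58, "Mb"),
    "Longya-10": (282.23, 4.57, "Mb"),
    "Yiya-5": (423.10, 7.52, "Kb"),
}


def gap_free_size_mb(size_mb, gap, unit):
    """Return the assembly size in Mb after subtracting gaps."""
    gap_mb = gap / 1000 if unit == "Kb" else gap  # convert Kb to Mb when needed
    return size_mb - gap_mb


for genotype, (size_mb, gap, unit) in totals.items():
    print(f"{genotype}: {gap_free_size_mb(size_mb, gap, unit):.2f} Mb without gaps")
```

Rounded to one decimal place, the CDC Bethune and Longya-10 results match the 269.6 Mb and 277.7 Mb given in the text, while the Yiya-5 total is essentially unchanged because its gaps amount to only a few kilobases.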

1.2.4 Quality Examination of Flax Genome Assemblies

Benchmarking Universal Single-Copy Orthologues (BUSCO) is a widely adopted assessment tool for genome assembly quality that uses a predefined and expected set of single-copy marker genes as a proxy for genome-wide completeness (Manni et al. 2021). To compare the assembly quality of all seven flax assemblies, we performed BUSCO analyses of the released genome assemblies (Table 1.3) using the same BUSCO eudicots_odb10 dataset (Fig. 1.7). A high percentage of complete single-copy and duplicated genes (~95%), with approximately 4% missing genes, was observed for the Atlant, Heiya-14, and Longya-10 assemblies, indicative of high assembly quality. Approximately 6–8% of the genes were missing in the assemblies of Yiya-5 and pale flax, and ~9.5% were absent from the CDC Bethune assembly.

Table 1.3 Description of flax genome assemblies released between 2012 and 2021

| Genotype | Morphotype | Sequence technology | Assembler | Scaffolds | Total size (Mbp) | Longest scaffold (Mbp) | N50 (Mbp) | References |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CDC Bethune | Linseed | Illumina | SOAPdenovo | 88,384 (≥100 bp) | 318.25 | 3.09 | 0.69 | Wang et al. (2012) |
| Longya-10 | Linseed | Illumina, Hi-C | ALLPATHS-LG, SSPACE, SOAPdenovo2 | 1865 (≥911 bp) | 306.00 | 4.61 | 1.24 | Zhang et al. (2020) |
| Heiya-14 | Fibre | Illumina | ALLPATHS-LG, SSPACE, SOAPdenovo2 | 2748 (≥899 bp) | 303.68 | 3.04 | 0.70 | Zhang et al. (2020) |
| Pale flax | | Illumina | ALLPATHS-LG, SSPACE, SOAPdenovo2 | 2609 (≥883 bp) | 293.58 | 5.00 | 0.38 | Zhang et al. (2020) |
| Atlant | Fibre | Oxford Nanopore, Illumina | Canu 2.0, Flye 2.7, Shasta 0.5.0, wtdbg2 2.5 | 2458 (≥1012 bp) | 361.70 | 3.51 | 0.35 | Dmitriev et al. (2020) |
| Yiya-5 | Fibre | PacBio HiFi, Hi-C | Hifiasm v0.13-r308, minimap2 2.17-r941 | 262 (≥9885 bp) | 454.96 | 39.91 | 9.61 | Sa et al. (2021) |
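BUSCO results such as those discussed in this section are usually compared through four categories: complete single-copy (S), complete duplicated (D), fragmented (F), and missing (M). The sketch below parses a one-line summary and applies two consistency checks. It assumes the compact notation that BUSCO writes in its short summary (C, S, D, F, M percentages followed by the number of markers n). The percentages in the example line are hypothetical and are not taken from any flax assembly, and n = 2326 is assumed to be the marker count of eudicots_odb10.

```python
import re

# Pattern for a line such as "C:95.0%[S:60.0%,D:35.0%],F:1.0%,M:4.0%,n:2326"
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Return the BUSCO percentages as floats and the marker count n as an int."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {line!r}")
    values = {key: float(val) for key, val in match.groupdict().items() if key != "n"}
    values["n"] = int(match.group("n"))
    return values


# Hypothetical summary line (values invented for illustration)
example = "C:95.0%[S:60.0%,D:35.0%],F:1.0%,M:4.0%,n:2326"
busco = parse_busco_summary(example)

# Complete = single-copy + duplicated; all categories should sum to about 100%
assert abs(busco["S"] + busco["D"] - busco["C"]) < 0.11
assert abs(busco["C"] + busco["F"] + busco["M"] - 100) < 0.31
print(busco)
```

Running all assemblies against the same lineage dataset, as done here with eudicots_odb10, is what makes the missing-gene percentages directly comparable.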

1.3 Repeat Sequence

Genome annotation of assembled genome sequences aims to assign biological information to sequences. Genome annotation primarily consists of two steps: identification of non-protein-coding elements, such as repetitive DNA sequences, including the major transposable element (TE) classes, and gene annotation that involves the prediction of protein-coding genes and their functional annotation. Based on the mechanism of transposition, TEs can be grouped into two major classes: Class I retrotransposons, transposing via a copy-and-paste mechanism involving RNA intermediates, and Class II DNA transposons, transposing through a simple cut-and-paste mechanism without an RNA intermediate (Wicker et al. 2007). TEs are extremely diverse, and the thousands of distinct TE families in plants (Feschotte et al. 2002; Morgante 2006) account for a large portion of the genome in many plant species. In eukaryotic genomes, TEs contribute substantially to the genome size and they play important roles in structural and functional genomics (Tollis and Boissinot 2012). These sequences are known to cause significant changes in genomes, reflecting evolutionary differences across species (Mehrotra and Goyal 2014). In the genus Linum, there are more than 200 diploid species, characterized by karyotype variabilities observed as size, number, and structure of chromosomes (Goldblatt 2007; Rice et al. 2014). Such variability is mostly determined by the amount and composition of repeated sequences. Using high-throughput genome sequencing, Bolsheva et al. (2019) performed a comparative study on repeat sequences in 12 blue-flowered flax species, including cultivated and wild flax. These Linum species were found to largely differ in their satellite DNA families and relative content in genomes. Their evolution was accompanied by waves of amplification of satellite DNAs and long terminal repeat (LTR) retrotransposons (Bolsheva et al. 2019).

Fig. 1.6 Genomic synteny similarity between Yiya-5 v2.0 and CDC Bethune v2.0. Source: modified from Sa et al. (2021)

Fig. 1.7 BUSCO assessment results of seven flax assemblies including five cultivated flax cultivars (two linseed and three fibre) and one pale flax accession

The TE content of the flax genome assemblies ranged from 23 to 55% (Wang et al. 2012; Zhang et al. 2020; Sa et al. 2021) (Table 1.4). LTRs are the most prominent TE type, accounting for 36–80% of all TEs depending on the genotype (Table 1.4). For example, LTRs accounted for 75.35%, 36.45%, 36.34%, 80.49%, and 36.47% of all TEs identified in the assemblies of flax genotypes CDC Bethune, Longya-10, Heiya-14, Yiya-5, and a pale flax accession, respectively (Table 1.4). Besides the inherent genome features of flax genotypes, differences in TE proportions of the assemblies may be a consequence of the sequencing technologies used. The third-generation sequencing platforms improved the completeness of genome assemblies because long reads can span entire repeat sequences. For example, the fibre cultivar Yiya-5 was sequenced using PacBio HiFi, generating an assembly of 454 Mb in size, of which 55% were repeat sequences. Both the size of the genome and the proportion of TEs in the Yiya-5 assembly are significantly greater than those in the other four flax genotypes (Tables 1.2 and 1.4). Another factor affecting TE identification could be the software tools and repeat libraries, which differed across the flax assemblies (Agrios 2005; Gonzalez and Deyholos 2012; Wang et al. 2012; You et al. 2018; Zhang et al. 2020; Sa et al. 2021). Therefore, a standardized TE identification procedure with a common set of software tools is required to provide a better comparative TE analysis of flax genomes (You et al. 2015).

Table 1.4 Transposable elements (TEs) in the genome assemblies of five flax genotypes, showing the percentage of each TE type per genome and as a proportion of all identified TEs (in parentheses)

| Type | CDC Bethune v1.0 (a) | CDC Bethune v1.0 (b) | Longya-10 | Heiya-14 | Pale flax | Yiya-5 v2.0 |
| --- | --- | --- | --- | --- | --- | --- |
| Class I: Retrotransposon | | | | | | |
| LTR/Copia | 9.79 (40.09) | 9.30 (40.33) | 7.93 (20.50) | 7.66 (20.76) | 7.55 (20.56) | 5.67 (10.24) |
| LTR/Gypsy | 8.31 (34.03) | 7.89 (34.22) | 6.12 (15.82) | 5.53 (14.99) | 5.79 (15.77) | 14.70 (26.55) |
| LTR/Unknown | 0.30 (1.23) | 0.28 (1.21) | 0.05 (0.13) | 0.22 (0.60) | 0.05 (0.14) | 3.82 (6.90) |
| Other | 2.22 (9.09) | 2.11 (9.15) | 9.68 (25.03) | 8.88 (24.07) | 10.84 (29.52) | 2.11 (7.55) |
| Class I total | 20.62 (84.44) | 19.58 (84.91) | 23.78 (64.48) | 22.29 (64.01) | 24.23 (65.99) | 26.29 (47.49) |
| Class II: Transposon | | | | | | |
| Class II total | 3.80 (15.56) | 3.62 (15.70) | 4.84 (12.51) | 4.79 (12.98) | 4.13 (11.25) | 9.98 (18.03) |
| Other repeats | NA | NA | 10.06 (26.01) | 9.82 (26.61) | 8.36 (22.77) | 19.09 (34.47) |
| Overall total | 24.42 (100) | 23.06 (100) | 38.68 (100) | 36.90 (100) | 36.72 (100) | 55.36 (100) |

Values are the sequence percentage (%) of the genome, with the percentage of all TEs in parentheses. NA: not available. Sources: (a) Wang et al. (2012) and (b) Gonzalez and Deyholos (2012) for CDC Bethune; Zhang et al. (2020) for Longya-10, Heiya-14, and pale flax; Sa et al. (2021) for Yiya-5
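The LTR shares quoted in the text can be traced back to Table 1.4 by adding the three LTR rows (Copia, Gypsy, Unknown) and dividing by the overall repeat total. The sketch below does this for four of the genotypes, using the genome percentages from the table. The dictionary layout and function name are illustrative choices.

```python
# Percent of genome per LTR category and overall repeat total (Table 1.4)
table_1_4 = {
    "CDC Bethune v1.0 (Wang et al. 2012)": {"copia": 9.79, "gypsy": 8.31, "unknown": 0.30, "total": 24.42},
    "Longya-10": {"copia": 7.93, "gypsy": 6.12, "unknown": 0.05, "total": 38.68},
    "Heiya-14": {"copia": 7.66, "gypsy": 5.53, "unknown": 0.22, "total": 36.90},
    "Pale flax": {"copia": 7.55, "gypsy": 5.79, "unknown": 0.05, "total": 36.72},
}


def ltr_share(row):
    """Return LTR retrotransposons as a percentage of all identified repeats."""
    ltr_genome_percent = row["copia"] + row["gypsy"] + row["unknown"]
    return 100 * ltr_genome_percent / row["total"]


for genotype, row in table_1_4.items():
    print(f"{genotype}: LTRs = {ltr_share(row):.2f}% of all TEs")
```

The results (75.35%, 36.45%, 36.34%, and 36.47%) agree with the values cited for CDC Bethune, Longya-10, Heiya-14, and pale flax. Yiya-5 is left out here because its quoted share does not follow from the table columns in the same way.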

1.4 Gene Annotation

Gene annotation of a draft assembly includes gene prediction and functional annotation. There are three major strategies for protein-coding gene prediction in assembled genome sequences: ab initio, evidence-based, and a combination of the two. Ab initio gene prediction methods use statistical models to identify intrinsic gene content and signals and predict potential protein-coding genes strictly based on the genome sequence. As such, ab initio gene prediction can identify putative genes even if they share no similarity to known gene sequences or protein domains. On the other hand, some predictions may be erroneous calls. The evidence-based methods predict genes based on evidence for their transcription obtained from cDNAs, RNA-seq data, or other gene expression data such as PacBio Iso-Seq. The accuracy of gene prediction relies on the expression of the genes in the sample(s) sequenced and the integrity of the sample(s). Some real genes may be missed because they are not represented in the sample(s) or because their expression and/or the quality of the sample(s) were too low. These may be captured based on statistical models; hence, a strategy that combines evidence-based and ab initio approaches by mapping proteins, expressed sequence tags (ESTs), and RNA-seq data to the target genome to validate predicted gene structures outperforms the individual strategies through complementarity (Holt and Yandell 2011; Hoff et al. 2016; Bruna et al. 2021).

In the last decade, several software tools based on the combined approach have been implemented and continuously improved. These tools integrate ab initio gene prediction, evidence from transcripts, and homology-based gene prediction, which relies on gene models of related and well-annotated species, into an automatic pipeline. They increase the accuracy of protein-coding gene prediction and provide an efficient way to solve some computational complexities (Holt and Yandell 2011; Hoff et al. 2016; Bruna et al. 2021). Maker2 (Holt and Yandell 2011) is a pipeline that integrates ab initio gene predictors, including SNAP (Korf 2004), Augustus (Stanke et al. 2006), and GeneMark (Lomsadze et al. 2014), with RNA-seq data, whereas Braker2 (Bruna et al. 2021) is a more recent pipeline for unsupervised RNA-seq-based genome annotation that combines the advantages of GeneMark-EP+ (Lomsadze et al. 2014) and Augustus (Stanke et al. 2006).

Ab initio gene prediction for the assemblies of the six flax genotypes (CDC Bethune, Longya-10, Heiya-14, Atlant, Yiya-5, and the pale flax accession) has been performed (Wang et al. 2012; Dmitriev et al. 2020; Zhang et al. 2020; Sa et al. 2021). In these annotations, Augustus was used for ab initio gene prediction followed by validation using ESTs (CDC Bethune) or RNA-seq data sequenced from different tissues, including stem and boll tissues (Longya-10, Heiya-14, and pale flax), and leaf, stem, flower, root, and bolls (Yiya-5). A total of 43,484 genes with an average gene size of 2307 bp were predicted from the first version of the CDC Bethune genome, with 89.5% of them aligned to one or more proteins in the NCBI nr protein database (Wang et al. 2012). A similar number of protein-coding genes, approximately 43,500, with similar gene lengths (2.3–2.5 Kb) were predicted for Longya-10 (linseed), Heiya-14 (fibre), and the pale flax accession. More protein-coding genes were predicted for the fibre genotypes Yiya-5 (49,616) and Atlant (77,522), which were sequenced using the third-generation sequencing technologies PacBio HiFi and ONT, respectively, and produced larger genome assemblies (454.96 Mb for Yiya-5 and 361.70 Mb for Atlant) than those obtained using Illumina short reads (Table 1.5). Of note, the predicted genes of Yiya-5 are longer on average (3.7 Kb) than those of the other four flax genotypes for which gene lengths were reported (Table 1.5).

Table 1.5 Genes predicted from the genome assemblies of six flax genotypes

| Cultivar | Assembly | Gene prediction tools used | No. of protein-coding genes | Average gene length (bp) | Average exon length (bp) | References |
| --- | --- | --- | --- | --- | --- | --- |
| CDC Bethune | v1.0 | Augustus v2.5.5, GlimmerHMM v3.0.1 | 43,484 | 2,307 | 237 | Wang et al. (2012) |
| Longya-10 | v1.0 | Genscan v1.0, Augustus v2.5.5, GlimmerHMM v3.0.1, GeneID v1.3, SNAP | 43,668 | 2,505 | 238 | Zhang et al. (2020) |
| Heiya-14 | v1.0 | Genscan v1.0, Augustus v2.5.5, GlimmerHMM v3.0.1, GeneID v1.3, SNAP | 43,826 | 2,501 | 236 | Zhang et al. (2020) |
| Pale flax | v1.0 | Genscan v1.0, Augustus v2.5.5, GlimmerHMM v3.0.1, GeneID v1.3, SNAP | 43,424 | 2,344 | 231 | Zhang et al. (2020) |
| Atlant | v1.0 | PASA 2.4.1, Augustus 3.3.3, GlimmerHMM 3.0.4, SNAP v. 2006-07-28, GeneMark 4.61, CodingQuarry 2.0, EvidenceModeller 1.1.1 | 77,522 | NA | NA | Dmitriev et al. (2020) |
| Yiya-5 | v2.0 | Braker2 v2.1.6 with HISAT2 v2.1.0, Augustus v3.4.0, GUSHR v1.0.0 | 49,616 | 3,702 | 215 | Sa et al. (2021) |

NA: not available

Functional annotation is the process of assigning biological information to the predicted genes. The homology-based sequence alignment tool BLAST and other bioinformatics tools such as InterProScan (Jones et al. 2014), eggNOG-mapper (Huerta-Cepas et al. 2017), and DIAMOND (Buchfink et al. 2015) are commonly used for functional annotation. They rely on dedicated databases, including eggNOG 5.0 (Huerta-Cepas et al. 2019), GO (Harris et al. 2004), KEGG (Kanehisa et al. 2002), Pfam (Mistry et al. 2021), Swiss-Prot (Bairoch and Apweiler 2000), and the NCBI non-redundant protein database (nr). Table 1.6 lists the number of predicted genes that have been annotated using some common gene annotation databases (Wang et al. 2012; Dmitriev et al. 2020; Zhang et al. 2020; Sa et al. 2021). In the CDC Bethune assembly v1.0, 39,288 of 43,484 predicted genes aligned to one or more Arabidopsis proteins, resulting in 35,727 predicted genes with assigned functions from the protein annotation of Arabidopsis genes (Wang et al. 2012). In the assembly v2.0 of Yiya-5, of the 49,616 protein-coding genes, 34,938 (70.42%), 42,697 (86.05%), 22,600 (45.55%), 21,611 (43.56%), 34,654 (69.84%), and 41,847 (84.34%) genes had significant hits with the Pfam, eggNOG, GO, KEGG, Swiss-Prot, and nr databases, respectively. Overall, 43,364 (87.40%) genes were successfully annotated with at least one database (Sa et al. 2021). In the assembly of Atlant, 18,946 predicted genes were successfully annotated using the Pfam database, 19,741 using eggNOG, and 3,725 using UniProt (Dmitriev et al. 2020). In the assemblies of Longya-10, Heiya-14, and pale flax, even though similar numbers of predicted genes were obtained from the three genotypes (Table 1.5), large differences in hits to several annotation databases were observed (Table 1.6) (Zhang et al. 2020). Overall, the annotation results of the four genome sequencing studies differed substantially (Wang et al. 2012; Dmitriev et al. 2020; Zhang et al. 2020; Sa et al. 2021).

Table 1.6 Number of annotated protein-coding genes for six flax genotypes

| Database | CDC Bethune v1.0 | Longya-10 | Heiya-14 | Pale flax | Atlant | Yiya-5 v2.0 |
| --- | --- | --- | --- | --- | --- | --- |
| KOG | 17,319 | 25,055 | 15,775 | 21,540 | NA | NA |
| GO | 23,571 | 24,919 | 25,798 | 22,268 | NA | 22,600 |
| KEGG | 5999 | 9450 | 9677 | 13,978 | NA | 21,611 |
| Pfam | 32,166 | NA | NA | NA | 18,946 | 34,938 |
| eggNOG | NA | NA | NA | NA | 19,741 | 42,697 |
| UniProt | NA | NA | NA | NA | 3725 | NA |
| Swiss-Prot | NA | 33,005 | 34,147 | 27,472 | NA | 34,654 |

NA: not available. Source: Wang et al. (2012) for CDC Bethune v1.0; Zhang et al. (2020) for Longya-10, Heiya-14, and pale flax; Dmitriev et al. (2020) for Atlant; and Sa et al. (2021) for Yiya-5
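The annotation rates reported for Yiya-5 are simply the number of genes with a hit in each database divided by the 49,616 predicted protein-coding genes. A minimal sketch of that arithmetic, using only the counts quoted in the text (Sa et al. 2021), is given below. The variable and function names are illustrative.

```python
TOTAL_GENES = 49_616  # predicted protein-coding genes in Yiya-5 v2.0

# Genes with a significant hit per database, as quoted in the text
hits = {
    "Pfam": 34_938,
    "eggNOG": 42_697,
    "GO": 22_600,
    "KEGG": 21_611,
    "Swiss-Prot": 34_654,
    "nr": 41_847,
    "at least one database": 43_364,
}


def percent_annotated(n_hits, n_genes=TOTAL_GENES):
    """Return the share of predicted genes with a hit, in percent (2 decimals)."""
    return round(100 * n_hits / n_genes, 2)


for database, n_hits in hits.items():
    print(f"{database}: {n_hits} genes ({percent_annotated(n_hits)}%)")
```

Because each percentage depends on the total gene count of its own assembly, the raw counts in Table 1.6 are best compared across genotypes only after the same normalization.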

1.5 Non-coding RNAs

Non-coding RNAs (ncRNAs) comprise the majority of cellular RNAs and are a part of the transcriptome without having protein-coding roles. They play important roles in diverse biological processes, including translation (tRNA and rRNA), synthesis of the translational apparatus (snRNA), and gene regulation (miRNA). The genome sequences of four flax genotypes have been annotated for functional ncRNAs (Wang et al. 2012; Zhang et al. 2020). More than 700 copies each of tRNAs and rRNAs and 115–297 copies of putative miRNA precursor loci were identified from the four assemblies. More ncRNA copies were detected in CDC Bethune than in the flax cultivars Longya-10 and Heiya-14 and the pale flax accession (Table 1.7).

Table 1.7 Copy number of non-coding RNAs in four flax genotypes

| Type | CDC Bethune v1.0 | Longya-10 | Heiya-14 | Pale flax |
| --- | --- | --- | --- | --- |
| rRNA | 1100 | 955 | 722 | 866 |
| tRNA | 1100 | 965 | 986 | 969 |
| miRNA | 297 | 126 | 115 | 128 |
| snRNA | 462 | 207 | 202 | 184 |
| snoRNA | NA | 555 | 543 | 534 |
| Total | 2959 | 2808 | 2568 | 2681 |

rRNA: ribosomal ribonucleic acid; tRNA: transfer RNA; miRNA: microRNA; snRNA: small nuclear RNA; snoRNA: small nucleolar RNA; NA: not available. Source: Wang et al. (2012) for CDC Bethune v1.0; and Zhang et al. (2020) for Longya-10, Heiya-14, and pale flax

1.6 Chloroplast Genome

The chloroplast (cp) is the photosynthetic double membrane-bound organelle that converts light energy to carbohydrates in plants and algae. The size of the chloroplast genome (plastome) of autotrophic angiosperms is generally conserved (Guo et al. 2021). The average cp genome size of land plants is 151 Kb, with most ranging from 130 to 170 Kb (Guo et al. 2021). The majority of cp genomes are circular DNA molecules with two copies of inverted repeats (IRs) of approximately 20–28 Kb, one large single-copy region (LSC) of 80–90 Kb, and one small single-copy region (SSC) of 16–27 Kb (Jansen et al. 2005; Li and Zheng 2018). The large IR might help to protect the cp genome from major structural changes (Wu et al. 2011; Wu and Chaw 2014). The cp genomes were found to have highly conserved gene content and order and have been widely used for plant species identification, taxonomy, and phylogenetic analysis (Raubeson and Jansen 2005). Nevertheless, cp genome sequencing has revealed significant sequence and structural variation within and between plant species, such as SNPs, indels, small inversions, and inverted repeats, which have proven to be valuable resources in the study of plant genome evolution (Borsch and Quandt 2009). This knowledge has also been useful in improving our understanding of the climatic adaptation of economically important crops, as well as the identification and study of important traits in closely related species (Wambugu et al. 2015; Brozynska et al. 2016). However, some angiosperms, such as the Campanulaceae (Knox 2014; Hong et al. 2017), Geraniaceae (Guisinger et al. 2008; Marcussen and Meseguer 2017), and some legume family species (Schwarz et al. 2015), are prone to large-scale rearrangements.

The rapid advances in chloroplast genetics and genomics have been greatly facilitated by the advent of high-throughput sequencing technologies. Since the first cp genome from tobacco (Nicotiana tabacum) was sequenced 35 years ago (Shinozaki et al. 1986), the National Center for Biotechnology Information (NCBI) organelle genome database has grown substantially and now has more than 4200 entries of cp genome sequences of land plants (Guo et al. 2021). Several plant cp genomes are listed in Table 1.8. The cp genome of flax was completely sequenced by de Santana Lopes et al. (2018). Flax has a circular cp genome of 156,721 bp with two IRs of 31,990 bp each separating the LSC of 81,767 bp and the SSC of 10,974 bp, and it contains 109 unique genes and two pseudogenes (rpl23 and ndhF) (de Santana Lopes et al. 2018). In addition, 176 SSRs, 20 tandem repeats, and 39 dispersed repeats were also identified (de Santana Lopes et al. 2018).

Table 1.8 Summary of chloroplast (cp) genomes in some plant species

| Species | CPG (bp) | IRs (bp) | LSC (bp) | SSC (bp) | References |
| --- | --- | --- | --- | --- | --- |
| Flax | 156,721 | 31,990 | 81,767 | 10,974 | de Santana Lopes et al. (2018) |
| Tomato | 155,443–155,561 | 25,612–25,639 | 85,857–85,911 | 18,362–18,387 | Wu (2016) |
| Soybean | 152,218 | 25,574 | 83,175 | 17,895 | Saski et al. (2005) |
| Wheat | 135,766 | 21,487–21,485 | 80,003 | 12,791 | Fu (2021) |
| Aegilops | 135,502 | 21,483–21,481 | 79,766 | 12,772 | Fu (2021) |
| Grape | 160,928 | 26,358 | 89,147 | 19,065 | Jansen et al. (2006) |
| Black mustard | 153,633 | 26,193 | 83,552 | 17,695 | Seol et al. (2017) |
| Cabbage | 153,366 | 26,197 | 83,137 | 17,835 | Seol et al. (2017) |

CPG: chloroplast genome; IRs: inverted repeats; LSC: large single copy; SSC: small single copy
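The quadripartite structure described in this section implies a simple length relationship: the circular cp genome consists of the LSC, the SSC, and two copies of the IR. The sketch below checks this for flax and for two other species, using the region lengths listed in Table 1.8. The function name is illustrative.

```python
def plastome_length(lsc, ssc, ir):
    """Total cp genome length (bp): LSC + SSC + two inverted repeat copies."""
    return lsc + ssc + 2 * ir


# Region lengths (bp) from Table 1.8: (LSC, SSC, one IR copy, reported total)
plastomes = {
    "Flax": (81_767, 10_974, 31_990, 156_721),
    "Grape": (89_147, 19_065, 26_358, 160_928),
    "Black mustard": (83_552, 17_695, 26_193, 153_633),
}

for species, (lsc, ssc, ir, reported) in plastomes.items():
    total = plastome_length(lsc, ssc, ir)
    print(f"{species}: {total} bp (reported {reported} bp)")
```

For flax, the calculation confirms that the 31,990 bp figure refers to a single IR copy. The flax IR is larger and its SSC smaller than in the other two species shown.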

1.7 Concluding Remarks

The availability of high-quality reference genomes has a significant impact on the understanding of genome structure and function, species evolution, as well as applications in genetics and breeding. In the last decade, the first reference genome (v1.0) and its subsequent chromosome-scale pseudomolecule iteration (v2.0) of the flax cultivar CDC Bethune have been widely used as a reference in genomic studies and breeding applications. An additional five flax genotypes, including linseed, fibre flax, and the closely related wild flax (pale flax), have also been sequenced using different sequencing platforms. Their genome assemblies and annotations constitute precious genomic resources for genome-wide comparative analyses. It is noteworthy that all these assemblies and their annotations have revealed large variations in genome assembly size (304–455 Mb), predicted protein-coding genes (43,424–77,522 with an average gene size of 2.3–3.7 Kb), and repeat content (23–55% of the genome). However, to date, insufficient evidence exists to conclude that these variations are due to inherent genome features of the flax genotypes because they were generated by different laboratories using different sequencing technologies, computational tools, and combinations thereof. Therefore, additional genome size information for the sequenced genotypes (such as estimates by flow cytometry) and genome annotation using consistent software tools and criteria are warranted to improve the comparability across genotypes. However, such comparisons will remain hindered by the limits imposed by the sequencing and assembly strategies employed.

Acknowledgements

The authors thank Dr. Bourlaye Fofana for reviewing and editing and Tara Edwards for English editing.

References

Agrios GN (2005) Plant Pathology. Elsevier Academic Press, Amsterdam
Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796–815
Bairoch A, Apweiler R (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res 28:45–48
Belton JM, McCord RP, Gibcus JH, Naumova N, Zhan Y et al (2012) Hi-C: a comprehensive technique to capture the conformation of genomes. Methods 58:268–276
Bocklandt S, Hastie A, Cao H (2019) Bionano genome mapping: high-throughput, ultra-long molecule genome analysis system for precision genome assembly and haploid-resolved structural variation discovery. Adv Exp Med Biol 1129:97–118
Boetzer M, Henkel CV, Jansen HJ, Butler D, Pirovano W (2011) Scaffolding pre-assembled contigs using SSPACE. Bioinformatics 27:578–579
Bolsheva NL, Melnikova NV, Kirov IV, Dmitriev AA, Krasnov GS et al (2019) Characterization of repeated DNA sequences in genomes of blue-flowered flax. BMC Evol Biol 19:49
Borsch T, Quandt D (2009) Mutational dynamics and phylogenetic utility of noncoding chloroplast DNA. Plant Syst Evol 282:169–199
Brozynska M, Furtado A, Henry RJ (2016) Genomics of crop wild relatives: expanding the gene pool for crop improvement. Plant Biotechnol J 14:1070–1085
Bruna T, Hoff KJ, Lomsadze A, Stanke M, Borodovsky M (2021) BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom Bioinform 3:lqaa108
Buchfink B, Xie C, Huson DH (2015) Fast and sensitive protein alignment using DIAMOND. Nat Methods 12:59–60
Burton JN, Adey A, Patwardhan RP, Qiu R, Kitzman JO et al (2013) Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat Biotechnol 31:1119–1125
Chen H, Zeng Y, Yang Y, Huang L, Tang B et al (2020) Allele-aware chromosome-level genome assembly and efficient transgene-free genome editing for the autotetraploid cultivated alfalfa. Nat Commun 11:2494
Cloutier S, Ragupathy R, Miranda E, Radovanovic N, Reimer E et al (2012) Integrated consensus genetic and physical maps of flax (Linum usitatissimum L.). Theor Appl Genet 125:1783–1795
de Santana Lopes A, Pacheco TG, Santos KGD, Vieira LDN, Guerra MP et al (2018) The Linum usitatissimum L. plastome reveals atypical structural evolution, new editing sites, and the phylogenetic position of Linaceae within Malpighiales. Plant Cell Rep 37:307–328
Diederichsen A, Ulrich A (2009) Variability in stem fibre content and its association with other characteristics in 1177 flax (Linum usitatissimum L.) genebank accessions. Ind Crops Prod 30:33–39
Dmitriev AA, Pushkova EN, Novakovskiy RO, Beniaminov AD, Rozhmina TA et al (2020) Genome sequencing of fiber flax cultivar Atlant using Oxford Nanopore and Illumina platforms. Front Genet 11:590282
Feschotte C, Jiang N, Wessler SR (2002) Plant transposable elements: where genetics meets genomics. Nat Rev Genet 3:329–341
Foulk JA, Akin DE, Dodd RB, Frederick JR (2004) Optimising flax production in the South Atlantic region of the USA. J Sci Food Agri 84:870–876
Fu Y-B (2021) Characterizing chloroplast genomes and inferring maternal divergence of the Triticum-Aegilops complex. Sci Rep 11:15363
Fu YB (2011) Genetic evidence for early flax domestication with capsular dehiscence. Genet Resour Crop Evol 58:1119–1128
Ghurye J, Pop M (2019) Modern technologies and algorithms for scaffolding assembled genomes. PLoS Comput Biol 15:e1006994
Gnerre S, Maccallum I, Przybylski D, Ribeiro FJ, Burton JN et al (2011) High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc Natl Acad Sci USA 108:1513–1518
Goldblatt P (2007) The index to plant chromosome numbers: past and future. Taxon 56:984–986
Gonzalez LG, Deyholos MK (2012) Identification, characterization and distribution of transposable elements in the flax (Linum usitatissimum L.) genome. BMC Genomics 13:644
Guisinger MM, Kuehl JV, Boore JL, Jansen RK (2008) Genome-wide analyses of Geraniaceae plastid DNA reveal unprecedented patterns of increased nucleotide substitutions. Proc Natl Acad Sci USA 105:18424–18429
Guo YY, Yang JX, Li HK, Zhao HS (2021) Chloroplast genomes of two species of Cypripedium: expanded genome size and proliferation of AT-biased repeat sequences. Front Plant Sci 12:609729
Harris MA, Clark J, Ireland A, Lomax J, Ashburner M et al (2004) The gene ontology (GO) database and informatics resource. Nucleic Acids Res 32:D258–261
Hastie AR, Dong L, Smith A, Finklestein J, Lam ET et al (2013) Rapid genome mapping in nanochannel arrays for highly complete and accurate de novo sequence assembly of the complex Aegilops tauschii genome. PLoS ONE 8:e55864
Hoff KJ, Lange S, Lomsadze A, Borodovsky M, Stanke M (2016) BRAKER1: unsupervised RNA-Seq-based genome annotation with GeneMark-ET and AUGUSTUS. Bioinformatics 32:767–769
Holt C, Yandell M (2011) MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 12:491
Hon T, Mars K, Young G, Tsai YC, Karalius JW et al (2020) Highly accurate long-read HiFi sequencing data for five complex genomes. Sci Data 7:399
Hong CP, Park J, Lee Y, Lee M, Park SG et al (2017) accD nuclear transfer of Platycodon grandiflorum and the plastid of early Campanulaceae. BMC Genomics 18:607
Huerta-Cepas J, Forslund K, Coelho LP, Szklarczyk D, Jensen LJ et al (2017) Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Mol Biol Evol 34:2115–2122

16 Huerta-Cepas J, Szklarczyk D, Heller D, Hernandez-Plaza A, Forslund SK et al (2019) eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res 47:D309–D314 Jain M, Olsen HE, Paten B, Akeson M (2016) The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biol 17:239 Jansen RK, Kaittanis C, Saski C, Lee S-B, Tomkins J et al (2006) Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids. BMC Evol Biol 6:32 Jansen RK, Raubeson LA, Boore JL, dePamphilis CW, Chumley TW et al (2005) Methods for obtaining and analyzing whole chloroplast genome sequences. Methods Enzymol 395:348–384 Jones P, Binns D, Chang HY, Fraser M, Li W et al (2014) InterProScan 5: genome-scale protein function classification. Bioinformatics 30:1236–1240 Kanehisa M, Goto S, Kawashima S, Nakaya A (2002) The KEGG databases at GenomeNet. Nucleic Acids Res 30:42–46 Knox EB (2014) The dynamic history of plastid genomes in the Campanulaceae sensu lato is unique among angiosperms. Proc Natl Acad Sci USA 111:11097– 11102 Kolmogorov M, Yuan J, Lin Y, Pevzner PA (2019) Assembly of long, error-prone reads using repeat graphs. Nat Biotechnol 37:540–546 Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH et al (2017) Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res 27:722–736 Korf I (2004) Gene finding in novel genomes. BMC Bioinformatics 5:59 Lam ET, Hastie A, Lin C, Ehrlich D, Das SK et al (2012) Genome mapping on nanochannel arrays for structural variation analysis and sequence assembly. Nat Biotechnol 30:771–776 Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC et al (2001) Initial sequencing and analysis of the human genome. 
Nature 409:860–921 Li B, Zheng Y (2018) Dynamic evolution and phylogenomic analysis of the chloroplast genome in Schisandraceae. Sci Rep 8:9285 Li R, Yu C, Li Y, Lam TW, Yiu SM et al (2009) SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25:1966–1967 Liu F-H, Chen X, Long B, Shuai R-Y, Long C-L (2011) Historical and botanical evidence of distribution, cultivation and utilization of Linum usitatissimum L. (flax) in China. Veget Hist Archaeobot 20:561–566 Lomsadze A, Burns PD, Borodovsky M (2014) Integration of mapped RNA-Seq reads into automatic training of eukaryotic gene finding algorithm. Nucleic Acids Res 42:e119 Luo MC, Ma Y, You FM, Anderson OD, Kopecky D et al (2010) Feasibility of physical map construction from

F. M. You et al. fingerprinted bacterial artificial chromosome libraries of polyploid plant species. BMC Genomics 11:122 Luo R, Liu B, Xie Y, Li Z, Huang W et al (2012) SOAPdenovo2: an empirically improved memoryefficient short-read de novo assembler. Gigascience 1:18 Manni M, Berkeley MR, Seppey M, Zdobnov EM (2021) BUSCO: assessing genomic data quality and beyond. Curr Protoc 1:e323 Marcussen T, Meseguer AS (2017) Species-level phylogeny, fruit evolution and diversification history of Geranium (Geraniaceae). Mol Phylogenet Evol 110:134–149 Marks RA, Hotaling S, Frandsen PB, VanBuren R (2021) Representation and participation across 20 years of plant genome sequencing. Nat Plants 7:1571–1578 Mehrotra S, Goyal V (2014) Repetitive sequences in plant nuclear DNA: types, distribution, evolution and function. Genom Proteom Bioinform 12:164–171 Mistry J, Chuguransky S, Williams L, Qureshi M, Salazar GA et al (2021) Pfam: the protein families database in 2021. Nucleic Acids Res 49:D412–D419 Morgante M (2006) Plant genome organisation and diversity: the year of the junk! Curr Opin Biotechnol 17:168–173 Nurk S, Koren S, Rhie A, Rautiainen M, Bzikadze AV et al (2022) The complete sequence of a human genome. Science 376:44–53 Ottai MES, Al-Kordy MAA, Afiah SA (2011) Evaluation, correlation and path coefficient analysis among seed yield and its attributes of oil flax (Linum usitatissimum) genotypes. Aust J Basic Appl Sci 5:252–258 Ragupathy R, Rathinavelu R, Cloutier S (2011) Physical mapping and BAC-end sequence analysis provide initial insights into the flax (Linum usitatissimum L.) genome. BMC Genomics 12:217 Raubeson LA, Jansen RK (2005) Plant diversity and evolution: genotypic and phenotypic variation in higher plants. In: Henry RJ (ed) Chloroplast genomes of plants. CABI Publishing, Wallingford, pp 45–68 Rice A, Glick L, Abadi S, Einhorn M, Kopelman NM et al (2014) The chromosome counts database (CCDB)—a community resource of plant chromosome numbers. 
New Phytol 206:19–26 Rowland GG, Hormis YA, Rashid KY (2002) CDC Bethune flax. Can J Plant Sci 82:101–102 Ruan J, Li H (2020) Fast and accurate long-read assembly with wtdbg2. Nat Methods 17:155–158 Sa R, Yi L, Siqin B, An M, Bao H et al (2021) Chromosome-level genome assembly and annotation of the fiber flax (Linum usitatissimum) genome. Front Genet 12:735690 Saski C, Lee SB, Daniell H, Wood TC, Tomkins J et al (2005) Complete chloroplast genome sequence of Gycine max and comparative analyses with other legume genomes. Plant Mol Biol 59:309–322 Schwarz EN, Ruhlman TA, Sabir JSM, Hajrah NH, Alharbi NS et al (2015) Plastid genome sequences of legumes reveal parallel inversions and multiple losses of rps16 in papilionoids. J Syst Evol 53:458–468

1

Reference Genome Sequence of Flax

Seol Y-J, Kim K, Kang S-H, Perumal S, Lee J et al (2017) The complete chloroplast genome of two Brassica species, Brassica nigra and B. Oleracea. Mitochondrial DNA Part A 28:167–168 Shafin K, Pesout T, Lorig-Roach R, Haukness M, Olsen HE et al (2020) Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes. Nat Biotechnol 38:1044–1053 Shinozaki K, Ohme M, Tanaka M, Wakasugi T, Hayashida N et al (1986) The complete nucleotide sequence of the tobacco chloroplast genome: its gene organization and expression. EMBO J 5:2043–2049 Singh KK, Mridula D, Rehal J, Barnwal P (2011) Flaxseed: a potential source of food, feed and fiber. Crit Rev Food Sci Nutr 51:210–222 Soni S (2021) A complete guide on flaxseed cultivation. https://krishijagran.com/agripedia/a-complete-guideon-flaxseed-cultivation/ Stanke M, Keller O, Gunduz I, Hayes A, Waack S et al (2006) AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res 34:W435-439 Stankova H, Hastie AR, Chan S, Vrana J, Tulpova Z et al (2016) BioNano genome mapping of individual chromosomes supports physical mapping and sequence assembly in complex plant genomes. Plant Biotechnol J 14:1523–1531 Tollis M, Boissinot S (2012) The evolutionary dynamics of transposable elements in eukaryote genomes. Genome Dyn 7:68–91 Vaser R, Sovic I, Nagarajan N, Sikic M (2017) Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res 27:737–746 Wambugu PW, Brozynska M, Furtado A, Waters DL, Henry RJ (2015) Relationships of wild and domesticated rices (Oryza AA genome species) based upon whole chloroplast genome sequences. Sci Rep 5:13957 Wang Z, Hobson N, Galindo L, Zhu S, Shi D et al (2012) The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads. Plant J 72:461–473 Wee Y, Bhyan SB, Liu Y, Lu J, Li X et al (2019) The bioinformatics tools for the genome assembly and analysis based on third-generation sequencing. 
Brief Funct Genomics 18:1–12 Wicker T, Sabot F, Hua-Van A, Bennetzen JL, Capy P et al (2007) A unified classification system for eukaryotic transposable elements. Nat Rev Genet 8:973–982

17 Wu CS, Chaw SM (2014) Highly rearranged and sizevariable chloroplast genomes in conifers II clade (cupressophytes): evolution towards shorter intergenic spacers. Plant Biotechnol J 12:344–353 Wu CS, Wang YN, Hsu CY, Lin CP, Chaw SM (2011) Loss of different inverted repeat copies from the chloroplast genomes of Pinaceae and cupressophytes and influence of heterotachy on the evaluation of gymnosperm phylogeny. Genome Biol Evol 3:1284– 1295 Wu Z (2016) The completed eight chloroplast genomes of tomato from Solanum genus. Mitochondrial DNA A DNA Mapp Seq Anal 27:4155–4157 You FM, Cloutier S, Shan Y, Ragupathy R (2015) LTR Annotator: automated identification and annotation of LTR retrotransposons in plant genomes. Int J Biosci Biochem Bioinforma 5:165–174 You FM, Duguid SD, Lam I, Cloutier S, Rashid KY et al (2016) Pedigrees and genetic base of the flax varieties registered in Canada. Can J Plant Sci 96:837–852 You FM, Jia G, Xiao J, Duguid SD, Rashid KY et al (2017) Genetic variability of 27 traits in a core collection of flax (Linum usitatissimum L.). Front Plant Sci 8:1636 You FM, Xiao J, Li P, Yao Z, Gao J et al (2018) Chromosome-scale pseudomolecules refined by optical, physical, and genetic maps in flax. Plant J 95:371– 384 Zhang J, Qi Y, Wang L, Wang L, Yan X et al (2020) Genomic comparison and population diversity analysis provide onsights into the domestication and improvement of flax. iScience 23:100967 Zhang Y, Edwards D, Batley J (2021) Comparison and evolutionary analysis of Brassica nucleotide binding site leucine rich repeat (NLR) genes and importance for disease resistance breeding. Plant Genome 14: e20060 Zimin AV, Puiu D, Hall R, Kingan S, Clavijo BJ et al (2017a) The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum. 
Gigascience 6:1–7 Zimin AV, Puiu D, Luo MC, Zhu T, Koren S et al (2017b) Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm. Genome Res 27:787–792

2 Repeat DNA Sequences in Flax Genomes

Nadeem Khan, Hamna Shazadee, Frank M. You, and Sylvie Cloutier

2.1 Introduction

Humans have grown flax (Linum usitatissimum L.) for its seeds and fiber since ancient times (Vaisey-Genser et al. 2003). Fiber flax is taller and has fewer branches toward the top of the stem than linseed. Linseed branches, on the other hand, develop from the center of the stem and yield large quantities of seeds (Diederichsen et al. 2003). Flax seeds are a rich source of omega-3 fatty acids and contain the essential alpha-linolenic and linoleic acids. Their health benefits have been demonstrated in several studies (Mazza et al. 1989; Caligiuri et al. 2014; Goyal et al. 2014; Kezimana et al. 2018; Parikh et al. 2019). Flax seed also contains lignans, which are associated with a reduced risk of certain types of cancer (Goyal et al. 2014). In recent years, flax fiber has been used as a component of composite materials, with some fibers holding considerable promise for automotive, aerospace, and packaging industries where the length of the fiber is not as important as its other physico-chemical properties (Zhu et al. 2013; Mokhothu et al. 2015; Wu et al. 2016; Dhakal et al. 2019; Fombuena et al. 2019; Zhang et al. 2020a). Therefore, a greater understanding of the genes that influence quality and yield, especially for seed and fiber, is expected to contribute positively to flax improvement.

The first draft of the genome sequence of the Canadian flax cultivar CDC Bethune, published in 2012, was obtained using Illumina short reads, which resulted in a contig assembly of 302 Mb of non-redundant sequences, representing a genome coverage of ~81% (Wang et al. 2012). In 2018, employing a BioNano optical map, a BAC-based physical map, and genetic maps, the long scaffold sequences of the assembly were further validated, oriented, ordered, and assigned to chromosomes (You et al. 2018). This chromosome-scale pseudomolecule assembly contains a total of 316 Mb (including ~50 Mb of gaps), with individual chromosome lengths of 15.60–29.40 Mb, covering 97% of the annotated genes in the original scaffold-based assembly. Based on Illumina sequencing, Hi-C technology, and genetic mapping, scaffold-level genome assemblies of the Chinese linseed cultivar Longya-10, the fiber cultivar Heiya-14, and a pale flax landrace (Linum bienne) were released in 2020 (Zhang et al. 2020b). These three assemblies have total lengths of 306.0, 303.7, and 293.5 Mb, with scaffold N50 lengths of 1235 kb, 700 kb, and 384 kb for Longya-10, Heiya-14, and the pale flax landrace, respectively. More recently, utilizing Oxford Nanopore Technologies (ONT) and Illumina platforms, the Russian fiber flax cultivar Atlant was sequenced and assembled, with a total sequence length of 361.7 Mb and an N50 of 350 kb (Dmitriev et al. 2021). Lastly, Pacific Biosciences (PacBio) HiFi sequencing combined with Hi-C scaffolding was used to sequence the Chinese fiber flax cultivar Yiya-5 (Sa et al. 2021). The HiFi sequences for Yiya-5 were assembled into 336 contigs totaling 454.95 Mb, with an N50 of 9.61 Mb. Hi-C scaffolding produced 15 chromosome-length pseudomolecules that covered 93% of the total length.

Reference genome sequence resources are extremely important for future progress in functional genomics and evolutionary studies of flax, such as the discovery of transposable elements (TEs). The flax genome has recently undergone whole-genome duplication (WGD), and repeat elements cover 55.36% of the genome of the fiber cultivar Yiya-5 (You et al. 2018; Sa et al. 2021). Such homologous or repetitive sequences are easily collapsed when short reads are assembled (Dmitriev et al. 2021). According to reassociation kinetics studies, approximately half of the flax genome consists of low-copy-number sequences, 35% is highly repetitive, and the remaining 15% belongs to the middle-repetitive sequence types, which often encompass transposable elements (Cullis 1981). Repetitive sequences, or repeats, therefore account for a significant fraction of the flax genome. Tandem and interspersed repeats, as well as copy number variants, are key structural polymorphisms that can occur in either of these types of repetitive sequences (Hannan 2018). Generally, repetitive elements constitute the major proportion of genomes, suggesting that they serve vital biological purposes and should not be termed ‘junk DNA’ as they were during the last century (Sperling et al. 2013). Current evidence points to their significant roles in evolution and human disease (Madireddy et al. 2017; Paulson 2018). The ubiquitous presence of repetitive DNA sequences in genomes causes difficulties during sequence assembly and automated annotation, not only in flax but also in other species. Thus, their identification and annotation are vital to understanding their functions and to improving genome assemblies (Ragupathy et al. 2013). Hence, this review focuses on summarizing current knowledge of the distribution of repeat sequences in cultivated and wild flax and on describing an automated pipeline for their discovery. The latter also holds potential for repeat discovery and characterization in other species.

N. Khan · H. Shazadee · F. M. You (✉) · S. Cloutier (✉)
Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, 960 Carling Avenue, Ottawa, ON K1A 0C6, Canada
e-mail: [email protected]
S. Cloutier
e-mail: [email protected]

N. Khan · H. Shazadee
Department of Biology, University of Ottawa, 30 Marie Curie, Ottawa, ON K1N 6N5, Canada

© His Majesty the King in Right of Canada, as represented by the Minister of Agriculture and Agri-Food 2023
F. M. You and B. Fofana (eds.), The Flax Genome, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-16061-5_2

2.2 Types and Distribution of Repetitive DNA Sequences

The efficiency of genomic sequencing has grown by a factor of ten in the last decade, and next-generation sequencing, PacBio, and ONT platforms can now sequence the whole human genome in a matter of days. This potential has sparked numerous studies aimed at sequencing the genomes of tens of thousands of individuals from both animal and plant species (Jain et al. 2018; Hunt et al. 2020; Dmitriev et al. 2021; Sa et al. 2021). For instance, PacBio and ONT sequencing platforms are revolutionizing our ability to capture an accurate picture of the molecular processes within the cell, leading to a deeper knowledge of the complex structural variants in a genome. However, some of the most difficult technical challenges associated with these new methods are caused by repetitive DNA: sequences that are similar or identical to other sequences in the genome. The majority of large genomes contain an abundance of repetitive sequences. For instance, repeats cover approximately half or more of the flax, wheat, and maize genomes (Cullis 1981; Garbus et al. 2015; Haberer et al. 2020). Repetitive DNA, found in all domains of life (bacteria, archaea, and eukaryota), is classified into two types: interspersed repeats, which include TEs that occur at multiple loci across the genome, and tandem repeats (TRs), which occur at a single locus (Tørresen et al. 2019). TEs are typically several thousand base pairs (several kbp) in length, but in eukaryotes their size ranges from 100 bp to 20 kbp (Kidwell 2002).


TEs are the best-defined and most abundant repeat type in many genomes, accounting for 10% to 85% of total genome content (Rebollo et al. 2012). TEs can be classified into two broad categories based on their transposition mechanisms: Class I retrotransposons, which transpose through a ‘copy-and-paste’ mechanism, and Class II DNA transposons, which transpose through a ‘cut-and-paste’ mechanism. Class I TEs mainly include long terminal repeat (LTR) retrotransposons and non-LTR retrotransposons, such as long (LINEs) and short interspersed nuclear elements (SINEs) (Wicker et al. 2007). LTR-TEs, which are frequently found in protein-coding genes, can include duplicated genes. Hence, the repeat masking method used during genome assembly has an impact on their accurate identification (Tørresen et al. 2019). Class II TEs move and integrate into different locations within the genome through a DNA intermediate. More than 20 superfamilies of DNA transposons, such as EnSpm/CACTA, hAT, Harbinger, MuDR, and Helitron, have been reorganized and annotated in Repbase, which is used by RepeatMasker (Bao et al. 2015). TRs, on the other hand, can be made of motifs as small as 1 bp that are, as the name suggests, tandemly repeated. TRs are typically found in specialized chromosome regions of many eukaryotes, such as centromeres, telomeres, and heterochromatic knobs (Ugarković et al. 2002), and can be classified based on the size of the repeated units. Short TRs (< 10 bp) were termed microsatellites or simple sequence repeats (Litt et al. 1989), longer TRs (between 10 and 100 bp) were labeled minisatellite DNA (Jeffreys et al. 1985), and long TRs (> 100 bp) were designated satellite DNA (Vergnaud et al. 2000). Satellites are the most common type of TRs, and they are assumed to assist in organizing the genome and stabilizing the surrounding chromosome regions, which is crucial for chromosome function during cell division (Plohl et al. 2008). Moreover, repeats can also form the basis of whole-genome duplication, such as in the Arabidopsis thaliana genome (The Arabidopsis Genome Initiative 2000). DNA repeats are found in all kingdoms of life, but they represent a much larger proportion of the genomes in plants: for instance, more than 80% of the maize genome is made of TEs (Haberer et al. 2020).
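The size-based tandem repeat classes described in this section can be illustrated with a minimal Python sketch. The function names are hypothetical, and assigning unit lengths of exactly 10 bp and 100 bp to the minisatellite class is an assumption based on the "between 10 and 100 bp" wording above. The helper that derives the repeat unit only handles perfect tandem arrays, whereas real repeats usually carry mismatches and need dedicated tools.

```python
def classify_tandem_repeat(unit_length_bp: int) -> str:
    """Map a repeat-unit length (bp) to the size class used in the text."""
    if unit_length_bp < 1:
        raise ValueError("repeat unit length must be at least 1 bp")
    if unit_length_bp < 10:
        return "microsatellite"   # short TRs, also called simple sequence repeats
    if unit_length_bp <= 100:
        return "minisatellite"    # assumption: 10 and 100 bp fall in this class
    return "satellite"            # long TRs (> 100 bp)


def repeat_unit_length(sequence: str) -> int:
    """Length of the shortest motif whose exact repetition rebuilds the sequence.

    Simplification for illustration: only perfect tandem arrays are handled.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    n = len(sequence)
    for size in range(1, n + 1):
        if n % size == 0 and sequence[:size] * (n // size) == sequence:
            return size
    return n  # not reached: size == n always reproduces the sequence


if __name__ == "__main__":
    array = "AT" * 12  # a dinucleotide array
    unit = repeat_unit_length(array)
    print(unit, classify_tandem_repeat(unit))
```

Classifying by the repeat-unit length rather than by the total array length follows the definition given above, which is based on the size of the repeated unit.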

2.3 Challenges in the Identification of Repeats Per Se

Repeats are problematic both in alignment against known reference sequences and during de novo assembly of a genome, and can lead to errors when interpreting results. The primary goal of genome resequencing is to enable researchers to explore genetic diversity by comparing sequences from multiple genomes against a reference sequence. It is impossible to accurately assemble repeat-rich regions if the sequence reads are shorter than the repeat unit. The erroneous alignment of reads is an important problem because more than half of the genome is made of repeats, including thousands of repetitive DNA sections between and inside genes. Repetitive DNA limits the length of the pieces that can be assembled successfully as a continuous string. The N50, the most common statistic for reporting genome assembly contiguity, is the length of the shortest fragment (contig or scaffold) in the set of longest fragments that together make up at least 50% of the overall assembly size. High N50 values suggest a more contiguous assembly. Recently, with the advent of PacBio and ONT long-read sequencing technologies, more complete and accurate assemblies can be obtained. These long reads can resolve repeat regions, provided that read depth, length, and accuracy are adequate. In flax, Sa et al. (2021) found that the use of PacBio technology identified a higher number of repeats than prior genomic investigations (Wang et al. 2012). Typically, 70–80% of short reads (< 250 bp) map to a unique site on the flax genome. Sa et al. (2021) also reported that repeats make up more than half of the flax genome (55.36%). Most repeats are not perfectly identical: copies occur at several locations with minor variations, which implies that many reads will have a distinct ‘best match’ even if it is not exactly the same sequence. The simplest technique to resolve repeats is to assign reads to the site where they best align, but this is not necessarily correct and can lead to erroneous conclusions, particularly for SNP calling. The most important prerequisite for accurately annotating repeats is a high-quality reference genome onto which short or long reads can be mapped. Genetic variants such as single nucleotide polymorphisms (SNPs), copy number variants (CNVs), and presence–absence variants (PAVs) can be detected by read mapping, provided there is sufficient coverage. Accurate identification of these genetic variants is essential to yield quality results that could lead to a better understanding of the relationship between genotypes and phenotypes. Aligning millions or billions of reads to a reference genome is computationally intensive. Reads that map to more than one location are termed multi-reads. Although the nature of the reads does not affect the read-mapping program directly, it influences the program's outcome and can affect downstream analyses such as the detection of SNPs and PAVs and gene prediction. These variant calls and gene predictions depend largely on the accurate identification of repeat elements.
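The N50 statistic discussed in this section can be computed in a few lines. The following is a minimal Python sketch under the usual definition (sort fragments from longest to shortest and report the length at which the running total first reaches half of the assembly size). The function name and the example lengths are illustrative and do not come from any flax assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    The lengths are sorted from longest to shortest and accumulated; the N50
    is the length of the fragment at which the running total first reaches
    at least half of the summed assembly size.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("at least one fragment length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe test for running >= total / 2
            return length


if __name__ == "__main__":
    toy_contigs = [8, 8, 4, 3, 3, 2, 2, 2]  # arbitrary units, illustrative only
    print(n50(toy_contigs))
```

Because the statistic depends only on the length distribution, a high N50 indicates contiguity but says nothing about whether repeats were assembled correctly or collapsed.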

2.4 Repetitive Elements in Flax and Other Crop Species

High-throughput sequencing technologies (HTST), such as Illumina, PacBio, and ONT, are continuously improving in throughput, read length, accuracy, and cost. All of these technologies have been used to sequence flax, and each has a distinct capacity to detect repeats. The first draft sequence of the flax genome, based strictly on Illumina short reads, identified 73.80 Mb of repeats, representing 24.40% of the entire contig-based genome assembly (Wang et al. 2012). Retroelements were the most prevalent mobile elements discovered in this assembly, accounting for 20.60%, with LTR-type retroelements being the major ones at 18.40%, followed by LINEs at 2.20% of the assembly (Wang et al. 2012). Here, only RepeatMasker (V. 3.3.0; http://www.repeatmasker.org/) was utilized to identify repeats. In the V. 2.0 pseudomolecules, LTR elements in particular were fully annotated using the LTRAnnotator pipeline (You et al. 2015) with both LTR_FINDER and LTRharvest (Xu et al. 2007; Ellinghaus et al. 2008). A total of 2677 full-length LTRs of known type (2023 Copia and 654 Gypsy TEs) and 2245 full-length LTRs of unknown type were identified, accounting for 10.17% of all pseudomolecules. This study also revealed 58,689 LTR fragments and 2403 solo LTRs, representing 15.48%. Together, all LTR-related sequences accounted for 25.73% of the pseudomolecules. More LTRs were identified than in the CDC Bethune V. 1.0 assembly (Wang et al. 2012). For the genome sequences of the Chinese cultivars Longya-10 and Heiya-14 and of wild pale flax, similar proportions of TEs were identified, i.e., 39.95%, 38.01%, and 37.27%, respectively (Zhang et al. 2020b). Copia and Gypsy LTR-TEs were the most common, ranging from 5.53% to 7.93% among the three genomes. In this study, LTR_FINDER (Xu et al. 2007), MITE-Hunter (Han et al. 2010), RepeatScout V. 1.0.5 (Price et al. 2005), PILER-DF (Edgar et al. 2005), and PASTEClassifier V. 1.0 (Hoede et al. 2014) were used to identify and classify TE families, which were then used as the library for RepeatMasker. ONT (Dmitriev et al. 2021) and PacBio (Sa et al. 2021) were recently used to sequence flax genomes with promising but mixed results. Dmitriev et al. (2021) did not characterize their ONT assembly for repeats, but the PacBio-derived assembly of Yiya-5 by Sa et al. (2021) included 251.86 Mb of repeated sequences, accounting for 55.36% of the flax genome. Further, Class I retroelements accounted for 26.29% and Class II DNA transposons represented 9.98%. This result suggests that long-read technologies are able to detect a higher percentage of repeats, allowing better resolution and identification of structural variants than short reads. TEs have been identified in the genomes of many plant species based on several sequencing technologies and bioinformatics tools, as summarized in Table 2.1.
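The repeat percentages quoted in this section follow from simple ratios of repeat length to assembly length. As a hedged worked example in Python, the sketch below recomputes two of them from the megabase values given in the text. The function name is hypothetical, and the 302 Mb denominator for CDC Bethune is the rounded contig assembly size from the Introduction, so the result only approximates the published 24.40%.

```python
def percent_of_assembly(repeat_mb: float, assembly_mb: float) -> float:
    """Share of an assembly occupied by repeats, in percent."""
    if assembly_mb <= 0:
        raise ValueError("assembly size must be positive")
    return 100.0 * repeat_mb / assembly_mb


# Yiya-5 (Sa et al. 2021): 251.86 Mb of repeats in 454.95 Mb of contigs.
yiya5_pct = percent_of_assembly(251.86, 454.95)

# CDC Bethune V. 1.0 (Wang et al. 2012): 73.80 Mb of repeats; the 302 Mb
# denominator is the rounded assembly size, so this is an approximation.
bethune_pct = percent_of_assembly(73.80, 302.0)

# Retroelements in CDC Bethune V. 1.0: LTR-type elements plus LINEs.
bethune_retro_pct = 18.40 + 2.20

if __name__ == "__main__":
    print(f"Yiya-5 repeats: {yiya5_pct:.2f}%")
    print(f"CDC Bethune repeats: {bethune_pct:.2f}%")
    print(f"CDC Bethune retroelements: {bethune_retro_pct:.2f}%")
```

Keeping the numerator and denominator explicit makes it clear which assembly version a percentage refers to, which matters when values from differently sized assemblies are compared.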


Table 2.1 Percentage of transposable elements (TEs) in the genome assemblies of flax and other plant species

Species | Tools | Sequencing strategy | TEs in genome (%) | References
Flax | RepeatMasker | Illumina | 20.60 (CDC Bethune) | Wang et al. (2012)
Flax | LTR_FINDER, MITE-Hunter, RepeatScout, PILER-DF, RepeatMasker | Illumina, Hi-C, and GM | 37.27 (pale flax); 39.95 (Longya-10); 38.01 (Heiya-14) | Zhang et al. (2020b)
Flax | RepeatMasker, RepeatModeler, LTRharvest, LTR_retriever | PacBio and Hi-C | 55.36 (Yiya-5) | Sa et al. (2021)
Maize | Vmatch | Illumina | ~80.00 | Haberer et al. (2020)
Wheat | RepeatMasker | Illumina | ~80.00 | Garbus et al. (2015)
Eggplant | RepeatMasker, LTR_FINDER, RepeatModeler, RepeatScout | Illumina, ONT | 70.09 | Wei et al. (2020)
Cotton | RepeatMasker, LTR_FINDER, PILER, RepeatScout | PacBio, Illumina, 10X Genomics | ~61.00 | Ma et al. (2021)
Soybean | RepeatMasker | Illumina | ~54.40 | Liu et al. (2020)
Rice | HMMER | PAC, BAC | 35.00 | Sasaki et al. (2005)
Grape | RepeatMasker | PacBio | 34.00 | Girollet et al. (2019)
Radish | Tandem Repeat Finder | Illumina | 31.00 | He et al. (2015)

BAC bacterial artificial chromosome; GM genetic map; PacBio Pacific Biosciences; ONT Oxford Nanopore Technology; PAC P1-derived artificial chromosomes

2.5 Tools for the Identification of Repetitive Elements

Advances in genome sequencing have accelerated over the last decade, and downstream annotation is improving steadily in many crops, going far beyond the basic annotation of gene coding sequences. The accurate identification, characterization, and curation of repeat elements are an important step in understanding genome sequences. As observed previously, TE content varies among flax cultivars of the same species (Table 2.1). Assemblies based on short reads are likely to yield a low repeat detection rate. Illumina reads are relatively short, ranging from 100 to 250 bp. Many repeat regions are longer than the reads, making it difficult to properly assemble longer repeats, particularly if their insertion is recent and sufficient mutations have yet to accumulate in the LTRs.

However, the major factor affecting the identification of TEs is the use of different bioinformatics tools. Generally, two approaches have been widely used for TE identification. One is the homology-based search against a well-compiled repeat library. RepeatMasker is the most commonly used tool for this purpose and uses Repbase (Bao et al. 2015) or Dfam (Storer et al. 2021) as a TE database. Repbase is a database of representative repeat sequences in eukaryotic genomes, including abundant superfamilies of TEs, whereas Dfam is a community-driven resource with online curation tools and direct user engagement. The accuracy of the homology-based approach relies largely on the completeness of the TE families in the library. A considerable number of superfamilies and families of TEs have been identified in some species, such as Triticeae species (Garbus et al. 2015; Wicker et al. 2018; Zhu et al. 2021), but very few repeat sequences from the Linum genus or flax species have been included in Repbase or Dfam, which will result in an underestimated proportion of TEs in the genome. Therefore, a reasonable way is the second approach, namely to identify all potential repeats de novo from the target genome sequence. The de novo approach provides an unbiased perspective of TE diversity and richness in a genome. Since each genome contains a unique repertoire of TE insertions, such de novo-based techniques are beneficial for the flax genome, which lacks a well-defined TE library. The de novo repeat library and previously existing TE families can then be compiled into a non-redundant TE library. Eventually, RepeatMasker is run with the new TE library for repeat masking.

TE discovery is important in flax for four major reasons: (i) genome-wide de novo identification of TEs, (ii) avoidance of false-positive results, (iii) discovery of new repeats, and (iv) classification of repeat sequences for comparative and evolutionary analyses. However, as a comprehensive TE library is not publicly available for flax, the published genome sequences of flax (Wang et al. 2012; You et al. 2018; Zhang et al. 2020b; Dmitriev et al. 2021; Sa et al. 2021) do not provide comprehensive information about TEs, and this, together with the use of inconsistent tools, precludes comparative analysis. Thus, we established an annotation pipeline based on several de novo TE discovery tools, including RepeatModeler, which integrates RECON, RepeatScout, and LTRharvest/LTR_retriever, as well as HelitronScanner (Xiong et al. 2014), MITE Tracker (Crescente et al. 2018), SINE_Scan (Mao et al. 2016), and RepeatMasker (Fig. 2.1). In brief, RepeatModeler and other tools such as HelitronScanner and MITE Tracker helped to obtain a potentially comprehensive representation of TEs, which was used to generate a relatively complete TE library. RepeatMasker was then used with this library to identify genome-wide TEs. For this purpose, a standard pipeline of repeat identification in flax was developed to build a non-redundant flax TE library and perform a comparative analysis of these repeats across genomes. This automated strategy (Fig. 2.1) serves as a roadmap for repeat identification and characterization in crops. In the next section, we present a brief overview of these tools and propose a flax-based strategy (Fig. 2.1) for TE identification.

Fig. 2.1 Workflow for the identification of transposable elements (TEs) used for the case study in flax
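
The library-merging step of this workflow can be pictured with a small sketch. The Python below is only an illustration and is not the pipeline used for the case study: it pools FASTA records from several de novo libraries and drops exact duplicates, treating a sequence and its reverse complement as the same entry. The function names and the header format are hypothetical, and a real non-redundant library would normally also cluster near-identical sequences with a dedicated tool rather than rely on exact matching.

```python
from typing import Iterable, List, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA-formatted text into (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def merge_libraries(
    libraries: Iterable[List[Tuple[str, str]]]
) -> List[Tuple[str, str]]:
    """Pool several libraries, keeping the first copy of each distinct sequence.

    Assumption for this sketch: only exact duplicates are removed, and a
    sequence and its reverse complement count as the same entry.
    """
    seen = set()
    merged: List[Tuple[str, str]] = []
    for library in libraries:
        for header, seq in library:
            upper = seq.upper()
            # Canonical key: the lexicographically smaller of the two strands.
            key = min(upper, reverse_complement(upper))
            if key not in seen:
                seen.add(key)
                merged.append((header, seq))
    return merged
```

For example, if one library contains ACGTTGCA and another contains its reverse complement TGCAACGT, only the first record survives the merge.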

2.5.1 RepeatMasker

RepeatMasker is the most commonly used homology-based computational tool for identifying, classifying, and masking repetitive sequences, such as low-complexity and interspersed repetitive regions (http://www.repeatmasker.org/). RepeatMasker aligns the query genomic sequences against a library of known repeats, such as Repbase (Bao et al. 2015) or the Dfam database (Storer et al. 2021). The Repbase library is the most commonly used database for repeats and covers most organisms, including plants and animals. However, for organisms not represented in the library files, ab initio methods such as RECON (Bao et al. 2002) or RepeatScout (Price et al. 2005) can be used. RepeatMasker also supports Dfam (https://www.dfam.org/home), which is an open access repository of TEs. Further, for sequence comparison purposes, RepeatMasker can use several search engines, including nhmmer, cross_match, ABBlast/WUBlast, RMBlast, and DeCypher.

2.5.2 RepeatModeler

RepeatModeler (http://www.repeatmasker.org/RepeatModeler/) is a de novo tool for the identification of TE families that can be applied to any eukaryotic species. It utilizes several modeling packages, including RECON, RepeatScout, Tandem Repeats Finder (TRF), and RMBlast. The major purpose of RepeatModeler is to automate the runs of the various algorithms, given a genomic database. This includes clustering redundant results, refining and classifying the families, and producing a high-quality library of TE families. The library can then be used by RepeatMasker and ultimately submitted to the Dfam database.

2.5.3 LTR_Finder

LTR retrotransposons, the most common type of TEs, play a central role in the evolution and function of genes. LTR retrotransposons can be found in nearly every eukaryote, including fungi and animals, although they are more common in plant genomes (Bennetzen et al. 2014). Computational methods are important for finding such repeated sequences. LTR_FINDER is one of the most common search engines for finding LTR-TEs. It outperforms the LTR_retriever tool in terms of predictive ability (Ou et al. 2018). LTR_FINDER uses DNA sequences to predict the location and structure of full-length elements (Xu et al. 2007). Large-scale sequences are easily scanned to perform accurate ab initio LTR retrotransposon predictions by considering common structural features. LTR_FINDER accepts sequences in FASTA format and provides two types of output files: full-output and summary-output (Xu et al. 2007). Full-output files contain details such as the retroelement size, its location, the similarity between the two LTRs, a sharpness estimate (a measure of the reliability of the LTR border prediction), and more. The summary-output file is a simpler version of the full-output file.

2.5.4 LTRharvest

LTRharvest is an efficient and flexible de novo-based tool for the detection of full-length or near full-length LTR retrotransposons (Ellinghaus et al. 2008). De novo prediction for large-scale datasets, such as complete vertebrate chromosomes, is possible because of its short run time and low memory consumption. Iterative predictions using alternative parameter values, for example to boost sensitivity or specificity, are quick and easy to conduct. The only time-consuming operation is the generation of the enhanced suffix array, and it only needs to be done once for a dataset (https://www.zbh.uni-hamburg.de/forschung/gi/software/ltrharvest.html). LTRharvest accepts datasets in FASTA format and applies several well-established filters for LTR retrotransposons, based on LTR length, distance between LTRs, and sequence motifs. The properties of LTR retrotransposons are typically species-specific, and this can be addressed with LTRharvest by switching filters on and off and by selecting appropriate parameters. The software generates predictions in a tabular format and/or a generic file format that can be used for further processing, for example with LTRdigest (Steinbiss et al. 2009). The predicted LTR retrotransposon sequences are also reported in FASTA format. LTRharvest offers three primary advantages: (1) ease of use with large-scale datasets, (2) the ability to be tuned for the prediction of known LTR features, and (3) its open-source nature, which makes it freely accessible to everyone.
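
The filter-based logic described for LTRharvest can be illustrated with a toy example. The Python sketch below is not LTRharvest code and does not reproduce its algorithm or command-line interface; it simply applies the three kinds of filters named in the text (LTR length, distance between the two LTRs, and a terminal sequence motif) to hypothetical candidate elements. The `Candidate` class, the threshold values, and the TG...CA motif check are assumptions chosen for illustration and would need to be tuned per species.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical candidate element described by its two LTRs."""
    left_ltr: str     # sequence of the 5' LTR
    right_ltr: str    # sequence of the 3' LTR
    left_start: int   # 0-based start of the 5' LTR
    right_start: int  # 0-based start of the 3' LTR


def passes_filters(
    cand: Candidate,
    min_ltr_len: int = 100,    # illustrative threshold
    max_ltr_len: int = 1000,   # illustrative threshold
    min_distance: int = 1000,  # illustrative threshold
    max_distance: int = 15000, # illustrative threshold
    require_motif: bool = True,
) -> bool:
    """Apply length, distance, and motif filters; each can be relaxed."""
    ltrs = (cand.left_ltr.upper(), cand.right_ltr.upper())

    # Length filter: both LTRs must fall inside the allowed size range.
    if any(not (min_ltr_len <= len(ltr) <= max_ltr_len) for ltr in ltrs):
        return False

    # Distance filter: spacing between the starts of the two LTRs.
    distance = cand.right_start - cand.left_start
    if not (min_distance <= distance <= max_distance):
        return False

    # Motif filter (switchable): each LTR starts with TG and ends with CA.
    if require_motif:
        if any(not (ltr.startswith("TG") and ltr.endswith("CA")) for ltr in ltrs):
            return False
    return True
```

Switching `require_motif` off mirrors the idea of turning a filter off for species whose LTRs do not follow the assumed motif.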

2.5.5 LTRAnnotator

An automated and standardized software tool such as LTRAnnotator (You et al. 2015) is effective in identifying and removing false positives from genome sequences. The removal of redundant repeats is critical for accurate assembly and LTR identification and for downstream comparative evolutionary analyses. To accomplish this task, LTRAnnotator, a Java-based tool, combines the LTR_FINDER and LTRharvest programs. LTRAnnotator was tested on the Arabidopsis genome with a sensitivity of up to 0.90, and it not only eliminated false-positive results but also identified ten new putative LTR retroelements. One of the major purposes of TE studies is to explore genome evolution, with LTRs being the prime example. LTRAnnotator can also estimate the insertion time of LTR retrotransposons in genomes.

2.5.6 LTR_Retriever

LTR_retriever is a Perl-based program for detecting LTR retroelements that uses genomic sequences to generate high-quality libraries (Ou et al. 2018). Testing in the rice genome demonstrated significant improvements, with high levels of precision, sensitivity, specificity, and accuracy (Ou et al. 2018). This program efficiently eliminates false positives from the initial predictions of software such as MGEScan-LTR (Rho et al. 2007) and LTRharvest (Ellinghaus et al. 2008). LTR_retriever outperformed other programs when applied to the well-annotated, high-quality rice genome assembly (Sasaki et al. 2005) and to other high-quality genomes such as maize (Jiao et al. 2017) and Arabidopsis (The Arabidopsis Genome Initiative 2000). LTR_retriever provides the insertion time and location of intact LTR elements in the genome. This program can also be used with long reads. For instance, when the Arabidopsis genome was tested using 40 k self-corrected PacBio reads, the resulting LTR library demonstrated high specificity and sensitivity (Ou et al. 2018). This tool can also be used to identify non-canonical LTRs, which are less common in genomes but are preferentially inserted into genic regions.
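
Insertion-time estimates of the kind reported by LTRAnnotator and LTR_retriever rest on the fact that the two LTRs of an element are identical at the time of insertion and diverge afterwards. A common formulation is T = K / (2r), where K is the divergence between the two LTRs and r is the substitution rate per site per year. The Python sketch below illustrates this calculation under stated assumptions: the two LTRs are already aligned and of equal length, K is Jukes-Cantor corrected, and the default rate of 1.3e-8 is a value often used for rice that serves only as a placeholder here, not as a flax-specific estimate. It is not the code of either tool.

```python
import math


def ltr_divergence(ltr5: str, ltr3: str) -> float:
    """Jukes-Cantor corrected divergence K between two aligned LTRs.

    Assumption: the inputs are pre-aligned and of equal length; columns
    containing gaps or ambiguous bases are ignored.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(ltr5.upper(), ltr3.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable positions")
    p = sum(a != b for a, b in pairs) / len(pairs)  # raw mismatch fraction
    if p == 0:
        return 0.0
    if p >= 0.75:
        raise ValueError("LTRs too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def insertion_time(k: float, rate: float = 1.3e-8) -> float:
    """Estimated years since insertion, T = K / (2r).

    The default rate is a placeholder assumption, not a flax estimate.
    """
    return k / (2.0 * rate)
```

With two mismatches over 100 aligned positions, K is about 0.0203, which corresponds to roughly 0.78 million years under the placeholder rate.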

2.5.7 SINE_Scan

SINEs are mobile TEs that use RNA as an intermediary in a copy-and-paste amplification process known as retroposition (Deragon et al. 2006). SINEs are non-autonomous retrotransposons that are abundant in both humans and plants (Lander et al. 2001; Grover et al. 2004; Deragon et al. 2006). Despite being discovered about 45 years ago and studied intensely in model eukaryotic organisms (Schmid et al. 1975; Vassetzky et al. 2013), computational identification of new SINEs remains challenging. They are small, non-coding, and highly variable; hence, they are often missed or incorrectly identified in plant genome sequences. For instance, their proportion is approximately 0.02% in Zea mays, 0.05% in Arabidopsis thaliana, 0.16% in Brassica oleracea, and 0.18% in Beta vulgaris (Deragon et al. 2006; Baucom et al. 2009; Schwichtenberg et al. 2016). SINE-Finder (Wenke et al. 2011) is a de novo tool specifically developed for SINE discovery. It searches for structural signals and outputs all genomic regions matching them. However, the false discovery rate of SINE-Finder is high: 93.00% in Brachypodium distachyon, 99.60% in Arabidopsis thaliana, and 99.00% in Zea mays (Mao et al. 2016). SINE_Scan was reported to outperform SINE-Finder (Mao et al. 2016). To detect a SINE element, SINE_Scan combines the hallmarks of SINE transposition, copy number, and structural signals. Results from 19 plant and animal genome assemblies, with sizes ranging from 120 Mb to 3.5 Gb, demonstrated both high sensitivity and high specificity. SINE_Scan uncovered a spate of new families and substantially increased the estimate of SINE abundance in these genomes. For example, SINE_Scan could identify three types of SINE (tRNA, 7SL RNA, and 5S RNA), whereas SINE-Finder was only able to distinguish tRNA and 7SL RNA types (Mao et al. 2016). SINE_Scan has three main components: (i) de novo SINE discovery, (ii) candidate validation using copy number and transposition hallmarks, and (iii) classification and genome-wide annotation.

2.5.8 HelitronScanner

Helitrons are rolling-circle transposons that often capture gene sequences (Xiong et al. 2014). This ability makes them important TEs for evolutionary analysis. Helitrons are difficult to detect because, unlike some of the other DNA transposons, they typically do not have inverted end repeats or target site duplications. Kapitonov et al. (2001) were the first to discover these unique eukaryotic transposons by performing a comparative bioinformatics investigation of multiple plant and animal genomes. Even among closely related species, the distribution of these transposons differs significantly. For instance, Helitrons make up 4.23% of the silkworm genome (Han et al. 2013) and 1–5% of fruit fly genomes (Kapitonov et al. 2007), while in maize and Arabidopsis they account for approximately 2% (Kapitonov et al. 2001; Yang et al. 2009). One of the main thrusts here was the first characterization of Helitrons in flax, which is described below in a case study using HelitronScanner. This tool was developed for the identification of Helitrons based on conserved DNA motifs using a motif-extracting algorithm (Xiong et al. 2014). It uses a two-layer local combinational variable (LCV) technique to extract patterns from sequences of protein families with different lengths and functions (Xiong et al. 2014). The application of these LCV patterns to all genomes in Phytozome (Goodstein et al. 2011) resulted in the discovery of numerous new Helitrons, leading to a reassessment of the preponderance of Helitrons. HelitronScanner not only detected divergent Helitrons but also discovered new ones previously missed by HelSearch, HelitronFinder (Du et al. 2009; Yang et al. 2009), and the model-based method (Tempel et al. 2007). It uses conserved nucleotide sequences at potentially different locations and thus overcomes the divergence of Helitron termini across species. As a result, it represents a significant advancement over earlier methods based on DNA sequence or structure. HelitronScanner implements a four-step process: identification of the 5′ end, identification of the 3′ end, pairing of the ends, and building the Helitron FASTA file.
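
The four-step process described for HelitronScanner (find 5′ ends, find 3′ ends, pair the ends, write a FASTA file) can be mimicked with a toy sketch. The Python below is not HelitronScanner and does not use its LCV scoring: plain regular-expression matches stand in for end detection, and each 5′ end is paired with the nearest downstream 3′ end inside a size window. The function names, the patterns passed in, and the size limits are all hypothetical.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int]  # 0-based, half-open (start, end)


def find_sites(genome: str, pattern: str) -> List[Span]:
    """Steps 1 and 2: locate every (possibly overlapping) match of an end pattern."""
    lookahead = re.compile(f"(?=({pattern}))")
    return [(m.start(1), m.end(1)) for m in lookahead.finditer(genome)]


def pair_ends(
    five_prime: List[Span],
    three_prime: List[Span],
    min_len: int = 20,
    max_len: int = 200,
) -> List[Span]:
    """Step 3: pair each 5' end with the nearest downstream 3' end in range."""
    pairs: List[Span] = []
    downstream = sorted(three_prime)
    for start, _ in five_prime:
        for _, end in downstream:
            length = end - start
            if length > max_len:
                break          # later 3' ends are even further away
            if length >= min_len:
                pairs.append((start, end))
                break          # keep only the nearest valid partner
    return pairs


def to_fasta(genome: str, pairs: List[Span], prefix: str = "candidate") -> str:
    """Step 4: write the paired candidates as FASTA text."""
    lines: List[str] = []
    for i, (start, end) in enumerate(pairs, 1):
        lines.append(f">{prefix}_{i}:{start}-{end}")
        lines.append(genome[start:end])
    return "\n".join(lines)
```

In a real analysis the exact-match patterns would be replaced by scored motif models, which is what allows divergent termini to be recognized.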

2.5.9 Miniature Inverted-Repeat Transposable Elements (MITE Tracker)

Miniature inverted-repeat transposable elements (MITEs) are DNA transposons that belong to class II TEs. They share a property with other DNA transposons, namely short conserved terminal inverted repeats (TIRs). However, they are more abundant than retrotransposons (Wessler et al. 1995; Fattash et al. 2013). MITEs are typically characterized by their small size, which ranges from 50 to 800 bp, and by the absence of transposase coding capacity. The TIRs are flanked by short direct repeats known as target site duplications (Guo et al. 2017). MITEs are non-autonomous TEs because they do not encode the proteins required for transposition. They are most commonly found in introns or near genes (Wright et al. 2003; Lu et al. 2012) and can therefore play a role in gene regulation. In potato, a MITE associated with the gene encoding flavonoid 3′,5′-hydroxylase was found to induce a change in tuber skin color (Momose et al. 2010). In wheat, a MITE insertion in the promoter region of the Vrn-A1 gene causes its deregulation, resulting in the absence of a vernalization requirement for flowering (Yan et al. 2004). MITEs also play important roles in genetic diversity and evolution (Sampath et al. 2014). Several computational tools, such as detectMITE (Ye et al. 2016), FINDMITE (Tu 2001), MITE Digger (Yang 2013), and MITE-Hunter (Han et al. 2010), have been developed to detect MITEs in DNA sequences. MITE Tracker is a tool that identifies MITEs using an efficient alignment strategy to retrieve nearby inverted-repeat sequences from large complex genomes such as rice and wheat (Crescente et al. 2018). MITE Tracker employs a sequence homology method to identify MITEs and retains only those candidates that are likely TEs, which makes it fast and memory-efficient. In wheat, MITE Tracker revealed 6013 MITE families (Crescente et al. 2018). Explorations like this are needed for flax improvement to better understand its evolution and gene regulation. As a case study, we used the MITE Tracker tool to identify the MITEs in flax in the hope that the in-depth characterization of the various repeats will assist our understanding of gene function and genome evolution.
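
The structural definition of a MITE given above (small size, terminal inverted repeats, flanking target site duplications) translates naturally into a simple check. The Python sketch below is an illustration only and is not how MITE Tracker is implemented; the function name, the exact-match requirement for TIRs and TSDs, and the default lengths (10 bp TIR, 2 bp TSD) are assumptions, whereas the 50 to 800 bp size range is taken from the text.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def looks_like_mite(
    region: str,
    tir_len: int = 10,   # assumed TIR length for this sketch
    tsd_len: int = 2,    # assumed TSD length for this sketch
    min_size: int = 50,  # size range quoted in the text
    max_size: int = 800,
) -> bool:
    """Check a region laid out as TSD + element + TSD for MITE-like structure."""
    region = region.upper()
    element = region[tsd_len:len(region) - tsd_len]

    # Size filter on the element itself (TSDs excluded).
    if not (min_size <= len(element) <= max_size):
        return False

    # Target site duplication: the same short direct repeat on both flanks.
    if region[:tsd_len] != region[len(region) - tsd_len:]:
        return False

    # Terminal inverted repeats: the 3' end is the reverse complement
    # of the 5' end (exact match only, a simplification).
    return element[len(element) - tir_len:] == reverse_complement(element[:tir_len])
```

Real MITE finders tolerate mismatches in the TIRs and also use copy number across the genome, which this sketch deliberately leaves out.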

2.6 Case Study: A Comparative Analysis of Flax Genome TEs

A total of seven previously published flax assemblies, six of cultivated flax (L. usitatissimum) and one of its wild relative (L. bienne), were used as a case study. First, we identified the repeat sequences in each of the seven flax assemblies using RepeatModeler, HelitronScanner, MITE Tracker, and SINE_Scan. Second, all TE families of the seven assemblies were merged and filtered to generate a non-redundant flax repeat family library for RepeatMasker. This flax TE library contains 14,915 unique flax TE items or families with a total length of 29,699,425 bp (Table 2.2). It can be further enriched in the future by adding TE families and by improving the TE annotations, and it can therefore be used in future research for TE identification in new flax genomes. This library is available at https://zenodo.org/record/6780559#.YzSQt9gpASV.

With the flax TE library as a repeat database, RepeatMasker was used to identify repeat sequences in all seven flax assemblies. Repeated sequences accounted for 67.36% of the assembly in Yiya-5, 58.15% in Atlant, 50.38% in pale flax, 50.07% in Heiya-14, 50.01% in Longya-10, 47.96% in CDC Bethune V. 1.0, and 40.62% in CDC Bethune V. 2.0. These TE estimates are much higher than those previously reported in the literature (Wang et al. 2012; You et al. 2018; Zhang et al. 2020b; Sa et al. 2021) (Table 2.3). Of note, unknown repeats were the most abundant class in all flax genomes: the 8978 unknown families in the library, totalling 7,654,420 bp (Table 2.2), accounted on average for 21.21% of the genome sequence (Table 2.3). Among the major repeat families, a total of 1301 helitron families were identified (Table 2.2), accounting on average for 14.93% of the genome sequence. The highest proportion of helitrons was 21.61% in Yiya-5, and the lowest was 10.79% in CDC Bethune V. 2.0 (Table 2.3). A total of 2568 LTR elements, combining Copia, Gypsy, and unknown elements, averaged 9.06% of the genome sequence (Tables 2.2 and 2.3). Similarly, the highest LTR proportion (10.04%) was in Yiya-5, and the lowest (7.50%) was in CDC Bethune V. 2.0. Among the classified class II superfamilies other than helitrons, MITEs had the most library entries (395), totalling 190,298 bp (Table 2.2). Again, the highest proportion of class II elements (23.57%) was found in Yiya-5 and the lowest (13.08%) in CDC Bethune V. 2.0 (Table 2.3).

In summary, the abundance of different TE elements in flax genomes is largely group-specific and contributes to genome size evolution. For instance, both helitrons and LTR elements are inferred to play a role in evolution (Vitte et al. 2005; Yang et al. 2009). So far, among the different HTST used for flax sequence assembly, PacBio-based assemblies, such as that of Yiya-5, were found to contain a higher proportion of repeat elements.

In this study, the use of a flax-based repeat library significantly increased the overall repeat content compared with previously reported studies (Wang et al. 2012; You et al. 2018; Zhang et al. 2020b; Dmitriev et al. 2021; Sa et al. 2021). Despite substantial theorizing about the origin, function, and development of repeat sequences, as well as significant experimental discoveries in these areas over the past few decades, many questions about them remain unanswered. Nonetheless, the unique contributions of the several research groups working on repeats, as well as the recent push to apply novel genomic methodologies to this area, are changing the possibilities for learning more about repeats, their roles in genomes, and evolution.

Table 2.2 Summary of flax transposable element (TE) library identified from seven flax assemblies for use with RepeatMasker

| Class | No. of families | Length of families (bp) |
|---|---|---|
| Class I: Retroelements | | |
| LTR: | | |
| LTR/Copia | 1021 | 2,952,534 |
| LTR/Gypsy | 638 | 1,761,853 |
| LTR/ERV1 | 9 | 7905 |
| LTR/ERVK | 6 | 33,316 |
| LTR/Ngaro | 3 | 19,980 |
| LTR/Pao | 5 | 17,127 |
| LTR/Unknown | 895 | 451,298 |
| Non-LTR: | | |
| LINE | 802 | 1,105,720 |
| SINE | 26 | 79,475 |
| Class II: DNA transposon | | |
| DNA/Unknown | 1 | 410 |
| DNA/CMC-EnSpm | 137 | 138,549 |
| DNA/Kolobok-H | 3 | 17,555 |
| DNA/MULE-MuDR | 305 | 681,463 |
| DNA/Maverick | 1 | 1310 |
| DNA/Merlin | 1 | 170 |
| DNA/PIF-Harbinger | 62 | 137,284 |
| DNA/TcMar-Mariner | 2 | 992 |
| DNA/TcMar-Pogo | 7 | 12,183 |
| DNA/Zisupton | 1 | 6715 |
| DNA/hAT | 176 | 237,086 |
| DNA/MITE | 395 | 190,298 |
| DNA/Unknown | 8978 | 7,654,420 |
| DNA/Helitron | 1301 | 13,864,197 |
| Others | | |
| Satellite | 4 | 65,758 |
| Simple repeat | 53 | 8414 |
| rRNA | 41 | 112,176 |
| snRNA | 13 | 16,971 |
| tRNA | 29 | 124,266 |
| Total | 14,915 | 29,699,425 |

Table 2.3 Numbers, total length, and percentages of transposable elements (TEs) in the seven flax genome sequences

Species and morphotype: pale flax landrace, Linum bienne; CDC Bethune V. 1.0, CDC Bethune V. 2.0, and Longya-10, Linum usitatissimum (linseed); Yiya-5, Atlant, and Heiya-14, Linum usitatissimum (fiber).

Number

| TE class | Pale flax landrace | CDC Bethune V. 1.0 | CDC Bethune V. 2.0 | Longya-10 | Yiya-5 | Atlant | Heiya-14 |
|---|---|---|---|---|---|---|---|
| Class I: Retroelements | 53,643 | 68,604 | 51,813 | 54,894 | 59,760 | 56,273 | 54,877 |
| LTR | 31,137 | 45,256 | 30,719 | 32,005 | 32,781 | 32,439 | 31,826 |
| LTR/Copia | 12,934 | 21,007 | 13,346 | 13,253 | 12,844 | 13,966 | 13,236 |
| LTR/Gypsy | 12,502 | 17,749 | 11,864 | 12,874 | 14,356 | 12,770 | 12,736 |
| LTR/BEL/Pao | 87 | 151 | 82 | 84 | 81 | 85 | 80 |
| LTR/Retroviral | 288 | 439 | 269 | 304 | 277 | 282 | 282 |
| LTR/unknown | 5326 | 5910 | 5158 | 5490 | 5223 | 5336 | 5492 |
| Non-LTR | 22,506 | 23,348 | 21,094 | 22,889 | 26,979 | 23,834 | 23,051 |
| Non-LTR/SINEs | 1240 | 1251 | 1182 | 1277 | 1319 | 1272 | 1262 |
| Non-LTR/LINEs | 21,266 | 22,097 | 19,912 | 21,612 | 25,660 | 22,562 | 21,789 |
| Class II: DNA transposons | 112,444 | 128,720 | 103,428 | 114,374 | 136,628 | 121,494 | 115,270 |
| Tc1-IS630-Pogo | 190 | 191 | 177 | 186 | 184 | 184 | 193 |
| Hobo-Activator | 3177 | 3508 | 3077 | 3245 | 3165 | 3151 | 3295 |
| Tourist/Harbinger | 1196 | 1359 | 1122 | 1263 | 1218 | 1230 | 1282 |
| Helitron | 99,722 | 114,139 | 91,178 | 101,147 | 120,922 | 107,361 | 102,049 |
| Unclassified | 8159 | 9523 | 7874 | 8533 | 11,139 | 9568 | 8451 |
| Unknown | 207,589 | 241,061 | 192,851 | 210,927 | 280,893 | 239,310 | 212,324 |
| Sub-total | 373,676 | 438,385 | 348,092 | 380,195 | 477,281 | 417,077 | 382,471 |
| Others | 55,711 | 57,498 | 53,167 | 57,464 | 59,993 | 59,811 | 58,451 |
| Small RNA | 2145 | 3427 | 2044 | 2201 | 2513 | 2303 | 2263 |
| Satellites | 490 | 494 | 447 | 462 | 506 | 471 | 469 |
| Simple repeats | 45,106 | 45,533 | 43,073 | 46,786 | 49,008 | 49,039 | 47,294 |
| Low complexity | 7970 | 8044 | 7603 | 8015 | 7966 | 7998 | 8425 |
| Total | 429,387 | 495,883 | 401,259 | 437,659 | 537,274 | 476,888 | 440,922 |

Length (kb)

| TE class | Pale flax landrace | CDC Bethune V. 1.0 | CDC Bethune V. 2.0 | Longya-10 | Yiya-5 | Atlant | Heiya-14 |
|---|---|---|---|---|---|---|---|
| Class I: Retroelements | 37,695 | 37,831 | 33,568 | 39,593 | 58,096 | 45,961 | 38,504 |
| LTR | 27,114 | 27,183 | 23,721 | 28,765 | 45,662 | 34,807 | 27,635 |
| LTR/Copia | 14,122 | 13,491 | 12,033 | 15,145 | 17,404 | 17,277 | 14,436 |
| LTR/Gypsy | 11,413 | 11,928 | 10,122 | 11,947 | 26,622 | 15,845 | 11,554 |
| LTR/BEL/Pao | 33 | 71 | 34 | 34 | 34 | 41 | 34 |
| LTR/Retroviral | 140 | 183 | 144 | 173 | 200 | 197 | 151 |
| LTR/unknown | 1406 | 1510 | 1388 | 1466 | 1403 | 1447 | 1461 |
| Non-LTR | 10,581 | 10,648 | 9847 | 10,828 | 12,434 | 11,155 | 10,869 |
| Non-LTR/SINEs | 224 | 222 | 197 | 239 | 706 | 315 | 225 |
| Non-LTR/LINEs | 10,357 | 10,425 | 9650 | 10,589 | 11,727 | 10,840 | 10,644 |
| Class II: DNA transposons | 48,965 | 48,854 | 41,372 | 50,937 | 107,222 | 71,123 | 50,888 |
| Tc1-IS630-Pogo | 49 | 45 | 42 | 47 | 48 | 47 | 47 |
| Hobo-Activator | 1533 | 1476 | 1378 | 1609 | 1571 | 1584 | 1597 |
| Tourist/Harbinger | 518 | 538 | 483 | 562 | 564 | 582 | 573 |
| Helitron | 40,953 | 41,153 | 34,126 | 42,559 | 98,321 | 62,563 | 42,544 |
| Unclassified | 5912 | 5642 | 5343 | 6160 | 6718 | 6347 | 6127 |
| Unknown | 58,399 | 62,699 | 50,800 | 59,308 | 134,387 | 87,826 | 59,495 |
| Sub-total | 145,059 | 149,384 | 125,741 | 149,838 | 299,705 | 204,910 | 148,886 |
| Others | 2826 | 3232 | 2691 | 3196 | 6728 | 5472 | 3126 |
| Small RNA | 587 | 945 | 558 | 746 | 2083 | 1219 | 716 |
| Satellites | 86 | 94 | 77 | 83 | 581 | 128 | 80 |
| Simple repeats | 1724 | 1805 | 1686 | 1795 | 2599 | 2818 | 1820 |
| Low complexity | 428 | 389 | 370 | 571 | 1464 | 1307 | 510 |
| Total | 147,884 | 152,617 | 128,432 | 153,034 | 306,433 | 210,382 | 152,013 |

(%)

| TE class | Pale flax landrace | CDC Bethune V. 1.0 | CDC Bethune V. 2.0 | Longya-10 | Yiya-5 | Atlant | Heiya-14 |
|---|---|---|---|---|---|---|---|
| Class I: Retroelements | 12.84 | 11.89 | 10.62 | 12.94 | 12.77 | 12.70 | 12.68 |
| LTR | 9.24 | 8.54 | 7.50 | 9.40 | 10.04 | 9.62 | 9.10 |
| LTR/Copia | 4.81 | 4.24 | 3.81 | 4.95 | 3.83 | 4.78 | 4.75 |
| LTR/Gypsy | 3.89 | 3.75 | 3.20 | 3.90 | 5.85 | 4.38 | 3.80 |
| LTR/BEL/Pao | 0.01 | 0.02 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |
| LTR/Retroviral | 0.05 | 0.06 | 0.05 | 0.06 | 0.04 | 0.05 | 0.05 |
| LTR/unknown | 0.48 | 0.47 | 0.43 | 0.48 | 0.31 | 0.40 | 0.49 |
| Non-LTR | 3.61 | 3.35 | 3.11 | 3.54 | 2.74 | 3.09 | 3.57 |
| Non-LTR/SINEs | 0.08 | 0.07 | 0.06 | 0.08 | 0.16 | 0.09 | 0.07 |
| Non-LTR/LINEs | 3.53 | 3.28 | 3.05 | 3.46 | 2.58 | 3.00 | 3.50 |
| Class II: DNA transposons | 16.68 | 15.35 | 13.08 | 16.65 | 23.57 | 19.66 | 16.76 |
| Tc1-IS630-Pogo | 0.02 | 0.01 | 0.01 | 0.02 | 0.01 | 0.01 | 0.02 |
| Hobo-Activator | 0.52 | 0.46 | 0.44 | 0.53 | 0.35 | 0.44 | 0.53 |
| Tourist/Harbinger | 0.18 | 0.17 | 0.15 | 0.18 | 0.12 | 0.16 | 0.19 |
| Helitron | 13.95 | 12.93 | 10.79 | 13.91 | 21.61 | 17.29 | 14.01 |
| Unclassified | 2.01 | 1.78 | 1.69 | 2.01 | 1.48 | 1.76 | 2.01 |
| Unknown | 19.89 | 19.70 | 16.07 | 19.38 | 29.54 | 24.27 | 19.59 |
| Sub-total | 49.41 | 46.94 | 39.77 | 48.97 | 65.88 | 56.63 | 49.03 |
| Others | 0.97 | 1.02 | 0.85 | 1.05 | 1.48 | 1.52 | 1.04 |
| Small RNA | 0.20 | 0.30 | 0.18 | 0.24 | 0.46 | 0.34 | 0.24 |
| Satellites | 0.03 | 0.03 | 0.02 | 0.03 | 0.13 | 0.04 | 0.03 |
| Simple repeats | 0.59 | 0.57 | 0.53 | 0.59 | 0.57 | 0.78 | 0.60 |
| Low complexity | 0.15 | 0.12 | 0.12 | 0.19 | 0.32 | 0.36 | 0.17 |
| Total | 50.38 | 47.96 | 40.62 | 50.02 | 67.36 | 58.15 | 50.07 |

Number: the total number of TE elements; Length (kb): the total length of TE elements; (%): percentage of the TE elements in the genome
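
The genome-wide averages quoted in this case study can be reproduced directly from the percentage rows of Table 2.3. The short Python sketch below does this for three TE classes; the dictionary layout and the function name are illustrative choices, while the values are copied from the table in the order pale flax, CDC Bethune V. 1.0, CDC Bethune V. 2.0, Longya-10, Yiya-5, Atlant, and Heiya-14.

```python
from statistics import mean
from typing import List, Tuple

GENOMES = [
    "Pale flax", "CDC Bethune V. 1.0", "CDC Bethune V. 2.0",
    "Longya-10", "Yiya-5", "Atlant", "Heiya-14",
]

# Percentage of each genome occupied by the TE class (Table 2.3).
PERCENT = {
    "Helitron": [13.95, 12.93, 10.79, 13.91, 21.61, 17.29, 14.01],
    "LTR": [9.24, 8.54, 7.50, 9.40, 10.04, 9.62, 9.10],
    "Unknown": [19.89, 19.70, 16.07, 19.38, 29.54, 24.27, 19.59],
}


def summarize(values: List[float]) -> Tuple[float, str, str]:
    """Return (mean rounded to 2 decimals, genome with max, genome with min)."""
    highest = GENOMES[values.index(max(values))]
    lowest = GENOMES[values.index(min(values))]
    return round(mean(values), 2), highest, lowest


for te_class, values in PERCENT.items():
    avg, highest, lowest = summarize(values)
    print(f"{te_class}: mean {avg}%, highest in {highest}, lowest in {lowest}")
```

The unweighted mean treats each assembly equally, which matches the averages reported in the text (14.93% for helitrons, 9.06% for LTR elements, and 21.21% for unknown repeats).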

2.7 Conclusion and Future Perspectives

The identification of repeats has previously been performed in flax (Wang et al. 2012; You et al. 2018; Zhang et al. 2020b; Sa et al. 2021). However, the identification of TEs was based on different tools and procedures, which precluded direct comparisons. For example, the authors ran RepeatMasker with libraries of known repeats, which led to differing results. Because the TE library is not complete for some genomes, such as flax, the results tend to underestimate the number of TEs in the genomes. In addition, these results are not directly comparable based on the published information. The major goal of this review was to develop a strategy for high-quality annotation of repetitive elements in flax by removing redundant repeats and by using a single flax-derived repeat library across all seven flax genomes, which allows for true comparisons. The identification of repeats was significantly improved. For instance, the 67.36% repeat content in Yiya-5 obtained with our flax-based pipeline is 12 percentage points higher than the previously reported 55.36% (Sa et al. 2021). This review also uncovered and highlighted the importance of helitron and LTR elements in flax, both of which have been reported as powerful driving forces of evolution (Kapitonov et al. 2001; Galindo-González et al. 2017).

References Bao W, Kojima KK, Kohany O (2015) Repbase update, a database of repetitive elements in eukaryotic genomes. Mob DNA 6:11 Bao Z, Eddy SR (2002) Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res 12:1269–1276 Baucom RS, Estill JC, Chaparro C et al (2009) Exceptional diversity, non-random distribution, and rapid

33 evolution of retroelements in the B73 maize genome. PLoS Genet 5:e1000732 Bennetzen JL, Wang H (2014) The contributions of transposable elements to the structure, function, and evolution of plant genomes. Annu Rev Plant Biol 65:505–530 Caligiuri SP, Edel AL, Aliani M et al (2014) Flaxseed for hypertension: implications for blood pressure regulation. Curr Hypertens Rep 16:499 Crescente JM, Zavallo D, Helguera M et al (2018) MITE tracker: an accurate approach to identify miniature inverted-repeat transposable elements in large genomes. BMC Bioinform 19:348 Cullis CA (1981) DNA sequence organisation in the flax genome. Biochim Biophys Acta 652:1–15 Deragon JM, Zhang X (2006) Short interspersed elements (SINEs) in plants: origin, classification, and use as phylogenetic markers. Syst Biol 55:949–956 Dhakal HN, Sain M (2019) Enhancement of mechanical properties of flax-epoxy composite with carbon fibre hybridisation for lightweight applications. Materials (Basel, Switzerland) 13 Diederichsen A, Richards K (2003) Cultivated flax and the genus Linum L.: taxonomy and germplasm conservation. In: Muir AD, Westcott ND (eds) Flax: the genus Linum. CRC Press, London, pp 22–54 Dmitriev AA, Pushkova EN, Novakovskiy RO et al (2021) Genome sequencing of fiber flax cultivar atlant using Oxford Nanopore and Illumina platforms. Front Genet 11:1487 Du C, Fefelova N, Caronna J et al (2009) The polychromatic Helitron landscape of the maize genome. Nucleic Acids Res 106:19916–19921 Edgar R, Myers E (2005) PILER: identification and classification of genomic repeats. Bioinformatics 21 (Suppl 1):i152-158 Ellinghaus D, Kurtz S, Willhoeft U (2008) LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinform 9:18 Fattash I, Rooke R, Wong A et al (2013) Miniature inverted-repeat transposable elements: discovery, distribution, and activity. 
Genome 56:475–486 Fombuena V, Petrucci R, Dominici F et al (2019) Maleinized linseed oil as epoxy resin hardener for composites with high bio content obtained from linen byproducts. Polymers 11:2 Galindo-González L, Mhiri C, Deyholos MK et al (2017) LTR-retrotransposons in plants: engines of evolution. Gene 626:14–25 Garbus I, Romero JR, Valarik M et al (2015) Characterization of repetitive DNA landscape in wheat homeologous group 4 chromosomes. BMC Genomics 16:375–375 Girollet N, Rubio B, Lopez-Roques C et al (2019) De novo phased assembly of the Vitis riparia grape genome. Sci Data 6:127 Goodstein DM, Shu S, Howson R et al (2011) Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res 40:D1178–D1186

34 Goyal A, Sharma V, Upadhyay N et al (2014) Flax and flaxseed oil: an ancient medicine & modern functional food. J Food Sci Technol 51:1633–1653 Grover D, Mukerji M, Bhatnagar P et al (2004) Alu repeat analysis in the complete human genome: trends and variations with respect to genomic composition. Bioinformatics 20:813–817 Guo C, Spinelli M, Ye C et al (2017) Genome-wide comparative analysis of miniature inverted repeat transposable elements in 19 Arabidopsis thaliana ecotype accessions. Sci Rep 7:2634 Haberer G, Kamal N, Bauer E et al (2020) European maize genomes highlight intraspecies variation in repeat and gene content. Nat Genet 52:950–957 Han M-J, Shen Y-H, Xu M-S et al (2013) Identification and evolution of the silkworm Helitrons and their contribution to transcripts. DNA Res 20:471–484 Han Y, Wessler SR (2010) MITE-Hunter: a program for discovering miniature inverted-repeat transposable elements from genomic sequences. Nucleic Acids Res 38:e199–e199 Hannan AJ (2018) Tandem repeats mediating genetic plasticity in health and disease. Nat Rev Genet 19:286–298 He Q, Cai Z, Hu T et al (2015) Repetitive sequence analysis and karyotyping reveals centromereassociated DNA sequences in radish (Raphanus sativus L.). BMC Plant Biol 15:105 Hoede C, Arnoux S, Moisset M et al (2014) PASTEC: an automatic transposable element classification tool. PLoS ONE 9:e91929 Hunt SP, Jarvis DE, Larsen DJ et al (2020) A chromosome-scale assembly of the garden orach (Atriplex hortensis L.) genome using oxford nanopore sequencing. Front Plant Sci 11:624 Jain M, Koren S, Miga KH et al (2018) Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat Biotechnol 36:338–345 Jeffreys AJ, Wilson V, Thein SL (1985) Hypervariable “minisatellite” regions in human DNA. Nature 314:67–73 Jiao Y, Peluso P, Shi J et al (2017) Improved maize reference genome with single-molecule technologies. 
Nature 546:524–527 Kapitonov VV, Jurka J (2001) Rolling-circle transposons in eukaryotes. Proc Natl Acad Sci USA 98:8714–8719 Kapitonov VV, Jurka J (2007) Helitrons on a roll: eukaryotic rolling-circle transposons. Trends Genet 23:521–529 Kezimana P, Dmitriev AA, Kudryavtseva AV et al (2018) Secoisolariciresinol diglucoside of flaxseed and its metabolites: biosynthesis and potential for nutraceuticals. Front Genet 9:641 Kidwell MG (2002) Transposable elements and the evolution of genome size in eukaryotes. Genetica 115:49–63 Lander ES, Linton LM, Birren B et al (2001) Initial sequencing and analysis of the human genome. Nature 409:860–921

N. Khan et al. Litt M, Luty JA (1989) A hypervariable microsatellite revealed by in vitro amplification of a dinucleotide repeat within the cardiac muscle actin gene. Am J Hum Genet 44:397–401 Liu Y, Du H, Li P et al (2020) Pan-genome of wild and cultivated soybeans. Cell 182:162-176.e113 Lu C, Chen J, Zhang Y et al (2012) Miniature invertedrepeat transposable elements (MITEs) have been accumulated through amplification bursts and play important roles in gene expression and species diversity in Oryza sativa. Mol Biol Evol 29:1005–1017 Ma Z, Zhang Y, Wu L et al (2021) High-quality genome assembly and resequencing of modern cotton cultivars provide resources for crop improvement. Nat Genet 53:1385–1391 Madireddy A, Gerhardt J (2017) Replication through repetitive DNA elements and their role in human diseases. Adv Exp Med Biol 1042:549–581 Mao H, Wang H (2016) SINE_scan: an efficient tool to discover short interspersed nuclear elements (SINEs) in large-scale genomic datasets. Bioinformatics 33:743–745 Mazza G, Biliaderis CG (1989) Functional properties of flax seed mucilage. J Food Sci 54:1302–1305 Mokhothu TH, John MJ (2015) Review on hygroscopic aging of cellulose fibres and their biocomposites. Carbohydr Polym 131:337–354 Momose M, Abe Y, Ozeki Y (2010) Miniature invertedrepeat transposable elements of stowaway are active in potato. Genetics 186:59–66 Ou S, Jiang N (2018) LTR_retriever: a highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiol 176:1410–1422 Parikh M, Maddaford TG, Austria JA et al (2019) Dietary flaxseed as a strategy for improving human health. Nutrients 11:5 Paulson H (2018) Repeat expansion diseases. Handb Clin Neurol 147:105–123 Plohl M, Luchetti A, Mestrović N et al (2008) Satellite DNAs between selfishness and functionality: structure, genomics and evolution of tandem repeats in centromeric (hetero)chromatin. 
Gene 409:72–82 Price AL, Jones NC, Pevzner PA (2005) De novo identification of repeat families in large genomes. Bioinformatics 21(Suppl 1):i351-358 Ragupathy R, You FM, Cloutier S (2013) Arguments for standardizing transposable element annotation in plant genomes. Trends Plant Sci 18:367–376 Rebollo R, Romanish MT, Mager DL (2012) Transposable elements: an abundant and natural source of regulatory sequences for host genes. Annu Rev Genet 46:21–42 Rho M, Choi J-H, Kim S et al (2007) De novo identification of LTR retrotransposons in eukaryotic genomes. BMC Genomics 8:90 Sa R, Yi L, Siqin B et al (2021) Chromosome-level genome assembly and annotation of the fiber flax (Linum usitatissimum) genome. Front Genet 12:1665


Sampath P, Murukarthick J, Izzah NK et al (2014) Genome-wide comparative analysis of 20 miniature inverted-repeat transposable element families in Brassica rapa and B. oleracea. PLoS One 9:e94499 Sasaki T, International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome. Nature 436:793–800 Schmid CW, Deininger PL (1975) Sequence organization of the human genome. Cell 6:345–358 Schwichtenberg K, Wenke T, Zakrzewski F et al (2016) Diversification, evolution and methylation of short interspersed nuclear element families in sugar beet and related Amaranthaceae species. Plant J 85:229–244 Sperling AK, Li RW (2013) Repetitive sequences. In: Maloy S, Hughes K (eds) Brenner’s encyclopedia of genetics, 2nd edn. Academic Press, San Diego, pp 150–154 Steinbiss S, Willhoeft U, Gremme G et al (2009) Fine-grained annotation and classification of de novo predicted LTR retrotransposons. Nucleic Acids Res 37:7002–7013 Storer J, Hubley R, Rosen J et al (2021) The Dfam community resource of transposable element families, sequence models, and genome annotations. Mob DNA 12:2 Tempel S, Nicolas J, El Amrani A et al (2007) Model-based identification of Helitrons results in a new classification of their families in Arabidopsis thaliana. Gene 403:18–28 The Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796–815 Tørresen OK, Star B, Mier P et al (2019) Tandem repeats lead to sequence assembly errors and impose multilevel challenges for genome and protein databases. Nucleic Acids Res 47:10994–11006 Tu Z (2001) Eight novel families of miniature inverted repeat transposable elements in the African malaria mosquito, Anopheles gambiae. Proc Natl Acad Sci USA 98:1699–1704 Ugarković D, Plohl M (2002) Variation in satellite DNA profiles–causes and effects. EMBO J 21:5955–5959 Vaisey-Genser M, Morris DH (2003) Introduction: history of the cultivation and uses of flaxseed.
In: Muir AD, Westcott ND (eds) The genus Linum. CRC Press, London, pp 1–21 Vassetzky NS, Kramerov DA (2013) SINEBase: a database and tool for SINE analysis. Nucleic Acids Res 41:D83-89 Vergnaud G, Denoeud F (2000) Minisatellites: mutability and genome architecture. Genome Res 10:899–907 Vitte C, Panaud O (2005) LTR retrotransposons and flowering plant genome size: emergence of the increase/decrease model. Cytogenet Genome Res 110:91–107 Wang Z, Hobson N, Galindo L et al (2012) The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads. Plant J 72:461–473

Wei Q, Wang J, Wang W et al (2020) A high-quality chromosome-level genome assembly reveals genetics for important traits in eggplant. Hortic Res 7:153 Wenke T, Döbel T, Sörensen TR et al (2011) Targeted identification of short interspersed nuclear element families shows their widespread existence and extreme heterogeneity in plant genomes. Plant Cell 23:3117–3128 Wessler SR, Bureau TE, White SE (1995) LTR-retrotransposons and MITEs: important players in the evolution of plant genomes. Curr Opin Genet Dev 5:814–821 Wicker T, Gundlach H, Spannagl M et al (2018) Impact of transposable elements on genome structure and evolution in bread wheat. Genome Biol 19:103 Wicker T, Sabot F, Hua-Van A et al (2007) A unified classification system for eukaryotic transposable elements. Nat Rev Genet 8:973–982 Wright SI, Agrawal N, Bureau TE (2003) Effects of recombination rate and gene density on transposable element distributions in Arabidopsis thaliana. Genome Res 13:1897–1903 Wu CM, Lai WY, Wang CY (2016) Effects of surface modification on the mechanical properties of flax/β-polypropylene composites. Materials (Basel, Switzerland) 9:5 Xiong W, He L, Lai J et al (2014) HelitronScanner uncovers a large overlooked cache of Helitron transposons in many plant genomes. Proc Natl Acad Sci USA 111:10263–10268 Xu Z, Wang H (2007) LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res 35:W265–W268 Yan L, Helguera M, Kato K et al (2004) Allelic variation at the VRN-1 promoter region in polyploid wheat. Theor Appl Genet 109:1677–1686 Yang G (2013) MITE Digger, an efficient and accurate algorithm for genome wide discovery of miniature inverted repeat transposable elements. BMC Bioinform 14:186 Yang L, Bennetzen JL (2009) Distribution, diversity, evolution, and survival of Helitrons in the maize genome. Proc Natl Acad Sci USA 106:19922–19927 Ye C, Ji G, Liang C (2016) detectMITE: a novel approach to detect miniature inverted repeat transposable elements in genomes. Sci Rep 6:19688 You F, Cloutier S, Shan Y et al (2015) LTR Annotator: automated identification and annotation of ltr retrotransposons in plant genomes. Int J Bio Biochem Bioinform 5:165–174 You FM, Xiao J, Li P et al (2018) Chromosome-scale pseudomolecules refined by optical, physical and genetic maps in flax. Plant J 95:371–384 Zhang H, Liu D, Huang T et al (2020a) Three-dimensional printing of continuous flax fiber-reinforced thermoplastic composites by five-axis machine. Materials (Basel, Switzerland) 13:7 Zhang J, Qi Y, Wang L et al (2020b) Genomic comparison and population diversity analysis provide insights into the domestication and improvement of flax. iScience 23:100967 Zhu J, Zhu H, Njuguna J et al (2013) Recent development of flax fibres and their reinforced composites based on different polymeric matrices. Materials (Basel, Switzerland) 6:5171–5198 Zhu T, Wang L, Rimbert H et al (2021) Optical maps refine the bread wheat Triticum aestivum cv Chinese spring genome assembly. Plant J 107:303–314

3 Pale Flax (Linum bienne): an Underexplored Flax Wild Relative

Yong-Bi Fu

Y.-B. Fu (✉) Plant Gene Resources of Canada, Saskatoon Research and Development Centre, Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK S7N 0X2, Canada e-mail: [email protected]

3.1 Introduction

Flax (Linum usitatissimum L.) is one of the founding agricultural crops in the Near East (Zohary et al. 2012) and was cultivated for oil and fiber production (Pavelek et al. 2015; Hall et al. 2016). Fiber flax is bred for its long stem containing long fibers and is mainly grown in Russia, China, Egypt, and near the Northwestern European coast, whereas linseed is bred as a short and highly branched plant with more flowers for seed production and is mainly grown in Canada, China, USA, India, and Russia (Durrant 1976; Vromans et al. 2006). Modern flax breeding has faced many challenges in pursuing its overall objectives of developing flax cultivars with increased fiber or seed yields, better adaptation, and disease resistance for changing market needs (Green et al. 2008; Hall et al. 2016). Climate change is also expected to impact flax breeding and production (Kaur et al. 2017). The health-related properties of flax for human and animal nutrition will stimulate the search for new traits in flax germplasm collections and their incorporation into breeding schemes (Muir and Westcott 2003). Meeting these challenges requires accelerated access to diverse flax gene pools and an effective search for useful genetic variability (Brozynska et al. 2016; Fu 2019). Natural variation is the raw material for any crop improvement and constitutes a critical part of any long-term strategy to enhance the productivity, sustainability and resilience of crop varieties and agricultural systems (Godfray et al. 2010; Henry and Nevo 2014).

Pale flax (L. bienne) has long been known as the wild progenitor of cultivated flax (Heer 1872; Tammes 1928; Gill 1966; Fu et al. 2002a). It is also expected to harbor an important source of genetic variability for flax genetic improvement (Diederichsen and Hammer 1995; Uysal et al. 2012; Soto-Cerda et al. 2014). Here we conducted a literature review to update our knowledge on pale flax in taxonomy, biology, domestication, genetics, genomics, utilization, and conservation. It is clear that pale flax is an important wild relative of cultivated flax, yet little is known about it, and it remains underexplored in many aspects such as conservation, utilization, genetics and genomics. We hope that this review will stimulate more efforts to conserve and utilize pale flax.

© His Majesty the King in Right of Canada, as represented by the Minister of Agriculture and Agri-Food 2023 F. M. You and B. Fofana (eds.), The Flax Genome, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-16061-5_3


3.2 Taxonomy

Pale flax is the English common name of L. bienne. It is a flowering plant belonging to section Linum of the genus Linum L., which comprises about 200 species in the family Linaceae DC. ex Perleb (Winkler 1931; Maguilla et al. 2021). The accepted name is L. bienne Mill., Gard. dict. ed. 8 (1768) n. 8, and its synonyms include L. usitatissimum subsp. angustifolium (Huds.) Thell., Fl. adv. Montp. (1912) 361, L. ambiguum Jord., L. hohenhackeri Boiss., L. usitatissimum subsp. hispanicum Thell., L. dehiscens Vav. et Ell. subsp. angustifolium (Huds.) Vav. et Ell. and L. angustifolium Huds. (Hammer 2001). Oswald von Heer (1809–1883) proposed that pale flax should be treated as a subspecies of L. usitatissimum (Heer 1872), as he identified pale flax as the possible wild progenitor of cultivated flax (Diederichsen and Hammer 1995).

Pale flax can grow as a winter annual or perennial plant, mainly in wet places such as moist grassy areas, springs, seepage areas on rocky slopes, limestones, and marshy lands. It can also grow in relatively dry environments, particularly in northern latitudes (Rocio Perez-Barrales, personal communication) and on rocky slopes and limestones in the Balkans. Its stems are narrow and long (up to 60 cm tall), and its leaves are slender and long (1.5–2.5 cm). It has flowers with five petals about 1 cm long and nearly round. The flowers vary in color but are mainly light violet and are streaked with a darker color (Uysal et al. 2012). Pale flax normally flowers in spring and, at least in more temperate regions, through the summer. The flowers are homostylous (Ruiz-Martin et al. 2018). The capsules open spontaneously, and the seeds shatter. Its seed is generally smaller than that of cultivated flax (Diederichsen and Hammer 1995), and the plants need vernalization to induce flowering (Diederichsen 2019).

Jules Émile Planchon (1823–1888) presented the first detailed characterization of the differences between cultivated flax and pale flax (Planchon 1848). Figure 3.1 illustrates the major differences in the growth, boll openness and seed size between cultivated flax and pale flax. Together, these features could serve as the major keys to distinguish between pale flax and cultivated flax. Note that the capsules of L. usitatissimum convar. crepitans can open septicidally and loculicidally during ripening (Diederichsen and Richards 2003).

Pale flax is indigenous to the geographical territory bordering the Mediterranean Sea, Iran, and the Canary Islands (Diederichsen and Hammer 1995). More specifically, the Plants of the World online, Kew Science, shows its native ranges in the following countries and regions: Albania, Algeria, Baleares, Bulgaria, Croatia, Montenegro, Canary Islands, Corse, Cyprus, East Aegean Islands, France, Great Britain, Greece, Iran, Iraq, Ireland, Italy, Kriti, Krym, Lebanon-Syria, Libya, Madeira, Morocco, North Caucasus, Palestine, Portugal, Sardegna, Saudi Arabia, Sicilia, Spain, Transcaucasus, Tunisia, Turkey, Turkey-in-Europe, and Yugoslavia (Fig. 3.2A in olive color). Pale flax has been observed in the UK, but not beyond the Midlands (Robin Allaby, personal communication). These native ranges are consistent with those shown in Fig. 8 of Weiss and Zohary (2011).

Pale flax was also introduced into other geographic regions: Argentina Northeast, British Columbia, California, Chile Central, Chile North, Chile South, New Zealand North, Oregon, and Pennsylvania (Fig. 3.2A in purple) and South Africa (specifically the Western Cape Region; Rocio Perez-Barrales, personal communication). Its worldwide occurrences in the Global Biodiversity Information Facility database are shown in Fig. 3.2B, indicating the continuous expansion of its introduction to different parts of the world. The history of these introductions remains obscure (e.g., see van Kleunen et al. 2020) but is scientifically intriguing, as pale flax introduction may provide a model to investigate how a wild plant spreads across the globe under human influence. It is possible that these introductions were associated with the European expansion of flax cultivation in the seventeenth–nineteenth centuries (Warnes 1846) through European migration to new territories (Whitman 1888).


Fig. 3.1 Comparative illustration of the differences in the growth (top panel), boll openness (middle panel), and seed size (bottom panel) between cultivated flax (left) and pale flax (right). The middle right panel was provided by courtesy of Dr. Hüseyin Uysal, Adnan Menderes University, Turkey

3.3 Biology

Pale flax, like cultivated flax, is an inbreeding species with homostylous flowers. However, it also shows phylogenetic proximity to polystyly (Ruiz-Martin et al. 2018) and can be pollinated by various insects. No estimate of the outcrossing rate in natural populations of pale flax has been reported, and it would be interesting to know whether the rate is higher than the 6% reported by Robinson (1937) for fiber flax. No research on male sterility in pale flax was found. Like cultivated flax, pale flax is a diploid species with 2n = 30 chromosomes (Reynders 1926 as cited in Tammes 1928), but its genome size was estimated to be 573.30 Mbp (Pustahija et al. 2013). Cytologically, it is differentiated from cultivated flax by one chromosome translocation and from other Linum species that are wild relatives of cultivated flax by two translocations (Gill 1966). Little research has been done on the evolutionary ecology of pale flax (Landoni et al. 2022).

Fig. 3.2 Worldwide distribution and germplasm collections of pale flax. Panel A shows the geographic distribution of native (in olive color) and introduced (in purple) pale flax. Panel B plots the worldwide occurrence records of pale flax in the Global Biodiversity Information Facility (GBIF) database. Panel C displays the geographic locations of 100 accessions (in red dots) conserved at worldwide genebanks and 14 accessions (in blue dots) collected for four scientific publications. Panel A was adopted from the Plants of the World online, Kew Science (http://www.plantsoftheworldonline.org), and Panel B was adopted from the GBIF website (https://www.gbif.org/species/2873896)

Early studies showed that pale flax can be crossed not only with cultivated flax (Tammes 1928), but also with several other wild relatives, to produce fertile hybrids (e.g., Gill and Yermanos 1967; Seetharam 1972), although further verification of those crosses is needed (Rocio Perez-Barrales, personal communication). Biochemical studies of pale flax seeds revealed fatty acid profiles compatible with those reported in cultivated flax (Plessers 1966; Yermanos 1966; Yermanos et al. 1966; Rogers 1972; Uysal and Kurt 2014). Some genes conferring resistance to flax rust strains were identified in pale flax germplasm (Henry 1928; Islam and Mayo 1990). Pale flax displays larger variations in many quantitative and qualitative characters than cultivated flax (Diederichsen and Hammer 1995; Uysal et al. 2012). Favorable alleles in pale flax, with potentially positive effects on flax yield through yield components, were identified (Soto-Cerda et al. 2014).

These major biological characteristics show the potential value of pale flax as an important crop wild relative. Some flax wild relatives have been listed among additional prioritized U.S. crop wild relatives and wild utilized species for further actions in field collections for germplasm conservation (Khoury et al. 2013). However, the conservation status of pale flax has not yet been assessed in the databases of NatureServe (https://explorer.natureserve.org) and the International Union for Conservation of Nature (https://www.iucnredlist.org/). This largely reflects the lack of botanical and ecological investigations on pale flax and other flax wild relatives. As flax is not on the list of the Annex I crops of the International Treaty for Plant Genetic Resources for Food and Agriculture (ITPGRFA), its wild relatives were not covered under the 10-year crop wild relatives project of the Crop Trust (https://www.cwrdiversity.org/), further limiting the evaluation of pale flax vulnerability in natural habitats and affecting its conservation.

3.4 Domestication

Since the first report by Heer (1872) of the close relatedness of pale flax to cultivated flax, several lines of study have been pursued to confirm whether pale flax is the progenitor of cultivated flax. Crossing experiments showed that pale flax could be easily hybridized with cultivated flax in both directions to yield fertile hybrids (Tammes 1928; Gill 1966; Mhiret 2019). Some cytogenetic analyses further confirmed the close relationship between cultivated and pale flax (e.g., see Muravenko et al. 2010). A genetic analysis confirmed that cultivated flax had a genetically closer relationship to pale flax than to any other wild relatives (Fu et al. 2002a). Also, more than 90% of the microsatellite markers developed in cultivated flax were transferable to pale flax, 2–3 times higher than those to other wild relatives, indicating a high genomic similarity between the two species (Fu and Peterson 2010; Soto-Cerda et al. 2011; Landoni et al. 2020).

The domestication of pale flax probably occurred in the Near East some 10,000 years ago (Heer 1872; de Candolle 1886; Weiss and Zohary 2011; Zohary et al. 2012). The earliest archeological finds of pale flax come from Tell Abu Hureyra in Northern Syria 11,200–10,500 years ago (Hillman 1975). The first occurrence of cultivated forms of flax with increased seed sizes is evidenced in archeological records at Tell Ramad in Syria 9000 years ago (van Zeist and Bakker-Heeres 1975). Flax cultivation was evident in Egypt between 4500 and 4000 BC (Melelli et al. 2021), spread from the Near East to Europe and the Nile Valley (Bosi et al. 2011), and reached Switzerland around 3000 BC (Helbaek 1959). Flax fiber has been identified in prehistoric sites in Israel, Syria, and Georgia (van Zeist and Bakker-Heeres 1975; Kvavadze et al. 2009), suggesting its use by prehistoric hunter-gatherers. However, the finds of 30,000-year-old fiber fragments of wild flax from the Dzudzuana Cave, Georgia were contested (Bergfjord et al. 2010). Clear flax textiles (ca. 6500–6400 BC) were reported at Çatalhöyük, Turkey (Fuller et al. 2014).

The transition from pale flax to cultivated flax under domestication shows typical features of the domestication syndrome: non-dehiscent capsules and increased seed size (as in many grain crops; Fuller 2007; Karg et al. 2018), as well as selection for higher oil yield or longer stems with a high amount of long fibers. Genetic inferences about pale flax domestication revealed evidence for multiple flax domestication paths for oil-associated traits before selection for other domestication-associated traits of seed dispersal loss and fiber production (Allaby et al. 2005; Fu et al. 2012). Other genetic evidence for early flax domestication was obtained from the analysis of capsular dehiscence (Fu 2011). Based on population-based resequencing, an ancestral winter group of cultivated flax was also identified, offering new insight into the flax domestication process (Fu 2012). A recent genotyping-by-sequencing analysis revealed that pale flax contributed to the adaptation of cultivated flax in the European climate through post-domestication gene flow (Gutaker et al. 2019).

Taken together, however, we are still far from understanding the history and process of pale flax domestication. Many interesting questions are rooted in early thoughts of plant evolution (de Candolle 1886; Vavilov 1926; Elladi 1940; Sinskaja 1969) and largely remain open (Lay and Dybing 1989; Allaby et al. 2005; Gutaker et al. 2019). Where was pale flax first domesticated in the Near East? When was it domesticated for oil use and fiber production? When and how did domestication spread northward into Europe? Searching for genetic evidence for pale flax domestication requires the acquisition of more informative evolutionary signals accumulated in both the cultivated and pale flax gene pools, but a well-represented pale flax germplasm collection is still absent, as described below.

3.5 Genetics

Dr. Jantina Tammes (1871–1947) was an influential Dutch plant geneticist who devoted her time to investigating the inheritance of different characters in the genus Linum as early as 1907. In her influential work “The genetics of the genus Linum” (Tammes 1928), she summarized her long-term research findings on the inheritance of many flax traits such as seed color, length and width, and updated the genetic knowledge gained from the other scientists working on flax. In particular, she described her finding in 1911 of at least four alleles controlling seed length, reasoned from the frequency distribution of seed lengths for the parents and first (F1) and second (F2) generations of a hybrid cross of cultivated flax and pale flax. This finding was extremely significant at the time, as it showed that Mendelian inheritance could explain the inheritance of continuous traits through the multiple-factor hypothesis or, nowadays, quantitative genetics. Her early crossing experiments not only generated novel genetic knowledge of many flax traits, but also laid a foundation for later research for genetic improvement of quantitative traits such as oil content and disease resistance (Dillman 1936). More relevant to this review is the statement she made from many of her own experiments: “though differing in many characters, these two species are easily crossed in both directions and the hybrids are no less fertile than the parents.”

Many cytogenetic studies of the genus Linum were conducted before the 1970s, advancing our knowledge about the evolution of the genus Linum (Gill 1966, 1987), including pale flax. First, early cytogenetic analyses confirmed that pale flax has a haploid chromosome count of 15 (e.g., see Ray 1944). Second, pale flax differs from cultivated flax by one translocation and from L. africanum, L. corymbiferum and L. decumbens by two translocations (Gill and Yermanos 1967). Third, pale flax has crossability with cultivated flax and several other wild relatives such as L. decumbens (Gill and Yermanos 1967; Seetharam 1972). These findings, although likely confounded by species misidentification (e.g., see Diederichsen 2019), support the early proposal of pale flax as the wild progenitor of cultivated flax (Heer 1872).

Many genetic markers were developed over the last 20 years for genetic analyses of cultivated flax and pale flax. Those applied to pale flax include isozyme (Yurenkova et al. 2005), random amplified polymorphic DNA (RAPD) (Fu et al. 2002a; Muravenko et al. 2009), sequence-specific amplification polymorphism (SSAP) (Melnikova et al. 2014), single primer amplification reactions (SPAR) (Noormohammadi et al. 2017), inter-simple-sequence repeat (ISSR) (Uysal et al. 2010), inter-retrotransposon-amplified polymorphism (IRAP) (Smýkal et al. 2011; Mhiret and Heslop-Harrison 2018), and simple-sequence repeat (SSR) (Fu and Peterson 2010; Soto-Cerda et al. 2011; Habibollahi et al. 2016; Landoni et al. 2020) markers. Some efforts were also made to assess organellar genetic variations in pale flax (Fu and Allaby 2010; McDill and Simpson 2011; Fu et al. 2016).

Genetic markers were applied in many studies of the evolutionary relationships of the genus Linum (Fu et al. 2002a) and flax domestication (e.g., see Allaby et al. 2005). As summarized in the previous section, marker-based studies enhanced our understanding of pale flax domestication and diversity. Still, these studies were limited by the lack of well-represented germplasm collected from the extant pale flax gene pool. Genetic diversity analysis of pale flax was also conducted using ISSR markers (Uysal et al. 2010). It was found that pale flax in Turkey is genetically diverse, having similar levels of genetic variation as cultivated flax, and that significant spatial genetic autocorrelation existed in natural populations. Genetic distances among the assayed pale flax accessions were significantly associated with their geographic distributions and elevational differences among sites, suggesting their strong local adaptation (Uysal et al. 2010). Similar patterns of genetic variation in pale flax were also found with SSR markers (Soto-Cerda et al. 2014). A 5S RNA marker analysis of Turkish pale flax accessions revealed two major groups (Christopher Cullis, personal communication), suggesting the existence of different genetic backgrounds in the Turkish pale flax accessions.

However, many gaps remain in the genetic analyses of pale flax. No specific research was done to develop a genetic linkage map for pale flax and identify QTLs for important traits. Little effort was made to investigate its population genetics (Jhala et al. 2008; Gutaker et al. 2019) and evolutionary vulnerability in natural populations, and to study the evolution and functional diversification of its organelles (Fu et al. 2016).
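The multiple-factor reasoning credited to Tammes earlier in this section can be illustrated with a short calculation. The Python sketch below is an illustration only, not a reconstruction of her analysis: it assumes unlinked, biallelic loci with equal additive effects and no dominance or environmental variance, and the function name and the choice of four loci (echoing "at least four" in the text) are hypothetical.

```python
from math import comb


def f2_class_frequencies(n_loci):
    """Expected F2 frequencies of phenotype classes, assuming n_loci
    unlinked biallelic loci with equal additive effects (an assumption
    of this sketch). Class k carries k 'increasing' alleles out of 2n."""
    total_alleles = 2 * n_loci
    return [comb(total_alleles, k) / 4 ** n_loci
            for k in range(total_alleles + 1)]


# One locus gives the familiar discrete 1:2:1 ratio ...
print(f2_class_frequencies(1))

# ... whereas four loci already give nine classes whose frequencies
# follow a bell-shaped binomial curve, resembling a continuous trait.
for k, freq in enumerate(f2_class_frequencies(4)):
    print(k, round(freq, 4))
```

Under these assumptions the number of F2 classes grows as 2n + 1 with the number of loci n, which is why a handful of Mendelian factors can produce a seemingly continuous frequency distribution of seed length.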

3.6 Genomics

The last decade has also seen increasing research conducted to generate genomic resources for the genomic studies of pale flax. Most of the genomic resources mentioned below were deposited in the National Center for Biotechnology Information (NCBI) database. First, genotyping-by-sequencing technology via 454 pyrosequencing was applied to identify contigs and SNPs from five pale flax plants (Fu and Peterson 2012). This effort generated the first set of 228 pale flax genomic contigs and 481 SNPs. Similarly, a restriction site associated DNA sequencing (RAD-seq) approach was applied to assay 28 pale flax plants and identified 993 polymorphic RAD tags encompassing a total of 1686 SNPs for the inference of population genetic structure in pale flax (Gutaker et al. 2019). A multiplexed shotgun sequencing analysis of 18 Linum samples representing 16 species, including pale flax, revealed 6143 chloroplast, 2673 mitochondrial, and 19,562 nuclear SNPs for the genus Linum (Fu et al. 2016). Also, some transcriptomic resources were available for pale flax (Sveinsson et al. 2014).

Second, whole genome sequencing of a pale flax plant was recently completed, and a draft genome assembly was released in 2020 (Zhang et al. 2020). The genome assembly consisted of 2609 scaffolds (293.5 Mb) and 10,198 contigs (287.9 Mb), with scaffold and contig N50 lengths of 384 Kb and 59 Kb, respectively, and covered about 79% of the pale flax genome, assuming a genome size equivalent to the 373 Mb of cultivated flax (Wang et al. 2012). However, the total gap length is 5,635,035 bp. The genome annotation also identified 43,500 protein-coding genes, 2600–2800 non-coding RNAs, and 244,460 (about 109.4 Mb) repetitive sequences. Overall, this assembly is well suited for the genomic analysis of sequence variation, but not genome structure, of pale flax. Thus, more effort is still needed to improve the published genome assembly and coverage (You et al. 2018).

With limited genomic resources, informative genomic research of pale flax has lagged behind that of cultivated flax (Gutaker et al. 2019). However, Dr. Sylvie Cloutier and her colleagues conducted the first genomic analysis of 125 pale flax accessions (Soto-Cerda et al. 2014). More specifically, through molecular diversity and association analyses, they identified favorable alleles in pale flax with potentially positive effects to improve flax yield components. This effort was significant, as it represented the first genomic research to explore the potential of pale flax as a source of useful genetic variation for cultivated flax. Further genomic analyses of adaptive traits in pale flax would be more fruitful, as pale flax generally displayed larger variation in vegetative plant parts and growth habit than cultivated flax and more heterogeneity within accessions. A higher degree of variation within pale flax was observed in many generative parts, such as flower characteristics compared to capsule and seed traits (Uysal et al. 2012).
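The assembly statistics quoted in this section can be checked with simple arithmetic. The Python sketch below reproduces the roughly 79% coverage figure from the numbers given in the text and shows how an N50 value is conventionally computed. The function names and the toy scaffold lengths are hypothetical, introduced only for illustration, and do not come from Zhang et al. (2020).

```python
def assembly_coverage(assembled_mb, genome_size_mb):
    """Fraction of an assumed genome size represented in an assembly."""
    return assembled_mb / genome_size_mb


def n50(lengths):
    """N50: the length L such that sequences of length >= L together
    hold at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Figures from the text: 293.5 Mb of scaffolds against the 373 Mb
# genome size of cultivated flax assumed for pale flax.
print(f"coverage vs 373 Mb: {assembly_coverage(293.5, 373.0):.1%}")

# Scaffold total minus contig total approximates the gap length (in Mb),
# consistent with the 5,635,035 bp of gaps reported in the text.
print(f"scaffold - contig: {293.5 - 287.9:.1f} Mb")

# Assumption of this sketch: using the 573.30 Mbp estimate quoted in
# Sect. 3.3 as the denominator instead gives a lower figure.
print(f"coverage vs 573.3 Mb: {assembly_coverage(293.5, 573.3):.1%}")

# Toy scaffold lengths (hypothetical), only to show the N50 calculation.
print("toy N50:", n50([50, 40, 30, 20, 10]))
```

The coverage percentage depends entirely on the genome size chosen as the denominator, which is why the text states the 373 Mb assumption explicitly.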

3.7 Utilization

Pale flax grows in different types of habitats, including disturbed places, over a wide geographic distribution (Fig. 3.1A), but no report has assessed its potential for habitat restoration or other ecosystem services within its natural range. However, it has long been introduced to many parts of the world (Fig. 3.1A, B), where it is largely classified as a weed or invasive plant (Kraus et al. 2020). For example, it has been introduced into North America, where it is naturalized on the Pacific coast from Oregon to the central coast of California and in Pennsylvania (Flora of North America Editorial Committee 2016). Pale flax can also be grown as a garden plant (Huxley 1992).

As the progenitor of cultivated flax, pale flax contributed to the domestication of flax for oil and fiber uses 10,000 years ago, but there is no documentation that pale flax has been used directly as breeding material in modern flax breeding (Diederichsen 2007; Cullis 2011; Tork et al. 2019). However, a few studies are encouraging for modern flax breeding. Dr. Worku Negash Mhiret in Ethiopia performed hybridization experiments between L. bienne (PI522290) plants and Ethiopian linseed cultivars and found that selfed F2 hybrids showed the highest coefficients of variation for all assayed characteristics, including seed weight and fatty acid composition (Mhiret 2019). Another crossing effort was made between pale flax and cultivated flax in the flax breeding program at the University of Saskatchewan with the goal of identifying exotic genes conditioning dehiscence and pasmo resistance, and several advanced F6 lines with pasmo resistance were obtained (Helen Booker and Lester Young, personal communication). Uysal et al. (2012) phenotypically characterized 34 accessions of pale flax collected from Turkey in a greenhouse and revealed large variations in 12 quantitative and 7 qualitative characters covering vegetative and generative plant parts, including phenological traits. This finding is useful for guiding the field-based agronomic studies of pale flax germplasm that are currently missing (Diederichsen and Hammer 1995; Hall et al. 2016). Similarly, Uysal and Kurt (2014) investigated fatty acid composition in 34 accessions of pale flax germplasm, reported some interesting fatty acid profiles, and identified pale flax germplasm that could be explored in flax breeding to reduce the saturated fatty acid content of linseed. As mentioned above, Soto-Cerda et al. (2014) showed that pale flax is a potential source of novel variation to improve multiple traits in cultivated flax and that association mapping is a suitable approach to screening pale flax germplasm to identify favorable quantitative trait locus alleles. With the increased genomic resources in pale flax, exploring the pale flax gene pool will become more feasible and fruitful for modern flax breeding (Qi et al. 2022).

To explore the pale flax gene pool for its utilization in flax breeding, several additional lines of research would also be beneficial (Hall et al. 2016; Kaur et al. 2017). First, biochemical studies of pale flax seeds have been conducted on fatty acid composition (Plessers 1966; Yermanos 1966; Yermanos et al. 1966, 1969; Rogers 1972; Ranjzad et al. 2009; Uysal and Kurt 2014), tocopherol and plastochromanol (Velasco and Goffman 2000), protein (Sharifnia and Assadi 2003), and lignan (Schmidt et al. 2006, 2010). The three main components of flax seed that provide health benefits are fiber (both soluble and insoluble), alpha-linolenic acid/omega-3 fatty acid, and lignans (Kajla et al. 2015; Mohagheghzadeh et al. 2009; Parikh et al. 2019).


However, little research has been done on these nutrient components in pale flax (Muir and Westcott 2003; Green et al. 2008). Second, the genetic base of fiber flax cultivars is extremely narrow (Fu et al. 2002b; Goudenhooft et al. 2019; Duk et al. 2021), and widening the breeding gene pool for the genetic improvement of fiber flax is warranted for bast fiber production (Galinousky et al. 2020). The immediate sources for genetic introgression are pale flax and other flax wild relatives (Abbo et al. 2015; Fu 2019). However, studies of the physical, chemical, and molecular characteristics of bast fiber in pale flax and other wild relatives are scarce (Esau 1943; Deyholos 2006), hindering the utilization of pale flax germplasm in fiber flax breeding. Third, disease resistance in pale flax has been investigated since 1928 (Henry 1928; Madill et al. 1964; Misra 1966; Smith and Hobin 1973; Wicks and Hammond 1978), and some genes conferring resistance to rust strains have been identified (Islam and Mayo 1990). Given the diversity of flax diseases (Rashid 2003), more disease screening in the pale flax gene pool would increase the chance of acquiring new sources of resistance genes for modern flax breeding (Kaur et al. 2017). However, the challenge remains that the pale flax gene pool is not well represented in the existing collections of pale flax germplasm (Diederichsen 2007).

3.8 Conservation

The exact number of pale flax germplasm accessions conserved in genebanks and botanic gardens worldwide is unknown, but about 350–400 accessions were suggested by Diederichsen (2007) (Table 3.1). An effort was made to update the holdings of pale flax germplasm by directly contacting some genebanks and botanic gardens or searching related databases, with partial success. The largest original collection, 134 accessions, is currently conserved at the Leibniz-Institut für Pflanzengenetik und Kulturpflanzenforschung (IPK), Gatersleben, Germany, and includes collections from botanical gardens and from natural habitats in Europe. Plant Gene Resources of Canada has expanded its collection to 66 accessions (49 accessions with CN coding, whose seeds are available for distribution, and 17 accessions with TMP coding, which still need seed increase). These accessions include 36 accessions collected from Turkey (Uysal et al. 2010) and 16 accessions collected from the Balkan countries (Gutaker 2014). Our effort also identified four genebanks with pale flax accessions (Israel Plant Gene Bank with 13 accessions; Royal Botanic Gardens, Kew, with 11 accessions; Centre for Genetic Resources, the Netherlands, with four accessions; and Mediterranean Germplasm Database, Italy, with three accessions), along with 14 accessions reported in four scientific publications (Table 3.2). However, the counts of pale flax accessions in four genebanks reported by Diederichsen (2007) could not be verified for updating. Although flax is not among the ITPGRFA Annex I crops, the German, Canadian, and US national genebanks have decided to place all flax genetic resources in the Multilateral System for Access and Benefit-sharing of the ITPGRFA, and they are available under the ITPGRFA Standard Material Transfer Agreement. In other cases, special material transfer agreements may be required. One major issue with the updated counts of pale flax accessions is that considerable duplication exists among the genebank collections, so the number of unique pale flax accessions should be lower than the updated counts in Table 3.1. To understand the range of coverage of existing pale flax germplasm and to enhance germplasm utilization and research, we compiled a new, less duplicated list of 114 pale flax accessions with global positioning system (GPS) coordinates (Table 3.2). This was achieved by extracting accessions with described locations or GPS coordinates from the genebank collections and by searching the published literature for pale flax accession records with GPS coordinates.
Figure 3.2C shows the geographic origins of the 100 accessions conserved in genebanks (red dots) and the 14 accessions reported in scientific publications (blue dots). Overall, the existing germplasm collections have relatively poor coverage in the European and Mediterranean regions and large geographical gaps in the species distribution. No germplasm has been collected from northern Africa, Jordan, Iraq, Syria, and Iran. Also, no reports were found on the development of in situ conservation strategies to protect natural populations of pale flax. Thus, urgent actions are clearly needed to assess the species' vulnerability, particularly in those regions without any collected germplasm, and to develop effective in situ conservation strategies. Given the extent of the gaps in ex situ conserved germplasm, collective global action is also needed for long-term germplasm conservation: extant germplasm should be collected across the complete natural geographical range of the species before it is threatened or lost through habitat destruction and climate change (Khoury et al. 2020).

Table 3.1 Early report of Dr. Axel Diederichsen (2007) and current update on the counts of pale flax accessions conserved at worldwide genebanks and collected with GPS coordinates for scientific publications

| Source | Table 4 of Diederichsen (2007) | Update in 2021 |
|---|---|---|
| Genebank Information System of the IPK Gatersleben (https://gbis.ipk-gatersleben.de/gbis2i/faces/index.jsf) | 120 | 134 |
| N.I. Vavilov Research Institute of Plant Industry, Russia (http://db.vir.nw.ru/virdb/maindb) | 45 | 45* |
| U.S. National Plant Germplasm System (https://npgsweb.ars-grin.gov/gringlobal/search) | 14 | 14 |
| National Centre for Plant Genetic Resources, Poland (https://bankgenow.edu.pl/en/zamow-obiekty/) | 4 | 4* |
| All-Russian Flax Research Institute, Russia (http://vniil.narod.ru/index.html) | 64 | 64* |
| Plant Gene Resources of Canada (https://pgrc.agr.gc.ca/search_grinca-recherche_rirgc_e.html) | 13 | 66 |
| Suceava Genebank, Romania (https://www.svgenebank.ro/svgbdefault.asp) | 17 | 16 |
| The Centre for Plant Diversity, Hungary (http://www.nodik.org/english/) | 2 | 2* |
| Centre for Genetic Resources, the Netherlands (https://cgngenis.wur.nl/) | | 4 |
| Israel Plant Gene Bank (https://igb.agri.gov.il/web/index.php) | | 13 |
| Mediterranean Germplasm Database, Italy (https://ibbr.cnr.it/mgd/) | | 3 |
| Kew Gardens Millennium Seed Bank (https://www.kew.org/wakehurst/whats-at-wakehurst/millennium-seed-bank) | | 11 |
| Four scientific publications in Table 3.2 | | 14 |
| Total | 279 | 390 |

Note: The counts of pale flax accessions labeled with * could not be verified for updating in 2021
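The column totals reported in Table 3.1 can be cross-checked with a few lines of Python. The sketch below is only an illustration: the counts are transcribed from Table 3.1, the pairing of counts with sources follows our reading of that table, and the short source labels and the names `TABLE_3_1`, `column_total`, and `new_sources` are our own, not part of any genebank system.

```python
# Accession counts transcribed from Table 3.1 as (2007 count, 2021 count).
# None marks a source with no entry in the 2007 column.
# The short source labels are our own shorthand, not official genebank codes.
TABLE_3_1 = {
    "IPK, Germany": (120, 134),
    "VIR, Russia": (45, 45),
    "NPGS, USA": (14, 14),
    "Poland": (4, 4),
    "All-Russian Flax Research Institute": (64, 64),
    "PGRC, Canada": (13, 66),
    "Suceava, Romania": (17, 16),
    "Hungary": (2, 2),
    "CGN, Netherlands": (None, 4),
    "Israel Plant Gene Bank": (None, 13),
    "Mediterranean Germplasm Database, Italy": (None, 3),
    "Kew Millennium Seed Bank": (None, 11),
    "Four scientific publications": (None, 14),
}


def column_total(table, column):
    """Sum one column of the table, skipping sources with no entry."""
    return sum(row[column] for row in table.values() if row[column] is not None)


total_2007 = column_total(TABLE_3_1, 0)
total_2021 = column_total(TABLE_3_1, 1)

# Sources that appear only in the 2021 update.
new_sources = [name for name, (old, _) in TABLE_3_1.items() if old is None]

print(f"2007 total: {total_2007}")
print(f"2021 total: {total_2021}")
print(f"Sources added in 2021: {len(new_sources)}")
```

The two sums reproduce the Total row of Table 3.1 (279 and 390). Because considerable duplication exists among the genebank collections, these totals are upper bounds on the number of unique accessions rather than counts of distinct germplasm.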

3.9 Future Research

This literature review shows that pale flax is an important wild relative of flax but that little exploration has been made, particularly of its utilization and conservation. Actions are urgently needed to assess the species' vulnerability across its distribution range, to develop in situ conservation strategies, and to collect extant pale flax germplasm for long-term ex situ conservation before natural populations are threatened or lost. To enhance pale flax germplasm utilization, several lines of research would be beneficial: (1) biochemical studies on fiber, alpha-linolenic acid/omega-3 fatty acid, and lignans; (2) disease resistance investigations for the discovery and introgression of novel resistance genes into flax breeding; and (3) genomic explorations of pale flax genes of importance to agriculture and evolutionary inference. These collective efforts would conserve the important primary gene pool of cultivated flax and allow for an effective utilization of pale flax germplasm to enhance flax improvement and production.

Table 3.2 List of 114 pale flax accessions with GPS coordinates (latitude, longitude, and altitude) conserved in worldwide genebanks and collected for scientific publications

| Accession | Origin | Lat. | Long. | Alt. (m) | Source |
|---|---|---|---|---|---|
| 300,951 | ALB | 39.960 | 19.960 | 269 | Italy |
| 301,082 | ITA | 42.140 | 11.930 | 479 | Italy |
| 301,166 | FRA | 41.740 | 9.400 | 17 | Italy |
| IPK93457 | GRC | 35.253 | 23.688 | 200 | IPK |
| IPK89604 | PRT | 37.716 | −8.212 | NA | IPK |
| IPK88875 | ITA | 37.863 | 13.366 | NA | IPK |
| IPK95413 | ITA | 38.082 | 12.807 | NA | IPK |
| IPK69102 | ITA | 38.316 | 16.033 | 50 | IPK |
| IPK74537 | PRT | 38.767 | −9.383 | NA | IPK |
| IPK68749 | PRT | 38.899 | −9.189 | NA | IPK |
| IPK93724 | PRT | 38.901 | −9.184 | NA | IPK |
| IPK97852 | ESP | 39.000 | −9.000 | NA | IPK |
| IPK79786 | ITA | 39.077 | 16.560 | 1,250 | IPK |
| IPK97853 | ESP | 39.183 | −8.950 | NA | IPK |
| IPK93725 | PRT | 39.196 | −8.937 | NA | IPK |
| IPK79783 | ITA | 39.452 | 17.039 | NA | IPK |
| IPK69258 | PRT | 40.116 | −8.500 | NA | IPK |
| IPK89602 | PRT | 40.166 | −8.317 | NA | IPK |
| IPK69526 | ITA | 40.550 | 14.266 | NA | IPK |
| IPK74735 | PRT | 41.220 | −7.606 | NA | IPK |
| IPK35823 | ITA | 41.758 | 15.999 | 700 | IPK |
| IPK35828 | ESP | 41.833 | 2.816 | NA | IPK |
| IPK84340 | GEO | 42.081 | 42.524 | NA | IPK |
| IPK87732 | GEO | 43.154 | 40.339 | NA | IPK |
| IPK35826 | ITA | 43.373 | 11.313 | NA | IPK |
| IPK60433 | ITA | 43.448 | 11.171 | NA | IPK |
| IPK71449 | ITA | 43.716 | 10.383 | 5 | IPK |
| IPK93744 | ITA | 43.737 | 10.424 | 5 | IPK |
| IPK35858 | ITA | 44.200 | 8.300 | 300 | IPK |
| IPK60425 | ITA | 44.216 | 9.519 | NA | IPK |
| IPK35857 | ITA | 44.333 | 8.550 | 35 | IPK |
| IPK87725 | ITA | 46.066 | 11.133 | NA | IPK |
| IPK35841 | GBR | 50.392 | −4.883 | NA | IPK |
| 27,164 | ISR | 32.438 | 34.894 | 12 | ISR |
| 24,845 | ISR | 32.159 | 35.107 | 226 | ISR |
| 24,839 | ISR | 31.998 | 35.105 | 351 | ISR |
| 22,683 | ISR | 33.216 | 35.633 | 92 | ISR |
| 22,559 | ISR | 32.802 | 35.035 | 2 | ISR |
| 22,469 | ISR | 32.703 | 34.942 | 1 | ISR |
| 22,292 | ISR | 31.801 | 35.240 | 780 | ISR |
| 21,339 | ISR | 32.504 | 34.958 | 18 | ISR |
| 104,374 | ISR | 33.088 | 35.617 | 100 | ISR |
| 104,373 | ISR | 33.240 | 35.751 | 1 | ISR |
| 104,372 | ISR | 33.240 | 35.751 | 1 | ISR |
| 19,827 | ISR | 32.500 | 34.949 | 20 | ISR |
| 19,826 | ISR | 32.650 | 34.926 | 4 | ISR |
| 39,741 | GRC | 36.250 | 28.167 | 3 | Kew |
| 806,822 | GBR | 50.595 | −1.958 | 84 | Kew |
| 70,546 | GBR | 51.075 | −4.175 | 3 | Kew |
| 76,548 | GBR | 51.083 | −4.183 | 2 | Kew |
| 1,063,336 | GBR | 51.680 | −4.246 | 8 | Kew |
| 8176 | MNE | 42.188 | 18.971 | 700 | Kew |
| 555,360 | GEO | 42.798 | 41.492 | 492 | Kew |
| CN113623 | TUR | 36.920 | 31.020 | 62 | PGRC |
| CN113622 | TUR | 36.920 | 31.000 | 4 | PGRC |
| CN113621 | TUR | 37.030 | 27.370 | 25 | PGRC |
| CN113620 | TUR | 37.300 | 28.020 | 165 | PGRC |
| CN113619 | TUR | 37.350 | 27.720 | 50 | PGRC |
| CN113618 | TUR | 37.570 | 27.470 | 40 | PGRC |
| CN113610 | TUR | 37.820 | 29.650 | 419 | PGRC |
| CN113617 | TUR | 38.080 | 27.150 | 138 | PGRC |
| CN113616 | TUR | 38.100 | 27.150 | 180 | PGRC |
| CN113638 | TUR | 40.000 | 26.320 | 10 | PGRC |
| CN113637 | TUR | 40.200 | 28.430 | 12 | PGRC |
| CN113641 | TUR | 40.300 | 27.000 | 23 | PGRC |
| CN113636 | TUR | 40.430 | 29.930 | 190 | PGRC |
| CN113635 | TUR | 40.720 | 31.450 | 812 | PGRC |
| CN113642 | TUR | 40.750 | 39.550 | 713 | PGRC |
| CN113639 | TUR | 40.820 | 26.630 | 210 | PGRC |
| CN113634 | TUR | 40.920 | 32.020 | 447 | PGRC |
| CN113640 | TUR | 41.150 | 28.770 | 149 | PGRC |
| CN113632 | TUR | 41.180 | 31.930 | 297 | PGRC |
| CN113630 | TUR | 41.180 | 33.730 | 309 | PGRC |
| CN113601 | TUR | 41.200 | 36.120 | 460 | PGRC |
| CN113629 | TUR | 41.220 | 33.230 | 728 | PGRC |
| CN113628 | TUR | 41.230 | 32.180 | 635 | PGRC |
| CN113605 | TUR | 41.350 | 36.180 | 190 | PGRC |
| CN113604 | TUR | 41.350 | 36.170 | 205 | PGRC |
| CN113603 | TUR | 41.350 | 36.170 | 235 | PGRC |
| CN113602 | TUR | 41.350 | 35.880 | 540 | PGRC |
| CN113607 | TUR | 41.370 | 36.220 | 12 | PGRC |
| CN113606 | TUR | 41.380 | 36.180 | 5 | PGRC |
| CN113631 | TUR | 41.420 | 31.800 | 226 | PGRC |
| CN113626 | TUR | 41.580 | 35.330 | 46 | PGRC |
| CN113608 | TUR | 41.630 | 35.450 | 17 | PGRC |
| CN113633 | TUR | 41.830 | 31.820 | 228 | PGRC |
| CN113627 | TUR | 41.830 | 35.120 | 8 | PGRC |
| TMP24769 | GRC | 38.983 | 21.150 | 1 | PGRC |
| TMP24770 | GRC | 39.050 | 21.867 | 946 | PGRC |
| TMP24771 | GRC | 39.783 | 21.633 | 621 | PGRC |
| TMP24768 | GRC | 39.900 | 20.367 | 281 | PGRC |
| TMP24772 | GRC | 39.967 | 21.500 | 835 | PGRC |
| TMP24773 | GRC | 40.350 | 23.933 | 2 | PGRC |
| TMP24767 | ALB | 41.167 | 19.467 | 2 | PGRC |
| TMP24774 | BGR | 41.583 | 23.717 | 591 | PGRC |
| TMP24766 | MNE | 42.233 | 18.883 | 10 | PGRC |
| TMP24765 | HRV | 43.017 | 17.450 | 1 | PGRC |
| TMP24764 | HRV | 43.900 | 16.450 | 358 | PGRC |
| TMP24775 | HRV | 45.130 | 18.233 | 149 | PGRC |
| TMP24776 | HRV | 45.367 | 16.267 | 268 | PGRC |
| W082 | GRC | 40.583 | 23.783 | NA | Gutaker et al. (2019) |
| W066 | HRV | 43.467 | 16.833 | NA | Gutaker et al. (2019) |
| W096 | HRV | 45.417 | 15.367 | NA | Gutaker et al. (2019) |
| 2,011,128 HSBU | Iran | 37.080 | 49.650 | 113 | Habibollahi et al. (2016) |
| POP 6 | ESP | 37.940 | −5.710 | 529 | Landoni et al. (2020) |
| POP CGA1 | ITA | 38.220 | 13.320 | 53 | Landoni et al. (2020) |
| POP 11 | ESP | 38.330 | −3.580 | 710 | Landoni et al. (2020) |
| POP L01 | FRA | 43.260 | 6.240 | 200 | Landoni et al. (2020) |
| POP LLA | ESP | 43.410 | −4.690 | 26 | Landoni et al. (2020) |
| POP VIL | FRA | 45.090 | −1.050 | 21 | Landoni et al. (2020) |
| POP IOW2 | GBR | 50.680 | −1.070 | 9 | Landoni et al. (2020) |
| POP TYM | GBR | 53.300 | −3.550 | 5 | Landoni et al. (2020) |
| POP SUT | GBR | 53.350 | −0.960 | 15 | Landoni et al. (2020) |
| Sinop | TUR | 41.830 | 35.120 | 8 | Uysal and Kurt (2014) |

Note: Accession label or number is the same as shown in the genebank or publication; country of origin follows ISO 3166-1 alpha-3 country codes; altitude in meters has some NAs (not available); genebank information in the Source column follows Table 3.1, and the four scientific publications are listed in the reference section. Lat. latitude; Long. longitude; Alt. altitude

Acknowledgements

The author would like to thank Ms. Carolee Horbach for her assistance in the literature search; Mr. Gregory W. Peterson for his assistance in generating Fig. 3.2 and in the literature search; Mr. Dallas Kessler, PGRC, Saskatoon, Canada, Dr. Andreas Börner at IPK, Germany, and Dr. Christopher Cockel, Kew Gardens, UK, for their assistance in the acquisition of pale flax collection information; Drs. Hüseyin Uysal and Orhan Kurt, Turkey, for providing the pale flax picture and comments on the early version of the manuscript; Drs. Helen Booker and Lester Young for providing the information on their pale flax crossing effort; Drs. Axel Diederichsen, Robin Allaby, Rocío Pérez-Barrales and Rafal Gutaker for their helpful reading of the early version of the manuscript; and Drs. Frank You and Bourlaye Fofana for their editing. This work was supported by an A-Base research project of Agriculture and Agri-Food Canada to YBF.

References

Abbo S, Zezak I, Lev-Yadun S, Shamir O, Friedman T, Gopher A (2015) Harvesting wild flax in the Galilee, Israel and extracting fibers—bearing on Near Eastern plant domestication. Isr J Plant Sci 62:52–64
Allaby RG, Peterson GW, Merriwether DA, Fu YB (2005) Evidence of the domestication history of flax (Linum usitatissimum L.) from genetic diversity of the sad2 locus. Theor Appl Genet 112:58–65
Bergfjord C, Karg S, Rast-Eicher A, Nosch M-L, Mannering U, Allaby RG, Murphy BM, Holst B (2010) Comment on “30,000-year-old wild flax fibers.” Science 328:1634
Bosi G, Rinaldi R, Mazzanti MB (2011) Flax and weld: archaeobotanical records from Mutina (Emilia Romagna, Northern Italy), dated to the Imperial Age, first half 1st century AD. Veg Hist Archaeobot 20:543
Brozynska M, Furtado A, Henry RJ (2016) Genomics of crop wild relatives: expanding the gene pool for crop improvement. Plant Biotechnol J 14:1070–1085
Cullis C (2011) Linum. In: Kole C (ed) Wild crop relatives: genomic and breeding resources oilseeds. Springer, New York, pp 177–189
de Candolle A (1886) Origin of cultivated plants. D. Appleton and Company, New York, USA
Deyholos MK (2006) Bast fiber of flax (Linum usitatissimum L.): biological foundations of its ancient and modern uses. Isr J Plant Sci 54:273–280

Diederichsen A (2007) Ex situ collections of cultivated flax (Linum usitatissimum L.) and other species of the genus Linum L. Genet Resour Crop Evol 54:661–678
Diederichsen A (2019) A taxonomic view on genetic resources in the genus Linum L. for flax breeding. In: Cullis C (ed) Genetics and genomics of Linum. Springer, Cham, pp 1–15
Diederichsen A, Hammer K (1995) Variation of cultivated flax (Linum usitatissimum L. subsp. usitatissimum) and its wild progenitor pale flax (subsp. angustifolium (Huds.) Thell.). Genet Resour Crop Evol 42:263–272
Diederichsen A, Richards KW (2003) Cultivated flax and the genus Linum L.—taxonomy and germplasm conservation. In: Muir A, Westcott N (eds) Flax, the genus Linum. Taylor & Francis, London, UK, pp 22–54
Dillman AC (1936) Improvement in flax. United States of America department of agriculture, yearbook of agriculture 1936. United States of America Department of Agriculture, Washington, DC, pp 745–784
Duk M, Kanapin A, Rozhmina T, Bankin M, Surkova S, Samsonova A, Samsonova M (2021) The genetic landscape of fiber flax. Front Plant Sci 12:764612
Durrant A (1976) Flax and linseed. In: Simmonds NW (ed) Evolution of crop plants. Longman, London, pp 190–193
Elladi VN (1940) Linum usitatissimum (L.) Vav. Consp. Nov. – Len. (Russ.). In: Vul’f EV, Vavilov NI (eds) Kul’turnaja flora SSSR, fibre plants, Sel’chozgiz, Moskva, Leningrad, vol 5, Part 1, pp 109–207
Esau K (1943) Vascular differentiation in the vegetative shoot of Linum. III. The origin of bast fibers. Am J Bot 30:579–586
Flora of North America Editorial Committee (ed) (2016) Flora of North America, North of Mexico, vol 12. Oxford University Press, New York
Fu YB (2011) Genetic evidence for early flax domestication with capsular dehiscence. Genet Resour Crop Evol 58:1119–1128
Fu YB (2012) Population-based resequencing revealed an ancestral winter group of cultivated flax: implication for flax domestication processes. Ecol Evol 2:622–635
Fu YB (2019) A molecular view of flax gene pool. In: Cullis C (ed) Genetics and genomics of Linum. Springer, Cham, pp 17–37
Fu YB, Allaby RG (2010) Phylogenetic network of Linum species as revealed by non-coding chloroplast DNA sequences. Genet Resour Crop Evol 57:667–677
Fu YB, Diederichsen A, Allaby RG (2012) Locus-specific view of flax domestication history. Ecol Evol 2:139–152
Fu YB, Diederichsen A, Richards KW, Peterson G (2002a) Genetic diversity within a range of cultivars and landraces of flax (Linum usitatissimum L.) as revealed by RAPDs. Genet Resour Crop Evol 49:167–174
Fu YB, Dong Y, Yang M-H (2016) Multiplexed shotgun sequencing reveals congruent three-genome phylogenetic signals for four botanical sections of the flax genus Linum. Mol Phylogenet Evol 101:122–132


Fu YB, Peterson GW (2010) Characterization of expressed sequence tag-derived simple sequence repeat markers for 17 Linum species. Botany 88:537–543
Fu YB, Peterson G, Diederichsen A, Richards KW (2002b) RAPD analysis of genetic relationships of seven flax species in genus Linum L. Genet Resour Crop Evol 49:253–259
Fu YB, Peterson GW (2012) Developing genomic resources in two Linum species via 454 pyrosequencing and genomic reduction. Mol Ecol Resour 12:492–500
Fuller DQ (2007) Contrasting patterns in crop domestication and domestication rates: recent archaeological insights from the Old World. Ann Bot 100:903–924
Fuller DQ, Bogaard A, Charles M, Filipović D (2014) Botanical archive report 2013 in Çatalhöyük 2014 Archive Report. http://www.catalhoyuk.com/archive_reports/2014
Galinousky D, Mokshina N, Padvitski T, Ageeva M, Bogdan V, Kilchevsky A, Gorshkova T (2020) The toolbox for fiber flax breeding: a pipeline from gene expression to fiber quality. Front Genet 11:589881
Gill KS (1966) Evolutionary relationship among Linum species. Dissertation, University of California
Gill KS (1987) Linseed. Indian Council of Agricultural Research, New Delhi
Gill KS, Yermanos DM (1967) Cytogenetic studies on the genus Linum I. Hybrids among taxa with 15 as the haploid chromosome number. Crop Sci 7:623–627
Godfray HCJ, Beddington JR, Crute IR, Haddad L, Lawrence D, Muir JF, Pretty J, Robinson S, Thomas SM, Toulmin C (2010) Food security: the challenge of feeding 9 billion people. Science 327:812–818
Goudenhooft C, Bourmaud A, Baley C (2019) Flax (Linum usitatissimum L.) fibers for composite reinforcement: exploring the link between plant growth, cell walls development, and fiber properties. Front Plant Sci 10:411
Green AG, Chen Y, Singh SP, Dribnenki (2008) Flax. In: Chittaranjan K, Hall TC (eds) Compendium of transgenic crop plants: transgenic oilseed crops. Blackwell Publishing Ltd, Chichester, pp 200–226
Gutaker R (2014) The genetic variation of cultivated flax (Linum usitatissimum L.) and the role of its wild ancestor (Linum bienne Mill.) in its evolution. Dissertation, University of Warwick
Gutaker RM, Zaidem M, Fu YB, Diederichsen A, Smith O, Ware R, Allaby RG (2019) Flax latitudinal adaptation at LuTFL1 altered architecture and promoted fiber production. Sci Rep 9:976
Habibollahi H, Noormohammadi Z, Sheidai M, Farahani F (2016) SSR and EST-SSR-based population genetic structure of Linum L. (Linaceae) species in Iran. Genet Resour Crop Evol 63:1127–1138
Hammer (2001) Linaceae. In: Hanelt P, Institute of Plant Genetics and Crop Plant Research (eds) Mansfeld’s encyclopedia of agricultural and horticultural crops (except ornamentals), vol 2. Springer, Berlin Heidelberg, pp 1106–1108


Hall LM, Booker H, Siloto RMP, Jhala AJ, Weselake RJ (2016) Flax (Linum usitatissimum L.). In: McKeon TA, Hayes DG, Hildebrand DF, Weselake RJ (eds) Industrial Oil Crops, 1st edn. Academic Press, Amsterdam, pp 157–194
Heer O (1872) Über den Flachs und die Flachskultur im Altertum. Neujahrsblatt Der Naturforschenden Gesellschaft Zürich 74:1–26
Helbaek H (1959) Domestication of food plants in the Old World. Science 130:365–372
Henry AW (1928) Reaction of Linum species of various chromosome numbers to rust and powdery mildew. Sci Agr 8:460–461 (Abstr)
Henry RJ, Nevo E (2014) Exploring natural selection to guide breeding for agriculture. Plant Biotechnol 12:655–662
Hillman G (1975) The plant remains from Tell Abu Hureyra: a preliminary report. Proc Prehist Soc 41:70–73
Huxley A (1992) New royal horticultural society dictionary of gardening, vol. 3, p 93. Macmillan, UK
Islam MR, Mayo GME (1990) A compendium on host genes in flax conferring resistance to flax rust. Plant Breed 104:89–100
Jhala AJ, Hall LM, Hall JC (2008) Potential hybridization of flax with weedy and wild relatives: an avenue for movement of engineered genes? Crop Sci 48:825–840
Kajla P, Sharma A, Sood DR (2015) Flaxseed—a potential functional food source. J Food Sci Technol 52:1857–1871
Karg S, Diederichsen A, Jeppson S (2018) Discussing flax domestication in Europe using biometric measurements on recent and archaeological flax seeds – a pilot study. In: Małgorzata Siennicka S, Rahmstorf L, Ulanowska A (eds) First textiles: the beginnings of textile manufacture in Europe and the Mediterranean. Oxbow Books, Barnsley, pp 31–38
Kaur V, Yadav R, Wankhede DP (2017) Linseed (Linum usitatissimum L.) genetic resources for climate change intervention and its future breeding. J Appl Nat Sci 9:1112–1118
Khoury CK, Greene S, Wiersema J, Maxted N, Jarvis A, Struik PC (2013) An inventory of crop wild relatives of the United States. Crop Sci 53:1496–1508
Khoury CK, Carver D, Greene SL, Williams KA, Achicanoy HA, Schori M, León B, Wiersema JH, Frances A (2020) Crop wild relatives of the United States require urgent conservation action. Proc Natl Acad Sci 117:33351–33357
Kraus F, Daniel W, Wong LJ, Pagad S (2020) Linum bienne Mill. In: Kraus F, Daniel W, Wong LJ, Pagad S (2020). Global register of introduced and invasive species—United States of America (Contiguous). Version 1.4. Invasive Species Specialist Group ISSG. Checklist dataset https://doi.org/10.15468/ehzr9f. Accessed via GBIF.org on 2021-03-01
Kvavadze E, Bar-Yosef O, Belfer-Cohen A, Boaretto E, Jakeli N, Matskevich Z, Meshveliani T (2009) 30,000-year-old wild flax fibers. Science 325:1359
Landoni B, Viruel J, Gomez R, Allaby RG, Brennan AC, Pico FX, Perez-Barrales R (2020) Microsatellite marker development in the crop wild relative Linum bienne using genome skimming. Appl Plant Sci 8:e11349
Landoni B, Suárez-Montes P, Habeahan RHF, Brennan AC, Pérez-Barrales R (2022) Local climate and vernalization requirements explain the latitudinal patterns of flowering initiation in the crop wild relative Linum bienne. BioRxiv Preprint. https://doi.org/10.1101/2022.01.02.474722 (January 2, 2022)
Lay CL, Dybing CD (1989) Linseed. In: Robbelen G, Downey K, Ashri A (eds) Oil Crops of the World. McGraw-Hill, New York, pp 416–430
Maguilla E, Escudero M, Ruíz-Martín J, Arroyo J (2021) Origin and diversification of flax and their relationship with heterostyly across the range. J Biogeogr 48:1994–2007
Madill HD, Smith WE, Henry AW (1964) Inheritance of rust immunity in Linum angustifolium (Huds.). Can J Genet Cytol 6:467–471
Melelli A, Shah DU, Hapsari G, Cortopassi R, Durand S, Arnould O, Placet V, Benazeth D, Beaugrand J, Jamme F, Bourmaud A (2021) Lessons on textile history and fibre durability from a 4,000-year-old Egyptian flax yarn. Nat Plants 7:1200–1206
McDill J, Simpson BB (2011) Molecular phylogenetics of Linaceae with complete generic sampling and data from two plastid genes. Bot J Linn Soc 165:64–83
Melnikova NV, Kudryavtseva AV, Zelenin AV, Lakunina VA, Yurkevich OY, Speranskaya AS, Dmitriev AA, Krinitsina AA, Belenikin MS, Uroshlev LA, Snezhkina AV, Sadritdinova AF, Koroban NV, Amosova AV, Samatadze TE, Guzenko EV, Lemesh VA, Savilova AM, Rachinskaia OA, Kishlyan NV, Rozhmina TA, Bolsheva NL, Muravenko OV (2014) Retrotransposon-based molecular markers for analysis of genetic diversity within the genus Linum. Biomed Res Int 14, Article ID 231589
Mhiret WN (2019) Association and variation on boll and seed morphology among hybrids between linseed (Linum usitatissimum L.) and Linum bienne Mill. and their parents. Afr J Plant Sci 13:138–152
Mhiret WN, Heslop-Harrison JS (2018) Biodiversity in Ethiopian linseed (Linum usitatissimum L.): molecular characterization of landraces and some wild species. Genet Resour Crop Evol 65:1603–1614
Misra DP (1966) Genes conditioning resistance of Linum species to Indian races of linseed rust. Indian J Genet Pl Breed 26:63–72
Mohagheghzadeh A, Dehshahri S, Hemmati S (2009) Accumulation of lignans by in vitro cultures of three Linum species. Z Naturforsch C 64:73–76
Muir A, Westcott N (eds) (2003) Flax: the genus Linum. Taylor and Francis, London
Muravenko OV, Bolsheva NL, Yurkevich OY, Nosova IV, Rachinskaya OA, Samatadze TE, Zelenin AV (2010) Karyogenomics of species of the genus Linum L. Russ J Genet 46:1339–1342
Muravenko OV, Yurkevich OY, Bolsheva NL, Samatadze TE, Nosova IV, Zelenina DA, Volkov AA, Popov KV, Zelenin AV (2009) Comparison of genomes of eight species of sections Linum and Adenolinum from the genus Linum based on chromosome banding, molecular markers and RAPD analysis. Genetica 135:245–255
Noormohammadi Z, Sakhaee M, Sheidai M, Talebi SM (2017) Assessment of genetic variation in Linum L. using SPAR markers. Biologija 63:49–57
Parikh M, Maddaford TG, Austria JA, Aliani M, Netticadan T, Pierce GN (2019) Dietary flaxseed as a strategy for improving human health. Nutrients 11:1171
Pavelek M, Tejklová E, Bjelková M (2015) Flax and linseed. In: Cruz VMV, Dierig DA (eds) Industrial Crops. Springer, New York, pp 233–263
Planchon JE (1848) Sur la Famille des LINEES. London J Bot 7:165–168
Plessers AG (1966) The variation in fatty acid composition of the seed of Linum species. Can J Genet Cytol 8:328–335
Pustahija F, Brown SC, Bogunić F, Bašić N, Muratović E, Ollier S, Hidalgo O, Bourge M, Stevanović V, Siljak-Yakovlev S (2013) Small genomes dominate in plants growing on serpentine soils in West Balkans, an exhaustive study of 8 habitats covering 308 taxa. Plant Soil 373:427–453
Qi Y, Wang L, Li W, Xie Y, Zhao W, Dang Z, Li W, Zhao L, Zhang J (2022) Phenotypic analysis of Longya-10 × pale flax hybrid progeny and identification of candidate genes regulating prostrate/erect growth in flax plants. Front Plant Sci 13:1044415
Ranjzad M, Khayyami M, Asadi A (2009) Measuring and investigation of Omega 3 and 6 fatty acids in species of Linum ssp. J Med Plants 8:25–194
Rashid KY (2003) Principal diseases of flax. In: Muir AD, Westcott ND (eds) Flax: the genus Linum. Taylor and Francis Ltd., London, pp 92–123
Ray C (1944) Cytological studies on the flax genus (Linum). Amer J Bot 31:241–324
Robinson BB (1937) Natural cross-pollination studies in fibre flax. J Am Soc Agron 29:644–649
Rogers CM (1972) The taxonomic significance of the fatty acid content of seeds of Linum. Brittonia 24:415–419
Ruiz-Martín J, Santos-Gally R, Escudero M, Midgley JJ, Pérez-Barrales R, Arroyo J (2018) Style polymorphism in Linum (Linaceae): a case of Mediterranean parallel evolution? Plant Biol 20:100–111
Schmidt TJ, Hemmati S, Fuss E, Alfermann AW (2006) A combined HPLC-UV and HPLC-MS method for the identification of lignans and its application to the lignans of Linum usitatissimum L. and L. bienne Mill. Phytochem Anal 17:299–311
Schmidt TJ, Hemmati S, Klaes M, Konuklugil B, Mohagheghzadeh A, Ionkova I, Fuss E, Alfermann AW (2010) Lignans in flowering aerial parts of Linum species–chemodiversity in the light of systematics and phylogeny. Phytochemistry 71:1714–1728
Seetharam A (1972) Interspecific hybridization in Linum. Euphytica 21:489–495


Sharifnia F, Assadi M (2003) Seed protein analysis in relation to taxonomy of the Iranian Linum species. Iran J Bot 10:49–54
Sinskaja EN (1969) Istoričeskaja geografija kul’turnoj flory. [Historical geography of cultivated plants]. Kolos, Leningrad, p 480
Smith WE, Hobin B (1973) Rust-conditioning genes in wild flax (Linum angustifolium Huds.). Genetics 74:259 (Abstr.)
Smýkal P, Bačová-Kerteszová N, Kalendar R, Corander J, Schulman AH, Pavelek M (2011) Genetic diversity of cultivated flax (Linum usitatissimum L.) germplasm assessed by retrotransposon-based markers. Theor Appl Genet 122:1385–1397
Soto-Cerda BJ, Diederichsen A, Duguid S, Booker H, Rowland G, Cloutier S (2014) The potential of pale flax as a source of useful genetic variation for cultivated flax revealed through molecular diversity and association analyses. Mol Breed 34:2091–2107
Soto-Cerda BJ, Saavedra HU, Navarro CN, Ortega PM (2011) Characterization of novel genic SSR markers in Linum usitatissimum (L.) and their transferability across eleven Linum species. Electron J Biotechnol 14:4
Sveinsson S, McDill J, Wong GK, Li J, Li X, Deyholos MK, Cronk QC (2014) Phylogenetic pinpointing of a paleopolyploidy event within the flax genus (Linum) using transcriptomics. Ann Bot 113:753–761
Tammes T (1928) The genetics of the genus Linum. Bibliogr Genet 4:1–36
Tork DG, Anderson NO, Wyse DL, Betts KJ (2019) Domestication of perennial flax using an ideotype approach for oilseed, cut flower, and garden performance. Agronomy 9:707
Uysal H, Kurt O (2014) Determination of fatty acid composition of pale flax (Linum bienne Mill.) populations originated from Turkey. Anadolu Tarim Bilim Derg 29:121
Uysal H, Fu YB, Kurt O, Peterson GW, Diederichsen A, Kusters P (2010) Genetic diversity of cultivated flax (Linum usitatissimum L.) and its wild progenitor pale flax (Linum bienne Mill.) as revealed by ISSR markers. Genet Resour Crop Evol 57:1109–1119
Uysal H, Kurt O, Fu YB, Diederichsen A, Kusters P (2012) Variation in phenotypic characters of pale flax (Linum bienne Mill.) from Turkey. Genet Resour Crop Evol 59:19–23
van Kleunen M, Xu X, Yang Q, Maurel N, Zhang Z, Dawson W, Essl F, Kreft H, Pergl J, Pyšek P, Weigelt P, Moser D, Lenzner B, Fristoe TS (2020) Economic use of plants is key to their naturalization success. Nat Commun 11:3201
van Zeist W, Bakker-Heeres JAH (1975) Evidence for linseed cultivation before 6000 BC. J Archaeol Sci 2:215–219
Vavilov NI (1926) Studies on the origin of cultivated plants. Bull Appl Bot 16:139–248
Velasco L, Goffman FD (2000) Tocopherol, plastochromanol and fatty acid patterns in the genus Linum. Plant Syst Evol 221:77–88

53

Vromans J, van de Bilt E, Stam P, van Eck HJ (2006) The molecular genetic variation in the genus Linum. In: Molecular genetic studies in flax (Linum usitatissimum L.). Dissertation, Wageningen University, pp 41–59 Weiss E, Zohary D (2011) The Neolithic Southwest Asian founder crops: their biology and archaeobotany. Curr Anthropol 52:S237–S254 Wang Z, Hobson N, Galindo L, Zhu S, Shi D, McDill J, Yang L, Hawkins S, Neutelings G, Datla R, Lambert G, Galbraith DW, Grassa CJ, Geraldes A, Cronk QC, Cullis C, Dash PK, Kumar PA, Cloutier S, Sharpe AG, Wong G-KS, Wang J, Deyholos MK (2012) The genome of flax (Linum usitatissiumum) assembled de novo from short shotgun sequence reads. Plant J 72:461–473 Warnes J (1846) On the cultivation of flax: the fattening of cattle with native produce; box-feeding; and summer-grazing. W. Clowes and Son, Stamford Street, London Whitman EA (1888) Flax culture: an outline of the history and present condition of the flax industry in the United States and a consideration of the influence exerted on it by legistration. Rand Avery Company, Boston, p 102 Wicks ZW, Hammond JJ (1978) Screening of flax species for new sources of genes resistant to Melampsora lini (Ehrenb.) Lev. Crop Sci 18:7–10 Winkler H (1931) Linaceae, Trib. I. 3. LinoideaeEulineae. In: Engler A (ed) The natural plant families with their genera and most important species, in particular of used plants, 2nd edn. W. Engelmann, Leipzig, pp 111–120 Yermanos DM (1966) Variability in seed oil composition of 43 Linum species. J Am Oil Chem Soc 43:546–549 Yermanos DM, Beard BH, Gill KS, Anderson MP (1966) Fatty acid composition of seed oil of wild species of Linum. Agron J 58:30–32 Yermanos DM, Patil SH, Hemstreet S (1969) Temperature effects on the fatty acid composition of the seed oil of wild species of flax. 
Agron J 61:819–820 You FM, Xiao J, Li P, Yao Z, Jia G, He L, Zhu T, Luo MC, Wang X, Deyholos MK, Cloutier S (2018) Chromosome-scale pseudomolecules refined by optical, physical and genetic maps in flax. Plant J 95:371– 384 Yurenkova SI, Kubrak SV, Titok VV, Khotyljova LV (2005) Flax species polymorphism for isozyme and metabolic markers. Russ J Genet 41:256–261 Zhang J, Qi Y, Wang L, Wang L, Yan X, Dang Z, Li W, Zhao W, Pei X, Li X, Liu M (2020) Genomic comparison and population diversity analysis provide insights into the domestication and improvement of flax. iScience 23:100967 Zohary D, Hopf M, Weiss E (2012) Domestication of Plants in the Old World: the origin and spread of domesticated plants in Southwest Asia, Europe, and the Mediterranean Basin. Oxford University Press, New York

4 Flax Breeding

Mukhlesur Rahman and Ahasanul Hoque

4.1 Taxonomy, Origin, Domestication, and Use of Flax

The family Linaceae comprises 22 genera, of which Linum is well known for its large interspecific diversity. The genus Linum consists of about 200 species (Egorova 1996; McDill et al. 2009) and is divided into six sections: Linum, Dasylinum, Linastrum, Cathartolinum, Syllinum, and Cliococca (Ockendon and Walters 1968; Winkler 1931). The section Linum contains ornamental species such as Linum perenne L. and L. grandiflorum Desf., as well as the agronomically important species L. usitatissimum L. and its wild relative L. angustifolium Huds. Two species, L. angustifolium Huds. (Gill and Yermanos 1967; Seetharam 1972) and L. bienne Mill. (Lay and Dybing 1989), the latter also known as pale flax, are considered ancestors of flax because both have the same chromosome number as cultivated flax and are cross-compatible with it. A molecular study suggests that L. bienne is the sister species to L. usitatissimum (Vromans 2006), but some researchers considered L. angustifolium and L. bienne to be the same species (Tutin et al. 1968; Zohary and Hopf 2000). Another study, which included genome comparison with molecular markers, reported that L. bienne is a subspecies of L. usitatissimum and L. angustifolium (Muravenko et al. 2003). Dillman and Goar (1937) and Hall et al. (2016) reported that L. angustifolium and L. bienne are the wild progenitors of cultivated flax (L. usitatissimum).

Flax is thought to have originated in the Mediterranean region of Southern Europe, the Near East, or Central Asia (Tammes 1925; Helbaek 1959; Zeven 1982). Its domestication history has not been clearly delineated (Duk et al. 2021; Harlan 1965). However, oilseed flax is reported to have been domesticated during the Neolithic period in the Fertile Crescent of the Middle East and the Mediterranean basin. Vavilov (1926) reported that cultivated flax was bred from the wild parent plant L. bienne in the Indian subcontinent, Abyssinia, and the Mediterranean region, and that geographic isolation probably created the cultivated form, L. usitatissimum (Duk et al. 2021). Cultivated flax, L. usitatissimum (Linn.), is one of the first crops domesticated by humans, as early as 8000 years ago, for food, feed, oil, fiber, industrial, and medicinal uses (Allaby et al. 2005; Fu et al. 2012; van Zeist and Bakker-Heeres 1975; Zohary 1999). Phylogenetic evidence indicates that flax was first domesticated for oil (Allaby et al. 2005).

According to Balter (2009), humans used clothing made from animal skins about 70,000 years ago. Later, people learned to weave plant-based fiber. Kvavadze et al. (2009) reported that more than 1000 fibers from flax plants were discovered in Dzudzuana Cave in the Caucasus Mountains, Georgia; these had been used for making cords, weaving baskets, or sewing garments. Flax was grown by the Egyptians between 4500 and 4000 BC, in Switzerland at about 3000 BC, and a little later in England (Smith 1969). In 1875, European settlers grew pale flax (L. bienne) on the Canadian prairies.

Both cultivated flax (L. usitatissimum) and pale flax are homostylous, share the same chromosome number (n = 15), and hybridize easily with each other (Gill 1987; Tammes 1925; Hall et al. 2016). Cultivated flax (L. usitatissimum L.) is a diploid (2n = 2x = 30), self-pollinating, annual crop. Its genome size is about 370 Mb (Ragupathy et al. 2011). Flax has gained commercial importance because the whole plant is used: the stem for high-quality fiber, and the seed for its high content of omega-3 fatty acid, digestible proteins, and lignans (Omaha 2001). Omega eggs are obtained from flaxseed-fed hens in North America. Because flaxseed is high in dietary fiber, omega-3 fatty acid, and anti-carcinogenic lignans, its consumption by humans is increasing rapidly.

Flax is classified as flaxseed or linseed when used for oilseed and as fiber flax or flax (in Europe) when used for fiber (Vaisey-Genser and Diane 2003). These two types differ considerably in their morphological characteristics and growth habits. Oilseed flax plants are shorter, with more branches, and produce more seeds, whereas fiber flax plants are taller, with fewer branches and fewer seeds (Gill 1987; Diederichsen and Ulrich 2009). The best fiber is harvested from fiber flax plants during the flowering and seed set stages (before the ripening stage), whereas oilseed flax is harvested after the crop has ripened completely. Therefore, using the same crop for both oilseed and fiber is not a compatible option. Oilseed flax is used commercially as a high-quality drying oil in paints, varnishes, ink and coatings, putty, linoleum, and other industrial applications (Juita et al. 2012). It has a high capacity to be “loaded” with color for color printing.

M. Rahman (corresponding author) · A. Hoque
North Dakota State University, Fargo, USA

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 F. M. You and B. Fofana (eds.), The Flax Genome, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-16061-5_4

4.2 Flax Production in the World

The oilseed flax type is grown over a wider area of the world, whereas the fiber flax type is grown in the cool-temperate regions of Western Europe (Green et al. 2008). Both oilseed and fiber flax production declined worldwide from 1961 to 2020. Oilseed flax production declined by 19% between the 10-year averages of 1961–1970 and 2011–2020 (from 3.39 million tons to 2.75 million tons) (Fig. 4.1) (FAOSTAT 2022). Similarly, fiber flax production declined slightly (6%; from 745,361 tons to 700,087 tons) over the same period (Fig. 4.1) (FAOSTAT 2022).

Fig. 4.1 Oilseed and fiber flax production in the last 60 years in the world. The production decline is calculated based on the 10-year average of 1961–1970 to the 10-year average of 2011–2020. Source FAOSTAT (2022). [Chart titled "World Flax (Oilseed and Fibre) Production"; x-axis: Year; y-axis: Tonnes]

Europe is leading in production of both oilseed and fiber flax. The top ten oilseed flax
producing countries in the world are Kazakhstan (27.8%), the Russian Federation (21.6%), Canada (17.4%), China (11.6%), the USA (4.8%), India (4.6%), Ethiopia (2.8%), the United Kingdom (1.4%), and Ukraine (1.3%) (Fig. 4.2). France alone produces three-fourths (75.8%) of the world's fiber flax (Fig. 4.3).

Fig. 4.2 Major oilseed flax producing countries in the world. Source data averaged from 2016 to 2020, FAOSTAT (2022). [Bar chart titled "Oilseed Flax Production (Average 2011-2020)"; x-axis: Countries; y-axis: Production (%); bar labels: 27.8, 21.6, 17.4, 11.6, 4…, 4.6, 2.8, 1.6, 1.4, 1.3, 0.8, 0.3, 0.3]

Fig. 4.3 Major fiber flax producing countries in the world. Source data averaged from 2016 to 2020, FAOSTAT (2022). [Bar chart titled "Fibre Flax Production (Average 2016-2020)"; x-axis: Countries; y-axis: Production (%); bar labels: 75.8, 8.9, 4.7, 4.2, 1.7, 1.6, 1.2]
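The production declines quoted for the two decade averages can be recomputed directly from the tonnages given in the text. The following is a minimal sketch; the function name `percent_change` is an illustrative assumption, and the inputs are the rounded FAOSTAT figures as printed in this section.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from `old` to `new`, in percent (negative = decline)."""
    return (new - old) / old * 100.0


# Decade-average production in tonnes, 1961-1970 vs 2011-2020 (values from the text)
oilseed_change = percent_change(3.39e6, 2.75e6)
fiber_change = percent_change(745_361, 700_087)

print(f"oilseed flax production: {oilseed_change:.1f}%")
print(f"fiber flax production:   {fiber_change:.1f}%")
```

Rounded to whole percentages, these correspond to the 19% and 6% declines reported above.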

4.3 Flax Harvested Area in the World

Over the last sixty years, the flax harvested area has declined on a large scale all over the world. The world oilseed flax harvested area declined from 7.27 million hectares to 2.81 million hectares (a 61.3% reduction) between the 10-year average of 1961–1970 and the 10-year average of 2011–2020 (Fig. 4.4). An even greater reduction, 87.5% (from 1.85 million hectares to 0.23 million hectares), was observed in the fiber flax harvested area between the same two periods (Fig. 4.4) (FAOSTAT 2022).

Fig. 4.4 Oilseed and fiber flax harvested area (hectare) in the last 60 years in the world. The reduction was calculated based on the 10-year average 1961–1970 period to the 10-year average 2011–2020 period. Source FAOSTAT (2022). [Chart titled "World Flax (Oilseed and Fibre) Harvested Area"; series: Oilseed Flax Area (hectare), Fibre Flax Area (hectare); x-axis: Year; y-axis: Hectare]

4.4 Flax Seed Yield Per Unit Area

Although the harvested area of both oilseed and fiber flax fell drastically worldwide, intensive breeding effort has been devoted to the crop, which has prevented a large reduction in total world production. In fact, a dramatic increase (733%) in fiber flax yield per hectare was observed between the 10-year average of 1961–1970 and the 10-year average of 2011–2020 (Fig. 4.5). Similarly, a sharp increase (112%) in flaxseed yield per hectare was observed in oilseed flax between the same two periods (Fig. 4.5) (FAOSTAT 2022).

Fig. 4.5 Oilseed and fiber flax yield per hectare in the last 60 years in the world. Source FAOSTAT (2022). [Chart; x-axis: Year; y-axis: Yield (kg/ha)]

Fiber flax yield per unit area varied widely across the world. It was considerably higher in Belgium, the Netherlands, France, and China (4808–5490 kg/ha) than in other major fiber flax producing countries such as the United Kingdom, Chile, Belarus, the Russian Federation, Argentina, and Egypt (866–1443 kg/ha) (Fig. 4.6) (FAOSTAT 2022). Oilseed flax yield also varied among countries, although less than fiber flax yield. The United Kingdom (1868 kg/ha) had the highest oilseed flax yield, followed by France (1841 kg/ha), Canada (1477 kg/ha), the USA (1339 kg/ha), China (1273 kg/ha), Ethiopia (1053 kg/ha), and other countries (Fig. 4.7) (FAOSTAT 2022).
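The argument that yield gains offset the loss of harvested area rests on the identity production = area × yield. As a rough cross-check, the sketch below multiplies the area ratio by the yield ratio reported in the text and compares the result with the reported production change. The function names are illustrative assumptions, and the agreement can only be approximate: the inputs are rounded, and a decade average of annual yields is not the same as decade-average production divided by decade-average area.

```python
def ratio_from_percent(pct_change: float) -> float:
    """Convert a percent change (e.g. +112 or -61.3) to a multiplicative ratio."""
    return 1.0 + pct_change / 100.0


def implied_production_change(area_old: float, area_new: float,
                              yield_pct_increase: float) -> float:
    """Percent change in production implied by production = area x yield."""
    area_ratio = area_new / area_old
    yield_ratio = ratio_from_percent(yield_pct_increase)
    return (area_ratio * yield_ratio - 1.0) * 100.0


# Areas in million hectares and yield increases in percent, as quoted in the text
oilseed_implied = implied_production_change(7.27, 2.81, 112)
fiber_implied = implied_production_change(1.85, 0.23, 733)

print(f"oilseed flax: implied {oilseed_implied:+.1f}% (reported decline: 19%)")
print(f"fiber flax:   implied {fiber_implied:+.1f}% (reported decline: 6%)")
```

Under these assumptions the oilseed figure comes out close to the reported decline, while the fiber figure lands within about ten percentage points of it, which is consistent with total production having changed far less than harvested area.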

Fig. 4.6 Fiber yield per hectare of major fiber flax producing countries in the world. Source data averaged from 2011 to 2020, FAOSTAT (2022). [Bar chart titled "Fibre Flax Yield (Average 2011-2020)"; x-axis: Countries; y-axis: Yield (Kg/ha)]

Fig. 4.7 Seed yield per hectare of major oilseed flax producing countries in the world. Source data averaged from 2011 to 2020, FAOSTAT (2022). [Bar chart titled "Oilseed Flax Yield Potential (Average 2011-2020)"; x-axis: Countries; y-axis: Yield (Kg/ha); bar labels: 1868, 1841, 1477, 1339, 1273, 1053, 990, 826, 777, 510]

4.5 Genetic Diversity

Genetic diversity is one of the most important prerequisites for a sustainable breeding program in any crop. A more diverse germplasm gives the breeder better options when selecting parents for developing need-based cultivars. Therefore, it is important to understand the extent of genetic variation among flax germplasm by studying the genetic relationships among accessions that can be used in breeding programs to increase the genetic diversity of breeding lines (Ayad et al. 1997). Initially, the diversity of flax germplasm was assessed based on morphological parameters (Diederichsen 2001; Diederichsen and Raney 2006; Saeidi 2012). The use of biochemical markers such as isozyme markers was also encouraging (Månsby et al. 2000; Tyson et al. 1985). However, morphological diversity estimates often lead to false predictions because morphological characteristics depend on plant developmental stage and environmental conditions (van Beuningen and Busch 1997). Morphological characterization is also labor-intensive and time-consuming. Isozyme markers, in turn, are affected by plant developmental stage (Falkenhagen 1985; Kuhns and Fretz 1978) and allow only a limited number of loci to be scored (Eckert et al. 1981; Tobolski and Kemery 1992; Bretting and Widrlechner 1995).

The limitations of morphological and biochemical markers have led to the development of DNA-based markers, which are environment-independent and do not require previous pedigree information (Bohn et al. 1999). Different DNA marker techniques such as random amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), inter-simple sequence repeats (ISSR), simple sequence repeats (SSR), and inter-retrotransposon amplified polymorphism (IRAP) have been used to assess the genetic diversity of flax germplasm (El Sayed et al. 2018; Everaert et al. 2001; Fu et al. 2003; Kumar et al. 2018; Kumar Yadav et al. 2018; Mhiret and Heslop-Harrison 2018; Soto-Cerda et al. 2014). Single nucleotide polymorphisms (SNPs) are the most abundant markers in plant genomes. The availability of a high-quality reference genome has enabled researchers to exploit the advantages of SNP markers in analyzing the genetic diversity of flax (Hoque et al. 2020; Singh et al. 2019).
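To make the idea of SNP-based diversity assessment concrete, the sketch below computes two commonly used summaries from a small genotype matrix: per-SNP gene diversity (expected heterozygosity, 2p(1 − p)) and a pairwise allele-sharing distance between accessions. The genotype values, accession order, and function names are hypothetical and chosen only for illustration; they are not taken from the cited flax studies, which may have used different statistics.

```python
# Hypothetical genotypes: rows = accessions, columns = SNPs,
# coded as the number of alternate alleles (0, 1, or 2).
genotypes = [
    [0, 0, 2, 2],
    [0, 2, 2, 0],
    [2, 2, 0, 0],
    [0, 0, 2, 2],
]


def gene_diversity(matrix):
    """Expected heterozygosity 2p(1-p) for each SNP column."""
    n_accessions = len(matrix)
    diversity = []
    for j in range(len(matrix[0])):
        # Alternate-allele frequency: allele count / (2 alleles per diploid accession)
        p = sum(row[j] for row in matrix) / (2 * n_accessions)
        diversity.append(2 * p * (1 - p))
    return diversity


def allele_sharing_distance(a, b):
    """Mean absolute difference in allele counts, scaled to the range 0..1."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (2 * len(a))


print(gene_diversity(genotypes))
print(allele_sharing_distance(genotypes[0], genotypes[2]))
```

Accessions with a distance near 0 are nearly identical at the sampled markers, whereas pairs with larger distances are candidates for crosses intended to broaden the genetic base.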

4.6 Breeding Strategy

Flax is an autogamous (primarily self-pollinating) crop. However, about 5–10% cross-fertilization can occur (Friedt 1993; Nichterlein 2003). Cultivated flax has low genetic diversity compared to its wild relatives (Smykal et al. 2011). Many important agronomic traits in flax, such as plant height, branch number, boll number, days to flowering, days to maturity, and seed yield, are quantitative in nature with additive gene effects (Patel and Chopde 1981; Salej et al. 2007; Bhateria et al. 2006; Mohammadi et al. 2010). Breeding objectives in flax have focused on stabilizing yield across diverse environments, enhancing oil production and quality, improving omega-3 fatty acid content, discovering durable resistance to wilt and rust diseases (Kenaschuk and Rowland 1993; Mpofu and Rashid 2001), improving lodging resistance and other agronomic traits, and adapting crop phenology to regional climatic limitations. Selection of parents, specific breeding procedures, and genotype stabilization are critical steps in flax breeding programs. In the USA, flax is usually planted later than other crops; therefore, cultivars adapted to late seeding dates need to be developed. The breeding methods used in flax breeding programs include (1) mass selection, (2) pure-line selection, (3) the pedigree method, (4) bulk selection, (5) single-seed descent, including the development of multi-parent advanced generation inter-cross (MAGIC) and nested association mapping (NAM) populations, and (6) marker-assisted breeding.

4.6.1 Mass Selection Method

Mass selection is one of the oldest breeding methods for self-pollinated crop species and is still practiced in the agriculture of many developing countries. Farmers usually save the seeds from desirable plants for planting in the next season. The objective of mass selection is to increase the frequencies of genes for desirable traits so that the selected plants perform better than the base population. The general procedure is to select the desirable plants based on phenotypic performance, or to rogue out the off-types, and then bulk harvest the plot or row. The harvested seeds are bulked and sown in the next generation. This procedure may be repeated multiple times. Modern flax breeders use a slightly modified mass selection method in which the best plants are harvested separately and their progenies are grown and compared. In this way, the poor progenies are discarded and the better performing progenies are harvested. Quantitative traits with low heritability can be improved efficiently using this progeny selection method.
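One way to see why progeny selection helps for low-heritability traits is to compare heritability on a single-plant basis with heritability on a progeny-row mean basis. The sketch below uses the standard quantitative-genetics expressions under a simplifying assumption stated here and in the code: each selected plant is homozygous, so plants within its progeny row differ only through environmental effects. The variance values and function names are hypothetical.

```python
def single_plant_heritability(var_genetic: float, var_env: float) -> float:
    """Heritability when selection is based on one plant's phenotype."""
    return var_genetic / (var_genetic + var_env)


def progeny_mean_heritability(var_genetic: float, var_env: float,
                              plants_per_row: int) -> float:
    """Heritability of a progeny-row mean.

    Assumes the row is genetically uniform (progeny of a homozygous plant),
    so averaging over n plants divides only the environmental variance by n.
    """
    return var_genetic / (var_genetic + var_env / plants_per_row)


# Hypothetical variance components for a low-heritability trait
h2_plant = single_plant_heritability(1.0, 9.0)
h2_row = progeny_mean_heritability(1.0, 9.0, plants_per_row=20)

print(f"single plant: {h2_plant:.2f}, progeny-row mean: {h2_row:.2f}")
```

With these illustrative numbers, heritability rises from 0.10 for a single plant to about 0.69 for a 20-plant row mean, which is why comparing progenies is more reliable than selecting individual plants.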

4.6.2 Pure-Line Selection Method

Flax is a self-pollinating crop, and landraces are considered mixtures of pure lines. However, a small number of heterozygous individuals arise from the low frequency of natural cross-pollination. Pure lines can therefore be developed from a natural population by selecting desired single plants, followed by repeated cycles of selfing. Variation within a pure line is caused by environmental factors, not by genetic factors. In commercial flax breeding programs, pure lines are developed by hybridizing desired parents carrying traits of interest, followed by repeated cycles of selfing and selection from the segregating progeny (Culbertson 1954; Kenaschuk 1975).
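The effect of repeated selfing can be quantified: at any locus that differs between two homozygous parents, the expected proportion of heterozygotes halves with each selfed generation. The sketch below tabulates this for F1 through F8, assuming strict selfing with no selection and no outcrossing; the function name is an illustrative choice.

```python
def expected_heterozygosity(generation: int) -> float:
    """Expected heterozygous fraction in generation F_n under strict selfing.

    F1 from two homozygous parents is fully heterozygous at loci where the
    parents differ; each selfing generation halves that fraction.
    """
    if generation < 1:
        raise ValueError("generation must be 1 (F1) or later")
    return 0.5 ** (generation - 1)


for n in range(1, 9):
    het = expected_heterozygosity(n)
    print(f"F{n}: heterozygous {het:.4f}, homozygous {1 - het:.4f}")
```

By F6, about 97% of the initially segregating loci are expected to be homozygous, which is why lines at that stage are treated as largely fixed.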

4.6.3 Pedigree Method

The pedigree method is widely used for the genetic improvement of self-pollinated crops. It rests on two basic concepts: hybridization and pure-line selection. Desired plants are selected from a segregating population in each generation, and information on their ancestry is recorded. This record allows the breeder to trace parent-progeny relationships and to return to a parent or progeny of interest and use it again. Because flax is a self-pollinated crop, inbreeding is used to fix the genes after variability has been generated by crossing. The pedigree method is therefore the most commonly used method for cultivar development in many flax breeding programs (Friedt 1993; Bergmann and Friedt 1997).

Kenaschuk (1975) proposed a pedigree method for flax that can be used in any flax breeding program. In this method, parents are selected based on the breeding objectives, and simple or complex crosses are made. The F1 plants are advanced to the F2 generation. The F2 segregating populations are planted in the field in single-row plots, which provide the first opportunity to select the best plants from each row, and the F2-3 seeds are harvested separately. The pedigree information of each single plant is maintained in each generation. The progenies of F3 families are planted in single-row plots alongside commercial check cultivars. Progeny rows are selected based on the breeder’s impression and on comparison with the check cultivars. The selected single rows (F3-4) are cleaned of off-types and bulk harvested. This cycle of selection between sister lines and families can be repeated in the next generation. Again, the selected single rows (F4-5) are cleaned of off-types and bulk harvested for small plot field trials in the following year. The selected F5 families are evaluated in unreplicated small yield plots at multiple locations. Data on yield, quality, agronomic traits, disease resistance, and breeder’s impression are taken. Off-types are removed from the plots, which are then bulk harvested. The data are used to select the desirable lines for further evaluation. Superior F6 lines are further evaluated in replicated small yield plots at multiple locations to simulate commercial cultivation practice as closely as possible. Again, off-types are removed from the plots, and the pure lines (F6-7) are bulk harvested from the respective plots. Superior F7 lines are advanced into multilocation yield trials with commercial cultivars to evaluate the performance and adaptability of the lines. The best selected breeding lines (F8) may enter flax variety trials, depending on the country's regulations. The breeding lines can also be used for small-scale seed increase for the development of breeder’s seed. Usually, plant breeders evaluate a breeding line in at least 20 environments (4–5 years × 4–5 representative locations) before releasing it as a variety for commercial production.
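The record keeping that defines the pedigree method can be sketched as a simple linked data structure in which every selection points back to the plant or row it was derived from. The class, field names, and line identifiers below are hypothetical and only illustrate the idea of tracing a line back to its original cross; they do not represent any particular breeding database.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PedigreeRecord:
    line_id: str                                  # hypothetical identifier scheme
    generation: str                               # e.g. "F1", "F2", "F3"
    parent: Optional["PedigreeRecord"] = None     # record this line was selected from
    note: str = ""

    def ancestry(self) -> List[str]:
        """Return the chain of records from this line back to the original cross."""
        chain = []
        node: Optional["PedigreeRecord"] = self
        while node is not None:
            chain.append(f"{node.generation}:{node.line_id}")
            node = node.parent
        return chain


# Hypothetical cross and two rounds of single-plant selection
cross = PedigreeRecord("X001", "F1", note="ParentA x ParentB (placeholder names)")
f2_plant = PedigreeRecord("X001-12", "F2", parent=cross)
f3_row = PedigreeRecord("X001-12-3", "F3", parent=f2_plant)

print(f3_row.ancestry())
```

Keeping the parent link on every record is what lets a breeder return to an earlier plant or family when a sister line later proves superior.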

4.6.4 Bulk Method

In the bulk method, crosses are made between the selected parents. The F1 seeds are planted and harvested in bulk. The F2 and subsequent generations, up to the F6 generation, are also planted and harvested in bulk. At this stage, the number of F6 seeds is generally 40–50 thousand, and the seeds are space planted. The F6 plants are largely homozygous. Superior individual plants (usually 1000–5000) are harvested separately, and their progenies are planted in single- or multiple-row plots. Based on the breeder’s visual score and agronomic traits, the superior progenies are selected and bulk harvested. The selected progenies are further evaluated in a preliminary yield trial, advanced yield trial, regional yield trial, and variety trial, with standard varieties as checks, to select the best adapted lines for release as varieties.

4.6.5 Single-Seed Descent Method

Single-seed descent is a modified bulk breeding method. A single seed is randomly selected from each F2 plant, and these seeds are bulked to generate the F3 generation. Similarly, in the F3 and subsequent generations, a randomly selected single seed per plant is bulked to advance the population to the next generation. This process continues until the F5 or F6 generation, when the plants reach the desired level of homozygosity. The aim of this method is to increase homozygosity as quickly as possible without selection. In the F5 or F6 generation, a large number of single plants are selected and harvested separately. The progenies of the selected plants are planted in single- or multiple-row plots. At this stage, selection is made based on the desired characteristics. The selected progenies are further evaluated in a preliminary yield trial, advanced yield trial, regional yield trial, and variety trial, with standard varieties as checks, to select the best adapted lines for release as varieties.

In plant breeding programs, populations are usually developed from bi-parental crosses, three-way crosses (involving three parents), or double crosses (involving four parents) to increase genetic variation in the breeding population. According to Huang et al. (2009), bi-parental populations have only one chance for crossing over and have about 34 breakpoints per crossing generation, which limits the chance of recombination in the segregating population. To increase the chance of crossing over, the number of breakpoints, and the shuffling of diverse genomic regions in crossing populations, Cavanagh et al. (2008) proposed, for the first time, a MAGIC strategy in Arabidopsis. Under this strategy, the population is developed by inter-crossing multiple desired parental lines (usually 8 or 16 parents), followed by several generations of self-pollination using the single-seed descent method to generate recombinant inbred lines (RILs). This RIL population represents a mosaic of the genomes of the eight or sixteen founder lines. Since the method was described, MAGIC populations have been developed in many other crops, including rice (Bandillo et al. 2013), chickpea (Gaur et al. 2013), wheat (Mackay et al. 2014), barley (Sannemann et al. 2015), tomato (Pascual et al. 2015; Campanelli et al. 2019), and cowpea (Huynh et al. 2018).

Another approach to increase genetic variation and the chance of crossing over is to develop a NAM population. In a NAM population, multiple diverse parents (founder donor parents) are crossed with a single parent (common parent). Each cross is used to create an independent RIL population, and the RILs from the different crosses together constitute the NAM population. Yu et al. (2008) developed the first NAM population, in maize, by crossing 25 diverse founder parents with a common parent (B73); 5000 RILs were developed for this NAM population. NAM populations have since been developed in many other crops, including rapeseed (Li et al. 2016; Hu et al. 2018), soybean (Shivakumar et al. 2019; Diers et al. 2018), rice (Fragoso et al. 2017), wheat (Jordan et al. 2018), and barley (Chen et al. 2019). Because diverse parents with divergent traits are involved in creating MAGIC or NAM populations, many studies can be undertaken using these populations.
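The two crossing designs described above can be laid out schematically. The sketch below builds one possible crossing funnel for a MAGIC population, pairing founders round by round until a single multi-parent hybrid remains, and a NAM layout in which each founder is crossed to the common parent. The founder names, the equal family size, and the function names are assumptions made for illustration; real MAGIC designs typically use several funnels and additional inter-crossing.

```python
def funnel_crosses(founders):
    """Pair founders round by round until one multi-parent hybrid remains."""
    n = len(founders)
    if n < 2 or n & (n - 1) != 0:
        raise ValueError("this sketch expects 2, 4, 8, 16, ... founders")
    rounds = []
    current = list(founders)
    while len(current) > 1:
        # Cross neighbours: (1st x 2nd), (3rd x 4th), ...
        current = [f"({a} x {b})" for a, b in zip(current[0::2], current[1::2])]
        rounds.append(current)
    return rounds


def nam_families(common_parent, founders, rils_per_family):
    """One RIL family per founder, each derived from founder x common parent."""
    return {f"{common_parent} x {f}": rils_per_family for f in founders}


# Eight hypothetical MAGIC founders, labeled A-H
magic_rounds = funnel_crosses(list("ABCDEFGH"))
for i, crosses in enumerate(magic_rounds, start=1):
    print(f"round {i}: {crosses}")

# NAM layout with 25 founders and a common parent as in the text;
# an equal split of 5000 RILs over 25 families is an assumption
nam = nam_families("B73", [f"D{i}" for i in range(1, 26)], 5000 // 25)
print(len(nam), "families,", sum(nam.values()), "RILs in total")
```

An eight-founder funnel needs three rounds of crossing before selfing by single-seed descent begins, whereas the NAM design needs only one round but produces a separate family for every founder.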

4.6.6 Marker-Assisted Breeding

In a conventional plant breeding program, parents are selected, crosses are made, a considerable number of F2 plants are grown, and F2:3 plants are then grown in individual rows. At this point, breeders start phenotypic selection to identify the best plants or rows and harvest each selected plant or row in bulk, and the process is repeated until the F5 or F6 generation. Finally, a preliminary yield trial, advanced yield trial, regional yield trial, and variety trial, with standard varieties as checks, are conducted in sequence before the variety is released. The entire process requires nine to eleven years. The success of a breeding program that relies on these conventional methods depends solely on the accuracy of phenotypic selection. However, phenotyping traits with complex inheritance and low heritability is very difficult, time-consuming, and expensive. Phenotyping of certain traits is further complicated because their expression requires specific environmental conditions and developmental stages. Recent advances in genomics have empowered breeders to overcome the limitations of phenotypic selection (Lorenz et al. 2011).

As an alternative to phenotypic selection, marker-assisted selection (MAS), i.e., indirect selection for target traits using molecular markers, is being used in many breeding programs to enhance breeding efficiency and genetic gain (Xu and Crouch 2008). MAS requires the identification of marker-trait, i.e., marker-QTL, associations. Researchers use linkage mapping in bi-parental populations and genome-wide association studies (GWAS) to reveal marker-QTL-trait associations. The identified markers and QTLs have been widely used in MAS to improve monogenic or oligogenic traits in many major crops such as rice (KS et al. 2022; Nihad et al. 2022; Sun et al. 2022), wheat (Alsaleh et al. 2022; Soriano et al. 2022; Yadav et al. 2015), and maize (Hao et al. 2014; Yang et al. 2018; Yathish et al. 2022). To date, almost 330 QTLs and 490 quantitative trait nucleotides (QTNs) associated with different traits have been identified in flax. Among them, 313 QTLs for 31 quantitative traits have been well documented (You and Cloutier 2020). However, no report is available on the use of the identified QTLs and markers in MAS for the improvement of any specific flax trait.

In many crops, MAS has been successful for traits controlled by large-effect QTLs. However, improving quantitative traits controlled by multiple QTLs with minor effects is challenging. A multi-marker MAS system can be used to improve quantitative traits, but it is very difficult to identify and account for all the allele effects (Becker and Bernardo 1998; Bernardo and Woodbury 2020). For quantitative traits, the limitations of MAS can be overcome by considering all marker effects regardless of their significance. Based on this concept, genomic selection (GS), also known as genome-wide selection or genomic prediction, was proposed (Lorenz et al. 2011; Meuwissen et al. 2001). GS was first used successfully in animal breeding, where it was applied to dairy cattle (Schaeffer 2006). In the last decade, plant breeders started using GS as an alternative or supplement to phenotypic selection. GS was first successfully applied in maize breeding (Massman et al. 2013). Later, breeders applied GS in many major crops such as rice (Frouin et al. 2019; Jarquin et al. 2020; Monteverde et al. 2019), wheat (Ben-Sadoun et al. 2020; Lozada and Carter 2015; Sarinelli et al. 2019), and soybean (Howard and Jarquin 2019; Qin et al. 2019; Ravelombola et al. 2020).

Despite the wide application of GS in many crops, its use in flax breeding has not yet flourished, and only a few reports are available. You et al. (2016) first applied GS in flax breeding. The authors determined the prediction accuracy of three GS models and their efficiency relative to phenotypic selection, and showed the potential of GS for improving seed yield, oil content, and the content of different fatty acids. Pasmo resistance in flax is complex in nature and has low heritability. GS using previously identified QTLs and a new marker set showed a prediction accuracy > 0.9 (He et al. 2019), indicating that GS is highly effective for predicting pasmo resistance. Recently, Lan et al. (2020) applied GS to a set of 260 lines from bi-parental populations. They obtained higher prediction accuracy for seven traits using QTLs identified by single- and multi-locus models than using the whole genome-wide marker set. An advanced genomic tool, genomic cross-prediction, will play an important role in flax breeding by enabling the breeder to choose potential parents for crossing based on the estimated breeding values of the segregating individuals of crosses (You et al. 2022).
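The core idea of genomic selection, estimating all marker effects jointly and summing them to predict breeding values, can be illustrated with ridge regression, one commonly used model family. The sketch below is not the model used in any of the cited flax studies; the marker matrix, phenotypes, and shrinkage parameter are hypothetical, markers are coded −1/+1 for the two homozygous classes of inbred lines, and phenotypes are assumed to be expressed as deviations from the training mean. Prediction accuracy is taken here as the Pearson correlation between predicted and observed values in a validation set.

```python
def ridge_marker_effects(X, y, lam):
    """Solve (X'X + lam*I) b = X'y for marker effects b by Gauss-Jordan elimination."""
    m = len(X[0])
    # Augmented matrix [X'X + lam*I | X'y]
    A = []
    for j in range(m):
        row = [sum(r[j] * r[k] for r in X) + (lam if j == k else 0.0)
               for k in range(m)]
        row.append(sum(r[j] * yi for r, yi in zip(X, y)))
        A.append(row)
    for col in range(m):
        # Partial pivoting for numerical stability
        pivot = max(range(col, m), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for i in range(m):
            if i != col:
                factor = A[i][col]
                A[i] = [v - factor * p for v, p in zip(A[i], A[col])]
    return [A[j][m] for j in range(m)]


def predict(effects, lines):
    """Genomic estimated breeding value = sum of marker effect x marker code."""
    return [sum(b * g for b, g in zip(effects, line)) for line in lines]


def pearson(a, b):
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical training set: 4 inbred lines x 3 markers, centered phenotypes
X_train = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y_train = [2.6, 1.4, -0.4, -3.6]

# Hypothetical validation lines with observed (centered) phenotypes
X_valid = [[1, 1, -1], [1, -1, 1], [-1, 1, 1], [-1, -1, -1]]
y_valid = [3.5, 0.5, -1.5, -2.5]

effects = ridge_marker_effects(X_train, y_train, lam=1.0)
gebv = predict(effects, X_valid)
accuracy = pearson(gebv, y_valid)
print("marker effects:", [round(b, 3) for b in effects])
print("prediction accuracy:", round(accuracy, 3))
```

Because every marker contributes to the prediction regardless of whether its effect would be declared significant, this approach differs from MAS, where only markers linked to detected QTLs are used. Real applications involve thousands of markers, cross-validation, and a shrinkage parameter estimated from the data.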

4.7

Conclusion and Future Perspectives

Flax (Linum usitatissimum L., 2n = 2x = 30) is one of the oldest cultivated plants grown thousands of years for multiple purposes. Two types of flax are grown by the producers: oilseed flax and fiber flax. The oilseed flax is more economically important and is used for highquantity omega-3 fatty acid, high-quality drying oil in paints, ink, coatings, whole or ground seeds for baking breads and cookies, digestible proteins, and lignans. The fiber flax is grown for linen fabrics, rope, printed banknotes, rolling paper, etc. The cultivated flax species has been domesticated from the wild species Linum bienne, known as pale flax. Currently, the genetic diversity within the crop in a breeding program is

64

low. However, genetic diversity can be improved by using diversified parents domesticated in different parts of the world, which may result in greater genetic variability in the population and will have better phenotypic expression than it is in existing cultivars. The major objectives in the oilseed flax breeding program are to increase seed yield, seed oil, higher omega-3 fatty acid and to improve fiber yield and quality for fiber flax. However, both the oilseed and fiber flax should have improved agronomic traits and resistance to diseases, lodging, and abiotic stresses. Traditional plant breeding is an ancient activity started with domestication of plants with repeated selection practices started over 10,000 years ago. The common breeding methods for flax include mass selection, pure-line selection, pedigree, bulk population, single-seed descent, and backcrossing. With the advent of molecular marker technologies and practical use of markers in plant breeding programs, it has given new dimensions to reduce the time span of developing new and improved varieties for the growers. The conventional breeding programs strive to maximize genetic gain and rely on accurate phenotyping selection of breeding lines to maximize the selection accuracy. Currently, phenotyping is a time-consuming and an expensive process. With the invention of nextgeneration sequencing techniques, the genotyping costs have dramatically declined, while the phenotyping costs have increased due to higher cost of labor and land-use expenses. Genomic selection uses the information of genotyping data to predict breeding values for traits of interest and can be used more accurately to select desired breeding lines in crop improvement programs. 
The success of future flax breeding programs requires a visionary approach to program design that addresses sustainable production, increased genetic diversity, need-based breeding objectives, collaboration between the public and private sectors, and the acquisition of new technologies. Therefore, a combination of conventional breeding methods and molecular breeding strategies is needed to develop new varieties for growers.

M. Rahman and A. Hoque


5 QTL Mapping: Strategy, Progress, and Prospects in Flax

Frank M. You, Nadeem Khan, Hamna Shazadee, and Sylvie Cloutier

5.1 Introduction

Quantitative genetics refers to the study of quantitative traits and their inheritance using dedicated genetic designs and statistical methods (Falconer 1960). Quantitative traits targeted in plant breeding are mostly controlled by polygenes. Traditional quantitative genetics is limited to estimating the overall genetic effects or variances of the polygenes at the phenotypic level. With the availability of high-density, genome-wide molecular markers, polygenic loci can be precisely positioned on chromosomes, and their effects on phenotypes can be estimated using statistical methods. Such positioned polygenic loci are commonly called quantitative trait loci (QTLs). A QTL may be a genetically or physically defined region on a chromosome; a physically defined QTL can be as narrow as a single nucleotide, called a quantitative trait nucleotide (QTN). Over the last several decades, methods for QTL identification in plants have evolved significantly, starting with the first linkage map reporting morphological markers (seed color or pattern) and seed size in beans (Sax 1923). With the availability of large amounts of genomic resources and the development of computational tools, a wide range of statistical methods appropriate for complex quantitative traits and mapping populations has been developed.

There are two major strategies for QTL identification: linkage map-based QTL mapping (LM) and linkage disequilibrium (LD)-based association mapping, commonly known as genome-wide association study (GWAS). Genetic linkage mapping is based on a large bi-parental segregating population derived from parents that differ sufficiently in the traits of interest (Nordborg and Weigel 2008). Compared to the traditional LM approach, GWAS based on a diverse population is a powerful approach for finding alleles linked to traits and has the potential to provide higher mapping precision without the need to develop mapping populations (Flint-Garcia et al. 2005; Yu and Buckler 2006). However, for certain plant species with limited genetic diversity, GWAS is a less efficient approach than LM (Zhao et al. 2007). The two methods are complementary and can be used to cross-validate one another (Motte et al. 2014; Sonah et al. 2015; Chen et al. 2016). This chapter addresses both LM and GWAS and summarizes the strategies and major advances in QTL identification and candidate gene prediction in flax.

F. M. You (✉), N. Khan, H. Shazadee, S. Cloutier: Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, 960 Carling Avenue, Ottawa, ON K1A 0C6, Canada

© His Majesty the King in Right of Canada, as represented by the Minister of Agriculture and Agri-Food 2023 F. M. You and B. Fofana (eds.), The Flax Genome, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-16061-5_5


5.2 Linkage Map-Based QTL Mapping (LM)

Linkage map-based QTL mapping, or linkage mapping (LM), is a first step toward the identification of causal genes through the positioning of a QTL within an interval defined by flanking markers (Tsai et al. 2021). Using statistical models, LM can also estimate the additive, dominance, and epistatic effects of the QTLs (Gallais et al. 2001). LM includes five main steps: (1) development of a bi-parental mapping population, (2) development of a high-density molecular marker set, (3) construction of a linkage map, (4) evaluation of phenotypic traits in multiple environments (years and locations), and (5) statistical model-based QTL mapping for marker-trait associations between genotypic and phenotypic data, which can be followed by gene prediction analysis.
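To make the additive and dominance effects mentioned above concrete, the following minimal sketch estimates both at a single locus from genotype class means, using the classical definitions a = (mean of AA - mean of aa) / 2 and d = mean of Aa - midparent value. The function name, the data layout, and the trait values are hypothetical and purely illustrative; real QTL software estimates these effects within a full statistical model rather than from raw class means.

```python
from statistics import mean


def additive_dominance_effects(phenotypes_by_genotype):
    """Estimate additive (a) and dominance (d) effects at one locus.

    phenotypes_by_genotype maps the genotype classes "AA", "Aa", and "aa"
    to lists of trait values (a hypothetical layout for this sketch).
    """
    mean_AA = mean(phenotypes_by_genotype["AA"])
    mean_Aa = mean(phenotypes_by_genotype["Aa"])
    mean_aa = mean(phenotypes_by_genotype["aa"])

    # Midparent value: average of the two homozygous class means.
    midparent = (mean_AA + mean_aa) / 2
    # Additive effect: half the difference between the homozygotes.
    a = (mean_AA - mean_aa) / 2
    # Dominance effect: deviation of the heterozygote from the midparent.
    d = mean_Aa - midparent
    return a, d


# Invented trait values for an F2-like population at one marker.
example = {
    "AA": [12.0, 11.0, 13.0],
    "Aa": [11.0, 11.5, 10.5],
    "aa": [8.0, 7.0, 9.0],
}
a, d = additive_dominance_effects(example)
print(f"additive effect a = {a}, dominance effect d = {d}")
```

Note that d can only be estimated when the heterozygous class is present in the population, which is why the choice of population type matters for the effects that can be detected.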

5.2.1 Bi-Parental Populations

LM relies on segregating bi-parental populations, such as F2, recombinant inbred line (RIL), doubled haploid (DH), and backcross (BC) populations, to construct a linkage map using molecular markers (Price 2006). For QTL mapping, a sufficiently large population is needed. Mapping populations can be divided into two types: transitory populations, such as F2 populations, and permanent populations, such as BC, DH, and RIL populations (Sandhu et al. 2018). When constructing a RIL population, progenies from the F2 generation are self-pollinated for several generations (up to F6 or higher) via the single-seed descent method. RILs are highly homozygous, so they can be replicated and used to detect QTL effects across multiple environments. It is worth noting that the dominance effect of a QTL cannot be estimated from such a population because all loci are homozygous.
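The statement that RILs are highly homozygous follows from the fact that each generation of selfing halves the expected heterozygosity. As a minimal sketch, assuming strict self-pollination without selection and a locus that is heterozygous in the F1, the expected fraction of heterozygous individuals in generation F_n is (1/2)^(n-1). The function name below is hypothetical.

```python
def expected_heterozygosity(generation):
    """Expected heterozygous fraction at a locus in generation F_n.

    Assumes the F1 is heterozygous at the locus and that every later
    generation is produced by strict selfing without selection.
    """
    if generation < 1:
        raise ValueError("generation must be 1 (F1) or higher")
    return 0.5 ** (generation - 1)


# Residual heterozygosity from F2 to F8 under single-seed descent.
for n in range(2, 9):
    print(f"F{n}: {expected_heterozygosity(n):.4f}")
```

Under these assumptions, about 3% of loci are still expected to be heterozygous at F6, which is why selfing is often continued for a few more generations when near-complete homozygosity is required.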

5.2.2 Linkage Map Construction

Technically, the construction of a genetic linkage map usually follows these steps: (1) establish a proper bi-parental mapping population, (2) develop polymorphic markers from the mapping population and perform statistical tests on them to remove strongly distorted markers, (3) construct a linkage map using the high-quality markers, and (4) calculate the relative genetic distances between markers. In LM, the genetic map is a key component of the methodology. A high-quality genetic map with adequate marker density and resolution relies on the size of the segregating population, the number of genetic markers, the accuracy of the genotyping, and the methods used for ordering the markers within each linkage group (Tao et al. 2018).

Improvements in linkage map resolution have come from the many advances in marker technologies. Most early linkage maps of major plant species were constructed using combinations of dominant and co-dominant markers, such as amplified fragment length polymorphism (AFLP) (Semagn et al. 2006) and random amplified polymorphic DNA (RAPD) (Harushima et al. 1998). The low density and technological limitations of these genetic markers led to low-resolution maps (Sun et al. 2011). The low reproducibility, limited numbers, and high cost of these early-generation markers hindered their application in high-resolution mapping. With the development of high-throughput sequencing technologies, abundant co-dominant markers such as simple sequence repeat (SSR) (Kirungu et al. 2018) and single nucleotide polymorphism (SNP) markers (Palumbo et al. 2019) have emerged to replace the earlier generation markers. SNPs are by far the most widespread markers in genomes, making them ideal for large-scale genotyping and the development of high-density linkage maps (Lindblad-Toh et al. 2000). Advances in next-generation sequencing technologies have transformed genotyping. Derivatives of these technologies, such as the restriction site-associated DNA (RAD) technique (Miller et al. 2007), generate large numbers of molecular markers at reasonable cost. RAD has been used effectively in many plant species, including some with large genomes (Cui et al. 2018; Kajiya-Kanegae et al. 2020), thereby combining the benefits of easy library preparation, genome coverage, and platform flexibility. Genotyping-by-sequencing (GBS) is now the most commonly used method to generate large numbers of genome-wide molecular markers for QTL identification (You et al. 2018b; Wang et al. 2020).

Once the molecular markers are developed for a mapping population, they are tested for segregation distortion, i.e., genotype frequencies that deviate from the expected Mendelian segregation ratios. Distorted markers can reduce the statistical power of QTL identification (Liu et al. 2010) if not adequately accounted for. A chi-squared test is usually used to test segregation ratios (Wu et al. 2007). Two- and three-point analyses are usually used to determine marker order and to calculate recombination fractions between adjacent markers. For higher marker densities, maximum likelihood methods have been proposed to estimate the order of and recombination fractions between markers. Lastly, recombination fractions are converted to genetic distances in centimorgans (cM) using a map function. The Morgan, Haldane, and Kosambi map functions are the most commonly used. The Morgan map function, which equates map distance with recombination fraction, is applicable only to short distances, whereas the Haldane map function, which allows for multiple crossovers but assumes no interference, is suitable for longer distances. However, the Kosambi map function is usually preferred over the Morgan and Haldane functions for markers that are far apart because it accounts for crossover interference (Huehn 2011; Kivikoski et al. 2020).
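The marker screening and distance conversion steps described above can be sketched as follows. This is a minimal illustration that uses only the Python standard library, not the procedure of any particular mapping package: the function names are hypothetical, the genotype counts are invented, and the p-value formulas are the closed forms of the chi-squared distribution for one and two degrees of freedom (two or three genotype classes). The map functions follow their standard definitions, with the recombination fraction r in the range 0 <= r < 0.5 and distances returned in cM.

```python
import math


def chi_square_segregation(observed, expected_ratio):
    """Chi-squared goodness-of-fit test of genotype counts against a ratio.

    observed: genotype counts, e.g. [48, 52] for a RIL marker.
    expected_ratio: Mendelian ratio, e.g. [1, 1] (RIL) or [1, 2, 1] (F2).
    Returns the test statistic and its p-value.
    """
    if len(observed) != len(expected_ratio):
        raise ValueError("observed and expected_ratio differ in length")
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    stat = 0.0
    for obs, ratio in zip(observed, expected_ratio):
        expected = total * ratio / ratio_sum
        stat += (obs - expected) ** 2 / expected
    df = len(observed) - 1
    if df == 1:
        p_value = math.erfc(math.sqrt(stat / 2))  # exact for 1 df
    elif df == 2:
        p_value = math.exp(-stat / 2)  # exact for 2 df
    else:
        raise ValueError("this sketch handles 2 or 3 genotype classes only")
    return stat, p_value


def _check_r(r):
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")


def morgan_cM(r):
    """Morgan map function: distance equals the recombination fraction."""
    _check_r(r)
    return 100.0 * r


def haldane_cM(r):
    """Haldane map function: multiple crossovers, no interference."""
    _check_r(r)
    return -50.0 * math.log(1 - 2 * r)


def kosambi_cM(r):
    """Kosambi map function: accounts for crossover interference."""
    _check_r(r)
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


# A RIL marker with invented counts, tested against the expected 1:1 ratio.
stat, p_value = chi_square_segregation([70, 30], [1, 1])
print(f"chi-squared = {stat:.2f}, p = {p_value:.2e}")

# The three map functions diverge as the recombination fraction grows.
for r in (0.01, 0.10, 0.30):
    print(r, round(morgan_cM(r), 2), round(kosambi_cM(r), 2),
          round(haldane_cM(r), 2))
```

For small recombination fractions the three functions give nearly identical distances; as r increases, the Kosambi estimate falls between the Morgan and Haldane estimates, which reflects its intermediate assumption about interference.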

5.2.3 Statistical Models for Linkage Map-Based QTL Mapping

The statistical methods for QTL mapping of biparental populations and the corresponding software packages are well developed (Kulwal 2018). Interval mapping (IM) (Jansen 1994) and composite or inclusive composite interval mapping (CIM/ICIM) (Zeng 1994; Wang et al. 1999) detect additive, dominant, and epistatic QTLs. IM adopts maximum likelihood and a simple regression model for QTL positioning and additive effect estimation. When multiple QTLs exist in one region, the detected loci may deviate from their true positions as a consequence of interactions between the QTLs. To increase the accuracy and power of QTL detection, the CIM model, which combines multiple linear regression with IM, was proposed (Zeng 1994). CIM is a more stable and powerful method to detect QTLs across the whole genome because its linear model takes the background markers into consideration, but epistatic effects and interactions between environment and loci are not considered (Zeng 1994). In contrast, ICIM, which is based on a mixed linear model, provides functions to estimate QTL-by-environment interactions as well as epistatic effects (Wang et al. 1999).

Genome-wide composite interval mapping (GCIM) has recently been proposed to estimate both large- and small-effect QTLs (Zhang et al. 2020). GCIM is based on CIM and includes two steps. In the first step, the genome is scanned for putative QTLs with a single-locus random-effect mixed linear model of the kind used in GWAS; here, the background markers of CIM are replaced by the polygenic variance estimated as in GWAS. In the second step, the selected putative QTLs are integrated into a multi-QTL mixed linear model. The QTL effects are calculated using an empirical Bayes method, and a likelihood ratio test is used to declare true QTLs (Zhang et al. 2020). In addition, several older QTL mapping methods, such as penalized maximum likelihood mapping (Zhang and Xu 2005), multiple-interval mapping (Kao et al. 1999), and multi-marker analysis mapping (Broman and Speed 2002), are also available for the detection of minor-effect QTLs.

Many software tools have been developed to implement algorithms for genetic map construction and QTL detection. They include MAPMAKER/QTL (Lander et al. 1987), PLABQTL (Utz and Melchinger 1996), HAPPY (Mott et al. 2000), Map Manager (Manly et al. 2001), QTL Express (Seaton et al. 2002), QGene (Joehanes and Nelson 2008), QTLnetwork (Yang et al. 2008), R/qtl (Arends et al. 2010), mpMap (Huang and George 2011), WinQTL Cartographer (Wang et al. 2012a), DOQTL (Gatti et al. 2014), QTL IciMapping (Meng et al. 2015),


F. M. You et al.

and QTL.gCIMapping (Zhang et al. 2020). QTL IciMapping is recommended for its functions of consensus genetic map construction and QTL mapping for various bi-parental and nested association mapping (NAM) populations (Meng et al. 2015). QTL.gCIMapping (v3.2) is an R package recommended for GCIM (Zhang et al. 2020).

5.3

Association Mapping

Association mapping, also known as “LD mapping,” is a method for uncovering genetic associations between markers and traits of interest using historic LD to link phenotypes to genotypes. Association mapping seeks to detect specific functional genetic loci or alleles linked to phenotypic differences in a trait. Genome-wide high-density markers, such as SNPs, are a prerequisite for identifying these trait-associated functional variants, hence the commonly used term GWAS. GWAS has been widely applied in crop genetics, specifically for the identification of QTLs and their candidate genes associated with complex traits in many crops, including flax. The fast evolution of GWAS methods and their applications is attributed to the rapid development of high-throughput sequencing and low-cost genotyping technologies such as SNP chips and GBS, and to the development of powerful statistical models and diverse genetic populations.

5.3.1 Genetic Populations

Although advanced statistical models for GWAS can control population structure and the relatedness among individuals in a population, an ideal genetic population should have wide genetic variation for the traits of interest, little genetic relationship between individuals, and a large size. In practice, various types of genetic populations have been used for GWAS, such as germplasm collections, combinations of germplasm accessions and breeding lines (You et al. 2022), bi-parental (You et al. 2018b; Sumitomo et al. 2019), NAM (Kump et al. 2011), and multiparent advanced generation intercross (MAGIC) populations (Islam et al. 2016).

5.3.2 Statistical Models for Association Mapping

The development of statistical models for GWAS has accelerated its application in crop genetics studies. These models can be divided into two groups: single-locus and multi-locus models (Table 5.1). Single-locus models test the significance of marker-trait associations one marker at a time. These first-developed models include the general linear model (GLM) (Price et al. 2006) and the mixed linear model (MLM) (Yu et al. 2006) implemented in some popular software packages such as TASSEL (Bradbury et al. 2007) and rMVP (Liu et al. 2016). GLM controls population structure by fitting it as a fixed effect in the linear model, whereas MLM additionally fits kinship as a random effect to control the genetic background arising from the relatedness among individuals of the population. Theoretically, MLM corrects the inflation from small polygenic effects, effectively controlling the population stratification bias (Wen et al. 2018) and reducing false positives (Liu et al. 2016). Other single-locus models, such as Efficient Mixed Model Association eXpedited (EMMAX) (Kang et al. 2010), GEMMA (Zhou and Stephens 2012), and ECMLM (Li et al. 2014), have also been used to some extent (Table 5.1).

The development of multi-locus models that allow the identification of small-effect QTNs for complex traits constitutes a significant advance in crop GWAS (Zhang et al. 2019). By simultaneously testing multiple markers, multi-locus statistical methods increase the statistical power and computational efficiency and reduce Type I errors compared to single-locus methods (Wang et al. 2016a; Li et al. 2017; Ren et al. 2017; Tamba et al. 2017; Zhang et al. 2017; Wen et al. 2018) (Table 5.2). First-generation multi-locus models such as the multi-locus mixed model (MLMM) (Segura et al. 2012) became available several


Table 5.1 Single- and multi-locus statistical methods used for genome-wide association study (GWAS)

| Statistical model | Threshold for significant QTNs | Software | References |
|---|---|---|---|
| Single-locus models | | | |
| GLM | 0.05/m | rMVP v1.0.1 | Price (2006) |
| | | GAPIT v3.1.0 | Lipka et al. (2012); Wang and Zhang (2021) |
| | | TASSEL v5.0 | Bradbury et al. (2007) |
| MLM | 0.05/m | rMVP v1.0.1 | Yu and Buckler (2006) |
| | | TASSEL v5.0 | Bradbury et al. (2007) |
| CMLM | 0.05/m | GAPIT v3.1.0 | Lipka et al. (2012) |
| EMMA | 0.05/m | GAPIT v3.1.0 | Lipka et al. (2012) |
| GEMMA | 0.05/m | GEMMA v0.96 | Zhou and Stephens (2012) |
| EMMAX | 0.05/m | EMMAX v | |
| Multi-locus models | | | |
| mrMLM | LOD ≥ 3 | mrMLM v5.0.1 | Wang et al. (2016a) |
| FASTmrEMMA | LOD ≥ 3 | mrMLM v5.0.1 | Wen et al. (2018) |
| ISIS EM-BLASSO | LOD ≥ 3 | mrMLM v5.0.1 | Tamba et al. (2017) |
| pLARmEB | LOD ≥ 3 | mrMLM v5.0.1 | Ren et al. (2017) |
| pKWmEB | LOD ≥ 3 | mrMLM v5.0.1 | Ren et al. (2017) |
| FASTmrMLM | LOD ≥ 3 | mrMLM v5.0.1 | Tamba and Zhang (2018) |
| 3VmrMLM | LOD ≥ 3 | IIIVmrMLM v1.0 | Li et al. (2022a) |
| RTM-GWAS | 0.05 at the first step and 0.01 at the second step | RTM-GWAS | He et al. (2017) |
| FarmCPU | 0.05/m | rMVP v1.0.1 | Liu et al. (2016) |

0.05/m: p = 0.05 subjected to Bonferroni correction for multiple tests, where m is the number of markers; this is the default threshold. All are R packages with the exception of RTM-GWAS, which is a standalone software tool

years ago, but more advanced multi-locus models have been developed since, including mrMLM (Wang et al. 2016a; Li et al. 2017), FASTmrEMMA (Wen et al. 2018), FASTmrMLM (Tamba and Zhang 2018), pKWmEB, pLARmEB (Zhang et al. 2017), and ISIS EM-BLASSO (Tamba et al. 2017). These models are all implemented in the R package “mrMLM.” More recently, a three-variance-component multi-locus random-SNP-effect mixed linear model (IIIVmrMLM) has been developed to detect QTNs and estimate their additive and dominance effects, which makes it suitable for datasets containing many heterozygous marker genotypes (Li et al. 2022a); the other six multi-locus models in the mrMLM package are limited to estimating the overall genetic effects of QTNs. In addition, IIIVmrMLM expands the ability to identify QTN-by-environment interactions (QEIs) in multiple environments and QTN-by-QTN interactions (QQIs) using the same mixed-model framework. Monte Carlo simulation studies indicate that IIIVmrMLM correctly detects various types of loci and estimates their effects almost without bias, with high power and accuracy and a low false-positive rate (Li et al. 2022a). This new model is implemented in the R package IIIVmrMLM. Another multi-locus model, FarmCPU, implemented in the R package rMVP, is also widely used to identify relatively large-effect QTNs (Liu et al. 2016). These advanced statistical models accompanied


Table 5.2 Comparison of single-locus models and multi-locus GWAS models implemented in the R package mrMLM (Zhang et al. 2019) and the IIIVmrMLM package (Li et al. 2022a)

| Feature | Single-locus GWAS | Multi-locus GWAS |
|---|---|---|
| QTN detection power | Low | High |
| p-value threshold of significant QTNs | 0.05/m (m: the number of markers) or FDR-adjusted p value | 2 × 10−4 or LOD ≥ 3.0 |
| False-positive rate | Low (with Bonferroni correction) | Low (with LOD ≥ 3.0 or p ≤ 2 × 10−4) |
| Multiple test correction | Yes | No |
| Polygenic background control | Yes | Yes (at the first step) |
| Population structure control | Yes | Yes |
| SNP effect | Fixed | Random |
| No. of variance components | Two (polygenic background and residual variances) | Three (QTN, polygenic background, and residual variances at the first step) |
| Multi-locus genetic model | No | Yes (at the second step) |
| Run time | Fast (GLM, MLM, GEMMA, and EMMAX), slow (EMMA) | Fast (Models 2 and 6), slow (Models 5 and 7), moderate (other models) |

Multi-locus models implemented in the R package mrMLM include (1) mrMLM, (2) FASTmrMLM, (3) FASTmrEMMA, (4) pLARmEB, (5) pKWmEB, (6) ISIS EM-BLASSO, and (7) IIIVmrMLM. Source: modified from Zhang et al. (2019)

by convenient software tools strengthen the basis of association mapping methodology and accelerate the identification of QTLs in crops.

Alternatively, GWAS models can also be classified into single-marker- and haplotype-based methods. All of the above-mentioned statistical models belong to the single-marker-based models. Haplotype-based GWAS has the potential to substantially improve the amount of phenotypic variance explained compared to individual SNPs (Barendse 2011). RTM-GWAS, implemented in standalone software with Windows and Linux versions (https://github.com/njau-sri/rtm-gwas), is a typical haplotype-based multi-locus GWAS method (He et al. 2017). RTM-GWAS first groups neighboring SNPs in strong LD into LD blocks called SNPLDBs, which are used to define bi- or multi-allelic haplotypes, followed by a two-step association analysis for QTL mapping, as with other multi-locus models (He et al. 2017).

These single- and multi-locus models have been evaluated in flax (He et al. 2019b; Lan et al. 2020; Sertse et al. 2021; You et al. 2022), wheat (Fatima et al. 2020), and many other crops (Fang et al. 2020; Zhong et al. 2021). Overall, single-locus models identified mostly large-effect QTNs, while multi-locus models detected both large- and small-effect QTNs. Some QTNs were detected by multiple models, but the different models yielded somewhat different sets of QTNs for the same traits, perhaps as an indication of their uniqueness and complementarity (He et al. 2019b). Thus, the combined use of single- and multi-locus models is recommended to identify the most reliable QTN-trait associations and to counteract the limitations of single-locus models (Zhang et al. 2019; Lan et al. 2020).

5.3.3 Selection of Threshold for Significant Marker-Trait Associations

Significant marker-trait associations are determined by a threshold value in GWAS. Selecting an appropriate threshold is a critical issue in applying statistical methods for GWAS because it determines how well true positives are distinguished from false positives and false negatives (Zhang et al. 2019). In human genetics, a p value of 5 × 10−8 has been used as a ‘genome-wide significance’ standard, but for crop genetics, such a p value is considered too stringent because of the large experimental errors of trait phenotyping.

The Bonferroni correction is commonly applied to reduce the false-positive rate in single-locus models (Table 5.1). It is implemented by the formula p/m, where p is a given probability level, e.g., 0.05, and m is the number of SNPs used in the GWAS. The Bonferroni correction is considered the most conservative method because it assumes that every genetic variant tested is independent of the rest. Modified Bonferroni correction methods have been proposed and assessed (Zhang et al. 2019). For example, the number of markers in the Bonferroni correction formula can be replaced by the effective number of markers (me) (Wang et al. 2016b). Despite its high stringency, the Bonferroni correction remains a popular method for single-locus GWAS models.

The false discovery rate (FDR) adjusted p value, also called the Benjamini–Hochberg correction (Benjamini and Hochberg 1995), is another commonly used significance threshold for crop GWAS. It was adopted in our previous QTL mapping studies of flax flowering time (Soto-Cerda et al. 2021) and wheat leaf rust resistance (Fatima et al. 2020). FDR controls the expected proportion of false positives among the rejected null hypotheses and is less stringent than the Bonferroni correction (Kaler and Purcell 2019). FDR is defined as the ratio of the number of false positives to the total number of positive test results. Whereas a p value threshold of 0.05 implies that 5% of all tests of true null hypotheses will result in false positives, an FDR-adjusted p value (also called q-value) threshold of 0.05 indicates that about 5% of the significant tests are expected to be false positives.

In practice, the p values of SNPs reported by the GWAS software tools can be directly transformed to FDR-adjusted p values using the Benjamini–Hochberg correction of multiple tests (Benjamini and Hochberg 1995) with the following steps: (1) sort the p values of all markers (m) calculated in the GWAS software in ascending order; (2) assign ranks from 1 to m based on the sorted order; (3) calculate the adjusted p value for the ith marker using the formula p_i × m/rank_i; and (4) declare as significant the SNPs that have FDR-adjusted p values less than the specified threshold, e.g., p < 0.05 or p < 0.01. Some statistical software tools are available for calculating the FDR-adjusted p value, such as the R function p.adjust. Of note, both Bonferroni and Benjamini–Hochberg corrections assume independence of hypotheses, and strong LD in the SNP data may decrease the statistical power and generate an excess of false negatives (Buzdugan et al. 2016).

For multi-locus models, the correction for multiple tests is theoretically not necessary because all markers are fitted to a single linear model and their effects are estimated and tested simultaneously (Zhang et al. 2019). A less stringent (higher) p value threshold may result in higher detection power and a greater number of QTNs, but also in more false positives. To keep an optimal balance between the detection power and the false-positive rate, a LOD score of 3.0 or p = 0.0002 is suggested by the developers of the mrMLM package as the threshold for significant marker-trait associations (Zhang et al. 2019). These default criteria have been used in many flax GWAS (He et al. 2019b; Soto-Cerda et al. 2021; You et al. 2022). The haplotype-based multi-locus model RTM-GWAS applies the default p = 0.05 threshold for preselection of markers in the single-locus model step and p = 0.01 in the multi-locus and multi-allele model step for significant QTNs (He et al. 2017).
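The threshold calculations described in this section can be sketched in Python as follows. This is an illustrative example rather than the code of any cited package, and all function names are hypothetical. Two assumptions go beyond the text: the Benjamini–Hochberg function also enforces monotonicity of the adjusted values and caps them at 1, as the R function p.adjust does with method "BH", and the LOD conversion assumes that the likelihood ratio statistic follows a chi-squared distribution with one degree of freedom.

```python
import math


def bonferroni_threshold(alpha, m):
    """Per-marker significance threshold alpha/m for m markers."""
    return alpha / m


def bh_adjust(pvalues):
    """Benjamini-Hochberg FDR-adjusted p values, returned in the input order."""
    m = len(pvalues)
    # Steps 1 and 2: sort ascending; position in `order` gives the rank
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # caps adjusted values at 1
    # Step 3: p_i * m / rank_i, walking from the largest p value down so
    # that adjusted values never increase as the rank decreases
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def lod_to_pvalue(lod):
    """Approximate p value of a LOD score via a 1-df chi-squared test."""
    lrt = 2.0 * math.log(10.0) * lod  # likelihood ratio test statistic
    return math.erfc(math.sqrt(lrt / 2.0))


# Bonferroni threshold for the 258,708 SNPs of the flax core collection
print(bonferroni_threshold(0.05, 258708))

# Step 4: keep markers whose adjusted p value is below the chosen level
raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical p values
adj = bh_adjust(raw)
print([i for i, q in enumerate(adj) if q < 0.05])

# LOD = 3 corresponds to p of about 0.0002 under the 1-df assumption
print(lod_to_pvalue(3.0))
```

Under these assumptions, a LOD score of 3.0 and p = 0.0002 are two expressions of nearly the same threshold, which is consistent with the way the two values are used interchangeably above.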

5.3.4 Post-QTN Identification Analysis

Post-QTN identification analysis of the QTNs generated by GWAS software tools is required to remove potentially redundant QTNs and false positives, especially when multiple models and a saturated marker set are used. If single-marker-based models are used for GWAS, potential redundancy of QTNs within haplotype blocks (HBs) may exist. In addition, different tools may produce different R2 estimates for the same QTN. To solve these issues, we developed a pipeline that incorporates several post-QTN identification analyses, including R2 estimation, significance testing of QTNs in all phenotypic datasets, stability analysis, redundancy analysis, and favorable allele analysis (Fig. 5.1).

Fig. 5.1 Pipeline for quantitative trait nucleotide (QTN) identification and analysis

5.3.4.1 Haplotype Blocks (HBs)

As previously mentioned, strong LD between SNPs affects the detection power of Bonferroni or Benjamini–Hochberg corrections (Kaler and Purcell 2019). Removing SNPs in strong LD from a high-density SNP dataset is a good quality control step for GWAS. On the other hand, caution is warranted because, depending on the stringency of SNP filtering before GWAS, multiple QTNs could be located within a genomic region that has no historical recombination events (Kaler and Purcell 2019). To increase the detection power and find unique QTNs, HBs are recommended in order to remove redundant SNPs before GWAS or redundant QTNs post-GWAS. Two strategies are implemented in GWAS. The first is to identify QTNs using single SNPs first, followed by their grouping into HBs (He et al. 2019b; You et al. 2022). In this case, all QTNs in the same HB are deemed to represent a single QTL. The second strategy is to identify HBs from all SNPs prior to GWAS. RTM-GWAS is a typical HB-based GWAS method that combines HB and QTL identification into one pipeline and falls under this second strategy (He et al. 2017).


Some HB partitioning software tools have been developed to produce a unique set of HBs (Purcell et al. 2007; Barrett 2009; Kim et al. 2019). These tools vary in the theoretical criteria used to partition SNPs into blocks and in their respective definitions of what constitutes a HB. HBs can be defined by the confidence interval (CI) method (Gabriel et al. 2002), the solid spine of LD (Bush et al. 2009), the four-gamete rule (Wang et al. 2002), and other custom definitions (Pook et al. 2019). The CI method groups SNPs into HBs based on the extent of LD determined by either r2 or D′. The solid spine of LD method searches for spines of LD between consecutive SNPs such that the first and last markers in the spine are in strong LD with all the interior markers, but the interior markers are not necessarily in LD with each other (Bush et al. 2009). The four-gamete rule computes population frequencies for each of the four possible haplotypes of all SNP pairs, and HBs are formed from consecutive pairs of SNPs for which no more than three of the four haplotypes are observed (Shim et al. 2009). To date, the most popular LD-based algorithm is the CI method proposed by Gabriel et al. (2002), which defines a pair of SNPs to be in strong LD if the one-sided upper 95% confidence bound on D′ is > 0.98 and the lower bound is above 0.7. Strong evidence for historical recombination between a pair of SNPs is declared when the upper confidence bound on D′ is < 0.9.

Several popular CI-based software tools are available, such as Haploview (Barrett 2009), PLINK (Purcell et al. 2007), RTM-GWAS (He et al. 2017), GPART (Kim et al. 2019), and LDExplorer (https://github.com/cran/LDExplorer). We compared four of these tools (GPART, LDExplorer, PLINK, and RTM-GWAS) using 258,708 SNPs from a flax core collection of 378 accessions. The HBs generated by each software tool are summarized in Table 5.3. GPART did not generate any singletons, while large numbers of singletons were identified by the other three tools. In reality, a singleton can be defined as a HB with only one SNP; thus, singletons are usually treated as HBs and included as such in GWAS. GPART identified a similar number of multi-SNP blocks as PLINK and RTM-GWAS, but a significantly lower number of total blocks (blocks and singletons). On the other hand, LDExplorer identified the fewest blocks. On average, the blocks obtained from GPART were larger than the blocks obtained from the other three tools, both in number of SNPs and in physical length (Table 5.3). Overall, PLINK and RTM-GWAS adopted similar partitioning algorithms and consistently produced similar numbers of HBs, making them the most commonly recommended.
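The pairwise quantities behind these block definitions can be illustrated with a small Python sketch. It assumes phased haplotypes for two biallelic SNPs with alleles coded 0 and 1, and the function name and the example data are hypothetical. It returns only point estimates of D′ and r2 plus the number of distinct haplotypes for the four-gamete rule; the confidence bounds on D′ required by the CI method of Gabriel et al. (2002) are not computed here.

```python
from collections import Counter


def pairwise_ld(haplotypes):
    """Point estimates of LD for one SNP pair.

    haplotypes: list of (allele_snp1, allele_snp2) tuples, alleles coded 0/1.
    Returns (d_prime, r2, n_distinct_haplotypes).
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n  # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n  # frequency of allele 1 at SNP 2
    p_ab = counts[(1, 1)] / n                # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                     # raw disequilibrium coefficient

    # D' scales D by its maximum attainable magnitude given allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2, len(counts)


# Hypothetical sample of ten phased haplotypes
sample = [(0, 0)] * 4 + [(1, 1)] * 4 + [(0, 1)] * 2
d_prime, r2, n_hap = pairwise_ld(sample)
print(d_prime, r2)

# Four-gamete rule: fewer than four observed haplotypes gives no evidence
# of historical recombination between the two SNPs
print(n_hap < 4)
```

In this example D′ is at its maximum while r2 is well below 1, which shows why partitions based on D′ and on r2 can place block boundaries differently.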

5.3.4.2 QTN Effects

The effect of a QTN is the average phenotypic difference between its two alleles. Alternatively, for comparison purposes, the size of the QTN effect is usually expressed as R2, i.e., the proportion of the phenotypic variance of a trait explained by the QTN, which is sometimes called phenotypic variation explained (PVE). R2 values are calculated based on simple regressions of the trait phenotypes on the QTNs. GWAS software tools generally provide R2 estimates for each marker. However, the estimated R2 values for the same markers vary widely between software tools. Thus, it is necessary to generate comparable R2 values for QTNs to facilitate comparisons across QTNs and software. One way to achieve this is to calculate the R2 value of a QTN as the variation explained by the QTN divided by the total phenotypic variation of the trait (You et al. 2018b, 2022; He et al. 2019b; Lan et al. 2020).

GWAS is conducted using the best linear unbiased predictor (BLUP) values from multiple environments (years and/or locations) and/or the phenotypic datasets of individual environments. A QTN may be identified from a single or several phenotypic datasets, but it may still have a significant effect in phenotypic datasets where it was not detected by GWAS. QTNs identified by GWAS that have no significant effect in any phenotypic dataset are potential false positives. Thus, we suggest a statistical test for the significance of the allele effects of identified QTNs in all phenotypic datasets. Statistically significant differences between alleles provide a verification of the identified QTNs. In GWAS for flax pasmo (He et al. 2019b) and powdery mildew


Table 5.3 Haplotype blocks (HBs) identified for 258,708 SNPs from a flax core collection of 378 accessions using four software tools

| Software tool | No. of blocks | No. of singletons | Total | Block size (SNPs): Min | Block size (SNPs): Max | Block size (SNPs): x̄ ± s | Block size (Kb): Min | Block size (Kb): Max | Block size (Kb): x̄ ± s |
|---|---|---|---|---|---|---|---|---|---|
| GPART | 48,083 | 0 | 48,083 | 4 | 50 | 5.38 ± 2.64 | 0.03 | 1642.49 | 4.90 ± 14.80 |
| LDExplorer | 15,173 | 217,806 | 232,979 | 2 | 15 | 2.70 ± 1.27 | 0.00 | 119.01 | 1.25 ± 3.25 |
| PLINK | 49,974 | 107,423 | 157,397 | 2 | 32 | 3.03 ± 2.01 | 0.00 | 1642.49 | 1.93 ± 9.78 |
| RTM-GWAS | 49,875 | 107,668 | 157,543 | 2 | 32 | 3.03 ± 2.01 | 0.00 | 99.929 | 1.69 ± 4.05 |

x̄ mean; s standard deviation

(PM) resistance (You et al. 2022), we performed Wilcoxon non-parametric tests for each QTN in all phenotypic datasets. QTNs that had no significant allele effect in any of the datasets at the 5% probability level were considered false positives and removed; that is, to be retained, a QTN had to have a significant effect in at least one phenotypic dataset. We calculated the coefficient of variation (CV) of the allelic effect values for all QTNs across all phenotypic datasets as a measure of their stability. A QTN was declared stable if the difference between its alleles was significant in more than one phenotypic dataset. We found that the R2 of QTNs was highly related to the CVs of QTNs and to the number of phenotypic datasets showing significant QTN effects (Fig. 5.2). The more stable QTNs are those with the most datasets showing significant QTN effects and the smallest CV values. Stable QTNs for PM were therefore defined as those with R2 > 10% and CV < 100% (You et al. 2022).

Fig. 5.2 Relationships between R2 (%) of quantitative trait nucleotides (QTNs), the coefficients of variation (CV, %) of the QTN effects across environments, and the number of flax powdery mildew (PM) datasets used for GWAS that showed significant QTN effects. Source You et al. (2022)
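The QTN effect, its R2, and the CV used as a stability measure can be sketched as follows. This is a simplified Python illustration with hypothetical function names and made-up data; it assumes homozygous lines with alleles coded 0 and 1 and is not the pipeline code of the cited studies. The significance test of the allele difference is not included.

```python
import statistics


def allele_effect(genotypes, phenotypes):
    """Mean phenotype of allele-1 carriers minus that of allele-0 carriers."""
    group1 = [y for g, y in zip(genotypes, phenotypes) if g == 1]
    group0 = [y for g, y in zip(genotypes, phenotypes) if g == 0]
    return statistics.fmean(group1) - statistics.fmean(group0)


def qtn_r2(genotypes, phenotypes):
    """Proportion of phenotypic variation explained by one QTN,
    from a simple linear regression of phenotype on allele code."""
    mean_g = statistics.fmean(genotypes)
    mean_y = statistics.fmean(phenotypes)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    return sxy * sxy / (sxx * syy)


def effect_cv(effects):
    """Coefficient of variation (%) of allele effects across datasets."""
    return 100.0 * statistics.stdev(effects) / abs(statistics.fmean(effects))


# Hypothetical QTN scored on six lines in one phenotypic dataset
geno = [0, 0, 0, 1, 1, 1]
pheno = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(allele_effect(geno, pheno))   # 3.0
print(100.0 * qtn_r2(geno, pheno))  # R2 in percent

# Hypothetical allele effects of the same QTN in three environments
cv = effect_cv([3.0, 2.0, 4.0])
print(cv)
```

With values computed this way, the stability criteria quoted above (R2 > 10% and CV < 100%) reduce to a simple filter over the QTN list.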

5.3.4.3 Favorable Alleles of QTNs

The favorable allele of a QTN is defined as the allele that contributes to the favorable performance of the trait. The favorable allele count in individual plants for all identified QTNs is useful for assessing the additivity of the QTNs, confirming their reliability, and predicting the performance of individuals. Our previous studies demonstrated significantly positive correlations between the number of QTN favorable alleles and trait performance, including seed yield and quality traits (You et al. 2018b) as well as disease resistance (He et al. 2019b; You et al. 2022).


Fig. 5.3 Relationship between the number of favorable alleles and flax powdery mildew (PM) ratings. Source You et al. (2022)

Figure 5.3 shows a significant correlation (R2 = 0.62) between the number of favorable alleles in flax genotypes and their PM ratings, indicating that highly resistant breeding lines pyramid more favorable alleles.
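A favorable allele analysis of this kind can be sketched in a few lines of Python. The line names, alleles, and ratings below are invented for illustration, the function names are hypothetical, and the Pearson correlation is only one simple way to summarize the relationship; the cited studies may have used a different procedure.

```python
import math


def favorable_allele_counts(genotypes, favorable):
    """Count, for each line, the QTNs at which it carries the favorable allele.

    genotypes: dict mapping line name -> list of alleles, one per QTN
    favorable: list with the favorable allele of each QTN, in the same order
    """
    return {
        line: sum(1 for allele, fav in zip(alleles, favorable) if allele == fav)
        for line, alleles in genotypes.items()
    }


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: three QTNs, four lines, disease ratings (lower = better)
favorable = ["A", "G", "T"]
genotypes = {
    "line1": ["A", "G", "T"],
    "line2": ["A", "C", "T"],
    "line3": ["C", "C", "T"],
    "line4": ["C", "C", "G"],
}
ratings = {"line1": 1.0, "line2": 3.0, "line3": 4.0, "line4": 8.0}

counts = favorable_allele_counts(genotypes, favorable)
lines = sorted(genotypes)
r = pearson_r([counts[l] for l in lines], [ratings[l] for l in lines])
print(counts)
print(r, r * r)
```

For a disease rating where lower values mean higher resistance, r is negative, and its square plays the role of the R2 reported for Fig. 5.3.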

5.4

Meta-Analysis of GWAS and QTLs

The advent of GWAS has allowed considerable progress in the identification of QTLs and candidate genes associated with quantitative traits of interest in crops. Beyond the success achieved by improving statistical methods for GWAS, meta-analysis of GWAS (also called meta-GWAS), which is essentially the pooling of results from multiple independent GWAS via statistical approaches, has become a popular tool to increase the overall sample size and statistical power of GWAS, empowering genetic discoveries. This method has proven powerful in dissecting complex diseases in human and nonhuman species (Evangelou and Ioannidis 2013; Pasaniuc and Price 2017; Bouwman et al. 2018) as well as in crops (Battenfield et al. 2018; Zhao et al. 2019; Shook et al. 2021). It has been applied to single traits and to multiple correlated traits, the latter allowing evaluation of pleiotropic effects between the traits (Zhu et al. 2018). Meta-GWAS is based on QTL mapping information from independent studies (Shook et al. 2021) and primarily uses the p values or the signs of the effect sizes of QTLs/QTNs (Zeggini and Ioannidis 2009). METAL, which is based on a fixed-effect model (Willer et al. 2010), and METASOFT, which is based on a random-effect model (Han and Eskin 2011), are commonly used for meta-GWAS (Table 5.4).

Meta-QTL analysis is a statistical method to integrate QTL information from linkage map-based QTL mapping studies, making it possible to improve the statistical power by integrating multiple datasets derived from different marker and population types (Goffinet and Gerber 2000). The resulting consensus map displays new genetic positions that are congruent with all the maps from multiple studies and identifies consistent genomic regions (called meta-QTLs). Software tools such as Meta-QTL (Veyrieras et al. 2007) and BioMercator (Arcade et al. 2004) (Table 5.4) are commonly used for meta-QTL analysis. BioMercator is used in most meta-QTL studies because of its features and robustness.

5.4.1 Meta-GWAS

Meta-GWAS has been used in crops such as wheat (Battenfield et al. 2018), tomato (Zhao et al. 2019), soybean (Shook et al. 2021), and cowpea (Lo et al. 2019). Battenfield et al. (2018) performed a meta-GWAS for processing and end-use quality traits using 1625 SNP markers and a total of 4095 wheat breeding lines from the International Maize and Wheat Improvement Center (CIMMYT) bread wheat breeding program during 2009–2014. The breeding lines were unbalanced across years. First, GWAS was conducted for individual site-year datasets with at least 200 lines each using the QK-mixed model (an MLM) in the JMP Genomics 7.1 package. The estimated SNP effects and standard errors from each site-year GWAS were then combined using a meta-


Table 5.4 Software tools for meta-analysis of genome-wide association study (GWAS) and quantitative trait loci (QTLs)

| Software | Description | Source |
|---|---|---|
| METAL | Meta-GWAS with a fixed-effect model; combines the evidence for associations from individual studies | Willer et al. (2010) |
| METASOFT | Meta-GWAS with a random-effect model | Han and Eskin (2011) |
| BioMercator | Meta-QTL analysis tool | Arcade et al. (2004) |
| Meta-QTL | Meta-QTL analysis tool | Veyrieras et al. (2007) |

GWAS with an inverse-variance and fixed-effect model. Fifty-two FDR-corrected significant meta-marker-trait associations covering 40 unique SNPs were identified (p < 0.001), and 17 of them co-localized to seven genomic regions. Their results demonstrated that meta-GWAS is a powerful approach for dissecting the genetic architecture of important traits in breeding programs.

Zhao et al. (2019) used the data from three publicly available GWAS panels comprising 163, 291, and 402 tomato accessions, respectively, and a set of common flavor-related chemical traits measured in each panel. By combining the three studies, a total of 775 unique tomato accessions and 2,316,117 SNPs were used for the final meta-analysis of 31 flavor-related traits. EMMAX software was first used for the QTL mapping of each panel. Then, METAL and METASOFT were used for the meta-analysis. A total of 305 significant marker-trait associations and some candidate genes for flavor-related traits were identified (Zhao et al. 2019).

Shook et al. (2021) conducted a large-scale meta-GWAS combining 73 published studies in soybean, including 17,556 unique accessions and 69 traits. Using traditional GWAS for individual studies and a meta-GWAS analysis with METAL, 393 unique loci harboring 59 candidate genes across traits were identified, confirming many previously reported genes.

As an alternative application of meta-GWAS, Kang et al. (2014) proposed a meta-analysis procedure based on a random-effect model that jointly analyzes multiple studies with different environmental conditions to identify loci involved in genotype by environment (G × E) interactions. This method relies on GWAS results from a single panel grown in different environments to identify environmentally specific genetic effects. The basic idea is to treat the environments as different populations. Loci whose effects differ across environments are G × E interaction loci. The method was first used in a mouse experiment for detecting interaction loci in multiple environments (Kang et al. 2014). Seventeen studies comprising 4965 distinct animals were combined, which identified 26 significant loci associated with high-density lipoprotein cholesterol, some of which involved G × E interactions. This approach can increase the detection power, improve the resolution compared to individual studies, and identify more loci (Kang et al. 2014). Lo et al. (2019) applied the same method to domesticated cowpea to identify genomic regions and candidate genes associated with seed size. A panel of 368 diverse cowpea accessions from 51 countries, 51,128 SNPs, and the phenotypic data for seed size-related traits from three environments were used. The GWAS performed separately for the seed size data from each environment using MLM in TASSEL v5.0 identified 13 loci. Pooling the GWAS results from the three environments in the meta-analysis led to the discovery of an additional four loci. These four loci were not detected in any single environment, probably because of their small effects, but were significant when the meta-analysis was applied to the pooled GWAS results (Lo et al. 2019).
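The inverse-variance fixed-effect combination mentioned for the wheat study can be written out as a short Python sketch. The function name and the input values are hypothetical, and the code shows the generic textbook estimator rather than the implementation in METAL or any other cited tool. Cochran's Q is added as a standard heterogeneity statistic; it is not the random-effect G × E test of Kang et al. (2014).

```python
import math


def fixed_effect_meta(betas, std_errors):
    """Inverse-variance weighted fixed-effect meta-analysis of one SNP.

    betas: SNP effect estimates from the individual studies
    std_errors: their standard errors
    Returns (pooled_beta, pooled_se, z, two_sided_p, cochran_q).
    """
    weights = [1.0 / (se * se) for se in std_errors]  # precision weights
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    # Cochran's Q: weighted squared deviations of study effects from the pool
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, z, p, q


# Hypothetical effects of one SNP in three site-year GWAS
beta, se, z, p, q = fixed_effect_meta([0.5, 0.3, 0.4], [0.2, 0.1, 0.2])
print(beta, se)
print(z, p)
print(q)
```

Because the weights are the inverse variances, the most precisely estimated study dominates the pooled effect, and the pooled standard error is smaller than that of any single study. This is how loci that are not significant in any single environment can become significant in the pooled analysis.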


5.4.2 Meta-QTL Analysis

Bi-parental population-based QTL mapping has identified a large number of QTLs with different types of markers, such as SSRs and SNPs, in crops, allowing the use of meta-QTL analyses. For example, Said et al. (2013) performed a meta-QTL analysis in cotton for 1223 QTLs identified from 42 individual QTL mapping studies for traits such as fiber quality, yield per se, yield-related, morphological, drought tolerance, and disease resistance, from which they identified putative QTL clusters. In rice, Swamy et al. (2011) combined 15 studies that identified 53 QTLs associated with grain yield under drought stress conditions and identified 14 meta-QTLs. Another meta-QTL study in rice involved 333 QTLs from 27 QTL studies for resistance to rice blast, sheath blight, and bacterial leaf blight, from which 48 meta-QTLs were identified (Kumar and Nadarajah 2020). These high-confidence meta-QTLs are well suited for the breeding of disease resistance and drought tolerance.

5.5 Candidate Gene Identification

Aside from marker development for marker-assisted selection, the ultimate purpose of QTL mapping is the identification of candidate genes associated with the traits of interest. Establishing a definite association between a QTL and its causal genetic feature(s) remains challenging, and few candidate genes have been functionally validated to date. Some QTLs are found within genes, but most are located in intergenic regions, and while protein-coding genes cannot be assumed to be the only causal features, they are often the first to be investigated. Current QTL mapping methodologies are not specifically designed for the identification of the causal genes, and additional experiments and functional validation are therefore required.

To infer the causal genes linked to a QTL or a QTN, a commonly used method is to scan the annotated genes in their vicinity. One can search for candidate genes within the LD block harboring the QTL or QTN (Soto-Cerda et al. 2018). In this case, the size of the LD block depends on many factors, such as marker density, genetic diversity, population type, and the software tool used to define the LD block. Alternatively, one can scan the annotated genes of a fixed-size region, such as a window of 100–200 kb downstream and upstream of a QTL or QTN (Kumar et al. 2015; He et al. 2019b; Sertse et al. 2019; You and Cloutier 2020) or a variation thereof where the fixed-window size is determined based on a genome-wide or chromosome LD decay curve (You et al. 2018b; Kanapin et al. 2021). This method also has disadvantages, the main one being that recombination rates can vary greatly between fixed-size windows. Regardless of the method used for predicting candidate genes, these remain only "candidates" to guide further experiments that will hopefully lead to their functional validation.

Keeping in mind the cautionary tale behind candidate genes, one can nevertheless use strategies to narrow down the list based on, for example, functional knowledge or protein features. To predict candidate genes for flax PM resistance, we performed a separate GWAS using only SNPs located within flax resistance gene analogs (RGAs) because they are the most likely disease resistance genes (You et al. 2022). With this input dataset, all targeted QTNs would be located within RGAs. A subset of 3230 SNPs present within 838 flax RGAs was used, and 42 QTNs located within these RGAs were identified (You et al. 2022).
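The fixed-window scan described above amounts to an interval-overlap query against the gene annotation. The following is a minimal sketch under stated assumptions: the `Gene` record, the `genes_in_window` function, and the gene identifiers are hypothetical and do not correspond to the scripts of You and Cloutier (2020); a gene is reported if any part of it overlaps the window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def genes_in_window(genes, chrom, pos, flank=100_000):
    """Return IDs of genes overlapping [pos - flank, pos + flank] on one chromosome."""
    lo = max(1, pos - flank)
    hi = pos + flank
    return [
        g.gene_id
        for g in genes
        if g.chrom == chrom and g.start <= hi and g.end >= lo
    ]


if __name__ == "__main__":
    # Hypothetical annotation records, for illustration only
    annotation = [
        Gene("geneA", "Chr1", 50_000, 52_000),
        Gene("geneB", "Chr1", 240_000, 260_000),
        Gene("geneC", "Chr1", 400_000, 401_000),
        Gene("geneD", "Chr2", 150_000, 151_000),
    ]
    # Scan 100 kb upstream and downstream of a QTN at Chr1:150,000
    print(genes_in_window(annotation, "Chr1", 150_000, flank=100_000))
```

Replacing the constant `flank` with a value derived from an LD decay curve gives the variation mentioned in the text; the overlap test itself stays the same.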

5.6 QTL Mapping in Flax

Both bi-parental populations and germplasm collections have been used for QTL mapping in flax. To date, seven studies on linkage map-based QTL mapping with different types of bi-parental populations, including DH, RIL, and F2-F4, have been reported. In addition, 14 studies on GWAS performed on germplasm collections alone or combined with bi-parental populations have reported QTLs or QTNs in flax. The results of these studies for 37 traits are summarized (Tables 5.5 and 5.6). The traits include 16 seed yield and agronomic traits, 11 seed quality and fatty acid composition traits, four fiber-related traits, four disease resistance traits (pasmo, PM, Fusarium wilt, and Alternaria blight resistance), and two abiotic stress traits (salt and drought tolerance) (Table 5.5). More than 1000 QTLs or QTNs have been identified for these traits, and the numbers vary with the trait, marker density, genetic population, and statistical model used.

The limited genetic variation in bi-parental populations combined with the low density of SSR or SNP markers of the maps yielded few and only large-effect QTLs for seed yield, agronomic traits, seed quality, fatty acid composition, and disease resistance traits (Spielmeyer et al. 1998; Cloutier et al. 2011; Asgarinia et al. 2013; Kumar et al. 2015) (Table 5.6). Studies using diverse core collections, high-density genome-wide SNPs, and multiple single- and multi-locus GWAS models propelled QTL mapping in flax and led to the identification of many more large- and small-effect QTNs for flowering time (Soto-Cerda et al. 2021), drought tolerance (Sertse et al. 2021), pasmo (He et al. 2019b), and PM resistance (Li et al. 2022b; You et al. 2022).

The earliest-detected QTLs were based on genetic maps or on the scaffold-based flax reference genome v1.0 (Wang et al. 2012b) and SSR markers (Table 5.6) and were difficult to incorporate into comparative analyses of QTLs and candidate genes from later studies based on the chromosome-scale flax reference genome assembly v2.0 (You et al. 2018a). Thus, You and Cloutier (2020) developed computer scripts and mapped 195 of the 200 QTLs identified before 2020 onto this superior assembly, which grouped them into 133 co-located QTL clusters. The gene annotation information of the flax reference genome v2.0 and the computer scripts were also provided to facilitate candidate gene scanning within fixed-window sizes (You and Cloutier 2020).
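Once QTLs from different studies share one coordinate system, grouping them into co-located clusters can be approached as interval merging. The sketch below is a hedged illustration, not the procedure of You and Cloutier (2020): it assumes that two QTLs are co-located when their physical intervals overlap on the same chromosome, and the function name `cluster_qtls` is hypothetical. The example intervals are coordinates quoted in this chapter for Chr 4 and Chr 6.

```python
def cluster_qtls(qtls):
    """Group QTLs given as (name, chrom, start, end) into overlapping-interval clusters."""
    clusters = []
    # Sort by chromosome, then by start, so overlapping intervals become adjacent
    for name, chrom, start, end in sorted(qtls, key=lambda q: (q[1], q[2])):
        last = clusters[-1] if clusters else None
        if last is not None and last["chrom"] == chrom and start <= last["end"]:
            # Overlaps the current cluster: extend it and record the member
            last["end"] = max(last["end"], end)
            last["members"].append(name)
        else:
            clusters.append(
                {"chrom": chrom, "start": start, "end": end, "members": [name]}
            )
    return clusters


if __name__ == "__main__":
    qtls = [
        ("QYld.BM.crc-LG4", 4, 14_489_225, 14_489_333),
        ("QYLD-Lu4.1", 4, 13_593_668, 14_966_967),
        ("QDm.BM.crc-LG4", 4, 13_171_757, 15_042_104),
        ("Marker770415", 6, 11_929_857, 11_930_253),
    ]
    for cluster in cluster_qtls(qtls):
        print(cluster)
```

With this criterion the three Chr 4 QTLs fall into a single cluster and the Chr 6 marker forms its own. A distance tolerance or an LD-based criterion could be substituted for strict overlap.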

F. M. You et al.

5.6.1 Yield and Agronomic Traits

Thirteen yield and related agronomic traits, including seed yield, seed size, flowering and maturity times, branching, lodging, plant height, and root- and shoot-related traits have been investigated for marker-trait associations (Table 5.5). Of those, yield, seed size, and flowering and maturity times are the most important targets in today's flax breeding programs.

5.6.1.1 Yield

Although seed yield (YLD) is the most important and complex quantitative trait in flax breeding, only four unique QTLs have been identified from several bi-parental populations in flax: QYld.BM.crc-LG4 (14,489,225–14,489,333 bp on Chr 4)/QYLD-Lu4.1 (13,593,668–14,966,967 bp on Chr 4), Marker770415 (11,929,857–11,930,253 bp on Chr 6), Marker1073071 (8,701,939–8,702,324 bp on Chr 6), and Marker799956 (3,856,362–3,856,771 bp on Chr 13) (Kumar et al. 2015; Wu et al. 2018; You et al. 2018b). QYld.BM.crc-LG4 and QYLD-Lu4.1 are the same QTL that was identified in two independent studies (Kumar et al. 2015; You et al. 2018b).

5.6.1.2 Seed Size

Seed size is one of the important target traits in flax breeding. Flax seed size can be measured in mg/seed but, in breeding, it is mostly reported as thousand-seed weight (TSW). Seed width (SW) and seed length (SL) are highly correlated with TSW. A total of 43 unique QTLs associated with TSW have been reported in five QTL mapping studies (Table 5.5). These were scattered on 13 of the 15 chromosomes (Guo et al. 2019; You and Cloutier 2020). Ten and fifteen QTLs associated with SL and SW, respectively, were also identified from the 200 accessions of a Chinese collection, but most of them co-located with the TSW QTLs (Guo et al. 2019). A total of 21 candidate genes for these QTLs were proposed in two association studies (Xie et al. 2018a; Guo et al. 2019), and seven QTNs were located within candidate genes (Table 5.7).

5.6.1.3 Flowering and Maturity

QTLs QDm.BM.crc-LG4 (13,171,757–15,042,104 bp on Chr 4) and QDTM-Lu11.2 (14,768,686–14,768,686 bp on Chr 11) associated


Table 5.5 Summary of quantitative trait loci (QTLs) associated with 37 traits in flax Category

No

Trait

Abbr

QTL

Seed yield and agronomic

1

Seed yield

YLD

1

2

Thousand seed weight (g)

TSW

Source Kumar et al. (2015)

3

Wu et al. (2018)

1

You et al. (2018b)

5

Soto-Cerda et al. (2013)

1

Kumar et al. (2015)

10

Xie et al. (2018a)

8

Xie et al. (2018b)

21

Guo et al. (2019)

3

Seed weight/plant

SWP

2

Singh et al. (2021)

4

Capsule weight/plant

CWP

5

Seed length (mm)

SL

10

2

Singh et al. (2021) Guo et al. (2019)

6

Seed width (mm)

SW

15

Guo et al. (2019)

7

Seeds per boll

SEB

1

8

Fruit number

FN

8

Xie et al. (2018a)

1

Xie et al. (2018b) Soto-Cerda et al. (2013)

9

Branching score

BSC

1

10

Number of branches

NB

13

11

Days to flowering

DTF

12

13

14

Days to maturity

Plant height (cm)

Technical length (cm)

DTM

PLH

TL

15

Lodging Roots and shoots

LDG

Xie et al. (2018a)

1

Soto-Cerda et al. (2013)

27

Soto-Cerda et al. (2021)

50

Saroha et al. (2022)

1

Kumar et al. (2015)

2

You et al. (2018b)

30

Saroha et al. (2022)

2

Soto-Cerda et al. (2013)

1

Wu et al. (2018)

9

Xie et al. (2018a)

2

You et al. (2018b)

14

Zhang et al. (2018)

27

Saroha et al. (2022)

1

Wu et al. (2018)

3

Xie et al. (2018a)

3

Xie et al. (2018b)

10 16

Kumar et al. (2015)

1 228

Zhang et al. (2018) Soto-Cerda et al. (2013) Sertse et al. (2019) (continued)


Table 5.5 (continued) Category

No

Trait

Abbr

QTL

Seed quality

17

Iodine value

IOD

2

Source Cloutier et al. (2011)

1

Soto-Cerda et al. (2014)

2

Kumar et al. (2015)

3

You et al. (2018b)

18

Protein content (%)

PRO

2

Kumar et al. (2015)

19

Oil content (%)

OIL

1

You et al. (2018b)

20 21

22

23

24

Fiber

Oleic (%) Palmitic (%)

Stearic (%)

Linoleic (%)

Linolenic (%)

OLE PAL

STE

LIO

LIN

Soto-Cerda et al. (2014)

1

Kumar et al. (2015)

8

You et al. (2018b)

3

Kumar et al. (2015)

1

You et al. (2018b)

1

Cloutier et al. (2011)

1

Kumar et al. (2015)

1

Xie et al. (2018b)

4

You et al. (2018b)

1

Soto-Cerda et al. (2014)

3

Kumar et al. (2015)

2

Xie et al. (2018b)

2

You et al. (2018b)

9

Cloutier et al. (2011)

3

Soto-Cerda et al. (2014)

2

Kumar et al. (2015)

1

Xie et al. (2018b)

3

You et al. (2018b)

10

Cloutier et al. (2011)

3

Soto-Cerda et al. (2014)

1

Kumar et al. (2015)

3

Xie et al. (2018b)

3

You et al. (2018b)

25

Seed mucilage content

MC

7

Soto-Cerda et al. (2018)

26

Seed hull content

HC

4

Soto-Cerda et al. (2018)

27

Seed color

SC

1

Cloutier et al. (2011)

28

Straw weight (g)

STW

1

Kumar et al. (2015)

3

Wu et al. (2018)

29

Fiber yield (g)

FY

2

Wu et al. (2018)

30

Fiber content (%)

FC

31

Cell walls (%)

CEW

2

Wu et al. (2018)

2

Xie et al. (2018b)

1

Kumar et al. (2015) (continued)


Table 5.5 (continued) Category

No

Trait

Abbr

Biotic stress

32

Fusarium wilt rating

FW

33

Powdery mildew rating

PM

34

Pasmo rating

PAS

35

Alternaria blight resistance

AB

36

A set of drought tolerance traits

-

37

A set of salt tolerance traits

ST

Abiotic stress

QTL 2

Source Spielmeyer et al. (1998)

15

Kanapin et al. (2021)

3

Asgarinia et al. (2013)

388

You et al. (2022)

67

He et al. (2019b)

2

Singh et al. (2021)

157

Sertse et al. (2021)

64

Li et al. (2022b)

Abbr abbreviation Source modified from You and Cloutier (2020)

with days to maturity (DTM) were identified from a bi-parental population using GWAS and SNP markers (You et al. 2018b). QDm.BM.crc-LG4 also co-located with the YLD QTL QYld.BM.crc-LG4 and the plant height (PLH) QTL QPLH-Lu4.3 on Chr 4, and it confirmed the QTL of the same name previously identified in the same population with SSR markers (Kumar et al. 2015). These genomic regions are likely important in controlling YLD, DTM, and PLH. Linked genes or potentially pleiotropic gene(s) could play a role in more than one of these traits. In addition, a GWAS based on 131 accessions of an Indian collection using multiple multi-locus models identified 30 QTNs for DTM (Saroha et al. 2022).

Association mapping for flowering time (FT) or days to flowering (DTF) was performed using 200 accessions subsampled from the Canadian core collection (Diederichsen et al. 2013) and several single- and multi-locus models. The 27 QTLs discovered spanned regions harboring genes orthologous to 27 Arabidopsis thaliana and Oryza sativa FT-related genes, including Lus10013532 on Chr 13 (flowering locus T), Lus10028817 on Chr 7 (flowering locus D), Lus10021215 on Chr 5 (transcriptional regulator SUPERMAN), and Lus10037816 on Chr 12 (gibberellin 2-beta-dioxygenase 2), to name a few (Soto-Cerda et al. 2021). A GWAS using 131 Indian accessions and multi-locus models identified 53 QTNs and 36 candidate genes for flowering time as estimated at 5% flowering (DTF5), 50% (DTF50), and 95% (DTF95) (Saroha et al. 2022).

5.6.2 Fiber Traits

Three fiber-related traits have been investigated in association studies, and six QTLs for fiber yield, four for fiber content, and one for cell wall were detected (Kumar et al. 2015; Wu et al. 2018) (Table 5.8). Pleiotropy between the fiber yield QTL QSw.BM.crc-LG4 and the cell wall content QTL QCw.BM.crc-LG4 was observed (Kumar et al. 2015; You and Cloutier 2020).

5.6.3 Seed Quality Traits

5.6.3.1 Protein and Oil Content

Despite the negative genetic correlation between protein content (PRO) and oil content (OIL) in oilseed crops, including flax (Hwang et al. 2014), high PRO and high OIL are still targets for flax breeding (You et al. 2016). Ten unique QTLs for OIL and two QTLs for PRO have been identified from bi-parental populations (Kumar et al. 2015; You et al. 2018b) and from the Canadian core collection (Soto-Cerda et al. 2014) (Table 5.5). The PRO QTL QPRO-Lu15.1 and the OIL QTL QOIL-Lu15.7 co-located on Chr 15 (You et al. 2018b).

390

390

224

224

200

370

115

200

200

Canadian core collection

Canadian core collection

Chinese collection

Chinese collection

Canadian core collection

Canadian core collection

Canadian mini-core collection

Chinese collection

Chinese collection

191 SSRs

154

112

260

233

RIL

F2

F2

243

RIL

RIL and DH

2339 SNPs

300

F3-F4

674,074 SNPs

674,074 SNPs

7707 SNPs

258,873 SNPs

771,914 SNPs

584,987 SNPs

146,959 SNPs

460 SSRs

464 SSRs

17,288 SNPs

4497 SNPs

329 SNPs, 362 SSRs

143 SSRs

113 SSRs, 5 SNPs, 4 genes

78

DH

8 RFLPs, 213 AFLPs

Markers

59

Pop size

DH

Population type

RCPs

RCPs

RCPs

RCPs

PCPs

SS

SS

GM

GM

PCPs

GM

GM

GM

GM

GM

GM

GM

Ref

SM

SM

MM

SM + MM

SM

SM

SM

SM

SM

SM

CIM

CIM

CIM

MIM

CIM

CIM

IM

Stat. model

Table 5.6 Quantitative trait locus (QTL) mapping studies in flax

46

228

67

11

23

43

9

11

33

10

12

24

20

3

9

2

Total QTL

64/ST

10/SL; 15/SW; 21/TSW

228 QTNs for 16 root and shoot traits

67/PAS

7/MC; 4/HC

2/PLH; 1/FN; 8/TSW; 3/TL; 1/PAL; 2/STE; 1/LIO; 3/LIN; 2/FC

9/PLH; 3/TL; 13/NB; 8/FN; 10/TSW

1/OIL; 1/STE; 3/LIO; 3/LIN; 1/IOD

5/TSW; 1/DTF; 2/PLH; 1/BSC; 2/LDG

1/YLD; 8/OIL; 5/PLH; 4/PAL; 3/IOD, LIN, LIO, 2/DTM; 2/STE; 1/PRO; 1/OLE

2/AB; 2/SWP; 2/CWP

1/PLH; 1/TL; 3/YLD; 3/STW; 2/FY; 2/FC

14/PLH; 10/TL

1/PAL; 3/STE; 3/OLE; 2/LIO; 1/LIN; 2/IOD; 1/OIL; 1/PRO; 1/CEW; 1/STW; 1/TSW; 1/SEB; 1/YLD; 1/DTM

3/PM

2/LIO, LIN, IOD; 1/PAL; 2/SC

2/FW

No. of QTL identified/trait

Li et al. (2022b) (continued)

Guo et al. (2019)

Sertse et al. (2019)

He et al. (2019b)

Soto-Cerda et al. (2018)

Xie et al. (2018b)

Xie et al. (2018a)

Soto-Cerda et al. (2014)

Soto-Cerda et al. (2013)

You et al. (2018b)

Singh et al. (2021)

Wu et al. (2018)

Zhang et al. (2018)

Kumar et al. (2015)

Asgarinia et al. (2013)

Cloutier et al. (2011)

Spielmeyer et al. (1998)

Source


447

131

Canadian core collection + breeding lines

Indian collection

68,925 SNPs

247,160 SNPs

70,935 SNPs

RCPs

RCPs

RCPs

RCPs

RCPs

Ref

MM

SM + MM

SM + MM

SM + MM

SM

Stat. model

109

64

27

157

15

Total QTL

53/DTF; 30/DTM; 27/PL

388/PM

27/DTF

144 QTNs and 13 LD blocks for 6 drought-related traits

15/FW

No. of QTL identified/trait

Saroha et al. (2022)

You et al. (2022)

Soto-Cerda et al. (2021)

Sertse et al. (2021)

Kanapin et al. (2021)

Source

Ref reference sequences or linkage maps for QTL identification; LM bi-parental population-based QTL mapping; AM association mapping or genome-wide association study; GM genetic map; SS scaffolds-based reference sequence v1.0 (Wang et al. 2012b); RCPs recent release of the chromosome-scale pseudomolecules v2.0 (You et al. 2018a); PCPs pre-released version of the chromosome-scale pseudomolecules; SM single-locus model; MM multi-locus model; IM interval mapping; CIM composite interval mapping; MIM multiple-interval mapping; Stat. model statistical models for GWAS. See Table 5.5 for trait abbreviations Source modified from You and Cloutier (2020)

200

Canadian core collection

12,316 SNPs

73,526 SNPs

297

115

Russian collection

Markers

Pop size

Canadian mini-core collection

Population type

Table 5.6 (continued)


Table 5.7 Quantitative trait nucleotides (QTNs) and candidate genes associated with seed size in flax (Xie et al. 2018a; Guo et al. 2019)

| Trait | QTN | Chr | Position | Candidate gene | Annotation | Source |
|---|---|---|---|---|---|---|
| TSW | Lu1-13,675,535 | 1 | 13,675,535 | Lus10008438 | Kinesin family (KF) | Guo et al. (2019) |
| TSW | scaffold112_184204 | 1 | 18,514,049 | Lus10018116a | Glycogen/starch synthases, ADP-glucose type | Xie et al. (2018a) |
| TSW | scaffold132_713877 | 1 | 24,877,317 | Lus10028085a | Serine/threonine-protein kinase MPS1-like (STK-MPS1) | Xie et al. (2018a) |
| TSW | scaffold101_354340 | 3 | 20,942,454 | Lus10018116a | Uncharacterized protein | Xie et al. (2018a) |
| SL | Lu4-8,904,836 | 4 | 8,904,836 | Lus10029035 | Cytochrome P450 (P450-1) | Guo et al. (2019) |
| SL | Lu4-13,877,858 | 4 | 13,877,858 | Lus10034378 | Ribosomal protein (RP) | Guo et al. (2019) |
| TSW | Lu5-8,559,897 | 5 | 8,559,897 | Lus10011772 | Malate dehydrogenase (MDH) | Guo et al. (2019) |
| TSW | scaffold15_1207948 | 5 | 16,914,987 | Lus10039696a | SPX and EXS domain-containing protein 1 (PHO1) | Xie et al. (2018a) |
| TSW | Lu6-6,431,359 | 6 | 6,431,359 | Lus10017775 | Cytochrome P450 (P450-2) | Guo et al. (2019) |
| TSW | Lu7-6,015,848 | 7 | 6,015,848 | Lus10035470 | 26S proteasome regulatory subunit (RPN) | Guo et al. (2019) |
| TSW | scaffold1519_272169 | 9 | 1,027,739 | Lus10007527a | Autophagy-related protein (ARP) | Xie et al. (2018a) |
| TSW | Lu9-1,710,055 | 9 | 1,710,055 | Lus10008949 | Ubiquitin carboxyl-terminal hydrolase (UCH-1) | Guo et al. (2019) |
| TSW | scaffold123_1191347 | 11 | 3,875,819 | Lus10042202a | Terpene synthase (TS) | Xie et al. (2018a) |
| SL | Lu11-8,148,722 | 11 | 8,148,722 | Lus10036372 | Ubiquitin-conjugating enzyme E2 (UBC) | Guo et al. (2019) |
| TSW | Lu12-16,347,940 | 12 | 16,347,940 | Lus10043126 | Ubiquitin carboxyl-terminal hydrolase (UCH-2) | Guo et al. (2019) |
| TSW | Lu14-4,549,028 | 14 | 4,549,028 | Lus10000489 | Ankyrin-repeat protein (ARP) | Guo et al. (2019) |
| TSW | Lu14-6,548,762 | 14 | 6,548,762 | Lus10013178 | COP1-interacting protein (CIP) | Guo et al. (2019) |
| TSW | Lu14-7,321,617 | 14 | 7,321,617 | Lus10004079 | Auxin canalization (AC) | Guo et al. (2019) |
| TSW | scaffold1155_171787 | 15 | 7,690,615 | Lus10004029a | NAD dependent epimerase/dehydratase family (NAD-DEF) | Xie et al. (2018a) |
| TSW | Lu15-11,940,681 | 15 | 11,940,681 | Lus10022567 | RING/U-box protein (RING/U-box) | Guo et al. (2019) |
| TSW | scaffold1317_154716 | 15 | 15,275,145 | Lus10007888 | Trypsin and protease inhibitor (TPI) | Xie et al. (2018a) |

TSW thousand-seed weight; SL seed length
a QTN was identified within the gene


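Table 5.8 Quantitative trait loci (QTLs) associated with fiber-related traits in flax

| Trait | QTL/marker ID | Chr | Coordinates | Reference |
|---|---|---|---|---|
| Fiber yield (g) | Marker2603286 | 3 | 6,573,623–6,574,023 | Wu et al. (2018) |
| Fiber yield (g) | Marker1722134 | 13 | 10,603,161–10,603,485 | Wu et al. (2018) |
| Straw weight (g) | QSw.BM.crc-LG4 | 4 | 14,489,225–14,489,333 | Kumar et al. (2015) |
| Straw weight (g) | Marker326151 | 8 | 22,241,866–22,242,226 | Wu et al. (2018) |
| Straw weight (g) | Marker2368217 | 10 | 7,140,622–7,140,988 | Wu et al. (2018) |
| Straw weight (g) | Marker614116 | 10 | 7,219,061–7,219,445 | Wu et al. (2018) |
| Cell wall (%) | QCw.BM.crc-LG4 | 4 | 14,489,225–14,489,333 | Kumar et al. (2015) |
| Fiber content (%) | scaffold179-179,593 | 2 | 2,253,135 | Xie et al. (2018b) |
| Fiber content (%) | scaffold866-116,645 | 6 | 1,083,247 | Xie et al. (2018b) |
| Fiber content (%) | Marker1561746 | 4 | 8,748,431–8,748,795 | Wu et al. (2018) |
| Fiber content (%) | Marker1051901 | 8 | 21,807,786–21,808,148 | Wu et al. (2018) |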

5.6.3.2 Iodine Value and Linolenic Acid Content

High iodine value (IOD) and high linolenic acid content (LIN) are also major target traits in flax breeding. IOD is a measure of the degree of unsaturation of the oil that is calculated from the fatty acid composition obtained by gas chromatography (Banik et al. 2011). By definition, IOD is highly correlated with LIN and linoleic acid content (LIO), the two most unsaturated fatty acids (Banik et al. 2011; You et al. 2018b). A total of seven unique QTLs for IOD, ten unique QTLs for LIN, and 11 unique QTLs for LIO were identified from the same three bi-parental populations, but with different types of markers (Cloutier et al. 2011; Soto-Cerda et al. 2014; Kumar et al. 2015; You et al. 2018b). QTLs for IOD, LIO, and LIN co-located on chromosomes 4, 7, and 12, respectively (You et al. 2018b). The two genomic regions on chromosomes 7 (14,540,265–17,976,903 bp) and 12 (489,561–2,981,562 bp) harbored the two fatty acid desaturase genes, FAD3a and FAD3b, which are responsible for linolenic acid composition (Vrinten et al. 2005).

5.6.3.3 Seed Mucilage and Hull Content

Flaxseed mucilage is a heterogeneous polysaccharide present in the seed coat. Seed mucilage and hull make up 8–10% and 37–48% of the seed weight, respectively (Soto-Cerda et al. 2018). Low mucilage content (MC) and hull content (HC) in flaxseed meal are desirable for livestock and poultry feed. MC and HC are heritable quantitative traits with a moderate to high narrow-sense heritability. A GWAS using 200 diverse flaxseed accessions genotyped with 1.7 million SNPs identified seven QTNs for MC and four QTNs for HC, each explaining 11.82–17.43% of the phenotypic variation. Seven candidate genes harboring four QTNs for MC and four candidate genes co-located with three QTNs for HC were identified (Table 5.9). All had been shown to play a role in mucilage synthesis and release, seed coat development, and anthocyanin biosynthesis in A. thaliana.

5.6.3.4 Seed Coat Color

The Y, b1, b1vg, d, and g genes in flax control seed coat color (Mittapalli and Rowland 2003; Sudarshan et al. 2017). Proanthocyanidin synthesis and accumulation in flax are responsible for brown seed coat (Xie and Dixon 2005). Glutathione S-transferases (GSTs) are involved in the transport of anthocyanidins during the synthesis of the proanthocyanidins (Young et al. 2022). A RIL population with 320 lines derived from a cross between CDC Bethune (brown) and S95407 (yellow) was used to fine-map the G gene onto chromosome 6 where the candidate gene Lus10019895 (13,777,372–13,781,838 bp) that encodes a lambda-GST was located (Young et al. 2022).


Table 5.9 Quantitative trait nucleotides (QTNs) and candidate genes associated with mucilage content (MC) and hull content (HC) in flax (Soto-Cerda et al. 2018) Trait

QTN

Chr

Allele

MAF

R2

MC

Lu2-22,298,066

2

T/C

0.07

17.32

Lu3-25,559,600

3

G/T

0.06

13.42

Candidate gene

A. thaliana ortholog

Lus10009311

Galactosyl transferase-like 5 (GATL5)

Lus10009288

Mucilage-modified 4 (MUM4)

Lus10009287 Lus10009313

HC

Lu3-26,033,342

3

C/G

0.07

13.25

Lus10007101

Transparent testa 8 (TT8)

Lu3-7,398,487

3

C/T

0.41

11.82

Lus10007083

Subtilisin-like serine protease (SBT1.7)

Lu5-3,808,878

5

G/A

0.1

14.97

Lus10008285

Lu7-13,225,294

7

G/A

0.34

12.05

NAC-regulated seed morphology 1 (NARS1)

Lu11-2,498,303

11

C/G

0.16

13.18 Lus10035456

Agamous-like MADS-box protein AGL62 (AGL62)

Lu7-6,577,527

7

A/C

0.13

14.66

Lu10-21,552,161

10

G/A

0.09

16.32

Lu12-5,267,706

12

C/T

0.06

13.83

Lus10018306

Glycosyl hydrolase family 17 (GH17)

Lu13-2,803,224

13

T/C

0.06

17.43

Lus10026902

Lariat debranching enzyme (DBR1)

Lus10026926

UDP-glucose flavonol 3-O-Glucosyl transferase (UGT79B1)

Chr chromosome, MAF minor allele frequency

Using a DH population of 78 individuals, QTLs QL*.crc-LG22 and Qb*.crc-LG22, both linked to SSR marker Lu178, were associated with the color phenotypes L* (brightness) and b* (yellow-blue chromaticity), respectively (Cloutier et al. 2011). Based on the chromosome coordinates of Lu178, these two QTLs mapped to Chr 8 (14,838,877–14,839,100 bp) (You and Cloutier 2020).

5.6.4 Abiotic Traits

5.6.4.1 Drought Tolerance

Flax is greatly affected by low moisture conditions, especially during the early development and reproductive stages. With the added impact of climate change on drought frequency, drought tolerance is becoming a critical trait for flax cultivar development. A GWAS using a flax mini-core collection of 106 accessions, which is a subset of the Canadian flax core collection (Diederichsen et al. 2013), 12,316 SNPs, and several single-locus and multi-locus models identified 144 QTNs and 13 LD blocks or QTLs associated with six drought-related traits measured from irrigated and non-irrigated fields (Sertse et al. 2021). Of these QTNs/QTLs, 16 explained more than 15% of the phenotypic variation in the traits. Most large-effect loci (R² > 15%) co-located with gene(s) previously predicted to play roles in mediating stress responses. Flax genes Lus10009480 (216,042–216,661 bp on Chr 14) and Lus10030150 (10,760,458–10,760,892 bp on Chr 12) were deemed candidates based on their functions because they were predicted to encode WAX INDUCER1 and STRESS-ASSOCIATED PROTEIN (SAP), respectively.

5.6.4.2 Salt Tolerance

Soil salinization is one of the major abiotic stress factors affecting crop growth and development, leading to yield reduction (Li et al. 2022b). The Chinese flax collection of 200 diverse flax accessions was evaluated under controlled conditions for salt tolerance during the germination stage (Li et al. 2022b). The relative germination rate (RGR), relative shoot length (RSL), and relative root length (RRL) were used to measure salt tolerance under controlled salt stress conditions. A GWAS with 674,074 SNPs was conducted for the three traits using the single-locus models GLM and MLM. A total of 902 QTNs were identified, grouped into 64 unique QTLs, and estimated to explain 14.48–29.38% of the phenotypic variation for the traits. In addition, a candidate gene scan combining transcriptome data and homologous gene annotation identified 268 candidate genes, of which seven (Table 5.10) were validated by transcriptome analysis (Li et al. 2022b). Furthermore, Lus10033213 (17,306,783–17,346,619 bp on Chr 2), which co-locates with QTL qRGR2.3, encodes a GST protein. Interestingly, GSTs are also involved in flax seed coat color (Young et al. 2022).

Table 5.10 Seven candidate genes for salt stress QTLs consistent with a previously reported transcriptomic study (Li et al. 2022b)

| QTL | Chr | Flax gene | Arabidopsis ortholog | Functional annotation |
|---|---|---|---|---|
| qRSL3.6 | 3 | Lus10012628 | AT5G53110.1 | RING/U-box superfamily protein |
| qRGR4.3 | 4 | Lus10041550 | AT3G27030.1 | Unknown protein |
| qRGR11.1 | 11 | Lus10026381 | AT1G07160.1 | Protein phosphatase 2C family protein |
| qRGR11.1 | 11 | Lus10026376 | AT3G47340.1 | Glutamine-dependent asparagine synthase 1 |
| qRSL12.1 | 12 | Lus10006732 | AT4G11170.1 | Disease resistance protein (TIR-NBS-LRR class) family |
| qRSL14.3 | 14 | Lus10013312 | AT3G11590.1 | Unknown protein |
| qRGR15.4 | 15 | Lus10037940 | AT5G03530.1 | RAB GTPase homolog C2A |

5.6.5 Biotic Traits

5.6.5.1 Pasmo

Caused by Septoria linicola (Speg.) Garassini, pasmo is one of the most common diseases affecting flax production. This fungus infects flax plants from seedling through maturity, but the symptoms are most severe at the ripening stage when both temperature and humidity are high. Pasmo also has an adverse effect on seed and fiber quality and can cause seed yield losses of up to 75% (Hall et al. 2014; He et al. 2019b). A GWAS was performed on 370 accessions from the Canadian flax core collection to detect genetic regions associated with pasmo rating (PR) (He et al. 2019b). A total of 692 potential QTNs were identified using single- and multi-locus models on 258,873 SNPs distributed across all 15 flax chromosomes. Based on LD blocks between contiguous QTNs, the 692 QTNs were merged into 500 QTLs, including 67 large-effect QTLs (R²: 3–23%) with high stability across PR datasets. All these QTLs explained 32–64% of the total variation in PR datasets. This study also detected 85 RGAs that were associated with toll interleukin receptor, nucleotide-binding site, and leucine-rich repeat type genes (He et al. 2019b).

5.6.5.2 Powdery Mildew (PM)

PM, caused by the obligate biotrophic ascomycete Oidium lini Skoric, is one of the most devastating and prevalent flax foliar diseases (Aly et al. 2012; Asgarinia et al. 2013). In Western Canada, this disease was first reported in 1997 (Rashid 1998). Early infection by PM reduces production and seed quality significantly (Beale 1991). Under field conditions with natural inoculum, the majority of Canadian flax genotypes are moderately resistant to PM (You et al. 2016). The most effective strategy to lower PM incidence and save growers' money is to utilize resistant cultivars in conjunction with a comprehensive disease management program. Three major dominant genes for resistance to PM were identified in flax germplasm (Rashid and Duguid 2005). You et al. (2022) recently reported a detailed GWAS analysis of PM using the phenotypes of 447 accessions of flax and 247,160 SNPs, from which 349 QTNs associated with PM ratings were discovered, including 44 stable and large-effect QTNs (R² = 10–30%). This research also revealed that 445 RGAs co-located with the QTNs, including nucleotide-binding site and leucine-rich repeat receptors, WRKY, and mildew locus O encoding genes. Taken together, these findings represent a significant body of genomic resources with tangible potential to improve PM resistance in flax.

5.6.5.3 Fusarium Wilt (FW)

Fusarium wilt (FW) is caused by Fusarium oxysporum f. sp. lini, a seed- and soil-borne fungus. Historically, FW has been considered a major limiting factor not only in flax, but also in many crop plants, causing major economic losses by inducing necrosis and wilting (Fall et al. 2001; Galindo-Gonzalez and Deyholos 2016; Pegg et al. 2019; Kanapin et al. 2021). The FW fungus infects plants through the roots and continues to grow inside the water-conducting tissues, hindering water transport and eventually causing wilting, necrosis, and chlorosis of the above-ground parts (Rashid 2003; Ma et al. 2013). FW pathogens can survive for 5–10 years in the soil (Rashid 2003), allowing for recurring infections if the soil and residues are not treated and in the absence of proper crop rotations.

FW resistance is moderate or high in most cultivated flax cultivars (Rozhmina and Loshakova 2016). However, a rapid decline in the genetic diversity of cultivars and the never-ending host–pathogen interactions contribute to increasing the risk of disease development. Global warming aggravates the situation even further. Climate change may result in increased pathogen virulence that could translate into a reduction of the host's resistance (Timmusk et al. 2020). In castor beans, cultural, chemical, and biological controls of wilt are all ineffective (Dange et al. 2006; Shaw et al. 2022). In addition, the use of chemicals may be detrimental to human, animal, and environmental health (Sánchez-Martín and Keller 2019). As a result, the most practical and safest approach for controlling FW in flax remains the breeding of resistant cultivars. Indeed, host plant resistance is usually regarded as one of the most cost-effective, easiest, and safest ways to control disease (Agrios 2005). To ensure long-term and successful protection, it is imperative for flax breeding programs to take proactive measures and develop varieties with effective pathogen resistance against FW.

To identify the genetic factors associated with resistance to FW in flax, Kanapin et al. (2021) performed a GWAS on a genetic panel of 297 accessions with a total of 72,526 SNPs. A total of 15 stable QTNs were discovered and harbored 13 candidate genes involved in pathogen recognition and the plant immunity response, including the KIP1-like protein (Lus10025717) and NBS-LRR protein (Lus10025852).

5.7 Perspectives

The success of QTL mapping depends on many factors, such as a large and diverse population, accurate trait phenotyping in multiple environments, high-density genome-wide markers, and advanced statistical methods. Thanks to the rapid development of high-throughput genotyping technologies, such as SNP chips and GBS, genome-wide high-density SNPs can be generated rapidly and at low cost, especially for crops with small genomes such as flax. However, a highly saturated set of tightly linked markers may lead to multi-collinearity in GWAS based on single-SNP statistical models, reducing the efficiency and accuracy of these methods. The removal of redundant genomic markers can increase computational efficiency and mitigate the impact of multi-collinearity.

Most of the important agronomic traits in crops are complex quantitative traits. Recent advances in multi-locus statistical models have allowed the identification of large- and small-effect QTNs for complex traits with high accuracy and detection power. Recent association mapping studies in flax (He et al. 2019b; Lan et al. 2020; Soto-Cerda et al. 2021; You et al. 2022) and other crops (Fang et al. 2020; Fatima et al. 2020; Zhong et al. 2021) have demonstrated the advantages of using multiple statistical methods, especially multi-locus models, to detect small-effect QTNs of complex traits. Because statistical models differ in their genetic or statistical assumptions, the combined use of several models is a good strategy to take advantage of the strengths of each model and to mitigate their shortcomings. On the other hand, the use of multiple statistical models for GWAS with multiple phenotypic datasets from different environments often leads to different sets of QTNs being identified by the different models and/or in the different environments. Post-QTN identification analyses are a good quality-control strategy to remove redundant and false-positive QTNs, as well as to identify stable and environment-specific QTNs. The post-QTN identification pipeline we designed has been used successfully in several studies (You et al. 2022). The meta-analysis proposed by Kang et al. (2014) is another approach that adds power to the detection of environment-specific QTNs and provides a better handle on G × E (Lo et al. 2019).

A major issue of QTL studies for all crops, including flax, is the challenge of comparing results across studies. Indeed, statistical models and parameters such as the significance threshold and the multiple-testing correction method (Bonferroni or FDR-based corrections) often differ across studies, making direct comparisons impossible. The use of meta-GWAS or meta-QTL analyses has the potential to bypass these issues and to increase detection power by increasing the effective population size, with the expected outcome being consensus QTLs/QTNs or meta-QTLs. Meta-GWAS or meta-QTL analyses have been widely used in many crops, but have yet to be capitalized upon in flax. With the growing number of independent association studies on different populations that share traits in common, a meta-analysis of GWAS or QTLs is warranted.
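As a rough illustration of how a meta-GWAS pools evidence across independent studies, the sketch below combines per-study effect estimates for a single marker by fixed-effect inverse-variance weighting, one of the schemes offered by tools such as METAL (Willer et al. 2010). The function name and the numerical values are hypothetical and are not taken from any flax study; the sketch also assumes that all studies report the effect of the same allele on a common trait scale.

```python
import math


def fixed_effect_meta(betas, ses):
    """Pool per-study effects of one marker by inverse-variance weighting.

    betas: effect estimates of the same allele, one per study
    ses:   matching standard errors
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    # Each study is weighted by the inverse of its sampling variance,
    # so more precise studies contribute more to the pooled estimate.
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal null distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


if __name__ == "__main__":
    # Hypothetical effects of one QTN estimated in three populations.
    study_betas = [0.42, 0.35, 0.51]
    study_ses = [0.20, 0.18, 0.25]
    beta, se, z, p = fixed_effect_meta(study_betas, study_ses)
    print(f"pooled beta={beta:.3f}, se={se:.3f}, z={z:.2f}, p={p:.2e}")
```

Because the pooled standard error shrinks as studies are added, a marker that falls short of the significance threshold in each individual study can still reach it in the combined analysis. A random-effects model (e.g., Han and Eskin 2011) would be the more appropriate choice when effects are expected to differ between populations or environments.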

QTL mapping produces two outcomes: QTLs/QTNs and the co-located putative candidate genes. There are several ways to make full use of the results of association studies. Large-effect QTLs allow the development of markers, such as breeder-friendly Kompetitive Allele-Specific PCR (KASP) markers (Ma et al. 2017), for further validation and application in marker-assisted selection (MAS). The use of multiple models has generated a large number of large- and small-effect QTNs, providing an efficient way to use QTNs associated with traits of interest, instead of genome-wide random SNPs, as markers for genomic selection (GS) model development and breeding selection. Our previous studies have shown that QTN-based GS has a high prediction ability for pasmo and PM (He et al. 2019a; Lan et al. 2020; You et al. 2022). QTNs can also be exploited to perform genomic cross prediction (You et al. 2021) by integrating computer simulations and GS to predict the genetic performance of crosses. This approach evaluates the breeding values and genetic variances of the segregating populations derived from virtual crosses and enhances the potential for success in cross breeding. Favorable allele analysis for the identified QTNs is also helpful for the evaluation of parents or germplasm. Breeding crosses can also be assessed by directly counting and comparing the favorable alleles accumulated in the parents and by evaluating the complementation of favorable alleles between parents. The ultimate goal of QTL mapping is to identify the causal genes that control the traits of interest. Candidate genes can be predicted through QTL mapping, but the predicted candidate genes require further functional validation.
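The favorable-allele counting and complementation check described above can be sketched as follows. This is a minimal illustration only, assuming inbred (homozygous) parental lines so that each parent carries a single allele per QTN; the QTN names, alleles, and function names are hypothetical.

```python
def favorable_count(genotype, favorable):
    """Count the QTNs at which a homozygous line carries the favorable allele."""
    return sum(1 for qtn, allele in favorable.items()
               if genotype.get(qtn) == allele)


def complementation(parent_a, parent_b, favorable):
    """Classify each QTN by which parent(s) carry the favorable allele."""
    result = {"shared": [], "only_a": [], "only_b": [], "neither": []}
    for qtn, allele in favorable.items():
        in_a = parent_a.get(qtn) == allele
        in_b = parent_b.get(qtn) == allele
        if in_a and in_b:
            result["shared"].append(qtn)
        elif in_a:
            result["only_a"].append(qtn)
        elif in_b:
            result["only_b"].append(qtn)
        else:
            result["neither"].append(qtn)
    return result


if __name__ == "__main__":
    # Hypothetical favorable alleles at four QTNs and two candidate parents.
    favorable = {"QTN1": "A", "QTN2": "G", "QTN3": "T", "QTN4": "C"}
    parent_a = {"QTN1": "A", "QTN2": "G", "QTN3": "C", "QTN4": "T"}
    parent_b = {"QTN1": "A", "QTN2": "T", "QTN3": "T", "QTN4": "T"}

    groups = complementation(parent_a, parent_b, favorable)
    # Favorable alleles that could, in principle, be stacked in one progeny line.
    achievable = len(favorable) - len(groups["neither"])
    print(favorable_count(parent_a, favorable),
          favorable_count(parent_b, favorable),
          groups, achievable)
```

Two parents with the same favorable-allele count can still differ in breeding value as a pair: the `only_a` and `only_b` groups show where the parents complement each other, whereas QTNs in `neither` cannot be improved by that cross.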
Thanks to the rapid advances in biotechnologies, omics data such as transcriptomic, proteomic, metabolomic, and epigenetic data provide functional genomics resources to validate candidate genes, accelerating the development of marker-trait association studies and the application of detected QTLs in genomics-assisted breeding.

Acknowledgements

We thank Dr. Bourlaye Fofana for review and editing, and Tara Edwards for English editing.

References

Agrios GN (2005) Plant pathology. Elsevier Academic Press, Amsterdam
Aly AA, Mansour M, Mohamed HI, Abd-Elsalam KA (2012) Examination of correlations between several biochemical components and powdery mildew resistance of flax cultivars. Korean Soc Plant Pathol 28:149–155
Arcade A, Labourdette A, Falque M, Mangin B, Chardon F et al (2004) BioMercator: integrating genetic maps and QTL towards discovery of candidate genes. Bioinformatics 20:2324–2326
Arends D, Prins P, Jansen RC, Broman KW (2010) R/qtl: high-throughput multiple QTL mapping. Bioinformatics 26:2990–2992
Asgarinia P, Cloutier S, Duguid S, Rashid K, Mirlohi A et al (2013) Mapping quantitative trait loci for powdery mildew resistance in flax (Linum usitatissimum L.). Crop Sci 53:2462–2472
Banik M, Duguid S, Cloutier S (2011) Transcript profiling and gene characterization of three fatty acid desaturase genes in high, moderate, and low linolenic acid genotypes of flax (Linum usitatissimum L.) and their role in linolenic acid accumulation. Genome 54:471–483
Barendse W (2011) Haplotype analysis improved evidence for candidate genes for intramuscular fat percentage from a genome wide association study of cattle. PLoS One 6:e29601
Barrett JC (2009) Haploview: visualization and analysis of SNP genotype data. Cold Spring Harb Protoc 2009:pdb.ip71
Battenfield SD, Sheridan JL, Silva L, Miclaus KJ, Dreisigacker S et al (2018) Breeding-assisted genomics: applying meta-GWAS for milling and baking quality in CIMMYT wheat breeding program. PLoS One 13:e0204757
Beale R (1991) Studies of resistance in linseed cultivars to Oidium lini and Botrytis cinerea. Aspects Appl Biol 28:85–90
Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc 57:289–300
Bouwman AC, Daetwyler HD, Chamberlain AJ, Ponce CH, Sargolzaei M et al (2018) Meta-analysis of genome-wide association studies for cattle stature identifies common genes that regulate body size in mammals. Nat Genet 50:362–367
Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y et al (2007) TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics 23:2633–2635
Broman KW, Speed TP (2002) A model selection approach for the identification of quantitative trait loci in experimental crosses. J Royal Stat Soc: Series B 64:641–656
Bush WS, Chen G, Torstenson ES, Ritchie MD (2009) LD-Spline: Mapping SNPs on genotyping platforms to genomic regions using patterns of linkage disequilibrium. BioData Mining 2:7
Buzdugan L, Kalisch M, Navarro A, Schunk D, Fehr E et al (2016) Assessing statistical significance in multivariable genome wide association analysis. Bioinformatics 32:1990–2000
Chen J, Shrestha R, Ding J, Zheng H, Mu C et al (2016) Genome-wide association study and QTL mapping reveal genomic loci associated with Fusarium ear rot resistance in tropical maize germplasm. G3:Genes Genom Genet 6:3803–3815
Cloutier S, Ragupathy R, Niu Z, Duguid S (2011) SSR-based linkage map of flax (Linum usitatissimum L.) and mapping of QTLs underlying fatty acid composition traits. Mol Breed 28:437–451
Cui J, Luo S, Niu Y, Huang R, Wen Q et al (2018) A RAD-based genetic map for anchoring scaffold sequences and identifying QTLs in bitter gourd (Momordica charantia). Front Plant Sci 9:477
Dange S, Desai A, Patel S (2006) Wilt of castor and its management - a review. Agril Rev 27:147–151
Diederichsen A, Kusters PM, Kessler D, Bainas Z, Gugel RK (2013) Assembling a core collection from the flax world collection maintained by plant gene resources of Canada. Genet Resour Crop Evol 60:1479–1485
Evangelou E, Ioannidis JP (2013) Meta-analysis methods for genome-wide association studies and beyond. Nat Rev Genet 14:379–389
Falconer DS (1960) Introduction to quantitative genetics. Oliver & Boyd, Edinburgh/London, UK
Fall AL, Byrne PF, Jung G, Coyne DP, Brick MA et al (2001) Detection and mapping of a major locus for fusarium wilt resistance in common bean. Crop Sci 41:1494–1498
Fang Y, Liu S, Dong Q, Zhang K, Tian Z et al (2020) Linkage analysis and multi-locus genome-wide association studies identify QTNs controlling soybean plant height. Front Plant Sci 11:9
Fatima F, McCallum BD, Pozniak CJ, Hiebert CW, McCartney CA et al (2020) Identification of new leaf rust resistance loci in wheat and wild relatives by array-based SNP genotyping and association genetics. Front Plant Sci 11:583738
Flint-Garcia SA, Thuillet AC, Yu J, Pressoir G, Romero SM et al (2005) Maize association population: a high-resolution platform for quantitative trait locus dissection. Plant J 44:1054–1064
Gabriel SB, Schaffner SF, Nguyen H, Moore JM, Roy J et al (2002) The structure of haplotype blocks in the human genome. Science 296:2225–2229
Galindo-Gonzalez L, Deyholos MK (2016) RNA-seq transcriptome response of flax (Linum usitatissimum L.) to the pathogenic fungus Fusarium oxysporum f. sp. lini. Front Plant Sci 7:1766
Gallais A, Dillmann C, Goldringer (2001) Quantitative genetics and breeding methods: the way ahead. Paris, France
Gatti DM, Svenson KL, Shabalin A, Wu L-Y, Valdar W et al (2014) Quantitative trait locus mapping methods for diversity outbred mice. G3:Genes Genom Genet 4:1623–1633
Goffinet B, Gerber S (2000) Quantitative trait loci: a meta-analysis. Genetics 155:463–473
Guo D, Jiang H, Yan W, Yang L, Ye J et al (2019) Resequencing 200 flax cultivated accessions identifies candidate genes related to seed size and weight and reveals signatures of artificial selection. Front Plant Sci 10:1682
Hall LM, Booker H, Siloto RMP, Jhala AJ, Weselake RJ (2014) Flax (Linum usitatissimum L.): Domestication, agronomy, breeding, genetic engineering and industrial applications. In: McKeon T, Hildebrand D, Weselake RJ, Hayes D (eds) Industrial Oil crops, AOCS Oilseed monograph series. AOCS Press, Urbana, USA, pp 157–194
Han B, Eskin E (2011) Random-effects model aimed at discovering associations in meta-analysis of genome-wide association studies. Am J Hum Genet 88:586–598
Harushima Y, Yano M, Shomura A, Sato M, Shimano T et al (1998) A high-density rice genetic linkage map with 2275 markers using a single F2 population. Genetics 148:479–494
He J, Meng S, Zhao T, Xing G, Yang S et al (2017) An innovative procedure of genome-wide association analysis fits studies on germplasm population and plant breeding. Theor Appl Genet 130:2327–2343
He L, Xiao J, Rashid KY, Jia G, Li P et al (2019a) Evaluation of genomic prediction for pasmo resistance in flax. Int J Mol Sci 20:359
He L, Xiao J, Rashid KY, Yao Z, Li P et al (2019b) Genome-wide association studies for pasmo resistance in flax (Linum usitatissimum L.). Front Plant Sci 9:1982
Huang BE, George AW (2011) R/mpMap: a computational platform for the genetic analysis of multiparent recombinant inbred lines. Bioinformatics 27:727–729
Huehn M (2011) On the bias of recombination fractions, Kosambi’s and Haldane’s distances based on frequencies of gametes. Genome 54:196–201
Hwang EY, Song Q, Jia G, Specht JE, Hyten DL et al (2014) A genome-wide association study of seed protein and oil content in soybean. BMC Genomics 15:1
Islam MS, Thyssen GN, Jenkins JN, Zeng L, Delhom CD et al (2016) A MAGIC population-based genome-wide association study reveals functional association of GhRBB1_A07 gene with superior fiber quality in cotton. BMC Genomics 17:903
Jansen RC (1994) High resolution of quantitative traits into multiple loci via interval mapping. Genetics 1447–1455
Joehanes R, Nelson JC (2008) QGene 4.0, an extensible Java QTL-analysis platform. Bioinformatics 24:2788–2789
Kajiya-Kanegae H, Takanashi H, Fujimoto M, Ishimori M, Ohnishi N et al (2020) RAD-seq-based high-density linkage map construction and QTL mapping of biomass-related traits in sorghum using the Japanese landrace Takakibi NOG. Plant Cell Physiol 61:1262–1272
Kaler AS, Purcell LC (2019) Estimation of a significance threshold for genome-wide association studies. BMC Genomics 20:618
Kanapin A, Bankin M, Rozhmina T, Samsonova A, Samsonova M (2021) Genomic regions associated with Fusarium wilt resistance in flax. Int J Mol Sci 22:12383
Kang HM, Sul JH, Service SK et al (2010) Variance component model to account for sample structure in genome-wide association studies. Nat Genet 42:348–354
Kang EY, Han B, Furlotte N, Joo JW, Shih D et al (2014) Meta-analysis identifies gene-by-environment interactions as demonstrated in a study of 4,965 mice. PLoS Genet 10:e1004022
Kao C-H, Zeng Z-B, Teasdale RD (1999) Multiple interval mapping for quantitative trait loci. Genetics 152:1203–1216
Kim SA, Brossard M, Roshandel D, Paterson AD, Bull SB et al (2019) gpart: human genome partitioning and visualization of high-density SNP data by identifying haplotype blocks. Bioinformatics 35:4419–4421
Kirungu JN, Deng Y, Cai X, Magwanga RO, Zhou Z et al (2018) Simple sequence repeat (SSR) genetic linkage map of D genome diploid cotton derived from an interspecific cross between Gossypium davidsonii and Gossypium klotzschianum. Int J Mol Sci 19:204
Kivikoski M, Rastas P, Löytynoja A, Merilä J (2020) Mathematical function for predicting recombination from map distance. bioRxiv. https://doi.org/10.1101/2020.1112.1114.422614
Kulwal PL (2018) Trait mapping approaches through linkage mapping in plants. Adv Biochem Eng Biotechnol 164:53–82
Kumar IS, Nadarajah K (2020) A meta-analysis of quantitative trait loci associated with multiple disease resistance in rice (Oryza sativa L.). Plants (Basel) 9:1491
Kumar S, You FM, Duguid S, Booker H, Rowland G et al (2015) QTL for fatty acid composition and yield in linseed (Linum usitatissimum L.). Theor Appl Genet 128:965–984
Kump KL, Bradbury PJ, Wisser RJ, Buckler ES, Belcher AR et al (2011) Genome-wide association study of quantitative resistance to southern leaf blight in the maize nested association mapping population. Nat Genet 43:163–168
Lan S, Zheng C, Hauck K, McCausland M, Duguid SD et al (2020) Genomic prediction accuracy of seven breeding selection traits improved by QTL identification in flax. Int J Mol Sci 21:1577
Lander ES, Green P, Abrahamson J, Barlow A, Daly MJ et al (1987) MAPMAKER: an interactive computer package for constructing primary genetic linkage maps of experimental and natural populations. Genomics 1:174–181

Li M, Liu X, Bradbury P et al (2014) Enrichment of statistical power for genome-wide association studies. BMC Biol 12:73
Li H, Zhang L, Hu J, Zhang F, Chen B et al (2017) Genome-wide association mapping reveals the genetic control underlying branch angle in rapeseed (Brassica napus L.). Front Plant Sci 8:1054
Li M, Zhang YW, Zhang ZC, Xiang Y, Liu MH et al (2022a) A compressed variance component mixed model for detecting QTNs and QTN-by-environment and QTN-by-QTN interactions in genome-wide association studies. Mol Plant 15:630–650
Li X, Guo D, Xue M, Li G, Yan Q et al (2022b) Genome-wide association study of salt tolerance at the seed germination stage in flax (Linum usitatissimum L.). Genes (Basel) 13:486
Lindblad-Toh K, Winchester E, Daly MJ, Wang DG, Hirschhorn JN et al (2000) Large-scale discovery and genotyping of single-nucleotide polymorphisms in the mouse. Nat Genet 24:381–386
Lipka AE, Tian F, Wang Q, Peiffer J, Li M et al (2012) GAPIT: genome association and prediction integrated tool. Bioinformatics 28:2397–2399
Liu X, Guo L, You J, Liu X, He Y et al (2010) Progress of segregation distortion in genetic mapping of plants. Res J Agron 4:78–83
Liu X, Huang M, Fan B, Buckler ES, Zhang Z (2016) Iterative usage of fixed and random effect models for powerful and efficient genome-wide association studies. PLoS Genet 12:e1005767
Lo S, Munoz-Amatriain M, Hokin SA, Cisse N, Roberts PA et al (2019) A genome-wide association and meta-analysis reveal regions associated with seed size in cowpea [Vigna unguiculata (L.) Walp]. Theor Appl Genet 132:3079–3087
Ma L-J, Geiser DM, Proctor RH, Rooney AP, O’Donnell K et al (2013) Fusarium pathogenomics. Annu Rev Microbiol 67:399–416
Ma Y, Coyne CJ, Main D, Pavan S, Sun S et al (2017) Development and validation of breeder-friendly KASPar markers for er1, a powdery mildew resistance gene in pea (Pisum sativum L.). Mol Breed 37:151
Manly KF, Cudmore JRH, Meer JM (2001) Map Manager QTX, cross-platform software for genetic mapping. Mamm Genome 12:930–932
Meng L, Li H, Zhang L, Wang J (2015) QTL IciMapping: Integrated software for genetic linkage map construction and quantitative trait locus mapping in biparental populations. Crop J 3:269–283
Miller MR, Dunham JP, Amores A, Cresko WA, Johnson EA (2007) Rapid and cost-effective polymorphism identification and genotyping using restriction site associated DNA (RAD) markers. Genome Res 17:240–248
Mittapalli O, Rowland G (2003) Inheritance of seed color in flax. Crop Sci 43:1945–1951
Mott R, Talbot CJ, Turri MG, Collins AC, Flint J (2000) A method for fine mapping quantitative trait loci in outbred animal stocks. Proc Natl Acad Sci USA 97:12649–12654

Motte H, Vercauteren A, Depuydt S, Landschoot S, Geelen D et al (2014) Combining linkage and association mapping identifies receptor-like protein kinase1 as an essential Arabidopsis shoot regeneration gene. Proc Natl Acad Sci USA 111:8305–8310
Nordborg M, Weigel D (2008) Next-generation genetics in plants. Nature 456:720–723
Palumbo F, Qi P, Pinto VB, Devos KM, Barcaccia G (2019) Construction of the first SNP-based linkage map using genotyping-by-sequencing and mapping of the male-sterility gene in leaf chicory. Front Plant Sci 10:276
Pasaniuc B, Price AL (2017) Dissecting the genetics of complex traits using summary association statistics. Nat Rev Genet 18:117–127
Pegg KG, Coates LM, O’Neill WT, Turner DW (2019) The epidemiology of Fusarium wilt of banana. Front Plant Sci 10:1395
Pook T, Schlather M, de Los CG, Mayer M, Schoen CC et al (2019) HaploBlocker: creation of subgroup-specific haplotype blocks and libraries. Genetics 212:1045–1061
Price AH (2006) Believe it or not, QTLs are accurate! Trends Plant Sci 11:213–216
Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA et al (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38:904–909
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA et al (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81:559–575
Rashid K (1998) Powdery mildew on flax: a new disease in western Canada. Can J Plant Pathol 20:216
Rashid K, Duguid S (2005) Inheritance of resistance to powdery mildew in flax. Can J Plant Pathol 27:404–409
Rashid KY (2003) Principal diseases of flax. pp 93–123
Ren WL, Wen YJ, Dunwell JM, Zhang YM (2017) pKWmEB: integration of Kruskal-Wallis test with empirical Bayes under polygenic background control for multi-locus genome-wide association study. Heredity (Edinb) 120:208–218
Rozhmina TA, Loshakova NI (2016) New sources of effective resistance genes to Fusarium wilt in flax (Linum usitatissimum L.) depending on temperature. Agric Biol 51:310–317
Said JI, Lin Z, Zhang X, Song M, Zhang J (2013) A comprehensive meta QTL analysis for fiber quality, yield, yield related and morphological traits, drought tolerance, and disease resistance in tetraploid cotton. BMC Genomics 14:776
Sánchez-Martín J, Keller B (2019) Contribution of recent technological advances to future resistance breeding. Theor Appl Genet 132:713–732
Sandhu K, You F, Conner R, Balasubramanian P, Hou A (2018) Genetic analysis and QTL mapping of the seed hardness trait in a black common bean (Phaseolus vulgaris) recombinant inbred line (RIL) population. Mol Breed 38:1–13

Saroha A, Pal D, Gomashe SS, Akash X, Kaur V et al (2022) Identification of QTNs associated with flowering time, maturity and plant height traits in Linum usitatissimum L. using genome wide association study. Front Genet 13 (In press)
Sax K (1923) The association of size differences with seed-coat pattern and pigmentation in Phaseolus vulgaris. Genetics 8:552–560
Seaton G, Haley CS, Knott SA, Kearsey M, Visscher PM (2002) QTL Express: mapping quantitative trait loci in simple and complex pedigrees. Bioinformatics 18:339–340
Segura V, Vilhjalmsson BJ, Platt A, Korte A, Seren U et al (2012) An efficient multi-locus mixed-model approach for genome-wide association studies in structured populations. Nat Genet 44:825–830
Semagn K, Bjørnstad A, Skinnes H, Marøy AG, Tarkegne Y et al (2006) Distribution of DArT, AFLP, and SSR markers in a genetic linkage map of a doubled-haploid hexaploid wheat population. Genome 49:545–555
Sertse D, You FM, Ravichandran S, Cloutier S (2019) The complex genetic architecture of early root and shoot traits in flax revealed by genome-wide association analyses. Front Plant Sci 10:1483
Sertse D, You FM, Ravichandran S, Soto-Cerda BJ, Duguid S et al (2021) Loci harboring genes with important role in drought and related abiotic stress responses in flax revealed by multiple GWAS models. Theor Appl Genet 134:191–212
Shaw RK, Shaik M, Prasad MSL, Prasad RD, Mohanrao MD et al (2022) Genomic regions associated with resistance to Fusarium wilt in castor identified through linkage and association mapping approaches. Genome 65:123–136
Shim H, Chun H, Engelman CD, Payseur BA (2009) Genome-wide association studies using single-nucleotide polymorphisms versus haplotypes: an empirical comparison with data from the North American Rheumatoid Arthritis Consortium. BMC Proc 3:S35
Shook JM, Zhang J, Jones SE, Singh A, Diers BW et al (2021) Meta-GWAS for quantitative trait loci identification in soybean. G3:Genes Genom Genet. https://doi.org/10.1093/g1093journal/jkab1117
Singh S, Kumar R, Kumar S, Singh PK, Yadav HK (2021) Mapping QTLs for alternaria blight in linseed (Linum usitatissimum L.). Biotech 11:91
Sonah H, O’Donoughue L, Cober E, Rajcan I, Belzile F (2015) Identification of loci governing eight agronomic traits using a GBS-GWAS approach and validation by QTL mapping in soya bean. Plant Biotechnol J 13:211–221
Soto-Cerda BJ, Aravena G, Cloutier S (2021) Genetic dissection of flowering time in flax (Linum usitatissimum L.) through single- and multi-locus genome-wide association studies. Mol Genet Genomics 296:877–891
Soto-Cerda BJ, Cloutier S, Quian R, Gajardo HA, Olivos M et al (2018) Genome-wide association analysis of mucilage and hull content in flax (Linum usitatissimum L.) seeds. Int J Mol Sci 19:2870
Soto-Cerda BJ, Duguid S, Booker H, Rowland G, Diederichsen A et al (2013) Genomic regions underlying agronomic traits in linseed (Linum usitatissimum L.) as revealed by association mapping. J Integrat Plant Biol 56:75–87
Soto-Cerda BJ, Duguid S, Booker H, Rowland G, Diederichsen A et al (2014) Association mapping of seed quality traits using the Canadian flax (Linum usitatissimum L.) core collection. Theor Appl Genet 127:881–896
Spielmeyer W, Green AG, Bittisnich D, Mendham N, Lagudah ES (1998) Identification of quantitative trait loci contributing to Fusarium wilt resistance on an AFLP linkage map of flax (Linum usitatissimum). Theor Appl Genet 97:633–641
Sudarshan GP, Kulkarni M, Akhov L, Ashe P, Shaterian H et al (2017) QTL mapping and molecular characterization of the classical D locus controlling seed and flower color in Linum usitatissimum (flax). Sci Rep 7:15751
Sumitomo K, Shirasawa K, Isobe S, Hirakawa H, Hisamatsu T et al (2019) Genome-wide association study overcomes the genome complexity in autohexaploid chrysanthemum and tags SNP markers onto the flower color genes. Sci Rep 9:13947
Sun X, Habier D, Fernando RL, Garrick DJ, Dekkers JC (2011) Genomic breeding value prediction and QTL mapping of QTLMAS2010 data using Bayesian Methods. BMC Proc 5(Suppl 3):S13
Swamy BP, Vikram P, Dixit S, Ahmed HU, Kumar A (2011) Meta-analysis of grain yield QTL identified during agricultural drought in grasses showed consensus. BMC Genomics 12:319
Tamba CL, Ni YL, Zhang YM (2017) Iterative sure independence screening EM-Bayesian LASSO algorithm for multi-locus genome-wide association studies. PLoS Comput Biol 13:e1005357
Tamba CL, Zhang Y-M (2018) A fast mrMLM algorithm for multi-locus genome-wide association studies. BioRxiv. https://doi.org/10.1101/341784
Tao S, Wu J, Yao D, Chen Y, Yang W et al (2018) Identification of recombination events in outbred species with next-generation sequencing data. BMC Genomics 19:398
Timmusk S, Nevo E, Ayele F, Noe S, Niinemets Y (2020) Fighting Fusarium pathogens in the era of climate change: a conceptual approach. Pathogens 9:419
Tsai H, Kippes N, Firl A et al (2021) Efficient construction of a linkage map and haplotypes for Mentha suaveolens using sequence capture. G3:Genes Genom Genet 11:9
Utz H, Melchinger A (1996) PLABQTL: a program for composite interval mapping of QTL
Veyrieras JB, Goffinet B, Charcosset A (2007) MetaQTL: a package of new computational methods for the meta-analysis of QTL mapping experiments. BMC Bioinformatics 8:49

Vrinten P, Hu Z, Munchinsky MA, Rowland G, Qiu X (2005) Two FAD3 desaturase genes control the level of linolenic acid in flax seed. Plant Physiol 139:79–87
Wang DL, Zhu J, Li ZKL, Paterson AH (1999) Mapping QTLs with epistatic effects and QTL × environment interactions by mixed linear model approaches. Theor Appl Genet 99:1255–1264
Wang J, Zhang Z (2021) GAPIT Version 3: Boosting power and accuracy for genomic association and prediction. Genomics Proteomics Bioinform 19:629–640
Wang N, Akey JM, Zhang K, Chakraborty R, Jin L (2002) Distribution of recombination crossovers and the origin of haplotype blocks: the interplay of population history, recombination, and mutation. Am J Hum Genet 71:1227–1234
Wang N, Yuan Y, Wang H, Yu D, Liu Y et al (2020) Applications of genotyping-by-sequencing (GBS) in maize genetics and breeding. Sci Rep 10:16308
Wang S, Basten C, Zeng ZB (2012a) Windows QTL Cartographer 2.5p. http://statgen.ncsu.edu/qtlcart/WQTLCart.htm
Wang SB, Feng JY, Ren WL, Huang B, Zhou L et al (2016a) Improving power and accuracy of genome-wide association studies via a multi-locus mixed linear model methodology. Sci Rep 6:19444
Wang SB, Wen YJ, Ren WL, Ni YL, Zhang J et al (2016b) Mapping small-effect and linked quantitative trait loci for complex traits in backcross or DH populations via a multi-locus GWAS methodology. Sci Rep 6:29951
Wang Z, Hobson N, Galindo L, Zhu S, Shi D et al (2012b) The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads. Plant J 72:461–473
Wen YJ, Zhang H, Ni YL, Huang B, Zhang J et al (2018) Methodological implementation of mixed linear models in multi-locus genome-wide association studies. Brief Bioinform 19:700–712
Willer CJ, Li Y, Abecasis GR (2010) METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26:2190–2191
Wu J, Zhao Q, Zhang L, Li S, Ma Y et al (2018) QTL mapping of fiber-related traits based on a high-density genetic map in flax (Linum usitatissimum L.). Front Plant Sci 9:885
Wu R, Casella G, Ma C (2007) Statistical genetics of quantitative traits: linkage, maps, and QTL (statistics for biology and health), 1st edn. Springer-Verlag, New York, NY
Xie D, Dai Z, Yang Z, Sun J, Zhao D et al (2018a) Genome-wide association study identifying candidate genes influencing important agronomic traits of flax (Linum usitatissimum L.) using SLAF-seq. Front Plant Sci 8:2232
Xie D, Dai Z, Yang Z, Tang Q, Sun J et al (2018b) Genomic variations and association study of agronomic traits in flax. BMC Genomics 19:512

Xie DY, Dixon RA (2005) Proanthocyanidin biosynthesis–still more questions than answers? Phytochemistry 66:2127–2144
Yang J, Hu C, Hu H, Yu R, Xia Z et al (2008) QTLNetwork: mapping and visualizing genetic architecture of complex traits in experimental populations. Bioinformatics 24:721–723
You FM, Cloutier S (2020) Mapping quantitative trait loci onto chromosome-scale pseudomolecules in flax. Methods Protoc 3:28
You FM, Duguid SD, Lam I, Cloutier S, Rashid KY et al (2016) Pedigrees and genetic base of the flax varieties registered in Canada. Can J Plant Sci 96:837–852
You FM, Rashid KY, Zheng C, Khan N, Li P et al (2022) Insights into the genetic architecture and genomic prediction of powdery mildew resistance in flax (Linum usitatissimum L.). Int J Mol Sci 23:4960
You FM, Xiao J, Li P, Yao Z, Gao J et al (2018a) Chromosome-scale pseudomolecules refined by optical, physical, and genetic maps in flax. Plant J 95:371–384
You FM, Xiao J, Li P, Yao Z, Jia G et al (2018b) Genome-wide association study and selection signatures detect genomic regions associated with seed yield and oil quality in flax. Int J Mol Sci 19:2303
You FM, Zheng C, Bartaula S, Khan N, Wang J et al (2021) Genomic cross prediction for linseed improvement. In: SS G, SH W (eds) Accelerated Plant Breeding. Springer, Cham, pp 451–480
Young L, Akhov L, Kulkarni M, You F, Booker H (2022) Fine-mapping of a putative glutathione S-transferase (GST) gene responsible for yellow seed colour in flax (Linum usitatissimum). BMC Res Notes 15:72
Yu J, Buckler ES (2006) Genetic association mapping and genome organization of maize. Curr Opin Biotechnol 17:155–160
Yu J, Pressoir G, Briggs WH, Vroh Bi I, Yamasaki M et al (2006) A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet 38:203–208
Zeggini E, Ioannidis JP (2009) Meta-analysis in genome-wide association studies. Pharmacogenomics 10:191–201
Zeng ZB (1994) Precision mapping of quantitative trait loci. Genetics 136:1457–1468
Zhang J, Feng JY, Ni YL, Wen YJ, Niu Y et al (2017) pLARmEB: integration of least angle regression with empirical Bayes for multilocus genome-wide association studies. Heredity (Edinb) 118:517–524
Zhang J, Long Y, Wang L, Dang Z, Zhang T et al (2018) Consensus genetic linkage map construction and QTL mapping for plant height-related traits in linseed flax (Linum usitatissimum L.). BMC Plant Biol 18:160
Zhang YM, Jia Z, Dunwell JM (2019) Editorial: The applications of new multi-locus GWAS methodologies in the genetic dissection of complex traits. Front Plant Sci 10:100

Zhang YM, Tamba CL (2018) A fast mrMLM algorithm for multi-locus genome-wide association studies. bioRxiv. https://doi.org/10.1101/341784
Zhang YM, Xu S (2005) A penalized maximum likelihood method for estimating epistatic effects of QTL. Heredity 95:96–104
Zhang YW, Wen YJ, Dunwell JM, Zhang YM (2020) QTL.gCIMapping.GUI v2.0: An R software for detecting small-effect and linked QTLs for quantitative traits in bi-parental segregation populations. Comput Struct Biotechnol J 18:59–65
Zhao J, Sauvage C, Zhao J, Bitton F, Bauchet G et al (2019) Meta-analysis of genome-wide association studies provides insights into genetic control of tomato flavor. Nat Commun 10:1534
Zhao K, Aranzana MJ, Kim S, Lister C, Shindo C et al (2007) An Arabidopsis example of association mapping in structured samples. PLoS Genet 3:e4
Zhong H, Liu S, Sun T, Kong W, Deng X et al (2021) Multi-locus genome-wide association studies for five yield-related traits in rice. BMC Plant Biol 21:364
Zhou X, Stephens M (2012) Genome-wide efficient mixed-model analysis for association studies. Nat Genet 44:821–824
Zhu Z, Anttila V, Smoller JW, Lee PH (2018) Statistical power and utility of meta-analysis methods for cross-phenotype genome-wide association studies. PLoS One 13:e0193256

6 Genetics of Abiotic Stress in Flax

Bijendra Khadka and Sylvie Cloutier

B. Khadka · S. Cloutier (✉) Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, 960 Carling Avenue, Ottawa, ON K1A 0C6, Canada e-mail: [email protected]

© His Majesty the King in Right of Canada, as represented by the Minister of Agriculture and Agri-Food 2023 F. M. You and B. Fofana (eds.), The Flax Genome, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-16061-5_6

6.1 Introduction

Plants, being sessile organisms, have evolved to cope with a wide range of harsh environmental conditions such as drought, extreme temperatures, salinity, and heavy metals. These external stress factors, collectively referred to as abiotic stresses, restrict the growth, development, yield, seed quality, and production of important crops, putting global food security at risk (Halford et al. 2015; Zhu 2016). Abiotic stressors such as drought and heat are severely reducing crop yields worldwide (Wheaton et al. 2008; Lamaoui et al. 2018). According to a recent study, agricultural losses in Europe have tripled in the last half-century due to severe drought and heat waves (Brás et al. 2021). Similarly, salinity and heavy metal toxicities resulting from increased salinization, rapid industrialization, and modern agricultural practices are rising threats that limit plant growth and agricultural productivity (Patra et al. 2004; Munns and Tester 2008). The adverse effects of abiotic stresses on crop productivity and food availability are further exacerbated by the rapidly changing global climate and a growing human population (Godfray et al. 2010). Given that current climate models forecast a rise in the frequency and severity of such events, developing crop varieties resilient to abiotic stresses is critical (Ciais et al. 2005; Fedoroff et al. 2010; Hirabayashi et al. 2013; Diffenbaugh et al. 2017).

Detailed summaries of the molecular genetic pathways involved in plant response to these external stress stimuli are provided in several review articles (Zhu 2016; Ma et al. 2020; Zhang et al. 2021a). In general, abiotic stress factors elicit cascades of molecular and physiological responses controlled by a plethora of genes. These cascades result in overlapping defense responses, implying interconnected networks of conserved genes and stress signaling pathways (Seki et al. 2003; Yamaguchi-Shinozaki and Shinozaki 2006; Smekalova et al. 2014). The characterization of putative tolerance genes and pathways linked to diverse abiotic stressors, and their incorporation into breeding strategies, has long been a goal of plant breeders (Ashraf 2010). Understanding the basic genetic pathways underpinning plant responses to abiotic stressors is therefore crucial for developing stress-tolerant varieties through traditional breeding methods or targeted engineering strategies (Mittler and Blumwald 2010; Varshney et al. 2011).

Flax, also called flaxseed or linseed when explicitly grown for seed, has been an important commercial crop in both ancient and modern times. It is currently grown in several countries for its seed as a source of food and feed, its seed oil used in human diet and in industrial applications, its fiber, and its medicinal compounds (Van Zeist and Heeres 1975; Kvavadze et al. 2009; Singh et al. 2011; Schmidt et al. 2012; Shayan et al. 2020; Melelli et al. 2021). Flax is a member of the Linum genus of the Linaceae family. The genus Linum contains over 200 species with variable numbers of chromosomes. Flax (L. usitatissimum; 2n = 2x = 30) is the most widely cultivated species of the genus and it is unique in its dual purpose, i.e., stem fiber (linen and other products) and seed (oil, lignan, meal) (McDill et al. 2009; Rice et al. 2015). A wide range of abiotic stresses, such as drought (Kariuki et al. 2016), high and low temperatures (Gusta et al. 1996), salinity (Guo et al. 2013), and heavy metals (Ivanova et al. 2003), are known to reduce the productivity and sustainability of flax cultivation. However, although flax is one of the most versatile crops, with both nutritional and industrial value, few studies have examined the impact of abiotic stresses on it. Furthermore, conventional breeding approaches have not contributed significantly to improving the abiotic stress tolerance of flax because abiotic stress-related traits are complex and quantitative in nature, and the genes and gene products involved are part of intricate networks of stress-induced signaling pathways (Zhu 2016). Therefore, understanding the genetic fundamentals of abiotic stress tolerance is critical to developing improved flax cultivars through traditional breeding and genetic engineering (Zhang et al. 2000; Zhu 2016). The goal is to maintain and increase flax production through the development of better varieties despite the many environmental stresses.

Recent advances in next-generation sequencing (NGS) technologies have dramatically transformed our ability to identify genes and quantitative trait loci (QTLs) related to abiotic stressors.
In conjunction with genomics tools, genetic approaches such as linkage mapping and genome-wide association studies (GWAS) are powerful means to pinpoint genomic regions and genes influencing various traits in flax (Spielmeyer et al. 1998; You and Cloutier 2020). Until recently, most studies on flax focused on identifying QTLs controlling agronomic traits (Soto-Cerda et al. 2014), fatty acid composition (Cloutier et al. 2011; Kumar et al. 2015), oil yield (Chandrawati and Yadav 2017), fiber (Wu et al. 2018), and seed and flower traits (Sudarshan et al. 2017; Xie et al. 2018). However, with the availability of whole-genome sequencing, high-throughput genotyping, and transcriptome data, analytical approaches such as GWAS and gene association studies can be readily applied to identify potential candidate genes and signaling pathways involved in flax abiotic stress tolerance mechanisms. As a result, an increasing number of QTLs, putative genes, and gene families induced in response to various abiotic stress conditions have recently been identified in flax (Dash et al. 2014; Yu et al. 2014; Soto-Cerda et al. 2019; Khan et al. 2020; Wang et al. 2021). In this chapter, we summarize the current state of knowledge of the genetics of abiotic stress in flax and highlight recent advances in genes, pathways, and regulatory networks, with a particular emphasis on three key abiotic stressors: drought, salinity, and heavy metals.

6.2 Genetics and QTL Identification Studies in Flax

Cultivated flax is a diploid with 15 chromosomes in its haploid complement (2n = 2x = 30). The haploid genome of flax is estimated to be ~368 Mb in size according to the bacterial artificial chromosome (BAC)-based physical map (Ragupathy et al. 2011) or ~373 Mb based on flow cytometry (Wang et al. 2012). Complex quantitative traits such as abiotic stress tolerance are controlled by several functional genes and genetic networks. Identification of QTLs or genes linked to traits can lead to strategies to improve or modify these traits. Genetic linkage maps are valuable tools for identifying chromosomal locations harboring genes, regulatory elements and QTLs associated with traits of interest within populations. In flax, a limited number of linkage maps have been created to date. The first linkage map of flax was constructed with 213 amplified fragment length polymorphism (AFLP) markers in 1998 (Spielmeyer et al. 1998). Since then, molecular markers such as restriction fragment length polymorphism (RFLP) (Hausner et al. 1999; Oh et al. 2000), random amplified polymorphic DNA (RAPD) (Cullis et al. 1999; Oh et al. 2000; Fu et al. 2002; Kumari et al. 2017), AFLP (Everaert et al. 2001; Adugna et al. 2006; Chandrawati et al. 2014), specific locus amplified fragment (SLAF) (Wu et al. 2018), single nucleotide polymorphism (SNP) (Yi et al. 2017) and simple sequence repeat (SSR) (Cloutier et al. 2009, 2011; Soto-Cerda et al. 2011) have been used to construct maps with a higher density of markers (up to 700–800 SSRs or >1000 SNPs) (Cloutier et al. 2019). The availability of high-density genetic maps of flax serves as a platform for efficient and accurate mapping of QTLs to identify loci and genes related to abiotic stresses.

Molecular markers and genetic linkage maps have been used in flax to identify, map and characterize genes and QTLs for important agronomic, seed quality, fiber, and disease resistance traits. A recent report (You and Cloutier 2020) noted that a total of 313 QTLs for 31 different quantitative traits have been reported in flax so far. SSRs and SNPs were used to identify most of these QTLs, indicating the reliability and popularity of these molecular markers. Although these studies have greatly improved our understanding of the genes and QTLs associated with agronomic traits in flax, few studies have been dedicated to the genetics of abiotic stress-related traits. Many stress-related QTLs have been identified in major crops such as wheat (Bennett et al. 2012; Sharma et al. 2017; Asif et al. 2021), barley (Xue et al. 2009; Fan et al. 2015), rice (Barik et al. 2020; Chen et al. 2020), cotton (Abdelraheem et al. 2015; Diouf et al. 2017), maize (Cui et al. 2015; Luo et al. 2017) and soybean (Wang et al. 2016; Zhang et al. 2021b). However, the information on stress-related QTLs in flax remains limited.

The study of abiotic stresses is complicated by the genetically complex response of plants to these stresses, which is controlled by several common and conserved regulatory pathways and is strongly influenced by environmental factors (Zhu 2016; Zhang et al. 2021a). For instance, QTLs for drought tolerance contribute to growth and productivity under salinity stress conditions by reducing salt accumulation (Sharma et al. 2011, 2014), suggesting that some QTLs or genes can have pleiotropic effects under multiple stress conditions. Recent and rapid improvements in sequencing technologies, genome analysis methods, and marker development in flax have created the opportunity to revisit how we tackle specific components of abiotic stresses and are ultimately paving the way for marker-assisted pyramiding of QTLs/genes to improve abiotic stress tolerance in flax. In recent years, GWAS has emerged as a powerful approach for detecting QTLs or quantitative trait nucleotides (QTNs) linked to both agronomic and drought-related traits (Soto-Cerda et al. 2018, 2019). GWAS in flax led to the identification of drought tolerance QTNs and their putatively associated candidate genes (Sertse et al. 2019, 2021).
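The association step at the core of GWAS can be illustrated with a minimal single-marker scan. The sketch below is illustrative only and is not the method used in the studies cited above: it regresses a trait on allele dosage one SNP at a time and ranks markers by the absolute t statistic of the slope. The marker names, the genotype coding (0 or 2, as might be assumed for a largely homozygous inbred crop), and the phenotype values are all hypothetical. The sketch also omits the corrections for population structure and kinship, and the multi-locus models, on which published analyses rely.

```python
from math import sqrt


def marker_t_statistic(genotypes, phenotypes):
    """t statistic for the slope of phenotype regressed on allele dosage."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_p = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((p - mean_p) ** 2 for p in phenotypes)
    sxy = sum((g - mean_g) * (p - mean_p) for g, p in zip(genotypes, phenotypes))
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic marker or constant trait: no signal
    r = sxy / sqrt(sxx * syy)
    if abs(r) >= 1.0:
        return float("inf") if r > 0 else float("-inf")
    # Equivalent to the usual slope / standard-error form for simple regression
    return r * sqrt((n - 2) / (1 - r * r))


def scan(markers, phenotypes):
    """Return (marker_id, t) pairs sorted by decreasing |t|."""
    results = [(name, marker_t_statistic(g, phenotypes)) for name, g in markers.items()]
    return sorted(results, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: 8 accessions, allele dosage 0 or 2 at three SNPs
phenotypes = [10.1, 9.8, 10.3, 10.0, 14.9, 15.2, 15.1, 14.8]  # e.g., a root trait
markers = {
    "snp_A": [0, 0, 0, 0, 2, 2, 2, 2],  # tracks the trait
    "snp_B": [0, 2, 0, 2, 0, 2, 0, 2],  # essentially unrelated to the trait
    "snp_C": [2, 2, 2, 2, 2, 2, 2, 2],  # monomorphic, carries no information
}
ranked = scan(markers, phenotypes)
for name, t in ranked:
    print(f"{name}\tt = {t:.2f}")
```

In practice the top-ranked markers would still need a significance threshold adjusted for multiple testing before being reported as QTNs.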

6.2.1 Increasing Flax Genome Resources and Their Role in Abiotic Stress Study

During the past few decades, revolutionary advances in the field of genomics have resulted in a significant reduction in the cost of sequencing (Liu et al. 2012; Wambugu et al. 2018). These cost reductions, combined with higher throughputs and robust computational pipelines, have led to the production of reference genomes of many species, including large and complex crop genomes such as wheat (International Wheat Genome Sequencing Consortium (IWGSC) 2018; Wambugu et al. 2018; Purugganan and Jackson 2021). Our knowledge of flax genetics and genomics has been facilitated by genotyping and sequencing of a large number of genome-wide genetic variants. Soon after the publication of detailed genetic and physical maps of flax (Cloutier et al. 2012a, b), the first flax genome assembly (Canadian linseed cultivar CDC Bethune v.1.0) was published in 2012 using Illumina paired-end libraries (Wang et al. 2012). The genome assembly was supplemented by expressed sequence tags (ESTs) of more than ten tissues (Venglat et al. 2011), a BAC-based physical map and its BAC-end sequences (BES) (Ragupathy et al. 2011) and a consensus linkage map based on SSRs (Cloutier et al. 2009). Following that, chromosome-scale pseudomolecules (CDC Bethune v.2.0) were published through the incorporation of a Bionano genome (BNG)-based optical map, a BAC-based physical map, and the information derived from multiple genetic maps (You et al. 2018). More recently, the genome sequence of the Russian fiber flax cultivar Atlant was assembled by combining high-accuracy Illumina short reads with error-prone long reads from Oxford Nanopore Technologies (ONT) (Dmitriev et al. 2020b), and that of the Chinese fiber flax cultivar Yiya-5 was assembled using PacBio HiFi and Hi-C sequencing data to improve contiguity, accuracy, and completeness (Sa et al. 2021).

These high-quality reference genomes are being used for transcriptome studies, as templates for genomic sequences of a wide variety of germplasm, as a resource for the identification of gene families and polymorphisms, and for comparative analyses of gene expression in response to abiotic stresses (Gorshkova et al. 2018; Guo et al. 2019; Wu et al. 2019b; Dmitriev et al. 2020a; Khan et al. 2020; Zhang et al. 2020; Ali et al. 2021; Wang et al. 2021). Furthermore, the advent of high-throughput technologies such as NGS, proteomics, and transcriptome sequencing (RNA-Seq) has led to the discovery of thousands of candidate genes and enormous datasets related to abiotic stresses in a variety of model plant organisms such as Arabidopsis and rice (Todaka et al. 2012; Atkinson et al. 2013; Haak et al. 2017). This sets the stage for dissecting the genetic mechanisms of abiotic stress adaptation in flax through the discovery of novel genes and their functions, gene networks or pathways, and their expression patterns. Findings in these species provide instrumental support for developing strategic breeding approaches in flax and for leveraging effective genetic engineering and biotechnology approaches to enhance abiotic stress tolerance in flax. However, despite the progressive improvement in flax genomic resources over the past few years, the molecular mechanisms underlying abiotic stress-induced responses in flax remain largely unexplored. In the following sections, we describe recent studies on abiotic stresses in flax, focusing on drought, salinity, and heavy metals, and we highlight the current state of knowledge, including a description of the stress-related genes and gene families hypothesized to play a role in abiotic stress tolerance.

6.2.1.1 Drought Stress

Drought is one of the most devastating abiotic stressors affecting global crop productivity (Lesk et al. 2016; Zhang et al. 2018; Leng and Hall 2019). Driven by its significance, progress has been made over the past few decades in understanding the molecular mechanisms of drought tolerance in model plant species (e.g., Arabidopsis), and stress-induced gene networks and signal transduction pathways have been uncovered (Umezawa et al. 2006; Xiong et al. 2006; Shanker et al. 2014; Takahashi et al. 2018). Drought tolerance in plants is complex and governed by genetic and environmental factors, and their interactions. Moreover, microRNAs (Zhou et al. 2013; Ferdous et al. 2015), QTLs (Fan et al. 2015; Sertse et al. 2021), phytohormones such as auxins and abscisic acid (ABA) (Shi et al. 2014; Sah et al. 2016), membrane transporters such as ATP-binding cassette (ABC) transporters, ion transporters and aquaporins (Kuromori et al. 2010; Jarzyniak and Jasiński 2014), kinases and phosphatases such as SnRKs, PP2Cs, MAPKs, CDPKs, and RLKs (Chen et al. 2021), and transcription factors (TFs) such as WRKY, NAC, bZIP, MYB, and GATA (Yoshida et al. 2010; Golldack et al. 2011; Yuan et al. 2021; Zhao et al. 2021) have been shown to play key roles in drought stress signaling.

Due to its shallow root system, flax is more susceptible to drought than other economic crops (Sertse et al. 2019). Under water-limiting conditions, fiber yield and quality decrease (Heller and Byczyńska 2015). In flax, a few studies have been undertaken to identify drought-resilient genotypes (Qi et al. 2010; Sharma et al. 2012; Asgarinia et al. 2017; Mahfouze and Mahfouze 2017; Sertse et al. 2019, 2021), but the adaptive mechanisms and gene networks underlying tolerance remain largely speculative. Therefore, elucidating the genetic basis of flax response to drought stress is crucial for developing flax varieties with improved agronomic traits (e.g., high fiber/seed yield and quality) that can withstand such stress.

In recent years, genome-wide analyses, transcriptomics, and functional genomics have been conducted to identify candidate genes associated with drought tolerance in flax (Table 6.1). For instance, microarrays and RNA-Seq were used to investigate changes in gene expression in roots and shoots of flax under drought stress (Dash et al. 2014). Using microarray-based genome-wide expression analysis, the same authors also reported the first high-resolution transcriptome dataset of drought-tolerant cultivars generated at various developmental stages under drought stress conditions (Dash et al. 2017). Their study provided a genomic resource for uncovering the genes and intrinsic pathways involved in drought tolerance. Further, to understand the molecular mechanisms underlying drought stress in flax, a transcriptome analysis of drought-tolerant and susceptible varieties of flax (linseed) at the seedling stage was performed by combining PacBio single-molecule real-time (SMRT) long-read and Illumina short-read sequencing (Wang et al. 2021). Examination of differentially expressed genes (DEGs) revealed increased expression of NADP biosynthesis genes, as well as of pyrroline-5-carboxylate synthase (P5CS) and pyrroline-5-carboxylate reductase (P5CR) genes, which encode enzymes for the biosynthesis of proline. Proline plays a key role in maintaining osmotic balance and shielding DNA from reactive oxygen species (ROS) damage. Additionally, upregulation of plant-specific TFs, such as dehydration responsive element binding (DREB) and heat stress transcriptional factors (HSFs), was also reported.
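The notion of a differentially expressed gene can be made concrete with a minimal fold-change calculation. The sketch below is an illustration under stated assumptions rather than the pipeline of the studies cited above: it compares mean normalized expression between control and drought-stressed replicates and flags genes whose absolute log2 fold change is at least 1. The expression values, the pseudocount, and the threshold are hypothetical, and P5CS and P5CR are used only as labels taken from the text. Published analyses additionally model count dispersion and apply statistical tests with multiple-testing correction.

```python
from math import log2


def log2_fold_change(control, stressed, pseudocount=1.0):
    """log2 ratio of mean stressed to mean control expression.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no reads in one condition.
    """
    mean_c = sum(control) / len(control)
    mean_s = sum(stressed) / len(stressed)
    return log2((mean_s + pseudocount) / (mean_c + pseudocount))


def call_degs(expression, threshold=1.0):
    """Label genes as 'up' or 'down' when |log2FC| >= threshold."""
    degs = {}
    for gene, (control, stressed) in expression.items():
        lfc = log2_fold_change(control, stressed)
        if abs(lfc) >= threshold:
            degs[gene] = "up" if lfc > 0 else "down"
    return degs


# Hypothetical normalized expression values: (control replicates, drought replicates)
expression = {
    "P5CS": ([20, 22, 18], [95, 101, 92]),
    "P5CR": ([30, 28, 32], [88, 92, 90]),
    "housekeeping": ([50, 52, 48], [51, 49, 50]),
    "repressed": ([64, 60, 68], [14, 16, 15]),
}
degs = call_degs(expression)
print(degs)
```

A threshold of 1 on the log2 scale corresponds to a twofold change, which is a common but arbitrary convention.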

In another study, 141 flax accessions were assessed to identify drought-resilient flax genotypes (Soto-Cerda et al. 2019). Genome-wide selective sweep and multiple regression association (MRA) analyses were performed using 394 genome-wide SSR loci to identify those associated with drought tolerance. Further, a search for candidate genes linked to these markers revealed several candidates, including some involved in abscisic acid, calcium (Ca2+), and auxin signaling pathways. The orthologues of these genes in model plants encode cyclin-dependent kinase subunit (CKS) proteins, plant-specific drought-responsive TFs (WRKY, GRAS, MYB1R-1), and small auxin upregulated RNA-75 (SAUR75) proteins that are known to play prominent roles in root growth and abiotic stress response. Following that, the same authors assessed the agronomic and root traits of 41 flax accessions grown under drought stress and irrigated conditions across three environments (Soto-Cerda et al. 2020). A genomic dataset of 170,534 SNP markers was selected to perform single- and multi-locus GWAS that identified 15 QTNs associated with drought stress tolerance. Further, a search for potential candidate genes located near major QTNs using the flax reference genome and public databases revealed several genes related to drought-responsive pathways. In a similar field-based study, large-effect QTN loci harbored genes previously reported to play a role in drought and other abiotic stress conditions (Sertse et al. 2021). These included genes Lus10009480 and Lus10030150 that encode a WAX INDUCER1 protein and a STRESS-ASSOCIATED PROTEIN (SAP), respectively, as well as stress-responsive genes such as SHN1/WIN1, CIPK10 and TIC110 (Table 6.1).
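The search for candidate genes near a QTN amounts to an interval lookup against a genome annotation. The following sketch shows one simple way to do this, assuming a fixed window on either side of the QTN. The gene identifiers, chromosome names, coordinates, and the 100 kb window are hypothetical placeholders chosen for illustration. They are not taken from the flax reference genome or from the cited studies, which may define the search interval differently (for example, from linkage disequilibrium decay).

```python
def genes_near_qtn(qtn, genes, window=100_000):
    """List genes overlapping a window around a QTN, nearest first.

    qtn:   (chromosome, position)
    genes: iterable of (gene_id, chromosome, start, end), with start <= end
    """
    chrom, pos = qtn
    lo, hi = pos - window, pos + window
    hits = []
    for gene_id, g_chrom, start, end in genes:
        # Keep genes on the same chromosome that overlap [lo, hi]
        if g_chrom == chrom and start <= hi and end >= lo:
            if start <= pos <= end:
                distance = 0  # the QTN lies inside the gene
            else:
                distance = min(abs(start - pos), abs(end - pos))
            hits.append((gene_id, distance))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical annotation records and QTN position
annotation = [
    ("geneA", "Lu1", 150_000, 152_000),
    ("geneB", "Lu1", 400_000, 405_000),
    ("geneC", "Lu2", 150_000, 151_000),
    ("geneD", "Lu1", 199_000, 201_000),
]
candidates = genes_near_qtn(("Lu1", 200_000), annotation)
print(candidates)
```

The resulting list is only a starting point: candidates are then usually prioritized by the known function of their orthologues, as was done for the genes compiled in Table 6.1.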

Table 6.1 List of genes/proteins reported as candidates in abiotic stress tolerance in flax (Linum usitatissimum L.)

Gene name/ID | Protein name | Biological function | Molecular function | Stress | Discovery method | References
WIN1/SHN1 (Lus10009480) | Wax inducer 1/SHINE1 | Regulates cuticular wax and cutin biosynthesis | Transcription factor (TF) | Drought | Orthology-based function prediction (OBFP) | Sertse et al. (2021)
Lus10030150 | Stress-associated protein (SAP) | Improves tolerance to drought stress and seed development | Zinc finger protein | Drought | OBFP | Sertse et al. (2021)
CKS (Lus10036579) | Pigeonpea cyclin-dependent kinase regulatory subunit (CKS) | Greater root length and survival rate during drought stress | Cell cycle-dependent protein kinase | Drought | OBFP | Sertse et al. (2019)
GRAS23 (Lus10036623) | Drought-responsive GRAS 23 (GRAS23) | Regulation of drought-responsive genes | TF | Drought | OBFP | Sertse et al. (2019)
MIZ1 (Lus10036630) | MIZU KUSSEI 1 | Hydrotropism in lateral roots by maintaining auxin level and contributing to drought adaptation |  | Drought | OBFP | Sertse et al. (2019)
MYB1R-1 (Lus10036652) | MYB one repeats (1R)-1 (MYB1R-1) | Overexpression improves drought tolerance | TF | Drought | OBFP | Sertse et al. (2019)
AL (Lus10026268) | PHD finger protein alfin-like | Overexpression enhances tolerance to drought | TF | Drought | OBFP | Sertse et al. (2019)
SAUR75 (Lus10026294) | Small auxin-upregulated RNA 75 | Root growth and abiotic stress response such as drought tolerance | Auxin-responsive protein | Drought | OBFP | Sertse et al. (2019)
P5CS | Pyrroline-5-carboxylate synthase (P5CS) | Proline content and improvement of drought tolerance | Rate-limiting enzyme in proline biosynthesis | Drought | RT-PCR | Wang et al. (2021)
P5CR | Pyrroline-5-carboxylate reductase (P5CR) | Proline content and improvement of photosynthetic response under drought stress | Last catalytic step of the biosynthesis of proline | Drought | RT-PCR | Wang et al. (2021)
MLH1 | DNA mismatch repair protein MLH1 | DNA repair function and induction of drought tolerance | DNA mismatch repair protein | Drought | RT-PCR | Wang et al. (2021)
DHN | Dehydrin | Overexpression of gene improves drought tolerance | Stress-related protein | Drought | RT-PCR | Wang et al. (2021)
SOD | Superoxide dismutase (SOD) | First line of defense against oxidative stress | Detoxification-associated enzyme | Salinity | Gel electrophoresis and SDS-PAGE | El-Beltagi et al. (2008); Almoataz Bellah et al. (2021)
PPO | Polyphenol oxidase (PPO) | Oxygen-dependent oxidation of phenols to quinones | Detoxification-associated enzyme | Salinity |  | Almoataz Bellah et al. (2021)
POD | Peroxidase (POD) | Neutralization of reactive oxygen species (ROS) | Detoxification-associated enzyme | Salinity |  | Almoataz Bellah et al. (2021)
WRKY | WRKY transcription factor | Abscisic acid signaling, and other abiotic stress tolerance mechanisms | TF | Salinity | – | Yu et al. (2016)
miR398 | N/A | Targets genes encoding a copper/zinc superoxide dismutase (CuZnSOD) known to protect plants from oxidative stress | Micro RNA (miRNA) | Salinity | – | Yu et al. (2016)
miR530 | N/A | Targets WRKY family transcription factors known to be involved in abscisic acid signaling, and other abiotic stress tolerance mechanisms | miRNA | Salinity | – | Yu et al. (2016)
UGT | UDP-glycosyltransferases | Detoxification of ROS and cell wall modification | Detoxification-associated enzyme | Heavy metal (HM) aluminum (Al) | qPCR analysis | Dmitriev et al. (2016)
GST | Glutathione S-transferases | Oxidative stress defense | Detoxification-associated enzyme | HM-Al | qPCR analysis | Dmitriev et al. (2016)
CAX3 | Ca2+/H+ exchanger 3 | Ca2+-mediated intracellular regulation | Vacuolar ion transporter | HM-Al | qPCR analysis | Zyablitsin et al. (2018)
AGL62 | MADS-box protein AGL62 | Plant growth, development, and stress response | TF | HM-Al | RNA-Seq | Krasnov et al. (2019)
NAC100 | NAC domain-containing protein 100 | Phytohormone signaling and growth regulation | TF | HM-Al | RNA-Seq | Krasnov et al. (2019)
PGLR | Polygalacturonase | Degradation of pectin | Pectin-modifying enzyme | HM-Al | RNA-Seq | Krasnov et al. (2019)
CESA | Cellulose synthase | Biosynthesis of cell wall polymers | β-glycosyltransferase | HM-Al | RNA-Seq | Krasnov et al. (2019)
Bgl | Beta-glucosidase | Cell wall biosynthesis and plant development | Glycosyl hydrolase enzyme | HM-Al | RNA-Seq | Krasnov et al. (2019)
At2g38180 | GDSL esterase | Negative regulation of auxin signaling | Hydrolase enzyme | HM-Al | RNA-Seq | Krasnov et al. (2019)
LuACBP1 | Acyl-CoA-binding domain-containing protein 1 (ACBP1) | Binds lead (Pb) | Transporter protein | HM-lead (Pb) | qRT-PCR | Pan et al. (2020)
LuACBP2 | Acyl-CoA-binding domain-containing protein 2 (ACBP2) | Binds lead (Pb) | Transporter protein | HM-Pb | qRT-PCR | Pan et al. (2020)
LuSOD1 | Superoxide dismutase (SOD) | First line of defense against oxidative stress | Antioxidant enzyme | HM-Pb | qRT-PCR | Pan et al. (2020)
LuPOD1 | Peroxidase isoform 1 (POD1) | Hydrogen peroxide-dependent substrate oxidation | Heme-containing glycoprotein/antioxidant enzyme | HM-Pb | qRT-PCR | Pan et al. (2020)
LuPOD2 | Peroxidase isoform 2 (POD2) | Hydrogen peroxide-dependent substrate oxidation | Heme-containing glycoprotein/antioxidant enzyme | HM-Pb | qRT-PCR | Pan et al. (2020)
miR393 | N/A | Regulates auxin signaling F-box 2 (AFB2) protein expression | miRNA | HM-Al | qPCR | Dmitriev et al. (2017)
miR390 | N/A | Targets growth-regulating factor 5 (GRF5) and TAS3-tasiRNAs involved in growth and development processes under stress conditions | miRNA | HM-Al | – | Dmitriev et al. (2017)
miR319 | N/A | Targets teosinte branched/cycloidea/PCF (TCP) transcription factors (TCP3 and TCP4) controlling growth and development | miRNA | HM-Al | qPCR | Dmitriev et al. (2017)
LuABCC9 | ATP-binding cassette subfamily C (ABCC9) | ATPase activity, and metal ion binding and transport | ATP-binding cassette (ABC) transporter | HM-cadmium (Cd) | OBFP | Khan et al. (2020)
LuABCC10 | ATP-binding cassette subfamily C (ABCC10) | ATPase activity, and metal ion binding and transport | ABC transporter | HM-Cd | OBFP | Khan et al. (2020)
LuABCG58 | ATP-binding cassette subfamily G (ABCG58) | ATPase activity, and metal ion binding and transport | ABC transporter | HM-Cd | OBFP | Khan et al. (2020)
LuABCG59 | ATP-binding cassette subfamily G (ABCG59) | ATPase activity, and metal ion binding and transport | ABC transporter | HM-Cd | OBFP | Khan et al. (2020)
LuABCG71 | ATP-binding cassette subfamily G (ABCG71) | ATPase activity, and metal ion binding and transport | ABC transporter | HM-Cd | OBFP | Khan et al. (2020)
LuABCG72 | ATP-binding cassette subfamily G (ABCG72) | ATPase activity, and metal ion binding and transport | ABC transporter | HM-Cd | OBFP | Khan et al. (2020)
LuABCG73 | ATP-binding cassette subfamily G (ABCG73) | ATPase activity, and metal ion binding and transport | ABC transporter | HM-Cd | OBFP | Khan et al. (2020)
LuHMA3 | Heavy metal-associated 3 (HMA3) | ATPase activity, and metal ion binding and transport | Heavy metal transporter | HM-Cd | OBFP | Khan et al. (2020)
LuHMA4 | Heavy metal-associated 4 (HMA4) | ATPase activity, and metal ion binding and transport | Heavy metal transporter | HM-Cd | OBFP | Khan et al. (2020)
LuNAC003 | NAC domain-containing protein (NAC83_ARATH) | Protoxylem vessel formation in roots | TF | Heat, salinity and drought | RT-qPCR | Saha et al. (2021)
HSFA2a | Heat stress transcription factor A-2a | Heat shock protein (HSP) synthesis | TF | Heat, salinity and drought | RT-qPCR | Saha et al. (2021)

6.2.1.2 Salinity Stress

Salinity is another important abiotic stress that threatens the growth, development, and production of flax and other major crops in a number of nations (Ashraf and Fatima 1994; Wu et al. 2019a; Dubey et al. 2020; Zhao et al. 2020). According to estimates, salinity is expected to harm more than 59% of the world’s arable land by 2050 (Wang et al. 2003). Approximately 30% of the agricultural land of the Canadian Prairies has either been salinized or is at risk of becoming salinized (Steppuhn and Wall 1999; Wall et al. 2015). Salt-stress-related production losses in flax have recently intensified (Sadak and Dawood 2014; Wu et al. 2019a). Salinity-induced stress is caused by hyperosmotic and hyperionic effects that lead to membrane disorganization, metabolic toxicity, photosynthesis decline, and inhibition of enzymatic activities (Ashraf and Fatima 1994; Dubey et al. 2020; Zhao et al. 2020). Osmotic and toxicity stresses caused by salinity were reported for several flax cultivars, leading to reductions in mineral uptake, protein, and oil content, as well as changes in fatty acid composition (Sadak and Dawood 2014). Despite these impacts, few studies have focused on the molecular genetic mechanisms that underpin salinity tolerance in flax.

Insights from several studies in model plant species reveal that salt tolerance relies on a complicated network involving the interplay of various physiological responses controlled by a number of genes and gene products (Munns and Tester 2008; van Zelm et al. 2020). This network primarily includes genes that regulate salt absorption and transport, osmotic stress tolerance, and cell and tissue development of plants grown in high-salt conditions (Munns and Tester 2008). Furthermore, research on model plant organisms has revealed that drought-inducible genes and TFs are also triggered by high salinity stress, implying that drought and salt stresses are interconnected (Seki et al. 2002; Golldack et al. 2014).

To understand the molecular mechanism of salinity tolerance, a digital gene expression (DGE) analysis was performed in flax to examine the transcriptome profile in response to saline-alkaline stress (Yu et al. 2014). Under salt-induced stress, the authors noted an enhanced expression of genes encoding TFs (e.g., NAC), heat shock protein 70 (HSP70) and beta-glucosidase. Further, a high level of expression of key regulatory genes involved in abiotic stress responses, including those related to calcium-dependent protein kinase (CDPK), mitogen-activated protein kinase kinase kinase (MAPKKK), phytohormones (e.g., ABA), WRKY, peroxiredoxin (PrxR), and ion transporters, was reported. In another study, the same group used high-throughput sequencing to examine the microRNAs (miRNAs) and the corresponding degradome of flax seedlings grown under different salt stress conditions to investigate the potential epigenetic mechanisms at play in salt tolerance (Yu et al. 2016). The degradome and transcriptome revealed complementary expression between 29 miRNAs and their targets under different saline conditions. Furthermore, two miRNAs, miR398 and miR530, were suggested to be involved in salt stress response. The antioxidant-associated enzyme copper/zinc superoxide dismutase (Cu/ZnSOD) and the TF WRKY, both of which have been linked to salt stress, were found to be among the genes targeted by the differentially expressed miRNAs (Table 6.1).

More recently, transcriptome sequencing was performed to investigate differentially expressed unigenes (DEUs) in flax under different NaCl stress conditions (Wu et al. 2019a). The authors performed quantitative reverse transcription PCR (qRT-PCR) to validate the discovered DEUs and used publicly available resources to perform large-scale analyses of EST-derived SSRs to interpret the role of genes involved in salt stress response in flax. Their transcriptome sequence analysis uncovered DEUs that were significantly altered in response to salt exposure at three different time intervals. The majority of the DEUs found were classified as being involved in signal transduction of plant hormones, photosynthesis-antenna proteins, and amino acid biosynthesis. In addition, some of the DEUs were homologous to known salt-tolerance genes and to plant TFs that regulate abiotic stress responses, such as bZIP, HD-Zip, WRKY, NAC, MYB, GATA, CAMTA, and B3 (Wu et al. 2019a). The above-cited research, which led to the discovery of several genes, enzymes, and miRNAs involved in salinity tolerance in flax, is summarized in Table 6.1.
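The idea of complementary expression between a miRNA and its target, mentioned above for the salt stress degradome study, can be sketched as a search for negatively correlated expression profiles across treatments. The example below is a simplified illustration and not the analysis performed in the cited work: it computes a Pearson correlation for each predicted miRNA-target pair and keeps pairs at or below a chosen cutoff. All expression values, the "control" pair, and the cutoff of -0.8 are hypothetical. Only the pairing of miR398 with CuZnSOD is taken from the text.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return sxy / sqrt(sxx * syy)


def inverse_pairs(mirna_expr, target_expr, pairs, cutoff=-0.8):
    """Return (miRNA, target, r) for pairs whose correlation is <= cutoff."""
    selected = []
    for mirna, target in pairs:
        r = pearson(mirna_expr[mirna], target_expr[target])
        if r <= cutoff:
            selected.append((mirna, target, round(r, 3)))
    return selected


# Hypothetical expression across four increasing salt treatments
mirna_expr = {
    "miR398": [100, 60, 30, 10],
    "control_miRNA": [40, 42, 39, 41],
}
target_expr = {
    "CuZnSOD": [5, 20, 45, 80],
    "unrelated_gene": [10, 30, 12, 28],
}
pairs = [("miR398", "CuZnSOD"), ("control_miRNA", "unrelated_gene")]
complementary = inverse_pairs(mirna_expr, target_expr, pairs)
print(complementary)
```

A negative correlation alone does not demonstrate regulation, which is why degradome evidence of target cleavage is used alongside expression data.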

6.2.1.3 Heavy Metal Stress

The occurrence of a wide range of heavy metals (HMs) in soil can either benefit or harm plants. At higher concentrations, HMs create toxicity problems, inhibiting growth. The majority of hazardous HMs are present in the roots, from which they are translocated to other parts of the plant (DalCorso et al. 2013). At the cellular level, HMs exert their toxicity via four major mechanisms: (1) competition for absorption by roots because of their similarity with nutrient cations, such as nickel (Ni) competing with iron (Fe) and zinc (Zn), which can cause deficiencies in the latter; (2) disruption of the function of essential proteins by directly binding and reacting with sulfhydryl groups (–SH); (3) displacement of essential cations from their native binding sites, rendering the protein inert; and (4) production of ROS (Sharma and Dietz 2009; Lal 2010; DalCorso et al. 2013; Emamverdian et al. 2015). At the genomic level, HM stress activates multiple specific stress-induced genes and signaling pathways, including calcium signaling, hormone signaling, MAPK signaling, ROS signaling, and numerous signaling molecules that relay HM-induced stress signals, resulting in increased expression of stress-responsive genes and TFs (Lal 2010; Kumar and Trivedi 2016). Genes encoding metal chelators and transporter proteins are among the stress-responsive genes routinely activated during HM stress (Lal 2010; Li et al. 2020). Differential expression of miRNAs under HM-induced stress hints at their role in HM avoidance and tolerance mechanisms (Mendoza-Soto et al. 2012; Yang and Chen 2013). Flax absorbs and accumulates high concentrations of HMs compared to other crops (Angelova et al. 2004). As such, flax is a promising candidate for phytoremediation (Smykalova et al. 2010; Saleem et al. 2020a). Flax response to lead (Pb), cadmium (Cd), zinc (Zn), copper (Cu), and nickel (Ni) has been reported in a number of studies. For instance, the effects of various HMs on the germination of flax cultivars were investigated, and significant differences in root elongation among cultivars were noted (Soudek et al. 2010). Even though HMs are a problem for flax cultivation (Dmitriev et al. 2019), little is known about the underlying molecular genetic mechanisms that control flax response and tolerance to metal toxicity.

B. Khadka and S. Cloutier

Breakthroughs in genomics, transcriptomics, proteomics, and metabolomics have aided in the characterization of stress-inducible genes in flax (Table 6.1). In one study, high-throughput sequencing data revealed significant changes in the expression of glutathione S-transferase (GST) and UDP-glycosyltransferase (UGT) genes, leading the authors to hypothesize that these enzymes play an important role in cell wall modification and in protection against oxidative damage induced by aluminum (Al) stress (Dmitriev et al. 2016). In another study, high-throughput sequencing and qPCR analysis revealed altered expression of miR319, miR390, and miR393, from which the authors postulated a role for these miRNAs in the response to Al-induced stress through the regulation of target proteins like Cu/ZnSOD and WRKY family TFs that are involved in flax growth and development (Dmitriev et al. 2017). Further, the transcriptome of flax cultivars with various levels of Al tolerance was analyzed (Krasnov et al. 2019). TFs such as agamous-like MADS-box protein AGL62 and NAC domain-containing protein 100 were suggested to play key roles in regulating flax growth in response to Al-induced stress. Indeed, highly expressed genes encoding enzymes involved in cell wall modification, such as cellulose synthase, pectinesterase, polygalacturonase, beta-glucosidase, and GDSL esterase, were suggested as key players in flax tolerance to aluminum. In a study investigating the effect of lead, transcriptomics of 19 flax cultivars was performed to identify varieties that were tolerant or sensitive to lead stress (Pan et al. 2020). Further, transcript-level expression analysis of stress-related genes in these cultivars identified multiple genes that are strongly expressed upon exposure to lead. Similarly, increased levels of superoxide dismutase (SOD) and peroxidase (POD) activity in roots and shoots of L. usitatissimum grown in copper-contaminated soils were reported (Saleem et al. 2020b).
The elevated activity of these enzymes suggested a crucial role in scavenging ROS to control the oxidative stress generated after exposure to a high concentration of copper. A study of zinc deficiency-induced stress in flax under unfavorable pH revealed the induction of genes encoding metal ion transporters and proteins involved in cell wall biosynthesis and photosynthesis (Dmitriev et al. 2019). The roles of several genes involved in the mechanism of cadmium tolerance have become known in some model plant species (Gallego et al. 2012; Song et al. 2017; Fu et al. 2019; Sheng et al. 2019). In flax, high levels of cadmium inhibited root growth to varying degrees, and this inhibition was accompanied by increased levels of lipid peroxides, protein oxidation, hydrogen peroxide, and membrane permeability (Belkadhi et al. 2013, 2015). Significant changes in the activities of scavenging and antioxidant-associated enzymes like catalase, guaiacol peroxidase, SOD, and ascorbate peroxidase were also observed. More recently, a genome-wide analysis and characterization of the ABC transporter and heavy metal-associated (HMA) domain-containing gene families was conducted using the available flax genomes (Khan et al. 2020). This study identified a total of nine ABC transporter and HMA genes for which the functional orthologues are linked to cadmium accumulation in Arabidopsis, maize, and rice.

6.3 Conclusions and Future Directions

Abiotic stresses are a primary cause of crop loss worldwide. Over the past decades, exposure to a range of abiotic stresses such as drought, salinity, heat, and heavy metal toxicity has hampered the productivity and cultivation of flax. This book chapter highlights the recent developments in identifying various genetic factors involved in flax’s response to abiotic stresses (Fig. 6.1). To date, tolerance to heat and cold stresses remains largely unexplored. Recent progress in molecular genetics and genomics has contributed significantly to our understanding of the basic mechanisms of plant responses to various abiotic stresses. The available flax genome resources (Cloutier et al. 2012b; Wang et al. 2012; You et al. 2018) enabled the identification of potential QTLs, genes, and gene families implicated in abiotic stress tolerance. Most of the abiotic stress-responsive genes reported in flax are either functional or regulatory genes. The functional genes encode proteins (such as HSP, ABC transporters, SOD, POD, GST, and UGT) that are directly engaged in the protection against abiotic stresses, whereas the regulatory genes include those encoding transcription factors (e.g., GRAS23, NAC 100), miRNAs (e.g., miR390, miR398), and protein kinases (e.g., CDPK, CKS), which modulate signal transduction and gene expression during stress events. The stress-specific overexpression of regulatory genes encoding TFs such as WRKY and NAC has been shown to enhance drought and salt tolerance in model plant species (Datta et al. 2012; Goel and Madan 2014; Nakashima et al. 2014; Jiang et al. 2019). To date, there have been no reports of flax being genetically engineered for abiotic stress resistance. The increasing availability of high-quality flax genomics and transcriptomics data will aid in overcoming the hurdles posed by flax genome complexity and facilitate the identification of potential stress-responsive genes and their crosstalk during abiotic stress signaling. Further, the discovery of novel stress-inducible genes and the characterization of their transcripts may shed light on gaps in our knowledge of complicated flax responses to a variety of environmental stressors. Additionally, the systematic dissection of QTLs determining the genetic diversity related to abiotic stress will aid researchers in developing more tailored and effective flax genotypes for abiotic stress conditions. The QTLs and putative genes can be exploited using cutting-edge technologies like transcription activator-like effector nucleases (TALENs) and CRISPR/Cas9-based genome editing to introduce abiotic stress tolerance in flax (Sauer et al. 2016; Yolcu et al. 2020; Zaidi et al. 2020).
Importantly, implementing such advanced techniques will necessitate significant investments in flax research and will be critical to continually improving our understanding of the genetics of abiotic stress in flax. Overall, an improved understanding of genetic factors involved in abiotic stress tolerance will be essential for informed biological improvement and will aid in developing flax cultivars resilient to environmental challenges.

Fig. 6.1 Flax abiotic stress tolerance: current methodologies and gene discoveries

References

Abdelraheem A, Mahdy E, Zhang J (2015) The first linkage map for a recombinant inbred line population in cotton (Gossypium barbadense) and its use in studies of PEG-induced dehydration tolerance. Euphytica 205:941–958 Adugna W, Labuschangne M, Viljoen C (2006) The use of morphological and AFLP markers in diversity analysis of linseed. Biodiver Conserv 15:3193–3205 Ali E, Saand MA, Khan AR, Shah JM, Feng S et al (2021) Genome-wide identification and expression analysis of detoxification efflux carriers (DTX) genes family under abiotic stresses in flax. Physiol Plant 171:483–501

Angelova V, Ivanova R, Delibaltova I, Ivanov K (2004) Bio-accumulation and distribution of heavy metals in fibre crops (flax, cotton and hemp). Ind Crops Prod 19:197–205 Asgarinia P, Mirlohi A, Saeidi G, Mohamadi Mirik A, Gheysari M et al (2017) Selection criteria for assessing drought tolerance in a segregating population of flax (Linum usitatissimum L.). Can J Plant Sci 97:424–437 Ashraf M (2010) Inducing drought tolerance in plants: Recent advances. Biotechnol Adv 28:169–183 Ashraf M, Fatima H (1994) Intra-specific variation for salt tolerance in linseed (Linum usitatissimum L.). J Agro Crop Sci 173:193–203 Asif MA, Garcia M, Tilbrook J, Brien C, Dowling K et al (2021) Identification of salt tolerance QTL in a wheat RIL mapping population using destructive and nondestructive phenotyping. Funct Plant Biol 48:131–140 Atkinson NJ, Lilley CJ, Urwin PE (2013) Identification of genes involved in the response of Arabidopsis to simultaneous biotic and abiotic stresses. Plant Physiol 162:2028–2041 Barik SR, Pandit E, Mohanty SP, Nayak DK, Pradhan SK (2020) Genetic mapping of physiological traits

associated with terminal stage drought tolerance in rice. BMC Genet 21:76 Belkadhi A, De Haro A, Soengas P, Obregon S, Cartea ME et al (2013) Salicylic acid improves root antioxidant defense system and total antioxidant capacities of flax subjected to cadmium. OMICS 17:398–406 Belkadhi A, De Haro A, Obregon S, Chaibi W, Djebali W (2015) Exogenous salicylic acid protects phospholipids against cadmium stress in flax (Linum usitatissimum L.). Ecotoxicol Environ Saf 120:102–109 Bellah A, El-Mouhamady A, Gad AAM, Karim GSAA (2021) Biochemical and molecular genetic markers associated with salinity tolerance in flax (Linum usitatissimum L.). Ann Rom Soc Cell Biol 25:4828–4844 Bennett D, Reynolds M, Mullan D, Izanloo A, Kuchel H et al (2012) Detection of two major grain yield QTL in bread wheat (Triticum aestivum L.) under heat, drought and high yield potential environments. Theor Appl Genet 125:1473–1485 Brás TA, Seixas J, Carvalhais N, Jägermeyr J (2021) Severity of drought and heatwave crop losses tripled over the last five decades in Europe. Env Res Lett 16:065012 Chandrawati, Yadav HK (2017) Development of linkage map and mapping of QTLs for oil content and yield attributes in linseed (Linum usitatissimum L.). Euphytica 213:258 Chandrawati MR, Singh PK, Ranade SA, Yadav HK (2014) Diversity analysis in Indian genotypes of linseed (Linum usitatissimum L.) using AFLP markers. Gene 549:171–178 Chen L, Wang Q, Tang M, Zhang X, Pan Y et al (2020) QTL mapping and identification of candidate genes for heat tolerance at the flowering stage in rice. Front Genet 11:621871 Chen X, Ding Y, Yang Y, Song C, Wang B et al (2021) Protein kinases in plant responses to drought, salt, and cold stress. J Integr Plant Biol 63:53–78 Ciais P, Reichstein M, Viovy N, Granier A, Ogee J et al (2005) Europe-wide reduction in primary productivity caused by the heat and drought in 2003. Nature 437:529–533 Cloutier S, Niu Z, Datla R, Duguid S (2009) Development and analysis of EST-SSRs for flax (Linum usitatissimum L.). Theor Appl Genet 119:53–63 Cloutier S, Ragupathy R, Niu Z, Duguid S (2011) SSR-based linkage map of flax (Linum usitatissimum L.) and mapping of QTLs underlying fatty acid composition traits. Mol Breed 28:437–451 Cloutier S, Miranda E, Ward K, Radovanovic N, Reimer E et al (2012a) Simple sequence repeat marker development from bacterial artificial chromosome end sequences and expressed sequence tags of flax (Linum usitatissimum L.). Theor Appl Genet 125:685–694 Cloutier S, Ragupathy R, Miranda E, Radovanovic N, Reimer E et al (2012b) Integrated consensus genetic and physical maps of flax (Linum usitatissimum L.). Theor Appl Genet 125:1783–1795

Cloutier S, You FM, Soto-Cerda BJ (2019) Linum genetic markers, maps, and QTL discovery. In: Cullis CA (ed) Genetics and genomics of Linum. Springer International Publishing, pp 97–117 Cui D, Wu D, Somarathna Y, Xu C, Li S et al (2015) QTL mapping for salt tolerance based on SNP markers at the seedling stage in maize (Zea mays L.). Euphytica 203:273–283 Cullis CA, Swami S, Song Y (1999) RAPD polymorphisms detected among the flax genotrophs. Plant Mol Biol 41:795–800 DalCorso G, Manara A, Furini A (2013) An overview of heavy metal challenge in plants: from roots to shoots. Metallomics 5:1117–1132 Dash PK, Cao Y, Jailani AK, Gupta P, Venglat P et al (2014) Genome-wide analysis of drought induced gene expression changes in flax (Linum usitatissimum). GM Crops Food 5:106–119 Dash PK, Rai R, Mahato AK, Gaikwad K, Singh NK (2017) Transcriptome landscape at different developmental stages of a drought tolerant cultivar of flax (Linum usitatissimum). Front Chem 5:82 Datta K, Baisakh N, Ganguly M, Krishnan S, Yamaguchi-Shinozaki K et al (2012) Overexpression of Arabidopsis and rice stress genes’ inducible transcription factor confers drought and salinity tolerance to rice. Plant Biotechnol J 10:579–586 Diffenbaugh NS, Singh D, Mankin JS, Horton DE, Swain DL et al (2017) Quantifying the influence of global warming on unprecedented extreme climate events. Proc Natl Acad Sci U S A 114:4881–4886 Diouf L, Pan Z, He SP, Gong WF, Jia YH et al (2017) High-density linkage map construction and mapping of salt-tolerant QTLs at seedling stage in upland cotton using genotyping by sequencing (GBS). Int J Mol Sci 18:2622 Dmitriev AA, Krasnov GS, Rozhmina TA, Kishlyan NV, Zyablitsin AV et al (2016) Glutathione S-transferases and UDP-glycosyltransferases are involved in response to aluminum stress in flax.
Front Plant Sci 7:1920 Dmitriev AA, Kudryavtseva AV, Bolsheva NL, Zyablitsin AV, Rozhmina TA et al (2017) miR319, miR390, and miR393 are involved in aluminum response in flax (Linum usitatissimum L.). Biomed Res Int 2017:4975146 Dmitriev AA, Krasnov GS, Rozhmina TA, Zyablitsin AV, Snezhkina AV et al (2019) Flax (Linum usitatissimum L.) response to non-optimal soil acidity and zinc deficiency. BMC Plant Biol 19:54 Dmitriev AA, Novakovskiy RO, Pushkova EN, Rozhmina TA, Zhuchenko AA et al (2020a) Transcriptomes of different tissues of flax (Linum usitatissimum L.) cultivars with diverse characteristics. Front Genet 11:565146 Dmitriev AA, Pushkova EN, Novakovskiy RO, Beniaminov AD, Rozhmina TA et al (2020b) Genome sequencing of fiber flax cultivar atlant using Oxford nanopore and Illumina platforms. Front Genet 11:590282

Dubey S, Bhargava A, Fuentes F, Shukla S, Srivastava S (2020) Effect of salinity stress on yield and quality parameters in flax (Linum usitatissimum L.). Not Bot Horti Agrobo Cluj-Napoca 48:954–966 El-Beltagi HS, Salama ZA, El-Hariri DA (2008) Some biochemical markers for evaluation of flax cultivars under salt stress conditions. J Nat Fibers 5:316–330 Emamverdian A, Ding Y, Mokhberdoran F, Xie Y (2015) Heavy metal stress and some mechanisms of plant defense response. Sci World J 2015:756120 Everaert I, De-Riek J, De-Loose M, Van-Waes J, Van-Bockstaele E (2001) Most similar variety grouping for distinctness evaluation of flax and linseed (Linum usitatissimum L.) varieties by means of AFLP and morphological data. Plant Vari Seeds 14:69–87 Fan Y, Shabala S, Ma Y, Xu R, Zhou M (2015) Using QTL mapping to investigate the relationships between abiotic stress tolerance (drought and salinity) and agronomic and physiological traits. BMC Genomics 16:43 Fedoroff NV, Battisti DS, Beachy RN, Cooper PJ, Fischhoff DA et al (2010) Radically rethinking agriculture for the 21st century. Science 327:833–834 Ferdous J, Hussain SS, Shi BJ (2015) Role of microRNAs in plant drought tolerance. Plant Biotechnol J 13:293–305 Fu Y, Peterson G, Diederichsen A, Richards K (2002) RAPD analysis of genetic relationships of seven flax species in the genus Linum L. Genet Res Crop Evol 49:253–259 Fu S, Lu Y, Zhang X, Yang G, Chao D et al (2019) The ABC transporter ABCG36 is required for cadmium tolerance in rice. J Exp Bot 70:5909–5918 Gallego SM, Pena LB, Barcia RA, Azpilicueta CE, Iannone MF et al (2012) Unravelling cadmium toxicity and tolerance in plants: insight into regulatory mechanisms. Environ Exp Bot 83:33–46 Godfray HC, Beddington JR, Crute IR, Haddad L, Lawrence D et al (2010) Food security: the challenge of feeding 9 billion people. Science 327:812–818 Goel S, Madan B (2014) Genetic engineering of crop plants for abiotic stress tolerance.
In: Ahmad P, Rasool S (eds) Emerging technologies and management of crop stress tolerance. Academic Press, San Diego, pp 99–123 Golldack D, Lüking I, Yang O (2011) Plant tolerance to drought and salinity: stress regulating transcription factors and their functional significance in the cellular transcriptional network. Plant Cell Rep 30:1383–1391 Golldack D, Li C, Mohan H, Probst N (2014) Tolerance to drought and salt stress in plants: unraveling the signaling networks. Front Plant Sci 5:151 Gorshkova T, Chernova T, Mokshina N, Gorshkov V, Kozlova L et al (2018) Transcriptome analysis of intrusively growing flax fibers isolated by laser microdissection. Sci Rep 8:14570 Guo Y, Cs Q, Sh L, Deng X, Dm H et al (2013) The effect of salinity-alkalinity stress on seed germination of main flax cultivars in different region (Linum usitatissimum L.). Seed 12:001

Guo D, Jiang H, Yan W, Yang L, Ye J et al (2019) Resequencing 200 flax cultivated accessions identifies candidate genes related to seed size and weight and reveals signatures of artificial selection. Front Plant Sci 10:1682 Gusta L, O’Connor B, Bhatty R (1996) Flax (Linum usitatissimum L.) responses to chilling and heat stress on flowering and seed yield. Can J Plant Sci 77:97–99 Haak DC, Fukao T, Grene R, Hua Z, Ivanov R et al (2017) Multilevel regulation of abiotic stress responses in plants. Front Plant Sci 8 Halford NG, Curtis TY, Chen Z, Huang J (2015) Effects of abiotic stress and crop management on cereal grain composition: implications for food quality and safety. J Exp Bot 66:1145–1156 Hausner G, Rashid K, Kenaschuk E, Procunier J (1999) The development of codominant PCR/RFLP based markers for the flax rust-resistance alleles at the L locus. Genome 42:1–8 Heller K, Byczyńska M (2015) The impact of environmental factors and applied agronomy on quantitative and qualitative traits of flax fiber. J Nat Fibers 12:26–38 Hirabayashi Y, Mahendran R, Koirala S, Konoshima L, Yamazaki D et al (2013) Global flood risk under climate change. Nat Clim Change 3:816–821 International Wheat Genome Sequencing Consortium (IWGSC) (2018) Shifting the limits in wheat research and breeding using a fully annotated reference genome. Science 361:eaar7191 Ivanova R, Angelova V, Delibaltova V, Ivanov K, Shamov D (2003) Accumulation of heavy metals in fibre crops flax, cotton and hemp. J Environ Protect Ecol 4:31–38 Jarzyniak KM, Jasiński M (2014) Membrane transporters and drought resistance—a complex issue. Front Plant Sci 5:687 Jiang D, Zhou L, Chen W, Ye N, Xia J et al (2019) Overexpression of a microRNA-targeted NAC transcription factor improves drought and salt tolerance in rice via ABA-mediated pathways. Rice 12:76 Kariuki LW, Masinde P, Githiri S, Onyango AN (2016) Effect of water stress on growth of three linseed (Linum usitatissimum L.) varieties.
Springerplus 5:759–759 Khan N, You FM, Datla R, Ravichandran S, Jia B et al (2020) Genome-wide identification of ATP binding cassette (ABC) transporter and heavy metal associated (HMA) gene families in flax (Linum usitatissimum L.). BMC Genomics 21:722 Krasnov GS, Dmitriev AA, Zyablitsin AV, Rozhmina TA, Zhuchenko AA et al (2019) Aluminum responsive genes in flax (Linum usitatissimum L.). Biomed Res Int 2019:5023125 Kumar S, Trivedi PK (2016) Heavy metal stress signaling in plants. In: Ahmad P (ed) Plant metal interaction. Elsevier, pp 585–603 Kumar S, You FM, Duguid S, Booker H, Rowland G et al (2015) QTL for fatty acid composition and yield in linseed (Linum usitatissimum L.). Theor Appl Genet 128:965–984

Kumari A, Paul S, Sharma V (2017) Genetic diversity analysis using RAPD and ISSR markers revealed discrete genetic makeup in relation to fibre and oil content in Linum usitatissimum L. genotypes. Nucleus 61:45–53 Kuromori T, Miyaji T, Yabuuchi H, Shimizu H, Sugimoto E et al (2010) ABC transporter AtABCG25 is involved in abscisic acid transport and responses. Proc Natl Acad Sci U S A 107:2361–2366 Kvavadze E, Bar-Yosef O, Belfer-Cohen A, Boaretto E, Jakeli N et al (2009) 30,000-year-old wild flax fibers. Science 325:1359 Lal N (2010) Molecular mechanisms and genetic basis of heavy metal toxicity and tolerance in plants. In: Ashraf M, Ozturk M, Ahmad MSA (eds) Plant adaptation and phytoremediation. Springer, Netherlands, Dordrecht, pp 35–58 Lamaoui M, Jemo M, Datla R, Bekkaoui F (2018) Heat and drought stresses in crops and approaches for their mitigation. Front Chem 6:26 Leng G, Hall J (2019) Crop yield sensitivity of global major agricultural countries to droughts and the projected changes in the future. Sci Total Environ 654:811–821 Lesk C, Rowhani P, Ramankutty N (2016) Influence of extreme weather disasters on global crop production. Nature 529:84–87 Li J, Zhang M, Sun J, Mao X, Wang J et al (2020) Heavy metal stress-associated proteins in rice and Arabidopsis: genome-wide identification, phylogenetics, duplication, and expression profiles analysis. Front Genet 11:477 Liu L, Li Y, Li S, Hu N, He Y et al (2012) Comparison of next-generation sequencing systems. J Biomed Biotechnol 2012:251364 Luo M, Zhao Y, Zhang R, Xing J, Duan M et al (2017) Mapping of a major QTL for salt tolerance of mature field-grown maize plants based on SNP markers. BMC Plant Biol 17:140 Ma X, Su Z, Ma H (2020) Molecular genetic analyses of abiotic stress responses during plant reproductive development. J Exp Bot 71:2870–2885 Mahfouze H, Mahfouze S (2017) Assessment of flax varieties for drought tolerance. 
Annu Res Rev Bio 21:1–12 McDill J, Repplinger M, Simpson B, Kadereit J (2009) The phylogeny of Linum and Linaceae subfamily Linoideae, with implications for their systematics, biogeography, and evolution of heterostyly. Syst Bot 34:386–405 Melelli A, Shah DU, Hapsari G, Cortopassi R, Durand S et al (2021) Lessons on textile history and fibre durability from a 4,000-year-old Egyptian flax yarn. Nat Plants 7:1200–1206 Mendoza-Soto AB, Sanchez F, Hernandez G (2012) MicroRNAs as regulators in plant metal toxicity response. Front Plant Sci 3:105 Mittler R, Blumwald E (2010) Genetic engineering for modern agriculture: challenges and perspectives. Annu Rev Plant Biol 61:443–462

Munns R, Tester M (2008) Mechanisms of salinity tolerance. Annu Rev Plant Biol 59:651–681 Nakashima K, Yamaguchi-Shinozaki K, Shinozaki K (2014) The transcriptional regulatory network in the drought response and its crosstalk in abiotic stress responses including drought, cold, and heat. Front Plant Sci 5:170 Oh T, Gorman M, Cullis C (2000) RFLP and RAPD mapping in flax (Linum usitatissimum). Theor Appl Genet 101:590–593 Pan G, Zhao L, Li J, Huang S, Tang H et al (2020) Physiological responses and tolerance of flax (Linum usitatissimum L.) to lead stress. Acta Physiol Plant 42:113 Patra M, Bhowmik N, Bandopadhyay B, Sharma A (2004) Comparison of mercury, lead and arsenic with respect to genotoxic effects on plant systems and the development of genetic tolerance. Env Exp Botany 52:199–223 Purugganan MD, Jackson SA (2021) Advancing crop genomics from lab to field. Nat Genet 53:595–601 Qi X, Wang X, Xu J, Zhang J, Mi J (2010) Drought-resistance evaluation of flax germplasm at adult plant stage. Scientia Agric Sinica 43:3076–3087 Ragupathy R, Rathinavelu R, Cloutier S (2011) Physical mapping and BAC-end sequence analysis provide initial insights into the flax (Linum usitatissimum L.) genome. BMC Genomics 12:217 Rice A, Glick L, Abadi S, Einhorn M, Kopelman NM et al (2015) The chromosome counts database (CCDB)—a community resource of plant chromosome numbers. New Phytol 206:19–26 Sa R, Yi L, Siqin B, An M, Bao H et al (2021) Chromosome-level genome assembly and annotation of the fiber flax (Linum usitatissimum) genome. Front Genet 12:735690 Sadak M, Dawood M (2014) Role of ascorbic acid and α-tocopherol in alleviating salinity stress on flax plant (Linum usitatissimum L.). J Stress Physiol Biochem 10:93–111 Sah SK, Reddy KR, Li J (2016) Abscisic acid and abiotic stress tolerance in crop plants.
Front Plant Sci 7:571 Saha D, Shaw AK, Datta S, Mitra J (2021) Evolution and functional diversity of abiotic stress-responsive NAC transcription factor genes in Linum usitatissimum L. Env Exp Botany 188:104512 Saleem MH, Ali S, Hussain S, Kamran M, Chattha MS et al (2020a) Flax (Linum usitatissimum L.): a potential candidate for phytoremediation? Biological and economical points of view. Plants 9:496 Saleem MH, Fahad S, Khan SU, Din M, Ullah A et al (2020b) Copper-induced oxidative stress, initiation of antioxidants and phytoremediation potential of flax (Linum usitatissimum L.) seedlings grown under the mixing of two different soils of China. Environ Sci Pollut Res Int 27:5211–5221 Sauer NJ, Narváez-Vásquez J, Mozoruk J, Miller RB, Warburg ZJ et al (2016) Oligonucleotide-mediated genome editing provides precision and function to engineered nucleases and antibiotics in plants. Plant Physiol 170:1917–1928 Schmidt TJ, Klaes M, Sendker J (2012) Lignans in seeds of Linum species. Phytochemistry 82:89–99 Seki M, Ishida J, Narusaka M, Fujita M, Nanjo T et al (2002) Monitoring the expression pattern of around 7,000 Arabidopsis genes under ABA treatments using a full-length cDNA microarray. Funct Integr Genomics 2:282–291 Seki M, Kamei A, Yamaguchi-Shinozaki K, Shinozaki K (2003) Molecular responses to drought, salinity and frost: common and different paths for plant protection. Curr Opin Biotechnol 14:194–199 Sertse D, You FM, Ravichandran S, Cloutier S (2019) The complex genetic architecture of early root and shoot traits in flax revealed by genome-wide association analyses. Front Plant Sci 10:1483–1483 Sertse D, You FM, Ravichandran S, Soto-Cerda BJ, Duguid S et al (2021) Loci harboring genes with important role in drought and related abiotic stress responses in flax revealed by multiple GWAS models. Theor Appl Genet 134:191–212 Shanker AK, Maheswari M, Yadav SK, Desai S, Bhanu D et al (2014) Drought stress responses in crops. Funct Integr Genom 14:11–22 Sharma SS, Dietz KJ (2009) The relationship between metal toxicity and cellular redox imbalance. Trends Plant Sci 14:43–50 Sharma PC, Sehgal D, Singh D, Singh G, Yadav RS (2011) A major terminal drought tolerance QTL of pearl millet is also associated with reduced salt uptake and enhanced growth under salt stress. Mol Breed 27:207–222 Sharma J, Tomar S, Shivran R, Chandra P (2012) Water requirement water use efficiency consumptive use yield and quality parameters of linseed (Linum usitatissimum L.) varieties as influenced by fertility levels irrigation scheduling. Adv Life Sci 1:180–182 Sharma PC, Singh D, Sehgal D, Singh G, Hash CT et al (2014) Further evidence that a terminal drought tolerance QTL of pearl millet is associated with reduced salt uptake.
Environ Exp Bot 102:48–57 Sharma DK, Torp AM, Rosenqvist E, Ottosen C-O, Andersen SB (2017) QTLs and potential candidate genes for heat stress tolerance identified from the mapping populations specifically segregating for Fv/Fm in wheat. Front Plant Sci 8:1668 Shayan M, Kamalian S, Sahebkar A, Tayarani-Najaran Z (2020) Flaxseed for health and disease: Review of clinical trials. Comb Chem High Throughput Screen 23:699–722 Sheng Y, Yan X, Huang Y, Han Y, Zhang C et al (2019) The WRKY transcription factor, WRKY13, activates PDR8 expression to positively regulate cadmium tolerance in Arabidopsis. Plant Cell Environ 42:891–903 Shi H, Chen L, Ye T, Liu X, Ding K et al (2014) Modulation of auxin content in Arabidopsis confers improved drought stress resistance. Plant Physiol Biochem 82:209–217

Singh KK, Mridula D, Rehal J, Barnwal P (2011) Flaxseed: a potential source of food, feed and fiber. Crit Rev Food Sci Nutr 51:210–222 Smekalova V, Doskocilova A, Komis G, Samaj J (2014) Crosstalk between secondary messengers, hormones and MAPK modules during abiotic stress signalling in plants. Biotechnol Adv 32:2–11 Smykalova I, Vrbova M, Tejklova E, Vetrocova M, Griga M (2010) Large scale screening of heavy metal tolerance in flax/linseed (Linum usitatissimum L.) tested in vitro. Ind Crop Prod 32:527–533 Song J, Feng SJ, Chen J, Zhao WT, Yang ZM (2017) A cadmium stress-responsive gene AtFC1 confers plant tolerance to cadmium toxicity. BMC Plant Biol 17:187 Soto-Cerda B, Carrasco R, Aravena G, Urbina H, Navarro C (2011) Identifying novel polymorphic microsatellites from cultivated flax (Linum usitatissimum L.) following data mining. Plant Mol Biol Report 29:753–759 Soto-Cerda BJ, Duguid S, Booker H, Rowland G, Diederichsen A et al (2014) Genomic regions underlying agronomic traits in linseed (Linum usitatissimum L.) as revealed by association mapping. J Integr Plant Biol 56:75–87 Soto-Cerda BJ, Cloutier S, Quian R, Gajardo HA, Olivos M et al (2018) Genome-wide association analysis of mucilage and hull content in flax (Linum usitatissimum L.) seeds. Int J Mol Sci 19:2870 Soto-Cerda BJ, Cloutier S, Gajardo HA, Aravena G, Quian R (2019) Identifying drought-resilient flax genotypes and related-candidate genes based on stress indices, root traits and selective sweep. Euphytica 215:41 Soto-Cerda BJ, Cloutier S, Gajardo HA, Aravena G, Quian R et al (2020) Drought response of flax accessions and identification of quantitative trait nucleotides (QTNs) governing agronomic and root traits by genome-wide association analysis. Mol Breed 40:15 Soudek P, Katrusakova A, Sedlacek L, Petrova S, Koci V et al (2010) Effect of heavy metals on inhibition of root elongation in 23 cultivars of flax (Linum usitatissimum L.).
Arch Environ Contam Toxicol 59:194–203 Spielmeyer W, Green AG, Bittisnich DJ, Mendham NJ, Lagudah ES (1998) Identification of quantitative trait loci contributing to Fusarium wilt resistance on an AFLP linkage map of flax (Linum usitatissimum). Theor Appl Genet 97:633–641 Steppuhn H, Wall K (1999) Canada’s salt tolerance testing laboratory. Can Agri Eng 41:185–190 Sudarshan GP, Kulkarni M, Akhov L, Ashe P, Shaterian H et al (2017) QTL mapping and molecular characterization of the classical D locus controlling seed and flower color in Linum usitatissimum (flax). Sci Rep 7:15751 Takahashi F, Kuromori T, Sato H, Shinozaki K (2018) Regulatory gene networks in drought stress responses and resistance in plants. Adv Exp Med Biol 1081:189–214

Todaka D, Nakashima K, Shinozaki K, Yamaguchi-Shinozaki K (2012) Toward understanding transcriptional regulatory networks in abiotic stress responses and tolerance in rice. Rice 5:6 Umezawa T, Fujita M, Fujita Y, Yamaguchi-Shinozaki K, Shinozaki K (2006) Engineering drought tolerance in plants: discovering and tailoring genes to unlock the future. Curr Opin Biotechnol 17:113–122 Van Zeist WB, Heeres JAH (1975) Evidence for linseed cultivation before 6000 B.C. J Archeol Sci 2:14–827 van Zelm E, Zhang Y, Testerink C (2020) Salt tolerance mechanisms of plants. Annu Rev Plant Biol 71:403–433 Varshney RK, Bansal KC, Aggarwal PK, Datta SK, Craufurd PQ (2011) Agricultural biotechnology for crop improvement in a variable climate: hope or hype? Trends Plant Sci 16:363–371 Venglat P, Xiang D, Qiu S, Stone SL, Tibiche C et al (2011) Gene expression analysis of flax seed development. BMC Plant Biol 11:74 Wall K, Steppuhn H, Gatzke M (2015) Agriculture and agri-food Canada’s salinity tolerance testing laboratory. Soils and crops workshop. University of Saskatchewan, Saskatoon (SK), Canada, pp 1–11 Wambugu PW, Ndjiondjop M-N, Henry RJ (2018) Role of genomics in promoting the utilization of plant genetic resources in genebanks. Brief Funct Genom 17:198–206 Wang W, Vinocur B, Altman A (2003) Plant responses to drought, salinity and extreme temperatures: towards genetic engineering for stress tolerance. Planta 218:1–14 Wang Z, Hobson N, Galindo L, Zhu S, Shi D et al (2012) The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads. Plant J 72:461–473 Wang N, Zhao SZ, Lv MH, Xiang FN, Li S (2016) Research progress on identification of QTLs and functional genes involved in salt tolerance in soybean. Yi Chuan 38:992–1003 Wang W, Wang L, Wang L, Tan M, Ogutu CO et al (2021) Transcriptome analysis and molecular mechanism of linseed (Linum usitatissimum L.) drought tolerance under repeated drought using single-molecule long-read sequencing.
BMC Genomics 22:109 Wheaton E, Kulshreshtha S, Wittrock V, Koshida G (2008) Dry times: hard lessons from the Canadian drought of 2001 and 2002. Can Geograph 52:241–262 Wu J, Zhao Q, Zhang L, Li S, Ma Y et al (2018) QTL mapping of fiber-related traits based on a high-density genetic map in flax (Linum usitatissimum L.). Front Plant Sci 9:885 Wu J, Zhao Q, Wu G, Yuan H, Ma Y et al (2019a) Comprehensive analysis of differentially expressed unigenes under NaCl stress in flax (Linum usitatissimum L.) using RNA-Seq. Int J Mol Sci 20:369 Wu W, Nemri A, Blackman LM, Catanzariti AM, Sperschneider J et al (2019b) Flax rust infection transcriptomics reveals a transcriptional profile that

119 may be indicative for rust Avr genes. PLoS ONE 14: e0226106 Xie D, Dai Z, Yang Z, Tang Q, Sun J et al (2018) Genomic variations and association study of agronomic traits in flax. BMC Genomics 19:512 Xiong L, Wang R-G, Mao G, Koczan JM (2006) Identification of drought tolerance determinants by genetic analysis of root response to drought stress and abscisic acid. Plant Physiol 142:1065–1074 Xue D, Huang Y, Zhang X, Wei K, Westcott S et al (2009) Identification of QTLs associated with salinity tolerance at late growth stage in barley. Euphytica 169:187–196 Yamaguchi-Shinozaki K, Shinozaki K (2006) Transcriptional regulatory networks in cellular responses and tolerance to dehydration and cold stresses. Annu Rev Plant Biol 57:781–803 Yang ZM, Chen J (2013) A potential role of microRNAs in plant response to metal toxicity. Metallomics 5:1184–1190 Yi L, Gao F, Siqin B, Zhou Y, Li Q et al (2017) Construction of an SNP-based high-density linkage map for flax (Linum usitatissimum L.) using specific length amplified fragment sequencing (SLAF-seq) technology. PLoS ONE 12:e0189785 Yolcu S, Alavilli H, Lee BH (2020) Natural genetic resources from diverse plants to improve abiotic stress tolerance in plants. Int J Mol Sci 21:8567 Yoshida T, Fujita Y, Sayama H, Kidokoro S, Maruyama K et al (2010) AREB1, AREB2, and ABF3 are master transcription factors that cooperatively regulate ABRE-dependent ABA signaling involved in drought stress tolerance and require ABA for full activation. Plant J 61:672–685 You FM, Cloutier S (2020) Mapping quantitative trait loci onto chromosome-scale pseudomolecules in flax. Methods Protoc 3:28 You FM, Xiao J, Li P, Yao Z, Jia G et al (2018) Chromosome-scale pseudomolecules refined by optical, physical and genetic maps in flax. Plant J 95:371– 384 Yu Y, Huang W, Chen H, Wu G, Yuan H et al (2014) Identification of differentially expressed genes in flax (Linum usitatissimum L.) under saline-alkaline stress by digital gene expression. 
Gene 549:113–122 Yu Y, Wu G, Yuan H, Cheng L, Zhao D et al (2016) Identification and characterization of miRNAs and targets in flax (Linum usitatissimum) under saline, alkaline, and saline-alkaline stresses. BMC Plant Biol 16:124 Yuan H, Guo W, Zhao L, Yu Y, Chen S et al (2021) Genome-wide identification and expression analysis of the WRKY transcription factor family in flax (Linum usitatissimum L.). BMC Genomics 22:375 Zaidi SS-e-A, Mahas A, Vanderschuren H, Mahfouz MM (2020) Engineering crops of the future: CRISPR approaches to develop climate-resilient and diseaseresistant plants. Genome Biol 21:289 Zhang J, Klueva NY, Wang Z, Wu R, Ho T-HD et al (2000) Genetic engineering for abiotic stress

120 resistance in crop plants. In Vitro Cell Dev Biol Plant 36:108–114 Zhang J, Zhang S, Cheng M, Jiang H, Zhang X et al (2018) Effect of drought on agronomic traits of rice and wheat: a meta-analysis. Int J Environ Res Public Health 15:839 Zhang J, Qi Y, Wang L, Wang L, Yan X et al (2020) Genomic comparison and population diversity analysis provide insights into the domestication and improvement of flax. iScience 23:100967 Zhang H, Zhu J, Gong Z, Zhu J-K (2021a) Abiotic stress responses in plants. Nat Rev Genet. https://doi.org/10. 1038/s41576-41021-00413-41570 Zhang Y, Liu Z, Wang X, Li Y, Li Y et al (2021b) Identification of genes for drought resistance and prediction of gene candidates in soybean seedlings based on linkage and association mapping. Crop J. https://doi.org/10.1016/j.cj.2021.1007.1010

B. Khadka and S. Cloutier Zhao C, Zhang H, Song C, Zhu JK, Shabala S (2020) Mechanisms of plant responses and adaptation to soil salinity. Innovation 1:100017 Zhao T, Wu T, Pei T, Wang Z, Yang H et al (2021) Overexpression of SlGATA17 promotes drought tolerance in transgenic tomato plants by enhancing activation of the phenylpropanoid biosynthetic pathway. Front Plant Sci 12:279 Zhou M, Li D, Li Z, Hu Q, Yang C et al (2013) Constitutive expression of a miR319 gene alters plant development and enhances salt and drought tolerance in transgenic creeping bentgrass. Plant Physiol 161:1375–1391 Zhu JK (2016) Abiotic stress signaling and responses in plants. Cell 167:313–324 Zyablitsin AV, Dmitriev AA, Krasnov GS, Bolsheva NL, Rozhmina TA et al (2018) CAX3 gene is involved in flax response to high soil acidity and aluminum exposure. Mol Biol (Mosk) 52:595–600

7 QTL and Candidate Genes for Flax Disease Resistance

Chunfang Zheng, Khalid Y. Rashid, Sylvie Cloutier, and Frank M. You

C. Zheng, S. Cloutier, F. M. You (corresponding author): Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, 960 Carling Avenue, Ottawa, ON K1A 0C6, Canada
K. Y. Rashid: Morden Research and Development Centre, Agriculture and Agri-Food Canada, 101 Route 100, Morden, MB R6M 1Y5, Canada

© His Majesty the King in Right of Canada, as represented by the Minister of Agriculture and Agri-Food 2023. F. M. You and B. Fofana (eds.), The Flax Genome, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-16061-5_7

7.1 Introduction

Rust, pasmo (PAS), powdery mildew (PM), and Fusarium wilt (FW) are four major fungal diseases of flax in Canada, causing yield losses and reduced seed and fiber quality. Rust, caused by the fungus Melampsora lini (Ehrenb.) Lév., can overwinter as teliospores on flax debris and complete its life cycle on the flax plant without an alternate host. Severe infections may result in complete defoliation of the plants. Flax rust resistance is race-specific and controlled by several major genes. Some gene families (K, L, M, N, and P) have been identified, and all have been cloned except K (Lawrence et al. 1995, 2010; Anderson et al. 1997; Ellis et al. 1999; Dodds et al. 2001a, b).

PAS, PM, and FW are three fungal diseases caused by Septoria linicola, Oidium lini, and Fusarium oxysporum f. sp. lini, respectively. Resistance to these three diseases is heritable and quantitative. The FW pathogen is a seedborne and soilborne fungus, whereas the PAS and PM pathogens can overwinter in the soil on infected flax stubble (Rashid 1998; Rashid et al. 1998; Vera et al. 2012). PAS infects the above-ground parts of flax plants from the seedling stage to maturity. It thrives under high humidity and warm temperatures, causing defoliation and premature ripening that reduce seed yield as well as seed and fiber quality (Rashid 2000; Vera et al. 2012). PM generates a white powdery mass of mycelia that starts as small spots and rapidly spreads to cover the entire leaf surface. Early PM infections may cause severe defoliation of the flax plants (Rashid 1998; Rashid et al. 1998). Seedborne and soilborne fungi, such as Fusarium oxysporum f. sp. lini, can invade the roots and then the above-ground parts of the plants, killing seedlings or causing leaf yellowing and wilting, followed by browning and death of the plants (Flax Council of Canada 1996). Because the fungus persists in the soil, and the mycelia and spores survive for many years in the debris of flax and other organic matter, the most effective control measure is the use of resistant varieties, considering that a rotation of at least three years between flax crops is necessary to sufficiently reduce the level of inoculum in the soil (Flax Council of Canada 1996).

In Canada, rust and FW were historically major disease threats to flax production, but over the last decade, PAS and FW have spread. Currently, all newly registered flax varieties are immune to all North American races of rust, primarily the flax rust race 371 controlled by the K1, L6, or M3 genes (Rashid and Kenaschuk 1994), illustrating the success of disease control by means of genetic improvement. Efforts have been made to improve the resistance of new varieties to PAS, PM, and FW by conventional breeding approaches. Most recently registered flax varieties are moderately resistant to FW, PM, and PAS. To maintain sustainable yield increases in the context of climate change, improving genetic resistance to rust, PAS, FW, and PM continues to be a major flax breeding objective.

With the development of flax genetic populations, high-throughput genotyping, and large-scale phenotyping capabilities for disease traits, many quantitative trait loci (QTLs) or quantitative trait nucleotides (QTNs) and candidate genes associated with resistance to PAS, PM, and FW have been identified. These provide genomic resources and the potential to improve genetic resistance through genomics-assisted breeding technologies, such as marker-assisted selection (MAS), genomic selection (GS), and gene editing. This chapter provides an overview of the major advances in QTL identification and candidate gene prediction for quantitative resistance to the three diseases.

7.2 Phenotypic Performance of Disease Resistance

Phenotyping for disease resistance is critical for resistance breeding and genomic studies, but it is time-consuming and costly. QTL identification and GS require accurate phenotypes of target traits evaluated in multiple years and locations for various types of genetic populations. In particular, field evaluation of disease resistance needs soil and climate conditions that are conducive to the development of the pathogens and infection of the plants so that disease symptoms are produced. Field evaluations of disease resistance are therefore often variable across locations and/or years, and phenotyping over multiple years and/or locations is necessary to obtain high-quality phenotypic datasets.

7.2.1 Field Nurseries for Disease Resistance Phenotyping

The experimental farm at the Morden Research and Development Centre (MRDC) of Agriculture and Agri-Food Canada (AAFC) has maintained long-term flax disease nurseries for FW since 1950, for PAS since 1990, and for PM since 1998. The development of these disease nurseries over the last 70 years has created high-quality fields that allow large-scale field phenotyping for resistance to the three flax diseases. During the last decade, many genetic populations and breeding lines have been phenotyped for the three diseases.

7.2.2 Criteria for Field Nurseries for Disease Resistance Phenotyping

Dr. Khalid Rashid established a rating system for the field evaluation of PAS, PM, and FW resistance (Rashid and Kenaschuk 1993; Rashid and Duguid 2005). Flax lines are usually seeded in the disease nurseries during the second or third week of May. For PAS, approximately 200 g of pasmo-infested chopped straw from the previous growing season is spread between rows as inoculum when plants are ~30 cm tall. A misting system is operated for 5 min every half hour for 4 weeks, except on rainy days, to ensure conidia dispersal and disease infection and development (He et al. 2019). For PM, inoculated plants from the previous years are transplanted from the growth room into the field at the early flowering stage to ensure early disease infection and development in the field. One pot containing ten heavily infected plants is transplanted every ten rows (You et al. 2022b). For FW, the continuous cultivation of flax in the same field for many decades ensures the persistence of the inoculum in the nursery because this pathogen is soilborne. Once the nursery is established, no additional inoculation is required, provided that flax is grown continuously.

For seedling wilt, the plants are scored in mid-July (pre-bloom stage), and for late wilt, they are scored in mid-August (green boll stage). A scale of 1–9 is used in the post-emergence and the late-wilt assessments (Rashid and Kenaschuk 1993):

1 = healthy, no sign of wilt
2 = a slight yellowing of up to 10% of the lower leaves
3 = yellowing of 10–30% of the lower leaves and a few wilted or dead plants
4 = yellowing of 30–50% of the lower leaves and 1% wilted or dead plants
5 = yellowing of 50–80% of the lower leaves and up to 10% wilted or dead plants
6 = yellowing of >80% of the lower leaves and 10–30% wilted or dead plants
7 = yellowing of all the lower leaves and 30–50% wilted or dead plants
8 = severe yellowing and 50–80% wilted or dead plants
9 = all plants severely wilted or dead

A similar disease severity rating system is used for the three diseases. Resistance is assessed on leaves and stems of all plants in single-row plots using a rating scale of 0–9 (Table 7.1). Field assessments are conducted at the early and late flowering stages as well as at the green and early brown boll stages. A rating of 0–2 is considered resistant (R), 3–4 moderately resistant (MR), 5–6 moderately susceptible (MS), and 7–9 susceptible (S).

Table 7.1 Field rating scale definition for pasmo and powdery mildew as developed by Dr. Khalid Rashid (Rashid and Duguid 2005; He et al. 2019)

Rating   Pasmo (%)    Powdery mildew (%)   Resistance class
0        No symptom   No symptom           HR
1 … 80 … HS

The percentage is visually measured based on leaf and/or stem area infected. HR highly resistant; R resistant; MR moderately resistant; MS moderately susceptible; S susceptible; HS highly susceptible

Kanapin et al. (2021) reported a method to phenotype FW under greenhouse conditions for their association mapping study. Flax genotypes were arranged in a randomized complete block design. Each genotype was replicated 16 times by sowing all the seeds in cross-container rows. The seeds were planted on the 12th day after inoculation. Inoculation was conducted by adding 400 g of a pure culture of the Fusarium oxysporum f. sp. lini strain MI39 to each container. The inoculum was prepared by growing MI39 on beer-wort agar medium, followed by incubation on an oat grain substrate (grain to water ratio of 1:1.75) for 3–4 weeks to allow sufficient macro- and microconidia development. Disease severity was rated at the early yellow ripening (brown boll) stage using a disease severity score (DSS) ranging from 0 to 3, where 0 represented a healthy plant, 1 indicated partial plant or stem browning from one side, 2 indicated a fully browned plant with bolls, and 3 represented a fully browned plant that collapsed prior to the formation of bolls. A disease severity index (DSI) was calculated as follows:

$$\mathrm{DSI} = \frac{\sum (a \times b)}{A \times K} \times 100\%$$

where a is the number of plants with identical DSS, b is the DSS of those plants, A is the total number of plants, and K is the largest DSS (K = 3). The plants were further grouped into resistant (DSI ≤ 20%), weakly susceptible (20% < DSI ≤ 30%), moderately susceptible (30% < DSI ≤ 50%), and completely susceptible (DSI > 50%).
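As a worked illustration of the disease severity index (DSI) and the DSI classes used in the greenhouse FW assay, the following Python sketch computes the DSI from per-plant disease severity scores. The function names and the 16 example scores are hypothetical and serve only to show the arithmetic; they are not taken from Kanapin et al. (2021).

```python
from collections import Counter

K_MAX = 3  # largest disease severity score (DSS) on the 0-3 scale


def disease_severity_index(scores):
    """Return DSI (%) = sum(a * b) / (A * K) * 100 for a list of per-plant DSS values."""
    if not scores:
        raise ValueError("at least one plant score is required")
    if any(s not in range(K_MAX + 1) for s in scores):
        raise ValueError("each DSS must be 0, 1, 2, or 3")
    counts = Counter(scores)  # b -> a (number of plants sharing score b)
    weighted_sum = sum(a * b for b, a in counts.items())
    return 100.0 * weighted_sum / (len(scores) * K_MAX)


def dsi_class(dsi):
    """Assign the resistance group from the DSI thresholds quoted in the text."""
    if dsi <= 20:
        return "resistant"
    if dsi <= 30:
        return "weakly susceptible"
    if dsi <= 50:
        return "moderately susceptible"
    return "completely susceptible"


# Hypothetical genotype with 16 plants: ten scored 0, three scored 1,
# two scored 2, and one scored 3.
example_scores = [0] * 10 + [1] * 3 + [2] * 2 + [3] * 1
example_dsi = disease_severity_index(example_scores)
print(f"DSI = {example_dsi:.1f}% ({dsi_class(example_dsi)})")
```

For these hypothetical scores, the weighted sum is 10 and A × K = 48, so the DSI is about 20.8%, which falls just above the 20% threshold of the resistant group.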


7.2.3 Broad-Sense Heritability of Flax Disease Traits

The broad-sense heritability (H²) of a quantitative trait is the proportion of the total phenotypic variation that is attributable to genetic variation; it helps breeders gauge the accuracy of phenotypic selection in breeding. Based on the flax core collection and phenotypic data collected over three to five years, PM and FW had moderate H² values of 0.52 and 0.60, respectively, while PAS had a low H² value of 0.25 with a strong genotype × environment interaction (You et al. 2017).
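The way a strong genotype × environment interaction lowers H² can be illustrated with a commonly used entry-mean formula for multi-environment trials, H² = σ²G / (σ²G + σ²GE/e + σ²ε/(re)), where e is the number of environments and r is the number of replicates per environment. This is only a sketch under that assumption: the estimator actually used by You et al. (2017) is not described here, and the variance components below are made-up numbers.

```python
def broad_sense_heritability(var_g, var_ge, var_e, n_env, n_rep):
    """Entry-mean H2 = var_g / (var_g + var_ge / n_env + var_e / (n_env * n_rep))."""
    if min(var_g, var_ge, var_e) < 0:
        raise ValueError("variance components must be non-negative")
    if n_env < 1 or n_rep < 1:
        raise ValueError("n_env and n_rep must be at least 1")
    phenotypic_var = var_g + var_ge / n_env + var_e / (n_env * n_rep)
    if phenotypic_var == 0:
        raise ValueError("phenotypic variance is zero")
    return var_g / phenotypic_var


# Hypothetical variance components for a trait scored in five years, one plot per year.
h2_weak_ge = broad_sense_heritability(var_g=1.0, var_ge=2.0, var_e=1.0, n_env=5, n_rep=1)
h2_strong_ge = broad_sense_heritability(var_g=1.0, var_ge=12.0, var_e=1.0, n_env=5, n_rep=1)
print(f"weak GxE: H2 = {h2_weak_ge:.3f}; strong GxE: H2 = {h2_strong_ge:.3f}")
```

With the same genetic variance, increasing the interaction variance from 2 to 12 reduces the hypothetical H² from 0.625 to about 0.278, which mirrors the qualitative pattern described for PAS.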

7.2.4 Flax Genetic Populations and Their Disease Resistance

Several diverse genetic populations have been developed for flax genomics studies, including studies of the genetics of resistance to PAS, PM, and FW. As a core subset of the flax germplasm, a flax core collection containing 391 accessions extracted from the 3378 accessions maintained by Plant Gene Resources of Canada (PGRC) and 26 released cultivars was compiled for evaluation and breeding utilization (Diederichsen et al. 2013; You et al. 2017). In addition to the evaluation of 24 major agronomic, seed quality, and fatty acid composition traits over four years at two locations, three disease resistance traits (PAS, PM, and FW) have also been phenotyped at MRDC, AAFC (You et al. 2017). This core collection shows high genetic variation, with a coefficient of variation (CV) of 18–28% for all three disease-related traits. However, most accessions in this collection are susceptible to the three diseases, with average ratings of 4.8, 6.2, and 6.9 for PAS, PM, and FW, respectively (Table 7.2), and few accessions are resistant to the three diseases (Table 7.3).

Resistant breeding lines have been selected from segregating populations, and other resistant germplasm has been identified from collections at MRDC, AAFC. A total of 75 selected breeding lines were field evaluated in the disease nurseries at MRDC, AAFC, from 2010 to 2017 for PAS, PM, and FW. These 75 lines have slightly lower phenotypic variation than the core collection for the three disease severity ratings, with CVs of 18.2–26.5% (Table 7.2), but most of the lines are resistant to at least two of the three diseases (Table 7.3), offering important pre-breeding resistant germplasm for flax improvement. These lines have been merged with the flax core collection, greatly enriching the genetic variation and improving the detection power for marker-trait associations in genome-wide association studies (GWAS) (You et al. 2022b).

Another genetic panel, from Russia, has been compiled. It consists of 297 flax genotypes selected from the collection of the Federal Research Center for Bast Fiber Crops (Torzhok, Russia). This panel includes 179 fiber, 117 linseed, and 1 unknown morphotype accessions, which are landraces, elite cultivars, or breeding lines (Kanapin et al. 2021). This panel was evaluated for FW under greenhouse conditions from 2019 to 2021, and a large phenotypic variation (CV of 83.2%) was observed (Table 7.2). Some accessions were highly resistant to FW (Kanapin et al. 2021).

To study the genetics of flax disease resistance, biparental populations have also been developed. For example, a recombinant inbred line (RIL) population of 818 lines derived from a cross between the resistant cultivar Bison and the susceptible cultivar Novelty was developed for QTL mapping of flax FW (Table 7.2). A population of 158 RILs from the cross Linda (PM-resistant) × NorMan (PM-susceptible) was developed for QTL mapping of flax PM. These populations have been genotyped using simple sequence repeat (SSR) markers and phenotyped over multiple years at two or more locations. The F3 and F4 families from the cross between Linda and NorMan had previously been used to identify PM resistance QTL (Asgarinia et al. 2013). These populations have been genotyped using genotyping by sequencing and phenotyped over multiple years at two or more locations.

Table 7.2 Resistance performance of flax germplasm and breeding lines for pasmo (PAS), powdery mildew (PM), and Fusarium wilt (FW)

Population                   Trait                    Population size  x ± s        Range    CV (%)
Core collection (A)          PAS rating (2012–2016)   382              4.8 ± 1.3    2.3–8.0  28.0
                             PM rating (2012–2016)    370              6.2 ± 1.3    1.8–9.0  21.3
                             FW rating (2012–2016)*   378              6.9 ± 1.2    2.5–9.0  17.7
Selected breeding lines (B)  PAS rating (2010–2017)   75               4.2 ± 1.1    2.8–6.9  26.5
                             PM rating (2010–2017)    75               2.8 ± 0.5    1.8–4.4  18.2
                             FW rating (2010–2017)*   75               2.1 ± 0.4    1.5–3.4  19.6
A+B                          PAS rating (2012–2016)   445              5.9 ± 1.5    1.8–9.0  26.2
                             PM rating (2012–2016)*   447              4.5 ± 1.4    1.6–8.0  31.1
                             FW rating (2012–2016)    453              6.1 ± 2.1    1.5–9.0  34.6
Bison × Novelty RILs         FW rating                818              4.9 ± 1.3    2.5–8.9  27.4
Linda × NorMan RILs          PM rating                158              3.1 ± 0.7    1.8–4.7  21.3
Russian collection           FW DSI                   297              38.2 ± 31.8  0–100    83.2

x average ratings over years; s standard deviation; CV coefficient of variation; RIL recombinant inbred line; DSI disease severity index (Kanapin et al. 2021)
* Only data from 2012, 2013, and 2015 are available

Significant positive correlations (p < 0.00001) among PAS, PM, and FW ratings were observed in the flax core collection and selected breeding lines (Fig. 7.1), especially between FW and PM (r = 0.44) and between PAS and FW (r = 0.60). These correlations indicate that it is possible to screen genetic resources and to develop new cultivars resistant to all three diseases or at least two of them. Table 7.3 lists some accessions resistant to two or all three diseases.

Resistance to PAS and PM is associated with the flax morphotype. Of the 445 accessions of the flax core collection and selected breeding lines, 365 are linseed and 80 are fiber types. Linseed accessions have a higher mean PAS rating (6.1 ± 1.5) and a lower mean PM rating (5.4 ± 1.5) than fiber accessions (Fig. 7.2a, b). Most PM-resistant lines belong to the linseed morphotype, while most PAS-resistant lines are fiber types. For example, in the core collection, the 10% most resistant accessions to PM are all linseed, while the 10% most susceptible accessions contain a higher proportion of fiber accessions than expected. However, in the 75 selected breeding lines, which all belong to the linseed morphotype, most are resistant to at least two diseases (Fig. 7.1; Table 7.3). No association was observed between FW and morphotypes (Fig. 7.2c).
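The two summary statistics used in this section, the coefficient of variation (CV) and the pairwise correlation coefficient (r) between disease ratings, can be computed as in the Python sketch below. The rating vectors are invented for illustration and are not data from the core collection; the function names are likewise hypothetical.

```python
import math
import statistics


def coefficient_of_variation(values):
    """CV (%) = sample standard deviation / mean * 100."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return 100.0 * statistics.stdev(values) / mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long rating vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical mean ratings of three accessions for two diseases.
ratings_disease_1 = [4.0, 5.0, 6.0]
ratings_disease_2 = [1.0, 3.0, 2.0]
cv_example = coefficient_of_variation(ratings_disease_1)
r_example = pearson_r(ratings_disease_1, ratings_disease_2)
print(f"CV = {cv_example:.1f}%, r = {r_example:.2f}")
```

In practice, the significance of r would also be tested against the number of accessions; that step is omitted from this sketch.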

Table 7.3 Flax accessions resistant (ratings ≤ 3) to at least two of the following three diseases: pasmo (PAS), powdery mildew (PM), and Fusarium wilt (FW)

Genotype accession/ID  Morph.   Pedigree                           PAS   PM    FW
8001                   Linseed  (AC Linora × FP935-8)-1 × NT16-1   5.6   2.8   1.9
8002                   Linseed  AC Linora-3-1 × CH10-1             5.2   2.8   1.8
8003                   Linseed  AC Linora-1-1 × CH10-1             3.0   4.8   1.7
8005                   Linseed  (Verne × AC Linora)-1 × CH10-1     3.0   2.2   1.8
8006                   Linseed  (Verne × AC Linora)-1 × CH10-1     3.0   2.2   1.7
8007                   Linseed  (Verne × AC Linora)-1 × CH10-1     2.6   1.6   1.6
8008                   Linseed  (Verne × AC Linora)-1 × CH10-1     3.4   2.4   1.5
8009                   Linseed  AC Linora-3-1 × CH10-1             6.0   2.2   2.0
8010                   Linseed  (Verne × AC Linora)-1 × CH10-1     3.0   1.8   1.6
8011                   Linseed  (Verne × AC Linora)-1 × CH10-1     2.6   2.6   1.6
8012                   Linseed  (Verne × AC Linora)-1 × CH10-1     3.8   2.4   1.7
8013                   Linseed  (Verne × AC Linora)-1 × CH10-1     3.4   1.8   1.8
8014                   Linseed  Webster                            3.2   1.8   1.8
8015                   Linseed  AG2-BLK                            3.4   3.0   1.7
8016                   Linseed  (Verne × AC Linora)-1 × CH10-1     3.0   2.2   1.7
8017                   Linseed  AC Linora-3-1 × CH10-1             5.2   2.2   1.8
8018                   Linseed  (Verne × AC Linora)-1 × CH10-1     2.8   2.4   1.9
8019                   Linseed  Verne-1 × NT16-1                   4.0   2.8   2.0
8021                   Linseed  (Verne × AC Linora)-1 × CH10-1     3.2   2.6   1.6
8022                   Linseed  (Verne × AC Linora)-1 × CH10-1     2.6   3.0   1.6
8023                   Linseed  (AC Linora × FP935-8)-1 × NT16-1   4.6   2.6   1.9
8024                   Linseed  (Verne × AC Linora)-1 × CH10-1     3.6   2.6   1.7
8025                   Linseed  (Verne × AC Linora)-1 × CH10-1     3.8   2.8   1.8
8026                   Linseed  Linda × Atalante                   4.6   3.0   2.4
8027                   Linseed  AC Emerson × Linda                 4.6   2.8   2.1
8028                   Linseed  (Verne × AC Linora)-1 × CH10-1     3.2   3.0   1.6
8030                   Linseed  AG2-BLK                            3.6   2.6   2.5
8031                   Linseed  Somme × Linda                      2.2   3.4   2.5
8032                   Linseed  Somme × Linda                      2.4   2.2   2.1
8033                   Linseed  (Verne × AC Linora)-1 × CH10-1     3.4   2.6   1.8
8034                   Linseed  Somme × Linda                      3.4   2.6   2.0
8035                   Linseed  (Verne × AC Linora)-1 × CH10-1     2.8   2.4   1.8
8036                   Linseed  (Verne × AC Linora)-1 × CH10-1     2.4   3.0   2.2
8037                   Linseed  (Verne × AC Linora)-1 × CH10-1     3.0   2.6   2.2
8038                   Linseed  McDuff × Linda                     4.0   2.6   2.1
8042                   Linseed  Somme × Linda                      3.4   3.0   2.3
8043                   Linseed  (Verne × AC Linora)-1 × CH10-1     2.8   2.0   1.8
8044                   Linseed  AC Linora-3-1 × CH10-1             4.6   3.0   2.0
8048                   Linseed  (Verne × AC Linora)-1 × CH10-1     4.0   2.6   2.2
8049                   Linseed  McDuff × Linda                     4.2   3.0   2.3
8050                   Linseed  AC Linora-3-1 × CH17-3-1           3.2   2.8   1.8
8051                   Linseed  (Verne × AC Linora)-1 × CH10-1     2.8   3.0   2.2
8053                   Linseed  M8192                              6.8   3.2   2.2
8055                   Linseed  M8196                              6.2   2.6   2.2
8061                   Linseed  Hanley × FP 2102                   6.4   3.2   2.2
8065                   Linseed  Somme × Linda                      6.4   2.6   2.4
8066                   Linseed  FP 2070 × Double Low               6.0   2.8   2.4
8068                   Linseed  (AC Linora × FP935-8)-1 × NT16-1   4.0   3.0   2.6
8070                   Linseed  AC Linora-3-1 × CH10-1             2.6   3.4   2.5
CHN_CN101053           Fiber    L-8709-5-10                        3.0   3.0   6.7
NLD_CN18983            Fiber    Laura                              2.8   2.5   5.7
ETH_CN96991            Linseed                                     2.4   2.8   3.3
GEO_CN101367           Linseed                                     1.8   3.0   6.3
RUS_CN101298           Fiber    L. 541-02                          2.8   2.3   5.3
RUS_CN101299           Fiber    L. 00-207                          3.0   2.8   5.0

Morph. morphotype; CHN China; NLD Netherlands; ETH Ethiopia; RUS Russia; CH10 a Chinese line No. 10; AG2-BLK Agricultural Canada bulk, unknown origin

Fig. 7.1 Scatter plots of disease ratings showing the correlations between any two diseases. a Scatter plot between Fusarium wilt (FW) and powdery mildew (PM) ratings; b scatter plot between pasmo (PAS) and PM ratings; and c scatter plot between PAS and FW ratings. The phenotypic data consist of the mean values over five years of the 370 accessions of the flax core collection and a set of 75 selected breeding lines

Fig. 7.2 Relationships between flax morphotypes and a pasmo (PAS), b powdery mildew (PM), and c Fusarium wilt (FW) ratings. The phenotypic data are the mean values over years of the 370 accessions of the flax core collection and a set of 75 selected breeding lines

7.3 Quantitative Trait Loci (QTLs) Associated with Flax Disease Resistance

A QTL represents a genomic region associated with a quantitative trait in a population. The size of a QTL region may depend on many factors, such as the QTL identification method, the type and size of the population, the distribution and density of the genome-wide markers, and the genetic variation of the trait. Linkage map-based QTL mapping uses biparental populations such as F2–F4, RIL, doubled haploid (DH), and backcross (BC) populations (Wiesner and Wiesnerovà 2004; Cloutier et al. 2011; Fu 2011; Soto-Cerda et al. 2012; Asgarinia et al. 2013; Kumar et al. 2015). Because of the low allelic diversity and limited genetic recombination between two parents, mostly large-effect QTLs can be detected in biparental populations (Bandillo et al. 2013). The detected QTLs usually represent a genomic region between adjacent markers. With low-density SSR markers, a QTL may span hundreds of kilobases (kb) to several megabases (Mb).

Natural genetic populations that contain diverse germplasm and breeding lines (Huang et al. 2010; You et al. 2022b), multi-parent populations such as nested association mapping (NAM) populations (Yu et al. 2008), and multi-parent advanced generation intercross (MAGIC) populations (Mackay and Powell 2007; Cavanagh et al. 2008) significantly increase the genetic variation of the traits. Combined with various single- and multi-locus GWAS models, QTNs of large and small effects can be identified; these pinpoint single nucleotides rather than large genomic regions. In flax, the Canadian core collection and additional breeding lines have been used to identify QTNs for PAS (He et al. 2019), PM (You et al. 2022b), and FW resistance (You et al. unpublished), while the Russian genetic panel was used to identify QTLs associated with FW (Kanapin et al. 2021). Some biparental populations have also been used for QTL mapping of PM resistance (Asgarinia et al. 2013) and FW resistance (unpublished).
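To make the idea of a single-locus association test concrete, the Python sketch below regresses a disease rating on the allele dosage at one SNP and reports the slope (allele effect) and the proportion of phenotypic variance explained (R²). It is a deliberately simplified illustration: the GWAS models cited in this chapter additionally account for population structure and kinship, and the genotypes and ratings used here are invented.

```python
def single_marker_regression(dosages, phenotypes):
    """Simple linear regression of phenotype on allele dosage; returns (slope, r_squared)."""
    n = len(dosages)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_g = sum(dosages) / n
    mean_p = sum(phenotypes) / n
    s_gg = sum((g - mean_g) ** 2 for g in dosages)
    s_pp = sum((p - mean_p) ** 2 for p in phenotypes)
    if s_gg == 0 or s_pp == 0:
        raise ValueError("monomorphic marker or constant phenotype")
    s_gp = sum((g - mean_g) * (p - mean_p) for g, p in zip(dosages, phenotypes))
    slope = s_gp / s_gg                # change in rating per allele copy
    r_squared = s_gp ** 2 / (s_gg * s_pp)  # share of phenotypic variance explained
    return slope, r_squared


# Hypothetical inbred accessions: dosage 0 or 2 copies of the alternative allele,
# with disease ratings on a 0-9 scale.
example_dosages = [0, 0, 0, 2, 2, 2]
example_ratings = [3, 4, 3, 6, 7, 6]
example_slope, example_r2 = single_marker_regression(example_dosages, example_ratings)
print(f"slope = {example_slope:.2f} rating units per allele, R2 = {example_r2:.3f}")
```

In this toy example, each copy of the alternative allele raises the rating by 1.5 units, so the other allele would be the favorable one for resistance.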

7.3.1 Quantitative Trait Loci (QTLs) Associated with Pasmo Resistance

Using the flax core collection of 370 accessions, 258,873 genome-wide SNPs, and six PAS phenotypic datasets (one for each of five years and one for the overall mean), 692 QTNs were identified with three single-locus (GLM, MLM, GEMMA) and seven multi-locus (pLARmEB, pKWmEB, FASTmrMLM, ISIS EM-BLASSO, mrMLM, FASTmrEMMA, and FarmCPU) GWAS models (He et al. 2019). The single-locus models identified more large-effect QTNs, while the multi-locus models detected more small-effect QTNs. Based on haplotype blocks of SNPs, the 692 QTNs were grouped into 500 QTLs, where all QTNs within the same LD block were grouped into the same QTL and a tag QTN was used to represent each QTL. Among the 500 QTLs, 67 were stable across all datasets and explained 32–64% of the total PAS variation in the phenotypic datasets. Several large-effect QTNs are noteworthy, including Lu9-4333365 (R² = 23.39%), Lu9-1896658 (R² = 17.12%), Lu8-23104696 (R² = 16.53%), and Lu1-9232234 (R² = 16.17%). In addition, chromosome (Chr) 8 harbors an important genomic region (14.3–23.1 Mb) associated with PAS resistance. Forty-nine QTNs (or 45 QTLs) were identified on Chr 8, of which nine were considered major, with high R² values, and stable across years. Some of them co-locate with candidate genes. For example, QTNs Lu8-18251174 (R² = 10.38%) and Lu8-18447612 (R² = 11.66%) both co-locate with toll interleukin-1 receptor-like nucleotide-binding site leucine-rich repeat (TNL) gene clusters. QTN Lu8-14317356 (R² = 14.32%) has a high correlation with both Lus10016620 (RLK) and Lus10016612 (RLP) (Table 7.4). The number of favorable alleles (FAs) of QTNs in the flax core collection is significantly correlated with the PAS rating (R² = 0.62) (Fig. 7.3b), suggesting that the QTL effects for this trait are primarily additive. The number of FAs is also significantly associated with the flax morphotype (R² = 0.52), similarly to the PAS ratings (Fig. 7.2a). The fiber-type accessions are generally more resistant to PAS and possess more FAs than the linseed accessions, suggesting the

4

5

5

5

PM

PM

PM

4

PM

4

4

PM

PM

3

PM

PM

3

3

2

PM

PM

2

PM

PM

1

2

PM

1

PM

PM

Chr

Trait

Lu5-1535619

Lu5-1534998

Lu5-453626

Lu4-17170235

Lu4-16213043

Lu4-12432479

Lu4-11526385

Lu3-5748445

Lu3-1644588

Lu3-581507

Lu2-23956609

Lu2-3371783

Lu2-1672205

Lu1-28683876

Lu1-25354861

QTN and position

C/T

A/G

T/C

C/T

A/G

G/T

A/G

G/A

A/G

A/T

A/C

C/G

T/C

A/G

C/G

SNP

C

A

T

T

A

G

A

A

A

A

A

C

T

A

G

FA

32.6

27.83

17.52

10.75

3.4

11.51

10.76

1.19

11.92

27.62

5.68

11.56

30.2

4.44

11.14

R2

1744034

Lus10037310

11524694 11582820

Lus10036710* Lus10036725

17146906 17147750

Lus10009108 Lus10009109

Lus10004726*

1535502

1534218

17141532

Lus10009107

Lus10004727*

17129426

Lus10004040

16212942 17091433

Lus10004046

Lus10041860*

12429035

11468188

Lus10036704

Lus10036891*

11441987

Lus10036697

5746000

1738506

Lus10037311 Lus10040576*

1613460

580445

23955411

1583715

28681727

25354591

Gene start

Lus10037339

Lus10013417*

Lus10030587*

Lus10019654

Lus10006056*

Lus10027993*

Candidate gene

1538672

1535440

17148367

17147707

17146187

17131616

17094723

16215322

12433792

11585160

11530066

11468577

11442748

5750240

1747357

1742264

1614125

584432

23958647

1588562

28684082

25355612

Gene end

TNL (continued)

TNL

TNL

TNL

TNL

RLK

RLK

RLK

WRKY

WRKY

c

RLK

RLK

TNL

RLK

RLK

RLK

b

RLK

TNL

RLK

a

Gene family

Table 7.4 Quantitative trait nucleotides (QTNs) and candidate genes identified from the flax core collection for pasmo (PAS), powdery mildew (PM), and Fusarium wilt (FW) resistance


5

5

5

5

5

5

5

5

5

5

5

5

5

PM

PM

PM

PM

PM

PM

PM

PM

PM

PM

PM

PM

5

PM

5

5

PM

PM

5

PM

PM

Chr

Trait

Table 7.4 (continued)

Lu5-14977699

Lu5-14967901

Lu5-13271207

Lu5-12195919

Lu5-12090990

Lu5-11130392

Lu5-10415251

Lu5-5549028

Lu5-4646212

Lu5-3224350

Lu5-3281960

Lu5-3052714

Lu5-3006723

Lu5-2037910

Lu5-1763832

Lu5-1569098

Lu5-1552921

QTN and position

G/A

G/A

A/G

T/C

T/G

A/G

C/T

A/C

C/T

A/G

T/C

T/G

T/C

T/C

G/T

G/T

C/T

SNP

G

G

A

T

T

A

C

A

C

A

T

G

T

T

G

G

C

FA

16.32

17.28

34.99

16.12

21.28

26.93

11.87

17.3

9.31

10.62

18.38

3.12

24.08

10.65

16.34

27.63

15

R2

1535502 1568734 1642118

Lus10004726 Lus10004719 Lus10008500

3336787

Lus10032372

14975372 15020577 15077659

Lus10028362* Lus10028359 Lus10028347

14967268 15020577

Lus10028364*

13270430

4643996

Lus10028359

Lus10029860*

Lus10034795*

3223137

3223137

Lus10032351 Lus10032351*

3206387

3052566

3005188

2035323

1714708

Lus10032345

Lus10032310*

Lus10032303*

Lus10002961*

Lus10008486

1568734

1534218

Lus10004727

Lus10004719*

1449598

Gene start

Lus10004747

Candidate gene

15081716

15021749

14977773

15021749

14969126

13273781

4646254

3225286

3338055

3225286

3209792

3053123

3006765

2038464

1718110

1573096

1644046

1573096

1538672

1535440

1453554

Gene end

Gene family

RLK (continued)

RLP

Unknown

RLP

e

TNL

RLK

RLK

WRKY

RLK

TM-CC

DIR

WRKY

d

RLK

TNL

TM-CC

TNL

TNL

TNL

TNL

130 C. Zheng et al.

7

8

8

8

PM

PM

PM

7

PM

PM

7

7

7

PM

PM

7

PM

PM

6

6

PM

6

PM

PM

5

5

PM

5

PM

PM

Chr

Trait

Table 7.4 (continued)

Lu8-18351964

Lu8-15999956

Lu8-7394085

Lu7-17659649

Lu7-17359522

Lu7-17007593

Lu7-7041879

Lu7-4927223

Lu7-3830168

Lu6-16666521

Lu6-15378264

Lu6-1883039

Lu5-16840013

Lu5-16602027

Lu5-15697144

QTN and position

C/T

A/G

A/G

A/G

G/A

C/T

T/A

G/T

C/A

C/T

G/A

A/G

T/A

T/C

T/C

SNP

T

G

G

G

G

C

T

T

C

T

A

A

T

T

T

FA

11.49

1.36

3.65

4.85

0.73

22.62

13.12

10.48

11.35

1.13

0.8

11.79

29.67

20.88

25.18

R2

15624852 15635008 15650262 15698530 15725123

Lus10024062 Lus10024063 Lus10024064 Lus10024074 Lus10024081

Lus10021001*

15376945

16979755 17006897

Lus10015461 Lus10015454* Lus10003971*

Lus10007812*

Lus10022265*

Lus10021849*

18350700

15999079

7386368

17657006

17357355

16967785

Lus10015465

Lus10023199*

16928445

7040717

16664523

Lus10015472

Lus10024477*

Lus10025216*

1878670

16923316

Lus10039699 Lus10017649*

16922051

Lus10039698

16636867

15611592

Lus10024061

Lus10039641

15601599

Gene start

Lus10024060

Candidate gene

Gene end

18354799

16001545

7423024

17660870

17360952

17014499

16981169

16968282

16928858

7044256

16666663

15378903

1885283

16925783

16922660

16641182

15728803

15699898

15653487

15648806

15631121

15617378

15607466

Gene family

TNL (continued)

RLK

RLK

TM-CC

TM-CC

g

MLO

TNL

RLK

f

WRKY

RLP

RLK

RLK

RLK

RLK

RLK

WRKY

RLK

RLK

RLK

RLK

RLK

7 QTL and Candidate Genes for Flax Disease Resistance 131

Chr

8

8

9

9

9

9

9

9

10

10

10

11

12

12

Trait

PM

PM

PM

PM

PM

PM

PM

PM

PM

PM

PM

PM

PM

PM

Table 7.4 (continued)

Lu12-892762

Lu12-720013

Lu11-17188390

Lu10-13226627

Lu10-11695343

Lu10-11682031

Lu9-20701159

Lu9-8499442

Lu9-6266682

Lu9-4948236

Lu9-3920670

Lu9-916748

Lu8-19040276

Lu8-18951212

QTN and position

C/G

A/G

C/T

A/T

T/C

A/G

G/A

T/G

T/C

T/C

T/C

T/G

G/A

T/A

SNP

C

G

C

A

C

A

G

T

C

C

T

G

G

T

FA

1.46

2.61

27.82

11.04

13.03

11.97

20.69

13.4

1.47

10.59

20.89

1.71

15.67

12.48

R2

19046619

Lus10002250

8529175

17129539 17188293

Lus10039295 Lus10039284* Lus10006732*

890823

719787

17096922

Lus10039304

Lus10006772*

17096179

11673977 Lus10039305

11656882

Lus10032761

11673977

Lus10032759

11656882

Lus10032761

20699726

Lus10032759

Lus10009543*

Lus10011609

6265980

5039187

Lus10028292 Lus10031043*

4972365

4010807

Lus10028279

Lus10040388

911825

19037487

Lus10010221*

19028964

Lus10002249*

19046619

Lus10002250 Lus10002248

19037487

Lus10002249 19020694

19028964

Lus10002248

Lus10002247

19020694

Gene start

Lus10002247

Candidate gene

Gene end

894178

720446

17191771

17131415

17098155

17096850

11677049

11659758

11677049

11659758

20701390

8532321

6269593

5042259

4975644

4014021

917007

19047488

19042009

19030371

19023979

19047488

19042009

19030371

19023979

Gene family

TNL (continued)

RLK

h

WRKY

TM-CC

TM-CC

RLK

CNL

RLK

CNL

Unknown

RLK

RLK

TM-CC

CNL

MLO

TNL

TNL

TNL

NL

TNL

TNL

TNL

NL

TNL

132 C. Zheng et al.

12

13

13

13

13

13

13

PM

PM

PM

PM

PM

PM

12

PM

13

12

PM

PM

12

PM

PM

Chr

Trait

Table 7.4 (continued)

Lu13-4900476

Lu13-4866704

Lu13-4791823

Lu13-4531367

Lu13-4048698

Lu13-2630454

Lu13-1749576

Lu12-19127670

Lu12-16614785

Lu12-5111993

Lu12-1896717

QTN and position

T/G

G/A

T/C

T/C

T/G

G/C

G/A

G/A

A/G

G/C

A/T

SNP

G

A

C

T

G

G

G

A

G

G

A

FA

10.9

12.24

13.79

14.08

12.66

13.26

1.23

2.78

20.28

2.03

1.66

R2

4086952 4136784 4141596 4144251

Lus10002743 Lus10002753 Lus10002754 Lus10002755

4816848 4871884 4940816

Lus10009328 Lus10009339

4940816

Lus10009339

4813804

4871884

Lus10009328

Lus10000836

4816848

Lus10000835

4813804

4871884

Lus10009328 Lus10000836

4816848

Lus10000836 Lus10000835

4791559 4813804

Lus10000746* Lus10000835

4520377

4066720

Lus10002737

Lus10019708*

4047735 4064480

Lus10002733* Lus10002736

3997321

2695609

1747160

19122119

16614431

5110353

1893862

Gene start

Lus10026169

Lus10026894

Lus10001336*

Lus10033608*

Lus10027903*

Lus10018289*

Lus10023323*

Candidate gene

Gene family

RLK (continued)

RPW8

RPW8

RPW8

RLK

RPW8

RPW8

RPW8

RPW8

RPW8

RPW8

j

TNL

RLP

RLP

RLP

RLP

CNL

CNL

i

RLK

RLK

MLO

RLK

RLP

TM-CC

RLK

QTL and Candidate Genes for Flax Disease Resistance

4945877

4875204

4818268

4814638

4945877

4875204

4818268

4814638

4875204

4818268

4814638

4792034

4539065

4147241

4143026

4140702

4089951

4068303

4065781

4049731

3997740

2696436

1750173

19129113

16619009

5112941

1897997

Gene end

7 133

15

15

15

15

1

PM

PM

PM

PM

PAS

14

PM

15

14

PM

PM

14

PM

14

14

PM

14

14

PM

PM

14

PM

PM

13

14

PM

13

PM

PM

Chr

Trait

Table 7.4 (continued)

Lu1-3420323

Lu15-13210936

Lu15-3991048

Lu15-2267652

Lu15-50397

Lu15-46304

Lu14-17203266

Lu14-15360622

Lu14-10551333

Lu14-5960489

Lu14-5959395

Lu14-5382091

Lu14-4021471

Lu14-3458382

Lu14-1171479

Lu13-18212664

Lu13-5142458

QTN and position

G/A

G/A

T/G

G/C

G/A

C/A

G/A

T/C

T/C

G/A

G/C

A/G

A/T

C/G

C/T

T/G

C/T

SNP

A

G

G

G

G

C

G

C

C

A

C

G

T

C

C

G

C

FA

2.89

10.02

18.51

11.81

15.63

11.6

14.04

3.03

5.78

5.75

6.82

8.75

3.75

7.47

10.89

4.89

8.14

R2

43694 46613 47907 51620 134861 138808 144985

Lus10007608 Lus10007609 Lus10007610 Lus10007611 Lus10007631 Lus10007632 Lus10007633

Lus10042324*

Lus10010024*

3417936

13209088

3990588

2328170

Lus10029312 Lus10012678*

2287700

Lus10029306

47907

29008

Lus10007610*

11369

Lus10007604

17203248

15357234

10547787

5960004

5955689

5375403

4018836

3457175

1171345

18211794

5141681

Gene start

Lus10007601

Lus10039211*

Lus10035674*

Lus10008320*

Lus10015649*

Lus10015648*

Lus10014150*

Lus10021448*

Lus10020534*

Lus10028639*

Lus10030845*

Lus10009364*

Candidate gene

3423798

13213426

4001681

2328667

2288290

50779

147520

141347

137811

55118

50779

47143

46128

31923

14331

17203932

15366290

10552007

5963280

5959658

5382647

4024903

3462957

1174215

18215062

5142476

Gene end

Gene family

RLK (continued)

k

WRKY

DIR

DIR

RLK

RLK

RLK

RLK

RLK

RLK

RLK

RLK

RLK

RLK

TNL

TNL

TNL

TNL

TNL

RLK

TM-CC

TNL

CNL

RLK

RLK

134 C. Zheng et al.

Chr

1

2

3

3

4

4

4

4

5

Trait

PAS

PAS

PAS

PAS

PAS

PAS

PAS

PAS

PAS

Table 7.4 (continued)

Lu5-1554121

Lu4-17214936

Lu4-17204590

Lu4-14615685

Lu4-14576826

Lu3-22688547

Lu3-19643168

Lu2-23730537

Lu1-28707496

QTN and position

T/A

G/T

C/A

A/T

A/G

C/G

G/A

A/T

G/A

SNP

T

T

A

A

G

C

G

T

G

FA

7.75

5.81

5.17

10.85

7.99

8.98

12.82

1.24

5.7

R2

28686182 28745009

Lus10006057 Lus10006067

19645703

Lus10008230

1535498 1534214

17250005

Lus10020794

Lus10004727

17305881

Lus10020779

Lus10004726

17147750

Lus10009109

1568730

17146906

Lus10004719

17141532

17250005

Lus10020794

Lus10009108

17147750

Lus10009109

Lus10009107

17146906

Lus10009108

17129426

17141532

Lus10004040

17129426

Lus10009107

14580893

Lus10004040

14567225

Lus10041512

14580893

Lus10041512 Lus10041509

14567225

Lus10041509

22753013

19692616

Lus10008222 Lus10033291

19698197

Lus10008221

23729163

28681727

Lus10006056

Lus10030634*

28669696

Gene start

Lus10006052

Candidate gene

Gene end

Gene family

TN (continued)

CNL

TNL

TM-CC

CNL

NBS

TX

TNL

RLK

TM-CC

NBS

TX

TNL

RLK

TM-CC

RLK

TM-CC

RLK

RLK

RLP

TNL

TNL

RLK

RLK

RLK

RLK

RLK

QTL and Candidate Genes for Flax Disease Resistance

1535436

1538668

1573092

17256770

17309057

17148367

17147707

17146187

17131616

17256770

17148367

17147707

17146187

17131616

14585131

14569339

14585131

14569339

22769152

19648642

19696547

19704058

23733190

28748864

28689003

28684082

28672239

7 135

6

7

7

7

8

8

8

8

PAS

PAS

PAS

PAS

PAS

PAS

PAS

PAS

6

PAS

6

5

PAS

6

5

PAS

PAS

5

PAS

PAS

Chr

Trait

Table 7.4 (continued)

Lu8-17749357

Lu8-17270785

Lu8-16366918

Lu8-14317356

Lu7-2491132

Lu7-2453965

Lu7-2452981

Lu6-15506450

Lu6-15455712

Lu6-14738507

Lu6-2081466

Lu5-13500692

Lu5-4604607

Lu5-1650980

QTN and position

G/A

C/G

C/T

A/T

G/A

T/C

C/T

A/G

A/G

C/T

T/C

G/A

A/G

C/G

SNP

G

C

C

A

G

T

C

A

A

C

C

G

A

C

FA

10.16

9.59

10.9

14.32

8.05

7.03

6.3

12.62

9.63

13.34

8.3

11.9

6.58

6.61

R2

17846243 17714229

Lus10011039 Lus10011064

17314911

16406251

Lus10022351 Lus10000591

16371577

Lus10022345

14267487 16348058

Lus10016612 Lus10022340

14319585

2527732

2527732

2527732

Lus10016620

Lus10012159

Lus10012159

Lus10012159

15477978

15477978

Lus10021022

Lus10021022

14821777 15385416

Lus10021003

Lus10014441

2049388

13512919

Lus10017611

Lus10029810

4643992

Lus10034795 13567238

4605636

Lus10029802

4557256

1696929

Lus10008491 Lus10034790

1714704

Lus10008486 Lus10034787

1568730

Gene start

Lus10004719

Candidate gene

17720524

17852504

17317279

16410487

16373613

16351015

14271419

14330057

2533122

2533122

2533122

15480078

15480078

15390253

14827895

2051511

13513227

13570424

4646250

4610440

4561539

1698777

1718106

1573092

Gene end

Gene family

RLP (continued)

RLP

TM-CC

CNL

RLK

RLK

RLP

RLK

RLK

RLK

RLK

RLK

RLK

RLK

RLP

RLK

TX

RLK

RLK

RLK

TM-CC

RLK

RLK

TNL

136 C. Zheng et al.

8

8

8

PAS

8

PAS

PAS

8

PAS

PAS

Chr

Trait

Table 7.4 (continued)

Lu8-23142500

Lu8-23104696

Lu8-22525597

Lu8-18447612

Lu8-18251174

QTN and position

T/C

C/A

T/C

T/C

G/A

SNP

T

C

T

T

G

FA

13.34

16.53

2.74

11.66

10.38

R2

18527829

Lus10008540

23240447 23181341

Lus10018470

23181341 Lus10018459

Lus10018470

22525026

18344363

Lus10007813 Lus10015350*

18350700

18160912

Lus10007852

18359124

18230065

Lus10007836

Lus10007812

18254394

Lus10007831

Lus10007811

18260420

Lus10007830

18366041

18263872

Lus10007829

18372540

18271093

Lus10007828

Lus10007810

18276064

Lus10007826

Lus10007809

18281240

Lus10007825

18376722

18293684

Lus10007823

18423088

18299941

Lus10007822

Lus10007808

18309564

Lus10007821

Lus10007795

18335847

Lus10007814

18462216

18344363

Lus10007813

Lus10007790

18350700

Gene start

Lus10007812

Candidate gene

Gene end

Gene family

TX (continued)

RLK

TX

TNL

RLK

TNL

TNL

TNL

TNL

NL

TNL

TM-CC

TNL

TX

TNL

TNL

NL

OTHER

TNL

TNL

TNL

OTHER

TNL

TNL

TNL

TNL

TNL

QTL and Candidate Genes for Flax Disease Resistance

23182036

23243996

23182036

22531343

18529634

18349288

18354799

18363387

18370123

18375566

18380935

18440817

18466320

18161680

18232282

18258346

18262824

18267734

18275151

18278228

18285241

18297416

18302829

18313636

18340501

18349288

18354799

7 137

9

PAS

13

13

15

15

15

PAS

PAS

PAS

12

PAS

PAS

12

PAS

PAS

12

PAS

12

12

PAS

13

12

PAS

PAS

12

PAS

PAS

10

11

PAS

PAS

9

9

PAS

10

9

PAS

PAS

9

PAS

PAS

Chr

Trait

Table 7.4 (continued)

Lu15-14719354

Lu15-995626

Lu15-976617

Lu13-14299019

Lu13-2227366

Lu13-1919638

Lu12-16056974

Lu12-5819991

Lu12-5795458

Lu12-2719326

Lu12-1874446

Lu12-1621325

Lu12-474480

Lu11-3330783

Lu10-16054459

Lu10-8700793

Lu9-19857367

Lu9-6270376

Lu9-4333365

Lu9-1430465

Lu9-1067536

QTN and position

T/C

T/A

T/A

A/G

T/C

G/A

A/C

C/G

A/G

C/T

G/A

T/A

C/T

A/T

A/G

A/G

G/A

A/G

C/A

G/C

A/C

SNP

C

T

T

G

C

G

A

G

G

C

A

T

T

A

G

A

G

A

C

G

A

FA

4.07

6.27

16.08

8.28

1.21

13.67

11.26

6.9

9.67

9.9

4.3

9.41

8.33

7.09

1.2

12.1

12.67

14.34

23.39

10.76

5.06

R2

928284

Lus10011229

14717190

960226 Lus10014810*

1010940

928284

Lus10011229 Lus10011223

960226

Lus10011223 Lus10011216

1010940

14265258

Lus10034642 Lus10011216

14239315

2142483

1944672

16087281

5788546

5788546

2697465

1873149

1559945

549126

3372874

16154230

8700849

Lus10034637

Lus10026988*

Lus10026845

Lus10043083

Lus10037786

Lus10037786

Lus10006971

Lus10023329*

Lus10023391

Lus10020016

Lus10042097

Lus10022900*

Lus10039958

19841995

6367203

Lus10011917

6265980

Lus10031058

4355025

1422722

1105723

Gene start

Lus10031043

Lus10040315

Lus10004333

Lus10028975*

Candidate gene

14720271

930629

963788

1011642

930629

963788

1011642

14268273

14242336

2146101

1945903

16089494

5793121

5793121

2700985

1874696

1566644

551933

3374229

16157202

8705057

19844984

6373033

6269593

4361780

1425075

1118250

Gene end

Gene family

RLK (continued)

TM-CC

RLK

TX

TM-CC

RLK

TX

RLK

RLK

RLK

TX

RLK

TM-CC

TM-CC

TM-CC

TN

RLK

CNL

TM-CC

CNL

RLP

RLK

TM-CC

RLK

TM-CC

RLK

TM-CC

138 C. Zheng et al.

13

13

FW

FW

Chr13:4884610

Chr13:545286

Chr11:6013057

Chr8:22560290

Chr8:22560236**

Chr1:1854337**

Chr1:1722812

Chr1:1528323

Chr1:1497939

Chr1:1462137**

Chr1:1413812

Chr1:1288653

Chr1:1288616

Chr1:1288166

Chr1:1213418**

QTN and position

T/G

A/G

A/G

C/A

G/A

T/C

C/G

C/G

G/C

A/G

T/A

T/C

C/A

A/G

C/T

SNP

G

G

A

A

G

C

C

C

G

A

T

T

C

A

C

FA

6.7

7

9.7

7.7

6.4

8.2

10.5

7.8

7.5

8.2

10.2

10.3

10

9

10.6

R2

4886276 4890865

Lus10009330 Lus10009332

6011724

22554684 22569641

Lus10015344 Lus10015339 Lus10035917*

22561110 22562785

Lus10015356 Lus10015357

1851791 1854958

Lus10025852

1722251

1527916

1463118

1285554

Gene start

Lus10025853

Lus10025823*

Lus10025773*

Lus10025756

Lus10025717*

Candidate gene

4893541

4888681

6014460

22570539

22556269

22565427

22562338

1870647

1853596

1726433

1531752

1465114

1289474

Gene end

x

w

v

u

t

s

r

q

p

o

n

m

l

Gene family

QTL and Candidate Genes for Flax Disease Resistance

Source Pasmo (He et al. 2019; You et al. 2022a); Powdery mildew (You et al. 2022b); Fusarium wilt (Kanapin et al. 2021) Chr chromosome; FA favorable allele; QTN quantitative trait nucleotide; SNP single nucleotide polymorphism; NBS nucleotide-binding site domain; LRR leucine-rich repeat; CC coiled-coil; TNL Toll/interleukin-1 receptor-like domain (TIR)–NBS–LRR; CNL CC–NBS–LRR; TN TIR–NBS; RLK receptor-like protein kinase; RLP receptor-like protein; TM-CC transmembrane coiled-coil protein. TX TIR-unknown/random * QTNs are identified from gene regions; **The QTN is an upstream transcript variant a C-8,7 sterol isomerase; b sucrose synthase 3; c nitrate transporter 1:2; d hydroxymethylglutaryl-CoA synthase; e Pectin lyase-like superfamily protein; f ERD (earlyresponsive to dehydration stress) family protein; g ZPR1 zinc-finger domain protein; h homeodomain-like superfamily protein; i DNAJ homologue 2; j AP2; k cellulose synthase-like D5; l KIP1-like protein; m CYP709B2 Cytochrome P450 family 709 subfamily B polypeptide 2; n Protein kinase superfamily protein; o NADP-malic enzyme 4; p DA1-related protein 4 nucleotide-binding leucine-rich repeat protein; q Exportin 1A; r SPFH/Band 7/PHB domain-containing membrane-associated protein family; s voltagedependent anion channel 1; t AAA-ATPase 1; u Pectate lyase family protein; v Rubredoxin-like superfamily protein; w receptor-like protein 12; x sucrose nonfermenting 1 (SNF1)-related protein kinase 2.3

8

11

8

FW

FW

1

FW

FW

1

1

FW

FW

1

1

FW

FW

1

1

FW

1

1

FW

FW

1

FW

FW

Chr

Trait

Table 7.4 (continued)

7 139

Fig. 7.3 Relationship between the number of favorable alleles (NFA) in individuals and resistance to two flax diseases. a NFA versus pasmo (PAS) ratings (R2 = 0.62) and b NFA versus powdery mildew (PM) ratings (R2 = 0.73); the x-axis of each panel is the number of favorable alleles. The phenotypic data are the mean values over years. A combination of the core collection and 75 breeding lines was used for PM, but only the flax core collection was used for PAS. The curves represent locally weighted scatterplot smoothing (LOESS) lines

potential for fiber accessions to improve PAS resistance of linseed through the transfer of unique favorable alleles.

7.3.2 Quantitative Trait Loci (QTLs) Associated with Powdery Mildew Resistance

A genetic analysis of PM resistance identified one major dominant gene, designated Pm1, in three Canadian varieties (AC Watson, AC McDuff, and AC Emerson) and two introduced varieties (Atalante and Linda), and two additional dominant genes were postulated in Linda (Rashid and Duguid 2005). Three PM resistance QTLs located on LG1 (QPM-crc-LG1), LG7 (QPM-crc-LG7), and LG9 (QPM-crc-LG9) were identified using phenotypic data obtained in both field and growth chamber studies from F3 and F4 families derived from a cross between NorMan (PM-susceptible) and Linda (PM-resistant). The three QTLs explained 97% of the phenotypic variation, demonstrating that mainly dominant genes confer PM resistance in this population (Asgarinia et al. 2013). The SSR markers defining these three QTLs correspond to the following genomic regions: positions 16920407–18739647 on Chr 1, 3817603–3817863 on Chr 7, and 357191–357510 on Chr 9 (You and Cloutier 2020).

To further identify QTNs and candidate genes associated with PM, a large genetic panel of 372 accessions from the flax core collection and 75 selected breeding lines (Table 7.2) was used for GWAS. Here, two single-locus methods (GLM and MLM) and seven multi-locus methods (pLARmEB, pKWmEB, FASTmrMLM, ISIS EM-BLASSO, mrMLM, FASTmrEMMA, and FarmCPU) were used to identify QTNs from all 247,160 SNPs identified from the 447 flax genotypes. A total of 349 unique QTNs were identified for the six PM datasets (PM-2012, PM-2013, PM-2014, PM-2015, PM-2016, and PM-Mean). Most QTNs are stable across years. Of the 349 QTNs, 122 are stable across all six PM datasets. Of those, 44 are large-effect QTNs with R2 ≥ 10%, of which 11 have R2 values greater than 20%: Lu2-1672205 (30.2%), Lu3-581507 (29.71%), Lu5-11130392 (27.8%), Lu5-12090990 (27.6%), Lu5-15697144 (26.9%), Lu5-16602027 (25.2%), Lu5-16840013 (22.6%), Lu7-17007593 (21.3%), Lu9-3920670 (20.9%), Lu9-20701159 (20.9%), and Lu11-17188390 (20.7%) (Table 7.4).

Most large-effect QTNs for PM rating are clustered on the distal ends of chromosomes, especially on chromosome (Chr) 5 (0.4–5.6 Mb and 9.4–16.9 Mb) and Chr 13 (2.6–4.9 Mb). Of the 44 large-effect QTNs with R2 ≥ 10%, 15 are located on Chr 5 and five on Chr 13. Chr 8 also contains a QTL (18.9–19.0 Mb) that harbors two large-effect QTNs: Lu8-18951212 (R2 = 12.48%) and Lu8-19040276 (R2 = 15.67%). As for PAS, the number of favorable alleles (NFA) per genotype is significantly correlated with the PM ratings of the accessions (Fig. 7.3b) (R2 = 0.73), demonstrating the additive nature of the identified QTNs.
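The NFA statistic used above can be illustrated with a minimal Python sketch. It assumes that each accession's genotype is stored as one allele call per QTN and that the favorable allele of every QTN is known (as in the FA column of Table 7.4). The QTN labels, accessions, ratings, and function names are hypothetical, and a simple squared Pearson correlation stands in for the LOESS fit shown in Fig. 7.3.

```python
from statistics import mean

def count_favorable_alleles(genotype, favorable):
    """Number of QTNs at which an accession carries the favorable allele."""
    return sum(1 for qtn, allele in genotype.items() if favorable.get(qtn) == allele)

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Hypothetical favorable alleles for three QTNs (illustration only).
favorable = {"qtn1": "T", "qtn2": "A", "qtn3": "T"}

# Hypothetical allele calls for four accessions.
genotypes = {
    "acc1": {"qtn1": "T", "qtn2": "A", "qtn3": "T"},
    "acc2": {"qtn1": "T", "qtn2": "A", "qtn3": "G"},
    "acc3": {"qtn1": "T", "qtn2": "C", "qtn3": "G"},
    "acc4": {"qtn1": "A", "qtn2": "C", "qtn3": "G"},
}

# Hypothetical mean disease ratings (higher = more diseased).
ratings = {"acc1": 2.0, "acc2": 4.0, "acc3": 6.5, "acc4": 8.0}

accessions = sorted(genotypes)
nfa = [count_favorable_alleles(genotypes[a], favorable) for a in accessions]
r2 = r_squared(nfa, [ratings[a] for a in accessions])
print(nfa, round(r2, 3))
```

In this toy panel the rating falls as NFA rises, which is the pattern expected when the favorable alleles act additively.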

7.3.3 Quantitative Trait Loci (QTLs) Associated with Fusarium Wilt Resistance

Early studies based on genetic analysis of phenotypic data suggested that resistance to flax FW is controlled by several major genes and polygenes. Two independent and additive genes confer FW resistance in 143 DH lines derived from a cross between CRZY8/RA91 (FW-resistant) and Glenelg (FW-susceptible) evaluated under greenhouse and field conditions (Spielmeyer et al. 1998). Two independent and recessive genes were also identified in the RIL population from the cross between Aurore (R) and Oliver (S) (Edirisinghe 2016).

To map the genes associated with flax FW resistance to genomic regions, a GWAS was performed using a Russian genetic panel of 297 accessions from the collection of the Federal Research Centre of the Bast Fiber Crops, Torzhok, Russia (Kanapin et al. 2021). The collection was phenotyped for FW under greenhouse conditions during 2019–2021. All genotypes were inoculated with the highly pathogenic Fusarium oxysporum f.sp. lini MI39 strain. A disease severity index (DSI) was calculated to represent FW severity. Six GWAS statistical models (GLM, MLM, CMLM, FarmCPU, SUPER, and Blink, as implemented in the GAPIT3 R package) (Wang and Zhang 2021) were used with a subset of 72,526 SNPs. A total of 15 stable QTNs were identified from at least two of the three datasets. Among them, eight QTNs were detected in all three datasets (Table 7.4). Ten QTNs were located in a genomic region of 640 Kb on the distal end of Chr 1 (1.21–1.85 Mb), whereas the remaining five QTNs were scattered on Chr 8, 11, and 13. All stable QTNs showed significant allelic effects across datasets (Kanapin et al. 2021).

Other GWAS using the flax core collection and the biparental RIL population Bison × Novelty have been performed for FW resistance. QTNs were identified, of which one large-effect QTL (Lu1_LDB_1769377_1966369, R2 = 31.4–72.4%) on Chr 1 was identified in six FW datasets (three years at two locations) for the biparental population (unpublished data). It is notable that this QTL is located in the same region (1.72–1.85 Mb) on Chr 1 from which two QTNs (Chr1:1722812 and Chr1:1854337) were identified in the Russian genetic panel (Kanapin et al. 2021).
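The stability criterion described above (detection in at least two of the three datasets) amounts to a simple count per QTN. The following Python sketch assumes that each dataset's GWAS output has already been reduced to a set of significant QTN identifiers. The dataset labels, the QTN identifiers, and the function name are invented for illustration and are not the published results.

```python
from collections import Counter

def stable_qtns(detections, min_datasets=2):
    """Return {qtn: n_datasets} for QTNs detected in at least `min_datasets` datasets."""
    counts = Counter(qtn for qtns in detections.values() for qtn in set(qtns))
    return {qtn: n for qtn, n in counts.items() if n >= min_datasets}

# Hypothetical significant QTNs per phenotypic dataset.
detections = {
    "dataset_1": {"Chr1:100", "Chr1:200", "Chr8:300"},
    "dataset_2": {"Chr1:100", "Chr8:300", "Chr13:400"},
    "dataset_3": {"Chr1:100", "Chr11:500"},
}

stable = stable_qtns(detections)
# QTNs found in every dataset form the most reliable subset.
in_all = sorted(q for q, n in stable.items() if n == len(detections))
print(stable, in_all)
```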

7.4 Candidate Genes Co-localized with QTLs

Flax disease resistance can be either qualitative or quantitative. A qualitative trait is controlled by a few major-effect genes, whereas a quantitative trait results from the interaction or additivity of large-effect QTLs and minor-effect genes that are difficult to distinguish. To improve the genetic resistance of flax to its quantitative diseases, QTLs and co-localized resistance genes need to be identified and characterized. Resistance gene analogs (RGAs) belong to known disease resistance classes or possess features or domains commonly found in disease resistance genes, which play important roles in the plant disease defense system and in plant–pathogen interactions by detecting pathogen attacks and mounting a defense against pathogens. Consequently, RGAs are prime candidate genes, and their prediction and evaluation constitute a valuable first step in the investigation of candidate genes linked to QTLs for flax quantitative disease resistance (You et al. 2018b). Genome-wide RGAs identified in the flax genome can therefore assist candidate gene prediction.

RGAs can be classified into five categories: (1) Toll/interleukin receptor (TIR)-NBS-LRR (TNL), (2) non-TNL/coiled-coil-NBS-LRR (CNL), (3) receptor-like kinase (RLK), (4) receptor-like protein (RLP), and (5) other variants (Hammond-Kosack and Jones 1997; Sanseverino et al. 2010). Each type of RGA contains conserved domains and motifs. These structural features can be used to identify genome-wide RGAs using bioinformatics software tools such as the RGAugury pipeline (Li et al. 2016) and BLAST (Camacho et al. 2009).

QTNs can be located either in coding regions or in intergenic regions, but most are in intergenic regions because genes represent only a small proportion of plant genomes (You et al. 2022a). A QTN is not necessarily located within the causal feature (e.g., a gene), but both should be co-located within the same haplotype/LD block, which can be constructed from the genome-wide markers of a diversity panel (Purcell et al. 2007; Kim et al. 2019). More specifically, to determine whether a QTN is associated with a gene, the LD correlation (r2 or D′) between them must be sufficiently high (e.g., > 0.8) (You et al. 2022a). Since LD blocks and correlations depend on the genetic diversity and structure of a population, these estimates of LD are somewhat biased because they are specific to the panel. To avoid such biases, a fixed-size region flanking a QTN is scanned to identify co-located candidate genes.
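The LD criterion mentioned above (r2 or D′ above a threshold such as 0.8) can be made concrete with the standard two-locus definitions, D = p_AB − p_A p_B and r2 = D² / [p_A(1 − p_A) p_B(1 − p_B)]. The Python sketch below assumes biallelic loci coded 0/1 and one haplotype per accession, which is a simplification for largely homozygous inbred lines. The allele vectors and the function name are hypothetical.

```python
def ld_stats(hap_a, hap_b):
    """r^2 and D' between two biallelic loci coded 0/1, one haplotype per accession.

    Both loci must be polymorphic, otherwise the denominators are zero.
    """
    n = len(hap_a)
    p_a = sum(hap_a) / n                      # frequency of allele 1 at locus A
    p_b = sum(hap_b) / n                      # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # coefficient of linkage disequilibrium
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r2, abs(d) / d_max

# Hypothetical allele calls for a QTN and a SNP inside a nearby gene (10 accessions).
qtn      = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
gene_snp = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

r2, d_prime = ld_stats(qtn, gene_snp)
print(round(r2, 3), round(d_prime, 3))
```

In this toy example D′ reaches its maximum while r2 stays below 0.8, which shows that the two measures can lead to different conclusions about whether a QTN and a gene share an LD block.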
Regions of 100–200 kb downstream and upstream of a QTN are commonly scanned for candidate gene identification (Kumar et al. 2015; You et al. 2018b; He et al. 2019; Sertse et al. 2019). The fixed window size can be estimated through analysis of the LD decay curve (You et al. 2018b). Although this approach is simple and straightforward, its disadvantage is that a fixed window size does not reflect the variable recombination rates across the genome. Regardless of the approach, validation of the candidate genes through functional genomics is necessary to confirm their causal effect(s) on disease resistance.
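The fixed-window scan described above can be sketched in a few lines of Python. The example assumes that gene models are available as (identifier, chromosome, start, end) records and that the window extends the same distance on both sides of the QTN. The gene names, coordinates, and the function name are hypothetical; only the 200 kb default flank follows the window sizes cited in the text.

```python
from typing import NamedTuple

class Gene(NamedTuple):
    gene_id: str
    chrom: str
    start: int
    end: int

def genes_in_window(chrom, pos, genes, flank=200_000):
    """Genes overlapping [pos - flank, pos + flank] on `chrom`.

    Returns (gene_id, distance) pairs sorted by distance to the QTN;
    the distance is 0 when the QTN lies inside the gene.
    """
    lo, hi = pos - flank, pos + flank
    hits = []
    for g in genes:
        if g.chrom != chrom or g.end < lo or g.start > hi:
            continue
        if g.start <= pos <= g.end:
            dist = 0
        else:
            dist = min(abs(pos - g.start), abs(pos - g.end))
        hits.append((g.gene_id, dist))
    return sorted(hits, key=lambda h: h[1])

# Hypothetical gene models.
genes = [
    Gene("geneA", "Chr1", 95_000, 105_000),
    Gene("geneB", "Chr1", 250_000, 260_000),
    Gene("geneC", "Chr1", 400_000, 410_000),
    Gene("geneD", "Chr2", 99_000, 101_000),
]

hits = genes_in_window("Chr1", 100_000, genes)
narrow_hits = genes_in_window("Chr1", 100_000, genes, flank=100_000)
print(hits, narrow_hits)
```

Narrowing the flank from 200 kb to 100 kb drops the more distant gene, which illustrates how strongly the candidate list depends on the chosen window size.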

7.4.1 Candidate Genes Associated with Pasmo (PAS) Resistance

To identify candidate genes for PAS resistance, a total of 1599 RGAs identified in the flax genome (You et al. 2018a) were scanned within a region of 200 Kb downstream and 200 Kb upstream of the 500 QTNs identified for PAS rating. A total of 372 RGAs were co-located with 314 QTNs. Among them, 85 RGAs were associated with 45 stable and large-effect QTNs (Table 7.4). These 85 RGAs include four types: RLPs, RLKs, NBS-coding genes (including TNL, TX, CNL, NL, TN, NBS, and others), and genes encoding transmembrane coiled-coil proteins (TM-CC) (Sekhwal et al. 2015). The largest category is the RLKs, representing 36.47% of the RGAs, followed by the TNLs at 22.35% (He et al. 2019). Eight QTNs were detected within the coding regions of RGAs: Lu1-3420323 (Lus10042324, RLK), Lu2-23730537 (Lus10030634, RLK), Lu8-22525597 (Lus10015350, TNL), Lu9-1067536 (Lus10028975, TM-CC), Lu10-16054459 (Lus10022900, CNL), Lu12-1874446 (Lus10023329, TN), Lu13-2227366 (Lus10026988, RLK), and Lu15-14719354 (Lus10014810, RLK).

Chromosome 8 contains an important genomic region (14.3–23.1 Mb) for PAS resistance. A total of 49 QTNs were detected in this region, including nine that are considered stable and major and that have nearby RGAs (Table 7.4). QTNs Lu8-18251174 (R2 = 10.38%) and Lu8-18447612 (R2 = 11.66%) both co-locate with TNL gene clusters (You et al. 2022a). In addition, QTN Lu8-22525597 (R2 = 2.74%) is located within the TNL gene Lus10015350, and QTN Lu8-14317356 (R2 = 14.32%) is significantly correlated with genes Lus10016620 (RLK) and Lus10016612 (RLP).
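The category shares quoted above for the 85 RGAs can be checked with simple arithmetic. The counts of 31 RLKs and 19 TNLs used below are an assumption, back-calculated from the reported percentages rather than quoted from the source.

```python
total_rgas = 85  # RGAs associated with the 45 stable, large-effect PAS QTNs

# Assumed counts, back-calculated from the reported 36.47% and 22.35%.
counts = {"RLK": 31, "TNL": 19}

shares = {family: round(100 * n / total_rgas, 2) for family, n in counts.items()}
print(shares)
```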

7.4.2 Candidate Genes Associated with Powdery Mildew (PM) Resistance

To identify candidate genes associated with flax PM resistance, two steps were taken (You et al. 2022b). In the first step, all 247,160 SNPs were used for GWAS and 349 QTNs were identified. In the second step, to detect QTNs specifically located within RGAs, a subset of 3,230 SNPs located on 838 flax RGAs was extracted for GWAS, and subsequent analyses identified an additional 39 QTNs within these RGAs. As such, a total of 388 QTNs were identified from the two complementary SNP datasets. Genome-wide scanning of RGAs along chromosomes within 200 Kb genomic regions of QTNs identified 445 RGAs for 269 of the 388 QTNs. These involve 13 gene families: dirigent protein (DIR), disease resistance-zinc finger-chromosome condensation (DZC), extreme-drug-resistant (EDR), mildew resistance locus o (MLO), RLK, RLP, resistance to powdery mildew 8 (RPW8), TM-CC, TIR, NL, CNL, TNL, and WRKY. A total of 45 RGAs have QTNs located within their coding regions (Table 7.4).

Six large-effect candidate RGAs have large R2 values (>20%). Lus10027903 on Chr 12 is an RLP gene (R2 = 20%). The other five candidate genes are located on chromosome 5. Lus10004727, Lus10004726, and Lus10004719 are tandem duplicate TNL genes (R2 = 24–28%). Lus10029860 is another TNL gene (R2 = 35%), and Lus10032303 is a WRKY gene (R2 = 24%).

Some candidate RGA clusters are associated with PM resistance QTNs. They are scattered on 11 of the 15 flax chromosomes (Chr 2, 3, 5, 7–10, 12–15) (Fig. 7.4). Each cluster contains at least three candidate genes. The largest candidate gene cluster is co-located with QTN Lu8-18351964 (R2 = 11.49%) on Chr 8, which was identified within the gene Lus10007812. This cluster contains 15 tandemly duplicated TN genes. Another important gene cluster comprises the TNL genes Lus10004726, Lus10004727, Lus10004719, and Lus10004747, which are associated with two QTNs: Lu5-1534998 (R2 = 27.83%) and Lu5-1535619 (R2 = 32.06%). QTN Lu5-1535619 was identified within Lus10004726, but each of the other three genes at this locus contains at least one SNP that had a significant allelic effect on PM resistance. Another remarkable gene cluster lies in a 108.6 Kb genomic region of Chr 13 (Fig. 7.5), which harbors three tandem duplicate RPW8 genes: Lus10000835, Lus10000836, and Lus10009328. Four QTNs were identified in this region: Lu13-4791823 (R2 = 13.79%), Lu13-4830850 (R2 = 7.17%), Lu13-4866704 (R2 = 12.24%), and Lu13-4900476 (R2 = 10.9%). RPW8 genes encode broad-spectrum powdery mildew resistance proteins and have been found in Arabidopsis thaliana and other dicots (Xiao et al. 2001, 2003).
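The notion of a candidate gene cluster (at least three candidate genes at one locus) can be sketched as a grouping of neighbouring genes. The source does not state how cluster boundaries were delimited, so the maximum-gap rule, the `max_gap` value, the gene records, and the function name below are all assumptions made for illustration.

```python
def gene_clusters(genes, max_gap=50_000, min_genes=3):
    """Group (gene_id, chrom, start, end) tuples into clusters of nearby genes.

    Consecutive genes on the same chromosome stay in one cluster while the gap
    between the previous gene's end and the next gene's start is at most
    `max_gap`; only clusters with at least `min_genes` members are returned.
    """
    clusters, current = [], []
    for gene in sorted(genes, key=lambda g: (g[1], g[2])):
        if current and (gene[1] != current[-1][1] or gene[2] - current[-1][3] > max_gap):
            if len(current) >= min_genes:
                clusters.append([g[0] for g in current])
            current = []
        current.append(gene)
    if len(current) >= min_genes:
        clusters.append([g[0] for g in current])
    return clusters

# Hypothetical candidate genes: three close neighbours, one distant gene,
# and a pair on another chromosome that is too small to count as a cluster.
genes = [
    ("g1", "Chr5", 1_000, 2_000),
    ("g2", "Chr5", 2_500, 4_000),
    ("g3", "Chr5", 4_200, 5_000),
    ("g4", "Chr5", 500_000, 501_000),
    ("g5", "Chr13", 100, 900),
    ("g6", "Chr13", 1_000, 1_500),
]

clusters = gene_clusters(genes, max_gap=1_000)
print(clusters)
```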

7.4.3 Candidate Genes Associated with Fusarium Wilt Resistance

In the GWAS of flax FW resistance (Kanapin et al. 2021), 15 stable QTNs were associated with FW resistance in a genetic panel of 297 flax accessions (Tables 7.2 and 7.4). Candidate genes were scanned within a predefined window of the QTN flanking regions. The size of the window was determined by the extent of LD decay of the relevant chromosomes (Kanapin et al. 2021). A total of 13 putative candidate genes are located in the vicinity of eight QTNs (Table 7.4): five QTNs on Chr 1 are associated with six candidate genes, one QTN on Chr 8 with four genes, one QTN on Chr 11 with one gene, and one QTN on Chr 13 with two genes. Of these 13 candidate genes, nine are located in the vicinity of QTNs, at physical distances ranging from 621 to 9405 bp. A QTN was located within each of the other four candidate genes. For example, the genes Lus10025823 and Lus10035917 contain the QTNs Chr1:1722812 and Chr11:6013057, respectively, within an intron. The genes Lus10025717 and Lus10025773 contain the missense variant QTNs Chr1:1288653 and Chr1:1528323, respectively (Kanapin et al. 2021).

Most of these candidate genes are non-RGAs (Table 7.4). For example, upstream of the QTN Chr8:22560236 lies Lus10015344, a gene orthologous to the Arabidopsis gene AT5G40010 that encodes an AAA-ATPase 1. Three other genes located downstream of this QTN are also candidates: Lus10015356 is orthologous to the Arabidopsis HIR1 gene encoding a hypersensitive induced reaction protein; Lus10015357 is orthologous to the Arabidopsis gene AT5G62740 encoding the voltage-dependent anion channel (VDAC) 1 protein; and Lus10015359 is orthologous to the Arabidopsis gene AT3G01270 encoding a pectate lyase family protein.

Fig. 7.4 Chromosomal distribution of 445 candidate resistance gene analogs (RGAs) and 87 non-RGAs (labeled with “*”) co-located with 269 quantitative trait nucleotides (QTNs) within 200 Kb genome regions of the QTNs. R2 represents the proportion of phenotypic variation explained by a QTN or a SNP on a candidate gene. The brackets show candidate gene clusters that contain at least three genes

Fig. 7.5 An RPW8 gene cluster on flax chromosome 13 harbors four QTNs for powdery mildew (PM) resistance: Lu13-4791823 (R2 = 13.79%), Lu13-4830850 (R2 = 7.17%), Lu13-4866704 (R2 = 12.24%), and Lu13-4900476 (R2 = 10.9%) (You et al. 2022b)

7.5

Conclusions

This chapter summarizes the studies on phenotyping, QTL identification, and candidate gene prediction for three quantitatively inherited diseases in flax, namely PAS, PM, and FW, reported in the last decade. Several genetic populations were used, including (1) the flax core collection of 391 accessions compiled from the 3378 accessions maintained by Plant Gene Resources of Canada (PGRC), (2) the genetic population of 297 genotypes selected from the collection of the Federal Research Center for Bast Fiber Crops (Torzhok, Russia), (3) the 75 breeding lines selected mostly on the basis of their resistance to the three fungal diseases, and (4) biparental populations developed specifically for flax disease resistance genetics studies. The flax core collection preserves considerable genetic variation from a broad range of geographical origins and genetic backgrounds for various important breeding targets, including disease resistance, although it is somewhat lacking in highly resistant lines. As such, the 75 breeding lines selected for their disease resistance complement the core collection and broaden its genetic variation for disease resistance, strengthening its use in genomic analyses of these traits. Large-scale phenotyping and genotyping of these populations provide ample genomic data for association mapping with multiple single- and multi-locus GWAS models and for linkage map-based QTL identification in biparental populations. A total of 500 and 388 QTNs for PAS and PM, respectively, have been identified from the flax core collection with or without the addition of the 75 selected breeding lines, and 15 QTNs have been detected from the Russian genetic population for FW. Some putative candidate genes co-located with QTNs have been identified. GWAS using RGA-specific SNPs helped to pinpoint QTNs located within RGAs. The QTNs identified for the three disease resistance traits appear to have significant additive effects, which facilitates their application in marker-assisted and genomic selection, and a high predictive ability is expected for these traits in disease resistance breeding. The candidate genes must still be validated, after which approaches such as gene editing can be envisioned to improve disease resistance in flax.

Acknowledgements We thank Tara Edwards for English editing.


References

Anderson PA, Lawrence GJ, Morrish BC, Ayliffe MA, Finnegan EJ et al (1997) Inactivation of the flax rust resistance gene M associated with loss of a repeated unit within the leucine-rich repeat coding region. Plant Cell 9:641–651
Asgarinia P, Cloutier S, Duguid S, Rashid K, Mirlohi A et al (2013) Mapping quantitative trait loci for powdery mildew resistance in flax (Linum usitatissimum L.). Crop Sci 53:2462–2472
Bandillo N, Raghavan C, Muyco PA, Sevilla MAL, Lobina IT et al (2013) Multi-parent advanced generation inter-cross (MAGIC) populations in rice: progress and potential for genetics research and breeding. Rice 6:11
Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J et al (2009) BLAST+: architecture and applications. BMC Bioinform 10:421
Cavanagh C, Morell M, Mackay I, Powell W (2008) From mutations to MAGIC: resources for gene discovery, validation and delivery in crop plants. Curr Opin Plant Biol 11:215–221
Cloutier S, Ragupathy R, Niu Z, Duguid S (2011) SSR-based linkage map of flax (Linum usitatissimum L.) and mapping of QTLs underlying fatty acid composition traits. Mol Breed 28:437–451
Diederichsen A, Kusters PM, Kessler D, Bainas Z, Gugel RK (2013) Assembling a core collection from the flax world collection maintained by Plant Gene Resources of Canada. Genet Resour Crop Evol 60:1479–1485
Dodds PN, Lawrence GJ, Ellis JG (2001a) Contrasting modes of evolution acting on the complex N locus for rust resistance in flax. Plant J 27:439–453
Dodds PN, Lawrence GJ, Ellis JG (2001b) Six amino acid changes confined to the leucine-rich repeat β-strand/β-turn motif determine the difference between the P and P2 rust resistance specificities in flax. Plant Cell 13:163–178
Edirisinghe VP (2016) Characterization of flax germplasm for resistance to Fusarium wilt caused by Fusarium oxysporum f.sp. lini. Department of Plant Science. University of Saskatchewan, p 120
Ellis JG, Lawrence GJ, Luck JE, Dodds PN (1999) Identification of regions in alleles of the flax rust resistance gene L that determine differences in gene-for-gene specificity. Plant Cell 11:495–506
Flax Council of Canada (1996) Growing flax: production, management and diagnostic guide, 5th edn. Flax Council of Canada, pp 35–39
Fu YB (2011) Genetic evidence for early flax domestication with capsular dehiscence. Genet Resour Crop Evol 58:1119–1128
Hammond-Kosack KE, Jones JD (1997) Plant disease resistance genes. Annu Rev Plant Physiol Plant Mol Biol 48:575–607
He L, Xiao J, Rashid KY, Yao Z, Li P et al (2019) Genome-wide association studies for pasmo resistance in flax (Linum usitatissimum L.). Front Plant Sci 9:1982
Huang X, Wei X, Sang T, Zhao Q, Feng Q et al (2010) Genome-wide association studies of 14 agronomic traits in rice landraces. Nat Genet 42:961–967
Kanapin A, Bankin M, Rozhmina T, Samsonova A, Samsonova M (2021) Genomic regions associated with Fusarium wilt resistance in flax. Int J Mol Sci 22:12383
Kim SA, Brossard M, Roshandel D, Paterson AD, Bull SB et al (2019) gpart: human genome partitioning and visualization of high-density SNP data by identifying haplotype blocks. Bioinformatics 35:4419–4421
Kumar S, You FM, Duguid S, Booker H, Rowland G et al (2015) QTL for fatty acid composition and yield in linseed (Linum usitatissimum L.). Theor Appl Genet 128:965–984
Lawrence GJ, Finnegan EJ, Ayliffe MA, Ellis JG (1995) The L6 gene for flax rust resistance is related to the Arabidopsis bacterial resistance gene RPS2 and the tobacco viral resistance gene N. Plant Cell 7:1195–1206
Lawrence GJ, Anderson PA, Dodds PN, Ellis JG (2010) Relationships between rust resistance genes at the M locus in flax. Mol Plant Pathol 11:19–32
Li P, Quan X, Jia G, Xiao J, Cloutier S et al (2016) RGAugury: a pipeline for genome-wide prediction of resistance gene analogs (RGAs) in plants. BMC Genomics 17:852
Mackay I, Powell W (2007) Methods for linkage disequilibrium mapping in crops. Trends Plant Sci 12:57–63
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA et al (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81:559–575
Rashid KY (1998) Powdery mildew on flax: a new disease in western Canada. Can J Plant Pathol 20:216
Rashid KY (2000) Pasmo disease in flax: impact and potential control. In: Proceeding of the Manitoba agronomists conference, Winnipeg, MB, pp 154–156
Rashid K, Duguid S (2005) Inheritance of resistance to powdery mildew in flax. Can J Plant Pathol 27:404–409
Rashid KY, Kenaschuk EO (1993) Effect of trifluralin on fusarium wilt in flax. Can J Plant Sci 73:893–901
Rashid KY, Kenaschuk EO (1994) Genetics of resistance to flax rust in six Canadian flax cultivars. Can J Plant Pathol 16:266–272
Rashid KY, Kenaschuk EO, Platford RG (1998) Diseases of flax in Manitoba in 1997 and first report of powdery mildew on flax in western Canada. Can Plant Dis Surv 78:99–100
Sanseverino W, Roma G, De Simone M, Faino L, Melito S et al (2010) PRGdb: a bioinformatics platform for plant resistance gene analysis. Nucleic Acids Res 38:D814–D821
Sekhwal MK, Li P, Lam I, Wang X, Cloutier S et al (2015) Disease resistance gene analogs (RGAs) in plants. Int J Mol Sci 16:19248–19290
Sertse D, You FM, Ravichandran S, Cloutier S (2019) The complex genetic architecture of early root and shoot traits in flax revealed by genome-wide association analyses. Front Plant Sci 10:1483
Soto-Cerda BJ, Maureira-Butler I, Munoz G, Rupayan A, Cloutier S (2012) SSR-based population structure, molecular diversity and linkage disequilibrium analysis of a collection of flax (Linum usitatissimum L.) varying for mucilage seed-coat content. Mol Breed 30:875–888
Spielmeyer W, Lagudah ES, Mendham N, Green AG (1998) Inheritance of resistance to flax wilt (Fusarium oxysporum f.sp. lini Schlecht) in a doubled haploid population of Linum usitatissimum L. Euphytica 101:287–291
Vera CL, Duguid SD, Fox SL, Rashid KY, Dribnenki JCP et al (2012) Comparative effect of lodging on seed yield of flax and wheat. Can J Plant Sci 92:39–43
Wang J, Zhang Z (2021) GAPIT version 3: boosting power and accuracy for genomic association and prediction. Genom Proteom Bioinform 19:629–640
Wiesner I, Wiesnerovà D (2004) Statistical correlations of primer thermodynamic stability ΔG° for enhanced flax ISSR-PCR cultivar authentication. J Agric Food Chem 52:2568–2571
Xiao S, Ellwood S, Calis O, Patrick E, Li T et al (2001) Broad-spectrum mildew resistance in Arabidopsis thaliana mediated by RPW8. Science 291:118–120
Xiao S, Charoenwattana P, Holcombe L, Turner JG (2003) The Arabidopsis genes RPW8.1 and RPW8.2 confer induced resistance to powdery mildew diseases in tobacco. Mol Plant Microbe Interact 16:289–294
You FM, Cloutier S (2020) Mapping quantitative trait loci onto chromosome-scale pseudomolecules in flax. Methods Protoc 3:28
You FM, Jia G, Xiao J, Duguid SD, Rashid KY et al (2017) Genetic variability of 27 traits in a core collection of flax (Linum usitatissimum L.). Front Plant Sci 8:1636
You FM, Xiao J, Li P, Yao Z, Gao J et al (2018a) Chromosome-scale pseudomolecules refined by optical, physical, and genetic maps in flax. Plant J 95:371–384
You FM, Xiao J, Li P, Yao Z, Jia G et al (2018b) Genome-wide association study and selection signatures detect genomic regions associated with seed yield and oil quality in flax. Int J Mol Sci 19:2303
You FM, Rashid KY, Cloutier S (2022a) Genomic designing for genetic improvement of biotic stress resistance in flax. In: Kole C (ed) Genomic designing for biotic stress resistant oilseed crops. Springer, Cham, pp 311–345
You FM, Rashid KY, Zheng C, Khan N, Li P et al (2022b) Insights into the genetic architecture and genomic prediction of powdery mildew resistance in flax (Linum usitatissimum L.). Int J Mol Sci 23:4960
Yu J, Holland JB, McMullen MD, Buckler ES (2008) Genetic design and statistical power of nested association mapping in maize. Genetics 178:539–551

8

Key Stages of Flax Bast Fiber Development Through the Prism of Transcriptomics

Tatyana Gorshkova, Natalia Mokshina, Nobutaka Mitsuda, and Oleg Gorshkov

8.1

Introduction

The boost in transcriptome analysis over the last two decades has provided an enormous amount of data of indispensable value for many fields of biology. Large-scale analysis of gene expression in flax was first based on microarrays (Day et al. 2005; Roach and Deyholos 2007) and on the sequencing of expressed sequence tags (ESTs) from different flax tissues (Fenart et al. 2010; Venglat et al. 2011). Several years after the flax genome was sequenced (Wang et al. 2012), studies involving next-generation sequencing (NGS) were initiated in several laboratories (e.g., Dmitriev et al. 2016a, b; Galindo-González and Deyholos 2016; Gorshkov et al. 2017; Zhang and Deyholos 2016), making it possible to compare the RNA level of each gene in the tissues of interest. The earliest data were reviewed several years ago (Griffiths and Datla 2019). The NGS data obtained have recently been collected in the online platform FIBexDB (plant FIBer expression DataBase) (Mokshina et al. 2021), which makes it possible to analyze each gene bioinformatically and to compare gene expression parameters in various tissues and at different stages of bast (phloem) fiber formation. Although the transcript amount of a given gene is not always proportional to the amount of the corresponding protein, and especially to its activity, changes in transcriptome patterns are very informative for understanding the peculiarities of general cell metabolism and for detecting the key molecular players at the major stages of bast fiber development.

Flax is an important crop, and understanding its development and the formation of its resistance to various factors is important in itself. However, flax is also a convenient model for studies in general plant biology. Together with the relatively small genome (43,484 coding sequences, Wang et al. 2012), the reasons are the relatively simple construction of the plant body, which is morphologically rather uniform along the stem, and a good understanding of the fiber development stages, which include two important processes in plant biology: cell elongation and cell wall thickening. In fibers, these are intrusive elongation, which starts soon after fiber initiation, and cell wall thickening, which is induced later in fiber development (Gorshkova et al. 2003, 2005). Both stages are of crucial importance for the yield and quality of the fiber crop (Mokshina et al. 2018). The localization of these stages within a plant is known, which makes it possible to sample the appropriate stem portion for analysis; the marker for the developmental transition from elongation to cell wall thickening is the so-called snap point (Fig. 8.1), which is easily identified manually by the sharp increase in stem mechanical strength (Gorshkova et al. 2003). Flax phloem has only primary fibers, which are initiated from the procambium close to the stem apical meristem (Esau 1943). This avoids the complications in data interpretation caused by the development of secondary phloem fibers, which occurs in many other fiber crops, such as hemp and ramie.

After the initiation of cell wall thickening, the bast fibers become mechanically so strong that they can be isolated from the stem rather quickly and easily by gentle pulling with a pestle in a mortar with 80% ethanol (Mokshina et al. 2014; Gorshkov et al. 2017). Such a sample is quite suitable for NGS analysis, giving a chance to obtain a definite cell type at a definite stage of its development. This avoids the problem of plant tissue heterogeneity, which significantly complicates data interpretation. This problem is often overlooked in studies of bast fiber development and of fiber responses to various environmental factors: changes in the transcriptome profiles of whole stem pieces, or of fiber-enriched peels that contain epidermis, several layers of parenchyma cells, and phloem sieve elements in addition to bast fibers, are discussed as if they came only from fibers (Long et al. 2012; Guo et al. 2017a). Obtaining intrusively growing fibers poses special problems. At this stage, fibers are easily damaged because, although already quite long, they still have only the primary cell wall (PCW). Laser microdissection combined with cryosectioning was developed for flax bast fibers to obtain transcriptome profiles of elongating fibers (Gorshkova et al. 2018a). This made it possible to compare their expression patterns with those of fibers that are thickening the cell wall and to identify genes with stage-specific expression (Fig. 8.1).

It should be noted that the term “fiber” has different meanings depending on the field of application. In plant biology, a fiber is an individual sclerenchyma cell that provides mechanical support to the plant and is characterized by extreme length, a high length-to-diameter ratio, and an impressively thick cell wall (van Dam and Gorshkova 2003). When describing fiber composition, biogenesis, and metabolism, we consider a fiber primarily as an individual cell, whereas many of the traits evaluated after flax harvesting refer to technical flax fibers, which are fiber bundles obtained from harvested plants.

T. Gorshkova (corresponding author) · N. Mokshina · N. Mitsuda · O. Gorshkov
Kazan Institute of Biochemistry and Biophysics, FRC Kazan Scientific Center of RAS, Lobachevsky Str., 2/31, 420111 Kazan, Russia

N. Mitsuda
Plant Gene Regulation Research Group, Bioproduction Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Ibaraki 305-8566, Japan

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. F. M. You and B. Fofana (eds.), The Flax Genome, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-16061-5_8

8.2

Intrusive Elongation

8.2.1 The Significance of Fiber Intrusive Elongation

Bast fibers are among the longest plant cells. The average length of individual flax phloem fibers is around 2.5 cm, and some can be several times longer (Snegireva et al. 2010). Most of the fiber length is acquired through intrusive growth, a type of cell elongation in which a cell elongates faster than its neighbors. During intrusive growth, the surface and volume of flax primary phloem fibers increase on average several thousand-fold (Gorshkova et al. 2018a). Intrusive growth disrupts the former contacts with neighboring cells through the middle lamella and establishes new ones (Esau 1965). Flax fibers elongate by diffuse growth, i.e., by the increase of the whole cell surface, as distinct from tip growth, which occurs only at cell extremities. Such a process inevitably disrupts plasmodesmata, turning fibers into symplastically isolated domains (Ageeva et al. 2005; Snegireva et al. 2010).

Although intrusive growth occurs at an early stage of fiber development, it inevitably and profoundly affects the final fiber yield and quality, as summarized in Mokshina et al. (2018). It determines the final size of each individual fiber, both in length and width. It is intrusive growth that forms and shapes fiber bundles: it destroys and reestablishes the contacts between individual fibers and surrounding cells and leads to the tight packing of fibers within a bundle, with no intercellular spaces. Despite its significance for fiber biogenesis, crop processing, and technical fiber quality, intrusive growth has been largely undervalued and poorly characterized from a molecular-genetic point of view, since the major focus in fiber research has always been on thick cell walls. An additional reason for the lack of information is the difficulty of obtaining RNA samples exclusively from intrusively growing fibers, which are relatively few cells located several cell layers away from the stem surface. Laser microdissection combined with stem cryosectioning was developed for flax bast fibers to prepare samples for NGS (Gorshkova et al. 2018a), and this remains the only experimental paper on transcriptome analysis of isolated bast fibers at the stage of intrusive elongation.

Fig. 8.1 a Key stages of primary bast fiber development, their impact on the quality-related parameters (left side), and samples of fibers and other stem tissues used for RNA-Seq analysis to reveal key participants involved in fiber biogenesis at different stages of development. Snap point is a marker of fiber developmental transition from elongation to cell wall thickening. Samples for which miRNA-Seq was additionally carried out are underlined (Gorshkov et al. 2019). b The list of samples for flax transcriptomic data that were generally used in this chapter. Transcriptome data for different flax samples were used for FIBexDB (Mokshina et al. 2021). PCW, primary cell wall; SCW, secondary cell wall; TCW, tertiary cell wall

8.2.2 Approaches to Reveal the Genes for Proteins Especially Important for a Fiber at Intrusive Growth Stage of Development

Transcriptomic analysis provides a comprehensive view of mRNA abundance for all genes of the genome that are transcribed in the sample of interest. If a sample consists of a definite cell type, such general information gives a good idea of what the cell is doing and/or preparing to do. The major approach to identifying the important molecular players in the “personal life” of a sample is the analysis of genes differentially expressed between two samples. Data from numerous experiments accumulated in FIBexDB help to reveal such genes: the samples for pairwise comparison and the threshold for the difference in expression can be chosen freely.

Below, we present the results obtained for fibers at the intrusive elongation stage (samples iFIBa and iFIBb, Fig. 8.1). Sample iFIBa is located closer to the shoot apical meristem (SAM) than iFIBb and corresponds to an earlier stage of fiber intrusive growth. The sequence of fiber developmental stages (initiation and coordinated growth, intrusive elongation, cell wall thickening) corresponds to the sample series SAM, iFIBa and iFIBb, tFIBa and tFIBb. In addition, we included a sample of a different tissue with the primary cell wall, cortical parenchyma (cPAR), which was also obtained by laser microdissection, in the pairwise comparisons performed with the FIBexDB DEG Finder, a special service that helps to reveal and analyze differentially expressed genes (DEGs).

With the input parameters set to (iFIBa/SAM > 2), and (iFIBa/cPAR > 2), and (iFIBb/SAM > 2), and (iFIBb/cPAR > 2) (target exp > 10 for these four comparisons), and (tFIBa/iFIBa < 0.5), and (tFIBa/iFIBb < 0.5) (control exp > 10 for the two comparisons), the resulting list contains 318 flax genes, corresponding to 251 genes of Arabidopsis thaliana (Arabidopsis), that are upregulated at intrusive elongation compared with the preceding and following stages of fiber development and with a different tissue with the primary cell wall (Fig. 8.2). The heatmap of the genes that fit these parameters illustrates their mRNA abundance not only in the stem samples from Fig. 8.1 but also in numerous other flax entries in the database, including those from roots and leaves (Fig. 8.2).

Similarly, the genes downregulated in intrusively elongating fibers can be selected by DEG Finder with the following parameters: (iFIBa/SAM < 0.5), and (iFIBa/cPAR < 0.5), and (iFIBb/SAM < 0.5), and (iFIBb/cPAR < 0.5) (control exp > 10 for these four comparisons), and (tFIBa/iFIBa > 2), and (tFIBa/iFIBb > 2) (target exp > 10 for two comparisons). The result is 580 flax genes corresponding to 443 Arabidopsis genes. The full lists of both up- and downregulated genes can be easily obtained from FIBexDB by entering the above parameters in the DEG Finder. The downregulated genes include those annotated as cyclins and histones, as well as many other genes whose expression is mainly associated with cell division (Gorshkova et al. 2018a). Accordingly, these genes are expressed mainly in the apical part of the stem. FIBexDB is also helpful for clustering genes according to their expression patterns.
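The logic of the DEG Finder thresholds listed above can be made concrete with a small sketch. It is not the FIBexDB implementation: it assumes that each comparison A/B is the ratio of normalized expression in sample A to that in sample B, that "target exp" refers to the numerator sample and "control exp" to the denominator sample, and that expression values are held in a plain dictionary. The function names and all expression values are invented for illustration.

```python
def ratio(expr: dict, target: str, control: str) -> float:
    """Fold change target/control; infinite when only the control is zero."""
    t, c = expr[target], expr[control]
    if c == 0:
        return float("inf") if t > 0 else 1.0
    return t / c


def up_in_ifib(expr: dict, fold: float = 2.0, min_expr: float = 10.0) -> bool:
    """Upregulated in iFIB versus SAM and cPAR, and falling again in tFIBa."""
    higher_than_refs = all(
        ratio(expr, fib, ref) > fold and expr[fib] > min_expr  # target exp > 10
        for fib in ("iFIBa", "iFIBb")
        for ref in ("SAM", "cPAR")
    )
    drops_in_tfib = all(
        ratio(expr, "tFIBa", fib) < 1 / fold and expr[fib] > min_expr  # control exp > 10
        for fib in ("iFIBa", "iFIBb")
    )
    return higher_than_refs and drops_in_tfib


def down_in_ifib(expr: dict, fold: float = 2.0, min_expr: float = 10.0) -> bool:
    """Downregulated in iFIB versus SAM and cPAR, and rising again in tFIBa."""
    lower_than_refs = all(
        ratio(expr, fib, ref) < 1 / fold and expr[ref] > min_expr  # control exp > 10
        for fib in ("iFIBa", "iFIBb")
        for ref in ("SAM", "cPAR")
    )
    rises_in_tfib = all(
        ratio(expr, "tFIBa", fib) > fold and expr["tFIBa"] > min_expr  # target exp > 10
        for fib in ("iFIBa", "iFIBb")
    )
    return lower_than_refs and rises_in_tfib


# Invented expression values for four hypothetical genes.
toy_expression = {
    "geneUp":   {"SAM": 5,   "cPAR": 8,  "iFIBa": 40, "iFIBb": 60, "tFIBa": 12},
    "geneDown": {"SAM": 100, "cPAR": 80, "iFIBa": 20, "iFIBb": 15, "tFIBa": 90},
    "geneFlat": {"SAM": 30,  "cPAR": 30, "iFIBa": 30, "iFIBb": 30, "tFIBa": 30},
    "geneLow":  {"SAM": 1,   "cPAR": 1,  "iFIBa": 5,  "iFIBb": 6,  "tFIBa": 1},
}

up_genes = [g for g, e in toy_expression.items() if up_in_ifib(e)]
down_genes = [g for g, e in toy_expression.items() if down_in_ifib(e)]
print(up_genes, down_genes)
```

The minimum-expression condition keeps genes such as the hypothetical `geneLow` out of the list: its fold changes exceed 2, but its expression is too low for the ratio to be trusted.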
The results of clustering the selected genes in FIBexDB indicate that there is a large group of genes rather specifically expressed in intrusively growing fibers (cluster 1), as well as groups that are additionally expressed in roots, leaves, hypocotyl, and/or xylem (Fig. 8.2). It should be noted that a young xylem sample (sXYLa) may contain secondary xylem fibers at the stage of intrusive elongation. The presence of fibers in flax organs other than the stem has not been documented but cannot be excluded.

Fig. 8.2 Clustering of upregulated genes (318 genes) in the intrusively growing fibers of flax (iFIB). Selected DEGs for clustering (options in FIBexDB): iFIBa/SAM > 2, and iFIBa/cPAR > 2, and iFIBb/SAM > 2, and iFIBb/cPAR > 2, and tFIBa/iFIBa < 0.5, and tFIBa/iFIBb < 0.5. The heatmap for the selected genes has been arranged for numerous flax samples stored in the database. Samples: HYP, hypocotyls; root tips, control root samples (different cultivars, Dmitriev et al. 2016a, 2017); leaves, upper flax leaves (Dmitriev et al. 2016b); apex, shoot apical meristem and apical region (Zhang and Deyholos 2016, unpublished); cPAR, cortical parenchyma with PCW (Gorshkova et al. 2018a); iFIB, the intrusively growing phloem fibers with PCW (Gorshkova et al. 2018a; Gorshkov et al. 2019); tFIB, the isolated phloem fibers with TCW (different cultivars, Gorshkov et al. 2019; Mokshina et al. 2020; Galinousky et al. 2020); and sXYL, xylem part of the stem enriched with SCW (Gorshkov et al. 2019; Mokshina et al. 2020). The detailed sample descriptions and references are presented in FIBexDB and the supplementary file of the related paper (Mokshina et al. 2021)

An alternative way to identify genes specifically expressed in a definite sample or set of samples is to evaluate the so-called tissue specificity score (Tau) (Mokshina et al. 2020). The calculation of Tau compares the expression level of a gene in each sample with its maximal expression across all tissues (Yanai et al. 2005). The Tau index ranges from 0 (expressed in all tissues) to 1 (expressed in a single tissue) (Kryuchkova-Mostacci and Robinson-Rechavi 2016). When the intrusive elongation and cell wall thickening stages are considered together, a set of genes has Tau ≥ 0.8 (Fig. 8.3, 51 genes), meaning that these genes have much lower mRNA abundance in other tissues. The major categories of such genes, which can be considered as having bast fiber-specific expression, are transport, signal transduction, and cell wall (Mokshina et al. 2020). In addition, some genes have differential mRNA abundance at the two stages of fiber development and a high Tau score specifically for one stage (Fig. 8.3); such genes can be considered to have tissue- and stage-specific expression in bast fibers. Only six genes show a high Tau score for expression in intrusively growing fibers. These genes are marked in the following sections as (Tau > 0.8). The genes upregulated in developing bast fibers, especially those with tissue- and/or stage-specific expression, are quite useful in the search for the key molecular players and can serve as targets for effective molecular-genetic modulation of fiber yield and quality.

Fig. 8.3 Heatmap of genes specifically expressed in iFIB, tFIB, or iFIB and tFIB, selected based on Tau calculation. Samples are described as in Fig. 8.2

To present the transcriptomic view of the intrusive elongation of flax bast fibers in the following sections, we cite the results provided by the DEG Finder of FIBexDB with the parameters listed above. In some cases, we additionally consider genes upregulated in iFIB that keep high expression levels at later stages of fiber development or are additionally expressed in other tissues. Besides the samples used to analyze differential expression (SAM, iFIBa and iFIBb, tFIBa and tFIBb, mainly from the long-fibered flax cultivar Mogilevsky), the presented figures contain fibers at the cell wall thickening stage isolated from different flax genotypes, namely the long-fibered flax cultivar Grant (tFIB_Gr), the linseed cultivar Lirina (tFIB_Li), and another flax species, Linum bienne (tFIB_Bi), as well as the complex mixture of tissues in the xylem part (sXYLa and sXYLb) and the apical region of the linseed cultivar Bethune (AR_Bet). All these samples are described in Fig. 8.1. Their inclusion helps to strengthen the revealed regularities and to focus further on the “most specific” genes.
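The Tau index used in this section can be written, following Yanai et al. (2005), as Tau = Σ(1 − x_i/x_max)/(n − 1), where x_i is the expression of the gene in tissue i, x_max is its maximal expression, and n is the number of tissues. The sketch below is a minimal illustration of this formula only. The function names and the expression profiles are hypothetical; returning 0 for a gene with no expression at all is a convention assumed here; and how FIBexDB groups samples into tissues or transforms expression values before the calculation is not specified in this chapter.

```python
def tau(values: list) -> float:
    """Tissue specificity index: 0 for uniform expression, 1 for a single expressing tissue."""
    n = len(values)
    if n < 2:
        raise ValueError("Tau needs at least two tissues")
    if any(v < 0 for v in values):
        raise ValueError("expression values must be non-negative")
    x_max = max(values)
    if x_max == 0:
        return 0.0  # assumed convention for a gene that is not expressed anywhere
    return sum(1 - v / x_max for v in values) / (n - 1)


def most_expressing_tissue(profile: dict) -> str:
    """Name of the tissue with the highest expression in a {tissue: value} profile."""
    return max(profile, key=profile.get)


# Hypothetical per-tissue expression profiles for two invented genes.
fiber_specific = {"root": 0, "leaf": 0, "iFIB": 50, "tFIB": 0}
housekeeping = {"root": 10, "leaf": 10, "iFIB": 10, "tFIB": 10}

tau_specific = tau(list(fiber_specific.values()))
tau_housekeeping = tau(list(housekeeping.values()))
print(tau_specific, most_expressing_tissue(fiber_specific), tau_housekeeping)
```

A threshold such as 0.8 applied to this value selects genes whose expression is concentrated in one tissue, and the tissue with maximal expression then indicates whether the gene is specific to intrusively growing or to thickening fibers.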

8.2.3 General Cell Physiology as Revealed by Transcriptomic Data

Based on the transcriptomic data coupled with the parameters of an elongating fiber known from microscopy (Ageeva et al. 2005; Snegireva et al. 2010), bast fiber physiology at the intrusive growth stage can be described by several major processes. One of them is the establishment of active photosynthetic machinery (Gorshkova et al. 2018a). Well-developed chloroplasts with a structure characteristic of active photosynthesis have been documented by electron microscopy (Ageeva et al. 2005). Genes encoding components of the photosynthetic machinery are highly expressed and often upregulated in intrusively growing fibers compared with the meristematic region, including genes for various components of the photosystems (Lus10004895, Lus10001169, Lus10006593, Lus10006594), the large subunit of ribulose-bisphosphate carboxylase (Lus10032825), the chloroplast ATP synthase complex (Lus10009173, Lus10032827), chloroplast ATPase (Lus10002548, Lus10004894), etc. (Gorshkova et al. 2018a). These genes are also highly expressed in many other tissues (Fig. 8.4).

Fig. 8.4 Expression of genes upregulated in intrusively growing fibers (iFIBa and iFIBb) encoding photosynthesis-related proteins, various transporters, and other proteins important for general cell physiology. The genes are presented in descending order of expression level in iFIBb

Symplasmic isolation of fibers as a result of the disappearance of plasmodesmata means that intercellular transport can occur only through the fiber apoplast. Consequently, the set of transporters must be rearranged. The most specific changes in expression are detected for genes encoding amino acid transporters. Lus10012444 and Lus10028236 homologous, respectively, to AT1G10010 and AT5G49630 for the nonselective amino acid-proton symporters, AAP8 and AAP6, are upregulated quite significantly and specifically (Fig. 8.4). Amino acid transport can

get additional activation from the so-called glutamine dumper 1—a probable subunit of nonselective facilitators that stimulates amino acid export (Pratelli et al. 2010). The corresponding gene, Lus10006987, homologous to AT5G57685, is considerably activated (Fig. 8.4) at the stage of fiber intrusive growth. Similarly, Lus10021272 (homolog of AT3G11680 for Aluminumactivated malate transporter 8) has virtually zero transcript level in other tissues but is quite substantially expressed in iFIB samples (Tau 0.8). In addition, genes for several poorly

156

characterized transporters, like Lus10009314, and Lus10041994, are upregulated (Fig. 8.4); the substances translocated by these transporters are unknown. The abundance of mRNA for the bidirectional sugar transporters located in the plasma membrane —SWEET, also named MtN3 (Lus10016742, Lus10022436)—increases in intrusively growing fibers compared to other tissues with the primary cell wall. Such a transporter in Arabidopsis (AT5G50790) mediates both low-affinity uptake and efflux of sugar across the plasma membrane (https://www.uniprot.org/uniprot/Q9LUE3). At a more advanced stage offiber specialization, during cell wall thickening, the expression of the listed SWEET genes could be even higher than in iFIB samples, as distinct from all other mentioned transporters. The genes upregulated in iFIB samples include Lus10001087 and Lus10040134 for chloroplast beta-amylase that cleaves starch into low-molecular weight sugars and Lus10024605 for vacuolar invertase that cuts sucrose into two monosaccharides, raising osmolarity. Besides, the gene Lus10028303 for the tonoplast sugar transporter named early responsive to dehydration-like 6 is activated (Fig. 8.4). These changes in the pattern of sugar transport and metabolism suggest low-molecular carbohydrates as the most probable osmolytes involved in fiber elongation. Plant cell enlargement is largely based on the combination of cell wall surface extension and vacuole volume increase (Cosgrove 2005). The intensive water influx into a fiber that is necessary to maintain sufficient turgor pressure in an enlarging cell is provided by aquaporins, including plasma membrane intrinsic proteins. Different members of multigene families encoding aquaporins are expressed in iFIB, including those considerably upregulated in the intrusively growing fibers (Lus10036531, Lus10024651, Lus10032283) (Gorshkova et al. 2018a). This is typical for cells that undergo rapid elongation. 
Other genes for vacuolar transporters that are also activated in intrusively growing fibers include Lus10009314 (AT4G22990) for a vacuolar phosphate transporter family protein and Lus10004896 (AT3G21250) for an ABC transporter that pumps glutathione S-conjugates into the vacuole (Fig. 8.4).

Plant cell elongation is coupled to acidification of the apoplast (Cosgrove 2005). Accordingly, the Lus10026224 and Lus10042445 genes for H+-ATPase, which serves as a proton pump, are upregulated in intrusively growing fibers (Gorshkova et al. 2018a). Active energy supply is indicated by the high expression of genes for mitochondrial components, such as the NADH dehydrogenase subunit (Lus10000460, Lus10015033, Lus10009720) or cytochrome oxidase (Lus10004084, Lus10009204) (Fig. 8.4). Two genes significantly upregulated in intrusively growing fibers, Lus10027755 and Lus10035538, are both homologs of AT3G54110 (Fig. 8.4). The encoded protein, named plant mitochondrial uncoupling protein, has recently been identified as a mitochondrial transporter of aspartate, glutamate, and dicarboxylates and proposed to shuttle redox equivalents during photorespiration (Monné et al. 2018). In addition, Lus10004037, the gene for the cytosolic isozyme of glutamine synthetase (AT5G37600), is highly activated in iFIB samples (Fig. 8.4). Moreover, three genes highly and specifically upregulated in iFIB encode the ACR (ACT DOMAIN REPEAT) proteins: Lus10011715 (AT5G65890, ACR1), Lus10019191, and Lus10019192 (both AT1G69040, ACR4). The functions of these proteins in plants are still unknown, but they are similar to bacterial GlnD proteins that contain two C-terminal ACT domains (named after aspartate kinase, chorismate mutase, and TyrA) and are involved in sensing glutamine (Kan et al. 2015). The activation of gene expression for all these proteins emphasizes the importance of glutamine metabolism in intrusively growing fibers, especially when the aforementioned players involved in the transport of amino acids across the plasma membrane are also considered.

8.2.4 Cell Wall Rearrangement

The enormous increase in the cell surface during fiber intrusive growth requires the active expression of the primary cell wall-related genes. The genes for proteins involved in the biosynthesis of primary cell wall polysaccharides (xyloglucan, homogalacturonan, rhamnogalacturonan I (RG-I), rhamnogalacturonan II, and cellulose) have pronounced expression levels, both for the corresponding glycosyltransferases that synthesize these polymers and for enzymes involved in the interconversion of their monosaccharide substrates. However, none of these genes is considerably upregulated in intrusively growing fibers relative to other samples with the primary cell wall (Gorshkova et al. 2018a). The upregulated gene Lus10031567 (homologous to AT1G05170) encodes β-1,3-galactosyltransferase, but this enzyme is involved in protein glycosylation rather than in the biosynthesis of cell wall polysaccharides. Xyloglucan endotransglucosylases/hydrolases (XTH), the enzymes considered important agents in the cell elongation process (Van Sandt et al. 2007), are encoded by numerous genes. Several genes for XTHs are actively expressed in intrusively growing fibers (e.g., Lus10039715, Lus10018503), but none of them is specifically activated in iFIB samples. The most notable changes in the expression of the cell wall-related genes occur among those involved in the metabolism of pectins and callose (Fig. 8.5). In addition, several genes for expansins (Lus10034465, Lus10040801, and Lus10019088) are rather specifically upregulated in the intrusively growing fibers. Expansins are the best-known proteins acting as wall-loosening agents (Cosgrove 1998). In the flax genome, the multigene family of expansins and expansin-like proteins has 65 genes.

Fig. 8.5 Expression of genes encoding cell wall-related proteins upregulated in intrusively growing fibers (iFIBa and iFIBb). The genes are presented in descending order of expression level in iFIBb

Pectins are the major components of the middle lamellae that have to be split during the intrusive elongation. The families of plant enzymes involved in the modification and degradation of pectins include polygalacturonases, pectin methylesterases and their negative regulators (pectin methylesterase inhibitor proteins), pectate lyases, pectin acetylesterases, and rhamnogalacturonan lyases. Each of them is encoded by a multigene family, sometimes quite large. Of these, a substantial upregulation of expression in intrusively elongating fibers is characteristic for representatives of pectin lyase-like superfamily proteins, polygalacturonases, and several pectin methylesterase inhibitors (Lus10004327, Lus10028910, Lus10038914) (Fig. 8.5). Pectin lyases are enzymes that depolymerize demethylated homogalacturonan by a transelimination mechanism (Sénéchal et al. 2014); several genes for pectin lyases (Lus10023679, Lus10011758, Lus10036231, Lus10022817) were upregulated in iFIB. Of the numerous genes for polygalacturonases (enzymes that cleave homogalacturonan through a different mechanism), Lus10013009 is substantially upregulated in the intrusively growing fibers, while another, Lus10029156, is highly expressed at a level similar to that in other tissues (Fig. 8.5). The actively expressed gene for pectin methylesterase, Lus10003933, has similar mRNA abundance in various tissues depositing the primary cell wall. Modification of RG-I in the intrusively growing fibers is supported by the stage-specific upregulation of genes for β-galactosidase (Lus10025108, Lus10014278) (Fig. 8.5), which can trim off the neutral side chains of RG-I.

Disruption of plasmodesmata may be coupled to callose plug formation and its rearrangement in the course of further fiber elongation. As follows from the transcriptomic analysis, the entire machinery for callose synthesis, turnover, and interaction with other cell wall constituents is well established in intrusively growing fibers. It includes glucan synthase-like proteins similar to callose synthase (Lus10037471, Lus10003917, etc.), numerous O-glycosyl hydrolases of family 17 that are able to hydrolyze callose (e.g., Lus10017740, Lus10024535, Lus10023309, and Lus10038501 with Tau > 0.8), and callose-binding proteins with the X8 domain (Lus10001167) (Fig. 8.5). Genes for sucrose synthase 6, the enzyme that provides sugar moieties for callose synthesis, are upregulated in intrusively growing fibers (Lus10020791 and Lus10007372, both homologous to AT1G73370).

Altogether, transcriptomic data indicate that cell wall extension during intrusive growth occurs due to the action of upregulated expansins, probable modification of RG-I, and cleavage of homogalacturonan, which is performed by pectate lyases rather than by polygalacturonases.
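Several genes in this chapter are characterized by a Tau value (for example, Tau > 0.8 for the family 17 glycosyl hydrolases). The chapter does not spell out how FIBexDB computes this value, so the following is only a minimal sketch that assumes the widely used definition of the Tau tissue-specificity index: the mean, over tissues, of one minus the expression normalized to the maximum, with n − 1 in the denominator. The function name `tau_index` and all expression values are hypothetical.

```python
def tau_index(expression):
    """Tissue-specificity index Tau for one gene.

    Assumed definition: sum(1 - x_i / x_max) / (n - 1), where x_i is the
    (non-negative) expression in tissue i. 0 means uniform expression,
    1 means expression in a single tissue only.
    """
    values = [max(float(v), 0.0) for v in expression]
    if len(values) < 2:
        raise ValueError("Tau needs expression values from at least two tissues")
    peak = max(values)
    if peak == 0:
        return 0.0  # gene not expressed anywhere: treated here as non-specific
    return sum(1 - v / peak for v in values) / (len(values) - 1)


# Hypothetical normalized expression values across five tissues.
profiles = {
    "fiber_specific_gene": [2, 1, 250, 3, 0],
    "broadly_expressed_gene": [100, 110, 95, 105, 100],
}

for name, values in profiles.items():
    print(name, round(tau_index(values), 3))
```

With this definition, a gene expressed almost exclusively in one sample scores close to 1, which is consistent with the chapter's use of 0.8 as a marker of high specificity. In practice Tau is often computed on log-transformed values, which would change the numbers.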


8.2.5 Cytoskeleton

The extensive cell volume increase during fiber intrusive elongation requires spatial rearrangements of intracellular components. Such processes usually involve the cytoskeleton. Altogether, there are over 320 cytoskeleton-related genes in the flax genome; a large portion of them is upregulated in intrusively growing fibers, as illustrated by FIBexDB clustering (Fig. 8.6, left). Transcriptomic data allow us to elucidate the molecular elements of this intracellular network that are important for the intrusive elongation of phloem fibers. These include specific isoforms of actin (actin 7: Lus10005819, Lus10005820, Lus10006783, and Lus10006784, all homologous to AT5G09810; actin 11: Lus10004169, Lus10021057, both homologous to AT3G12110), tubulin (tubulin 8: Lus10027476 and Lus10039231, both homologous to AT5G23860), and the proteins that bind the filaments and microtubules and serve in their formation and/or function (Fig. 8.6). Two genes upregulated in iFIB encode actin-binding proteins with homology to formins with the FH2 domain. Formins are engaged in actin assembly, including nucleation, elongation, capping, etc. Individual members of the multigene family, which in Arabidopsis comprises over 20 members, may perform different functions; however, the exact specialization is still poorly characterized (van Gisbergen and Bezanilla 2013). The formins upregulated in iFIB, Lus10024450 and Lus10007447, are both homologous to AT2G25050, which encodes ATFH18 (Fig. 8.6). Even less is known about other upregulated genes for actin-binding proteins (Lus10038998, Lus10027293, both homologous to AT2G40820) that are plant-specific and are supposed to bundle and stabilize microfilaments (Zhao et al. 2011).
Additionally, activation of expression is detected in intrusively growing fibers for homologs of VILLIN 2 (Lus10031066, Lus10035450, both homologous to AT2G41740), a Ca2+-regulated actin-binding protein involved in actin filament bundling (https://www.uniprot.org/uniprot/O81644), and for Lus10025865 (AT5G48460), which encodes FIM2 from the fimbrin family, members of which link together actin filaments along their sides (Pollard and Cooper 2009) (Fig. 8.6). Intracellular trafficking along actin bundles and cytoplasm streaming are largely based on the plant-specific actin-myosin system. The genes for the specific member of the myosin family (Lus10017160, Lus10021570, both homologous to AT5G20490, myosin XI-17, also designated XI-K) are significantly upregulated in intrusively growing fibers (Fig. 8.6). Myosin XI-K has recently been shown in Arabidopsis to interact with a special receptor located in Golgi membranes (Perico et al. 2021), indicating the involvement of this myosin isoform in the translocation of Golgi elements.

Fig. 8.6 Expression of genes for cytoskeleton-related proteins upregulated in intrusively growing fibers (iFIBa and iFIBb). Left: the heatmap of all cytoskeleton-related gene expression in flax tissues. Right: the indicated cluster in the left heatmap is presented in descending order of gene expression level in iFIBb

The set of microtubule elements and interacting proteins is also adjusted during fiber intrusive growth. Besides tubulin β 8 isoforms, activation of gene expression is detected for numerous versions of kinesins (Fig. 8.6). Unlike myosins, which move macromolecular complexes along actin filaments, kinesins are cytoskeletal motor proteins that generate force along microtubules; the dozens of kinesins encoded in plant genomes are grouped into several families (Nebenführ and Dixit 2018). Quite a number of kinesin-encoding genes are highly upregulated in iFIB samples (Fig. 8.6). Additionally, these kinesins often have increased mRNA abundance in the young xylem sample (sXYLa), which may contain intrusively growing xylem fibers. The upregulated kinesins include members of the kinesin-14 family, which is vastly overrepresented in plants compared to animals and has various functions (Nebenführ and Dixit 2018). Lus10010301 and Lus10010302 (AT2G47500), as well as Lus10001306 and Lus10032897 (AT3G44730), belong to this family. In addition, the expression of Lus10010047, Lus10018731, and Lus10024805 (AT5G47820), which encode FRAGILE FIBER1 (FRA1) from the kinesin-4 family, is induced (Fig. 8.6). FRA1 promotes secretion of noncellulosic cell wall material; fra1 mutants have thinner walls and reduced deposition rates of cell wall polysaccharides in Arabidopsis (Zhu et al. 2015). Additionally, upregulation is observed for Lus10017496 and Lus10028790, both homologous to AT4G39050, which encodes kinesin-7 that promotes microtubule polymerization (Nebenführ and Dixit 2018). Lus10035940 and Lus10025721, upregulated in iFIB (Fig. 8.6), are both homologous to AT5G10470, which encodes the unusual kinesin-like protein KAC1 involved in chloroplast movement (Suetsugu et al. 2010). This kinesin belongs to the kinesin-14 family but binds to actin rather than tubulin. It is still unclear how this interaction might contribute to chloroplast movement, because key residues required for ATPase activity are altered in this kinesin and it is unlikely to exert mechanical force (Nebenführ and Dixit 2018).
The cortical microtubule organization is also affected by the tyrosine phosphatase PROPYZAMIDE-HYPERSENSITIVE 1 (AT5G23720) involved in phosphorylation cascades that control the dynamics of cortical microtubules (https://www.arabidopsis.org/); the gene for its homolog, Lus10032545, was specifically upregulated in intrusively growing fibers (Fig. 8.6).


Altogether, cytoskeleton-related proteins belong to the most differentially expressed players in intrusively elongating fibers, emphasizing the importance of spatial organization dynamics in cellular processes.

8.2.6 Regulation by Hormones, Transcription Factors, Kinases, and Other Regulatory Proteins

An important aspect of any process is its regulation, which may occur at various levels, including the interrelated effects of phytohormones, transcription factors, receptors, kinases, etc. Cell elongation is usually associated with auxin involvement; intrusive growth of phloem fibers, as the extreme version of cell elongation, can be expected to involve specific upregulation of auxin-related genes. However, the list of such genes is quite limited. The most pronounced one is Lus10003598, homologous to AT5G54510, which encodes an IAA-amido synthetase designated as DFL1 (dwarf in light) (Fig. 8.7). The dfl1 mutant displays shorter hypocotyls in light-grown plants compared to the wild type. This enzyme conjugates auxin to amino acids, removing excess hormone (Ludwig-Müller 2011). Three genes, Lus10005090, Lus10034360, and Lus10000087, all homologous to AT3G02260, which encodes a calossin-like protein required for polar auxin transport, are upregulated in iFIB compared to the apical meristem, but their expression drops at later stages of fiber development. However, the same level of their expression as in iFIB is detected in the neighboring cortical parenchyma (Fig. 8.7). The encoded integral membrane protein has a molecular weight of 560 kDa and is named BIG or TIR3 (TRANSPORT INHIBITOR RESPONSE 3). It is considered a hub that connects nutrient, light, and hormone signaling networks (Zhang et al. 2019). A similar expression pattern is characteristic for four flax genes homologous to the same gene, AT5G37020, which encodes auxin response factor 8 (Lus10007386, Lus10010969, Lus10020804, and Lus10031354) (Fig. 8.7). ARFs are transcription factors that bind specifically to the DNA sequence 5ʹ-TGTCTC-3ʹ found in the auxin-responsive promoter elements and play a central role in the realization of auxin effects by controlling gene expression (Li et al. 2016).

Fig. 8.7 Expression of hormone-related genes upregulated in intrusively growing fibers (iFIBa and iFIBb). The genes are presented in descending order of expression level in iFIBb

Genes for several auxin transporters are considerably upregulated in iFIB compared to other tissues, but they are not stage-specific, since the abundance of the corresponding mRNA stays at a similar level or even increases at the cell wall thickening stage. These include genes for the auxin transporter-like protein LAX3 (Lus10028078 and Lus10025628, both homologous to AT1G77690), which performs auxin influx (Fig. 8.7). In addition, genes for membrane proteins related to the vacuolar auxin transporter important for the subcellular distribution of auxin (WALLS ARE THIN 1, WAT1) (Ranocha et al. 2013), namely Lus10024424 (AT2G37460) and Lus10005965 (AT2G39510) (Fig. 8.7), have similar expression patterns. Two more genes for WAT1-related proteins, Lus10019969 (AT4G28040) and Lus10033510 (AT5G64700), have iFIB-specific expression profiles (Fig. 8.7).
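The ARF recognition sequence 5ʹ-TGTCTC-3ʹ mentioned above can be located in a promoter by scanning both DNA strands. The sketch below is illustrative only: the function names and the short test sequence are hypothetical, and an exact-match scan is assumed, although real promoter analyses typically tolerate motif variants.

```python
import re

ARF_MOTIF = "TGTCTC"  # auxin response element core, as given in the text
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_motif(seq, motif=ARF_MOTIF):
    """List (position, strand) hits of the motif on both strands.

    Positions are 0-based and refer to the forward strand; a '-' hit means
    the motif is present on the reverse strand at that location.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        # A zero-width lookahead lets overlapping occurrences be reported.
        for match in re.finditer(f"(?={pattern})", seq):
            hits.append((match.start(), strand))
    return sorted(hits)


# Hypothetical promoter fragment with one hit on each strand.
promoter = "AATGTCTCGGCCGAGACATT"
print(find_motif(promoter))
```

Scanning the reverse strand matters because a transcription factor binding site is functional in either orientation relative to the gene.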

The genes for other hormone-related proteins that are quite specifically activated in intrusively growing fibers (Fig. 8.7) include Lus10034775 (AT1G31770) for ABC transporter G family member 14 (ABCG14). This transporter acts as an efflux pump involved in long-distance cytokinin transport (Borghi et al. 2015). A similar expression profile is revealed by FIBexDB for Lus10037191, homologous to AT1G74520, which encodes the abscisic acid-induced protein HVA22, first described in barley (Hordeum vulgare) (Fig. 8.7). HVA22 is an ER- and Golgi-localized protein involved in vesicular traffic in stressed cells. Additionally, there are Lus10016966, Lus10021291, and Lus10028849, all homologous to AT3G52490 for the strigolactone pathway regulator SUPPRESSOR OF MAX2 1-LIKE3 (Stanga et al. 2013) (Fig. 8.7). Strigolactones were recognized as endogenous plant signaling compounds rather recently (Umehara et al. 2008). Their role in intrusively growing fibers is totally unclear.

The renowned regulators of cell development are transcription factors (TFs), DNA-binding proteins activating and/or repressing transcription. The database for plant transcription factors, PlantTFDB (http://planttfdb.cbi.pku.edu.cn/), indicates that the flax genome contains 2481 genes for TFs classified into 57 families (Gorshkova et al. 2018a). Upregulation in iFIB samples is detected for genes encoding transcription factors from the AP2/ERF, AS2, bHLH, bZIP, DOF, GARP, GRAS, HD, HSF, MYB, WRKY, and TCP families (Fig. 8.8). The transcription factor WRINKLED 1 from the AP2/ERF family (Lus10008939, AT3G54320, WRI1) is involved in the activation of a subset of sugar-responsive genes. TCP transcription factors constitute a small family of plant-specific bHLH-containing, DNA-binding proteins. TCP14, whose homologs (Lus10008621 and Lus10042195) have upregulated expression in iFIB, is required for elongation and gene expression responses to auxin (Ferrero et al. 2021). The highly expressed genes Lus10030938 and Lus10016566 encode members of the plant-specific DOF family, DOF3, that are involved in seed germination and photomorphogenesis (Ruta et al. 2020). Of special relevance is the upregulation of Lus10000719, homologous to AT5G60690, which encodes INTERFASCICULAR FIBERLESS 1, also named REVOLUTA, a homeodomain-leucine zipper protein regulating interfascicular fiber differentiation in Arabidopsis (Zhong and Ye 1999). Lus10019898 (AT5G56270), highly and specifically upregulated in intrusively growing fibers, encodes WRKY2 from one of the largest families of transcriptional regulators in plants. Many other genes for transcription factors are revealed by FIBexDB (Fig. 8.8); however, up to now, none of the TFs has been experimentally proven to be involved in bast fiber intrusive elongation. The presented list provides good candidates for such work.

Fig. 8.8 Expression of genes for transcription factors upregulated in intrusively growing fibers (iFIBa and iFIBb). The genes are presented in descending order of expression level in iFIBb

A set of genes rather specifically upregulated during bast fiber intrusive growth encodes kinases that are involved in regulation and signaling (Fig. 8.9). Among them are three flax genes, all homologous to AT5G65700, which encodes the leucine-rich repeat receptor-like kinase BAM1 (BARELY ANY MERISTEM 1). Plasma membrane-located BAM receptors interact with CLAVATA signaling peptides and regulate stem cell specification (Deyoung and Clark 2008). In addition, a nuclear-located cyclin-dependent kinase is activated (Lus10005228, homologous to AT5G39420). Such kinases require binding to a cyclin protein for activity and are mainly involved in cell cycle regulation (Joubès et al. 2000).

Fig. 8.9 Expression of genes for kinases upregulated in intrusively growing fibers (iFIBa and iFIBb). The genes are presented in descending order of expression level in iFIBb

A specific transcriptional regulation of selected genes occurs with the involvement of monomeric nuclear-located ACTIN-RELATED PROTEIN 6 (ARP6). The corresponding gene, Lus10006654 (AT3G33520), is upregulated in iFIB (Fig. 8.6). ARP6 is a crucial non-catalytic component of the chromatin remodeling complex SWR1. It mediates the ATP-dependent swapping of histone H2A for the H2A variant H2A.F/Z, leading to a shift of nucleosomal status and fine-tuning of gene expression (Kapoor and Shen 2014; Aslam et al. 2019). This mechanism performs various decisive functions in cellular processes.

8.2.7 Other Genes Specifically Upregulated in Intrusively Growing Fibers

Analysis by the FIBexDB DEG Finder reveals Lus10013445, Lus10040999, and Lus10010167 (all homologous to AT5G08350, which encodes a GRAM domain-containing protein) among the genes most specifically expressed in intrusively growing fibers (Fig. 8.10). Information about the function of such proteins in plants is quite limited. The GRAM domain is listed among the lipid-binding domains that attract the proteins to membranes (de Jong and Munnik 2021). In human cells, GRAM domain-containing proteins are located at endoplasmic reticulum-plasma membrane contact sites, which are considered crucial regulatory hubs (Besprozvannaya et al. 2018). Another probable lipid-related protein that seems to be quite important for fiber intrusive elongation is GDSL-like lipase/esterase; the corresponding gene, Lus10032316 (AT1G29660), has a well-pronounced peak in the expression profile in iFIB samples (Fig. 8.10). However, proteins with such a domain are quite numerous and have broad substrate specificity (Su et al. 2020), which makes function prediction based on homology unreliable.

Rather unexpectedly, genes for cytidine triphosphate (CTP) synthase (Lus10038403, Lus10001218, both AT1G30820) have quite specific expression in intrusively growing fibers (Fig. 8.10). CTP is a building block for nucleic acids; it can also act as a cofactor during various biochemical processes such as phospholipid biosynthesis or coenzyme A production. Some CTP synthases are able to form filamentous structures while being inactive (Daumann et al. 2018); however, the closest homolog of the upregulated flax genes, AT1G30820, encodes CTPS1, which is unable to make such structures and is present in soluble form.

Fig. 8.10 Expression of genes specifically upregulated in phloem fibers during intrusive growth encoding a miscellaneous category of proteins (iFIBa and iFIBb). The genes are presented in descending order of expression level in iFIBb

There are many poorly characterized proteins whose genes are highly activated in iFIB samples. Many of them are listed as unknown proteins (Fig. 8.10). Some of them, such as Lus10039390, Lus10009544, and Lus10006631, do not even have homologs in Arabidopsis (but have homologs in poplar). The others are named with very general terms such as calmodulin-binding proteins (Lus10039544 and Lus10024175) or phosphatase/nucleotidase (Lus10008390) (Fig. 8.10). Such annotations barely help to understand the function because proteins with such names are quite numerous and diverse in nature (e.g., Zeng et al. 2015). The situation is similar with thaumatin-related proteins. Two flax genes, Lus10017266 (Tau > 0.8) and Lus10013561, homologous to two different Arabidopsis genes (AT4G24180 and AT4G38660, respectively), are annotated as encoding thaumatin-related proteins (Tau 0.8, Fig. 8.10). Such proteins are widespread and vary in function. They are widely used as natural sweeteners; however, knowledge of their role in plants is quite limited (de Jesús-Pires et al. 2020).

In summary, transcriptomic profiling and the subsequent analysis using FIBexDB help to fill the knowledge gaps in the molecular mechanisms that govern fiber intrusive elongation, a peculiar type of plant cell growth. Through this approach, the molecular players involved and the major regulatory elements can receive much more attention. However, the findings have to be substantiated by further studies, especially for the many genes that are specifically upregulated but annotated as encoding unknown proteins.

8.3 Tertiary Cell Wall Formation

After flax fibers reach their final length during intrusive growth, they cease their elongation and start cell wall thickening. This transition from one stage to another is marked in the stem as a “snap point.” It is usually located 5–7 cm from the stem tip. The stem can be easily broken above the snap point, but breaking it below the snap point requires much more force (Gorshkova et al. 2003). Close to the snap point region, fibers deposit a thin layer of the secondary cell wall (SCW) that contains xylan as the main hemicellulose (Gorshkova et al. 2010), and then a tertiary cell wall (TCW, gelatinous cell wall, G-layer) is formed (Gorshkova et al. 2018a, b). This type of cell wall is not unique to flax fibers: it is inherent in the phloem fibers of other fiber crops (hemp, ramie, jute) and can also be induced in xylem fibers in the course of graviresponse (poplar, eucalyptus, birch, flax). For a long time, tertiary layers (G-layers) were considered one of the secondary cell wall layers, and even now, the term “tertiary” is under debate (Clair et al. 2018). Since this layer is deposited specifically in plant fibers and differs drastically in structure and composition from primary and secondary ones, it could reasonably be considered a tertiary cell wall.

The tertiary cell wall is highly cellulosic (up to 85%), has an axial orientation of all microfibrils, and shows increased crystallinity and crystallite size. It is almost or completely devoid of xylan and lignin (Mellerowicz et al. 2001; Pilate et al. 2004; Gorshkova et al. 2010, 2012) and has high water content (Schreiber et al. 2010). The low content of xylan and lignin in the total cell wall of flax phloem fibers has been confirmed biochemically (Rihouey et al. 2017) and, specifically in the tertiary cell wall, immunohistochemically (Gorshkova et al. 2010) and through gene expression analysis (Gorshkov et al. 2017). The set of genes encoding the enzymes of xylan and lignin biosynthesis is poorly expressed in fibers depositing a tertiary cell wall; this is well illustrated in comparison with xylem tissues (Fig. 8.11). Rhamnogalacturonan I (RG-I) with attached β-(1→4)-galactan side chains is the dominant matrix polymer in the tertiary cell wall (Gorshkova et al. 2004; Gorshkova and Morvan 2006; Gurjanov et al. 2007; Rihouey et al. 2018). Accordingly, two main processes associated with tertiary cell wall formation determine the peculiarities of this cell wall type and are reflected in the transcriptome profile of fibers: biosynthesis of cellulose and biosynthesis of RG-I.

Fig. 8.11 Heatmaps and clustering of flax genes encoding enzymes for xylan and lignin biosynthesis (built by FIBexDB)

The large scale of cellulose biosynthesis (the whole cell metabolism is targeted toward the formation of this polymer, as indicated by label incorporation from exogenous 14CO2; Gorshkov et al. 2017) and the induction of RG-I biosynthesis at the onset of tertiary cell wall deposition (Gorshkova et al. 2004), together with the possibility of isolating the fibers at this stage of in planta development (Mokshina et al. 2014), make flax phloem fibers a unique model for studying the biosynthesis of these two important polymers (Gorshkov et al. 2017; Petrova et al. 2021a). Along with RG-I, the presence of glucomannan in the phloem fiber cell wall was reported, as well as xyloglucan and xylan as minor components (Rihouey et al. 2017); xylan is a component of the thin secondary cell wall, which was revealed in flax fibers by immunocytochemistry (Gorshkova et al. 2010).

8.3.1 Expression of CESA Genes and Genes for Putative Cofactors in Fibers Forming the Tertiary Cell Wall

The major players in cellulose biosynthesis are catalytic subunits of cellulose synthase complexes named CESA; in Arabidopsis, there are 10 CELLULOSE SYNTHASES, designated AtCESA1–10 (Richmond and Somerville 2000). It has been demonstrated by many research groups that in various plant species the expression of individual members of the CESA family differs in cells depositing primary and secondary cell walls. CESA1, CESA3, and CESA6 (and CESA6-like) produce cellulose for primary cell walls (PCWs), which are found in all plant cell types and are deposited throughout the cell division and cell growth stages (Desprez et al. 2007; Persson et al. 2007). Many specialized cells, such as vessels or fibers, have been shown to require CESA4, CESA7, and CESA8 for cellulose biosynthesis during the deposition of SCWs, which are formed when cell growth stops (Taylor et al. 2003; Atanassov et al. 2009).

Flax CESA genes were annotated, and their expression in different flax tissues was measured using qPCR (Mokshina et al. 2014). Secondary cell wall (SCW)-related LusCESA genes were upregulated in isolated fibers forming TCW compared to fibers depositing PCW during intrusive growth. When genome-wide transcriptomic data became available (Gorshkov et al. 2017, 2018), a more detailed analysis of the expression of LusCESA genes at different stages of fiber development was performed (Mokshina et al. 2017). It was confirmed that in fibers forming the TCW, SCW-related LusCESA genes were indeed upregulated compared to tissues forming the PCW, although the activation of SCW-related LusCESAs in xylem tissue enriched in SCW was more pronounced. The known cofactors of SCW-related CESAs, such as LusCTL2 (Lus10016872, Lus10037737, homologous to AT3G16920) or LusCOBL4 (Lus10017863, Lus10034670, homologous to AT5G15630), showed expression patterns similar to those of SCW LusCESAs. We found that the primary cell wall (PCW)-related LusCESAs continued their expression in fibers forming the tertiary cell wall even after activation of SCW-related LusCESAs (Mokshina et al. 2017; Fig. 8.12).

The expression of both PCW- and SCW-related LusCESA genes is quite pronounced in fibers of all examined flax genotypes (Fig. 8.12), though the ratio of expression levels between these two groups varies among samples: in the linseed variety, the expression of PCW-related LusCESA genes on average exceeds the expression of SCW-related LusCESA genes, whereas for fiber flax varieties (tFIBa, tFIB_Gr) and Linum bienne (tFIB_Bi), this ratio is similar, although the level of expression is different (Fig. 8.12). Among PCW-related LusCESA genes, LusCESA6-F has a low expression level (tens to hundreds of times lower than other PCW-related LusCESAs). LusCESA3-A also has lower levels of expression in fibers during both stages of development, although not as dramatically as LusCESA6-F, whose expression even slightly increases in tFIBa and tFIB_Li samples. Interestingly, all PCW-related LusCESA genes increase their level of expression in tFIBa samples compared to iFIB samples (up to 2.4-fold), except for LusCESA6-A and LusCESA6-B (Fig. 8.12).
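The two comparisons used above, the fold change of a gene between developmental stages and the ratio of mean PCW-related to mean SCW-related expression within a sample, can be sketched as follows. This is an illustration under stated assumptions, not the procedure used by the authors: the gene labels and all expression values are hypothetical placeholders, and the pseudocount of 1 is an assumed convention to avoid division by zero.

```python
import math
from statistics import mean


def fold_change(target, reference, pseudocount=1.0):
    """Expression in the target sample relative to the reference sample."""
    return (target + pseudocount) / (reference + pseudocount)


def group_ratio(sample, pcw_genes, scw_genes):
    """Mean PCW-related expression divided by mean SCW-related expression."""
    return mean(sample[g] for g in pcw_genes) / mean(sample[g] for g in scw_genes)


# Hypothetical normalized read counts; labels are placeholders, not flax gene IDs.
ifib = {"PCW_1": 99.0, "PCW_2": 49.0, "SCW_1": 4.0, "SCW_2": 9.0}
tfib = {"PCW_1": 239.0, "PCW_2": 74.0, "SCW_1": 159.0, "SCW_2": 199.0}
pcw_genes = ["PCW_1", "PCW_2"]
scw_genes = ["SCW_1", "SCW_2"]

for gene in ifib:
    fc = fold_change(tfib[gene], ifib[gene])
    print(gene, "fold change:", round(fc, 2), "log2:", round(math.log2(fc), 2))

print("PCW/SCW ratio in tFIB:", round(group_ratio(tfib, pcw_genes, scw_genes), 2))
```

A ratio above 1 would correspond to the pattern described for the linseed variety, and a ratio near 1 to the pattern described for the fiber flax varieties and Linum bienne.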
Both PCW- and SCW-related CESAs get within the joint LusCESA coexpression group in fibers depositing tertiary cell walls as distinct from

8

Key Stages of Flax Bast Fiber Development Through …

167

Fig. 8.12 Expression of LusCESA genes in fibers at different stages of development. iFIBa and iFIBb— intrusively growing phloem fibers with PCW (Gorshkova et al. 2018a; Gorshkov et al. 2019); tFIBa, tFIB_Gr, tFIB_Li, and tFIB_Bi—isolated phloem fibers with TCW (different cultivars and species, Gorshkov et al. 2019;

Mokshina et al. 2020; Galinousky et al. 2020). The detailed sample description is presented in FIBexDB and in the supplementary file of the related paper (Mokshina et al. 2021). PCW—the primary cell wall, and TCW—the tertiary cell wall

other tissues, where these genes are divided into separate coexpression groups (Mokshina et al. 2017). As a result, we propose that both PCW- and SCW-related CESAs may be involved in the deposition of highly cellulosic TCW in flax phloem fibers. This conclusion is based only on gene expression data and needs to be confirmed experimentally. The consequences of such joint expression on the structure of CESA-complexes, the arrangement and regulation of cellulose biosynthesis, and the parameters of formed microfibrils have yet to be determined. Cellulose is synthesized in all plant cells, and there is no CESA isoform specifically expressed in a definite cell type. The profiles of CESA genes are not quite appropriate for building coexpression networks in order to figure out fiber-specific players. However, there are a number of genes specifically upregulated in fibers forming TCW that are potentially associated with cellulose biosynthesis. The “markers” of TCW could be FASCICLINARABINOGALACTAN PROTEINS (FLA), since their upregulation accompanies TCW formation in fibers of different origin in different plant species (Lafarguette et al. 2004;

Andersson-Gunnerås et al. 2006; Roach and Deyholos 2008; Guerriero et al. 2017). Moreover, fiber-specific promoter activity was confirmed for the LusFLA gene, Lus10002985 (Hobson and Deyholos 2013). We checked the expression of FLA genes in various flax tissues and used those highly upregulated in bast fibers to build coexpression networks that allow us to suggest the players important for TCW formation. There are 59 FLA genes (PF02469) in the flax genome (Phytozome); 43 of them are expressed in the stem tissues (TGR  16 at least in one analyzed tissue, based on FIBexDB), and 9 of them are specifically upregulated in fibers with TCW (Fig. 8.13a–c). Five genes for FLAs show strong specificity of expression in fibers forming TCW: Lus10002984, Lus10036112, Lus10036113, Lus10036114 (orthologs of the AT5G03170 gene, encoding AtFLA11), and Lus10002985 (ortholog of the AT5G60490 gene, encoding AtFLA12) (Fig. 8.13b). These genes were named LuFLA01, LuFLA35, LuFLA36, LuFLA37, and LuFLA02, respectively (Hobson 2013). Upon averaging the expression of these genes in tissues with PCW, TCW, and SCW, their upregulation

Fig. 8.13 The analysis of expression of LusFLA, LusCOBL4, and LusXTH genes in flax stem tissues with distinct cell wall types performed using FIBexDB options. a The clustering of LusFLAs. b The expression of LusFLA genes upregulated in tFIB samples. c The clustering of genes that were found in the list of coexpressed genes of all five LuFLA genes, specifically upregulated in tFIB samples (marked by a red background). d The cluster analysis of LusCOBL4 gene expression in different flax tissues. e The cluster analysis of LusXTH (GH16) genes in different flax tissues. The red dot shows LusFLA (Lus10002985), the fiber-specific promoter activity of which was confirmed experimentally (Hobson and Deyholos 2013), and the blue dots mark the genes that are coexpressed with all five LusFLA genes. PCW: primary cell wall; SCW: secondary cell wall; TCW: tertiary cell wall

in fibers (TCW) ranged from 2500- to 15,300-fold compared with PCW and from 10- to 30-fold compared with SCW. FLAs are highly glycosylated proteins found on the cell surface; the glycosylphosphatidylinositol anchor can temporarily bind them to the plasma membrane. FLAs have been linked to a variety of functions, including signaling and cellulose biosynthesis (Seifert 2018). In Arabidopsis, a correlation has been found between the quantity of AtFLA11 and AtFLA12 transcripts and the initiation of cellulose synthesis during secondary-wall deposition (Brown et al. 2005; Persson et al. 2005). Atfla11/12 stems have lower tensile strength and stiffness, as well as lower cellulose, arabinose, and galactose content, more lignin, and a higher cellulose microfibril angle in their cell walls (MacMillan et al. 2010). The role of the GPI-anchored fasciclin-like arabinogalactan protein FLA4, known as SALT OVERLY SENSITIVE 5 (FLA4/SOS5), in the biosynthesis of cellulose in the seed mucilage along with CESA5 and the leucine-rich repeat receptor-like kinase FEI2 (named for the Chinese word for fat) was demonstrated (Harpaz-Saad et al. 2012). According to Seifert (2021), FLA4-FEIs form the core components of a signaling module involved in the primary cell wall integrity maintenance in elongating roots and seed coat mucilage. Furthermore, a relationship

between the FLA4-FEI pathway and pectin rather than cellulose biosynthesis was suggested. LusFLA genes specifically upregulated in fibers depositing TCW may be involved in tertiary cell wall formation, directly or indirectly affecting cellulose biosynthesis. Based on the hypothesis of Seifert (2021) about the involvement of FLA in pectin biosynthesis, we can also suggest that LusFLAs affect the arrangement of rhamnogalacturonan I (RG-I). Coexpression analysis of LusFLAs allowed us to suggest genes that are specifically upregulated in fibers with TCW, and their products may play an essential role in fiber biogenesis and cell wall formation. The expression of most of these genes is decreased during fiber maturation (from tFIBa to tFIBb, Fig. 8.13c). It should be noted that all these genes had a lower level of expression in the isolated fibers of the oilseed cultivar Lirina than in fiber flax cultivars and wild flax (Fig. 8.13c). The flax genes Lus10031972 and Lus10035131 (orthologs of AT5G15630 encoding COBL4) are present in the list of coexpressed genes with all five LusFLA genes specifically upregulated in fibers (LuFLA01, 02, 35–37). In Arabidopsis, AtCOBL4 is highly co-regulated with the CESA genes, most of which are involved in secondary cell wall formation (Persson et al. 2005). In the flax genome, 7 genes have high homology to AtCOBL4; 5 of them are expressed in stem tissues (Fig. 8.13d). Interestingly, two paralogs for LusCOBL4 (Lus10017863 and Lus10034670) are activated in the xylem tissue, which is enriched in SCW, and two other genes for COBL4 (Lus10031972 and Lus10035131) are especially activated in isolated fibers forming TCW (Fig. 8.13d). Lus10015026 and Lus10038902 genes for TRICHOME BIREFRINGENCE-LIKE 17 (TBL17, AT5G51640) and Lus10019774 (TBL38, AT1G29050) are also coexpressed with all five LuFLA genes. 
Many of the Arabidopsis TBLs are acetyltransferases mediating the regiospecific O-acetylation of various cell wall polymers like xylan (TBL3, as well as TBL28-35, Zhong et al. 2017), xyloglucan (TBL22 and TBL27), mannan (TBL25 and TBL26, Gille et al. 2011), and RG-I (TBL10, Stranne et al. 2018). Based on the expression data of genes for TBLs, we can assume diversification of function for flax TBLs in different tissues, forming different types of cell walls. Similar diversification is assumed for xyloglucan endotransglucosylases/hydrolases and expansins. The Lus10037377 gene encoding xyloglucan endotransglycosylase XTH32 (AT2G36870) and the Lus10009917 gene for EXPANSIN A8 (AT2G40610) are upregulated in fibers with TCW and are found in coexpression networks of all five LusFLA genes. Notably, specific upregulation of definite LusXTH in fibers occurs during TCW deposition rather than during intrusive elongation, when the extension of PCW takes place. Furthermore, xyloglucan and XTH activity were found in poplar (Andersson-Gunnerås et al. 2006; Nishikubo et al. 2007) and eucalyptus (Paux et al. 2005; Goulao et al. 2011) during early tension wood production, which involves TCW deposition in fibers. In tension wood, XTH activity was demonstrated between the S- and G-layers during the development of the tertiary cell wall (Mellerowicz et al. 2008). It is thought that XTH alters xyloglucan, which acts as a staple for S- and G-layers (Hayashi et al. 2010). Recently, the presence of galactosylated xyloglucan in the TCW of flax phloem fibers was demonstrated using immunocytochemistry (Ibragimova et al. 2020). Several kinases are present in the coexpression network of LusFLA35 (the gene with the highest level of expression in fibers forming TCW) (Fig. 8.14): Lus10006067 (leucine-rich repeat transmembrane protein kinase, AT1G34420), Lus10022662 (lectin receptor kinase a4.1, AT5G01550), and Lus10031350 (receptor-like kinase 1, AT1G48480). Whether there are receptor-like kinases that form a large receptor complex together with FLAs that is involved in TCW formation is the subject of further research.
Fig. 8.14 The coexpression network for LusFLA35 (Lus10036112), which includes several kinase genes. The picture was taken from FIBexDB; then, symbols for some coexpressed genes were given

As described above, the machinery of TCW formation seems to partially include elements of secondary and primary cell walls: SCW and PCW-related CESAs, COBL4, FLA11, FLA12, TBLs, XTH32, and EXPA8. We speculate that, in the course of evolution, specific TCW isoforms of some listed genes have appeared. The coexpression network of FLA and COBL genes includes genes encoding enzymes related to RG-I biosynthesis.
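The coexpression reasoning used in this section (looking for genes whose expression profiles follow all of the TCW-specific LusFLA genes) can be illustrated with a minimal sketch. It assumes Pearson correlation over per-sample expression values and an arbitrary cutoff of r >= 0.9; the function names, the cutoff, and the toy profiles are hypothetical and are not taken from FIBexDB, whose actual implementation may differ.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no coexpression signal
    return cov / (sx * sy)

def coexpressed_with_all(expr, baits, r_min=0.9):
    """Return genes whose profile correlates with every bait at r >= r_min."""
    hits = []
    for gene, profile in expr.items():
        if gene in baits:
            continue
        if all(pearson(profile, expr[b]) >= r_min for b in baits):
            hits.append(gene)
    return sorted(hits)

# Toy profiles (hypothetical values) over four samples: iFIB, tFIB, sXYL, leaf
expr = {
    "bait_FLA_1": [1.0, 900.0, 40.0, 2.0],
    "bait_FLA_2": [2.0, 1500.0, 90.0, 1.0],
    "gene_A": [0.5, 300.0, 15.0, 0.5],    # follows both baits
    "gene_B": [200.0, 5.0, 300.0, 10.0],  # high where the baits are low
}
hits = coexpressed_with_all(expr, ["bait_FLA_1", "bait_FLA_2"])
print(hits)
```

Requiring correlation with every bait, rather than with any one of them, mirrors the "coexpressed with all five LusFLA genes" criterion in the text and keeps the candidate list short.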

8.3.2 Genes Encoding Proteins Associated with Rhamnogalacturonan I Metabolism

The second most abundant component in the tertiary cell wall after cellulose is RG-I (Mikshina et al. 2013; Rihouey et al. 2017). It is a polysaccharide, the backbone of which comprises repeated dimers of rhamnose and galacturonic acid (McNeil et al. 1980; Lau et al. 1985). In most cases, RG-I does not exist as a simple linear chain: rhamnosyl residues are substituted with arabinan or galactan side chains (Ridley et al. 2001; Vincken et al. 2003). In particular, the presence of RG-I with side galactan chains was demonstrated for TCW of xylem and phloem fibers in different plant species, including hemp, flax, and poplar (Gorshkova et al. 2018b). To consider RG-I metabolism from a molecular-genetic point of view, we distinguish two processes: the biosynthesis of RG-I and its modifications.

For a long time, the enzymes involved in the assembly of rhamnogalacturonan I (RG-I) were unknown. The RRT1 (RG-I RHAMNOSYLTRANSFERASE 1) genes were recently identified as glycosyltransferases adding rhamnose residues to the backbone of rhamnogalacturonan I (Takenaka et al. 2018). Then, rhamnosyltransferase activity was proposed for 10 genes in the RRT clade of Arabidopsis (Wachananawat et al. 2020). In flax, 25 genes were identified in the corresponding RRT clade (Petrova et al. 2021a). According to transcriptome data, the LusRRT1a (Lus10007975) and LusRRT1b (Lus10013503) genes are expressed in phloem fibers, forming the TCW in a tissue- and stage-specific manner. These LusRRT1 genes were coexpressed with LusRRT10 genes (Lus10027753 and Lus10035540) and fell into one cluster (Fig. 8.15). Other genes involved in the biosynthesis of the RG-I backbone remain unknown. The coexpression network of LusRRT1a contains the genes for glycosyltransferases of different families, for example GT31 and GT90 (Fig. 8.16). Lus10038387 and Lus10036247, both orthologous to AT5G44670, encode GALACTAN SYNTHASE 2 (GALS2) from GT92 (Fig. 8.16); we can assume that LusGALS2s are involved in galactan side chain biosynthesis. Additionally, the

Fig. 8.15 Expression of LusRRT1 and LusRRT10 genes in different flax tissues. PCW: primary cell wall; SCW: secondary cell wall; TCW: tertiary cell wall. The heatmap demonstrates the expression of genes encoding LusRRTs from the flax RRT clade and its clustering (built using FIBexDB). The black frame marks a cluster of LusRRT genes specifically upregulated in fibers with TCW. Right: part of the phylogenetic tree of GT106 protein family members (Petrova et al. 2021a)

gene for LusMUCI70 (Lus10011175, AT1G28240) is present in the coexpression network (Fig. 8.16), indicating that the encoded enzyme is probably involved in RG-I biosynthesis in flax phloem fiber. In Arabidopsis, MUCILAGE-RELATED 70 (MUCI70), along with a protein with putative galacturonosyltransferase activity 11, is required for the formation of seed mucilage RG-I (Voiniciuc et al. 2018). RG-I with long galactan chains is interspersed between cellulose microfibrils in the immature tertiary cell wall layer, limiting their close interaction and preserving the loosely packed structure characteristic of the Gn-layer (Roach et al. 2011). During tertiary wall maturation, the galactan chains are partly degraded by β-galactosidase, producing free galactose. The removal of side galactan chains allows cellulose microfibrils to form the mature G-layer with tightly packed cellulose microfibrils. The typical transition of layers fails in LuBGAL-RNAi transgenic flax lines with decreased β-galactosidase activity. As a result, the mechanical properties of the stems in the transgenic plants become weaker as compared to the control plants (Roach et al. 2011). Two paralogous genes (Lus10008974 and Lus10028848) encoding LusBGAL12 were identified in the flax genome, and expression of these genes was suppressed in flax LuBGAL-RNAi transgenic lines (Roach et al. 2011). Promoter activity of the Lus10008974 gene in flax phloem fibers in the course of TCW development was confirmed using a β-glucuronidase reporter construct (Hobson and Deyholos 2013). Among all members of GH35, both of these LusBGAL genes showed strong specificity of expression (Fig. 8.17). Of note, the presence of active β-galactosidase in the tertiary cell wall during its deposition was demonstrated for xylem fibers of tension wood as well (Gorshkova et al. 2015). Another group of enzymes potentially involved in RG-I modification is the RG-I lyases. Whereas galactosidase activity in flax phloem fibers and xylem fibers of tension wood is easily measured biochemically using chromogenic and fluorogenic substrates (Mokshina et al. 2012; Gorshkova et al. 2015), RG-I lyase activity in plants has not yet been confirmed. Nevertheless, there are genes encoding RGL (RHAMNOGALACTURONAN

Fig. 8.16 Coexpression network for the LusRRT1a gene. The picture was taken from FIBexDB; then, symbols for some coexpressed genes were given

LYASES), and their differential expression has been revealed in flax stem tissues (Mokshina et al. 2019). Two genes, Lus10004281 and Lus10019231 (named RGL6A and RGL6B, respectively; both are homologs to AT2G22620), are specifically upregulated in fibers forming TCW (Fig. 8.17, Mokshina et al. 2019). The geometry of the LusRGL6-A catalytic site allows binding to the RG-I ligand, according to homology modeling and docking simulation data (Mokshina et al. 2019). Interestingly, genes for RG-I biosynthesis and modification are found in one coexpression network (Fig. 8.17).

8.3.3 Other Genes Specifically Upregulated in Fibers with TCW

Genes for the proteins that are of special importance during tertiary cell wall deposition can be revealed by several means, including calculation of the Tau score (Fig. 8.2), analysis of the coexpression network of genes that have already been demonstrated to be specifically expressed in tFIB samples, like FLA genes (Fig. 8.13), and by using the “DEG Finder” option in FIBexDB (Fig. 8.18). In the latter case, we applied rather strict criteria: (tFIBa/iFIBa > 10 and target exp > 10), and (tFIBa/iFIBb > 10 and target exp > 10), and (sXYLa/tFIBa < 0.1 and control exp > 10), and (sXYLb/tFIBa < 0.1 and control exp > 10). The reason for such rigorous selection is the large number of genes upregulated at TCW deposition due to the large scale of the process and its dissimilarity to cell wall formation in any other tissue. Even with such strict criteria, 496 flax genes corresponding to 346 Arabidopsis genes were found. Clustering of the expression levels of these genes (an option in FIBexDB) across a wide panel of samples, including roots, leaves, and hypocotyls, helps to reveal and exclude the clusters of genes that are upregulated not only in stem tissues but also in root samples and hypocotyls (Fig. 8.18, marked by black rectangles). A total of 359 genes are listed as specifically upregulated in fibers with TCW. Based on gene description and annotation, most of the gene products in this list are classified into functional categories such as cell wall-related, transporters, signaling, kinases, and others (Fig. 8.18).
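Two of the selection routes named above, the Tau score and the ratio-based "DEG Finder" criteria, can be sketched as follows. The sketch assumes the commonly used tissue-specificity definition of Tau (the mean of 1 - x_i/x_max over the other n - 1 samples) and assumes that "target exp" and "control exp" in the quoted criteria both refer to expression in tFIBa; the function names and the toy expression values are hypothetical, and FIBexDB itself may compute these quantities differently.

```python
def tau(profile):
    """Tissue-specificity index: 0 = uniform expression, 1 = one sample only."""
    x_max = max(profile)
    if x_max <= 0 or len(profile) < 2:
        return 0.0
    return sum(1 - x / x_max for x in profile) / (len(profile) - 1)

def ratio(num, den):
    """Expression ratio that treats a zero denominator as an infinite ratio."""
    if den == 0:
        return float("inf") if num > 0 else 0.0
    return num / den

def is_tcw_specific(s, fold=10.0, min_expr=10.0):
    """Apply the four ratio criteria quoted in the text to one gene."""
    t = s["tFIBa"]
    return (
        t > min_expr                             # expression floor in tFIBa
        and ratio(t, s["iFIBa"]) > fold          # tFIBa/iFIBa > 10
        and ratio(t, s["iFIBb"]) > fold          # tFIBa/iFIBb > 10
        and ratio(s["sXYLa"], t) < 1 / fold      # sXYLa/tFIBa < 0.1
        and ratio(s["sXYLb"], t) < 1 / fold      # sXYLb/tFIBa < 0.1
    )

# Toy expression values (hypothetical), keyed by sample name
genes = {
    "gene_A": {"iFIBa": 1.0, "iFIBb": 2.0, "tFIBa": 500.0, "sXYLa": 5.0, "sXYLb": 4.0},
    "gene_B": {"iFIBa": 80.0, "iFIBb": 90.0, "tFIBa": 120.0, "sXYLa": 300.0, "sXYLb": 250.0},
    "gene_C": {"iFIBa": 0.1, "iFIBb": 0.1, "tFIBa": 5.0, "sXYLa": 0.0, "sXYLb": 0.0},
}
selected = [g for g, s in genes.items() if is_tcw_specific(s)]
print(selected)
print(tau(list(genes["gene_A"].values())))
```

The expression floor is what removes genes like gene_C here: its ratios are large only because all of its values are close to zero.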

Fig. 8.17 Expression of genes for enzymes involved in RG-I metabolism. a The expression of BGAL genes (GH35) in different flax tissues. b The expression of RGL genes (PL4) in different flax tissues. c The expression of genes for LusRRTs and LusGALSs (RG-I biosynthesis), LusBGALs and LusRGLs (RG-I modification) in different flax tissues. d Coexpression network for LusBGAL12a visualized using FIBexDB. Arrows show genes with the highest level of expression

The top of the list of cell wall-related gene products includes the genes already mentioned above, like LusBGAL12a, LusRGL6a, LuFLAs, and LusXTH. There are also several genes encoding cellulose synthase-like proteins from G family (Lus10003196, Lus10023056, and Lus10023057, all homologous to AT4G23990) and glycosyltransferases of family 47 (exostosin GT47, Lus10013790, AT5G03795) (Fig. 8.19), as reported previously (Gorshkov et al. 2017). The precise functions of these glycosyltransferases remain unknown. Some enzymes of GT47 take part in xyloglucan and glucuronoarabinoxylan biosynthesis (Madson et al. 2003; Zhong et al. 2005). It was also speculated that exostosin encoded by Lus10013790 may be somehow involved in the formation of the RG-I backbone (Gorshkov et al. 2017).

Among other genes from this category, those encoding chitinase-like proteins are the most noticeable. The flax genome has 13 genes that are orthologs of AT3G54420 encoding CHITINASE CLASS IV, and three of them, Lus10010864, Lus10010866, and Lus10024366, are specifically upregulated in fibers in the course of TCW formation (Fig. 8.20a). These genes were annotated as genes for chitinase-like proteins, CTL19, 20, and 21, and their fiber-specific expression was confirmed by qPCR (Mokshina et al. 2014). The following genes have a high coefficient of coexpression with all the mentioned LusCTL genes (Fig. 8.20b): genes encoding matrixins (Lus10035221 and Lus10005605, both are homologs of AT1G24140), members of the AP2/ERF transcription factor family (Lus10004369 (AT5G47230) and Lus10041052 (AT1G33760)),

Fig. 8.18 Selection of genes upregulated in fibers with the tertiary cell wall. A heatmap was generated by FIBexDB (tFIBa/iFIBa > 10, and tFIBa/iFIBb > 10, and sXYLa/tFIBa < 0.1); then, some clusters were removed. Most of the gene products were categorized into several functional groups (right)

and SENESCENCE-ASSOCIATED GENES SAG12 (Lus10025410 and Lus10015286, both are homologs of AT5G45890). Matrixins (matrix metalloproteinases, MMP) are secreted, conserved proteolytic enzymes whose exact functions are unknown. A link between plasma membrane-localized MMP activity and cellulose and callose deposition, plant growth, and organ development was demonstrated in transgenic tobacco plants expressing OsMMP1 (Das et al. 2018). Lus10024366 is coexpressed with Lus10029946 encoding FERONIA (malectin/receptor-like protein kinase) and with Lus10036697 encoding BRASSINOSTEROID INSENSITIVE 1-associated receptor kinase 1 (AT5G48380). There are several genes encoding members of the gibberellin, auxin, and cytokinin pathways among the genes specifically upregulated in fibers in the course of the TCW formation. Lus10006759 (homolog of AT4G21200, ATGA2OX8, gibberellin 2-oxidase 8), and Lus10036539 (homolog of AT5G51810, GA20OX2, GIBBERELLIN 20 OXIDASE 2) encode proteins associated with gibberellin biosynthesis (Fig. 8.19). Lus10036539, which has stable expression in tFIB samples, is coexpressed with genes encoding acyl hydrolases involved in senescence (Lus10032169 and Lus10014518, both homologs of AT5G14930, encoding SAG101) and ISOPENTENYLTRANSFERASE 5, which is involved in cytokinin synthesis (Lus10034025 and Lus10004428, both homologs of AT5G19040). The Lus10025166 gene, encoding an auxin efflux carrier (homolog of AT1G76520 for PIN-LIKES 3), and the Lus10033348 gene, encoding a SAUR-like auxin-responsive protein (homolog of AT2G24400), are found among the upregulated auxin-related genes, but the level of their expression is relatively low (Fig. 8.19). Lus10033348 is specifically upregulated in bast fibers in the course of graviresponse (Gorshkov et al. 2018), while Lus10025166 is activated in root samples under biotic and abiotic treatments. Interestingly, genes for TBL38 (Lus10019774, mentioned above) and for lectins belonging to the Nictaba family (Lus10018304 and Lus10040602, both homologs of AT4G19840) are found in this coexpression network of gibberellin-associated, senescence-associated, and cytokinin-related genes. Both genes for Nictaba domain lectins that recognize high-mannose N-glycans, complex N-glycans, and GlcNAc oligomers (Tsaneva and Van Damme 2020) were reported as genes specifically upregulated in fibers with TCW (Petrova et al. 2021b). The Lus10024290 gene for mannose-binding lectin of the Jacalin family (AT1G19715), as well as the Lus10009582 gene for GNA lectin (CES101, AT3G16030), and the Lus10003099 gene for curculin-like (mannose-binding) lectin of the GNA family (AT5G18470)

Fig. 8.19 Expression profiles of genes specifically upregulated in fibers forming the tertiary cell wall and encoding cell wall-related (except cell wall-related proteins mentioned in the sections above), hormone-related, and signaling (kinases)-associated products. The list of genes was selected after clustering analysis (Fig. 8.18). Bold font marks genes with Tau > 0.8 (Mokshina et al. 2020). The genes in each category are presented in the descending order of the expression level in tFIBa

Fig. 8.20 a Expression profiles of chitinase-like genes specifically upregulated in fibers forming the tertiary cell wall. b Genes that have a high coefficient of coexpression (r) with all three upregulated LusCTL genes (based on FIBexDB)

are found in the list of genes specifically upregulated in fibers during TCW formation (Petrova et al. 2021b; Fig. 8.19). The role of proteins with lectin domains in TCW formation needs to be elucidated. It should be noted that many genes for kinases (lectins, receptors) are specifically upregulated in fibers during tertiary cell wall formation (Fig. 8.19). For example, Lus10013384, homologous to the AT1G21270 gene encoding WALL-ASSOCIATED KINASE 2 (WAK2), is specifically upregulated in fibers with TCW (Fig. 8.19). WAK2 interacts with pectins and either directly or indirectly regulates the transcription and activity of vacuolar invertase (AtvacINV1) and, as a result, affects turgor regulation and cell expansion (Kohorn et al. 2006). Intensive biosynthesis of cell wall components in bast fibers requires the activation of membrane trafficking and associated lipid metabolism (Fig. 8.21). This may explain the activation of the Lus1001835 gene for CYP77B1, which in Arabidopsis is involved in poly-hydroxy fatty acid biosynthesis (Pineau et al. 2021). Moreover, the induction of genes for LTP3 (Lus10015279, Lus10026418), other lipid-transfer proteins (Lus10041197, Lus10002927), polyketide cyclase (Lus10006042), GDSL-like lipases (Lus10041878, Lus10028425, Lus10031520, Lus10015162, and Lus10035585), and O-acyltransferase family proteins, e.g., wax ester synthase/diacylglycerol acyltransferase 1-like (WSD1-like, Lus10017718, Lus10039843), is found in tFIB samples (Gorshkov et al. 2017, Fig. 8.21). Some of them (at least CYP77B1, LTP3, and WSD1-like) may be associated with cuticle (wax) deposition as well, although mature fibers show low levels of wax (0–1.7%) (Yan et al. 2014). GDSL proteins can also act as acetylesterases: the rice GDSL lipase named Brittle Leaf Sheath 1 removes acetyl groups from the xylan backbone (Zhang et al. 2017). There is also a hypothesis that cell wall-localized LTPs play a role in cellulose biosynthesis (Doblin et al. 2002; Ambrose et al. 2013), but there is no direct evidence so far, and the role of many LTPs in the deposition of lipid-based surface barriers seems very likely (Edqvist et al. 2018). In earlier investigations of cells with extensive cellulose deposition, the abundance of transcripts encoding LTPs has been reported in flax fibers (Roach and Deyholos 2007; Fenart et al. 2010), hemp fibers (De Pauw et al. 2007), and cotton seed hairs (Orford and Timmis 2000). There are plenty of genes that show specific upregulation in fibers during TCW formation, but their exact function remains unknown

Fig. 8.21 Expression profiles of genes specifically upregulated in fibers forming the tertiary cell wall and encoding products associated with lipid metabolism, transport, and other different processes, including unknown proteins (miscellaneous). The list of genes was selected from the result of the clustering analysis (Fig. 8.18). The genes in each category are presented in the descending order of expression level in tFIBa. For the category “miscellaneous,” the genes with TGR > 500 in tFIBa are given

Fig. 8.22 Expression of flax genes related to the cytoskeleton organization

(Fig. 8.21). Examples are SAG29 (Lus10003143, AT5G13170) and SAG12 (Lus10025410, AT5G45890), and the gene for BURP domain-containing protein RD22 (Lus10022114, AT5G25610). AtRD22 (RESPONSIVE TO DESICCATION) is upregulated by drought stress, salinity stress, and exogenously supplied abscisic acid (Abe et al. 1997; Sanchez and Chua 2001), while in cotton, an RD22-like protein interacts with an α-expansin and the simultaneous overexpression of both proteins promotes growth and fruit weight (Xu et al. 2013). SAG29 and SAG12 are also known as members of the SWEET glucoside transporter family (Seo et al. 2011). In Arabidopsis, expression of SAG12 and SAG29 genes is regulated by jasmonic acid; bHLH subgroup IIId factors bind to the SAG29 promoter and suppress the gene expression (Qi et al. 2015). To reveal the genes for cytoskeleton-related products, the same list as for iFIB samples was employed (with additional clustering). Upregulation of several genes encoding proteins involved in cytoskeleton organization is observed in fibers with TCW: Lus10014268 and Lus10025969 for ATP binding microtubule motor family proteins (AT5G66310 and AT3G51150), Lus10039218 and Lus10027462 for myosin heavy chain-related proteins (AT5G52280 and AT1G63300), and Lus10029865 and Lus10020682 for kinesin-like and kinesin motor domain proteins (AT5G27000 and AT1G09170) (Fig. 8.22). The expression of specific members of the cytoskeletal network may be coupled to the axial orientation of all cellulose microfibrils in the tertiary cell wall, since the direction of microfibril deposition is constrained by the orientation of cortical microtubules (Baskin 2001).

8.3.4 Transcription Factors Potentially Involved in TCW Formation

The transcriptional network consisting of master transcriptional switches and their downstream transcription factors that are involved in the regulation of SCW biosynthesis in plant fibers is well established (Zhong et al. 2008). All main transcription factors known as regulators of SCW development are downregulated in fibers during TCW formation and are upregulated in xylem samples enriched with SCW (Gorshkov et al. 2017, Fig. 8.23). Master switches driving secondary cell wall biosynthesis include the NAC and MYB families of TFs: NST1 (Lus10002687 and Lus10017340), NST3/SND1 (Lus10001664 and Lus10008271), and MYB46 (Lus10002559 and Lus10039610) (Gorshkov et al. 2017, Fig. 8.23). To find the genes for transcription factors upregulated during tertiary cell wall formation, we used two main approaches. Several transcription factor genes are detected in the coexpression networks of five LusFLA genes; those present in at least two coexpression networks are shown in Table 8.1. Additionally, we extracted

Fig. 8.23 Expression of flax transcription factors in stem tissues. a Clustering of genes encoding TFs involved in SCW formation (list taken from Zhong et al. 2008), and genes for TFs from the coexpression network of LusFLA genes specifically upregulated in tFIB samples. b TF genes specifically upregulated in fibers: FIBexDB search criteria were tFIBa > iFIBa, tFIBa > iFIBb, tFIBa > sXYLa, and tFIBa > sXYLb, each by more than 10 times, with expression in tFIBa > 10 TGR

transcription factors from the list of genes specifically upregulated in fibers with TCW (Fig. 8.18). In fibers depositing tertiary cell wall, pronounced upregulation of genes for some members of the WRKY family is observed. Three of the four flax homologs of AtWRKY20 (AT4G26640) are upregulated in tFIB samples, and one of them (Lus10016595) is coexpressed with four LusFLA genes (Table 8.1). The WRKY20 gene was also upregulated in poplar during tension wood induction (Andersson-Gunnerås et al. 2006). Overexpression of Glycine soja WRKY20 boosts both drought and salt tolerance in transgenic alfalfa (Medicago sativa L.) (Tang et al. 2014); expression of wild soybean WRKY20 in Arabidopsis promotes drought tolerance and modulates ABA signaling (Luo et al. 2013). Ethylene signaling seems to be important for tertiary cell wall formation: ACC and ethylene induce gelatinous layers (G-layers) in the xylem fibers of hybrid aspen (Felten et al. 2018). Genes encoding members of the ethylene-responsive transcription factor family (AP2/ERF) coexpress with the LusFLAs (Lus10001898, Lus10002953, Table 8.1) and LusCTL (Lus10004369, Lus10041052) genes. Lus10002953 and Lus10041052 are homologs of AT1G33760 (ERF022). ERF022 belongs to the Group IIIa ERF genes (Nakano et al. 2006). For the Group IIId ERF genes (ERF034, 035, 038, 039), positive transcriptional regulation of PCW-type CESA genes was demonstrated (Saelim et al. 2018). Furthermore, AP2/ERFs from IIId and IIIe groups (ERF035, 036, 037, 038, and 040) may regulate primary cell wall formation including PCW-type CESA expression (Sakamoto et al. 2018). Lus10016790 and Lus10022485 genes, both homologous to AT4G36870 encoding BEL1-LIKE HOMEODOMAIN PROTEIN 2 (BLH2), are upregulated in fibers with TCW and coexpressed with a number of genes specifically expressed in fibers, including five LusFLA genes. It was shown that BLH2 and BLH4 regulate demethylesterification of homogalacturonan in seed mucilage by directly activating PECTIN METHYLESTERASE 58 (Xu et al. 2020). We can speculate that LusBLH2 may be involved in RG-I biosynthesis or modification. Also, all flax genes (Lus10004201, Lus10013579, Lus10021270, and Lus10029405) homologous to BLH1 (AT2G35940) are specifically upregulated in fibers with TCW (Table 8.1; Fig. 8.24). The flax genome has four genes homologous to AtMYB52 (AT1G17950), which belongs to the second layer of SCW regulators (Zhong et al. 2008) and is suggested as a suppressor of lignin production or the whole SCW deposition (Cassan-Wang et al. 2013). Interestingly, flax AtMYB52 homologs have different expression patterns: Lus10031326 and Lus10031900 are specifically expressed in xylem samples, while one of the remaining isoforms, Lus10028683, is expressed in phloem fiber samples at the same level as in xylem, and the other, Lus10029746, has a higher expression level in phloem fibers (Fig. 8.24). So, specific isoforms of LusMYB52 may be involved in the repression of lignin formation in TCW, which is devoid of lignin, while the other isoforms may help to fine-tune deposition of lignin-enriched SCW being a part of the regulatory network, for which negative regulation is required as well. The function of most TFs mentioned above needs to be experimentally confirmed.

Table 8.1 Genes encoding transcription factors that fall into the coexpression networks of TCW-specific LusFLA genes (based on FIBexDB). The last column gives the number of "+" marks for the gene, i.e., the number of LusFLA networks (of LuFLA02, 35, 01, 36, and 37) in which it is present

Arabidopsis ID   TF        Symbol        Flax ID       FLA networks
AT4G14540        CCAAT     NF-YB3        Lus10035854   5
AT4G36870        HD        BLH2, SAW1    Lus10016790   5
AT4G36870        HD        BLH2, SAW1    Lus10022485   5
AT1G29950        bHLH      -             Lus10004927   4
AT4G26640        WRKY      WRKY20        Lus10016595   4
AT3G01470        HD        HD-ZIP-1      Lus10036709   4
AT2G17040        NAC       NAC036        Lus10025118   4
AT4G00050        bHLH      bHLH016       Lus10028175   3
AT4G00050        bHLH      bHLH016       Lus10042875   3
AT2G35940        HD        BLH1, EDA29   Lus10021270   3
AT3G11280        MYB       -             Lus10009884   3
AT1G64380        AP2/ERF   -             Lus10001898   3
AT3G18990        B3        REM39, VRN1   Lus10008529   3
AT4G36710        GRAS      HAM4          Lus10041721   3
AT2G17040        NAC       NAC036        Lus10022636   3
AT3G16940        CAMTA     -             Lus10041752   2
AT1G75250        MYB       RSM3, RL6     Lus10016873   2
AT1G33760        AP2/ERF   -             Lus10002953   2
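The selection rule behind Table 8.1 (keep a transcription factor gene if it appears in the coexpression networks of at least two of the five LusFLA genes) amounts to counting memberships across networks. A minimal sketch is given below; the function name, the network contents, and the gene labels are hypothetical placeholders rather than FIBexDB output.

```python
from collections import Counter

def shared_members(networks, min_networks=2):
    """Count, for each gene, how many bait networks contain it."""
    counts = Counter()
    for members in networks.values():
        counts.update(set(members))  # set() guards against duplicate entries
    kept = {g: n for g, n in counts.items() if n >= min_networks}
    # Sort by descending count, then by gene ID for a stable order
    return sorted(kept.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical network memberships, keyed by bait gene
networks = {
    "LuFLA01": ["TF_1", "TF_2", "TF_3"],
    "LuFLA02": ["TF_1", "TF_3"],
    "LuFLA35": ["TF_1", "TF_4"],
}
table = shared_members(networks)
print(table)
```

Sorting by the number of networks puts the genes shared by the most baits first, which is how the strongest candidates for a common regulatory role would typically be read off such a list.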

8.4 Pipeline Through Transcriptomics to Get Molecular Keys for Targeted Fiber Crop Improvement

Fig. 8.24 Expression profiles of genes for LusMYB52, WRKY20, BLH1, and BLH2 transcription factors

Fig. 8.25 a The clustering (k-means) and heatmap analysis of the genes that are specifically expressed in the isolated fibers of fiber flax Grant during the tertiary cell wall development. Criteria for DEG Finder in FIBexDB: (tFIB_Li/tFIB_Gr < 0.5), and (tFIBa/iFIBa > 10), and (tFIBa/iFIBb > 10), and (sXYLa/tFIBa < 0.1), and (sXYLb/tFIBa < 0.1). b The clustering (k-means) and heatmap analysis of the genes that are specifically expressed in the isolated fibers of oilseed flax Lirina during the tertiary cell wall development. Criteria for DEG Finder in FIBexDB: (tFIB_Li/tFIB_Gr > 2), and (tFIBa/iFIBa > 10), and (tFIBa/iFIBb > 10), and (sXYLa/tFIBa < 0.1), and (sXYLb/tFIBa < 0.1). c, d The expression profiles of highly expressed genes from corresponding clusters. e The expression profiles of LusCESAs in isolated fibers of fiber flax Grant, oilseed flax Lirina, and wild flax Linum bienne

Flax is cultivated for its seeds and fibers, which are high in cellulose; new cultivars with improved quality of these products are a priority of any flax breeding program. The development of transcriptome technologies has stimulated research into links between plant traits and gene expression, as well as the identification of novel genetic markers for marker-assisted selection or target genes for genome editing and genomic selection. Such an attempt was made for fiber flax through the development of a pipeline that relates the expression of a selected gene set in different flax genotypes to technical fiber quality parameters. There are five main steps in the pipeline: (1) candidate gene selection using previously published RNA-Seq (RNA Sequencing) data (Gorshkov et al. 2017, 2019; Mokshina et al. 2020), (2) analysis of their expression in growing plants of diverse flax genotypes, (3) assessment of quality parameters for technical fiber of these flax varieties and genotypes after plant harvest, (4) determining the optimal sets of genes whose expression is tightly associated with fiber quality parameters by regularized regression models, and (5) evaluation of single nucleotide polymorphism abundance across the coding regions of genes essential for bast fiber development using RNA-Seq data obtained for several distinct flax genotypes (Galinousky et al. 2020). The analyzed set of several tens of genes includes the genes with high Tau scores that are specifically upregulated at different stages of fiber development (Mokshina et al. 2020), as well as the genes important for cell wall thickening, such as various LusCESAs. qPCR analysis of their expression in isolated fibers was performed for oilseed flax cultivars (Lirina and Orpheus), fiber flax cultivars (Aramis, Eden, Grant, Laska, Drakkar, and Mogilevsky), and wild flax species (L. bienne and L. angustifolium) (Galinousky et al. 2020). In addition, RNA-Seq analysis was performed for the Grant and Lirina cultivars and L. bienne; these data are available in FIBexDB.
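The Tau score mentioned above as a selection criterion is a tissue-specificity index. The following is a minimal sketch, assuming the commonly used definition, tau = sum(1 - x_i / x_max) / (n - 1) over n samples; the exact normalization applied for the published gene set is not restated in this chapter, and the expression values below are invented for illustration.

```python
def tau_index(expression):
    """Return the Tau tissue-specificity index for one gene.

    0 means equal expression in all samples; 1 means expression in one sample only.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("Tau needs at least two samples")
    x_max = max(expression)
    if x_max <= 0:
        return 0.0  # gene not detected anywhere; treated as non-specific in this sketch
    return sum(1.0 - x / x_max for x in expression) / (n - 1)


# Invented normalized expression values, in the order cPAR, iFIBa, iFIBb, tFIBa, sXYLa
stage_specific = [2.0, 1.0, 1.0, 400.0, 3.0]
uniform = [50.0, 50.0, 50.0, 50.0, 50.0]

print(f"stage-specific gene: tau = {tau_index(stage_specific):.3f}")
print(f"uniformly expressed gene: tau = {tau_index(uniform):.3f}")
```

A gene expressed almost exclusively in one sample scores close to 1, which is why a high Tau threshold enriches for fiber- and stage-specific candidates.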

Different flax genotypes diverge in the expression of some genes for products involved in tertiary cell wall development: most of these genes (e.g., LusBGAL12, LusRGL6, LusRRT, LusCOBL4, LusFLA11, LusLTP3, and others) have higher levels of expression in fiber flax (Fig. 8.25a, c), while there is a set of genes upregulated in an oilseed flax cultivar. For example, the genes that are found in the LusFLA gene coexpression networks have higher levels of expression in fiber flax Grant compared to oilseed flax Lirina (Fig. 8.25b, d). The genes with increased expression in oilseed flax fibers are associated with signal transmission, ion transport, and other activities. Also, the expression of LusCESA genes varies in isolated fibers of different flax genotypes: PCW-related LusCESA genes have a higher level of expression in the oilseed cultivar, while SCW-related LusCESA genes are more strongly activated in fiber flax and wild flax (Fig. 8.25e).
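The DEG Finder criteria listed in the Fig. 8.25 caption are simple expression-ratio thresholds, so they can be reproduced outside the database. The sketch below is not the FIBexDB implementation: the function names, the handling of zero denominators, and the expression values are assumptions made for illustration; only the sample labels and thresholds come from the caption.

```python
import math

# Thresholds copied from the Fig. 8.25 caption; keys follow the caption's sample labels.
COMMON_CRITERIA = [
    ("tFIBa", "iFIBa", ">", 10.0),
    ("tFIBa", "iFIBb", ">", 10.0),
    ("sXYLa", "tFIBa", "<", 0.1),
    ("sXYLb", "tFIBa", "<", 0.1),
]
GENOTYPE_CRITERION = {
    "Grant": ("tFIB_Li", "tFIB_Gr", "<", 0.5),   # panel a: higher in fiber flax
    "Lirina": ("tFIB_Li", "tFIB_Gr", ">", 2.0),  # panel b: higher in oilseed flax
}


def ratio(numerator, denominator):
    """Expression ratio; a zero denominator gives inf (or nan for 0/0)."""
    if denominator == 0:
        return math.inf if numerator > 0 else math.nan
    return numerator / denominator


def passes(expr, genotype):
    """True if one gene's expression dict satisfies all five criteria."""
    for num, den, op, threshold in [GENOTYPE_CRITERION[genotype]] + COMMON_CRITERIA:
        value = ratio(expr[num], expr[den])
        ok = value > threshold if op == ">" else value < threshold
        if not ok:
            return False
    return True


# Invented expression values for two hypothetical genes
genes = {
    "gene_1": {"tFIB_Li": 40, "tFIB_Gr": 200, "tFIBa": 200,
               "iFIBa": 5, "iFIBb": 8, "sXYLa": 2, "sXYLb": 1},
    "gene_2": {"tFIB_Li": 300, "tFIB_Gr": 100, "tFIBa": 150,
               "iFIBa": 10, "iFIBb": 12, "sXYLa": 5, "sXYLb": 4},
}
for name, expr in genes.items():
    print(name, {g: passes(expr, g) for g in GENOTYPE_CRITERION})
```

Here gene_1 satisfies the fiber flax (Grant) criteria and gene_2 the oilseed flax (Lirina) criteria; the two sets differ only in the direction of the tFIB_Li/tFIB_Gr ratio.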

We examined the relationship between the expression levels of genes supposedly related to final crop traits and several actual quality parameters that differ between flax genotypes: technical stem length, fiber tensile strength, and fiber flexibility. Linear regression models relating gene expression to quality parameters collected in a two-year study identified genes whose expression gave a significant coefficient of determination (Galinousky et al. 2020). Two independent variables, LusGT47-1 and LusDFL1, were included in LASSO models of “tensile strength” and “technical stem length,” accounting for 58 and 80% of the variability of the corresponding parameters, respectively (Fig. 8.26).

Fig. 8.26 a The relationship between the technical stem length, the technical fiber strength, and flexibility, and the expression level (ΔCq value) of the selected genes. Blue dots: fiber flax; yellow: linseed flax; green: wild flax. The regression line is plotted in black, and the shading around the line indicates the 95% confidence interval. b Linear regressions of scutched fiber quality features on relative expression levels of the studied genes (with LASSO as the method of regression regularization). Adjusted R² means are given. c The gene description (based on Galinousky et al. 2020)

The LusGT47-1 gene encodes an exostosin-like glycosyltransferase that may be involved in RG-I biosynthesis (Gorshkov et al. 2017). It is also found in the coexpression networks of genes that are highly likely to be related to RG-I biosynthesis and modification (LusGT31, LusGALS2, LusBGAL12, LusRRT1s). A probable glycosyltransferase gene homologous to At5g03795 (GT47) and LusGT47-1 is induced in ramie bark from the fully elongated internode compared to bark from the elongating internode (Xie et al. 2020). LusGT47-1 carries the missense point mutation c.G1108T, which can be helpful in flax breeding (Galinousky et al. 2020).

LusIPT (Lus10028015) had a high score in the stepwise regression model for technical stem length, and its expression level was associated with fiber flexibility (Fig. 8.26) (Galinousky et al. 2020). This adenylate isopentenyltransferase (cytokinin synthase) gene (orthologous to AT3G63110, IPT3) is involved in cytokinin production (Takei et al. 2001). During secondary growth in Arabidopsis plants, the promoter activity of AtIPT3 is strong in the phloem but not in the cambium (Matsumoto-Kitano et al. 2008). The role of cytokinin in fiber formation requires more research.

The expression of another fiber-specific gene, Lus10037225, which encodes the THIOREDOXIN LusTH8 (orthologous to AT1G69880) (Mokshina et al. 2020), shows a strong negative correlation with fiber quality parameters: fiber flax cultivars have a lower LusTH8 expression level than linseed cultivars and wild species (Galinousky et al. 2020). Thioredoxins act as hydrogen donors for redox enzymes; there are multiple isoforms that are specialized for different partners. Although the expression level of LusTH8 is strongly linked to technical fiber quality, the actual role of LusTH8 in bast fibers remains unclear. The multiple regression model also revealed other genes with positive and negative correlations (Galinousky et al. 2020).

Such studies can certainly be extended to other quality parameters and other gene sets. However, an algorithm must be developed to identify and define the genes whose expression is critical for certain fiber quality characteristics, which might be exploited in future breeding and genome-editing operations. Transcriptome-based selection can provide a highly focused functional strategy for the production of novel crop cultivars. Genes with a fiber- and stage-specific character of expression have an especially high potential for manipulating fiber quality. Because these genes are expressed only in fibers, their modification through traditional breeding or genome editing might influence bast fiber quality while causing no harm to other tissues.
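The regularized regression step of the pipeline can be illustrated with a small LASSO fit. The sketch below is not the analysis code of Galinousky et al. (2020): it is a plain coordinate-descent solver for the objective (1/2n)·||y − Xβ||² + λ·||β||₁ on standardized predictors, and the gene names, expression values, quality scores, and the penalty λ = 0.5 are all invented.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def standardize(column):
    """Center a predictor and scale it to unit (population) variance."""
    n = len(column)
    mean = sum(column) / n
    sd = (sum((v - mean) ** 2 for v in column) / n) ** 0.5
    if sd == 0:
        raise ValueError("constant predictor cannot be standardized")
    return [(v - mean) / sd for v in column]


def lasso(columns, y, lam, n_iter=200):
    """Coordinate descent for LASSO; returns coefficients on the standardized scale."""
    n = len(y)
    y_mean = sum(y) / n
    residual = [v - y_mean for v in y]          # all coefficients start at zero
    X = [standardize(c) for c in columns]
    beta = [0.0] * len(X)
    for _ in range(n_iter):
        for j, xj in enumerate(X):
            # correlation of predictor j with the residual that ignores predictor j
            rho = sum(x * (r + x * beta[j]) for x, r in zip(xj, residual)) / n
            new_beta = soft_threshold(rho, lam)
            delta = new_beta - beta[j]
            if delta != 0.0:
                residual = [r - x * delta for x, r in zip(xj, residual)]
                beta[j] = new_beta
    return beta


# Invented data: relative expression of three hypothetical genes in eight genotypes
expression = {
    "gene_A": [7.0, 7.0, 7.0, 7.0, 3.0, 3.0, 3.0, 3.0],
    "gene_B": [5.0, 5.0, 3.0, 3.0, 5.0, 5.0, 3.0, 3.0],
    "gene_C": [6.5, 5.5, 6.5, 5.5, 6.5, 5.5, 6.5, 5.5],
}
# Invented quality parameter, driven mostly by gene_A and only weakly by gene_C
quality = [23.2, 22.8, 23.2, 22.8, 17.2, 16.8, 17.2, 16.8]

coefficients = dict(zip(expression, lasso(list(expression.values()), quality, lam=0.5)))
print({name: round(value, 3) for name, value in coefficients.items()})
```

With this penalty only gene_A keeps a non-zero coefficient: the L1 term sets weak or uninformative predictors exactly to zero, which is what makes LASSO useful for choosing a small gene set from several tens of candidates.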

8.5 MiRNA: The Potential Regulators of Gene Expression

MicroRNAs (miRNAs) are components of a complex that, like transcription factors, fine-tunes gene expression through a multi-level program at various stages of plant development. Their involvement in the formation of important agronomic traits has been demonstrated (Tang and Chu 2017; Wang et al. 2020; Xu et al. 2021). MiRNAs are a class of single-stranded non-coding small RNAs, 19 to 24 nucleotides long, that provide post-transcriptional regulation by interacting with the target region of mRNAs to induce their degradation or translational repression (Yu and Wang 2010; Axtell and Meyers 2018). According to the prevailing model, the level of complementarity between a miRNA and an mRNA determines the fate of the target: almost perfect complementarity is required for cleavage mediated by the RISC (RNA-induced silencing complex), whereas a lower degree of complementarity induces translational suppression, although this issue is still debated (Hausser and Zavolan 2014; Cloonan 2015; Breda et al. 2015). In plants, transcript cleavage is more often observed than translational inhibition (O’Brien et al. 2018; Dexheimer and Cochella 2020). On the whole, the functionality of the dynamic interplay of miRNAs with target mRNAs depends on many factors, including their abundance, the stability of the miRNA-mRNA complex, which depends on the nucleotide composition of the 3ʹ-end and alternative polyadenylation, the subcellular localization of miRNAs, and the affinity of miRNA-target interactions, as well as the competition of different miRNAs for the target mRNA (Hausser and Zavolan 2014; Breda et al. 2015; Trabucchi and Mategot 2019).

MiRNA-dependent post-transcriptional regulation, like regulation by transcription factors, is “combinatorial” in nature: a single miRNA is typically capable of recognizing several distinct mRNA transcripts; at the same time, one mRNA can be affected by multiple miRNAs that together fine-tune gene expression (Lewis et al. 2005). Additionally, small RNAs are mobile: secretion, cell-to-cell (short-range) movement, and systemic (long-range, such as shoot-to-root or vice versa) trafficking occur, which also indicates the functional importance of this regulatory mechanism (Melnyk et al. 2011; Dunoyer et al. 2013; Li et al. 2021).

MiRNA families, as well as the target sites of miRNAs, are relatively conserved among various plant taxonomic groups. Therefore, homologous miRNAs and miRNA-target genes can be predicted in non-model plants using a database of already identified small RNA sequences (Axtell and Meyers 2018). According to the miRBase database (Kozomara et al. 2018), 124 flax miRNAs belonging to 23 families have been identified, and hundreds of their potential targets have been predicted (Barozai 2012; Krasnov et al. 2019; Xie et al. 2021). The samples used in these studies to analyze miRNA expression were complex mixtures of tissues that included many cell types and cells at differing developmental stages.
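The complementarity criterion of the prevailing model described earlier in this section can be made concrete with a small scoring function. This is a deliberately simplified sketch, not the scoring scheme of any target-prediction server: it handles only ungapped duplexes of equal length, the half penalty for G:U wobble pairs is an illustrative choice, and the sequence is made up.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq):
    """Upper-case a sequence and write it in the RNA alphabet."""
    return seq.upper().replace("T", "U")


def reverse_complement(seq):
    """Perfectly complementary site for an RNA sequence, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(to_rna(seq)))


def mismatch_score(mirna, target_site, wobble_penalty=0.5):
    """Sum of penalties over the antiparallel miRNA:target duplex (0 = perfect)."""
    mirna, target_site = to_rna(mirna), to_rna(target_site)
    if len(mirna) != len(target_site):
        raise ValueError("this sketch handles ungapped, equal-length duplexes only")
    score = 0.0
    # miRNA base i (from its 5' end) faces target base i counted from the 3' end
    for m_base, t_base in zip(mirna, reversed(target_site)):
        if (m_base, t_base) in WATSON_CRICK:
            continue
        score += wobble_penalty if (m_base, t_base) in WOBBLE else 1.0
    return score


mirna = "AUGCUAGCUAGGAUCCGAUC"  # made-up 20-nt sequence, not a flax miRNA
perfect_site = reverse_complement(mirna)
wobble_site = perfect_site[:17] + "U" + perfect_site[18:]    # C->U opposite a G
mismatch_site = perfect_site[:17] + "A" + perfect_site[18:]  # C->A opposite a G

for label, site in [("perfect", perfect_site), ("wobble", wobble_site),
                    ("mismatch", mismatch_site)]:
    print(label, site, mismatch_score(mirna, site))
```

A low score corresponds to the near-perfect pairing associated with cleavage in the model; where to draw the line between cleavage and translational repression is left open here, as it is in the text.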
For a deeper insight into the role of miRNAs at a definite stage of fiber development (intrusive growth or tertiary cell wall formation), Illumina technology was applied for the integrated analysis of the corresponding mRNA and miRNA libraries (Gorshkov et al. 2019). The miRBase and the plant small RNA target analysis server (psRNATarget, http://plantgrn.noble.org/psRNATarget/) were used to identify flax miRNAs and their predicted targets in specific cell types. Five samples, including intrusively growing phloem fibers with only primary cell walls (iFIBa and iFIBb), symplastically growing cortex parenchyma (cPAR), phloem fibers at the stage of tertiary cell wall deposition (tFIBa), and xylem (sXYLa) containing mainly cells with secondary cell walls (Fig. 8.1), were used for high-throughput sequencing of small RNA and mRNA libraries (Gorshkov et al. 2019). The data presented in this chapter were deposited as BioProject PRJNA475325 in the Sequence Read Archive (https://www.ncbi.nlm.nih.gov/sra).

To measure gene/miRNA expression, we used the value of total gene reads (TGR), which was determined as the number of all reads that were mapped to a gene (for mRNA-Seq data) or to the known pre-miRNA sequence (for miRNA-Seq data). After estimating the expression values using the DESeq2 package (Love et al. 2014), representatives of all 23 known flax miRNA families were identified. It is assumed that the regulatory ability of miRNAs in relation to the target mRNA is evident if the level of miRNA expression reaches 100 or more reads per million (Mullokandov et al. 2012; Breda et al. 2015). The expression level of 43 representatives of flax miRNAs satisfied these conditions in at least one of the samples (Fig. 8.27a). Seven members of five families, miR159, miR319, miR397, miR398, and miR408, were the most abundant in all samples, with TGR over 1000 in each sample. Of these, representatives of the miR159 family made up more than 50% of all detected miRNAs. These families are evolutionarily conserved and widespread in plants and participate extensively in the regulation of plant development and reactions to environmental stress (Guo et al. 2017b; Song et al. 2018; Li et al. 2020; Huang et al. 2021; Fang et al. 2021). They target SQUAMOSA PROMOTER BINDING PROTEIN LIKE (SPL) transcription factors and ensure correct timing of the juvenile-to-adult transition in Arabidopsis (Guo et al. 2017b). Earlier studies in flax have detected lus-miR159b and lus-miR159c in the most diverse sample types of flax plants and found their association with the regulation and fine-tuning of the expression of some housekeeping genes (Barvkar et al. 2013; Yu et al. 2016).

Fig. 8.27 a Diagram of the distribution of 124 microRNAs according to the expression values of each representative (in at least one sample). b Expression level (the color gradient reflects TGR values of 1–1000) of differentially expressed lus-miRNAs (the cutoff for the comparison tFIB versus iFIB (average of iFIBa + iFIBb) is a log2FC value ≥ 2 or ≤ −2 with padj < 0.01)

The miR319 family targets the members of the TCP family of plant-specific transcription factors that regulate gene expression in multiple biological pathways, including vascular formation, leaf development, response to environmental stresses, and others (Fang et al. 2021). The rest of the families with highly expressed miRNAs (miR408, miR398, and miR397) are Cu-miRNAs that are under the control of the Cu-responsive transcription factor SPL7 and target the mRNAs encoding copper-binding proteins (plastocyanin, Cu/Zn superoxide dismutase, cytochrome-c oxidase, and others). According to the existing hypothesis, the Cu-miRNAs function as part of a copper homeostasis mechanism that saves copper for the most vital Cu-containing proteins, such as plastocyanin and cytochrome-c oxidase, in actively growing cells during impending Cu deficiency (Shahbaz and Pilon 2019). In addition, transgenic Arabidopsis plants overexpressing miR408 showed significantly increased plant height and other parameters, which generally increased biomass and seed yield. The increase in plant size was accompanied by a significant increase in the photosynthetic rate.
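The abundance threshold of 100 reads per million and the fourfold (log2FC of 2) cutoff used for Fig. 8.27 can be applied as a simple filter. The sketch below is illustrative only: plain reads-per-million scaling stands in for the DESeq2 normalization used in the study, adjusted p-values are not computed (they require replicate-based testing), and all miRNA names and read counts are invented.

```python
import math


def reads_per_million(counts):
    """Scale raw read counts of one library to reads per million (RPM)."""
    total = sum(counts.values())
    return {name: count * 1e6 / total for name, count in counts.items()}


def abundant_mirnas(rpm_by_sample, threshold=100.0):
    """miRNAs reaching the threshold in at least one sample."""
    names = set().union(*(rpm.keys() for rpm in rpm_by_sample.values()))
    return sorted(
        name for name in names
        if any(rpm.get(name, 0.0) >= threshold for rpm in rpm_by_sample.values())
    )


def log2_fold_change(rpm_by_sample, name, case="tFIBa", controls=("iFIBa", "iFIBb")):
    """log2(case / mean of controls), mirroring the tFIB versus iFIB comparison."""
    control_mean = sum(rpm_by_sample[s][name] for s in controls) / len(controls)
    return math.log2(rpm_by_sample[case][name] / control_mean)


# Invented read counts for four hypothetical miRNAs in three libraries
raw = {
    "iFIBa": {"miR_A": 600_000, "miR_B": 50, "miR_C": 399_930, "miR_D": 20},
    "iFIBb": {"miR_A": 550_000, "miR_B": 150, "miR_C": 449_830, "miR_D": 20},
    "tFIBa": {"miR_A": 500_000, "miR_B": 400, "miR_C": 499_570, "miR_D": 30},
}
rpm = {sample: reads_per_million(counts) for sample, counts in raw.items()}

kept = abundant_mirnas(rpm)
changed = [m for m in kept if abs(log2_fold_change(rpm, m)) >= 2.0]
print("abundant:", kept)
print("at least fourfold change:", changed)
```

In this toy data set miR_D never reaches 100 RPM and is dropped before any comparison, while miR_B is the only abundant miRNA whose level changes at least fourfold between the two stages.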

Consistent with the generally accepted opinion, an inverse relationship between the expression of miRNAs and their putative mRNA targets can indicate miRNA-mediated regulation. Based on the analysis of differential miRNA expression by pairwise comparisons between samples (normalized values of TGR > 100 in at least one sample, Gorshkov et al. 2019), the expression levels of 16 miRNAs belonging to the miR156, miR166, miR394, and miR396 families differed fourfold or more in abundance between tFIBa and the iFIBa and iFIBb samples (cutoff padj < 0.05) (Fig. 8.27). It is reasonable to suggest that increased expression of miRNAs at the stage of tertiary cell wall formation is associated with the destabilization of mRNAs active at the stage of intrusive growth. Indeed, the higher expression of lus-miR156a and the isoforms of the miR166 and miR396 families at the stage of tertiary cell wall formation compared to the stage of intrusive growth (Fig. 8.27b) was inversely correlated with the expression of 38 predicted target genes (Fig. 8.28). They are enriched with genes encoding transcription factors, namely growth-regulating factors (GRF1, GRF5, GRF8, GRF9), SPL9, and homeodomain-leucine zipper proteins (HD-ZIP: HB8, HB14, HB15), as well as genes for kinesins (POK2, HIK) and some other proteins (Figs. 8.28 and 8.29).

Fig. 8.28 miRNA-target genes downregulated at the tertiary cell wall deposition stage of phloem fibers. The red-blue heatmap displays the mRNA abundance as normalized TGR values (TGR ≥ 16 in at least one sample); the color gradient reflects TGR values from 1 to 500. cPAR: cortical parenchyma; iFIB: phloem fibers at the stage of intrusive elongation; sXYL: stem xylem with secondary cell wall; tFIB: phloem fibers depositing tertiary cell walls.

In Arabidopsis, the GRF proteins, tightly coupled with GRF-INTERACTING FACTORs (GIFs), form a functional transcriptional complex that participates in the regulation of cell cycle-related genes (Lee et al. 2009; Kim 2019). The abundance of GRF mRNAs, post-transcriptionally controlled by miR396 (Fig. 8.29a), forms an important module that times the transition from the meristematic state to the differentiating state, regulating the growth and morphogenesis of leaves (Rodriguez et al. 2010). The GRF-GIF-miR396 module is also involved in the regulation of cell elongation during the growth of roots and stems (Kim 2019; Liebsch and Palatnik 2020). In addition, one of the main molecular effects of high levels of miR396 is the downregulation of mitosis-specific genes, kinesins in particular (Rodriguez et al. 2010).

Fig. 8.29 miRNA families upregulated in phloem fibers at the tertiary cell wall deposition stage and the downregulated target genes identified from the comparison of gene expression at the stages of cell wall thickening and intrusive elongation (the cutoff for the comparison tFIB vs. iFIB (average of iFIBa + iFIBb) is a log2FC value ≤ −2 with padj < 0.01)

Kinesins play critical roles in mitosis and affect microtubule organization, vesicle transport, and cellulose microfibril order (Li et al. 2012; Zhu and Dixit 2012). Lus10027933, Lus10032452, and Lus10042952 (miR396-target genes, Fig. 8.29b, c) and Lus10012048 (miR396/miR166b/miR156-target gene, Fig. 8.29b) are the homologs of Arabidopsis genes encoding two kinesins: HIK (ATP binding microtubule motor/HINKEL, At1g18370) and POK2 (PHRAGMOPLAST ORIENTING KINESIN 2, At4g19050). The POKs act as early anchoring components of the division plane at the cell cortex, maintaining the spatial memory of the preprophase bands and retaining the downstream identity markers of the cortical division zone (Farquharson 2014; Lipka et al. 2014; Müller and Livanos 2019). HIK, along with POK2, plays a specific role in plant cytokinesis, controlling the reorganization of phragmoplast microtubules during cell plate formation and mediating its accurate positioning (Zhu and Dixit 2012; Herrmann et al. 2018). Mutations in the Arabidopsis HIK and POK genes cause defects in cytokinesis, leading to the formation of incomplete cell walls and multinucleated cells as well as disrupted phragmoplast dynamics (Strompen et al. 2002; Herrmann et al. 2018). During flax bast fiber elongation, karyokinesis is not followed by cytokinesis; as a result, the cell becomes multinucleated. Furthermore, the formation of a cell plate and a phragmoplast is not observed, although the onset of karyokinesis is accompanied by the formation of preprophase bands, and mitotic spindles have a normal appearance (Ageeva et al. 2005). According to the obtained data, some key components of this complicated process can be subjected to post-transcriptional control by miRNA. How the modulation of such control can be associated with the development of multinucleate cells in the process of intrusive elongation of phloem fibers remains to be elucidated.

Additionally, there may be epigenetic control of the transition from the intrusive growth stage to the tertiary cell wall formation stage, since the miR156-SPL pathway could be subjected to epigenetic regulation, such as DNA methylation, modification of histone proteins, and chromatin remodeling (Xu et al. 2016). In this regard, it is noteworthy that the homolog (Lus10004012) of a SET domain-containing Arabidopsis histone methyltransferase (AT3G61740, TRITHORAX 3, ATX3), which controls methylation of lysine residues in the tail of histone H3 (Chen et al. 2017), was identified among the miR396-target genes (Fig. 8.28). It is upregulated at the stage of intrusive growth. H3K4 methylation is a major regulator of chromatin state and gene expression; ATX3, redundantly with ATX4/5, can modulate transcriptional programs in different physiological situations by carrying out epigenetic control of thousands of loci throughout the genome (Chen et al. 2017).

The members of the miR166 family were significantly upregulated in phloem fibers during cell wall thickening compared to the stage of intrusive elongation and xylem tissue (Fig. 8.27). The miR166/165 members in Arabidopsis and other plants mainly regulate the transcripts of HD-ZIP III genes (IFL1, ATHB-9, ATHB-14, ATHB-8, ATHB-15), which display overlapping and distinct functions in plant development, including stem apical meristem maintenance and cell specification, auxin transport, and the establishment of the vascular pattern, and may play an important role in secondary cell wall development (Prigge et al. 2005; Kim et al. 2005; Zhou et al. 2007).
The results of research in model species revealed that downregulation of this miRNA family in xylem tissue, accompanied by the upregulation of HD-ZIP III TFs, plays an important role in the fine-tuning of the expression of transcripts involved in lignification, cellulose biosynthesis, and other secondary wall-related processes (Ong and Wickneswari 2012; Du and Wang 2015). The formation of a thin layer of the SCW, always preceding the biosynthesis of thick TCW, may explain the high level of miR166 expression and a significant reduction (four times or more, padj ≤ 0.01) of the transcript levels of HD-ZIP III factors (ATHB-14 (Lus10023357, Lus10038449), ATHB-8 (Lus10011616, Lus10011426), ATHB-15 (Lus10037568)) compared to xylem (Figs. 8.28 and 8.29d). This may be accompanied by an interruption of the secondary cell wall biosynthesis (Zhang et al. 2018) and the induction of tertiary cell wall deposition.

Two members of miR394 were significantly upregulated in iFIB compared to all other samples except sXYLa (Fig. 8.27). It is known that during embryonic shoot meristem formation miR394 represses its direct target LCR (LEAF CURLING RESPONSIVENESS), thus contributing to WUS-mediated stem cell maintenance (Knauer et al. 2013). The limited region of miR394 action in relation to this process (only three stem cell layers at the shoot tip) and the lack of differential expression of LCR genes across samples suggest a different action of miR394 in iFIB. Phylogenetic analysis revealed that mRNAs of genes encoding F-box proteins, including proteins with carbohydrate-recognition domains (lectins), which are associated with many biological events (morphogenesis, hormone perception and signaling, cell cycle, circadian clock regulation) (Stefanowicz et al. 2015), are highly conserved targets of miR394 across different plant species (Kumar et al. 2019). Perhaps miR394 upregulation in intrusively growing fibers is directed at such genes, but the corresponding miR394 targets have not been identified among flax orthologs (according to the Phytozome v13 database).
Further investigations with extensive analysis of the unannotated F-box genes will be needed to better understand the regulation of miR-related genes. It should be noted that the network of interactions between miRNAs and target mRNAs is generally ambiguous and could produce unexpected results that go beyond the generally accepted negative correlation between a microRNA and its targeted mRNA. Mathematical modeling shows that the combinatorial (competitive) interaction of several microRNAs with their targets could have a positive total effect on the expression of the target mRNA, instead of the individual negative influence of each microRNA, and that this could be determined by the concentration of the microRNAs (Nyayanit and Gadgil 2015). The validation of this model in vivo would provide a more realistic explanation for the coordinated upregulation of both miRNAs and their targets.
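The inverse miRNA-target relationship used throughout this section as an indicator of regulation, and the concordant cases noted above, can both be screened for once fold changes and predicted targets are available. The sketch below is a hypothetical helper, not part of any published pipeline; the names, log2 fold changes, and target assignments are invented, and the cutoff of 2 mirrors the fourfold threshold used in the figure captions.

```python
def classify_pairs(mirna_lfc, target_lfc, predicted_targets, cutoff=2.0):
    """Split predicted miRNA-target pairs into inverse and concordant changes.

    Only pairs in which both partners change by at least `cutoff` (log2 scale)
    are reported.
    """
    result = {"inverse": [], "concordant": []}
    for mirna, targets in predicted_targets.items():
        m = mirna_lfc.get(mirna)
        if m is None or abs(m) < cutoff:
            continue
        for gene in targets:
            g = target_lfc.get(gene)
            if g is None or abs(g) < cutoff:
                continue
            key = "inverse" if m * g < 0 else "concordant"
            result[key].append((mirna, gene))
    return {key: sorted(pairs) for key, pairs in result.items()}


# Invented log2 fold changes (tFIB versus iFIB) and predicted targets
mirna_lfc = {"miR_X": 2.6, "miR_Y": -2.4, "miR_Z": 0.5}
target_lfc = {"gene_1": -3.1, "gene_2": 2.2, "gene_3": -2.5, "gene_4": 3.0}
predicted_targets = {
    "miR_X": ["gene_1", "gene_2"],
    "miR_Y": ["gene_4"],
    "miR_Z": ["gene_3"],
}

pairs = classify_pairs(mirna_lfc, target_lfc, predicted_targets)
print(pairs)
```

Keeping the concordant pairs rather than discarding them is a deliberate choice: as the modeling results cited above suggest, they may reflect combinatorial effects rather than false target predictions.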

8.6 Conclusions and Future Perspective

In this chapter, we have delineated the importance and use of transcriptome analysis, mainly based on RNA-Seq experiments employing deep sequencing. Information about the spatiotemporal pattern of gene expression allows us to speculate about what is occurring in each tissue and gives deep insight into the molecular mechanisms of the biological processes taking place there, which can unite physiological and/or trait studies with molecular biology. Genome-wide analysis of many transcriptome data sets, as well as their integration, will be fundamental and indispensable for understanding and harnessing the plant. As suggested by the FIBexDB for flax and poplar, comparative analysis of different but related plant species would allow us to distinguish mechanisms common to these plants from species-specific inherited features (Mokshina et al. 2021).

In addition to simple RNA-Seq experiments, including those for small RNA, new technologies exploiting deep sequencing, such as ATAC-seq and Ribo-seq, could also be employed in the study of flax biology (Buenrostro et al. 2013; Ingolia et al. 2009). These methods reveal open, accessible chromatin regions and actively translated mRNAs, respectively, providing more precise snapshots of gene expression. A recent trend in RNA-Seq in model plants is single-cell RNA-Seq (scRNA-Seq, Tang et al. 2009). If each cell can be isolated separately, sorted by the fluorescent signal from an artificial promoter-fluorescent protein gene construct, and collected, the transcriptome of each cell can be obtained and analyzed with modern clustering techniques. scRNA-Seq requires cell-type-specific promoters and the creation of transgenic plants. It is worthwhile to consider applying the method to flax stem instead of laser microdissection in the future.

ChIP-seq and DAP-seq can provide information on target genes of transcription factors and other DNA-binding proteins such as histones (Johnson et al. 2007; O’Malley et al. 2016). Since transcription factors are the key molecules that determine the whole transcriptome, these methods, which reveal the primary influence of transcription factors, are very important for understanding the molecular principles of gene expression. If antibodies to every transcription factor could be prepared, it would not be necessary to produce transgenic plants for every transcription factor; in practice, however, expression of tag-fused transcription factors in the plant through the introduction of an artificial DNA construct is usually required to obtain better ChIP-seq results. The creation of transgenic flax plants is still a major challenge for most laboratories but is also very important for elucidating the biological function of each gene highlighted by the transcriptome analysis described in this chapter: overexpression, knockdown by RNAi, or knockout by genome editing of each gene of interest is a key strategy for determining gene function. If genome (gene) editing could be performed in a single cell by introducing a genome-editing tool such as a complex of guide RNA and the Cas9 nuclease, and the resulting cell could be regenerated into an individual plant, then the complicated method of creating transgenic plants would no longer be required.
From this point of view, technologies for the isolation and transformation of protoplasts and their regeneration should be extensively studied in addition to the conventional method of transgenic plant creation. Another key task is the development of a new database encompassing and integrating all these data obtained from various kinds of omics and trait/phenotype data from multiple plant species, which is beyond the scope of the current FIBexDB. As described in this chapter and the above summary, exploring transcriptome data from many conditions and their integration through a user-friendly web-based database will become a fundamental daily procedure for all scientists engaged in flax research. Such activity and further efforts to advance our understanding of genome-wide omics data using emerging technologies are a first step toward improving plant traits with state-of-the-art technologies such as genome editing and genomic selection. We consider that the study of flax is now entering a new era, allowing us to understand it at the molecular level and design traits based on unprecedented big data.

Acknowledgements This work was supported by grants from the Russian Science Foundation (grant #19-14-00361П, RNA-Seq for the combined analysis of mRNA and miRNA; grant #20-44-07005, RNA-Seq data analysis, FIBexDB creation) and Strategic International Collaborative Research project (grant no. JPJ0088379) promoted by the Ministry of Agriculture, Forestry and Fisheries, Tokyo, Japan. The authors also acknowledge financial support from the government assignment for the FRC Kazan Scientific Center of RAS (N. Mokshina, OG, TG data normalization and discussion). No conflict of interest is declared.

References
Abe H, Yamaguchi-Shinozaki K, Urao T et al (1997) Role of Arabidopsis MYC and MYB homologs in drought- and abscisic acid-regulated gene expression. Plant Cell 9:1859–1868. https://doi.org/10.1105/tpc.9.10.1859
Ageeva MV, Petrovská B, Kieft H et al (2005) Intrusive growth of flax phloem fibers is of intercalary type. Planta 222:565–574. https://doi.org/10.1007/s00425-005-1536-2
Ambrose C, DeBono A, Wasteneys G (2013) Cell geometry guides the dynamic targeting of apoplastic GPI-linked lipid transfer protein to cell wall elements and cell borders in Arabidopsis thaliana. PLoS ONE 8:e81215. https://doi.org/10.1371/journal.pone.0081215
Andersson-Gunnerås S, Mellerowicz EJ, Love J et al (2006) Biosynthesis of cellulose-enriched tension wood in Populus: global analysis of transcripts and metabolites identifies biochemical and developmental regulators in secondary wall biosynthesis. Plant J 45:144–165. https://doi.org/10.1111/j.1365-313X.2005.02584.x
Aslam M, Fakher B, Jakada BH et al (2019) SWR1 chromatin remodeling complex: a key transcriptional regulator in plants. Cells 8:1621. https://doi.org/10.3390/cells8121621
Atanassov II, Pittman JK, Turner SR (2009) Elucidating the mechanisms of assembly and subunit interaction of the cellulose synthase complex of Arabidopsis secondary cell walls. J Biol Chem 284:3833–3841. https://doi.org/10.1074/jbc.M807456200
Axtell MJ, Meyers BC (2018) Revisiting criteria for plant microRNA annotation in the era of big data. Plant Cell 30:272–284. https://doi.org/10.1105/tpc.17.00851
Barozai MYK (2012) In silico identification of microRNAs and their targets in fiber and oil producing plant flax (Linum usitatissimum L.). Pakistan J Bot 44:1357–1362
Barvkar VT, Pardeshi VC, Kale SM et al (2013) Genome-wide identification and characterization of microRNA genes and their targets in flax (Linum usitatissimum). Planta 237:1149–1161. https://doi.org/10.1007/s00425-012-1833-5
Baskin TI (2001) On the alignment of cellulose microfibrils by cortical microtubules: a review and a model. Protoplasma 215:150–171. https://doi.org/10.1007/BF01280311
Besprozvannaya M, Dickson E, Li H et al (2018) GRAM domain proteins specialize functionally distinct ER-PM contact sites in human cells. Elife 7:1–25. https://doi.org/10.7554/eLife.31019
Borghi L, Kang J, Ko D et al (2015) The role of ABCG-type ABC transporters in phytohormone transport. Biochem Soc Trans 43:924–930. https://doi.org/10.1042/BST20150106
Breda J, Rzepiela AJ, Gumienny R et al (2015) Quantifying the strength of miRNA–target interactions. Methods 85:90–99. https://doi.org/10.1016/j.ymeth.2015.04.012
Brown DM, Zeef LAH, Ellis J et al (2005) Identification of novel genes in Arabidopsis involved in secondary cell wall formation using expression profiling and reverse genetics. Plant Cell 17:2281–2295. https://doi.org/10.1105/tpc.105.031542
Buenrostro JD, Giresi PG, Zaba LC et al (2013) Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods 10:1213–1218. https://doi.org/10.1038/nmeth.2688
Cassan-Wang H, Goué N, Saidi MN et al (2013) Identification of novel transcription factors regulating secondary cell wall formation in Arabidopsis. Front Plant Sci 4:1–14. https://doi.org/10.3389/fpls.2013.00189
Chen L-QQ, Luo J-HH, Cui Z-HH et al (2017) ATX3, ATX4, and ATX5 encode putative H3K4 methyltransferases and are critical for plant development. Plant Physiol 174:1795–1806. https://doi.org/10.1104/pp.16.01944

192 Clair B, Déjardin A, Pilate G, Alméras T (2018) Is the Glayer a tertiary cell wall? Front Plant Sci 9:8–11. https://doi.org/10.3389/fpls.2018.00623 Cloonan N (2015) Re-thinking miRNA-mRNA interactions: intertwining issues confound target discovery. BioEssays 37:379–388. https://doi.org/10.1002/bies. 201400191 Cosgrove DJ (2005) Growth of the plant cell wall. Nat Rev Mol Cell Biol 6:850–861. https://doi.org/10.1038/ nrm1746 Cosgrove DJ (1998) Update on cell growth cell wall loosening by expansins 1. Plant Physiol 333–339 Das PK, Biswas R, Anjum N et al (2018) Rice matrix metalloproteinase OsMMP1 plays pleiotropic roles in plant development and symplastic-apoplastic transport by modulating cellulose and callose depositions. Sci Rep 8:2783. https://doi.org/10.1038/s41598-01820070-4 Daumann M, Hickl D, Zimmer D et al (2018) Characterization of filament-forming synthases from Arabidopsis thaliana. Plant J 96:316–328. https://doi.org/ 10.1111/tpj.14032 Day A, Addi M, Kim W et al (2005) ESTs from the fibrebearing stem tissues of flax (Linum usitatissimum L.): expression analyses of sequences related to cell wall development. Plant Biol 7:23–32. https://doi.org/10. 1055/s-2004-830462 de Jesús-Pires C, Ferreira-Neto JRC, Pacifico BezerraNeto J et al (2020) Plant thaumatin-like proteins: function, evolution and biotechnological applications. Curr Protein Pept Sci 21:36–51. https://doi.org/10. 2174/1389203720666190318164905 de Jong F, Munnik T (2021) Attracted to membranes: lipid-binding domains in plants. Plant Physiol 185:707–723. https://doi.org/10.1093/plphys/kiaa100 De Pauw MA, Vidmar JJ, Collins J et al (2007) Microarray analysis of bast fibre producing tissues of Cannabis sativa identifies transcripts associated with conserved and specialised processes of secondary wall development. Funct Plant Biol 34:737. 
https://doi.org/ 10.1071/FP07014 Desprez T, Juraniec M, Crowell EF et al (2007) Organization of cellulose synthase complexes involved in primary cell wall synthesis in Arabidopsis thaliana. Proc Natl Acad Sci 104:15572–15577. https://doi.org/10.1073/pnas.0706569104 Dexheimer PJ, Cochella L (2020) MicroRNAs: from mechanism to organism. Front Cell Dev Biol 8. https://doi.org/10.3389/fcell.2020.00409 DeYoung BJ, Clark SE (2008) BAM receptors regulate stem cell specification and organ development through complex interactions with CLAVATA signaling. Genetics 180:895–904. https://doi.org/10.1534/ genetics.108.091108 Dmitriev AA, Krasnov GS, Rozhmina TA et al (2016a) Glutathione S-transferases and UDPglycosyltransferases are involved in response to aluminum stress in flax. Front Plant Sci 7:1–10. https://doi.org/10.3389/fpls.2016.01920

T. Gorshkova et al. Dmitriev AA, Kudryavtseva AV, Krasnov GS et al (2016b) Gene expression profiling of flax (Linum usitatissimum L.) under edaphic stress. BMC Plant Biol 16:237. https://doi.org/10.1186/s12870-0160927-9 Dmitriev AA, Krasnov GS, Rozhmina TA et al (2017) Differential gene expression in response to Fusarium oxysporum infection in resistant and susceptible genotypes of flax (Linum usitatissimum L.). BMC Plant Biol 17:253. https://doi.org/10.1186/s12870017-1192-2 Doblin MS, Kurek I, Jacob-Wilk D, Delmer DP (2002) Cellulose biosynthesis in plants: from genes to rosettes. Plant Cell Physiol 43:1407–1420. https:// doi.org/10.1093/pcp/pcf164 Du Q, Wang H (2015) The role of HD-ZIP III transcription factors and miR165/166 in vascular development and secondary cell wall formation. Plant Signal Behav 10:e1078955. https://doi.org/10.1080/15592324.2015. 1078955 Dunoyer P, Melnyk C, Molnar A, Keith Slotkin R (2013) Plant mobile small RNAs. Cold Spring Harb Perspect Biol 5. https://doi.org/10.1101/cshperspect.a017897 Edqvist J, Blomqvist K, Nieuwland J, Salminen TA (2018) Plant lipid transfer proteins: are we finally closing in on the roles of these enigmatic proteins? J Lipid Res 59:1374–1382. https://doi.org/10.1194/jlr. R083139 Esau K (1943) Vascular differentiation in the vegetative shoot of Linum. III. The origin of the bast fibers. Am J Bot 30:579–586. https://doi.org/10.2307/2437468 Esau K (1965) Plant anatomy, 2nd edn. Wiley, New York Fang Y, Zheng Y, Lu W et al (2021) Roles of miR319regulated TCPs in plant development and response to abiotic stress. Crop J 9:17–28. https://doi.org/10.1016/ j.cj.2020.07.007 Farquharson KL (2014) POK marks the spot: kinesin-12 proteins are spatial markers of the site transiently occupied by the preprophase band. Plant Cell 26:2284–2284. https://doi.org/10.1105/tpc.114. 128850 Felten J, Vahala J, Love J et al (2018) Ethylene signaling induces gelatinous layers with typical features of tension wood in hybrid aspen. 
New Phytol 218:999– 1014. https://doi.org/10.1111/nph.15078 Fenart S, Ndong YPA, Duarte J et al (2010) Development and validation of a flax (Linum usitatissimum L.) gene expression oligo microarray. BMC Genomics 11:592. https://doi.org/10.1186/1471-2164-11-592 Ferrero LV, Gastaldi V, Ariel FD et al (2021) Class I TCP proteins TCP14 and TCP15 are required for elongation and gene expression responses to auxin. Plant Mol Biol 105:147–159. https://doi.org/10.1007/ s11103-020-01075-y Galindo-González L, Deyholos MK (2016) RNA-seq transcriptome response of flax (Linum usitatissimum L.) to the pathogenic fungus Fusarium oxysporum f. sp. lini. Front Plant Sci 7:1–22. https://doi.org/10. 3389/fpls.2016.01766

8

Key Stages of Flax Bast Fiber Development Through …

Galinousky D, Mokshina N, Padvitski T et al (2020) The toolbox for fiber flax breeding: a pipeline from fene expression to fiber quality. Front Genet 11. https://doi. org/10.3389/fgene.2020.589881 Gille S, de Souza A, Xiong G et al (2011) O-Acetylation of Arabidopsis hemicellulose xyloglucan requires AXY4 or AXY4L, proteins with a TBL and DUF231 domain. Plant Cell 23:4041–4053. https:// doi.org/10.1105/tpc.111.091728 Gorshkov O, Mokshina N, Gorshkov V et al (2017) Transcriptome portrait of cellulose-enriched flax fibres at advanced stage of specialization. Plant Mol Biol 93:431–449. https://doi.org/10.1007/s11103-0160571-7 Gorshkov O, Mokshina N, Ibragimova N et al (2018) Phloem fibres as motors of gravitropic behaviour of flax plants: level of transcriptome. Funct Plant Biol 45. https://doi.org/10.1071/FP16348 Gorshkov O, Chernova T, Mokshina N et al (2019) Intrusive growth of phloem fibers in flax stem: integrated analysis of miRNA and mRNA expression profiles. Plants 8. https://doi.org/10.3390/ plants8020047 Gorshkova T, Morvan C (2006) Secondary cell-wall assembly in flax phloem fibres: role of galactans. Planta 223:149–158. https://doi.org/10.1007/s00425005-0118-7 Gorshkova TA, Sal’nikov VV, Chemikosova SB et al (2003) The snap point: a transition point in Linum usitatissimum bast fiber development. Ind Crops Prod 18:213–221. https://doi.org/10.1016/S0926-6690(03) 00043-8 Gorshkova TA, Chemikosova SB, Sal’nikov VV et al (2004) Occurrence of cell-specific galactan is coinciding with bast fiber developmental transition in flax. Ind Crops Prod 19:217–224. https://doi.org/10.1016/j. indcrop.2003.10.002 Gorshkova T, Ageeva M, Chemikosova S, Salnikov V (2005) Tissue-specific processes during cell wall formation in flax fiber. Plant Biosyst Int J Deal Asp Plant Biol 139:88–92. https://doi.org/10.1080/ 11263500500056070 Gorshkova TA, Gurjanov OP, Mikshina PV et al (2010) Specific type of secondary cell wall formed by plant fibers. Russ J Plant Physiol 57:328–341. 
https://doi. org/10.1134/S1021443710030040 Gorshkova T, Brutch N, Chabbert B et al (2012) Plant fiber formation: State of the art, recent and expected progress, and open questions. CRC Crit Rev Plant Sci 31:201–228. https://doi.org/10.1080/07352689.2011. 616096 Gorshkova T, Mokshina N, Chernova T et al (2015) Aspen tension wood fibers contain b-(1!4)-galactans and acidic arabinogalactans retained by cellulose microfibrils in gelatinous walls. Plant Physiol 169:00690. https://doi.org/10.1104/pp.15.00690 Gorshkova T, Chernova T, Mokshina N et al (2018a) Transcriptome analysis of intrusively growing flax fibers isolated by laser microdissection. Sci Rep 8:14570. https://doi.org/10.1038/s41598-018-32869-2

193

Gorshkova T, Chernova T, Mokshina N et al (2018b) Plant ‘muscles’: fibers with a tertiary cell wall. New Phytol 218:66–72. https://doi.org/10.1111/nph.14997 Goulao LF, Vieira-Silva S, Jackson PA (2011) Association of hemicellulose- and pectin-modifying gene expression with Eucalyptus globulus secondary growth. Plant Physiol Biochem 49:873–881. https:// doi.org/10.1016/j.plaphy.2011.02.020 Griffiths JS, Datla RSS (2019) Genetic potential and gene expression landscape in flax, pp 119–128. https://doi. org/10.1007/978-3-030-23964-0_8 Guerriero G, Behr M, Backes A et al (2017) Bast fibre formation: insights from next-generation sequencing. Proc Eng 200:229–235. https://doi.org/10.1016/j. proeng.2017.07.033 Guo Y, Qiu C, Long S et al (2017a) Digital gene expression profiling of flax (Linum usitatissimum L.) stem peel identifies genes enriched in fiber-bearing phloem tissue. Gene 626:32–40. https://doi.org/10. 1016/j.gene.2017.05.002 Guo C, Xu Y, Shi M et al (2017b) Repression of miR156 by miR159 regulates the timing of the juvenile-toadult transition in Arabidopsis. Plant Cell 29:1293– 1304. https://doi.org/10.1105/tpc.16.00975 Gurjanov O, Gorshkova T, Kabel M et al (2007) MALDITOF MS evidence for the linking of flax bast fibre galactan to rhamnogalacturonan backbone. Carbohydr Polym 67:86–96. https://doi.org/10.1016/j.carbpol. 2006.04.018 Harpaz-Saad S, Western TL, Kieber JJ (2012) The FEI2SOS5 pathway and CELLULOSE SYNTHASE 5 are required for cellulose biosynthesis in the Arabidopsis seed coat and affect pectin mucilage structure. Plant Signal Behav 7:285–288. https://doi.org/10.4161/psb. 18819 Hausser J, Zavolan M (2014) Identification and consequences of miRNA–target interactions—beyond repression of gene expression. Nat Rev Genet 15:599–612. https://doi.org/10.1038/nrg3765 Hayashi T, Kaida R, Kaku T, Baba K (2010) Loosening xyloglucan prevents tensile stress in tree stem bending but accelerates the enzymatic degradation of cellulose. 
Russ J Plant Physiol 57:316–320. https://doi.org/10. 1134/S1021443710030027 Herrmann A, Livanos P, Lipka E et al (2018) Dual localized kinesin-12 POK2 plays multiple roles during cell division and interacts with MAP65-3. EMBO Rep 19:1–16. https://doi.org/10.15252/embr.201846085 Hobson NR (2013) b-galactosidases and fasciclin-like arabinogalactan proteins in flax (Linum usitatissimum) phloem fibre development. University of Alberta Hobson N, Deyholos MK (2013) Genomic and expression analysis of the flax (Linum usitatissimum) family of glycosyl hydrolase 35 genes. BMC Genomics 14:344. https://doi.org/10.1186/1471-2164-14-344 Huang S, Zhou J, Gao L, Tang Y (2021) Plant miR397 and its functions. Funct Plant Biol 48:361. https://doi. org/10.1071/FP20342 Ibragimova N, Mokshina N, Ageeva M et al (2020) Rearrangement of the cellulose-enriched cell wall in

194 flax phloem fibers over the course of the gravitropic reaction. Int J Mol Sci 21:5322. https://doi.org/10. 3390/ijms21155322 Ingolia NT, Lareau LF, Weissman JS et al (2009) Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell 147:789–802. https://doi.org/10.1016/ j.cell.2011.10.002 Johnson DS, Mortazavi A, Myers RM et al (2007) Genome-wide mapping of in vivo protein-DNA interactions. Science 316:1497–1502. https://doi.org/ 10.1126/science.1141319 Joubès J, Chevalier C, Dudits D et al (2000) CDK-related protein kinases in plants. Plant Mol Biol 43:607–620. https://doi.org/10.1023/a:1006470301554 Kan C-C, Chung T-Y, Juo Y-A, Hsieh M-H (2015) Glutamine rapidly induces the expression of key transcription factor genes involved in nitrogen and stress responses in rice roots. BMC Genomics 16:731. https://doi.org/10.1186/s12864-015-1892-7 Kapoor P, Shen X (2014) Mechanisms of nuclear actin in chromatin-remodeling complexes. Trends Cell Biol 24:238–246. https://doi.org/10.1016/j.tcb.2013.10.007 Kim JH (2019) Biological roles and an evolutionary sketch of the GRF-GIF transcriptional complex in plants. BMB Rep 52:227–238. https://doi.org/10. 5483/BMBRep.2019.52.4.051 Kim J, Jung J, Reyes JL et al (2005) microRNA-directed cleavage of ATHB15 mRNA regulates vascular development in Arabidopsis inflorescence stems. Plant J 42:84–94. https://doi.org/10.1111/j.1365313X.2005.02354.x Knauer S, Holt AL, Rubio-Somoza I et al (2013) A protodermal miR394 signal defines a region of stem cell competence in the Arabidopsis shoot meristem. Dev Cell 24:125–132. https://doi.org/10.1016/j. devcel.2012.12.009 Kohorn BD, Kobayashi M, Johansen S et al (2006) An Arabidopsis cell wall-associated kinase required for invertase activity and cell growth. Plant J 46:307–316. https://doi.org/10.1111/j.1365-313X.2006.02695.x Kozomara A, Birgaoanu M, Griffiths-Jones S (2018) miRBase: from microRNA sequences to function. 
Nucleic Acids Res 47:155–162. https://doi.org/10. 1093/nar/gky1141 Krasnov GS, Dmitriev AA, Zyablitsin AV et al (2019) Aluminum responsive genes in flax (Linum usitatissimum L.). Biomed Res Int 2019. https://doi.org/10. 1155/2019/5023125 Kryuchkova-Mostacci N, Robinson-Rechavi M (2016) A benchmark of gene expression tissue-specificity metrics. Brief Bioinform 18:bbw008. https://doi.org/10. 1093/bib/bbw008 Kumar A, Gautam V, Kumar P et al (2019) Identification and co-evolution pattern of stem cell regulator miR394s and their targets among diverse plant species. BMC Evol Biol 19:55. https://doi.org/10. 1186/s12862-019-1382-7 Lafarguette F, Leplé J, Déjardin A et al (2004) Poplar genes encoding fasciclin-like arabinogalactan proteins

T. Gorshkova et al. are highly expressed in tension wood. New Phytol 164:107–121. https://doi.org/10.1111/j.1469-8137. 2004.01175.x Lau JM, McNeil M, Darvill AG, Albersheim P (1985) Structure of the backbone of rhamnogalacturonan I, a pectic polysaccharide in the primary cell walls of plants. Carbohydr Res 137:111–125. https://doi.org/ 10.1016/0008-6215(85)85153-3 Lee BH, Ko J-H, Lee S et al (2009) The Arabidopsis GRF-INTERACTING FACTOR gene family performs an overlapping function in determining organ size as well as multiple developmental properties. Plant Physiol 151:655–668. https://doi.org/10.1104/ pp.109.141838 Lewis BP, Burge CB, Bartel DP (2005) Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120:15–20. https://doi.org/10.1016/j.cell.2004. 12.035 Li J, Xu Y, Chong K (2012) The novel functions of kinesin motor proteins in plants. Protoplasma 249:95– 100. https://doi.org/10.1007/s00709-011-0357-3 Li S-B, Xie Z-Z, Hu C-G, Zhang J-Z (2016) A review of auxin response factors (ARFs) in plants. Front Plant Sci 7:1–7. https://doi.org/10.3389/fpls.2016.00047 Li Y, Li X, Yang J, He Y (2020) Natural antisense transcripts of MIR398 genes suppress microR398 processing and attenuate plant thermotolerance. Nat Commun 11:1–13. https://doi.org/10.1038/s41467020-19186-x Li S, Wang X, Xu W et al (2021) Unidirectional movement of small RNAs from shoots to roots in interspecific heterografts. Nat Plants 7:50–59. https:// doi.org/10.1038/s41477-020-00829-2 Liebsch D, Palatnik JF (2020) MicroRNA miR396, GRF transcription factors and GIF co-regulators: a conserved plant growth regulatory module with potential for breeding and biotechnology. Curr Opin Plant Biol 53:31–42. 
https://doi.org/10.1016/j.pbi.2019.09.008 Lipka E, Gadeyne A, Stöckle D et al (2014) The Phragmoplast-orienting kinesin-12 class proteins translate the positional information of the preprophase band to establish the cortical division zone in Arabidopsis thaliana. Plant Cell 26:2617–2632. https://doi.org/10.1105/tpc.114.124933 Long S-H, Deng X, Wang Y-F et al (2012) Analysis of 2,297 expressed sequence tags (ESTs) from a cDNA library of flax (Linum usitatissimum L.) bark tissue. Mol Biol Rep 39:6289–6296. https://doi.org/10.1007/ s11033-012-1450-1 Love MI, Huber W, Anders S (2014) Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15:550. https://doi.org/ 10.1186/s13059-014-0550-8 Ludwig-Müller J (2011) Auxin conjugates: their role for plant development and in the evolution of land plants. J Exp Bot 62:1757–1773. https://doi.org/10.1093/jxb/ erq412 Luo X, Bai X, Sun X et al (2013) Expression of wild soybean WRKY20 in Arabidopsis enhances drought

8

Key Stages of Flax Bast Fiber Development Through …

tolerance and regulates ABA signalling. J Exp Bot 64:2155–2169. https://doi.org/10.1093/jxb/ert073 MacMillan CP, Mansfield SD, Stachurski ZH et al (2010) Fasciclin-like arabinogalactan proteins: specialization for stem biomechanics and cell wall architecture in Arabidopsis and eucalyptus. Plant J 62:689–703. https://doi.org/10.1111/j.1365-313X.2010.04181.x Madson M, Dunand C, Li X et al (2003) The MUR3 gene of Arabidopsis encodes a xyloglucan galactosyltransferase that is evolutionarily related to animal exostosins. Plant Cell 15:1662–1670. https://doi.org/10. 1105/tpc.009837 Matsumoto-Kitano M, Kusumoto T, Tarkowski P et al (2008) Cytokinins are central regulators of cambial activity. Proc Natl Acad Sci 105:20027–20031. https://doi.org/10.1073/pnas.0805619105 McNeil M, Darvill AG, Albersheim P (1980) Structure of plant cell walls. Plant Physiol 66:1128–1134. https:// doi.org/10.1104/pp.66.6.1128 Mellerowicz EJ, Baucher M, Sundberg B, Boerjan W (2001) Unravelling cell wall formation in the woody dicot stem. Plant Mol Biol 47:239–274. https://doi. org/10.1023/A:1010699919325 Mellerowicz EJ, Immerzeel P, Hayashi T (2008) Xyloglucan: the molecular muscle of trees. Ann Bot 102:659– 665. https://doi.org/10.1093/aob/mcn170 Melnyk CW, Molnar A, Baulcombe DC (2011) Intercellular and systemic movement of RNA silencing signals. EMBO J 30:3553–3563. https://doi.org/10. 1038/emboj.2011.274 Mikshina P, Chernova T, Chemikosova S et al (2013) Cellulosic fibers: role of matrix polysaccharides in structure and function. In: Cellulose—fundamental aspects. InTech Mokshina NE, Ibragimova NN, Salnikov VV et al (2012) Galactosidase of plant fibers with gelatinous cell wall: identification and localization. Russ J Plant Physiol 59:246–254. https://doi.org/10.1134/ S1021443712020082 Mokshina N, Gorshkova T, Deyholos MK (2014) Chitinase-like (CTL) and cellulose synthase (CESA) gene expression in gelatinous-type cellulosic walls of flax (Linum usitatissimum L.) bast fibers. 
PLoS ONE 9:e97949. https://doi.org/10.1371/journal.pone. 0097949 Mokshina N, Gorshkov O, Ibragimova N et al (2017) Cellulosic fibres of flax recruit both primary and secondary cell wall cellulose synthases during deposition of thick tertiary cell walls and in the course of graviresponse. Funct Plant Biol 44. https://doi.org/10. 1071/FP17105 Mokshina N, Chernova T, Galinousky D et al (2018) Key stages of fiber development as determinants of bast fiber yield and quality. Fibers 6:20. https://doi.org/10. 3390/fib6020020 Mokshina N, Makshakova O, Nazipova A et al (2019) Flax rhamnogalacturonan lyases: phylogeny, differential expression and modeling of protein structure. Physiol Plant 167:173–187. https://doi.org/10.1111/ ppl.12880

195

Mokshina N, Gorshkov O, Galinousky D, Gorshkova T (2020) Genes with bast fiber-specific expression in flax plants—molecular keys for targeted fiber crop improvement. Ind Crops Prod 152. https://doi.org/10. 1016/j.indcrop.2020.112549 Mokshina N, Gorshkov O, Takasaki H et al (2021) FIBexDB: a new online transcriptome platform to analyze development of plant cellulosic fibers. New Phytol 231:512–515. https://doi.org/10.1111/nph. 17405 Monné M, Daddabbo L, Gagneul D et al (2018) Uncoupling proteins 1 and 2 (UCP1 and UCP2) from Arabidopsis thaliana are mitochondrial transporters of aspartate, glutamate, and dicarboxylates. J Biol Chem 293:4213–4227. https://doi.org/10.1074/jbc.RA117. 000771 Müller S, Livanos P (2019) Plant kinesin-12: localization heterogeneity and functional implications. Int J Mol Sci 20:4213. https://doi.org/10.3390/ijms20174213 Mullokandov G, Baccarini A, Ruzo A et al (2012) Highthroughput assessment of microRNA activity and function using microRNA sensor and decoy libraries. Nat Methods 9:840–846. https://doi.org/10.1038/ nmeth.2078 Nakano T, Suzuki K, Fujimura T, Shinshi H (2006) Genome-wide analysis of the ERFgene family in Arabidopsis and frice. Plant Physiol 140:411–432. https://doi.org/10.1104/pp.105.073783 Nebenführ A, Dixit R (2018) Kinesins and myosins: molecular motors that coordinate cellular functions in plants. Annu Rev Plant Biol 69:329–361. https://doi. org/10.1146/annurev-arplant-042817-040024 Nishikubo N, Awano T, Banasiak A et al (2007) Xyloglucan endo-transglycosylase (XET) functions in gelatinous layers of rension wood fibers in poplar— a glimpse into the mechanism of the balancing act of trees. Plant Cell Physiol 48:843–855. https://doi.org/ 10.1093/pcp/pcm055 Nyayanit D, Gadgil CJ (2015) Mathematical modeling of combinatorial regulation suggests that apparent positive regulation of targets by miRNA could be an artifact resulting from competition for mRNA. RNA 21:307–319. 
https://doi.org/10.1261/rna.046862.114 O’Brien J, Hayder H, Zayed Y, Peng C (2018) Overview of microRNA biogenesis, mechanisms of actions, and circulation. Front Endocrinol (lausanne) 9:1–12. https://doi.org/10.3389/fendo.2018.00402 O’Malley RC, Huang SC, Song L et al (2016) Cistrome and epicistrome features shape the regulatory DNA landscape. Cell 165:1280–1292. https://doi.org/10. 1016/j.cell.2016.04.038 Ong SS, Wickneswari R (2012) Characterization of microRNAs expressed during secondary wall biosynthesis in Acacia mangium. PLoS ONE 7:e49662. https://doi.org/10.1371/journal.pone.0049662 Orford SJ, Timmis JN (2000) Expression of a lipid transfer protein gene family during cotton fibre development. Biochim Biophys Acta Mol Cell Biol Lipids 1483:275–284. https://doi.org/10.1016/S13881981(99)00194-8

196 Paux E, Carocha V, Marques C et al (2005) Transcript profiling of Eucalyptus xylem genes during tension wood formation. New Phytol 167:89–100. https://doi. org/10.1111/j.1469-8137.2005.01396.x Perico C, Gao H, Heesom KJ et al (2021) Arabidopsis thaliana myosin XIK is recruited to the Golgi through interaction with a MyoB receptor. Commun Biol 4:1182. https://doi.org/10.1038/s42003-021-02700-2 Persson S, Wei H, Milne J et al (2005) Identification of genes required for cellulose synthesis by regression analysis of public microarray data sets. Proc Natl Acad Sci 102:8633–8638. https://doi.org/10.1073/pnas. 0503392102 Persson S, Paredez A, Carroll A et al (2007) Genetic evidence for three unique components in primary cellwall cellulose synthase complexes in Arabidopsis. Proc Natl Acad Sci 104:15566–15571. https://doi.org/ 10.1073/pnas.0706592104 Petrova A, Kozlova L, Gorshkov O et al (2021a) Cell wall layer induced in xylem fibers of flax upon gravistimulation iss similar to constitutively formed cell walls of bast fibers. Front Plant Sci 12:1–14. https://doi.org/ 10.3389/fpls.2021.660375 Petrova N, Nazipova A, Gorshkov O et al (2021b) Gene expression patterns for proteins with lectin domains in flax stem tissues are related to deposition of distinct cell wall types. Front Plant Sci 12:634594. https://doi. org/10.3389/fpls.2021.634594 Pilate G, Déjardin A, Laurans F, Leplé JC (2004) Tension wood as a model for functional genomics of wood formation. New Phytol 164:63–72. https://doi.org/10. 1111/j.1469-8137.2004.01176.x Pineau E, Sauveplane V, Grienenberger E et al (2021) CYP77B1 a fatty acid epoxygenase specific to flowering plants. Plant Sci 307:110905. https://doi.org/10. 1016/j.plantsci.2021.110905 Pollard TD, Cooper JA (2009) Actin, a central player in cell shape and movement. Science (80–) 326:1208– 1212. 
https://doi.org/10.1126/science.1175862 Pratelli R, Voll LM, Horst RJ et al (2010) Stimulation of nonselective amino acid export by glutamine dumper proteins. Plant Physiol 152:762–773. https://doi.org/ 10.1104/pp.109.151746 Prigge MJ, Otsuga D, Alonso JM et al (2005) Class III homeodomain-leucine zipper gene family members have overlapping, antagonistic, and distinct roles in Arabidopsis development. Plant Cell 17:61–76. https://doi.org/10.1105/tpc.104.026161 Qi T, Wang J, Huang H et al (2015) Regulation of jasmonate-induced leaf senescence by antagonism between bHLH subgroup IIIe and IIId factors in Arabidopsis. Plant Cell 27:1634–1649. https://doi.org/ 10.1105/tpc.15.00110 Ranocha P, Dima O, Nagy R et al (2013) Arabidopsis WAT1 is a vacuolar auxin transport facilitator required for auxin homoeostasis. Nat Commun 4:2625. https://doi.org/10.1038/ncomms3625 Richmond TA, Somerville CR (2000) The cellulose synthase superfamily. Plant Physiol 124:495–498. https://doi.org/10.1104/pp.124.2.495

T. Gorshkova et al. Ridley BL, O’Neill MA, Mohnen D (2001) Pectins: structure, biosynthesis, and oligogalacturonide-related signaling. Phytochemistry 57:929–967. https://doi.org/ 10.1016/S0031-9422(01)00113-3 Rihouey C, Paynel F, Gorshkova T, Morvan C (2017) Flax fibers: assessing the non-cellulosic polysaccharides and an approach to supramolecular design of the cell wall. Cellulose 24:1985–2001. https://doi.org/10. 1007/s10570-017-1246-5 Roach MJ, Deyholos MK (2007) Microarray analysis of flax (Linum usitatissimum L.) stems identifies transcripts enriched in fibre-bearing phloem tissues. Mol Genet Genomics 278:149–165. https://doi.org/10. 1007/s00438-007-0241-1 Roach MJ, Deyholos MK (2008) Microarray analysis of developing flax hypocotyls identifies novel transcripts correlated with specific stages of phloem fibre differentiation. Ann Bot 102:317–330. https://doi.org/10. 1093/aob/mcn110 Roach MJ, Mokshina NY, Badhan A et al (2011) Development of cellulosic secondary walls in flax fibers requires b-galactosidase. Plant Physiol 156:1351–1363. https://doi.org/10.1104/pp.111. 172676 Rodriguez RE, Mecchia MA, Debernardi JM et al (2010) Control of cell proliferation in Arabidopsis thaliana by microRNA miR396. Development 137:103–112. https://doi.org/10.1242/dev.043067 Ruta V, Longo C, Lepri A et al (2020) The DOF transcription factors in seed and seedling development. Plants 9:218. https://doi.org/10.3390/ plants9020218 Saelim L, Akiyoshi N, Tan TT et al (2018) Arabidopsis group IIId ERF proteins positively regulate primary cell wall-type CESA genes. J Plant Res 132:117–129. https://doi.org/10.1007/s10265-018-1074-1 Sakamoto S, Somssich M, Nakata MT et al (2018) Complete substitution of a secondary cell wall with a primary cell wall in Arabidopsis. Nat Plants 4:777– 783. https://doi.org/10.1038/s41477-018-0260-4 Sanchez J-P, Chua N-H (2001) Arabidopsis PLC1 is required for secondary responses to abscisic acid signals. Plant Cell 13:1143–1154. https://doi.org/10. 
1105/tpc.13.5.1143 Schreiber N, Gierlinger N, Pütz N et al (2010) G-fibres in storage roots of Trifolium pratense (Fabaceae): tensile stress generators for contraction. Plant J 61:854–861. https://doi.org/10.1111/j.1365-313X.2009.04115.x Seifert G (2018) Fascinating fasciclins: a surprisingly widespread family of proteins that mediate interactions between the cell exterior and the cell surface. Int J Mol Sci 19:1628. https://doi.org/10.3390/ ijms19061628 Seifert GJ (2021) The FLA4-FEI Pathway: a unique and mysterious signaling module related to cell wall structure and stress signaling. Genes (Basel) 12:145. https://doi.org/10.3390/genes12020145 Sénéchal F, Graff L, Surcouf O et al (2014) Arabidopsis PECTIN METHYLESTERASE17 is co-expressed with and processed by SBT3.5, a subtilisin-like serine

8

Key Stages of Flax Bast Fiber Development Through …

protease. Ann Bot 114:1161–1175. https://doi.org/10. 1093/aob/mcu035 Seo PJ, Park JM, Kang SK et al (2011) An Arabidopsis senescence-associated protein SAG29 regulates cell viability under high salinity. Planta 233:189–200. https://doi.org/10.1007/s00425-010-1293-8 Shahbaz M, Pilon M (2019) Conserved Cu-microRNAs in Arabidopsis thaliana function in copper economy under deficiency. Plants 8:141. https://doi.org/10. 3390/plants8060141 Snegireva AV, Ageeva MV, Amenitskii SI et al (2010) Intrusive growth of sclerenchyma fibers. Russ J Plant Physiol 57:342–355. https://doi.org/10.1134/ S1021443710030052 Song Z, Zhang L, Wang Y et al (2018) Constitutive expression of mir408 improves biomass and seed yield in arabidopsis. Front Plant Sci 8:1–14. https://doi.org/ 10.3389/fpls.2017.02114 Stanga JP, Smith SM, Briggs WR, Nelson DC (2013) Suppressor of more axillary growth2 1 controls seed germination and seedling development in Arabidopsis. Plant Physiol 163:318–330. https://doi.org/10.1104/ pp.113.221259 Stefanowicz K, Lannoo N, Van Damme EJM (2015) Plant F-box proteins—Judges between life and death. CRC Crit Rev Plant Sci 34:523–552. https://doi.org/10. 1080/07352689.2015.1024566 Stranne M, Ren Y, Fimognari L et al (2018) TBL 10 is required for O-acetylation of pectic rhamnogalacturonan-I in Arabidopsis thaliana. Plant J 96:772–785. https://doi.org/10.1111/tpj.14067 Strompen G, El Kasmi F, Richter S et al (2002) The Arabidopsis HINKEL gene encodes a kinesin-related protein involved in cytokinesis and is expressed in a cell cycle-dependent manner. Curr Biol 12:153–158. https://doi.org/10.1016/S0960-9822(01)00655-8 Su HG, Zhang XH, Wang TT et al (2020) Genome-wide identification, evolution, and expression of GDSLtype esterase/lipase gene family in soybean. Front Plant Sci 11. https://doi.org/10.3389/fpls.2020.00726 Suetsugu N, Yamada N, Kagawa T et al (2010) Two kinesin-like proteins mediate actin-based chloroplast movement in Arabidopsis thaliana. 

9 Metabolomics and Transcriptomics-Based Tools for Linseed Improvement

Ashok Somalraju and Bourlaye Fofana

9.1 Introduction

Metabolomics is the profiling and quantitation of metabolites, small molecules of 50–2000 Da that are commonly found as substrates and products of cellular metabolism (Johnson et al. 2016; Misra 2018). Metabolites regulate cellular functions and homeostasis and are components of the building blocks of tissues and organs (Morello et al. 2020; Campbell et al. 2020). In the continuum of genomics, transcriptomics, and proteomics, metabolomics involves a wide range of technical, methodological, and data analysis approaches for uncovering these small molecules (Schrimpe-Rutledge et al. 2016). In plants, a myriad of metabolites are produced in response to biotic and abiotic stresses; they not only ensure the plant’s survival in its environment but are also a source of valuable carbohydrates for human and animal nutrition (Forbey et al. 2009; Caretto et al. 2015; van Vliet et al. 2021).

Cultivated flax (Linum usitatissimum L. ssp. usitatissimum) is an annual self-pollinated crop belonging to the Linaceae family. As a multi-purpose crop, flax has been grown since ancient times for its fibres (fibre flax [Linum usitatissimum L. convar. elongatum (Vav. and Ell.)]) or for its oil and other valuable seed by-products (linseed or oilseed flax [convar. mediterraneum (Vav. and Ell.) Kulpa and Danert]) (Zohary 1999; Fofana et al. 2010a). It has been granted ‘generally recognized as safe’ (GRAS) status by the US-FDA (2009). Although extensive literature exists on the health-promoting virtues of flax (Cunnane et al. 1993; Ogborn et al. 2002; Patel et al. 2012; Goyal et al. 2014; Keykhasalar et al. 2020) and on each major metabolite component such as lipids (Goyal et al. 2014; Sharif et al. 2017), proteins (Barvkar et al. 2012; Mattila et al. 2018), carbohydrates (Mattila et al. 2018; Ndou et al. 2018), and lignans (Westcott and Muir 1996; Goyal et al. 2014; Patel et al. 2012; Fofana et al. 2017a), no systematic review has yet addressed them all in the same study, nor is there an update on the methods and tools used for generating their metabolite profiles. Moreover, the carbon and metabolic flux between the pathways leading to the production and regulation of these metabolites in planta has not yet been fully reviewed. A thorough review of the major flax metabolites, advances in their analysis, and the interplay of metabolic flux between the pathways involved in their production and regulation will provide more insight into the development of strategies for tailoring the next generation of flax metabolomes to meet emerging societal needs. This chapter reviews and discusses the current knowledge of flax lipids, proteins, starch, and lignans, while not dismissing the importance of other metabolite classes. The tools used to determine the flax metabolite composition and the current knowledge on the genetic control, regulation, biosynthesis, and interplay between metabolic pathways are also discussed.

A. Somalraju, B. Fofana: Charlottetown Research and Development Centre, Agriculture and Agri-Food Canada, 440 University Avenue, Charlottetown, PE C1A 4N6, Canada

© His Majesty the King in Right of Canada, as represented by the Minister of Agriculture and Agri-Food 2023 F. M. You and B. Fofana (eds.), The Flax Genome, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-16061-5_9

9.2 Flax Metabolites and Metabolomics

Each plant species is defined by its genetics, making it phenotypically different from others, as reflected by its metabolome (Harrigan et al. 2007; Ellis et al. 2018; Pontarin et al. 2020). Flax is not an exception, and different flax cultivars or breeding lines display a wide range of primary and secondary metabolite diversity (Goyal et al. 2014; Hamade et al. 2021).

9.2.1 Primary Metabolites

Primary metabolites are derived from primary metabolism, including photosynthesis and the biosynthesis of nucleotides, fatty acids, and proteins, and are essential for the plant’s survival and reproduction (Scheible et al. 2000; Wang and Wang 2011; Stitt 2013; Pontarin et al. 2020). As in most higher plants, flax primary metabolites include lipids, proteins, starch, cellulose, and lignin, among others.

9.2.1.1 Lipids

Lipids are a diverse group of organic compounds that includes fatty acids (FAs) and their derivatives, cholesterol and its derivatives, and lipoproteins, all of which are reported to be biologically, biosynthetically, and functionally related and are insoluble in water but soluble in organic solvents (Thole and Nielsen 2008; Fofana et al. 2010a; Thompson 2020; https://www.britannica.com/science/lipid). Lipids are a major form of carbon storage in seeds and a major component of cell membranes, and the flax lipid classes have already been reviewed (Fofana et al. 2010a). Lipids are found in flax stems in the form of wax and cutin (Morrison and Akin 2001; Guttierrez and Del Rio 2003); in leaves in the form of neutral lipids, galactolipids, phospholipids, and complex oxylipins (Chechetkin et al. 2008, 2009); in flowers, where palmitic, stearic, oleic, linoleic, and alpha-linolenic (ALA) acids account for 22.5, 3.4, 3.3, 35.8, and 35.0% of the total FA composition of the ovary, respectively (Fofana et al. 2004); and in seeds, where ALA accounts for 50–60% (Green 1986a; Siemens and Daun 2005; Fofana et al. 2006; El-Beltagi et al. 2007; Sebei et al. 2007; Barvkar et al. 2012; Sharif et al. 2017). Flax seed flour contains 9% lipids (Trevisan and Arêas 2012), and the seed storage lipids are of particular importance since they constitute the primary source of edible flax oils, the quality of which is highly associated with the fatty acid composition (Green 1986a; Fofana et al. 2010a). These storage lipids accumulate as triacylglycerols (TAGs) in lipid bodies (Sorensen et al. 2005). In plants, it has been shown that the more of a given FA is produced, the more of it is incorporated into TAGs and found in the oil (Sorensen et al. 2005). Hence, designer oilseed flax varieties have been developed to meet such requirements (Green and Marshall 1981; Green 1986b; Ntiamoah et al. 1995).

9.2.1.2 Proteins

Flax seed protein content ranges from 20 to 30% of dry seed weight (Daun et al. 2003; Kaushik et al. 2016; Mattila et al. 2018; Trevisan and Arêas 2012), of which 23% represent storage proteins including 2S albumin (conlinin), 11S globulin (legumin, glutenin type A, and cupin), and 7S globulin, highlighting the diversity of the flax seed storage proteins (Barvkar et al. 2012). The amino acid profile of flax is excellent, and its essential amino acid content is above the recommended level of 36 g/16 g N in egg (Mattila et al. 2018). In their proteomics studies, Barvkar et al. (2012) identified a total of 1716 proteins in the flax seed. This protein profile, along with the essential amino acid content of flaxseed (Giacomino et al. 2013; Ndou et al. 2018), highlights the high nutritional value of flaxseed in addition to its oil content (Kaushik et al. 2016).


9.2.1.3 Starch

Compared with rapeseed, quinoa, buckwheat, and faba bean, flaxseed meal has a lower carbohydrate content but a higher energy content per 100 g DW (Mattila et al. 2018). It has a low starch content (Ndou et al. 2018) but is rich in non-digestible fibres (Trevisan and Arêas 2012; Ndou et al. 2018). Hence, the energy value of flax oilseed meal has usually been attributed to its residual oil content (Ndou et al. 2018). Flaxseed meal is thus incorporated into starch-rich meals such as corn and soybean for feed and food formulations (Trevisan and Arêas 2012; Giacomino et al. 2013; Ndou et al. 2018).

9.2.1.4 Cellulose and Lignin

The cell walls of plant tissues are mainly made of cellulose, hemicellulose, pectic compounds, lignin, and dietary and non-dietary fibres (Augustin et al. 2020). Flax fibre is characterized as lignocellulosic because it contains about 70% cellulose, ~10% hemicellulose, pectin, and 3–5% lignin, with lignin providing rigidity and mechanical resistance to the fibre and the plant (Wróbel-Kwiatkowska et al. 2007; Preisner et al. 2014; Morin et al. 2020). In contrast, cotton fibre is characterized as non-lignocellulosic, with no lignin, and is less biologically active than flax fibre. Indeed, flax fibre has the highest liquid absorption capacity among natural fibres (Preisner et al. 2014). The activity of fungal and bacterial enzymes on flax stems during the retting process causes active degradation and yields clean fibres in a marketable form (Wróbel-Kwiatkowska et al. 2007). The presence of lignin alongside cellulose in flax fibre negatively affects its elastic properties compared with cotton fibre. Hence, attempts have been made to produce low-lignin flax plants with modified fibre elastic properties (Wróbel-Kwiatkowska et al. 2007; Chantreau et al. 2013). Typically, cellulose accounts for 60–70% of the flax fibre weight (Wróbel-Kwiatkowska et al. 2007), and transgenic studies using CAD gene silencing to reduce the lignin content of flax fibre did not show any significant changes in the cellulose content or in its chemical and physical properties (Table 9.1). In contrast, a 40% reduction of the lignin to cellulose ratio was reported in the CAD27 transgenic flax plants, but it resulted in higher plant susceptibility to the fungal pathogen Fusarium oxysporum (Wróbel-Kwiatkowska et al. 2007).

9.2.2 Secondary Metabolites

Flax plants produce a myriad of secondary metabolites, including kaempferol, quercetin, anthocyanin, flavanone, coumaric acid, ferulic acid, caffeic acid, herbacetin, secoisolariciresinol diglucoside, and cyanogenic glucosides (Wanasundara et al. 1993; Fofana et al. 2010b, 2017a). This review discusses only lignans and cyanogenic glucosides further, given their high and direct importance to human and animal health.

9.2.2.1 Lignans

Lignans are a class of diphenolic nonsteroidal phytoestrogens that are often found glycosylated in planta and are purported to have a wide variety of health benefits (Dixon 2004; Thompson et al. 2006; Arroo et al. 2009; Prakash and Gupta 2011; Wang et al. 2015). Flax seeds are a rich source of secoisolariciresinol diglucoside (SDG) lignans (Westcott and Muir 1996; Patel et al. 2012; Goyal et al. 2014; Fang et al. 2016). In the flax seed coat, SDG lignan units are usually ester-linked with herbacetin diglucoside (HDG) through 3-hydroxy-3-methylglutaric acid (HMGA), whereas p-coumaric acid glucoside (CouAG), ferulic acid glucoside (FeAG), and caffeic acid glucoside units are directly ester-linked to SDG within an oligomeric matrix called the lignan macromolecule (Struijs et al. 2009; Kosińska et al. 2011; Dalisay et al. 2015; Ramsay et al. 2017; Fofana et al. 2017a; Thiombiano et al. 2020; Hamade et al. 2021). SDG contributes 62% (w/w) to the lignan macromolecule, while CouAG, FeAG, and HDG contribute 12.2, 9.0, and 5.7%, respectively (Struijs et al. 2009). The highest amount of SDG is found in mature seed, and variations in SDG content are observed between varieties (Fig. 9.1). An interesting study by Dalisay et al. (2015)

Table 9.1 Comparison of cellulose and lignin content in wild type and transgenic CAD gene silenced flax plants

Flax lines    Cellulose    Lignin    References
Wild type     0.2          0.26      Wróbel-Kwiatkowska et al. (2007)
CAD27         0.2          0.16      Wróbel-Kwiatkowska et al. (2007)
CAD33         0.21         0.21      Wróbel-Kwiatkowska et al. (2007)
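The reduction in the lignin to cellulose ratio quoted in the text can be checked against the Table 9.1 values. The sketch below is only an illustration: it assumes that both columns are expressed in the same (unstated) units, and the function names are hypothetical. Under that assumption the CAD27 line shows a reduction of roughly 38%, which is in line with the approximately 40% reported.

```python
# Values copied from Table 9.1 (Wróbel-Kwiatkowska et al. 2007).
# Assumption: cellulose and lignin are reported in the same units.
table_9_1 = {
    "Wild type": {"cellulose": 0.20, "lignin": 0.26},
    "CAD27": {"cellulose": 0.20, "lignin": 0.16},
    "CAD33": {"cellulose": 0.21, "lignin": 0.21},
}


def lignin_to_cellulose(line):
    """Return the lignin to cellulose ratio for one flax line."""
    row = table_9_1[line]
    return row["lignin"] / row["cellulose"]


def percent_reduction(line, reference="Wild type"):
    """Percent drop in the ratio of a line relative to the reference line."""
    ref = lignin_to_cellulose(reference)
    return 100.0 * (ref - lignin_to_cellulose(line)) / ref


for name in table_9_1:
    print(f"{name}: ratio = {lignin_to_cellulose(name):.2f}, "
          f"reduction = {percent_reduction(name):.1f}%")
```

Because the cellulose values are nearly identical across the three lines, the change in the ratio is driven almost entirely by the lignin column.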

Fig. 9.1 Variation of flax SDG lignan content during seed development in two flax cultivars AC McDuff and SP2047. Lignans from developing seeds [0–60 days after anthesis (DAA)] were extracted for UPLC-MS analysis. a UPLC chromatogram showing steady SDG increase from 0 days after anthesis to maturity in AC McDuff; b comparative progression of SDG lignan production in flax cultivars AC McDuff and SP2047 during seed development

described dirigent protein-mediated lignan formation in flax seed using MALDI mass spectrometry imaging and found that SDG and SDG-HMG are mainly detectable at relatively early stages of flax boll development (6 and 7 DAF). The study also provided insight into the varying distribution of SDG in the seed coat region, which seems to corroborate the data shown in Fig. 9.1. Being embedded within the lignan macromolecule, SDG moieties are released by alkaline hydrolysis in vitro (Fofana et al. 2017a) and by digestion and microbial metabolism in the colon (Clavel et al. 2005, 2006a, b), whereas fragmented macromolecule polyesters of different sizes and masses are released by partial hydrolysis (Thiombiano et al. 2020). After its release, SDG is deglucosylated into SECO, which is further demethylated into the enterolignans enterodiol (END) and enterolactone (ENL), the two most bioactive lignan forms in the body (Patel et al. 2012; Landete et al. 2016). Along with SDG, other minor lignan forms including pinoresinol, lariciresinol, isolariciresinol, and matairesinol are found in the lignan macromolecule hydrolysates (Ramsay et al. 2017). Because of their pharmacological activities, SDG lignans have been the focus of much attention (Patel et al. 2012; Fabian et al. 2020).

9.2.2.2 Cyanogenic Glucoside

Cyanogenic glucosides (CGs) are widely distributed in the plant kingdom and are found in more than 2600 plant species, including flax (Oomah et al. 1992; Wanasundara et al. 1993; Kobaisy et al. 1996; Cressey and Reeve 2019; Zuk et al. 2020). Relatively high concentrations of CGs are found in certain grasses, pulses, root crops, and fruit kernels. Some cassava varieties are reported to produce more than 1 g HCN/kg of fresh tissue (Haque and Bradbury 2002). In flax, the major CGs are the cyanogenic monoglucosides linamarin and lotaustralin and the cyanogenic diglucosides linustatin and neolinustatin (Dalisay et al. 2015). The monoglucosides are found in higher amounts in vegetative tissues and in developing bolls and seeds, whereas the diglucosides are higher in mature seeds (Zuk et al. 2020), with linustatin being predominant in the seed. The levels of linustatin and neolinustatin in dry flaxseed decrease drastically during germination and the vegetative stages of plant development. In contrast, the levels of the cyanogenic monoglucosides linamarin and lotaustralin increase steadily in a biphasic manner during the seedling growth and flowering stages (Zuk et al. 2020). Hence, CGs are produced and accumulated in different flax organs including stems, leaves, roots, flowers, and seeds (Frehner et al. 1990; Niedzwiedz-Siegień 1998). Using high-resolution MALDI-MS imaging, Dalisay et al. (2015) showed that cyanogenic monoglucosides are detected throughout the flax bolls from 0 DAA until 7 DAA, especially in the ovary, seed coat, and embryo tissues, but not at later stages from 10 to 12 DAA. Linamarin content was found to be higher than that of lotaustralin at these early stages of boll development. Whereas linustatin and neolinustatin could be seen in the embryo and endosperm at these early developmental stages, they were not detected in the seed coat and ovary (Dalisay et al. 2015). These authors detected linustatin from 0 to 12 DAA, with levels increasing as maturation progressed, whereas neolinustatin was detected only in more mature tissues, after 7 DAA. These MALDI-MS imaging findings are in agreement with previous reports that CG contents vary with age and developmental stage (Frehner et al. 1990; Niedzwiedz-Siegień 1998). Our own CG analysis of 16 DAA flax bolls dehydrated for 30 min showed 58.25 ± 2.55, 2.93 ± 0.11, 1.08 ± 0.03, and 0.17 ± 0.01 µmol/g of linamarin, lotaustralin, linustatin, and neolinustatin, respectively, for a total of 62.44 µmol CGs/g. This corresponds to a total of 1688 ± 69 mg HCN/kg of 16 DAA flax bolls (Fofana lab, unpublished data).
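The conversion from total CG content to HCN equivalents can be reproduced with a short calculation. The sketch below rests on two assumptions that are not stated in the text: that each CG molecule releases one molecule of HCN on complete hydrolysis, and that the molar mass of HCN is taken as about 27.03 g/mol. The variable and function names are illustrative only.

```python
# Mean CG contents (µmol/g) reported above for 16 DAA flax bolls.
cg_umol_per_g = {
    "linamarin": 58.25,
    "lotaustralin": 2.93,
    "linustatin": 1.08,
    "neolinustatin": 0.17,
}

# Assumption: approximate molar mass of HCN (H 1.008 + C 12.011 + N 14.007).
HCN_MOLAR_MASS = 27.03  # g/mol


def hcn_potential_mg_per_kg(cg_contents):
    """HCN equivalents, assuming one HCN released per CG molecule."""
    total_umol_per_g = sum(cg_contents.values())
    # µmol/g x g/mol gives µg/g, which is numerically equal to mg/kg.
    return total_umol_per_g * HCN_MOLAR_MASS


total_cg = sum(cg_umol_per_g.values())
hcn = hcn_potential_mg_per_kg(cg_umol_per_g)
print(f"Total CGs: {total_cg:.2f} µmol/g")
print(f"HCN potential: {hcn:.0f} mg/kg")
```

The sum of the rounded means is 62.43 µmol/g, marginally below the 62.44 µmol/g quoted, which is presumably a rounding effect. The resulting HCN potential agrees with the reported 1688 mg/kg to within about 1 mg/kg.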


9.2.3 Metabolomics Tools

In the continuum of genomics and transcriptomics, metabolomics is closer to the phenotype in living organisms (Fig. 9.2) and, hence, is a powerful tool for hypothesis-driven data generation towards a better understanding of biological and physiological mechanisms (Misra 2018). Two broad approaches are used, targeted and untargeted metabolomics (Schrimpe-Rutledge et al. 2016; Thiombiano et al. 2020). Both study chemical variation in a biological sample by means of analytical platforms such as liquid chromatography (LC), gas chromatography (GC), mass spectrometry (MS), nuclear magnetic resonance (NMR), and many more (Brigante et al. 2020).

Fig. 9.2 From phenotype to phenotype through metabolomics and transcriptomics: variations in metabolomic and transcriptomic tools for revealing the metabolite and transcript diversity in flax plant tissue

9.2.3.1 Targeted Metabolomics

In targeted metabolomics, which is the most frequently used approach, a defined group of metabolites is detected and absolutely quantified using MS/MS correlation to known references (Schrimpe-Rutledge et al. 2016). As such, most flax metabolites, including FAs, CGs, lignans, starch, proteins, cellulose, lignin, phenolic acids, and flavonoid polyphenolics, have been extracted and quantified using various analytical platforms (Frehner et al. 1990; Oomah et al. 1992; Wanasundara et al. 1993; Kobaisy et al. 1996; Westcott and Muir 1996; Niedzwiedz-Siegień 1998; Daun et al. 2003; Fofana et al. 2004, 2017a; Sorensen et al. 2005; Wróbel-Kwiatkowska et al. 2007; Chechetkin et al. 2008, 2009; Struijs et al. 2009; Kosińska et al. 2011; Trevisan and Arêas 2012; Barvkar et al. 2012; Patel et al. 2012; Giacomino et al. 2013; Goyal et al. 2014; Preisner et al. 2014; Dalisay et al. 2015; Fang et al. 2016; Kaushik et al. 2016; Ramsay et al. 2017; Mattila et al. 2018; Ndou et al. 2018; Cressey and Reeve 2019; Morin et al. 2020; Thiombiano et al. 2020; Zuk et al. 2020). In a recent study, targeted metabolomics was performed in chia (Salvia hispanica L.), sesame (Sesamum indicum L.), and flax to assess the authenticity of bakery products (Brigante et al. 2020). Polyphenols were analysed by HPLC-DAD-ESI-qTOF (MS/MS), in which an LC system (Agilent, Santa Clara, CA, USA) coupled to a DAD detector was connected through an ESI source to a Micro-QTOF II (Bruker Daltonics, Billerica, MA, USA) mass spectrometer. The authors identified 44 polyphenols in the three seeds studied, and a chemometric analysis allowed them to select 12 of the 44 compounds, which were defined as novel markers for seed authentication. These 12 compounds allowed differentiation among the seeds and bakery products, with hydroxycinnamic derivatives and lignans being the groups that contributed most to this discrimination (Brigante et al. 2020).
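Absolute quantification against known references, as described for targeted metabolomics, is commonly done with an external calibration curve. The sketch below is a minimal illustration of that idea and not the procedure of any study cited here: the standard concentrations and peak areas are invented, a simple linear detector response is assumed, and the function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantify(peak_area, slope, intercept):
    """Invert the calibration line to obtain a concentration."""
    return (peak_area - intercept) / slope


# Hypothetical reference standard series (µg/mL) and measured peak areas.
standards_ug_per_ml = [1.0, 2.0, 5.0, 10.0, 20.0]
peak_areas = [60.0, 110.0, 260.0, 510.0, 1010.0]

slope, intercept = fit_line(standards_ug_per_ml, peak_areas)
sample_conc = quantify(385.0, slope, intercept)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"sample concentration = {sample_conc:.2f} µg/mL")
```

In practice the linear range, matrix effects, and the use of internal standards would also need to be considered; the sketch only shows how a measured peak area is mapped back to a concentration.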

9.2.3.2 Untargeted Metabolomics

In contrast to targeted metabolomics, untargeted metabolomics seeks to qualitatively determine all measurable metabolites in a sample (Misra 2018; Ellis et al. 2018). The idea behind untargeted mass spectrometry (MS)-based metabolomics analysis is that a large list of ‘identified’ small molecules can be mapped to networks and pathways using MS/MS correlations to databases or libraries (Schrimpe-Rutledge et al. 2016). In most cases, however, a high-confidence assignment or identification of analytes is not achieved because of fundamental challenges in the metabolite identification process. Indeed, whereas the features of many metabolites can be assigned to a large number of tentative or preliminary structures, there may be no candidate matches for many others in the curated databases, which are often incomplete (Schrimpe-Rutledge et al. 2016). Nonetheless, untargeted metabolomics allows the discovery and annotation of novel metabolites for further study and characterization by targeted metabolomics. Hence, one major advantage of untargeted metabolomics is the collection of data without pre-existing knowledge (Schrimpe-Rutledge et al. 2016; Ellis et al. 2018). In this context, Hamade et al. (2021) recently performed comparative untargeted metabolomics on wild type and lignan-deficient flax roots, stems, and leaves using 1H-NMR and LC-MS and identified common and specific metabolites produced in response to osmotic stress induced by PEG-6000. The authors found significant differences in metabolite pools between the two flax types, both in the content and in the nature of the metabolites, for each analytical tool used. The differences included both primary and secondary metabolites in roots (22 metabolites), stems (32 metabolites), and leaves (33 metabolites). Notably, the accumulation of pinoresinol mono- and diglucosides, lariciresinol mono- and diglucosides, and secoisolariciresinol monoglucoside increased in the PLR1 RNAi mutant line but not in the wild type (Hamade et al. 2021). Likewise, Pontarin et al. (2020) performed untargeted metabolomics in flax, unravelled the metabolic relationships within and between flax leaves, and established age-dependent metabolic profiles at different developmental stages. What, then, are the platforms and tools used in metabolomics?
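The database matching step that underlies untargeted annotation can be illustrated with a toy mass lookup. The sketch below is an assumption-laden simplification: the five-compound library, the choice of the protonated ion [M+H]+, the 5 ppm tolerance, and the function names are all illustrative. Real workflows also weigh adducts, isotope patterns, retention time, and MS/MS spectra before an identification is accepted.

```python
# Monoisotopic masses (u) of the most abundant isotopes, and the proton mass.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207,
                "N": 14.0030740048, "O": 15.9949146196}
PROTON = 1.007276466

# Hypothetical mini-library: elemental formulas of flax metabolites named above.
LIBRARY = {
    "SDG": {"C": 32, "H": 46, "O": 16},
    "linamarin": {"C": 10, "H": 17, "N": 1, "O": 6},
    "lotaustralin": {"C": 11, "H": 19, "N": 1, "O": 6},
    "linustatin": {"C": 16, "H": 27, "N": 1, "O": 11},
    "neolinustatin": {"C": 17, "H": 29, "N": 1, "O": 11},
}


def monoisotopic_mass(formula):
    """Sum the isotope masses over an elemental formula."""
    return sum(MONOISOTOPIC[el] * count for el, count in formula.items())


def annotate(observed_mz, tolerance_ppm=5.0):
    """Return library entries whose [M+H]+ m/z lies within the tolerance."""
    hits = []
    for name, formula in LIBRARY.items():
        theoretical = monoisotopic_mass(formula) + PROTON
        error_ppm = (observed_mz - theoretical) / theoretical * 1e6
        if abs(error_ppm) <= tolerance_ppm:
            hits.append((name, round(error_ppm, 2)))
    return hits


print(annotate(248.1129))  # a feature close to protonated linamarin
print(annotate(300.0))     # a feature with no candidate in this library
```

The second call returns an empty list, which mirrors the situation described above in which a feature has no candidate match in an incomplete database.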

9.2.3.3 Analytical Chromatography Platforms and Tools

High-performance liquid chromatography (HPLC) and gas chromatography (GC) have long been used extensively in analytical chemistry (Xu et al. 2019). Traditionally, GC has been widely used in targeted metabolomics to detect and quantify a wide range of plant metabolites (Fait et al. 2006), including flax metabolites (Bacala and Barthet 2007; Radovanovic et al. 2014; Bénard et al. 2018; Mierziak et al. 2020). When coupled with mass spectrometry (MS) as a GC–MS system, it is considered a gold standard among the analytical platforms used to separate and identify metabolites in gaseous form (Fait et al. 2006; del Rio et al. 2011; Marquet 2012) because of its simplicity, low cost, and ease of maintenance. With the advent of liquid chromatographic (LC) methods such as HPLC–MS and UPLC-MS, any compound that is soluble in the liquid phase can be separated and identified, which allows the analysis of thermally labile compounds. Hence, LC–MS methods have become popular because of their fast turnaround, sensitivity, and specificity for both polar and non-polar metabolites (Marquet 2012; Schrimpe-Rutledge et al. 2016; Di Ottavio et al. 2020). Thus, HPLC–MS/MS has recently been used to assess the authenticity of bakery products containing chia, sesame, and flax seeds using targeted metabolomics (Brigante et al. 2020). Similarly, LC–MS untargeted metabolomics was used to characterize flax seed coat lignan macromolecules, allowing the structural elucidation of 120 distinct oligo-esters (Thiombiano et al. 2020). LC–MS/MS imaging has recently emerged as a new trend in the direct detection, profiling, quantitation, and visualization of metabolites from microdissected biological samples (Oya et al. 2018). Although GC–MS and LC–MS can each be used as a standalone method, both are commonly used in the same study. For example, Pontarin et al. (2020) studied polar and specialized metabolites using GC–MS and LC–MS, respectively, to decipher the age-dependent metabolic profiles and metabolic relationships within and between flax leaves. MS data acquisition, metabolite identification, and quantification rely heavily on sophisticated software and curated metabolite databases. For details on database interrogation for metabolite identification and annotation, on the types of databases and libraries, and on the software and bioinformatics tools and platforms used for identification and annotation, readers can refer to previous reports (Schrimpe-Rutledge et al. 2016; Misra 2018; Di Ottavio et al. 2020).

9.2.3.4 Spectroscopic Tools Nuclear Magnetic Resonance (NMR) Spectroscopy NMR is an analytical tool that often does not require sample preprocessing or any chromatographic separation techniques, and provides reliable data complementary to the MS methods based on NMR spectra of specific metabolites and peak intensities that are proportional to their amounts (Nagana Gowda and Raftery 2021). Because of its simple sample preparation and robustness in quantitation, NMR-based metabolomics delivers quality metabolomics data that allows for the identification of more accurate and

206

unbiased changes in the contents of small molecules (Ellis et al. 2018; Misra 2018). Using NMR, it is possible to identify changes in the polar metabolome with or without alkaline treatment, with no discrimination on molecules having absorbance properties such as organic acids, known to participate in the lignan macromolecular complex for ester linkage (Ramsay et al. 2017). As such, these authors studied the kinetics in the incorporation of the main phenolic compounds encountered in the lignan macromolecule during flaxseed development. They showed an agreement between the NMR and HPLC–UV analyses in that (+)-SDG, the main lignan in flaxseed, accumulated in a free form at S0 to S4 stages (Ramsay et al. 2017). Furthermore, NMR and LC–MS-based metabolomics were used by Hamade et al. (2021) to study osmotic stress in lignan-deficient flax and identified the common and specific metabolic responses in plants under osmotic stress conditions induced by polyethylene glycol. Currently, new tools, databases, resources, and applications continue to emerge every year. However, it is a tedious task to evaluate and use each of them before making a recommendation and verdict on their usefulness, ease, or difficulty of usage. Thus, it is for the community to analyse and decide on effectiveness in the years to come. It also remains a continued challenge and endeavour for the community to focus on informing, encouraging, and educating the upcoming generation of researchers into metabolomics (Misra 2018). Synchrotron Light Source Spectroscopy Light source synchrotron is an accelerated electron-generated spectroscopy tool that can measure the spectra of transition elements in biological samples, such as using X-ray absorption near-edge structure (XANES) spectroscopy at a specific elemental level (Veronesi et al. 2013). It can also be used to explore the molecular structure and chemical mapping in samples. 
Using synchrotron-based vibrational molecular spectroscopy of TT8 and HB12 gene silenced in alfalfa leaves, Lei et al. (2020) showed that silencing both the TT8 and HB12 genes in alfalfa

A. Somalraju and B. Fofana

increased the carbonyl C=O (CCO) profiles in the silenced leaves compared with the wild type. This method is applicable to imaging, quantification, and analysis of infected plant tissue; to plant phenotyping; and to microstructure analysis and chemical imaging of agri-products contaminated with toxic microelements such as cadmium and arsenic (Agriculture—Canadian Light Source).

9.2.4 Metabolite Biosynthesis
This review specifically discusses the biosynthesis of flax FAs, lignans, and CGs as economically important primary and secondary metabolites.

9.2.4.1 Fatty Acids
The fatty acid biosynthesis genes KAS, KCS, SAD, and FAD have been reported and characterized (Singh et al. 1994; Jain et al. 1999; Fofana et al. 2004, 2006; Vrinten et al. 2005; You et al. 2014), and the molecular mechanism of FA biosynthesis during flax seed development has been studied (Fofana et al. 2004, 2006; Acket et al. 2019; Zhang et al. 2020). You et al. (2014) further identified 91 FA-related genes in the genome of flax cv. CDC Bethune, of which seven had previously been cloned and validated. The 84 newly identified FA-related genes included 14 novel genes from the KAS family, two from the SAD family, 13 from the FAD2 family, three from the FAD3 family, 38 from the KCS family, and 14 from the FAT family (You et al. 2014). Moreover, the sequential acylation of FAs into triacylglycerols (TAGs) and their incorporation into lipid bodies have been investigated (Sorensen et al. 2005; Krist et al. 2006), and the key acyl-CoA:diacylglycerol acyltransferase (DGAT) and phospholipid:diacylglycerol acyltransferase (PDAT) enzymes DGAT1, PDAT1, and PDAT2 have been identified as having significant effects on the flax seed oil phenotype (Pan et al. 2013, 2015a, b; Wickramarathna et al. 2015). Besides the FA biosynthesis genes, miRNAs, including Lus-miRNA156a, were reported to be involved in

Metabolomics and Transcriptomics-Based Tools …


the regulation of FA biosynthesis and seed oil content and in delaying flowering in flax (Zhang et al. 2020). Overexpression of Lus-miRNA156a appeared to repress the expression of FAE1, FAD2, and FAD3 as well as that of the squamosa promoter binding protein-like (SPL) genes SPL6 and SPL9, both of which are putative targets of Lus-miRNA156a. The roles of SPL genes in plant morphogenesis and seed development have been reviewed in other plants such as Arabidopsis (Chuck et al. 2007; Wu and Poethig 2006).
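As a quick arithmetic check of the gene counts quoted above from You et al. (2014), the following minimal Python sketch tallies the newly identified genes per family and compares the sum with the reported totals. The variable names are illustrative only and are not taken from the cited study.

```python
# Newly identified FA-related genes per family, as quoted in the text
# from You et al. (2014). Variable names are illustrative assumptions.
new_genes_by_family = {
    "KAS": 14,
    "SAD": 2,
    "FAD2": 13,
    "FAD3": 3,
    "KCS": 38,
    "FAT": 14,
}

total_reported = 91      # all FA-related genes reported for CDC Bethune
previously_cloned = 7    # genes already cloned and validated

# Sum of the per-family counts of newly identified genes.
newly_identified = sum(new_genes_by_family.values())

# The per-family counts should account for every gene that was not
# previously cloned.
counts_are_consistent = (newly_identified + previously_cloned == total_reported)

print(f"Newly identified genes: {newly_identified}")
print(f"Counts consistent with reported total: {counts_are_consistent}")
```

The per-family counts add up to the 84 new genes stated in the text, which together with the seven previously cloned genes gives the reported total of 91.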

9.2.4.2 Flax Lignans
The first step of flax lignan biosynthesis is the coupling of two oxidized molecules of coniferyl alcohol into pinoresinol by dirigent proteins (Beejmohun et al. 2007). Pinoresinol then undergoes sequential reductions by pinoresinol-lariciresinol reductases (PLRs) (Hano et al. 2006; Renouard et al. 2014; Markulin et al. 2019a; Xiao et al. 2021) to form secoisolariciresinol (SECO), an unstable lignan form that is further sequentially glucosylated into SECO monoglucoside (SMG) and SECO diglucoside (SDG) by UGT74S1 (Ghose et al. 2014, 2015; Fofana et al. 2017a, b), with SDG being stored within a macromolecule (Ramsay et al. 2017). Moreover, the role of the WRKY36 transcription factor in activating SDG lignan biosynthesis has been demonstrated (Markulin et al. 2019b). Hence, different forms of lignans have been produced in vitro and in vivo (Ghose et al. 2015; Cong et al. 2015; Fofana et al. 2017a; Corbin et al. 2017; Nadeem et al. 2018; Anjum et al. 2020).

9.2.4.3 Cyanogenic Glucosides
Cyanogenic glucoside biosynthesis starts from the protein amino acids L-Val, L-Ile, L-Leu, L-Phe, or L-Tyr or from the nonprotein amino acid cyclopentenyl-Gly (Koch et al. 1995; Zuk et al. 2020). The two major cyanogenic monoglucosides, linamarin and lotaustralin, are derived from L-valine and L-isoleucine, respectively, through an N-hydroxylation leading to 2-methylpropanal oxime and 2-methylbutanal oxime, which are dehydrated into 2-methylpropionitrile and 2-methylbutyronitrile. These nitriles are further oxidoreduced into cyanohydrins and unstable cyanogens that are glucosylated by a glucosyltransferase into the cyanogenic monoglucosides linamarin and lotaustralin. These monoglucosides are then glucosylated into the diglucosides linustatin and neolinustatin (Cutler et al. 1985; Zuk et al. 2020). Although many studies have sought to elucidate the mechanism of CG biosynthesis in flax (Butler and Conn 1964; Fan and Conn 1985; Cutler et al. 1985), only recently has a cyanogenic glucosyltransferase, LuCGT1, been identified and characterized in flax (Kazachkov et al. 2020). These authors found the enzyme to be active on both aliphatic and aromatic substrates (Kazachkov et al. 2020). Their findings shed light on the nature of the UGT that glucosylates acetone cyanohydrin into the mono- and diglucoside forms of cyanogenic glucosides and open the door to gene editing of flax for total utilization.

9.2.4.4 Interplay Between the Primary and Secondary Metabolite Pathways Leading to FA, Lignans, and CGs
The metabolic pathways leading to FA, lignan, and CG biosynthesis are interconnected: the substrates, intermediates, or products of one pathway serve as reactants in the others. Indeed, all three metabolite classes are connected through the glucose catabolism pathways (Caretto et al. 2015). On the one hand, glucose metabolism feeds the shikimate and phenylpropanoid pathways through glycolysis; on the other hand, pyruvate metabolism leads to amino acid and FA biosynthesis through acetyl-CoA and the TCA cycle (Acket et al. 2019). Valine and isoleucine produced from pyruvate are the direct precursors of the CGs linamarin and lotaustralin, whereas successive condensations of acetyl-CoA by the FA-condensing enzymes KCS and KAS produce FAs. Aromatic amino acids produced by the shikimate pathway, such as phenylalanine, lead to coumaroyl-CoA, the key precursor for lignin and lignan biosynthesis (Hamade et al. 2021). A full understanding of


each key pathway is essential for understanding how lignan and FA biosynthesis can be upregulated concomitantly while cyanogenic glucoside formation is downregulated.
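The interconnections described above can be sketched as a small directed graph. This is a deliberately simplified illustration based only on the links named in this section; the node names, the edge list, and the helper functions `reachable` and `shared_precursors` are assumptions made for this example and do not constitute a curated pathway model.

```python
from collections import deque

# Simplified precursor -> product links taken from the description above.
# Node names and the edge list are illustrative assumptions.
PATHWAY_EDGES = {
    "glucose": ["glycolysis"],
    "glycolysis": ["shikimate pathway", "pyruvate"],
    "pyruvate": ["acetyl-CoA", "valine", "isoleucine"],
    "acetyl-CoA": ["fatty acids", "TCA cycle"],
    "valine": ["linamarin"],
    "isoleucine": ["lotaustralin"],
    "shikimate pathway": ["phenylalanine"],
    "phenylalanine": ["coumaroyl-CoA"],
    "coumaroyl-CoA": ["lignin", "lignans"],
}


def reachable(start, edges=PATHWAY_EDGES):
    """Return every node downstream of `start` (breadth-first search)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def shared_precursors(products, edges=PATHWAY_EDGES):
    """Return the nodes from which all of the given products can be reached."""
    return {
        node
        for node in edges
        if all(product in reachable(node, edges) for product in products)
    }


if __name__ == "__main__":
    common = shared_precursors(["fatty acids", "lignans", "linamarin"])
    print("Shared upstream nodes:", sorted(common))
    print("Downstream of pyruvate:", sorted(reachable("pyruvate")))
```

In this toy graph, only the glucose catabolism nodes sit upstream of all three metabolite classes, whereas pyruvate feeds FAs and CGs but not lignans, which mirrors the branching described in the text.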

9.3 Transcriptomics and Pathway Regulation

Transcriptomics is the study of the entire set of RNA transcripts expressed in a cell, tissue, or organism at a given time (Milward et al. 2016). The whole set of transcripts is known as the transcriptome.

9.3.1 Transcriptomic Resources and Tools

9.3.1.1 Genomic Resources
The early 2000s saw the construction of cDNA libraries and the release of large numbers of expressed sequence tags (ESTs) from almost all cultivated crops into public sequence databases (Fofana et al. 2004; Day et al. 2005; Roach and Deyholos 2007). In flax, from no ESTs in NCBI GenBank in 2000, 1299 ESTs were reported in 2002 (Fofana et al. 2004), 12,605 ESTs as of September 24, 2010, 261,272 additional ESTs in 2011 (Venglat et al. 2011), and 286,900 ESTs as of June 25, 2014. As of October 8, 2021, the number of flax ESTs in NCBI GenBank appears to have stabilized at 288,919, mainly because of the shift to next-generation sequencing technologies. These ESTs were used for full-length gene cloning and characterization (Ghose et al. 2014; Kazachkov et al. 2020), for tissue-specific gene expression analysis (Venglat et al. 2011; Ghose et al. 2014), and for assessing the accuracy of the flax whole-genome de novo assembly (Wang et al. 2012).

9.3.1.2 Transcriptomic Platforms
With the advent of large EST sets, the cDNA microarray gene expression system emerged as a powerful platform for understanding biological systems, allowing the arraying of thousands of ESTs (Roberts 2008). While the microarray gene expression platform proved powerful in capturing most gene

transcriptional activity in a sample (Wang et al. 2014), the portion of the transcriptome it captures is limited by the nature and number of spotted features (Russo et al. 2003). Moreover, depending on the quality of pre- and post-array processing, the false positive rate for lowly expressed genes has always been a concern despite the use of powerful statistical and bioinformatics packages (Yatskou et al. 2008; Raman et al. 2009; Nachappa et al. 2020; Vahlensieck et al. 2021). More recently, with next-generation sequencing, RNAseq has emerged as a more sensitive and powerful transcriptomic tool (Vahlensieck et al. 2021) because it requires no prior knowledge of targets in any organism. Like microarrays, RNAseq relies on robust bioinformatics packages for detecting differentially expressed genes, and high agreement and correlation have been found between the two platforms for genes with above-median expression levels (Chen et al. 2017; Vahlensieck et al. 2021). One of the major advantages of RNAseq over microarrays is its potential for gene and transcript discovery.
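To illustrate how agreement between the two platforms might be assessed, the sketch below computes log2 fold changes per platform and their Pearson correlation for genes with above-median expression. All gene names and expression values are hypothetical toy data, and the functions are illustrative stand-ins rather than the methods used in the cited studies, which rely on dedicated statistical packages.

```python
import math
from statistics import median

# Hypothetical toy data: (control, treated) signal per gene and platform.
TOY_EXPRESSION = {
    "gene1": {"microarray": (120.0, 480.0), "rnaseq": (30.0, 125.0)},
    "gene2": {"microarray": (900.0, 450.0), "rnaseq": (400.0, 190.0)},
    "gene3": {"microarray": (60.0, 65.0), "rnaseq": (4.0, 9.0)},
    "gene4": {"microarray": (1500.0, 3100.0), "rnaseq": (800.0, 1700.0)},
    "gene5": {"microarray": (300.0, 150.0), "rnaseq": (90.0, 40.0)},
    "gene6": {"microarray": (2000.0, 1900.0), "rnaseq": (1000.0, 1100.0)},
}


def log2_fold_change(control, treated, pseudocount=1.0):
    """log2 ratio of treated over control; the pseudocount avoids log(0)."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def platform_agreement(data):
    """Correlate fold changes across platforms for above-median genes.

    Expression level is approximated here by the mean RNAseq signal
    across the two conditions (an assumption made for this sketch).
    """
    levels = {gene: sum(vals["rnaseq"]) / 2 for gene, vals in data.items()}
    cutoff = median(levels.values())
    selected = [gene for gene in data if levels[gene] > cutoff]
    array_lfc = [log2_fold_change(*data[g]["microarray"]) for g in selected]
    seq_lfc = [log2_fold_change(*data[g]["rnaseq"]) for g in selected]
    return selected, pearson(array_lfc, seq_lfc)


if __name__ == "__main__":
    genes, r = platform_agreement(TOY_EXPRESSION)
    print("Above-median genes:", genes)
    print(f"Pearson r between platforms: {r:.3f}")
```

Restricting the comparison to above-median genes reflects the observation in the text that the two platforms agree best for well-expressed genes; a real analysis would use many more genes and replicated samples.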

9.3.2 Flax Transcriptomics
Understanding how the flax plant copes with its surrounding environment and how it fulfils its biological functions for higher seed productivity and quality is a perpetual quest for plant biologists. Flax has a shallow root system and is sensitive to water stress (drought and flooding). Whereas most flax varieties are moderately tolerant of, but still prone to, drought stress (Dash et al. 2017), almost all current flax cultivars are poorly tolerant of excess water and flooding. To better understand how flax plants coordinate the expression of drought resistance genes in response to drought stress, Wang et al. (2021) performed a genome-wide transcriptome analysis of two linseed flax varieties with contrasting drought tolerance under repeated drought conditions, using single-molecule real-time (SMRT) PacBio long-read sequencing. The authors identified 1190 transcription factors that were responsive to drought stress, and proline


biosynthesis was found to interact with the drought stress-responsive genes through the RAD5-interacting protein 1 (RIN1) (Wang et al. 2021). Similarly, flax gene expression responses to toxic compounds (Ali et al. 2021) and to imbalanced phosphorus nutrition stress (Dmitriev et al. 2016) have been reported. In particular, Dmitriev et al. (2016) showed that JAZ family proteins and the WRKY transcription factor family are involved in the response of flax to deficient and excessive nutrition. Taken together, these studies highlighted and discussed the differentially expressed genes and upregulated pathways in response to these diverse stresses. As with abiotic stresses, transcriptomic profiling has also been applied to biotic stress, in susceptible and resistant flax seedlings after infection with Fusarium oxysporum lini. Differentially expressed genes were identified in 13 groups of genes, including calcium signalling, chitinase, ethylene-responsive transcription factors, ethylene biosynthesis, glutaredoxin, glutathione cycle, beta-1,3-glucanase, jasmonic acid synthesis, jasmonic acid-responsive transcription factors, kinase, phosphatase, pathogen-inducible salicylic acid glucosyltransferase, transcription factors, thioredoxins, and WRKY (Boba et al. 2021). Besides abiotic and biotic stresses, transcriptional mechanisms for cell wall and stem development (Zhang and Deyholos 2016; Gorshkov et al. 2017; Galinousky et al. 2020; Tombuloglu 2020; Petrova et al. 2021; Yuan et al. 2021) as well as seed development and diverse plant characteristics (Xie et al. 2019; Dmitriev et al. 2020) have been investigated. In a study of seed development and fatty acid metabolism in high- and low-oil flax lines, 11,802 differentially expressed genes were identified and, in combination with GWAS studies, 10 candidate genes for fatty acid metabolism were reported (Xie et al. 2019). Previously, using proteome profiling of developing flax seed, Barvkar et al. (2012) reported 1716 proteins, of which 19% were involved in primary metabolism, 14% were storage proteins, and 10% were related to energy. Taken together, the flax transcriptome, like that of other plant species, is regulated by environmental cues as well as by developmental stage, which is reflected in the displayed phenotype.


9.3.3 From Genome to Transcriptome to Metabolome to Phenotype
As depicted in Fig. 9.2, the genome commands the metabolome and phenotype under the influence of the surrounding environment. Altering the genome under a specific environment is therefore expected to produce different metabolomes and phenotypes. For example, using chemically induced mutations in the SDG lignan glucosylation gene UGT74S1 of flax cv. CDC Bethune, Fofana et al. (2017a) reported stable new flax lines with altered SDG profiles. Similarly, flax plants with altered cell wall lignin obtained by EMS mutagenesis have been reported (Chantreau et al. 2013). Virus-induced gene silencing has also been used to significantly alter seed anthocyanin production (Hano et al. 2020), whereas PLR-RNAi transgenic flax lines and wild types grown under the same conditions showed very different metabolite pools in the roots, stems, and leaves (Hamade et al. 2021). These examples illustrate how a better understanding of the flax genome, its transcriptomes, and its metabolic pathways can help biologists tailor flax for multiple uses.

9.4 Concluding Remarks

Flax crop improvement has so far relied on traditional morphological phenotyping to select the most promising lines in a given environment. With the current trends in phenomics, and in addition to the metabolomics tools described above, new tools, databases, bioinformatic resources, and applications continue to emerge for transcriptomics and metabolomics. Yet the task remains to evaluate the pertinence of each approach in terms of its efficiency, throughput, and precision. In the context of current climate change, close connections between transcriptomics, metabolomics, phenomics, and prediction will ensure consistent and predictable phenotyping of the flax crop under a changing climate. Knowledge of key pathways and the factors that regulate them should continue to be


investigated. This will help in re-designing the flax genome to better adapt to the new climate challenges.

Acknowledgements
The authors thank Dr. Kaushik Ghose and Mr. David Main for their useful proofreading of this chapter. Many thanks to André Gionet for querying and assembling the literature for review.

References
Acket S, Degournay A, Rossez Y, Mottelet S, Villon P, Troncoso-Ponce A, Thomasset B (2019) 13C-metabolic flux analysis in developing Flax (Linum usitatissimum L.) embryos to understand storage lipid biosynthesis. Metabolites 10(1):14. https://doi.org/10.3390/metabo10010014
Ali E, Saand MA, Khan AR, Shah JM, Feng S, Ming C, Sun P (2021) Genome-wide identification and expression analysis of detoxification efflux carriers (DTX) genes family under abiotic stresses in flax. Physiol Plant 171(4):483–501. https://doi.org/10.1111/ppl.13105
Anjum S, Komal A, Drouet S, Kausar H, Hano C, Abbasi BH (2020) Feasible production of lignans and neolignans in root-derived in vitro cultures of flax (Linum usitatissimum L.). Plants (Basel) 9(4):409. https://doi.org/10.3390/plants9040409
Arroo RRJ, Androutsopoulos V, Beresford K, Ruparelia K, Surichan S, Wilsher N, Potter GA (2009) Phytoestrogens as natural prodrugs in cancer prevention: dietary flavonoids. Phytochem Rev 8:375–386. https://doi.org/10.1007/s11101-009-9128-6
Augustin LSA, Aas AM, Astrup A, Atkinson FS, Baer-Sinnott S, Barclay AW, Brand-Miller JC, Brighenti F, Bullo M, Buyken AE, Ceriello A, Ellis PR, Ha MA, Henry JC, Kendall CWC, La Vecchia C, Liu S, Livesey G, Poli A, Salas-Salvadó J, Riccardi G, Riserus U, Rizkalla SW, Sievenpiper JL, Trichopoulou A, Usic K, Wolever TMS, Willett WC, Jenkins DJA (2020) Dietary fibre consensus from the International Carbohydrate Quality Consortium (ICQC). Nutrients 12(9):2553. https://doi.org/10.3390/nu12092553
Bacala R, Barthet V (2007) Development of extraction and gas chromatography analytical methodology for cyanogenic glycosides in flaxseed (Linum usitatissimum). J AOAC Int 90(1):153–161
Barvkar VT, Pardeshi VC, Kale SM, Kadoo NY, Giri AP, Gupta VS (2012) Proteome profiling of flax (Linum usitatissimum) seed: characterization of functional metabolic pathways operating during seed development. J Proteome Res 11:6264–6276. https://doi.org/10.1021/pr300984r
Beejmohun V, Fliniaux O, Hano C, Pilard S, Grand E, Lesur D, Cailleu D, Lamblin F, Lainé E, Kovensky J, Fliniaux MA, Mesnard F (2007) Coniferin dimerisation in lignan biosynthesis in flax cells. Phytochemistry 68(22–24):2744–2752. https://doi.org/10.1016/j.phytochem.2007.09.016
Bénard C, Acket S, Rossez Y, Fernandez O, Berton T, Gibon Y, Cabasson C (2018) Untargeted analysis of semipolar compounds by LC-MS and targeted analysis of fatty acids by GC-MS/GC-FID: from plant cultivation to extract preparation. In: António C (ed) Plant metabolomics. Methods in molecular biology, vol 1778. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7819-9_8
Boba A, Kostyn K, Kozak B, Zalewski I, Szopa J, Kulma A (2021) Transcriptomic profiling of susceptible and resistant flax seedlings after Fusarium oxysporum lini infection. PLoS ONE 16(1):e0246052. https://doi.org/10.1371/journal.pone.0246052
Brigante FI, Lucini Mas A, Pigni NB, Wunderlin DA, Baroni MV (2020) Targeted metabolomics to assess the authenticity of bakery products containing chia, sesame and flax seeds. Food Chem 312:126059. https://doi.org/10.1016/j.foodchem.2019.126059
Butler GW, Conn EE (1964) Biosynthesis of the cyanogenic glucosides linamarin and lotaustralin. I. Labeling studies in vivo with Linum usitatissimum. J Biol Chem 239:1674–1679
Campbell K, Westholm J, Kasvandik S, Bartolomeo FD, Mormino M, Nielsen J (2020) Building blocks are synthesized on demand during the yeast cell cycle. PNAS 117:7575–7583. https://doi.org/10.1073/pnas.1919535117
Caretto S, Linsalata V, Colella G, Mita G, Lattanzio V (2015) Carbon fluxes between primary metabolism and phenolic pathway in plant tissues under stress. Int J Mol Sci 16:26378–26394. https://doi.org/10.3390/ijms161125967
Chantreau M, Grec S, Gutierrez L, Dalmais M, Pineau C, Demailly H, Paysant-Leroux C, Tavernier R, Trouvé JP, Chatterjee M, Guillot X, Brunaud V, Chabbert B, van Wuytswinkel O, Bendahmane A, Thomasset B, Hawkins S (2013) PT-Flax (phenotyping and TILLinG of flax): development of a flax (Linum usitatissimum L.) mutant population and TILLinG platform for forward and reverse genetics. BMC Plant Biol 13:159. https://doi.org/10.1186/1471-2229-13-159
Chechetkin IR, Blufard A, Hamberg M, Grechkin A (2008) A lipoxygenase-divinyl ether synthase pathway in flax (Linum usitatissimum L.) leaves. Phytochemistry 69:2008–2015
Chechetkin IR, Mukhitova FK, Blufard ASB, Yarin AY, Antsygina LL, Grechkin AN (2009) Unprecedented pathogen-inducible complex oxylipins from flax - linolipins A and B. FEBS J 276:4463–4472
Chen L, Sun F, Yang X, Jin Y, Shi M, Wang L, Shi Y, Zhan C, Wang Q (2017) Correlation between RNA-Seq and microarrays results using TCGA data. Gene 628:200–204. https://doi.org/10.1016/j.gene.2017.07.056
Chuck G, Cigan AM, Saeteurn K, Hake S (2007) The heterochronic maize mutant corngrass1 results from overexpression of a tandem microRNA. Nat Genet 39:544–549. https://doi.org/10.1038/ng2001


Clavel T, Henderson G, Alpert C-A, Philippe C, Rigottier-Goi L, Doré J, Blaut M (2005) Intestinal bacterial communities that produce active estrogen-like compounds enterodiol and enterolactone in humans. Appl Environ Microbiol 71:6077–6085
Clavel T, Borrmann D, Braune A, Doré J, Blaut M (2006a) Occurrence and activity of human intestinal bacteria involved in the conversion of dietary lignans. Anaerobe 12:140–147
Clavel T, Henderson G, Engst W, Doré J, Blaut M (2006b) Phylogeny of human intestinal bacteria that activate the dietary lignan secoisolariciresinol diglucoside. FEMS Microbiol 55:471–478
Cong LH, Dauwe R, Lequart M, Vinchon S, Renouard S, Fliniaux O, Colas C, Corbin C, Doussot J, Hano C, Lamblin F, Molinié R, Pilard S, Jullian N, Boitel M, Gontier E, Mesnard F, Laberche JC (2015) Kinetics of glucosylated and non-glucosylated aryltetralin lignans in Linum hairy root cultures. Phytochemistry 115:70–78. https://doi.org/10.1016/j.phytochem.2015.01.001
Corbin C, Drouet S, Mateljak I, Markulin L, Decourtil C, Renouard S, Lopez T, Doussot J, Lamblin F, Auguin D, Lainé E, Fuss E, Hano C (2017) Functional characterization of the pinoresinol-lariciresinol reductase-2 gene reveals its roles in yatein biosynthesis and flax defense response. Planta 246(3):405–420. https://doi.org/10.1007/s00425-017-2701-0
Cressey P, Reeve J (2019) Metabolism of cyanogenic glycosides: a review. Food Chem Toxicol 125:225–232. https://doi.org/10.1016/j.fct.2019.01.002
Cunnane SC, Ganguli S, Menard C, Liede AC, Hamadeh MJ, Chen ZY, Wolever TM, Jenkins DJ (1993) High alpha-linolenic acid flaxseed (Linum usitatissimum): some nutritional properties in humans. Br J Nutr 69:443–453
Cutler AJ, Sternberg M, Conn EE (1985) Properties of a microsomal enzyme system from Linum usitatissimum (linen flax) which oxidizes valine to acetone cyanohydrin and isoleucine to 2-methylbutanone cyanohydrin. Arch Biochem Biophys 238(1):272–279. https://doi.org/10.1016/0003-9861(85)90165-1
Dalisay DS, Kim KW, Lee C, Yang H, Rübel O, Bowen BP, Davin LB, Lewis NG (2015) Dirigent protein-mediated lignan and cyanogenic glucoside formation in flax seed: integrated omics and MALDI mass spectrometry imaging. J Nat Prod 78:1231–1242. https://doi.org/10.1021/acs.jnatprod.5b00023
Dash PK, Rai R, Mahato AK, Gaikwad K, Singh NK (2017) Transcriptome landscape at different developmental stages of a drought tolerant cultivar of flax (Linum usitatissimum). Front Chem 5:82. https://doi.org/10.3389/fchem.2017.00082
Daun JK, Barthet VJ, Chornick TL, Duguid S (2003) Flaxseed in human nutrition. In: Thompson L, Cunnane S (eds) Structure, composition and variety development of flaxseed, 2nd edn. AOCS Press, Champaign, IL, pp 1–40
Day A, Addi M, Kim W, David H, Bert F, Mesnage P, Rolando C, Chabbert B, Neutelings G, Hawkins S (2005) ESTs from the fibre-bearing stem tissues of flax (Linum usitatissimum L.): expression analyses of sequences related to cell wall development. Plant Biol (Stuttg) 7(1):23–32. https://doi.org/10.1055/s-2004-830462
del Río JC, Rencoret J, Gutiérrez A, Nieto L, Jiménez-Barbero J, Martínez ÁT (2011) Structural characterization of guaiacyl-rich lignins in flax (Linum usitatissimum) fibers and shives. J Agric Food Chem 59(20):11088–11099. https://doi.org/10.1021/jf201222r
Di Ottavio F, Gauglitz JM, Ernst M, Panitchpakdi MW, Fanti F, Compagnone D, Dorrestein PC, Sergi M (2020) A UHPLC-HRMS based metabolomics and chemoinformatics approach to chemically distinguish ‘super foods’ from a variety of plant-based foods. Food Chem 313:126071. https://doi.org/10.1016/j.foodchem.2019.126071
Dixon RA (2004) Phytoestrogens. Annu Rev Plant Biol 55:225–261. https://doi.org/10.1146/annurev.arplant.55.031903.141729
Dmitriev AA, Kudryavtseva AV, Krasnov GS, Koroban NV, Speranskaya AS, Krinitsina AA, Belenikin MS, Snezhkina AV, Sadritdinova AF, Kishlyan NV, Rozhmina TA, Yurkevich OY, Muravenko OV, Bolsheva NL, Melnikova NV (2016) Gene expression profiling of flax (Linum usitatissimum L.) under edaphic stress. BMC Plant Biol 16(Suppl 3):237. https://doi.org/10.1186/s12870-016-0927-9
Dmitriev AA, Novakovskiy RO, Pushkova EN, Rozhmina TA, Zhuchenko AA, Bolsheva NL, Beniaminov AD, Mitkevich VA, Povkhova LV, Dvorianinova EM, Snezhkina AV, Kudryavtseva AV, Krasnov GS, Melnikova NV (2020) Transcriptomes of different tissues of flax (Linum usitatissimum L.) cultivars with diverse characteristics. Front Genet 11:565146. https://doi.org/10.3389/fgene.2020.565146
El-Beltagi HS, Salama ZA, El-Hariri DM (2007) Evaluation of fatty acids profile and the content of some secondary metabolites in seeds of different flax cultivars (Linum usitatissimum L.). Gen Appl Plant Physiol 33:187–202
Ellis N, Hattori C, Cheema J, Donarski J, Charlton A, Dickinson M, Venditti G, Kaló P, Szabó Z, Kiss GB, Domoney C (2018) NMR metabolomics defining genetic variation in pea seed metabolites. Front Plant Sci 9:1022. https://doi.org/10.3389/fpls.2018.01022
Fabian CJ, Khan SA, Garber JE, Dooley WC, Yee LD, Klemp JR, Nydegger JL, Powers KR, Kreutzjans AL, Zalles CM, Metheny T, Phillips TA, Hu J, Koestler DC, Chalise P, Yellapu NK, Jernigan C, Petroff BK, Hursting SD, Kimler BF (2020) Randomized phase IIB trial of the lignan secoisolariciresinol diglucoside in premenopausal women at increased risk for development of breast cancer. Cancer Prev Res (Phila) 13(7):623–634. https://doi.org/10.1158/1940-6207.CAPR-20-0050
Fait A, Angelovici R, Less H, Ohad I, Urbanczyk-Wochniak E, Fernie AR, Galili G (2006) Arabidopsis seed development and germination is associated with temporally distinct metabolic switches. Plant Physiol 142(3):839–854. https://doi.org/10.1104/pp.106.086694
Fan TW, Conn EE (1985) Isolation and characterization of two cyanogenic beta-glucosidases from flax seeds. Arch Biochem Biophys 243(2):361–373. https://doi.org/10.1016/0003-9861(85)90513-2
Fang J, Ramsay A, Renouard S, Hano C, Lamblin F, Chabbert B, Mesnard F, Schneider B (2016) Laser microdissection and spatiotemporal pinoresinol-lariciresinol reductase gene expression assign the cell layer-specific accumulation of secoisolariciresinol diglucoside in flaxseed coats. Front Plant Sci 7:1743. https://doi.org/10.3389/fpls.2016.01743
Fofana B, Duguid S, Cloutier S (2004) Cloning of fatty acid biosynthetic genes β-ketoacyl CoA synthase, fatty acid elongase, stearoyl-ACP desaturase, and fatty acid desaturase and analysis of expression in the early developmental stages of flax (Linum usitatissimum L.) seeds. Plant Sci 166:1487–1496
Fofana B, Cloutier S, Duguid S, Ching J, Rampitsch C (2006) Gene expression of stearoyl-ACP desaturase and Δ12 fatty acid desaturase 2 is modulated during seed development of flax (Linum usitatissimum). Lipids 41:705–712
Fofana B, Ragupathy R, Cloutier S (2010a) Flax lipids: classes, biosynthesis, genetics and the promise of applied genomics for understanding and altering of fatty acids. In: Gilmore PL (ed) Lipids: categories, biological functions and metabolism, nutrition, and health. Nova Science Publishers, Inc., pp 71–98
Fofana B, Cloutier S, Kirby CW, McCallum J, Duguid S (2010b) A well balanced omega-6/omega-3 ratio in developing flax bolls after heating and its implications for use as a fresh vegetable by humans. Food Res Int 44:2459–2464
Fofana B, Ghose K, Somalraju A, McCallum J, Main D, Deyholos MK, Rowland GG, Cloutier S (2017a) Induced mutagenesis in UGT74S1 gene leads to stable new flax lines with altered secoisolariciresinol diglucoside (SDG) profiles. Front Plant Sci 8:1638. https://doi.org/10.3389/fpls.2017.01638
Fofana B, Ghose K, McCallum J, You FM, Cloutier S (2017b) UGT74S1 is the key player in controlling secoisolariciresinol diglucoside (SDG) formation in flax. BMC Plant Biol 17(1):35. https://doi.org/10.1186/s12870-017-0982-x
Forbey JS, Harvey AL, Huffman MA, Provenza FD, Sullivan R, Tasdemir D (2009) Exploitation of secondary metabolites by animals: a response to homeostatic challenges. Integr Comp Biol 49:314–328. https://doi.org/10.1093/icb/icp046
Frehner M, Scalet M, Conn EE (1990) Pattern of the cyanide-potential in developing fruits: implications for plants accumulating cyanogenic monoglucosides (Phaseolus lunatus) or cyanogenic diglucosides in their seeds (Linum usitatissimum, Prunus amygdalus). Plant Physiol 94:28–34. https://doi.org/10.1104/pp.94.1.28
Galinousky D, Mokshina N, Padvitski T, Ageeva M, Bogdan V, Kilchevsky A, Gorshkova T (2020) The toolbox for fiber flax breeding: a pipeline from gene expression to fiber quality. Front Genet 11:589881. https://doi.org/10.3389/fgene.2020.589881
Ghose K, Selvaraj K, McCallum J, Kirby CW, Sweeney-Nixon M, Cloutier SJ, Deyholos M, Datla R, Fofana B (2014) Identification and functional characterization of a flax UDP-glycosyltransferase glucosylating secoisolariciresinol (SECO) into secoisolariciresinol monoglucoside (SMG) and diglucoside (SDG). BMC Plant Biol 14:82. https://doi.org/10.1186/1471-2229-14-82
Ghose K, McCallum J, Sweeney-Nixon M, Fofana B (2015) Histidine 352 (His352) and tryptophan 355 (Trp355) are essential for flax UGT74S1 glucosylation activity toward secoisolariciresinol. PLoS ONE 10(2):e116248. https://doi.org/10.1371/journal.pone.0116248
Giacomino S, Peñas E, Ferreyra V, Pellegrino N, Fournier M, Apro N, Carrión MO, Frias J (2013) Extruded flaxseed meal enhances the nutritional quality of cereal-based products. Plant Foods Hum Nutr 68(2):131–136. https://doi.org/10.1007/s11130-013-0359-8
Gorshkov O, Mokshina N, Gorshkov V, Chemikosova S, Gogolev Y, Gorshkova T (2017) Transcriptome portrait of cellulose-enriched flax fibres at advanced stage of specialization. Plant Mol Biol 93:431–449. https://doi.org/10.1007/s11103-016-0571-7
Goyal A, Sharma V, Upadhyay N, Gill S, Sihag M (2014) Flax and flaxseed oil: an ancient medicine and modern functional food. J Food Sci Technol 51:1633–1653
Green AG (1986a) Genetic control of polyunsaturated fatty acid biosynthesis in flax (Linum usitatissimum) seed oil. Theor Appl Genet 72:654–661
Green AG (1986b) A mutant genotype of flax (Linum usitatissimum L.) containing very low levels of linolenic acid in its seed oil. Can J Plant Sci 66:499–503
Green AG, Marshall DR (1981) Variation for oil quantity and quality in linseed (Linum usitatissimum). Aust J Agric Res 32:599–607
Guttierrez A, Del Rio JC (2003) Lipids from flax fibers and fate in alkaline pulping. J Agric Food Chem 51:4965–4971
Hamade K, Fliniaux O, Fontaine JX, Molinié R, Otogo Nnang E, Bassard S, Guénin S, Gutierrez L, Lainé E, Hano C, Pilard S, Hijazi A, El Kak A, Mesnard F (2021) NMR and LC-MS-based metabolomics to study osmotic stress in lignan-deficient flax. Molecules 26(3):767. https://doi.org/10.3390/molecules26030767
Hano C, Martin I, Fliniaux O, Legrand B, Gutierrez L, Arroo RR, Mesnard F, Lamblin F, Lainé E (2006) Pinoresinol-lariciresinol reductase gene expression and secoisolariciresinol diglucoside accumulation in developing flax (Linum usitatissimum) seeds. Planta 224(6):1291–1301. https://doi.org/10.1007/s00425-006-0308-y


Hano C, Drouet S, Lainé E (2020) Virus-induced gene silencing (VIGS) in flax (Linum usitatissimum L.) seed coat: description of an effective procedure using the transparent testa 2 gene as a selectable marker. Methods Mol Biol 2172:233–242. https://doi.org/10.1007/978-1-0716-0751-0_17
Haque MR, Bradbury JH (2002) Total cyanide determination of plants and foods using the picrate and acid hydrolysis methods. Food Chem 77:107–114. https://doi.org/10.1016/S0308-8146(01)00313-2
Harrigan GG, Martino-Catt S, Glenn KC (2007) Metabolomics, metabolic diversity and genetic variation in crops. Metabolomics 3:259–272. https://doi.org/10.1007/s11306-007-0076-0
Jain RK, Thompson RG, Taylor DC, MacKenzie SL, McHughen A, Rowland GG, Tenaschuk D, Coffey M (1999) Isolation and characterization of two promoters from linseed for genetic engineering. Crop Sci 39:1696–1701. https://doi.org/10.2135/cropsci1999.3961696x
Johnson CH, Ivanisevic J, Siuzdak G (2016) Metabolomics: beyond biomarkers and towards mechanisms. Nat Rev Mol Cell Biol 17(7):451–459. https://doi.org/10.1038/nrm.2016.25
Kaushik P, Dowling K, McKnight S, Barrow CJ, Wang B, Adhikari B (2016) Preparation, characterization and functional properties of flax seed protein isolate. Food Chem 197:212–220. https://doi.org/10.1016/j.foodchem.2015.09.106
Kazachkov M, Li Q, Shen W, Wang L, Gao P, Xiang D, Datla R, Zou J (2020) Molecular identification and functional characterization of a cyanogenic glucosyltransferase from flax (Linum unsitatissimum). PLoS ONE 15(2):e0227840. https://doi.org/10.1371/journal.pone.0227840
Keykhasalar R, Tabrizi MH, Ardalan P, Khatamian N (2020) The apoptotic, cytotoxic, and antiangiogenic impact of Linum usitatissimum seed essential oil nanoemulsions on the human ovarian cancer cell line A2780. Nutr Cancer 22:1–9. https://doi.org/10.1080/01635581.2020.1824001
Kobaisy M, Oomah D, Mazza G (1996) Determination of cyanogenic glycosides in flaxseed by barbituric acid-pyridine, pyridine-pyrazolone, and high-performance liquid chromatography methods. J Agric Food Chem 44:3178–3181
Koch BM, Sibbesen O, Halkier BA, Svendsen I, Moller BL (1995) The primary sequence of cytochrome P450tyr, the multifunctional N-hydroxylase catalyzing the conversion of L-tyrosine to p-hydroxyphenylacetaldehyde oxime in the biosynthesis of the cyanogenic glucoside dhurrin in Sorghum bicolor (L.) Moench. Arch Biochem Biophys 323:177–186. https://doi.org/10.1006/abbi.1995.0024
Kosińska A, Penkacik K, Wiczkowski W, Amarowicz R (2011) Presence of caffeic acid in flaxseed lignan macromolecule. Plant Foods Hum Nutr 66:270–274. https://doi.org/10.1007/s11130-011-0245-1
Krist S, Stuebiger G, Bail S, Unterweger H (2006) Analysis of volatile compounds and triacylglycerol composition of fatty seed oil gained from flax and false flax. Eur J Lipid Sci Technol 108:48–60
Landete JM, Arqués J, Medina M, Gaya P, de Las Rivas B, Muñoz R (2016) Bioactivation of phytoestrogens: intestinal bacteria and health. Critical Rev Food Sci Nutr 56:1826–1843. https://doi.org/10.1080/10408398.2013.789823
Lei Y, Hannoufa A, Christensen D, Yu P (2020) Synchrotron-radiation sourced SR-IMS molecular spectroscopy to explore impact of silencing TT8 and HB12 genes in alfalfa leaves on the molecular structure and chemical mapping. Spectrochim Acta A Mol Biomol Spectrosc 243:118676. https://doi.org/10.1016/j.saa.2020.118676
Markulin L, Corbin C, Renouard S, Drouet S, Gutierrez L, Mateljak I, Auguin D, Hano C, Fuss E, Lainé E (2019a) Pinoresinol-lariciresinol reductases, key to the lignan synthesis in plants. Planta 249(6):1695–1714. https://doi.org/10.1007/s00425-019-03137-y
Markulin L, Corbin C, Renouard S, Drouet S, Durpoix C, Mathieu C, Lopez T, Auguin D, Hano C, Lainé É (2019b) Characterization of LuWRKY36, a flax transcription factor promoting secoisolariciresinol biosynthesis in response to Fusarium oxysporum elicitors in Linum usitatissimum L. hairy roots. Planta 250(1):347–366. https://doi.org/10.1007/s00425-019-03172-9
Marquet P (2012) LC-MS vs. GC-MS, online extraction systems, advantages of technology for drug screening assays. Methods Mol Biol 902:15–27. https://doi.org/10.1007/978-1-61779-934-1_2
Mattila P, Mäkine S, Eurola M, Jalava T, Pihlava J-M, Hellström J, Pihlanto A (2018) Nutritional value of commercial protein-rich plant products. Plant Foods Hum Nutr 73:108–115. https://doi.org/10.1007/s11130-018-0660-7
Mierziak J, Wojtasik W, Kulma A, Dziadas M, Kostyn K, Dymińska L, Hanuza J, Żuk M, Szopa J (2020) 3-Hydroxybutyrate is active compound in flax that upregulates genes involved in DNA methylation. Int J Mol Sci 21(8):2887.
https://doi.org/10.3390/ ijms21082887 Milward EA,Shahandeh A, Heidari M, Johnstone DM, Daneshi N, Hondermarck H (2016) Transcriptomics. In: Bradshaw RA, Stahl PD (eds) Encyclopedia of cell biology. Academic Press, pp 160–165. https://doi.org/ 10.1016/B978-0-12-394447-4.40029-5 Misra BB (2018) New tools and resources in metabolomics: 2016–2017. Electrophoresis 39(7):909–923. https://doi.org/10.1002/elps.201700441. Epub 25 Jan 2018 Morello L, Pydiura N, Galinousky D, Blume Y, Breviario D (2020) Flax tubulin and CesA superfamilies represent attractive and challenging targets for a variety of genome- and base-editing applications. Funct Integr Genomics 20(1):163–176. https://doi. org/10.1007/s10142-019-00667-2 Morin S, Lecart B, Istasse T, Bailly Maître Grand C, Meddeb-Mouelhi F, Beauregard M, Richel A (2020) Effect of a low melting temperature mixture on the

9

214 surface properties of lignocellulosic flax bast fibers. Int J Biol Macromol 148:851–856. https://doi.org/10. 1016/j.ijbiomac.2020.01.232 Morrison WH III, Akin DE (2001) Chemical composition of components comprising bast tissue in flax. J Agric Food Chem 49:2333–2338 Nachappa P, Challacombe J, Margolies DC, Nechols JR, Whitfield AE, Rotenberg D (2020) Tomato spotted wilt virus benefits its thrips vector by modulating metabolic and plant defense pathways in tomato. Front Plant Sci 18(11):575564. https://doi.org/10.3389/fpls. 2020.575564 Nadeem M, Ahmad W, Zahir A, Hano C, Abbasi BH (2018) Salicylic acid-enhanced biosynthesis of pharmacologically important lignans and neo lignans in cell suspension culture of Linum ussitatsimum L. Eng Life Sci 19(3):168–174. https://doi.org/10.1002/elsc. 201800095 Nagana Gowda GA, Raftery D (2021) NMR-Based metabolomics. In: Hu S (eds) Cancer metabolomics. Advances in experimental medicine and biology, vol 1280. Springer, Cham. https://doi.org/10.1007/978-3030-51652-9_2 Ndou SP, Kiarie E, Walsh MC, Nyachoti CM (2018) Nutritive value of flaxseed meal fed to growing pigs. Anim Feed Sci Technology 238:123–129. https://doi. org/10.1016/j.anifeedsci.2018.02.009 Niedzwiedz-Siegien I (1998) Cyanogenic glucosides in Linum usitatissimum. Phytochem 49:59–63 Ntiamoah C, Rowland GG, Taylor DC (1995) Inheritance of elevated palmitic acid in flax and its relationship to the low linolenic acid. Crop Sci 35:148–152 Ogborn MR, Nitschmann E, Bankovic-Calic N, Weiler HA, Aukema H (2002) Dietary flax oil reduces renal injury, oxidized LDL content, and tissue n-6/n-3 FA ration in experimental polycystic kidney disease. Lipids 37:1059–1065 Oomah BD, Mazza G, Kenaschuk EO (1992) Cyanogenic compounds in flaxseed. J Agr Food Chem 40:1346– 1348 Oya M, Suzuki H, Anas ARJ, Oishi K, Yamaguch S, Eguch M, Sawada M (2018) LC-MS/MS imaging with thermal film-based laser microdissection. Anal Bioanal Chem 410:491–499. 
https://doi.org/10.1007/ s00216-017-0739-2 Pan X, Siloto RM, Wickramarathna AD, Mietkiewska E, Weselake RJ (2013) Identification of a pair of phospholipid:diacylglycerol acyltransferases from developing flax (Linum usitatissimum L.) seed catalyzing the selective production of trilinolenin. J Biol Chem 288(33):24173–24188. https://doi.org/10.1074/ jbc.M113.475699 Pan X, Peng FY, Weselake RJ (2015a) Genome-wide analysis of phospholipid:diacylglycerol acyltransferase (PDAT) genes in plants reveals the eudicotwide PDAT gene expansion and altered selective pressures acting on the core eudicot PDAT paralogs. Plant Physiol 167(3):887–904. https://doi.org/10. 1104/pp.114.253658

A. Somalraju and B. Fofana Pan X, Chen G, Kazachkov M, Greer MS, Caldo KMP, Zou J, Weselake RJ (2015b) In vivo and in vitro evidence for biochemical coupling of reactions catalyzed by lysophosphatidylcholine acyltransferase and diacylglycerol acyltransferase. J Biol Chem 290(29):18068– 18078. https://doi.org/10.1074/jbc.M115.654798 Patel D, Vaghasiya J, Pancholi SS, Paul A (2012) Therapeutic potential of secoisolariciresinol diglucoside: a plant lignan. Inter J Pharm Sci Drug Res 4 (1):15–18 Petrova N, Nazipova A, Gorshkov O, Mokshina N, Patova O, Gorshkova T (2021) Gene expression patterns for proteins with lectin domains in flax stem tissues are related to deposition of distinct cell wall types. Front Plant Sci 12:634594. https://doi.org/10. 3389/fpls.2021.634594 Pontarin N, Molinié R, Mathiron D, Tchoumtchoua J, Bassard S, Gagneul D, Thiombiano B, Demailly H, Fontaine JX, Guillot X, Sarazin V, Quéro A, Mesnard F (2020) Age-dependent metabolic profiles unravel the metabolic relationships within and between flax leaves (Linum usitatissimum). Metabolites 10:218. https://doi.org/10.3390/metabo10060218 Prakash D, Gupta C (2011) Role of phytoestrogens as nutraceuticals in human health. Pharmacol Online 1:510–523 Preisner M, Kulma A, Zebrowski J, Dymińska L, Hanuza J, Arendt M, Starzycki M, Szopa J (2014) Manipulating cinnamyl alcohol dehydrogenase (CAD) expression in flax affects fibre composition and properties. BMC Plant Biol 14:50. https://doi.org/ 10.1186/1471-2229-14-50 Radovanovic N, Thambugala D, Duguid S, Loewen E, Cloutier S (2014) Functional characterization of flax fatty acid desaturase FAD2 and FAD3 isoforms expressed in yeast reveals a broad diversity in activity. Mol Biotechnol 56(7):609–620. https://doi.org/10. 1007/s12033-014-9737-1 Raman T, O’Connor TP, Hackett NR, Wang W, Harvey BG, Attiyeh MA, Dang DT, Teater M, Crystal RG (2009) Quality control in microarray assessment of gene expression in human airway epithelium. BMC Genomics 10:493. 
https://doi.org/10.1186/1471-216410-493 Ramsay A, Fliniaux O, Quéro A, Molinié R, Demailly H, Hano C, Paetz C, Roscher A, Grand E, Kovensky J, Schneider B, Mesnard F (2017) Kinetics of the incorporation of the main phenolic compounds into the lignan macromolecule during flaxseed development. Food Chem 15(217):1–8. https://doi.org/10. 1016/j.foodchem.2016.08.039 Renouard S, Tribalatc MA, Lamblin F, Mongelard G, Fliniaux O, Corbin C, Marosevic D, Pilard S, Demailly H, Gutierrez L, Hano C, Mesnard F, Lainé E (2014) RNAi-mediated pinoresinol lariciresinol reductase gene silencing in flax (Linum usitatissimum L.) seed coat: consequences on lignans and neolignans accumulation. J Plant Physiol. 171(15):1372–1377. https://doi.org/10.1016/j.jplph.2014.06.005

Metabolomics and Transcriptomics-Based Tools …

215

Roach MJ, Deyholos MK (2007) Microarray analysis of flax (Linum usitatissimum L.) stems identifies transcripts enriched in fibre-bearing phloem tissues. Mol Genet Genomics 278(2):149–165. https://doi.org/10. 1007/s00438-007-0241-1 Roberts PC (2008) Gene expression microarray data analysis demystified. Biotechnol Annu Rev 14:29–61. https://doi.org/10.1016/S1387-2656(08)00002-1 Russo G, Zegar C, Giordano A (2003) Advantages and limitations of microarray technology in human cancer. Oncogene 22:6497–6507. https://doi.org/10.1038/sj. onc.1206865 Scheible W-R, Krapp A, Stitt M (2000) Reciprocal diurnal changes of phosphoenolpyruvate carboxylase expression and cytosolic pyruvate kinase, citrate synthase and NADP-isocitrate dehydrogenase expression regulate organic acid metabolism during nitrate assimilation in tobacco leaves. Plant Cell Env 23:1155–1167. https://doi.org/10.1046/j.1365-3040. 2000.00634.x Schrimpe-Rutledge AC, Codreanu SG, Sherrod SD, McLean JA (2016) Untargeted metabolomics strategies-challenges and emerging directions. J Am Soc Mass Spectrom 27:1897–1905. https://doi.org/10. 1007/s13361-016-1469-y Sebei K, Debez A, Herchi W, Boukhchina S, Kallel H (2007) Germination kinetics and seed reserve mobilization in two flax (Linum usitatissimum L.) cultivars under moderate salt stress. J Plant Biol 50:447–454 Sharif HR, Williams PA, Sharif MK, Khan MA, Majeed H, Safdar W, Shamoon M, Shoaib M, Haider J Zhong F (2017) Influence of OSA-starch on the physico chemical characteristics of flax seed oil-eugenol nanoemulsions, Food Hydrocolloids 66:365–377 Siemens BJ, Daun JK (2005) Determination of the fatty acid composition of canola, flax, and solin by nearinfrared spectroscopy. J Am Oil Chem Soc 82:153– 157 Singh S, McKinney S, Green A (1994) Sequence of a cDNA from Linum usitatissimum encoding the stearoyl-ACP carrier protein desaturase. 
Plant Physiol 140:1075 Sorensen BM, Furukawa-Stoffer TL, Marshall KS, Page EK, Mir Z, Forster RJ, Weselake RJ (2005) Storage lipid accumulation and acyltransferase action in developing flaxseed. Lipids 40:1043–1049 Stitt M (2013) Progress in understanding and engineering primary plant metabolism. Curr Opinion Biotech 24:229–238. https://doi.org/10.1016/j.copbio.2012. 11.002 Struijs K, Vincken JP, Doeswijk TG, Voragen AG, Gruppen H (2009) The chain length of lignan macromolecule from flaxseed hulls is determined by the incorporation of coumaric acid glucosides and ferulic acid glucosides. Phytochemistry 70:262–269. https://doi.org/10.1016/j.phytochem.2008.12.015 Thiombiano B, Gontier E, Molinié R, Marcelo P, Mesnard F, Dauwe R (2020) An untargeted liquid chromatography-mass spectrometry-based workflow

for the structural characterization of plant polyesters. Plant J 102:1323–1339. https://doi.org/10.1111/tpj. 14686 Thole JM, Nielsen E (2008) Phosphoinositides in plants: novel functions in membrane trafficking. Curr Opin Plant Biol 11:620–631 Thompson TE (2020) “Lipid”. Encyclopedia Britannica, 21 Feb 2020. https://www.britannica.com/science/ lipid. Accessed 21 Aug 2021 Thompson LU, Boucher BA, Liu Z, Cotterchio M, Kreiger N (2006) Phytoestrogen content of foods consumed in Canada, including isoflavones, lignans, and coumestan. Nutr Cancer 54:184–201. https://doi. org/10.1207/s15327914nc5402_5 Tombuloglu H (2020) Genome-wide identification and expression analysis of R2R3, 3R- and 4R-MYB transcription factors during lignin biosynthesis in flax (Linum usitatissimum). Genomics 112(1):782–795. https://doi.org/10.1016/j.ygeno.2019.05.017 Trevisan AJB, Arêas JAG (2012) Development of corn and flaxseed snacks with high-fibre content using response surface methodology (RSM). Inter J Food Sci Nutr 63:362–367. https://doi.org/10.3109/ 09637486.2011.629179 US-FDA (2009) High linolenic acid flaxseed oil-GRN no. 256. https://www.accessdata.fda.gov/scripts/fdcc/? set=GRASNotices&id=256. Accessed 2 Aug 2021 Vahlensieck C, Thiel CS, Adelmann J, Lauber BA, Polzer J, Ullrich O (2021) Rapid transient transcriptional adaptation to hypergravity in jurkat T cells revealed by comparative analysis of microarray and RNA-Seq data. Int J Mol Sci 22(16):8451. https://doi. org/10.3390/ijms22168451 van Vliet S, Bain JR, Muehlbauer MJ, Provenza FD, Kronberg SL, Pieper CF, Huffman KM (2021) A metabolomics comparison of plant-based meat and grass-fed meat indicates large nutritional differences despite comparable nutrition facts panels. Sci Rep 11:13828. 
https://doi.org/10.1038/s41598-021-931003 Venglat P, Xiang D, Qiu S, Stone SL, Tibiche C, Cram D, Alting-Mees M, Nowak J, Cloutier S, Deyholos M, Bekkaoui F, Sharpe A, Wang E, Rowland G, Selvaraj G, Datla R (2011) Gene expression analysis of flax seed development. BMC Plant Biol 11:74. https:// doi.org/10.1186/1471-2229-11-74 Veronesi G, Koudouna E, Cotte M, Martin FL, Quantock AJ (2013) X-ray absorption near-edge structure (XANES) spectroscopy identifies differential sulfur speciation in corneal tissue. Anal Bioanal Chem 405 (21):6613–6620. https://doi.org/10.1007/s00216-0137120-x Vrinten P, Hu Z, Munchinsky MA, Rowland G, Qiu X (2005) Two FAD3 desaturase genes control the level of linolenic acid in flax seed. Plant Physiol 139:79–87 Wanasundara PKJPD, Amarowicz R, Kara MT, Shahidi F (1993) Removal of cyanogenic glucoside from flassed meal. Food Chem 48:263–266 Wang Z, Wang T (2011) Dynamic proteomic analysis reveals diurnal homeostasis of key pathways in rice

9

216 leaves. Proteomics 11:225–238. https://doi.org/10. 1002/pmic.201000065 Wang Z, Hobson N, Galindo L, Zhu S, Shi D, McDill J, Yang L, Hawkins S, Neutelings G, Datla R, Lambert G, Galbraith DW, Grassa CJ, Geraldes A, Cronk QC, Cullis C, Dash PK, Kumar PA, Cloutier S, Sharpe AG, Wong GK, Wang J, Deyholos MK (2012) The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads. Plant J 72(3):461–473. https://doi.org/ 10.1111/j.1365-313X.2012.05093.x Wang C, Gong B, Bushel PR, Thierry-Mieg J, ThierryMieg D, Xu J, Fang H, Hong H, Shen J, Su Z, Meehan J, Li X, Yang L, Li H, Łabaj PP, Kreil DP, Megherbi D, Gaj S, Caiment F, van Delft J, Kleinjans J, Scherer A, Devanarayan V, Wang J, Yang Y, Qian HR, Lancashire LJ, Bessarabova M, Nikolsky Y, Furlanello C, Chierici M, Albanese D, Jurman G, Riccadonna S, Filosi M, Visintainer R, Zhang KK, Li J, Hsieh JH, Svoboda DL, Fuscoe JC, Deng Y, Shi L, Paules RS, Auerbach SS, Tong W (2014) The concordance between RNA-seq and microarray data depends on chemical treatment and transcript abundance. Nat Biotechnol 32(9):926–932. https://doi.org/ 10.1038/nbt.3001 Wang Y, Fofana B, Roy M, Ghose K, Yao X-H, Nixon M-S, Nair S, Nyomba GBL (2015) Flaxseed lignan secoisolariciresinol diglucoside improves insulin sensitivity through upregulation of GLUT4 expression in diet-induced obese mice. J Funct Foods 18:1–9 Wang W, Wang L, Wang L, Tan M, Ogutu CO, Yin Z, Zhou J, Wang J, Wang L, Yan X (2021) Transcriptome analysis and molecular mechanism of linseed (Linum usitatissimum L.) drought tolerance under repeated drought using single-molecule long-read sequencing. BMC Genomics 22(1):109. https://doi. org/10.1186/s12864-021-07416-5 Westcott ND, Muir AD (1996) Variation in the concentration of the flax seed lignan concentration with variety, location and year. In: Proceedings of the 56th flax institute of the United States conference. 
Flax Institute of the United States, Fargo, ND, pp 77–80 Wickramarathna AD, Siloto RM, Mietkiewska E, Singer SD, Pan X, Weselake RJ (2015) Heterologous expression of flax phospholipid:diacylglycerol cholinephosphotransferase (PDCT) increases polyunsaturated fatty acid content in yeast and Arabidopsis seeds. BMC Biotechnol 15:63. https://doi.org/10. 1186/s12896-015-0156-6 Wróbel-Kwiatkowska M, Starzycki M, Zebrowski J, Oszmiański J, Szopa J (2007) Lignin deficiency in transgenic flax resulted in plants with improved mechanical properties. J Biotechnol 128(4):919–934. https://doi.org/10.1016/j.jbiotec.2006.12.030 Wu G, Poethig RS (2006) Temporal regulation of shoot development in Arabidopsis thaliana by miR156 and

A. Somalraju and B. Fofana its target SPL3. Development 133:3539–3547. https:// doi.org/10.1242/dev.02521 Xiao Y, Shao K, Zhou J, Wang L, Ma X, Wu D, Yang Y, Chen J, Feng J, Qiu S, Lv Z, Zhang L, Zhang P, Chen W (2021) Structure-based engineering of substrate specificity for pinoresinol-lariciresinol reductases. Nat Commun 12(1):2828. https://doi.org/10. 1038/s41467-021-23095-y Xie D, Dai Z, Yang Z, Tang Q, Deng C, Xu Y, Wang J, Chen J, Zhao D, Zhang S, Zhang S, Su J (2019) Combined genome-wide association analysis and transcriptome sequencing to identify candidate genes for flax seed fatty acid metabolism. Plant Sci 286:98– 107. https://doi.org/10.1016/j.plantsci.2019.06.004 Xu C, Wang W, Wang B, Zhang T, Cui X, Pu Y, Li N (2019) Analytical methods and biological activities of Panax notoginseng saponins: recent trends. J Ethnopharmacol 236:443–465. https://doi.org/10. 1016/j.jep.2019.02.035 Yatskou M, Novikov E, Vetter G, Muller A, Barillot E, Vallar L, Friederich E (2008) Advanced spot quality analysis in two-colour microarray experiments. BMC Res Notes 1:80. https://doi.org/10.1186/1756-0500-180 You FM, Li P, Kumar S, Ragupathy R, Li Z, Fu Y-Bi, Cloutier S (2014) Genome-wide identification and characterization of the gene families controlling fatty acid biosynthesis in flax (Linum usitatissimum L). J Proteomics Bioinform 7:310–326. https://doi.org/10. 4172/jpb.1000334 Yuan H, Guo W, Zhao L, Yu Y, Chen S, Tao L, Cheng L, Kang Q, Song X, Wu J, Yao Y, Huang W, Wu Y, Liu Y, Yang X, Wu G (2021) Genome-wide identification and expression analysis of the WRKY transcription factor family in flax (Linum usitatissimum L.). BMC Genomics 22(1):375. https://doi.org/10. 1186/s12864-021-07697-w Zhang N, Deyholos MK (2016) RNASeq analysis of the shoot apex of flax (Linum usitatissimum) to identify phloem fiber specification genes. Front Plant Sci 7:950. 
https://doi.org/10.3389/fpls.2016.00950 Zhang T, Li Z, Song X, Han L, Wang L, Zhang J, Long Y, Pei X (2020) Identification and characterization of microRNAs in the developing seed of linseed flax (Linum usitatissimum L.). Int J Mol Sci 21 (8):2708. https://doi.org/10.3390/ijms21082708 Zohary D (1999) Monophyletic vs. polyphyletic origin of the crops on which agriculture was founded in the Near East. Genet Res Crop Evol 46:133–142 Zuk M, Pelc K, Szperlik J, Sawula A, Szopa J (2020) Metabolism of the cyanogenic glucosides in developing flax: metabolic analysis, and expression pattern of genes. Metabolites 10(7):288. https://doi.org/10.3390/ metabo10070288

10 Genome-Wide Prediction of Disease Resistance Gene Analogs in Flax

Pingchuan Li and Frank M. You

P. Li · F. M. You (✉), Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, 960 Carling Avenue, Ottawa, ON K1A 0C6, Canada

© His Majesty the King in Right of Canada, as represented by the Minister of Agriculture and Agri-Food 2023. F. M. You and B. Fofana (eds.), The Flax Genome, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-16061-5_10

10.1 Introduction

Studies of the inheritance of plant disease resistance date back more than 100 years and have produced many important hypotheses that advanced plant pathology. In 1894, Eriksson, a Swedish plant pathologist, showed that different races of cereal rust fungi could be distinguished only by their pathogenicity on cereal hosts, not by their morphology (Eriksson and Henning 1894). In 1915, Stakman described the resistance reactions of plants as hypersensitive responses while studying cereal rusts (Stakman 1915). In 1946, Flor proposed the well-known gene-for-gene hypothesis: for each gene controlling resistance in the host, there is a corresponding gene controlling avirulence in the pathogen, and vice versa. A direct interaction between the products of a resistance gene and an avirulence gene was later observed between tomato Pto and Pseudomonas syringae avrPto in the yeast two-hybrid system (Scofield et al. 1996). In 1963, Van der Plank postulated that all disease resistance in plants could be classified into two categories (Plank 1963): (1) vertical resistance, which is controlled by a few major resistance genes that are strongly effective against one or a few specific races of a pathogen, and (2) horizontal resistance, which is determined by multiple minor-effect genes but is effective against multiple races of a pathogen species.

In 2006, Jones and Dangl proposed a coevolutionary model of plant–pathogen interaction, called the "zigzag" model, to describe how the plant immune system works. The model involves two branches. The first branch, pattern-triggered immunity (PTI), is triggered when pathogen-associated or microbe-associated molecular patterns (PAMPs/MAMPs) are perceived, usually by transmembrane receptors. The second branch, effector-triggered immunity (ETI), is activated when pathogen effectors, which can suppress PTI, are recognized, usually within the cell (Jones and Dangl 2006). The zigzag model has limitations; for example, its framework is neither quantitative nor predictive for the study of plant–microbe interactions (Pritchard and Birch 2014). Nevertheless, it is widely used and is considered a good approximation of the evolutionary order of the components that make up the plant immune system.

Generally, plants monitor pathogen infection through a sophisticated biochemical immune system that operates in both extracellular and intracellular compartments. Resistance at the plasma membrane is conferred by cell-surface pattern-recognition receptors (PRRs) that recognize general elicitors, including PAMPs/MAMPs. However, this first layer of immunity can be defeated by specific pathogen effectors (Gohre et al. 2008; Macho and Zipfel 2015). Plants have also evolved another type of sensor, the resistance (R) proteins, which recognize enzymatic modifications of plant components as well as specific effectors secreted into the cell by the invading pathogen, and which in turn trigger the ETI cascade (Jones and Dangl 2006). To date, over 300 R genes against different fungal or bacterial races have been cloned from multiple species, including Arabidopsis, tomato, tobacco, flax, rice, maize, wheat, barley, and others (Sekhwal et al. 2015; Kourelis and van der Hoorn 2018). Almost all cloned R genes act either intracellularly or extracellularly, interacting with an elicitor or effector to activate the hypersensitive response.

These pioneering studies gave breeders important information for introducing broad-spectrum resistance into target species through introgression. As earlier studies revealed, some resistances are conferred by multiple minor genes and others by a few major genes. Pyramiding these genes into one cultivar through breeding programs is an effective way to achieve comprehensively enhanced resistance. This approach has been successfully applied to rice (Pradhan et al. 2015; Ramalingam et al. 2020), soybean (Meinhardt et al. 2021), barley (Sharma Poudel et al. 2018), wheat (Sharma et al. 2021), and maize (Zhu et al. 2018). At the same time, large-scale prediction of resistance gene analogs based on the known structures of these R genes has become an effective strategy to accelerate the identification of resistance genes and their application in crop breeding programs. In this chapter, we summarize the major categories of resistance genes and introduce several computational pipelines for resistance gene prediction. We also present the profiles of the RGAs found in the flax genome.

10.2 Classification of Plant Resistance Gene Analogs

Plant resistance gene analogs (RGAs) are candidate R genes: they possess conserved domains and motifs similar to those of R genes, which play specific roles in interactions with pathogen avirulence genes. These structural domains and motifs allow RGAs to be identified computationally. RGAs comprise two groups, PRRs and R genes (Sekhwal et al. 2015). The majority of the characterized PRRs are either surface-localized receptor-like protein kinases (RLKs) or membrane-associated receptor-like proteins (RLPs) (Monaghan and Zipfel 2012; Böhm et al. 2014; Zipfel 2014). RLKs, such as FLS2 (Gomez-Gomez and Boller 2000), EFR (Zipfel et al. 2006), and XA21 (Song et al. 1995), possess an extracellular sensing domain, a transmembrane (TM) region, and an intracellular protein kinase domain. These three belong to the leucine-rich repeat (LRR)-containing RLKs, whereas CERK1 is a lysine motif (LysM)-type RLK (Shimizu et al. 2010). An amino acid substitution experiment in an LRR indicated that the LRR domain affects ligand binding or the triggering of resistance (Kale et al. 2013). RLPs have a domain architecture similar to that of RLKs but lack a kinase domain in their intracellular region (Fritz-Laylin et al. 2005); examples include Cf-9 (LRR-type) (Jones et al. 1994), Eix1 and Eix2 (LRR-type) (Ron and Avni 2004), and CEBiP (LysM-type) (Kaku et al. 2006).

Most sensors involved in ETI belong to the nucleotide-binding leucine-rich repeat receptor (NLR) family, which usually has hundreds of members. The early identified R genes, including RPM1, RPS2, RPS5, and Prf, all belong to this family (Jones and Dangl 2006). Genes containing the NB-ARC domain and a series of C-terminal leucine-rich repeats are typical members of the NLR family (Kim et al. 2012). Eight major domains or motifs have been found in these R proteins: Toll/interleukin-1 receptor (TIR), coiled-coil (CC), NB-ARC, LRR, leucine zipper (LZ), RPW8 (Xiao et al. 2001), TM, and serine-threonine kinase (STK) (van Ooijen et al. 2007). Of these, TIR:TIR dimerization through self-association across multiple surfaces was recently suggested as a common mechanism of defense signaling (Bernoux et al. 2011; Williams et al. 2016; Zhang et al. 2017). Based on the combinations of these domains, R proteins are categorized into four major classes: (1) NLRs, (2) RLKs, (3) RLPs, and (4) other variants (Sanseverino et al. 2010). According to the combination of the NB-ARC domain with other domains or motifs, the NLR group can be further classified into several subgroups: TIR-NBS-LRR (TNL), CC-NBS-LRR (CNL), CC-NBS (CN), TIR-NBS (TN), NBS-LRR (NL), TIR-X (TX), and RPW8-NLR (RNL) (Fig. 10.1).

Fig. 10.1 Common structures of the major plant R proteins. a Typical domain dissection for TNL and CNL proteins. The conserved motifs are illustrated as colored bars. Domain and motif lengths are not drawn to scale for ease of visualization. b Typical domains and structure of RLKs and RLPs. Unlike an RLK, an RLP has no kinase domain. c Typical domains and structure of RPW8-NLRs (RNLs) and of proteins containing only the RPW8 domain. TIR: Toll/interleukin-1 receptor; NB: nucleotide-binding site; ARC: abbreviated from Apaf-1, R proteins and CED-4; CC: coiled-coil; SP: signal peptide; TM: transmembrane; LRR: leucine-rich repeats

The Ve gene was cloned early from tomato, where it confers resistance to Verticillium wilt. Domain structure analysis suggests that it encodes a class of cell surface glycoproteins (Kawchuk et al. 2001). An assessment of race-specific resistance against Verticillium dahliae race 1 in a tolerant flax cultivar detected four homologs highly similar to Ve1, and structure analysis further indicated that Ve1 is a gene encoding a receptor-like protein (Blum et al. 2021). The low efficiency of resistance gene identification in flax has prompted other approaches to large-scale R gene discovery. A genome-wide association study (GWAS) in a population of 370 flax accessions phenotyped for pasmo resistance revealed that 45 of the 500 detected QTLs spanned 85 RGAs, and that a cluster of TIR domain-containing NLR genes was located on chromosome 8 (He et al. 2018). A genome-wide NBS-LRR similarity search of the draft flax genome against the Pfam database revealed 147 NLR genes of two major types, TNL and CNL, with TNL genes accounting for up to 67% of the NLR genes (Kale et al. 2013). TNL genes also make up most of the flax rust resistance genes at the L, M, N, and P loci (Lawrence et al. 2010).
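The domain-combination rules for the NLR subgroups listed above can be sketched in a few lines of Python. This is a minimal illustration, not one of the published RGA pipelines: the function name `classify_nlr`, the domain labels, the precedence given to RPW8 and TIR when several N-terminal domains co-occur, and the requirement that an RNL carry both NBS and LRR are all assumptions made for the example.

```python
from typing import Iterable


def classify_nlr(domains: Iterable[str]) -> str:
    """Assign a hypothetical NLR subgroup label from a set of detected domains.

    Domain labels ("TIR", "CC", "NBS", "LRR", "RPW8") are assumed to come
    from an upstream domain search; "NBS" stands for the NB-ARC domain.
    """
    d = {name.upper() for name in domains}
    has_nbs = "NBS" in d
    has_lrr = "LRR" in d

    # Assumed precedence: RPW8 first, then TIR, then CC.
    if has_nbs and has_lrr and "RPW8" in d:
        return "RNL"
    if has_nbs and "TIR" in d:
        return "TNL" if has_lrr else "TN"
    if has_nbs and "CC" in d:
        return "CNL" if has_lrr else "CN"
    if has_nbs and has_lrr:
        return "NL"
    if "TIR" in d:
        # TIR without an NBS domain: TIR plus unknown or other domains.
        return "TX"
    return "other"


print(classify_nlr(["TIR", "NBS", "LRR"]))   # TNL
print(classify_nlr(["CC", "NBS"]))           # CN
print(classify_nlr(["LRR", "TM", "STK"]))    # other (RLK-like, not an NLR)
```

Proteins that fall through to "other" would, in a fuller scheme, be tested against the RLK and RLP architectures instead.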

10.3 Experimental Methods for Resistance Gene Identification

Positional cloning and chromosome walking for resistance gene cloning are usually time-consuming and extremely costly, requiring high-resolution mapping populations and a reference genome assembly. One challenge is that if the gene of interest lies in a region of reduced recombination, the map-based cloning strategy may fail to reach it. The cost of whole-genome sequencing and assembly, for a given sequencing depth and assembly quality, has decreased dramatically compared with twenty years ago. Whole-genome sequencing has therefore become an efficient alternative approach for expanding resistance gene discovery.

10.3.1 Gene Cloning Map-based gene cloning is a traditional gene identification method for the mutant phenotypes upon the genetic markers. The mechanism is to progressively narrow down the linkage of markers to the physical location of the target genes controlling the phenotype. Thus, a set of highresolution markers across all chromosomes are usually essential to implement the experiment. However, only a high recombination rate between adjacent markers around the target gene can help locate the target gene. Though telomeres consist of abundant repeat sequences (Zakian 1995), high frequencies of recombination events still occur in this region. This is quite different from the centromere region that mainly comprises a megabasescale tandemly repeated satellite sequence unit that is averagely *178 bp in length, termed the CEN180 satellites (Heslop-Harrison et al. 1999). A recent Arabidopsis centromere assembly indicates that crossover recombinations are suppressed within the centromere regions, and another long terminal repeat class of retrotransposons like ATHILA was found within centromeres. These features make the centromere regions quite different from euchromatin and heterochromatin (Naish et al. 2021). Apparently, genes falling into the centromere-proximal regions make map-based cloning more difficult when the flanking sequence is under a low level of meiotic double-strand break occurrence. Flax rust disease leads to severe seed yield losses and reduces fiber quality. This disease is

P. Li and F. M. You

usually caused by the destructive organism Melampsora lini, a fungus that can overwinter as teliospores on flax debris. A total of 31 genes for rust resistance have been mapped at five loci designated K, L, M, N, and P, which comprise 2, 13, 7, 3, and 6 genes, respectively (Islam 1990). To date, R genes from L, M, N, and P have been cloned (Lawrence et al. 1995; Anderson et al. 1997; Ellis et al. 1999; Dodds et al. 2001). Unlike many other plant rusts that require an alternate host, flax rust can complete its life cycle on the flax plant, mostly on the leaf and stem (Heslop-Harrison et al. 1999). At least thirteen alleles of the L gene (L, L1 to L11, and LH) have been described, each conferring a different rust resistance specificity. Rust-resistant flax varieties can be distinguished by their reaction to a range of flax rust strains carrying the corresponding avirulence genes (Ellis et al. 1999).

The L6 rust resistance gene of flax was cloned by transposon tagging with the maize transposable element Activator (Ac) in two different testcross families (Lawrence et al. 1995). L6 encodes two products of 1294 and 705 amino acids, which result from alternative splicing. The longer product, which contains the leucine-rich repeat, shows partial sequence similarity to the products of the tobacco mosaic virus resistance gene N and the Arabidopsis gene RPS2 (Lawrence et al. 1995). These three genes confer resistance to taxonomically distant pathogens. Domain analysis indicates that the longer L6 product is a typical TNL protein with a TIR-NBS-LRR structure (Jones and Banfield 2017). The M gene is another rust resistance gene, cloned by two strategies: maize Ac tagging and gel blot hybridization with an L6-derived probe (Anderson et al. 1997). The M gene is a typical TNL-type R gene. It shares 78% sequence identity with L6 but differs from L6 in two respects: (1) 37 contiguous amino acids at the N-terminus and (2) two gaps and several mismatched positions in the third exon (Anderson et al. 1997).
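The interval-narrowing principle of map-based cloning described at the start of this section can be sketched in Python. This is only a minimal illustration under simplifying assumptions: the marker names, positions, and genotype calls are hypothetical, the phenotype of each individual is assumed to reveal its genotype at the target gene, and genotyping errors and double crossovers are ignored. A marker that disagrees with the target genotype in any individual is separated from the gene by a recombination event, so the nearest such markers on either side bound the candidate interval.

```python
def narrow_interval(markers, individuals):
    """Return the (left, right) positions in bp that bound the target gene.

    markers: list of (name, position_bp) tuples sorted by position.
    individuals: list of dicts with 'target' (genotype at the target gene,
        inferred from the phenotype) and 'calls' (marker name -> genotype).
    A bound of None means no recombinant marker was found on that side.
    """
    # A marker co-segregates when it matches the target genotype in everyone.
    coseg = [
        all(ind["calls"][name] == ind["target"] for ind in individuals)
        for name, _ in markers
    ]
    if not any(coseg):
        raise ValueError("no marker co-segregates with the phenotype")
    first = coseg.index(True)
    last = len(coseg) - 1 - coseg[::-1].index(True)
    # The nearest recombinant marker on each side bounds the interval.
    left = markers[first - 1][1] if first > 0 else None
    right = markers[last + 1][1] if last + 1 < len(markers) else None
    return left, right


# Hypothetical markers and three hypothetical individuals.
markers = [("m1", 100_000), ("m2", 250_000), ("m3", 400_000), ("m4", 600_000)]
individuals = [
    {"target": "A", "calls": {"m1": "B", "m2": "A", "m3": "A", "m4": "A"}},
    {"target": "B", "calls": {"m1": "B", "m2": "B", "m3": "B", "m4": "A"}},
    {"target": "A", "calls": {"m1": "A", "m2": "A", "m3": "A", "m4": "A"}},
]
print(narrow_interval(markers, individuals))
```

In this toy case the first individual carries a recombination between m1 and m2 and the second one between m3 and m4, so the gene is placed between m1 and m4. In a region with suppressed recombination, no individual would separate the flanking markers from the gene, and the interval could not be narrowed further.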

10

Genome-Wide Prediction of Disease Resistance Gene Analogs in Flax

10.3.2 QTL Mapping for Resistance Genes

Pasmo is a fungal disease caused by Septoria linicola (Islam et al. 2021). Unlike rust resistance, pasmo resistance is primarily controlled by multiple genes with small effects. The genome-wide association study (GWAS) has been a leading approach for dissecting complex traits and identifying novel and superior alleles in the breeding of many crops (Suwarno et al. 2015; Pantaliao et al. 2016; Sun et al. 2017; Ledesma-Ramirez et al. 2019). Using a diversity panel of 380 flax accessions and 258,873 SNP markers, 67 large-effect QTLs were identified, explaining 32–64% of the total variation in pasmo resistance, and 45 of these QTLs spanned 85 RGAs, including a TNL cluster on chromosome 8 (He et al. 2018).

Powdery mildew is another severe disease, caused by the fungus Oidium lini. Although early seeding can reduce the impact of this disease by avoiding early infections and the buildup of epidemics, the most economical way to control the disease is the use of resistant varieties. To date, several QTLs associated with powdery mildew resistance have been identified. Several RLK genes, one NBS gene, and one TM-CC gene are located near these QTLs (You and Cloutier 2020).
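The statement that a QTL "spans" RGAs amounts to an interval overlap test between QTL regions and gene coordinates. The following Python sketch shows one way such a lookup could be done. The QTL names, gene identifiers, and coordinates are hypothetical and are not taken from He et al. (2018), whose exact procedure is not described here.

```python
def rgas_in_qtls(qtls, rgas):
    """Map each QTL name to the RGAs that overlap its interval.

    qtls: list of (name, chromosome, start_bp, end_bp).
    rgas: list of (gene_id, chromosome, start_bp, end_bp).
    """
    hits = {}
    for q_name, q_chr, q_start, q_end in qtls:
        hits[q_name] = [
            gene
            for gene, g_chr, g_start, g_end in rgas
            # Two intervals overlap when each starts before the other ends.
            if g_chr == q_chr and g_start <= q_end and g_end >= q_start
        ]
    return hits


# Hypothetical coordinates for illustration only.
qtls = [
    ("QTL_a", "Chr8", 18_000_000, 19_000_000),
    ("QTL_b", "Chr5", 1_000_000, 1_200_000),
]
rgas = [
    ("rga1", "Chr8", 18_100_000, 18_104_000),
    ("rga2", "Chr8", 18_990_000, 19_010_000),
    ("rga3", "Chr5", 15_500_000, 15_503_000),
]

hits = rgas_in_qtls(qtls, rgas)
n_qtls_with_rga = sum(1 for genes in hits.values() if genes)
n_rgas_spanned = len({g for genes in hits.values() for g in genes})
print(hits, n_qtls_with_rga, n_rgas_spanned)
```

Counting the QTLs with at least one hit and the distinct genes hit gives the two kinds of numbers reported in the text (QTLs spanning RGAs, and RGAs spanned).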

10.4 Computational Methods for RGA Identification

RGAs have distinctive sequence features and protein domains or motifs, which allows them to be identified by computational approaches. To date, at least 10 RGA prediction pipelines have been developed for different types of RGAs; Table 10.1 lists the major pipelines and their features. Two major methods have been used to predict RGAs from gene sequences during the past ten years: (1) conventional similarity search against well-maintained protein and domain databases, and (2) machine learning-based RGA identification.

The conventional method usually contains two main steps. The first step is to use tools such as BLAST, FastA, or HMMER to search the input, either nucleotide or protein sequences of genes, against well-annotated databases, including nr (the non-redundant protein sequence database) (Sayers et al. 2022), Pfam (Finn et al. 2010), or InterPro (Zdobnov and Apweiler 2001). Positive hits to known domains of R genes are retained for downstream analysis. For example, genes containing NB-ARC or transmembrane (TM) domains are retained. The second step is to analyze the composition of domains and motifs and classify the genes into different RGA types (Fig. 10.1) based on the domain configurations of known R genes. Newly identified R genes, including their domains and configuration information, can be further used to improve the RGA prediction pipeline. NLR-parser (Steuernagel et al. 2015), RGAugury (Li et al. 2016), and NLRtracker (Kourelis et al. 2021) are three typical RGA prediction pipelines developed using the conventional homology search-based approach.

Machine learning-based prediction trains a classifier on a dataset of known examples. The support vector machine (SVM) is the algorithm most often used to classify potential RGAs. Its goal is to learn a classifier, or hyperplane, from as many measurable parameters as possible, derived from the compositional frequencies of resistance gene sequences, in order to distinguish them from non-resistance genes. NBSPred (Kushwaha et al. 2016), prPred (Wang et al. 2020), and DRPPP (Pal et al. 2016) are three machine learning-based pipelines for the annotation of the NLR family. The machine learning methods are highly flexible and perform well in gene prediction. Compared to machine learning-based prediction, the conventional similarity search-based method has low efficiency.
Usually, similarity search against multiple large databases is quite challenging because the search tools are computationally slow, especially at a genome-wide level. Low sensitivity is another weakness of the conventional computational approach. For example, LRRs are highly adaptable structural domains that function in protein–protein interactions and are under diversifying selection (Ellis et al. 2000; Jones and Dangl 2006). NLR genes containing NBS-LRR domains not only lack conservation but are also significantly more diverse than expected from random genetic drift, which suggests that selective pressure has driven the evolution of new specificities for recognizing various pathogen Avr proteins (Marone et al. 2013). Such divergent genes risk being missed when the sensitivity cutoff of a pipeline is set improperly.

Table 10.1 Comparison of resistance gene analog (RGA) prediction pipelines

| Pipeline | Input type | Algorithm | Multithread support | RGA categories detected | References |
|---|---|---|---|---|---|
| RGAugury | Amino acid sequences | BLAST/InterProScan similarity alignment | Yes | NLR, RLK, RLP, TM-CC, RPW8 | Li et al. (2016) |
| NLRtracker | Amino acid sequences/transcripts | InterProScan/MEME motif search | Partially supported | NLR, RNL | Kourelis et al. (2021) |
| NLR-Annotator | CDS, genes | MEME motif search | No | NLR | Steuernagel et al. (2020) and Zhang (2020) |
| NLGenomeSweeper | Genomic, CDS | BLAST/HMM similarity alignment + InterProScan | No | NLR | Toda et al. (2020) |
| DRAGO3 | Amino acid sequences, CDS | HMM similarity alignment | Yes | NLR, RLK, RLP, LYK, LYP | Sanseverino et al. (2013) and Calle Garcia et al. (2022) |
| RRGPredictor | Amino acid sequences, CDS | Annotation on InterProScan | Partially supported | NLR, RLP, RLK, RPW8/RNL? MLO? | Santana Silva and Micheli (2020) |
| NLR-parser | Amino acid sequences, CDS | Similarity alignment | No | NLR | Steuernagel et al. (2015) |
| NBSPred | Genomic, amino acid sequences, CDS | SVM | No | NLR | Kushwaha et al. (2016) |
| DRPPP | Amino acid sequences | SVM | No | NLR, RLK, RLP | Pal et al. (2016) |
| prPred | Amino acid sequences | SVM | No | NLR, RLP, RLK | Wang et al. (2020) |

SVM support vector machine, a machine learning algorithm
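The second step of the conventional method, classifying genes by their domain configuration, can be illustrated with a small Python sketch. It is not the rule set of any pipeline in Table 10.1. The class labels follow the NLR subclasses used in this chapter (TNL, CNL, RNL, NL, TN, CN, RN, NBS, TX, OTHER), while the domain names, the function name, and the exact decision rules (for example, treating more than one N-terminal domain type as OTHER) are assumptions made for illustration.

```python
# One-letter codes for the N-terminal domains (names are assumptions).
N_TERMINAL = {"TIR": "T", "CC": "C", "RPW8": "R"}


def classify_nlr(domains):
    """Return an NLR subclass label for a set of detected domain names.

    Returns None when the domain set does not look like an NLR at all.
    """
    d = set(domains)
    n_term = [code for name, code in N_TERMINAL.items() if name in d]
    if len(n_term) > 1:
        return "OTHER"  # assumption: mixed N-terminal domains are unclassified
    prefix = n_term[0] if n_term else ""
    if "NBS" in d:
        if "LRR" in d:
            return prefix + "NL"  # TNL, CNL, RNL, or NL
        return prefix + "N" if prefix else "NBS"  # TN, CN, RN, or NBS
    if prefix == "T":
        return "TX"  # TIR domain without a detected NBS
    return None


for doms in (["TIR", "NBS", "LRR"], ["CC", "NBS"], ["NBS"], ["TIR"]):
    print(doms, "->", classify_nlr(doms))
```

The sensitivity problem discussed above enters before this step: if a divergent LRR falls below the detection cutoff of the domain search, a true TNL gene would reach the classifier as TIR plus NBS only and be labeled TN.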

10.4.1 RGAugury

RGAugury is an efficient genome-wide RGA prediction tool that we developed in 2016 (Li et al. 2016). The pipeline integrates many well-known domain- and motif-recognition tools into one straightforward workflow, including BLAST (Camacho et al. 2009), pfam_scan (Finn et al. 2010), InterProScan (Zdobnov and Apweiler 2001), nCoil (Lupas et al. 1991), and Phobius (Kall et al. 2004), together with curated protein databases: Pfam (Zdobnov and Apweiler 2001), Gene3D (Lewis et al. 2018), SMART (Schultz et al. 2000), Superfamily (Wilson et al. 2009), and Panther (Mi et al. 2016). In addition to identifying the most common RGA subclasses of NLR, RLP, RLK, and TM-CC, RGAugury can also identify typical forms of NLR, including TN, N, CN, TX, and NL. Based on the combinations of different domains or motifs, RGAugury groups those genes into various RGA families (Fig. 10.2). To date, this tool has been broadly applied in the gene annotation of genome sequencing projects, such as rice (Yu et al. 2021), hazelnut (Pavese et al. 2021), Brassica (Park et al. 2021; Yang et al. 2021; Zhang et al. 2021), banana (Rijzaani et al. 2021), and Aegilops tauschii (Wang et al. 2021), and in an RGA evolution study in the Brassicaceae family (Tirnaz et al. 2020).

Fig. 10.2 Flowchart of the resistance gene analog (RGA) identification pipeline. All the modules and part of the databases have been packaged with Docker. The lines and arrows indicate the data flow from input to output. In addition, the Docker image is also compatible with rootless container platforms such as Podman. The RGAugury Docker image can be downloaded from https://hub.docker.com

10.4.1.1 Support for Docker and Podman

RGAugury was initially designed with two operation modes, a web user interface (WebUI) and a command-line version, to serve researchers at all levels (Li et al. 2016). For the command-line version, the major challenge for many users is installing the dependencies on various Perl modules. Frequent updates of Linux may cause a series of changes in the Linux environment. For example, the deprecation of obsolete modules in a newer version of Perl can directly lead to the failure of RGAugury installation or to the loss of the feature that renders RGA gene domains and their genome distribution. To facilitate the installation of the RGAugury pipeline, we developed RGAugury 2.0 with some new features.

In recent years, Docker technology has become popular in the bioinformatics field. Many popular bioinformatics tools, such as BWA (Li and Durbin 2009, 2010), FreeBayes (Garrison and Marth 2012), and the Galaxy workflow (Blankenberg et al. 2014), have Docker versions available to users. Docker is a container platform for easy, convenient, and rapid application development and delivery. It runs applications in an isolated environment, without the risk of damaging the host system. All the code, dependencies, and infrastructure are precompiled and packaged together in an isolated container, which allows rapid deployment without additional installation of dependencies and libraries. Because the application in the Docker container is preconfigured, any further configuration made after deployment is kept in an isolated volume, which eases migration to a different platform and reduces interference with the host environment.

RGAugury 2.0 has been preconfigured with the Docker platform. The image file of the pipeline can be downloaded directly from the Docker Web site (hub.docker.com). In this new version, in terms of software deployment, users only need to install the InterProScan and Pfam databases and mount them to the container; they can then operate the pipeline in either WebUI or command-line mode from the Linux console. The Docker image has also been successfully tested under Podman, an alternative container technology to Docker. The source code of RGAugury 2.0 has been deposited in Bitbucket (https://bitbucket.org/yaanlpc/rgaugury/src/master/).

10.4.1.2 Support for the RNL (RPW8)

The RPW8 domain was found in several broad-spectrum mildew resistance proteins from Arabidopsis and other dicots (Xiao et al. 2001, 2003). The A. thaliana locus RPW8 contains two naturally polymorphic and dominant R genes, RPW8.1 and RPW8.2, which individually confer broad-spectrum resistance to powdery mildew pathogens, the agents of a global disease that devastates important agricultural crops. They induce localized, salicylic acid (SA)- and EDS1-dependent defenses similar to those induced by R genes that confer specific resistance in Arabidopsis (Dangl and Jones 2001). Thus, broad-spectrum resistance mediated by RPW8 appears to use the same mechanisms as specific resistance. The RPW8 domain consists of a predicted transmembrane (TM) domain, or possibly a signal peptide, at the N-terminus and a coiled-coil (CC) motif (Zhong and Cheng 2016). More than 35 plant species have been reported to contain RPW8 genes (Zhong and Cheng 2016). RPW8 domain-containing genes have recently emerged as an important category of resistance genes (Jorgensen and Emerson 2009). Thus, we added the identification of RPW8 domain-containing genes to the second version of RGAugury. To emphasize these new RPW8 genes, the output of RGAugury 2.0 presents RPW8-containing genes as a new group in addition to NLR, RLK, RLP, and TM-CC. RPW8-containing proteins are categorized as the RNL (RPW8-NLR) group under the NLR category when an NBS domain is detected in the N terminus, or as a separate RPW8 group when no NBS domain is detected (Kourelis et al. 2021).

10.4.2 Machine Learning-Based RGA Annotation Pipelines

In addition to the homology search-based pipelines, such as RGAugury and NLRtracker, we review here three machine learning-based pipelines.

The pipeline NBSPred was developed using the SVMlight package to generate the SVM classifier for NBS-LRR protein prediction (Kushwaha et al. 2016). The best classifier is identified through fivefold cross-validation (Kushwaha et al. 2016), an iterative training procedure that uses different kernel functions, such as linear, sigmoid, polynomial, and radial basis function (RBF), to create an optimal hyperplane that separates the positive from the negative dataset. Six types of parameters, namely amino acid frequency, dipeptide frequency, tripeptide frequency, multiplet frequency, electronic charge, and hydrophobicity composition, are constructed as numerical feature vectors from a known positive dataset and a known negative dataset, respectively, to generate an SVM classifier that separates resistance genes from non-resistance genes. To achieve the best classifier, both the positive and negative datasets were divided equally into five subsets; four subsets of each were used as the training dataset, and the remaining one was used as the testing dataset. Because the different parameters and SVM kernels yielded about 588 models in the present configuration, the performance of each model was evaluated by three measures: specificity (SP), sensitivity (SN), and Matthews correlation coefficient (MCC). NBSPred adopted the best ten models, using RBF and polynomial kernels, to work together on real input data. However, the accuracy of tests on independent datasets was not reported at the time of release.

DRPPP is another machine learning-based computational pipeline, developed with the LIBSVM package (Chang and Lin 2011). Its algorithms and feature extraction are similar to those of NBSPred. However, DRPPP used tenfold cross-validation to obtain the best classifier for R gene prediction. DRPPP achieved up to 91% accuracy, compared to 83% for NBSPred, when using independent datasets (Pal et al. 2016). Moreover, NBSPred only predicts NLR genes because of the limitation of its training dataset, whereas DRPPP can also predict RLK and RLP genes.

prPred is also a machine learning pipeline, which identifies three major types of R genes: NLR, RLK, and RLP (Wang et al. 2020). Initially, five different algorithms were compared in prPred for R gene prediction, and SVM generated the most accurate prediction results (Wang et al. 2020). In this algorithm, a two-step feature selection strategy and additional data processing were used, including the composition of k-spaced amino acid pairs (CKSAAP) and of k-spaced amino acid group pairs (CKSAAGP) with optimization of the k value. In addition, the iFeature software (Chen et al. 2018) was used to generate the feature vectors for the classifier, which simplified the vector construction.
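To make the feature vectors and evaluation measures mentioned above more concrete, the following Python sketch computes two of the compositional features (amino acid and dipeptide frequencies), partitions a dataset into five folds, and derives SN, SP, and MCC from a confusion matrix. It is a simplified illustration using only the standard library, not the code of NBSPred, DRPPP, or prPred. The function names and the example sequence are hypothetical, the SVM training step itself is omitted, and sequences are assumed to be non-empty and to contain only the 20 standard amino acids.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """Frequency of each of the 20 standard amino acids (20 values)."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Frequency of each ordered amino acid pair (400 values)."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [
        pairs.count(a + b) / len(pairs)
        for a, b in product(AMINO_ACIDS, repeat=2)
    ]


def k_fold_splits(items, k=5):
    """Yield (train, test) lists; each fold serves once as the test set."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


def sn_sp_mcc(tp, fp, tn, fn):
    """Sensitivity, specificity, and Matthews correlation coefficient."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, mcc


seq = "MKLLAA"  # hypothetical toy sequence
features = aa_composition(seq) + dipeptide_composition(seq)
print(len(features))  # 20 + 400 features per sequence
for train, test in k_fold_splits(list(range(10))):
    print(len(train), len(test))
print(sn_sp_mcc(tp=90, fp=5, tn=95, fn=10))
```

In a real pipeline, each feature vector would be passed to an SVM implementation with the chosen kernel, and the three measures would be averaged over the folds to rank the candidate models.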

10.4.3 Comparison of RGA Identification Tools

Although different algorithms (similarity search-based or machine learning-based methods) are implemented in RGA identification tools, users are more interested in their performance. The pipelines were designed to identify NLR genes, and several also identify other types of RGAs (RLP, RLK, etc.) (Table 10.1). RPW8-related genes (RNL) can be predicted by RGAugury (v2.0), NLRtracker (Kourelis et al. 2021), and RRGPredictor (Santana Silva and Micheli 2020). Some pipelines provide several input options, such as amino acid sequences (AA), transcripts (CDS), or gene sequences (GENE). NLR-Annotator and NLGenomeSweeper only take nucleotide sequences (CDS and GENE) as input, while RGAugury uses AA sequences. NBSPred also takes whole genome sequence input; in this case, the tool first predicts proteins with Augustus (Stanke and Morgenstern 2005) and then RGAs (Kushwaha et al. 2016).

Recently, Kourelis et al. (2021) compiled a comprehensive collection of experimentally validated plant disease resistance proteins from the NLR family, called RefPlantNLR. This collection contains 481 NLRs from 31 genera, providing a good test dataset for comparing the performance of different pipelines in annotating NLR genes. Using a subset of this dataset, Kourelis et al. (2021) benchmarked six popular NLR gene prediction pipelines: NLR-Annotator (Steuernagel et al. 2020; Zhang 2020), NLGenomeSweeper (Toda et al. 2020), DRAGO2, RGAugury (Li et al. 2016), RRGPredictor (Santana Silva and Micheli 2020), and NLRtracker (Kourelis et al. 2021). Because NLR-Annotator and NLGenomeSweeper only take nucleotide sequences as input, whereas RGAugury only works on protein sequences, they used only the RefPlantNLR entries with CDS information (457/481) for the benchmarking. In addition, only 407 of these entries had gene sequences for analysis. The original data are listed in the supplementary files of the published paper (Appendix S1, RefPlantNLR benchmarking). Based on this dataset, they calculated the sensitivity and specificity of each pipeline. Sensitivity is the proportion of all NLR genes that a pipeline detects as NLR genes, while specificity is the proportion of predicted NLR genes that are assigned the same NLR category as in RefPlantNLR. The results are shown in Table 10.2. NLRtracker had the highest sensitivity (100%) and annotation specificity (100%), predicting all entries of the RefPlantNLR dataset as NLR genes and correctly recognizing all types of domains and motifs. DRAGO2 also had 100% sensitivity, followed by NLR-Annotator and NLGenomeSweeper (98.0%, both using CDS as input), RGAugury (96.9%), and RRGPredictor (95.4%).
Besides NLRtracker, NLR-Annotator had relatively high annotation specificity (88.2%), followed by RRGPredictor (61.9%) and RGAugury (61.1%).

Table 10.2 Performance of six NLR annotation tools

| Tool | Input | Entries of RefPlantNLR | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|
| DRAGO2 | AA | 457 | 100 | 45.2 |
| DRAGO2 | CDS | 457 | 99.3 | |
| NLGenomeSweeper | CDS | 457 | 98.0 | 31.5 |
| NLGenomeSweeper | Gene | 407 | 88.9 | 23.1 |
| NLR-Annotator | CDS | 457 | 98.0 | 88.2 |
| NLR-Annotator | Gene | 407 | 97.3 | 87.0 |
| RGAugury v1.0 | AA | 457 | 96.9 | 61.1 |
| RRGPredictor | AA/CDS | 457 | 95.4 | 61.9 |
| NLRtracker | AA/CDS | 457 | 100 | 100 |

Data source modified from Kourelis et al. (2021)

Notably, different numbers of NLR entries were used in the above benchmarking test, so the comparison of the pipelines may be biased to some extent. The major reason is that the capability to discriminate or annotate some specific types of NLRs varies among pipelines. In addition, RGAugury v1.0 was used in this benchmarking test. Thus, its performance was underestimated compared to RGAugury 2.0, which has recently been improved by adding support for RNL gene identification. Therefore, we repeated the RGA prediction using the new version (2.0) of RGAugury with the same dataset of 481 annotated NLR genes and updated the prediction results of RGAugury. We noticed that only 377 NLRs in the RefPlantNLR dataset have amino acid, CDS, and gene sequences all available. To keep the same gene set for all pipelines across the three types of input, we used only this subset of 377 NLR genes to further analyze pipeline performance. Because the results of NLRtracker were lacking in the supplementary file of the original paper, we compared only the other five pipelines here. The results are summarized in Fig. 10.3. Except for DRAGO2 running on CDS sequences, all pipelines had more than 95% sensitivity. NLR-Annotator predicted all NLR genes when running on CDS sequences, while RGAugury 2.0 retrieved 99.5% of NLR genes, missing only two (Fig. 10.3a). DRAGO2 gained higher sensitivity when using amino acid sequences instead of CDS sequences, whereas both NLR-Annotator and NLGenomeSweeper performed slightly better with CDS sequence input (Fig. 10.3a). Among the five pipelines, NLGenomeSweeper had the highest specificity (94.9%), compared to 67.4% for RRGPredictor, 63.5% for RGAugury, 44.3% for NLR-Annotator, and 36.0% for DRAGO2 (Fig. 10.3b). This result diverges from those reported in the original paper, where the authors used different numbers of NLR genes (457 for AA and CDS input and 407 for gene sequence input) for the calculations (Kourelis et al. 2021). Therefore, it appears that the sample size and the composition of NLR categories in the sample dataset affect the sensitivity and specificity of each pipeline.
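The two benchmark measures, as defined in this section, can be written down in a few lines of Python. The sketch below is a minimal illustration: the function names and the toy gene identifiers are hypothetical, and only the final line uses numbers from the text (two NLR genes missed out of 377).

```python
def sensitivity(n_detected, n_reference):
    """Percentage of reference NLR genes that the pipeline reports as NLR."""
    return 100.0 * n_detected / n_reference


def annotation_specificity(predicted, reference):
    """Percentage of predicted NLR genes whose category matches the reference.

    predicted, reference: dicts mapping gene ID -> NLR category.
    """
    matches = sum(
        1 for gene, cat in predicted.items() if reference.get(gene) == cat
    )
    return 100.0 * matches / len(predicted)


# Hypothetical toy data: four reference genes, three of them predicted.
reference = {"g1": "TNL", "g2": "CNL", "g3": "RNL", "g4": "CNL"}
predicted = {"g1": "TNL", "g2": "CN", "g3": "RNL"}

print(sensitivity(len(predicted), len(reference)))
print(annotation_specificity(predicted, reference))

# From the text: RGAugury 2.0 missed two of the 377 NLR genes.
print(round(sensitivity(377 - 2, 377), 1))
```

Because the denominator of the specificity is the number of predicted genes and the numerator depends on which categories are present, both measures shift when the composition of the test set changes, which is consistent with the divergence noted above.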

Fig. 10.3 Comparison of five popular resistance gene analog (RGA) identification pipelines for identifying NLR genes in terms of sensitivity (a) and specificity (b). A total of 377 manually curated NLR genes that have CDS, gene (GENE), and amino acid (AA) sequences available were used as a common dataset. The different inputs, including CDS, GENE, and AA, were applied according to the input options of each pipeline. Data source: original data from Kourelis et al. (2021)

10.5 RGA Profile of Flax

Using the newly predicted protein-coding genes of seven flax genome assemblies, comprising six assemblies of five L. usitatissimum cultivars and one of L. bienne, a genome-wide identification of RGAs was performed using the latest RGAugury pipeline (v2.0) with the updated protein databases of InterProScan (v5.52) and Pfam (v33.1) and default parameters. The results are listed in Table 10.3. A total of 902 RGAs were identified from the 15 chromosome-scale pseudomolecules of flax (CDC Bethune v2) after the removal of redundancy. RGAugury categorized these RGAs into four major groups: NLR, RLK, RLP, and TM-CC (Table 10.3). In the new version of RGAugury, RPW8-related NLR genes are grouped as RNL genes in the NLR family, while the remaining RPW8 genes are listed as a separate RPW8 group. Out of the 902 RGAs, five RNL and ten RPW8 domain-containing genes are RPW8-related RGAs, which may confer broad-spectrum resistance to powdery mildew in flax. RLK is the most abundant resistance category, accounting for 60.9% of all RGAs. As in other dicot species, both TNL and CNL genes exist in the flax genome. This differs from most monocot species, in which TNL genes were lost in the common ancestor of monocots during genome evolution (Meyers et al. 1999; Pan et al. 2000; Akita and Valkonen 2002; Bai et al. 2002; Cannon et al. 2002; Li et al. 2016). A total of eight CNL and eleven TNL genes with complete structures were predicted in this assembly.

Most NLR, RLP, and RLK genes tend to be clustered at both ends of chromosomes (Fig. 10.4). For example, the cloned TNL gene L6 (Accession# U27081.1) was aligned to a genomic region between 15 and 16 Mbp of chromosome 5, where at least 6 NLR genes were clustered together (Fig. 10.4a). Most NBS-encoding genes occur in gene clusters, indicating extensive gene duplication during evolution (Cheng et al. 2010). Clustering may facilitate sequence exchange via recombination and mispairing during meiosis (Ma et al. 2021). At least 25 NLR genes are located in the region between 18 and 19 Mbp of chromosome 8 (Fig. 10.4a), and this region involves two pasmo QTLs (QTL37 and QTL38) (He et al. 2018). Gene duplication is considered a major force in evolution (Magadum et al. 2013), leading to domain structural variation and neofunctionalization in family members (Ponting and Russell 2002; Magadum et al. 2013; Ma et al. 2021). Like NBS-encoding genes, RLK and RLP genes are also clustered across chromosomes (Fig. 10.4b, c). In contrast, TM-CC genes tend to be dispersed across the 15 chromosomes, although several spots show low-density clustering (Fig. 10.4d).

Table 10.3 Comparison of numbers of resistance gene analogs (RGAs) identified in seven flax assemblies, including one L. bienne and five L. usitatissimum cultivars

| RGA category | Pale flax (L. bienne) | CDC Bethune v1 (L. usitatissimum, linseed) | CDC Bethune v2 (L. usitatissimum, linseed) | Longya-10 (L. usitatissimum, linseed) | Heiya-14 (L. usitatissimum, fiber) | Yiya-5 (L. usitatissimum, fiber) | Atlant (L. usitatissimum, fiber) |
|---|---|---|---|---|---|---|---|
| NLR: CN | 3 | 4 | 3 | 4 | 4 | 3 | 5 |
| NLR: CNL | 15 | 9 | 8 | 15 | 13 | 8 | 13 |
| NLR: NBS | 11 | 12 | 7 | 18 | 11 | 6 | 21 |
| NLR: NL | 35 | 18 | 18 | 26 | 36 | 20 | 38 |
| NLR: RN | 3 | 3 | 3 | 2 | 2 | 3 | 2 |
| NLR: RNL | 6 | 5 | 5 | 6 | 6 | 6 | 6 |
| NLR: TN | 19 | 2 | 4 | 17 | 20 | 7 | 23 |
| NLR: TNL | 90 | 10 | 11 | 93 | 93 | 13 | 95 |
| NLR: TX | 55 | 31 | 28 | 58 | 64 | 25 | 56 |
| NLR: OTHER | 7 | 3 | 3 | 10 | 10 | 2 | 11 |
| RLK | 807 | 515 | 549 | 870 | 862 | 593 | 882 |
| RLP | 115 | 107 | 90 | 107 | 99 | 84 | 115 |
| TM-CC | 221 | 159 | 163 | 246 | 235 | 169 | 243 |
| RPW8 | 12 | 13 | 10 | 13 | 11 | 10 | 10 |
| Total | 1399 | 891 | 902 | 1485 | 1466 | 949 | 1520 |

NBS nucleotide-binding site domain; LRR leucine-rich repeat; CC coiled-coil; TIR Toll/interleukin-1 receptor-like domain; TNL TIR–NBS–LRR; CNL CC–NBS–LRR; TN TIR–NBS; RLK receptor-like protein kinase; RLP receptor-like protein; TM-CC transmembrane coiled-coil protein; TX TIR-unknown/random; RNL RPW8–NBS–LRR; RPW8 RPW8 domain-containing protein
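The column totals of Table 10.3 and the share of RLK genes in CDC Bethune v2 can be checked with a short Python calculation. The counts below are transcribed from Table 10.3. The variable names are illustrative only.

```python
ASSEMBLIES = [
    "Pale flax", "CDC Bethune v1", "CDC Bethune v2", "Longya-10",
    "Heiya-14", "Yiya-5", "Atlant",
]

# RGA counts per category, one value per assembly in the order above.
COUNTS = {
    "CN":    [3, 4, 3, 4, 4, 3, 5],
    "CNL":   [15, 9, 8, 15, 13, 8, 13],
    "NBS":   [11, 12, 7, 18, 11, 6, 21],
    "NL":    [35, 18, 18, 26, 36, 20, 38],
    "RN":    [3, 3, 3, 2, 2, 3, 2],
    "RNL":   [6, 5, 5, 6, 6, 6, 6],
    "TN":    [19, 2, 4, 17, 20, 7, 23],
    "TNL":   [90, 10, 11, 93, 93, 13, 95],
    "TX":    [55, 31, 28, 58, 64, 25, 56],
    "OTHER": [7, 3, 3, 10, 10, 2, 11],
    "RLK":   [807, 515, 549, 870, 862, 593, 882],
    "RLP":   [115, 107, 90, 107, 99, 84, 115],
    "TM-CC": [221, 159, 163, 246, 235, 169, 243],
    "RPW8":  [12, 13, 10, 13, 11, 10, 10],
}
REPORTED_TOTALS = [1399, 891, 902, 1485, 1466, 949, 1520]

# Sum each column and compare with the Total row of the table.
totals = [sum(column) for column in zip(*COUNTS.values())]
for name, total, reported in zip(ASSEMBLIES, totals, REPORTED_TOTALS):
    print(f"{name}: computed {total}, reported {reported}")

# Share of RLK genes among all RGAs in CDC Bethune v2.
v2 = ASSEMBLIES.index("CDC Bethune v2")
rlk_share = 100.0 * COUNTS["RLK"][v2] / totals[v2]
print(f"RLK share in CDC Bethune v2: {rlk_share:.1f}%")
```

The computed column sums agree with the Total row for all seven assemblies, and RLK genes make up about 61% of the RGAs in CDC Bethune v2.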

Across the seven flax assemblies, the number of detected RGAs ranges from 891 in CDC Bethune (v1) to 1520 in Atlant. The differences among assemblies are largely due to differences in assembly size and in the total number of predicted genes (see Table 1.5 in Chapter 1 (You et al. 2023)). The number of predicted genes in each assembly depends on the mRNA sequences used for gene prediction, such as the RNA-seq sequencing depth and the tissue libraries. No obvious evidence supports that the differences are due to different species (wild and cultivated), different morphotypes (fiber and linseed), or different cultivars. However, the differences in RGA numbers among assemblies are primarily attributed to the RLK genes and, to a lesser extent, the TM-CC, TNL, and TX genes (Table 10.3), while the other types of RGAs are of similar numbers.
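The clustering of RGAs along a chromosome described in this section can be detected with a simple distance rule. The Python sketch below groups genes whose neighbours lie closer than a chosen gap. The 10 kb threshold mirrors the stacking distance used for plotting in Fig. 10.4, but its use as a cluster definition here is an assumption, as are the function name and the example coordinates.

```python
def cluster_genes(positions, max_gap=10_000):
    """Group gene start positions (bp) on one chromosome into clusters.

    A gene joins the current cluster when it lies closer than max_gap
    to the previous gene; otherwise it starts a new cluster.
    """
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] < max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


# Hypothetical gene start positions on one chromosome.
positions = [
    15_000_000, 15_004_000, 15_009_000,
    15_400_000, 15_405_000,
    16_200_000,
]
for cluster in cluster_genes(positions):
    print(len(cluster), cluster)
```

A larger gap or a window-based count (for example, genes per Mbp) would be needed to capture broader clusters such as the NLR-rich region between 18 and 19 Mbp of chromosome 8.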

10.6 Conclusion and Perspective

As an important and ancient economic crop, flax is grown worldwide to supply linseed and fiber to the market. To sustain increases in flaxseed and fiber production and to cope with global climate change, the genetic improvement of disease resistance in cultivars is a long-term challenge. With the whole genome assembly and the annotation of coding genes in flax, resistance genes can be identified genome-wide from annotated genomes using bioinformatics tools. The genome-wide RGA prediction tools that have been developed will facilitate the identification and annotation of disease resistance genes. These predicted candidate disease resistance genes can be further validated and applied to studies of resistance inheritance and to resistance breeding in flax. With the reduction in the cost of next-generation sequencing, which has been a game-changer, researchers benefit from high-density SNP markers to perform advanced genetic analyses, including genome-wide QTL mapping, identification of candidate genes, and genomic selection for disease resistance. The ultimate goal is to transfer the results and information to breeding programs.

Fig. 10.4 Distribution of resistance gene analogs (RGAs) in chromosomes of L. usitatissimum in terms of different RGA categories: a NLR, b RLK, c RLP, and d TM-CC. RGA genes are plotted as stacked when the distance between adjacent genes is shorter than 10 kb

References

Akita M, Valkonen JP (2002) A novel gene family in moss (Physcomitrella patens) shows sequence homology and a phylogenetic relationship with the TIR-NBS class of plant disease resistance genes. J Mol Evol 55:595–605
Anderson PA, Lawrence GJ, Morrish BC, Ayliffe MA, Finnegan EJ et al (1997) Inactivation of the flax rust resistance gene M associated with loss of a repeated unit within the leucine-rich repeat coding region. Plant Cell 9:641–651
Bai J, Pennill LA, Ning J, Lee SW, Ramalingam J et al (2002) Diversity in nucleotide binding site-leucine-rich repeat genes in cereals. Genome Res 12:1871–1884
Bernoux M, Ve T, Williams S, Warren C, Hatters D et al (2011) Structural and functional analysis of a plant resistance protein TIR domain reveals interfaces for self-association, signaling, and autoregulation. Cell Host Microbe 9:200–211
Blankenberg D, Von Kuster G, Bouvier E, Baker D, Afgan E et al (2014) Dissemination of scientific software with Galaxy ToolShed. Genome Biol 15:403
Blum A, Castel L, Trinsoutrot-Gattin I, Driouich A, Laval K (2021) Identification of tomato Ve1 homologous proteins in flax and assessment for race-specific resistance in two fiber flax cultivars against Verticillium dahliae Race 1. Plants (Basel) 10
Böhm H, Albert I, Fan L, Reinhard A, Nürnberger T (2014) Immune receptor complexes at the plant cell surface. Curr Opin Plant Biol 20:47–54
Calle Garcia J, Guadagno A, Paytuvi-Gallart A, Saera-Vila A, Amoroso CG et al (2022) PRGdb 4.0: an updated database dedicated to genes involved in plant disease resistance process. Nucleic Acids Res 50:D1483–D1490
Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J et al (2009) BLAST+: architecture and applications. BMC Bioinform 10:421
Cannon SB, Zhu H, Baumgarten AM, Spangler R, May G et al (2002) Diversity, distribution, and ancient taxonomic relationships within the TIR and non-TIR NBS-LRR resistance gene subfamilies. J Mol Evol 54:548–562
Chang C-C, Lin C-J (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2:Article 27
Chen Z, Zhao P, Li F, Leier A, Marquez-Lago TT et al (2018) iFeature: a Python package and web server for features extraction and selection from protein and peptide sequences. Bioinformatics 34:2499–2502
Cheng X, Jiang H, Zhao Y, Qian Y, Zhu S et al (2010) A genomic analysis of disease-resistance genes encoding nucleotide binding sites in Sorghum bicolor. Genet Mol Biol 33:292–297
Dangl JL, Jones JD (2001) Plant pathogens and integrated defence responses to infection. Nature 411:826–833
Dodds PN, Lawrence GJ, Ellis JG (2001) Contrasting modes of evolution acting on the complex N locus for rust resistance in flax. Plant J 27:439–453
Ellis JG, Lawrence GJ, Luck JE, Dodds PN (1999) Identification of regions in alleles of the flax rust resistance gene L that determine differences in gene-for-gene specificity. Plant Cell 11:495–506
Ellis J, Dodds P, Pryor T (2000) Structure, function and evolution of plant disease resistance genes. Curr Opin Plant Biol 3:278–284
Eriksson J, Henning E (1894) Die Hauptresultate einer neuen Untersuchung über die Getreideroste. Z Pflanzenkrkh 4:66–73
Finn RD, Mistry J, Tate J, Coggill P, Heger A et al (2010) The Pfam protein families database. Nucleic Acids Res 38:D211–D222
Fritz-Laylin LK, Krishnamurthy N, Tor M, Sjolander KV, Jones JD (2005) Phylogenomic analysis of the receptor-like proteins of rice and Arabidopsis. Plant Physiol 138:611–623
Garrison E, Marth G (2012) Haplotype-based variant detection from short-read sequencing. arXiv preprint 1207.3907 [q-bio.GN]
Gohre V, Spallek T, Haweker H, Mersmann S, Mentzel T et al (2008) Plant pattern-recognition receptor FLS2 is directed for degradation by the bacterial ubiquitin ligase AvrPtoB. Curr Biol 18:1824–1832
Gomez-Gomez L, Boller T (2000) FLS2: an LRR receptor-like kinase involved in the perception of the bacterial elicitor flagellin in Arabidopsis. Mol Cell 5:1003–1011
He L, Xiao J, Rashid KY, Yao Z, Li P et al (2018) Genome-wide association studies for pasmo resistance in flax (Linum usitatissimum L.). Front Plant Sci 9:1982
Heslop-Harrison JS, Murata M, Ogura Y, Schwarzacher T, Motoyoshi F (1999) Polymorphisms and genomic organization of repetitive DNA from centromeric regions of Arabidopsis chromosomes. Plant Cell 11:31–42
Islam MRMG (1990) A compendium on host genes in flax conferring resistance to flax rust. Plant Breed 104:89–100
Islam T, Vera C, Slaski J, Mohr R, Rashid KY et al (2021) Fungicide management of pasmo disease of flax and sensitivity of Septoria linicola to Pyraclostrobin and Fluxapyroxad. Plant Dis 105:1677–1684
Jones JD, Banfield MJ (2017) Two-faced TIRs trip the immune switch. Proc Natl Acad Sci U S A 114:2445–2446
Jones JD, Dangl JL (2006) The plant immune system. Nature 444:323–329
Jones DA, Thomas CM, Hammond-Kosack KE, Balint-Kurti PJ, Jones JD (1994) Isolation of the tomato Cf-9 gene for resistance to Cladosporium fulvum by transposon tagging. Science 266:789–793
Jorgensen TH, Emerson BC (2009) RPW8 and resistance to powdery mildew pathogens in natural populations of Arabidopsis lyrata. New Phytol 182:984–993
Kaku H, Nishizawa Y, Ishii-Minami N, Akimoto-Tomiyama C, Dohmae N et al (2006) Plant cells recognize chitin fragments for defense signaling through a plasma membrane receptor. Proc Natl Acad Sci U S A 103:11086–11091
Kale SM, Pardeshi VC, Barvkar VT, Gupta VS, Kadoo NY (2013) Genome-wide identification and characterization of nucleotide binding site leucine-rich repeat genes in linseed reveal distinct patterns of gene structure. Genome 56:91–99
Kall L, Krogh A, Sonnhammer EL (2004) A combined transmembrane topology and signal peptide prediction method. J Mol Biol 338:1027–1036
Kawchuk LM, Hachey J, Lynch DR, Kulcsar F, van Rooijen G et al (2001) Tomato Ve disease resistance genes encode cell surface-like receptors. Proc Natl Acad Sci U S A 98:6511–6515
Kim J, Lim CJ, Lee BW, Choi JP, Oh SK et al (2012) A genome-wide comparison of NB-LRR type of resistance gene analogs (RGA) in the plant kingdom. Mol Cells 33:385–392
Kourelis J, van der Hoorn RAL (2018) Defended to the nines: 25 years of resistance gene cloning identifies nine mechanisms for R protein function. Plant Cell 30:285–299
Kourelis J, Sakai T, Adachi H, Kamoun S (2021) RefPlantNLR is a comprehensive collection of experimentally validated plant disease resistance proteins from the NLR family. PLoS Biol 19:e3001124
Kushwaha SK, Chauhan P, Hedlund K, Ahren D (2016) NBSPred: a support vector machine-based high-throughput pipeline for plant resistance protein NBS-LRR prediction. Bioinformatics 32:1223–1225
Lawrence GJ, Finnegan EJ, Ayliffe MA, Ellis JG (1995) The L6 gene for flax rust resistance is related to the Arabidopsis bacterial resistance gene RPS2 and the tobacco viral resistance gene N. Plant Cell 7:1195–1206
Lawrence GJ, Anderson PA, Dodds PN, Ellis JG (2010) Relationships between rust resistance genes at the M locus in flax. Mol Plant Pathol 11:19–32
Ledesma-Ramirez L, Solis-Moya E, Iturriaga G, Sehgal D, Reyes-Valdes MH et al (2019) GWAS to identify genetic loci for resistance to yellow rust in wheat pre-breeding lines derived from diverse exotic crosses.
Front Plant Sci 10:1390 Lewis TE, Sillitoe I, Dawson N, Lam SD, Clarke T et al (2018) Gene3D: extensive prediction of globular domains in proteins. Nucleic Acids Res 46:D435– D439 Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760 Li H, Durbin R (2010) Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26:589–595 Li P, Quan X, Jia G, Xiao J, Cloutier S et al (2016) RGAugury: a pipeline for genome-wide prediction of resistance gene analogs (RGAs) in plants. BMC Genomics 17:852 Lupas A, Van Dyke M, Stock J (1991) Predicting coiled coils from protein sequences. Science 252:1162–1164 Ma Y, Chhapekar SS, Lu L, Oh S, Singh S et al (2021) Genome-wide identification and characterization of

231

NBS-encoding genes in Raphanus sativus L. and their roles related to Fusarium oxysporum resistance. BMC Plant Biol 21:47 Macho AP, Zipfel C (2015) Targeting of plant pattern recognition receptor-triggered immunity by bacterial type-III secretion system effectors. Curr Opin Microbiol 23:14–22 Magadum S, Banerjee U, Murugan P, Gangapur D, Ravikesavan R (2013) Gene duplication as a major force in evolution. J Genet 92:155–161 Marone D, Russo MA, Laido G, De Leonardis AM, Mastrangelo AM (2013) Plant nucleotide binding siteleucine-rich repeat (NBS-LRR) genes: active guardians in host defense responses. Int J Mol Sci 14:7302–7326 Meinhardt C, Howland A, Ellersieck M, Scaboo A, Diers B et al (2021) Resistance gene pyramiding and rotation to combat widespread soybean cyst nematode virulence. Plant Dis 105:3238–3243 Meyers BC, Dickerman AW, Michelmore RW, Sivaramakrishnan S, Sobral BW et al (1999) Plant disease resistance genes encode members of an ancient and diverse protein family within the nucleotide-binding superfamily. Plant J 20:317–332 Mi H, Poudel S, Muruganujan A, Casagrande JT, Thomas PD (2016) PANTHER version 10: expanded protein families and functions, and analysis tools. Nucleic Acids Res 44:D336–D342 Monaghan J, Zipfel C (2012) Plant pattern recognition receptor complexes at the plasma membrane. Curr Opin Plant Biol 15:349–357 Naish M, Alonge M, Wlodzimierz P, Tock AJ, Abramson BW et al (2021) The genetic and epigenetic landscape of the Arabidopsis centromeres. Science 374:eabi7489 Pal T, Jaiswal V, Chauhan RS (2016) DRPPP: a machine learning based tool for prediction of disease resistance proteins in plants. Comput Biol Med 78:42–48 Pan Q, Wendel J, Fluhr R (2000) Divergent evolution of plant NBS-LRR resistance gene homologues in dicot and cereal genomes. J Mol Evol 50:203–213 Pantaliao GF, Narciso M, Guimaraes C, Castro A, Colombari JM et al (2016) Genome wide association study (GWAS) for grain yield in rice cultivated under water deficit. 
Genetica 144:651–664 Park SG, Noh E, Choi S, Choi B, Shin IG et al (2021) Draft genome assembly and transcriptome dataset for European turnip (Brassica rapa L. ssp. rapifera), ECD4 carrying clubroot resistance. Front Genet 12:651298 Pavese V, Cavalet Giorsa E, Barchi L, Acquadro A, Torello Marinoni D et al (2021) Whole-genome assembly of Corylus avellana cv ‘Tonda Gentile delle Langhe’ using linked-reads (10X Genomics). G3 (Bethesda) Plank VD (1963) Vertical and horizontal resistance against potato blight, pp 171–205. https://doi.org/10. 1016/B978-0-12-711450-750017-2 Ponting CP, Russell RR (2002) The natural history of protein domains. Annu Rev Biophys Biomol Struct 31:45–71

232 Pradhan SK, Nayak DK, Mohanty S, Behera L, Barik SR et al. (2015) Pyramiding of three bacterial blight resistance genes for broad-spectrum resistance in deepwater rice variety, Jalmagna. Rice (N Y) 8:51 Pritchard L, Birch PR (2014) The zigzag model of plantmicrobe interactions: is it time to move on? Mol Plant Pathol 15:865–870 Ramalingam J, Raveendra C, Savitha P, Vidya V, Chaithra TL et al (2020) Gene pyramiding for achieving enhanced resistance to bacterial blight, blast, and sheath blight diseases in rice. Front Plant Sci 11:591457 Rijzaani H, Bayer PE, Rouard M, Dolezel J, Batley J et al. (2021) The pangenome of banana highlights differences between genera and genomes. Plant Genome e20100 Ron M, Avni A (2004) The receptor for the fungal elicitor ethylene-inducing xylanase is a member of a resistance-like gene family in tomato. Plant Cell 16:1604–1615 Sanseverino W, Roma G, De Simone M, Faino L, Melito S et al (2010) PRGdb: a bioinformatics platform for plant resistance gene analysis. Nucleic Acids Res 38:D814–D821 Sanseverino W, Hermoso A, D’Alessandro R, Vlasova A, Andolfo G et al (2013) PRGdb 2.0: towards a community-based database model for the analysis of R-genes in plants. Nucleic Acids Res 41:D1167–D1171 Santana Silva RJ, Micheli F (2020) RRGPredictor, a settheory-based tool for predicting pathogen-associated molecular pattern receptors (PRRs) and resistance (R) proteins from plants. Genomics 112:2666–2676 Sayers EW, Bolton EE, Brister JR, Canese K, Chan J et al (2022) Database resources of the national center for biotechnology information. Nucleic Acids Res 50: D20–D26 Schultz J, Copley RR, Doerks T, Ponting CP, Bork P (2000) SMART: a web-based tool for the study of genetically mobile domains. Nucleic Acids Res 28:231–234 Scofield SR, Tobias CM, Rathjen JP, Chang JH, Lavelle DT et al (1996) Molecular basis of gene-forgene specificity in bacterial speck disease of tomato. 
Science 274:2063–2065 Sekhwal MK, Li P, Lam I, Wang X, Cloutier S et al (2015) Disease resistance gene analogs (RGAs) in plants. Int J Mol Sci 16:19248–19290 Sharma A, Srivastava P, Mavi GS, Kaur S, Kaur J et al (2021) Resurrection of wheat cultivar PBW343 Using marker-assisted gene pyramiding for rust resistance. Front Plant Sci 12:570408 Sharma Poudel R, Al-Hashel AF, Gross T, Gross P, Brueggeman R (2018) Pyramiding rpg4- and Rpg1mediated stem rust resistance in barley requires the Rrr1 gene for both to Function. Front Plant Sci 9:1789 Shimizu T, Nakano T, Takamizawa D, Desaki Y, IshiiMinami N et al (2010) Two LysM receptor molecules, CEBiP and OsCERK1, cooperatively regulate chitin elicitor signaling in rice. Plant J 64:204–214

P. Li and F. M. You Song WY, Wang GL, Chen LL, Kim HS, Pi LY et al (1995) A receptor kinase-like protein encoded by the rice disease resistance gene, Xa21. Science 270:1804– 1806 Stakman EC (1915) Relation between Puccinia graminis and plants highly resistant to its attack. J of Agric Res 4:193–200 Stanke M, Morgenstern B (2005) AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Res 33:W465– W467 Steuernagel B, Jupe F, Witek K, Jones JD, Wulff BB (2015) NLR-parser: rapid annotation of plant NLR complements. Bioinformatics 31:1665–1667 Steuernagel B, Witek K, Krattinger SG, RamirezGonzalez RH, Schoonbeek HJ et al (2020) The NLR-annotator tool enables annotation of the intracellular immune receptor repertoire. Plant Physiol 183:468–482 Sun C, Zhang F, Yan X, Zhang X, Dong Z et al (2017) Genome-wide association study for 13 agronomic traits reveals distribution of superior alleles in bread wheat from the Yellow and Huai Valley of China. Plant Biotechnol J 15:953–969 Suwarno WB, Pixley KV, Palacios-Rojas N, Kaeppler SM, Babu R (2015) Genome-wide association analysis reveals new targets for carotenoid biofortification in maize. Theor Appl Genet 128:851–864 Tirnaz S, Bayer PE, Inturrisi F, Zhang F, Yang H et al (2020) Resistance gene analogs in the Brassicaceae: identification, characterization, distribution, and evolution. Plant Physiol 184:909–922 Toda N, Rustenholz C, Baud A, Le Paslier MC, Amselem J et al (2020) NLGenomeSweeper: a tool for genome-wide NBS-LRR resistance gene identification. Genes (Basel) 11 van Ooijen G, van den Burg HA, Cornelissen BJ, Takken FL (2007) Structure and function of resistance proteins in solanaceous plants. Annu Rev Phytopathol 45:43–72 Wang Y, Wang P, Guo Y, Huang S, Chen Y et al (2020) prPred: a predictor to identify plant resistance proteins by incorporating k-spaced amino acid (group) pairs. 
Front Bioeng Biotechnol 8:645520 Wang L, Zhu T, Rodriguez JC, Deal KR, Dubcovsky J et al. (2021) Aegilops tauschii genome assembly Aet v5.0 features greater sequence contiguity and improved annotation. G3 (Bethesda) Williams SJ, Yin L, Foley G, Casey LW, Outram MA et al (2016) Structure and function of the TIR domain from the grape NLR protein RPV1. Front Plant Sci 7:1850 Wilson D, Pethica R, Zhou Y, Talbot C, Vogel C et al (2009) SUPERFAMILY–sophisticated comparative genomics, data mining, visualization and phylogeny. Nucleic Acids Res 37:D380–D386 Xiao S, Ellwood S, Calis O, Patrick E, Li T et al (2001) Broad-spectrum mildew resistance in Arabidopsis thaliana mediated by RPW8. Science 291:118–120

10

Genome-Wide Prediction of Disease Resistance Gene Analogs in Flax

Xiao S, Charoenwattana P, Holcombe L, Turner JG (2003) The Arabidopsis genes RPW8.1 and RPW8.2 confer induced resistance to powdery mildew diseases in tobacco. Mol Plant Microbe Interact 16:289–294 Yang H, Mohd Saad NS, Ibrahim MI, Bayer PE, Neik TX et al (2021) Candidate Rlm6 resistance genes against Leptosphaeria. maculans identified through a genomewide association study in Brassica juncea (L.) Czern. Theor Appl Genet 134:2035–2050 You FM, Cloutier S (2020) Mapping quantitative trait loci onto chromosome-scale pseudomolecules in flax. Methods Protoc 3 You FM, Moumen I, Khan K, Cloutier S (2023) Reference genome sequence of flax. In: You F, Fofana B (eds) The flax genome, compendium of plant genomes. https://doi.org/10.1007/978-3-03116061-5_1 Yu H, Lin T, Meng X, Du H, Zhang J et al (2021) A route to de novo domestication of wild allotetraploid rice. Cell 184(1156–1170):e1114 Zakian VA (1995) Telomeres: beginning to understand the end. Science 270:1601–1607 Zdobnov EM, Apweiler R (2001) InterProScan—an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17:847–848 Zhang W (2020) NLR-Annotator: a tool for de novo annotation of intracellular immune receptor repertoire. Plant Physiol 183:418–420

233

Zhang X, Bernoux M, Bentham AR, Newman TE, Ve T et al (2017) Multiple functional self-association interfaces in plant TIR domains. Proc Natl Acad Sci U S A 114:E2046–E2052 Zhang Y, Edwards D, Batley J (2021) Comparison and evolutionary analysis of Brassica nucleotide binding site leucine rich repeat (NLR) genes and importance for disease resistance breeding. Plant Genome 14: e20060 Zhong Y, Cheng ZM (2016) A unique RPW8-encoding class of genes that originated in early land plants and evolved through domain fission, fusion, and duplication. Sci Rep 6:32923 Zhu X, Zhao J, Abbas HMK, Liu Y, Cheng M et al (2018) Pyramiding of nine transgenes in maize generates high-level resistance against necrotrophic maize pathogens. Theor Appl Genet 131:2145–2156 Zipfel C (2014) Plant pattern-recognition receptors. Trends Immunol 35:345–351 Zipfel C, Kunze G, Chinchilla D, Caniard A, Jones JD et al (2006) Perception of the bacterial PAMP EF-Tu by the receptor EFR restricts Agrobacterium-mediated transformation. Cell 125:749–760

11 Genome-Editing Tools for Flax Genetic Improvement

Vanessa Clemis, Mohsin Zaidi, and Bourlaye Fofana

11.1 Introduction

A genome is the complete, ordered sequence of the four nucleotides (A, C, G, T) that makes up the genetic code of an individual or a species. It is the identity tracer for a given organism and its evolutionary history. In eukaryotes, genomes are physically made of chromosomes, the number of which per cell is fixed in a given organism but varies from species to species (Kowles 2001). In somatic cells, there are generally two sets of length-matched chromosomes, called chromosome pairs or homologous chromosomes (Bozza and Pawlowski 2008). In plant cells, there are three distinct genomes: nuclear, mitochondrial, and plastidial (Daniell et al. 2016). In the plant nuclear genome, the genetic information is organized linearly as chromosomes, whereas it is organized as circular plasmid DNA in the mitochondria and plastids (Daniell et al. 2016), although linear forms of DNA have been reported in the mitochondria and chloroplast (Oldenburg and Bendich 2015, 2016).

In plants, the number of chromosome sets in the nuclear genome can vary from two (diploid) to twelve (dodecaploid) (Ainouche et al. 2009). Nuclear genomes with more than two sets of chromosomes per cell are referred to as polyploid genomes. Polyploidy can arise through three mechanisms: autopolyploidy by genome duplication within a single species (Qiu et al. 2020), allopolyploidy through interspecific hybridization of two or more genomes followed by chromosome doubling (Levin 1983), and segmental allopolyploidy through differentiation within a single species followed by interspecific hybridization and chromosome doubling of the hybrids (Stebbins 1947; Levin 1983; Soltis 1984; Adams and Wendel 2005; Zaman et al. 2019; Edger et al. 2019).

In each genome (nuclear, mitochondrial, chloroplastic), genes are the physical units of inheritance, made of specific sequences of nucleotides arranged one after another. They generally code for proteins and lead to specific characteristics or phenotypes under the control of regulatory elements in specific environmental or developmental conditions (Galli et al. 2020). These regulatory elements can directly surround the genes or be located far away from their target genes, even on different chromosomes. Therefore, genes and regulatory elements are not necessarily physically adjacent, and their interactions may form complex, genome-wide networks of chromosomal interactions (Miele and Dekker 2008; Galli et al. 2020). Taken together, variation in ploidy, gene copy number, gene structure, and regulatory elements contributes to genetic diversity, adaptation, and crop production in a given environment.

Cultivated flax (Linum usitatissimum L. ssp. usitatissimum) is an annual self-pollinated crop of the Linaceae family and has been grown since ancient times for its bast fibers (fiber flax, [Linum usitatissimum L. convar. elongatum (Vav. and Ell.)]) or its oil-producing seeds (linseed or oilseed flax, [convar. mediterraneum (Vav. and Ell.) Kulpa and Danert]) (Zohary 1999; Fofana et al. 2010a). The origin and history of the plant (Allaby et al. 2005; Xie et al. 2018a; Morello et al. 2020), its natural and induced genetic diversity, breeding, biology, botanical description, and economic importance have been reviewed extensively (Green and Marshall 1984; Cullis 2007; Green et al. 2008; Diederichsen et al. 2013; Soto-Cerda et al. 2013, 2021; Xie et al. 2018a; Holme et al. 2019). Flax seed is a health-promoting functional crop generally recognized as safe (GRAS) by USFDA (2009). Hence, the commercial production of genetically modified flax remains an impediment to conquering such markets (Booker et al. 2017). Thus, mutational breeding approaches have been adopted (Green and Marshall 1984; Rowland et al. 1995; Ntiamoah and Rowland 1997; Chantreau et al. 2013; Fofana et al. 2017b). Nonetheless, these approaches are time-consuming and imprecise, and they rely on randomly induced mutations (Carroll 2017; Holme et al. 2019).

Currently, no systematic review on flax gene editing has been made public. A thorough understanding of current genome-editing tools, existing genome-edited flax genetic resources, and target traits appears to be a key driver for developing strategies that will ensure resilient flax crop production in the context of a changing climate. Where and how far can gene editing take the beautiful flax crop in the CRISPR era? This fundamental question will be addressed. In this chapter, we review the current knowledge of genome-editing tools and discuss their applications to flax improvement.

V. Clemis, M. Zaidi, B. Fofana (corresponding author): Charlottetown Research and Development Centre, Agriculture and Agri-Food Canada, 440 University Avenue, Charlottetown, PE C1A 4N6, Canada

© His Majesty the King in Right of Canada, as represented by the Minister of Agriculture and Agri-Food 2023. F. M. You and B. Fofana (eds.), The Flax Genome, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-16061-5_11


11.2 Gene Editing

11.2.1 Definition and History

11.2.1.1 Definition

Gene or genome editing refers to the core technology that uses chimeric, engineered nucleases, made of sequence-specific DNA-binding domains fused to a non-specific DNA cleavage module, to efficiently and precisely alter the genetic material of a living organism by inducing targeted DNA double-strand breaks (DSBs). This DNA cleavage stimulates the cellular DNA repair mechanisms, including error-prone non-homologous end joining (NHEJ) and homology-directed repair (HDR) (Wyman and Kanaar 2006; Gaj et al. 2013). Gene editing can be defined as a correction of nucleotide spelling errors in the DNA sequence, or a correction by insertion or deletion of a desired or unwanted DNA fragment in the genetic code of an organism, similar to editing words in a book.

11.2.1.2 History

The concept of gene editing or gene repair is not new. The mid to late 1990s saw the development of the gene therapy concept, through which mutated genes causing chronic diseases could be repaired and reverted to the wild-type haplotypes using oligonucleotide technologies related to antisense, antigene, or ribozyme (Woolf et al. 1995; Sierakowska et al. 1996; Cole-Strauss et al. 1996; Jones et al. 1996; Verma and Somia 1998; Woolf 1998). However, these technologies were inefficient and labor-intensive (Esvelt and Wang 2013). Novelties in gene editing arose with the advent of engineered nuclease technologies such as meganucleases (Seligman et al. 2002; Stoddard 2005; Silva et al. 2011; de Souza 2012; Davies et al. 2017), zinc finger nucleases (ZFNs) (Jamieson et al. 1994; Greisman and Pabo 1997; Lloyd et al. 2005; Shukla et al. 2009; Davis and Stokoe 2010), transcription activator-like effector nucleases (TALENs) (Moscou and Bogdanove 2009; Boch et al. 2009), and the clustered regularly interspaced short palindromic repeats (CRISPR/Cas9) system (Deveau et al. 2008; Haurwitz et al. 2010; Wiedenheft et al. 2012; Doudna and Charpentier 2014; Travis 2015). Whereas ZFNs and TALENs have proved their efficiency and precision, the CRISPR system has established itself owing to its robustness, efficiency, precision, simplicity, and low cost (Zaman et al. 2019).

11.2.2 Gene-Editing Tools and Processes

11.2.2.1 Gene-Editing Tools

Whereas many gene-editing tools have been described (Woolf et al. 1995; Cole-Strauss et al. 1996; Jones et al. 1996; Verma and Somia 1998; Woolf 1998; Seligman et al. 2002; Stoddard 2005; Moscou and Bogdanove 2009; Boch et al. 2009; Wiedenheft et al. 2012), this review focuses only on the engineered nuclease gene-editing methods ZFNs, TALENs, and CRISPR, with more emphasis on the CRISPR technology (Fig. 11.1).

Fig. 11.1 Overview of potential phenotypic change induced by a nucleotide mutation using ZFN, TALEN, or CRISPR gene-editing tools

ZFNs

Zinc finger nucleases (ZFNs) are engineered chimeric fusions of a zinc finger DNA-binding protein domain (Cys2-His2) and the DNA cleavage endonuclease domain of FokI (Beerli and Barbas 2002; Urnov et al. 2010). Zinc fingers were first identified as repetitive Cys2-His2 domains in the transcription factor IIIA from Xenopus oocytes (Miller et al. 1985; Brown et al. 1985), and ZFN gene-editing technology was originally described by Beerli et al. (1998). The technology relies on the principle that zinc finger proteins can be engineered to recognize virtually any DNA stretch of up to 18 bp, which is long enough to specify a unique position in a complex genome (Nardelli et al. 1992; Beerli and Barbas 2002), and to make a double-strand DNA break (DSB) there. The individual zinc finger domain (Cys2-His2) consists of 30 amino acids in a conserved ββα configuration interacting with zinc (Miller et al. 1985; Brown et al. 1985; Beerli and Barbas 2002). Zinc finger domains can be stacked as arrays of 3–6 fingers to increase the DNA-binding specificity and be fused to a restriction endonuclease such as FokI. ZFNs function as a pair of site-specific zinc finger arrays, each binding a 9 bp site and each fused to a FokI endonuclease domain. The dimerized FokI endonuclease cleaves the double-strand DNA at a specific position between the two zinc finger binding sites. Once the FokI endonuclease has cut the DNA, creating a DSB, cellular repair mechanisms tend to repair the damage through the non-homologous end joining (NHEJ) pathway, which is error-prone and leads to base or fragment insertion/deletion frameshift mutations (Davies et al. 2017). More pictogram details can be seen in Zaman et al. (2019) and Lino et al. (2018). Since their first description, zinc finger nucleases have been used in gene editing of insects (Doyon et al. 2008), plants, and animals (Urnov et al. 2010; Zhu et al. 2011; Davies et al. 2017), with the number of reports reaching a peak in 2014 (Fig. 11.2).

TALENs

Like ZFNs, TALENs use the bacterial DNA cleavage domain FokI, in this case fused to a DNA recognition domain from the transcription factors produced by plant pathogenic bacteria of the genus Xanthomonas (Carroll 2017). The TALE protein contains a central DNA-binding region made of a series of 33–35 amino-acid repeats (Lino et al. 2018), each of which recognizes a single base pair. The specificity of each repeat is determined by two hypervariable amino acids (Mak et al. 2012; Deng et al. 2012; Gaj et al. 2013), and the modular TALE repeats are linked together to recognize a contiguous DNA sequence. Two TALEs flanking the target sequence must dimerize (Sommer et al. 2015; Jaganathan et al. 2018) to create a DNA double-strand break, the repair of which involves the NHEJ mechanism as with the ZFNs. For more pictogram details, the reader is referred to previous reviews (Lino et al. 2018; Zaman et al. 2019). The versatility of TALEN compared to ZFN is that the TALE DNA-binding array can be fused to various effector (activator/repressor) proteins for gene knockdown, gene knockout, gene knock-in (by providing an HDR repair template), gene activation, gene repression, base editing, and imaging (Qi et al. 2013; Mali et al. 2013; Cho et al. 2013a). As of July 26, 2021, TALEN had been reported in as many as 2123 publications between 1948 and 2021 (Fig. 11.2) in plants (Bi and Yang 2017; Luo et al. 2019; Kazama et al. 2019; Shinoyama et al. 2020; Han et al. 2020), animals (Chen et al. 2017; Tang et al. 2018; Xia et al. 2019), and microorganisms (Benjamin et al. 2016). However, the large cDNA size for each TALE (3 kb), the technical challenges associated with the protein design, and the intensive labor required for its synthesis and validation constitute a real barrier to large-scale adoption and routine application of TALEN (Doudna and Charpentier 2014; Lino et al. 2018).

Fig. 11.2 Comparative evolution of ZFN, TALEN, and CRISPR between 1948 and 2021 in the NCBI PubMed database as of July 26, 2021. ZFN, a total of 1689 entries with a peak in 2014; TALEN, a total of 2123 entries with a peak in 2016; CRISPR, a total of 28,371 entries with 6222 entries in 2020 alone
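The claim that an 18 bp recognition site can specify a unique genomic position, whereas a single 9 bp half-site cannot, can be illustrated with a back-of-the-envelope estimate. The sketch below is only an approximation under stated assumptions: bases are treated as independent and equally frequent (which repeat-rich plant genomes do not satisfy), both DNA strands are counted, and the genome size G = 10^9 bp is a hypothetical round number, not a value taken from this chapter. E(n) denotes the expected number of occurrences of a given n-bp site.

```latex
E(n) \approx \frac{2G}{4^{n}}, \qquad G = 10^{9}\ \text{bp (hypothetical)}

E(9) \approx \frac{2 \times 10^{9}}{4^{9}} = \frac{2 \times 10^{9}}{262{,}144} \approx 7.6 \times 10^{3}

E(18) \approx \frac{2 \times 10^{9}}{4^{18}} \approx \frac{2 \times 10^{9}}{6.87 \times 10^{10}} \approx 0.03
```

Under these assumptions a single 9 bp half-site would be expected thousands of times in such a genome, while the combined 18 bp site of a ZFN pair would be expected less than once, which is consistent with the use of paired arrays for unique targeting.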


CRISPR

Unlike the site-specific nucleases ZFN and TALEN, CRISPR gene-editing technology relies on a programmable RNA-guided DNA endonuclease that can be designed to cleave virtually any DNA sequence in any genome (Gaj et al. 2013), ultimately leading to targeted induced mutations and the desired phenotypic traits (Enciso-Rodriguez et al. 2019; Jaganathan et al. 2018). CRISPR involves two components: a single guide RNA (sgRNA) that binds to the desired target gene sequence and the Cas9 (CRISPR-associated protein 9) endonuclease (Lino et al. 2018). The sgRNA forms a ribonucleoprotein complex with Cas9 (Kim et al. 2020), which is guided to a specific target sequence by the 20 nucleotides at the 5′ end of the sgRNA. Cas9 must also recognize a short sequence (2–5 nucleotides) adjacent to the target sequence, called the protospacer adjacent motif (PAM), before it cuts the double-strand DNA at the specific location (Cho et al. 2014; Redman et al. 2016; Anders et al. 2016; Lino et al. 2018), similar to FokI. The specificity of the sgRNA within the genome is largely determined by the 6–11 nucleotides immediately upstream of the PAM (Cho et al. 2014). Recently, other optimized Cas proteins, including CRISPR/Cas12a (Cpf1), have been developed and tested (Zetsche et al. 2015; Kim et al. 2020; Pan et al. 2021). The reader is referred to previous reviews for more pictogram details (Lino et al. 2018; Zhang et al. 2018; Zaman et al. 2019).

Similar to TALEN, CRISPR can be adapted for gene knockdown, gene knockout, and gene knock-in (by providing an HDR repair template); for gene activation and gene repression, by fusing various effector (activator/repressor) proteins to a deactivated Cas9; for base editing; and for gene imaging, by fusing with a GFP (Qi et al. 2013; Mali et al. 2013; Cho et al. 2013a, b; Lino et al. 2018; Pan et al. 2021). Since its first description (Ishino et al. 1987) and its predicted roles in DNA repair or gene regulation (Makarova et al. 2002; Guy et al. 2004), CRISPR has opened new frontiers in genome editing in many biological systems (Doudna and Charpentier 2014; Yin et al. 2014; Zhang et al. 2017; Lino et al. 2018; Kim et al. 2020). Because of its simplicity in design, low cost, and versatility compared to TALEN (Zhang et al. 2016; Lino et al. 2018; Pan et al. 2021), CRISPR currently dominates all other gene-editing technologies across applications (Fig. 11.2), including agriculture (Ricroch et al. 2017). However, it is known to have higher off-target effects than TALEN (Lino et al. 2018). These effects can be mitigated by using a Cas9 nickase, which requires two sgRNAs targeting opposite strands of the target DNA sequence to produce the double-strand DNA cut (Cho et al. 2014; Lino et al. 2018), or by using high-fidelity engineered Cas9 nucleases (Kleinstiver et al. 2016; Slaymaker et al. 2016).
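The target-recognition rule described above (a 20-nucleotide protospacer next to a PAM, with a PAM-proximal region that matters most for specificity) can be sketched in a few lines of Python. This is a minimal illustration, not a published design tool: it assumes the commonly used NGG PAM of Streptococcus pyogenes Cas9, and the function names `find_cas9_sites` and `seed_mismatches` as well as the default seed length of 10 nucleotides (a value chosen from within the 6–11 nucleotide range quoted in the text) are hypothetical choices made for this example.

```python
import re

# Translation table used to complement a DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_cas9_sites(seq: str, guide_len: int = 20):
    """List candidate Cas9 target sites on both strands.

    Assumption: an NGG PAM lies immediately 3' of the protospacer.
    Returns tuples (strand, start, protospacer, pam); `start` is the
    0-based index on the strand that was scanned (for "-", this is an
    index into the reverse-complemented sequence).
    """
    seq = seq.upper()
    pattern = re.compile(rf"(?=([ACGT]{{{guide_len}}})([ACGT]GG))")
    sites = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # The lookahead makes matches zero-width, so overlapping
        # candidate sites are all reported.
        for m in pattern.finditer(s):
            sites.append((strand, m.start(), m.group(1), m.group(2)))
    return sites


def seed_mismatches(guide: str, site: str, seed_len: int = 10) -> int:
    """Count mismatches in the PAM-proximal seed region.

    The seed is taken as the last `seed_len` nucleotides of the
    protospacer (hypothetical default; the text quotes 6-11 nt).
    """
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return sum(a != b for a, b in zip(guide[-seed_len:], site[-seed_len:]))
```

In use, every protospacer returned by `find_cas9_sites` for a gene of interest could be compared against candidate sites elsewhere in the genome; sites with zero seed mismatches would be the most likely off-targets under the rule stated above. Dedicated sgRNA design software applies considerably richer scoring than this sketch.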

11.2.2.2 CRISPR Gene-Editing Process

While ZFNs and TALENs are still used extensively, only the CRISPR gene-editing process, as it relates to gene-editing tool development and delivery mechanisms, is described herein.

SgRNA Design

A proper sgRNA design relies on a careful selection of target DNA sites (Lino et al. 2018; Kim et al. 2020). For gene discovery and characterization, whole-genome sequencing, transcriptomic sequencing, or a combination of the two is the first step for generating the genomic resources. Whereas single and multibase mismatches may be tolerated at the 5′ end of the sgRNA, far from the PAM, DNA cleavage strictly requires a near-perfect match between the target DNA and the “seed” region located 6–11 nucleotides upstream of the PAM in the sgRNA (Cho et al. 2014; Anders et al. 2016); tolerated mismatches often lead to off-target cutting. Hence, many computational tools and software packages have been developed to facilitate proper sgRNA design (Ito et al. 2015; Park et al. 2015; Lino et al. 2018; Kim et al. 2020).

SgRNA and Cas9 Delivery

To perform an efficient and specific DSB, the CRISPR/Cas9 system must be assembled into what is referred to as the cargo and carried into the cell via different carriers, termed vehicles (Lino et al. 2018).

Cargo Systems Used with CRISPR

The most commonly used cargo systems include (a) DNA plasmid constructs harboring the appropriate sgRNA and Cas9 protein-coding sequences along with gene expression reporter genes such as GFP or RFP, or antibiotic resistance markers, and appropriate promoters (Oleszkiewicz et al. 2021), a cargo system that involves cloning the sgRNA and Cas9 sequences in a suitable single or binary expression vector (Ito et al. 2015; Jaganathan et al. 2018); (b) the mRNA coding for Cas9 together with a sgRNA, or the Cas9 protein and a sgRNA (Lino et al. 2018); and (c) a ribonucleoprotein complex of the Cas9 protein and its associated sgRNA (Kim et al. 2020). The choice of one of these cargo systems is usually determined by the type of targeted tissue or cells, the delivery method (Lino et al. 2018; Kim et al. 2020), and the crop reproduction system (Hameed et al. 2020).

Delivery Methods Used with CRISPR

Currently, CRISPR delivery into the cell remains the major bottleneck (Lino et al. 2018). In plants, the delivery vehicles can vary depending on the organism, tissue, or cell types (Oleszkiewicz et al. 2021). Indeed, various delivery vehicles for CRISPR/Cas9 can include Agrobacterium-mediated transformation by co-culture, and particle bombardment of plant explants or calli (Miao et al. 2013; Klimek-Chodacka et al. 2018; Pan et al. 2021; Oleszkiewicz et al. 2021; Kieu et al. 2021), PEG-mediated protoplast transfection (Kim et al. 2020; Toda and Okamoto 2020), electroporation (Oleszkiewicz et al. 2021), or gold nanoparticle CRISPR/Cas9 encapsulation (Lino et al. 2018; Demirer et al. 2021). The physical delivery methods (Agro-transformation and shotgun particle bombardment) use plasmid constructs (Kieu et al. 2021; Zhang et al. 2021), whereas protoplast transfection and nanoparticle encapsulated CRISPR/Cas9 delivery systems can involve both plasmid constructs (Kieu et al. 2021) and Cas9:sgRNA ribonucleoprotein (RNP) complex (Kim et al. 2020). Transfection of

V. Clemis et al.

preassembled CRISPR/Cas9 RNPs leads to genome editing without the risk of introducing foreign DNA sequences into the cells (Woo et al. 2015; Toda et al. 2019). In contrast, all delivery methods that involve plasmid DNA cargoes (Agrobacterium-mediated transformation, transfection, and nanoparticle encapsulation) may sometimes lead to the insertion of small plasmid DNA fragments at on-target and off-target sites in the host cells. These fragments can be removed through breeding cycles in some crop species, but not in those that are propagated clonally (Woo et al. 2015). Hence, selecting an appropriate cargo and delivery vehicle for the crop type is crucial for an optimal editing outcome.

Gene-Editing Validation

CRISPR/Cas9 gene editing can be validated in vitro or in vivo (Cho et al. 2014; Lino et al. 2018; Kim et al. 2020). In vitro validation consists of supplying an amplicon of the target gene as the template in a cleavage assay with the CRISPR/Cas9 RNP complex, followed by gel visualization of the cleaved fragments. In vivo validation consists of delivering the CRISPR/Cas9 RNP complex into living cells or plants, followed by genomic DNA extraction and Sanger sequencing of the target gene or whole-genome sequencing (Kim et al. 2020). Editing can also be detected with a mismatch cleavage assay, such as the Surveyor or T7E1 assay, in which the target DNA is amplified, denatured, and reannealed, and the mismatched reannealed duplexes are then cleaved with the Surveyor or T7E1 nuclease. After gel electrophoresis, the relative amounts of the cleavage products give an estimate of the percentage of edited clones (Nekrasov et al. 2013; Vouillot et al. 2015; Sentmanat et al. 2018). The result of this assay should also be confirmed by sequencing the target gene or the genome (Oleszkiewicz et al. 2021).
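The gel readout of a mismatch cleavage assay is commonly converted into an editing estimate with a simple formula. The sketch below is illustrative only and is not taken from the chapter or its references. It assumes that band intensities have already been quantified by densitometry, that edited and wild-type strands reanneal at random, and that every mismatched duplex is cut by the nuclease. Under those assumptions, the cleaved fraction f_cut and the fraction of edited alleles p are related by f_cut = 1 − (1 − p)^2. The function names and example intensities are hypothetical.

```python
import math
from typing import Sequence


def fraction_cleaved(uncut: float, cut_bands: Sequence[float]) -> float:
    """Fraction of total band intensity found in the cleavage products."""
    total = uncut + sum(cut_bands)
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return sum(cut_bands) / total


def estimated_indel_percent(uncut: float, cut_bands: Sequence[float]) -> float:
    """Estimate the percentage of edited alleles from gel band intensities.

    Assumes random reannealing, so only wild-type/wild-type duplexes,
    a fraction (1 - p)**2 of the total, escape cleavage.
    """
    f_cut = fraction_cleaved(uncut, cut_bands)
    return 100.0 * (1.0 - math.sqrt(1.0 - f_cut))


if __name__ == "__main__":
    # Hypothetical densitometry values: one uncut band and two cleavage bands.
    print(round(estimated_indel_percent(75.0, [15.0, 10.0]), 1))
```

Because the estimate depends on these assumptions and on natural polymorphisms that can also produce mismatches, it is best treated as a screening value, which is consistent with the recommendation to confirm edits by sequencing.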

11.2.3 Gene Editing in Plants

Because it can modify specific gene sequences and create loss-of-function variants, gene editing has become a powerful tool in gene

11

Genome-Editing Tools for Flax Genetic Improvement

functional studies for the development of new products (Sauer et al. 2016; Davies et al. 2017). Model plants including Arabidopsis, Nicotiana tabacum, and N. benthamiana have been gene edited, leading to clear new phenotypes (Fauser et al. 2014; Gao et al. 2015; Nekrasov et al. 2013). Attempts to develop unique traits have also been reported in ornamental plants (Zhang et al. 2016), in horticultural crops such as tomato (Brooks et al. 2014; Ito et al. 2015) and carrot (Oleszkiewicz et al. 2021), and in staple crops including rice (Zhou et al. 2019), wheat (Cui et al. 2019), and barley (Howells et al. 2018; Jouanin et al. 2020). Nonetheless, to our knowledge, no commercial products from gene-edited crops are on the market so far.

11.3 Gene Editing in Flax

Flax is a dual-purpose crop grown for its seed and fiber (Xie et al. 2018a). While the fiber products are not meant for human or animal consumption, flax seed and its by-products have multiple uses as industrial, food, and natural health products (US-FDA 2009; Shim et al. 2014; Pitarresi et al. 2015; Wang et al. 2015; Kaneda et al. 2016; Graupner et al. 2020). How far can gene editing take the flax crop toward its full potential, in terms of both agronomic performance and end uses?

11.3.1 Success Stories in Flax Genetic Transformation and Gene-Editing

With its GRAS (generally recognized as safe) status, flax has so far been subject to very limited genetic manipulation. Apart from the traditional radiation or chemical mutagenesis that led to many commercial flax cultivars (Green and Marshall 1984; Rowland et al. 1995; Ntiamoah and Rowland 1997), only one registered transgenic linseed cultivar, CDC Triffid, has been reported (McHughen and Holm 1994; McHughen et al. 1997); it was de-registered in 2001 and pulled from the market because of concerns about exports of genetically modified flax (Ryan and Smyth 2012). Since then, flax engineering by genetic transformation, mostly performed in Eastern Europe, has focused on fiber flax, both to understand plant biological processes (Lorenc-Kukula et al. 2009) and to improve fiber properties (Wróbel-Kwiatkowska et al. 2009, 2015; Mierziak et al. 2020; Morello et al. 2020). Engineered flax plants with increased antioxidant capacity and improved oil composition and stability in the seed have also been reported (Zuk et al. 2012), and many other traits have been the focus of genetic transformation (see Ludvíková and Griga 2015 for a review). However, none of these flax lines seems to have reached the market yet.

With the advent of gene editing, which is shifting the paradigm in plant engineering, the appetite for flax engineering has re-emerged. Using a non-transgenic, site-specific gene-editing approach that combines the simultaneous delivery of single-stranded oligonucleotides (ssODNs) and a site-specific DNA double-strand breaker, either TALENs or CRISPR/Cas9, Sauer et al. (2016) developed a herbicide tolerance trait in flax (Linum usitatissimum) by precisely editing the 5′-enolpyruvylshikimate-3-phosphate synthase (EPSPS) gene. The EPSPS editing occurred at a frequency high enough that the authors could regenerate whole plants from edited protoplasts without employing selection. The edited plants were subsequently shown to be tolerant to the herbicide glyphosate in greenhouse spray tests, and their progenies showed the Mendelian segregation expected for the EPSPS edits (Sauer et al. 2016). This work was the first to describe the regeneration of whole non-transgenic flax plants edited for a new trait. To our knowledge, no other report on flax genome editing is available, and the field is wide open for non-transgenic flax development.
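The Mendelian segregation reported for the EPSPS edits is the kind of result that can be checked with a chi-square goodness-of-fit test. The sketch below is a generic illustration, not the analysis performed by Sauer et al. (2016). It assumes selfed progeny of a plant heterozygous for a single edited locus, so that wild-type, heterozygous, and homozygous-edited genotypes are expected in a 1:2:1 ratio. The progeny counts are invented for the example, the function names are hypothetical, and the critical values are the standard chi-square thresholds at the 5% level.

```python
from typing import Sequence

# Standard chi-square critical values at alpha = 0.05, keyed by degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815}


def chi_square_statistic(observed: Sequence[int], expected_ratio: Sequence[float]) -> float:
    """Chi-square statistic for observed class counts against an expected ratio."""
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    expected = [total * r / ratio_sum for r in expected_ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def fits_ratio(observed: Sequence[int], expected_ratio: Sequence[float]) -> bool:
    """True if the counts do not deviate significantly (5% level) from the ratio."""
    df = len(observed) - 1
    return chi_square_statistic(observed, expected_ratio) < CRITICAL_05[df]


if __name__ == "__main__":
    # Invented genotype counts: wild type, heterozygous, homozygous edited.
    counts = [24, 52, 24]
    print(chi_square_statistic(counts, [1, 2, 1]), fits_ratio(counts, [1, 2, 1]))
```

A phenotypic test, for example tolerant versus sensitive plants for a dominant edit, would use the same functions with two classes and a 3:1 ratio.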

11.3.2 Tools Paving the Road for Flax Gene Editing

As described earlier, CRISPR/Cas gene editing in a given organism requires a proper guide RNA design, which in turn relies on prior knowledge of the target sequence. Fortunately, the linseed flax draft genome sequence was released in 2012 (Wang et al. 2012) and refined in 2018 by You et al. (2018a, b). The genome of the fiber flax cultivar Atlant was recently assembled by combining Oxford Nanopore (long reads) and Illumina (short reads) sequencing data (Dmitriev et al. 2021), so genomic resources are now available for target sequence selection and gRNA design in both flax types.

Since the mid-1980s, in the middle of the biotechnology revolution, plant biotechnology has relied heavily on plant transformation. As a result, protocols for flax protoplast isolation (Barakat and Cocking 1983; Ling and Binding 1987, 1997; Roger et al. 1996; Millam et al. 2005; Aoyagi 2011), protoplast-mediated plant tissue transformation (Beyaz et al. 2017; Sauer et al. 2016; Morello et al. 2020; Majumder et al. 2020; Kesiraju et al. 2020; Chantreau and Neutelings 2020), and tissue culture and plant regeneration have been developed and successfully deployed (Sauer et al. 2016). Proven transformation and plant regeneration systems are therefore in place for CRISPR/Cas gene editing in flax (Wijayanto and McHughen 1999; Shysha et al. 2013; Bastaki and Cullis 2014), although some adjustments may still be required. To overcome the bottlenecks of stable transformation, protoplast delivery of preassembled CRISPR/Cas9 RNPs can be refined and used to generate transgene-free plants (Hameed et al. 2020) in both fiber and linseed flax.
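As a rough illustration of the first step in guide RNA design, the sketch below scans a target sequence on both strands for 20-nucleotide protospacers followed by an NGG protospacer-adjacent motif (PAM), the motif recognized by the commonly used Streptococcus pyogenes Cas9. It is a minimal sketch under stated assumptions: the input sequence is a made-up placeholder rather than a flax gene, the function names are hypothetical, and no off-target search against the flax genome is attempted, which a real design would require.

```python
import re
from typing import Dict, List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_guides(target: str, guide_len: int = 20) -> List[Dict[str, object]]:
    """List candidate protospacers that sit immediately 5' of an NGG PAM.

    Both strands are scanned. For the "-" strand, "start" is the offset
    within the reverse-complemented sequence, not within the input.
    """
    target = target.upper()
    # The lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile("(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    guides = []
    for strand, seq in (("+", target), ("-", reverse_complement(target))):
        for match in pattern.finditer(seq):
            protospacer, pam = match.group(1), match.group(2)
            gc = (protospacer.count("G") + protospacer.count("C")) / guide_len
            guides.append({
                "strand": strand,
                "start": match.start(),
                "protospacer": protospacer,
                "pam": pam,
                "gc_fraction": round(gc, 2),
            })
    return guides


if __name__ == "__main__":
    # Hypothetical placeholder sequence, not a real flax locus.
    demo = "ATGGCTAGCTAGGATCCGATCGATCGTAGCTAGCTAGGCTAGCTAGCTAGG"
    for guide in find_guides(demo):
        print(guide)
```

Candidates from such a scan would then need to be filtered, for instance by GC content and by uniqueness in the linseed or fiber flax genome assembly, before any guide is synthesized.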

11.3.3 Flax Genetic Resources

Flax is cultivated worldwide and displays large genetic diversity (Diederichsen et al. 2013), part of which is captured by the 3378 accessions from 76 countries maintained at Plant Gene Resources of Canada (Diederichsen et al. 2013; Soto-Cerda et al. 2021). This collection includes wild progenitor species of cultivated flax, landraces, breeding lines, and cultivars. It has been characterized and curated into a reduced core collection of 407 accessions from 38 countries that captures the diversity spectrum present in the whole collection (Diederichsen et al. 2013; Soto-Cerda et al. 2021). Based on current data, linseed flax displays greater genetic diversity than its fiber flax counterpart (Xie et al. 2018a). Notwithstanding the diversity of the 48,000 accessions held in the world's genebanks, the Canadian core collection is a prime genetic resource for initiating flax gene editing.

11.3.4 Flax Traits of Interest for Gene Editing

Target traits for improvement differ between linseed and fiber flax. In linseed-type flax, the number of bolls, oil content, short stem, and branched architecture are the most important traits from a breeding perspective. Fiber flax, in contrast, is taller and unbranched, and fiber percentage and fiber length are the essential indices for breeding (Copur et al. 2006; Xie et al. 2018a). In addition to these traits, seed characteristics, seed and plant metabolite profiles, and plant responses to biotic and abiotic stresses are of high agronomic importance.

11.3.4.1 Agronomic Plant Traits

Seed Traits

Flax seed setting is determined by good plant growth and a proper flowering time during the growing season (Craufurd and Wheeler 2009), and candidate genes associated with flowering have been identified (Soto-Cerda et al. 2021). After the flower development stage, the number of bolls per plant, the number of seeds per boll, and the 1000-seed weight are key determinants of flax seed yield (Zhang et al. 2014). Nonetheless, negative correlations between the number of tillers (branching) and the number of bolls have been reported in some environments (Darapuneni et al. 2014). Candidate genes associated with some of these traits have been located through genome-wide association mapping (GWAM) analysis (You et al. 2014; Soto-Cerda et al. 2014; Xie et al. 2018b).

In addition to seed yield traits, seed-quality traits including oil, fatty acid, protein, and lignan content are market-driven characteristics (Dribnenki and Green 1995; Kenaschuk 2005; Diederichsen et al. 2006; Fofana et al. 2006; Cloutier et al. 2010). With the advent of genome-wide association mapping, candidate genes for most of these traits were also identified and located in the flax genome (You et al. 2018a, b; Soto-Cerda et al. 2018). Because yield is a quantitative trait controlled by multiple genes, CRISPR-based activation of genes or regulatory elements underlying one or more yield components could have a significant impact on flax yield potential. Moreover, based on the current understanding of fatty acid (Singh et al. 1994; Fofana et al. 2004, 2006, 2010a; Vrinten et al. 2005) and secoisolariciresinol diglucoside (SDG) lignan (Ghose et al. 2014, 2015) biosynthesis in flax, seed quality traits including oil content, lignan, and fatty acid profiles can be tailored to fit many market niches. Building on the findings of Ghose et al. (2014) and Fofana et al. (2017a), the potential for producing non-glucosylated forms of SDG lignans in vitro (Ghose et al. 2015) and in planta (Fofana et al. 2017b) has been investigated through mutagenesis approaches targeting the flax lignan biosynthesis gene UGT74S1 (Ghose et al. 2014, 2015). The use of CRISPR gene editing in lignan biosynthesis could provide further opportunities for producing more potent and bioavailable lignan metabolites from flax.

Cyanogenic glucoside (CG), which is toxic to humans and animals, can be found in both the seed and the foliage, depending on the developmental stage (Zuk et al. 2020). Recently, a cyanogenic glucosyltransferase, LuCGT1, has been identified and enzymatically characterized as active toward both aliphatic and aromatic substrates.
The gene is expressed in leaf tissue and in developing seed, and its expression level was found to be drastically reduced in mutant flax lines low in cyanogenic glucosides (Kazachkov et al. 2020). This is a significant finding: with CRISPR gene editing, it may now become possible to produce whole flax plants (forage) for animal feed while keeping their traditional uses. Developing flax bolls have been shown to have a well-balanced omega-6/omega-3 ratio after heat treatment, and their use as a fresh vegetable by humans has been suggested, provided they are adequately processed (Fofana et al. 2010b). However, the high level of CGs in the bolls and developing seed of all current flax cultivars impedes such broad use of flax.

Fiber Traits

Flax bast fiber, located in the outer layer of the stem, accounts for 20–30% of the total stem weight and contains about 70% cellulose fibrils deposited in the cell walls together with hemicellulose, pectin, and lignin (Titok et al. 2006; Morello et al. 2020). Genes and processes involved in cell wall deposition and thickening in flax have been investigated (see Morello et al. 2020 for a review). The traits most relevant to flax fiber quality include plant height, unbranched plant architecture, fiber percentage, and fiber length (Xie et al. 2018a). Xie et al. (2018a) performed high-throughput sequencing of a flax collection of 224 varieties from around the world, constructed a comprehensive map of flax genome variation, defined patterns of evolutionary diversity within the collection, and used GWAS to identify genetic loci and candidate genes associated with important agronomic seed and fiber traits, including fiber percentage and plant height. These findings provide opportunities to tailor the architecture of fiber flax plants and the fiber content and/or structure of the cell wall using a fine-tuned CRISPR gene-editing approach.

Flax and Biotic and Abiotic Stresses

Flax plant growth and development can be adversely affected by many environmental stressors, referred to as biotic and abiotic stresses (Mostafavi 2011; Ding et al. 2012; Pereira 2016; Fukao et al. 2019; Boba et al.
2020; Sun et al. 2021). With its shallow root system (Sertse et al. 2019), flax is very sensitive to water stress, both drought and flooding (Dash et al. 2014; Fukao et al. 2019; Soto-Cerda et al. 2020; Wang et al. 2021). Recent transcriptional data pointed to central roles for many transcription factors (TFs) and the proline biosynthetic pathway, which interact with drought stress-responsive genes through the DNA repair protein 50-interacting protein 1 (RIN1) (Wang et al. 2021). Moreover, using multiple GWAS models, Sertse et al. (2019, 2021) identified quantitative trait nucleotides (QTNs) and their associated candidate genes for drought tolerance in a flax mini core collection under irrigated and non-irrigated field conditions. These findings suggest that CRISPR gene editing can be used to activate or regulate some of these TFs and to amplify some of the positive water stress-responsive gene interactions, creating flax lines that are more tolerant to water stress.

Along with these abiotic stresses, flax is challenged by many fungal diseases, including powdery mildew (Rashid et al. 1998; Rashid 1998; Rashid and Duguid 2005; Asgarinia et al. 2013; Braun et al. 2019), pasmo caused by the fungus Septoria linicola (He et al. 2019a), and fusarium wilt caused by Fusarium oxysporum f. sp. lini (Dmitriev et al. 2017; You et al. 2017). Using a multi-scale statistical analysis of powdery mildew, pasmo, and fusarium wilt in the core collection, You et al. (2017) uncovered a large genetic diversity and paved the way for genome-wide association mapping. Asgarinia et al. (2013) identified four QTLs associated with powdery mildew resistance using SSR markers. Using single nucleotide polymorphism (SNP) markers, GWAS identified candidate resistance gene analogs for pasmo in the flax core collection (He et al. 2019b), while differential gene expression profiles were reported for fusarium wilt (Dmitriev et al. 2017).
With most of the current flax cultivars being susceptible to these diseases, it is anticipated that CRISPR gene editing would allow the editing of the gene loci present in the susceptible lines without introducing any foreign DNA sequences.

Flax Gene-Edited Traits Regulatory Frameworks

The setback from the Triffid linseed flax episode (McHughen and Holm 1994; McHughen et al. 1997; Ryan and Smyth 2012) raised a red flag for the flax research community. Flax seed commodities (seed and oil) are driven by consumer opinion, which can run counter to the regulations framed by regulatory agencies, as was the case for Triffid flax. Learning from this episode, gene-edited linseed flax should be designed in such a way that no foreign DNA fragments are inserted at either on-target or off-target sites in host cells (Woo et al. 2015). Hence, a proper gRNA design and the delivery of preassembled Cas9 protein-gRNA ribonucleoproteins (RNPs), rather than plasmids that encode these components, into plant cells (protoplasts) could remove the likelihood of inserting recombinant DNA into the host genome (Cho et al. 2013b). This approach will broaden the applicability of CRISPR gene editing to both linseed and fiber flax, as suggested for all transformable plant species by Woo et al. (2015). Therefore, if adequately designed and executed, gene-edited flax may meet both consumer and regulatory agency requirements.

11.4 Concluding Remarks

Flax crop improvement has so far relied on traditional and mutagenesis breeding techniques and has achieved extraordinary milestones with the release of valuable cultivars. In the next-generation sequencing era, extensive flax genomic resources have been generated, the complexity of the genome has been deciphered, and gene networks are being better understood. With climate change, increasing demand for flax by-products, and informed consumer-driven markets, the flax research community faces paramount challenges of efficiency and credibility in ensuring a viable, sustainable, and safe flax supply chain. Gene editing is set to take on some of these challenges.

Acknowledgements

The authors thank Dr. Kaushik Ghose and Mr. David Main for their useful proofreading of this chapter.

References

Adams KL, Wendel JF (2005) Polyploidy and genome evolution in plants. Curr Opin Plant Biol 8(2):135–141. https://doi.org/10.1016/j.pbi.2005.01.001 Ainouche ML, Fortune PM, Salmon A, Parisod C, Grandbastien M-A, Fukunaga K, Ricou M, Misset M-T (2009) Hybridization, polyploidy and invasion: lessons from Spartina (Poaceae). Biol Invasions 11:1159. https://doi.org/10.1007/s10530-008-9383-2 Allaby RG, Peterson GW, Merriwether DA, Fu YB (2005) Evidence of the domestication history of flax (Linum usitatissimum L.) from genetic diversity of the sad2 locus. Theor Appl Genet 112(1):58–65. https://doi.org/10.1007/s00122-005-0103-3 Anders C, Bargsten K, Jinek M (2016) Structural plasticity of PAM recognition by engineered variants of the RNA-guided endonuclease Cas9. Mol Cell 61(6):895–902. https://doi.org/10.1016/j.molcel.2016.02.020 Aoyagi H (2011) Application of plant protoplasts for the production of useful metabolites. Biochem Eng J 56:1–8 Asgarinia P, Cloutier S, Duguid S, Rashid K, Mirlohi A, Banik M, Saeidi G (2013) Mapping quantitative trait loci for powdery mildew resistance in flax (Linum usitatissimum L.). Crop Sci 53:2462–2472 Barakat MN, Cocking EC (1983) Plant regeneration from protoplast-derived tissues of Linum usitatissimum L. (Flax). Plant Cell Rep 2(6):314–317 Bastaki NK, Cullis CA (2014) Floral-dip transformation of flax (Linum usitatissimum) to generate transgenic progenies with a high transformation rate. J Vis Exp 94:52189. https://doi.org/10.3791/52189 Beerli RR, Barbas CF III (2002) Engineering polydactyl zinc-finger transcription factors. Nat Biotechnol 20(2):135–141. https://doi.org/10.1038/nbt0202-135 Beerli RR, Segal DJ, Dreier B, Barbas CF III (1998) Toward controlling gene expression at will: specific regulation of the erbB-2/HER-2 promoter by using polydactyl zinc finger proteins constructed from modular building blocks. PNAS 95(25):14628–14633.
https://doi.org/10.1073/pnas.95.25.14628 Benjamin R, Berges BK, Solis-Leal A, Igbinedion O, Strong CL, Schiller MR (2016) TALEN gene editing takes aim on HIV. Hum Genet 135(9):1059–1070. https://doi.org/10.1007/s00439-016-1678-2 Beyaz R, Aycan M, Yildiz, M (2017) Explant position effect on gene transformation to flax (Linum usitatissimum L.) via Agrobacterium tumefaciens. Period Biol 119(3):223–228 Bi H, Yang B (2017) Gene Editing With TALEN and CRISPR/Cas in rice. Prog Mol Biol Transl Sci

149:81–98. https://doi.org/10.1016/bs.pmbts.2017.04.006 Boba A, Kostyn K, Kozak B, Wojtasik W, Preisner M, Prescha A, Gola EM, Lysh D, Dudek B, Szopa J, Kulma A (2020) Fusarium oxysporum infection activates the plastidial branch of the terpenoid biosynthesis pathway in flax, leading to increased ABA synthesis. Planta 251(2):50. https://doi.org/10.1007/s00425-020-03339-9 Boch J, Scholze H, Schornack S, Landgraf A, Hahn S, Kay S, Lahaye T, Nickstadt A, Bonas U (2009) Breaking the code of DNA binding specificity of TAL-type III effectors. Science 326(5959):1509–1512. https://doi.org/10.1126/science.1178811 Booker HM, Lam EG, Smyth SJ (2017) Ex-post assessment of genetically modified, low level presence in Canadian flax. Transgenic Res 26(3):399–409. https://doi.org/10.1007/s11248-017-0012-7 Bozza CG, Pawlowski WP (2008) The cytogenetics of homologous chromosome pairing in meiosis in plants. Cytogenet Genome Res 120(3–4):313–319. https://doi.org/10.1159/000121080 Braun U, Preston CD, Cook RTA, Götz M, Takamatsu S (2019) Podosphaera lini (Ascomycota, Erysiphales) revisited and re-united with Oidium lini. Plant Pathol Quar 9(1):128–138. https://doi.org/10.5943/ppq/9/1/11 Brooks C, Nekrasov V, Lippman ZB, Van Eck J (2014) Efficient gene editing in tomato in the first generation using the clustered regularly interspaced short palindromic repeats/CRISPR-associated9 system. Plant Physiol 166(3):1292–1297. https://doi.org/10.1104/pp.114.247577 Brown RS, Sander C, Argos P (1985) The primary structure of transcription factor TFIIIA has 12 consecutive repeats. FEBS Lett 186(2):271–274. https://doi.org/10.1016/0014-5793(85)80723-7 Carroll D (2017) Genome editing: past, present, and future. Yale J Biol Med 90(4):653–659 Chantreau M, Neutelings G (2020) Virus-induced gene silencing of cell wall genes in flax (Linum usitatissimum).
Methods Mol Biol 2172:65–74 Chantreau M, Grec S, Gutierrez L, Dalmais M, Pineau C, Demailly H, Paysant-Leroux C, Tavernier R, Trouvé J-P, Chatterjee M, Guillot X, Brunaud V, Chabbert B, van Wuytswinkel O, Bendahmane A, Thomasset B, Hawkins S (2013) PT-flax (phenotyping and tilling of flax) development of a flax (Linum usitatissimum L.) mutant population and tilling platform for forward and reverse genetics. BMC Plant Biol 13:159. https://doi. org/10.1186/1471-2229-13-159 Chen Y, Yu J, Niu Y, Qin D, Liu H, Li G, Hu Y, Wang J, Lu Y, Kang Y, Jiang Y, Wu K, Li S, Wei J, He J, Wang J, Liu X, Luo Y, Si C, Bai R, Zhang K, Liu J, Huang S, Chen Z, Wang S, Chen X, Bao X, Zhang Q, Li F, Geng R, Liang A, Shen D, Jiang T, Hu X, Ma Y, Ji W, Sun YE (2017) Modeling Rett syndrome using TALEN-edited MECP2 mutant cynomolgus monkeys. Cell 169(5):945–955.e10. https://doi.org/10.1016/j. cell.2017.04.035

Cho SW, Kim S, Kim JM, Kim JS (2013a) Targeted genome engineering in human cells with the Cas9 RNA-guided endonuclease. Nat Biotechnol 31(3):230–232. https://doi.org/10.1038/nbt.2507 Cho SW, Lee J, Carroll D, Kim JS, Lee J (2013b) Heritable gene knockout in Caenorhabditis elegans by direct injection of Cas9-sgRNA ribonucleoproteins. Genetics 195(3):1177–1180. https://doi.org/10.1534/genetics.113.155853 Cho SW, Kim S, Kim Y, Kweon J, Kim HS, Bae S, Kim J-S (2014) Analysis of off-target effects of CRISPR/Cas-derived RNA-guided endonucleases and nickases. Genome Res 24(1):132–141 Cloutier S, Ragupathy R, Niu Z, Duguid S (2010) SSR-based linkage map of flax (Linum usitatissimum L.) and mapping of QTLs underlying fatty acid composition traits. Mol Breeding 28:437–451. https://doi.org/10.1007/s11032-010-9494-1 Cole-Strauss A, Yoon K, Xiang Y, Byrne BC, Rice MC, Kollman WK, Kmiec EB (1996) Correction of the mutation responsible for sickle cell anemia by an RNA-DNA oligonucleotide. Science 273(5280):1386–1389 Copur O, Gur MA, Karakus M, Demirel U (2006) Determination of correlation and path analysis among yield components and seed yield in oil flax varieties (Linum usitatissimum L.). J Biol Sci 6(4):50–53 Craufurd PQ, Wheeler TR (2009) Climate change and the flowering time of annual crops. J Exp Bot 60(9):2529–2539. https://doi.org/10.1093/jxb/erp196 Cui X, Balcerzak M, Schernthaner J, Babic V, Datla R, Brauer EK, Labbé N, Subramaniam R, Ouellet T (2019) An optimised CRISPR/Cas9 protocol to create targeted mutations in homoeologous genes and an efficient genotyping protocol to identify edited events in wheat. Plant Methods 15:119. https://doi.org/10.1186/s13007-019-0500-2 Cullis CA (2007) Flax. In: Kole C (ed) Genome mapping and molecular breeding in plants. Springer, Berlin, Heidelberg, pp 275–295 Daniell H, Lin C-S, Yu M, Chang W-J (2016) Chloroplast genomes: diversity, evolution, and applications in genetic engineering. Genome Biol 17(1):134.
https:// doi.org/10.1186/s13059-016-1004-2 Darapuneni MK, Morgan GD, Ibrahim AMH, Duncan RW (2014) Association of flax seed yield and its components in southeast Texas using path coefficient and biplot analyses. J Crop Improv 28(1):1–16. https:// doi.org/10.1080/15427528.2013.846285 Dash PK, Cao Y, Jailani AK, Gupta P, Venglat P, Xiang D, Rai R, Sharma R, Thirunavukkarasu N, Abdin MZ, Yadava DK, Singh NK, Singh J, Selvaraj G, Deyholos M, Kumar PA, Datla R (2014) Genome-wide analysis of drought induced gene expression changes in flax (Linum usitatissimum). GM Crops Food 5(2):106–119. https://doi.org/10. 4161/gmcr.29742 David H, David A, Bade P, Millet J, Morvan O, Morvan C (1994) Cell wall composition and morphogenic response in callus derived from protoplasts of two

fibre flax (Linum usitatissimum L.) genotypes. J Plant Physiol 143(3):379–384 Davies JP, Kumar S, Sastry-Dent L (2017) Use of zinc-finger nucleases for crop improvement. Prog Mol Biol Transl Sci 149:47–63. https://doi.org/10.1016/bs.pmbts.2017.03.006 Davis D, Stokoe D (2010) Zinc finger nucleases as tools to understand and treat human diseases. BMC Med 8:42. https://doi.org/10.1186/1741-7015-8-42 de Souza N (2012) Primer: genome editing with engineered nucleases. Nat Methods 9:27. https://doi.org/10.1038/nmeth.1848 Demirer GS, Silva TN, Jackson CT, Thomas JB, Ehrhardt WD, Rhee SY, Mortimer JC, Landry MP (2021) Nanotechnology to advance CRISPR-Cas genetic engineering of plants. Nat Nanotechnol 16(3):243–250. https://doi.org/10.1038/s41565-021-00854-y Deng D, Yan C, Pan X, Mahfouz M, Wang J, Zhu J-K, Shi Y, Yan N (2012) Structural basis for sequence-specific recognition of DNA by TAL effectors. Science 335(6069):720–723 Deveau H, Barrangou R, Garneau JE, Labonté J, Fremaux C, Boyaval P, Romero DA, Horvath P, Moineau S (2008) Phage response to CRISPR-encoded resistance in Streptococcus thermophilus. J Bacteriol 190(4):1390–1400. https://doi.org/10.1128/JB.01412-07 Diederichsen A, Raney P, Duguid S (2006) Variation of mucilage in flax seed and its relationship with other seed characters. Crop Sci 46(1):365–371 Diederichsen A, Kusters PM, Kessler D, Bainas Z, Gugel RK (2013) Assembling a core collection from the flax world collection maintained by Plant Gene Resources of Canada. Genet Resour Crop Evol 60:1479–1485. https://doi.org/10.1007/s10722-012-9936-1 Ding Y, Fromm M, Avramova Z (2012) Multiple exposures to drought ‘train’ transcriptional responses in Arabidopsis. Nat Commun 3:740.
https://doi.org/10.1038/ncomms1732 Dmitriev AA, Krasnov GS, Rozhmina TA, Novakovskiy RO, Snezhkina AV, Fedorova MS, Yurkevich OY, Muravenko OV, Bolsheva NL, Kudryavtseva AV, Melnikova NV (2017) Differential gene expression in response to Fusarium oxysporum infection in resistant and susceptible genotypes of flax (Linum usitatissimum L.). BMC Plant Biol 17:253. https://doi.org/10.1186/s12870-017-1192-2 Dmitriev AA, Pushkova EN, Novakovskiy RO, Beniaminov AD, Rozhmina TA, Zhuchenko AA, Bolsheva NL, Muravenko OV, Povkhova LV, Dvorianinova EM, Kezimana P, Snezhkina AV, Kudryavtseva AV, Krasnov GS, Melnikova NV (2021) Genome sequencing of fiber flax cultivar Atlant using Oxford Nanopore and Illumina platforms. Front Genet 11:590282. https://doi.org/10.3389/fgene.2020.590282 Doudna JA, Charpentier E (2014) Genome editing. The new frontier of genome engineering with CRISPR-Cas9. Science 346(6213):1258096. https://doi.org/10.1126/science

11

Genome-Editing Tools for Flax Genetic Improvement

Doyon Y, McCammon JM, Miller JC, Faraji F, Ngo C, Katibah GE, Amora R, Hocking TD, Zhang L, Rebar EJ, Gregory PD, Urnov FD, Amacher SL (2008) Heritable targeted gene disruption in zebrafish using designed zinc-finger nucleases. Nat Biotechnol 26(6):702–708. https://doi.org/10.1038/nbt1409 Dribnenki JCP, Green AG (1995) Linola “947” low linolenic acid flax. Can J Plant Sci 75:201–202 Edger PP, Poorten TJ, VanBuren R, Hardigan MA, Colle M, McKain MR, Smith RD, Teresi SJ, Nelson ADL, Wai CM, Alger EI, Bird KA, Yocca AE, Pumplin N, Ou S, Ben-Zvi G, Brodt A, Baruch K, Swale T, Shiue L, Acharya CB, Cole GS, Mower JP, Childs KL, Jiang N, Lyons E, Freeling M, Puzey JR, Knapp SJ (2019) Origin and evolution of the octoploid strawberry genome. Nat Genet 51:541–547. https://doi.org/10.1038/s41588-019-0356-4 Enciso-Rodriguez F, Manrique-Carpintero NC, Nadakuduti SS, Buell CR, Zarka D, Douches D (2019) Overcoming self-incompatibility in diploid potato using CRISPR-Cas9. Front Plant Sci 10:376 Esvelt KM, Wang HH (2013) Genome-scale engineering for systems and synthetic biology. Mol Syst Biol 9:641. https://doi.org/10.1038/msb.2012.66 Fauser F, Schiml S, Puchta H (2014) Both CRISPR/Cas-based nucleases and nickases can be used efficiently for genome engineering in Arabidopsis thaliana. Plant J 79(2):348–359. https://doi.org/10.1111/tpj.12554 Fofana B, Duguid S, Cloutier S (2004) Cloning of fatty acid biosynthetic genes β-ketoacyl CoA synthase, fatty acid elongase, stearoyl-ACP desaturase, and fatty acid desaturase and analysis of expression in the early developmental stages of flax (Linum usitatissimum) seeds. Plant Sci 166(6):1487–1496 Fofana B, Cloutier S, Duguid S, Ching J, Rampitsch C (2006) Gene expression of stearoyl-ACP desaturase and Δ12 fatty acid desaturase 2 is modulated during seed development of flax (Linum usitatissimum).
Lipids 41(7):705–712 Fofana B, Ragupathy R, Cloutier S (2010a) Flax Lipids: classes, biosynthesis, genetics and the promise of applied genomics for understanding and altering of fatty acids. In: Gilmore PL (ed) Lipids: categories, biological functions and metabolism, nutrition, and health. Nova Science Publishers Inc, pp 71–98 Fofana B, Cloutier S, Kirby CW, McCallum J, Duguid S (2010b) A well balanced omega-6/omega-3 ratio in developing flax bolls after heating and its implications for use as a fresh vegetable by humans. Food Res Intern 44:2459–2464 Fofana B, Ghose K, McCallum J, You FM, Cloutier S (2017a) UGT74S1 is the key player in controlling secoisolariciresinol diglucoside (SDG) formation in flax. BMC Plant Biol 17(1):35. https://doi.org/10. 1186/s12870-017-0982-x Fofana B, Ghose K, Somalraju A, McCallum J, Main D, Deyholos MK, Rowland GG, Cloutier S (2017b) Induced mutagenesis in UGT74S1 gene leads to stable

new flax lines with altered secoisolariciresinol diglucoside (SDG) profiles. Front Plant Sci 8:1638. https://doi.org/10.3389/fpls.2017.01638 Fukao T, Barrera-Figueroa BE, Juntawong P, Peña-Castro JM (2019) Submergence and waterlogging stress in plants: a review highlighting research opportunities and understudied aspects. Front Plant Sci 10:340. https://doi.org/10.3389/fpls.2019.00340 Gaj T, Gersbach CA, Barbas CF III (2013) ZFN, TALEN, and CRISPR/Cas-based methods for genome engineering. Trends Biotechnol 31(7):397–405. https://doi.org/10.1016/j.tibtech.2013.04.004 Galli M, Feng F, Gallavotti A (2020) Mapping regulatory determinants in plants. Front Genet 11:1280. https://doi.org/10.3389/fgene.2020.591194 Gao J, Wang G, Ma S, Xie X, Wu X, Zhang X, Wu Y, Zhao P, Xia Q (2015) CRISPR/Cas9-mediated targeted mutagenesis in Nicotiana tabacum. Plant Mol Biol 87(1–2):99–110. https://doi.org/10.1007/s11103-014-0263-0 Ghose K, Selvaraj K, McCallum J, Kirby CW, Sweeney-Nixon M, Cloutier SJ, Deyholos M, Datla R, Fofana B (2014) Identification and functional characterization of a flax UDP-glycosyltransferase glucosylating secoisolariciresinol (SECO) into secoisolariciresinol monoglucoside (SMG) and diglucoside (SDG). BMC Plant Biol 14:82. https://doi.org/10.1186/1471-2229-14-82 Ghose K, McCallum J, Sweeney-Nixon M, Fofana B (2015) Histidine 352 (His352) and tryptophan 355 (Trp355) are essential for flax UGT74S1 glucosylation activity toward secoisolariciresinol. PLoS ONE 10(2):e116248. https://doi.org/10.1371/journal.pone.0116248 Graupner N, Lehmann KH, Weber DE, Hilgers HW, Bell EG, Walenta I, Berger L, Brückner T, Kölzig K, Randerath H, Bruns A, Frank B, Wonneberger M, Joulian M, Bruns L, von Dungern F, Janßen A, Gries T, Kunst S, Müssig J (2020) Novel low-twist bast fibre yarns from flax tow for high-performance composite applications. Materials (Basel) 14(1):105.
https://doi.org/10.3390/ma14010105 Green AG, Marshall DR (1984) Isolation of induced mutants of linseed (Linum usitatissimum L.) having reduced linolenic acid content. Euphytica 33:321–328. https://doi.org/10.1007/BF00021128 Green AG, Chen Y, Singh SP, Dribnenki JCP (2008) Flax. In: Kole C, Hall TC (eds) Compendium of transgenic crop plants: transgenic oilseed crops. Blackwell Publishing Ltd., Oxford, pp 199–226 Greisman HA, Pabo CO (1997) A general strategy for selecting high-affinity zinc finger proteins for diverse DNA target sites. Science 275(5300):657–661 Guy CP, Majerník AI, Chong JPJ, Bolt EL (2004) A novel nuclease-ATPase (Nar71) from archaea is part of a proposed thermophilic DNA repair system. Nucleic Acids Res 32(21):6176–6186. https://doi. org/10.1093/nar/gkh960 Hameed A, Mehmood MA, Shahid M, Fatma S, Khan A, Ali S (2020) Prospects for potato genome editing to engineer resistance against viruses and cold-induced

248 sweetening. GM Crops Food 11(4):185–205. https:// doi.org/10.1080/21645698.2019.1631115 Han J, Xia Z, Liu P, Li C, Wang Y, Guo L, Jiang G, Zhai W (2020) TALEN-based editing of TFIIAy5 changes rice response to Xanthomonas oryzae pv Oryzae. Sci Rep 10(1):2036. https://doi.org/10.1038/ s41598-020-59052-w Haurwitz RE, Jinek M, Wiedenheft B, Zhou K, Doudna JA (2010) Sequence- and structure-specific RNA processing by a CRISPR endonuclease. Science 329(5997):1355–1358. https://doi.org/10.1126/science He L, Xiao J, Rashid KY, Jia G, Li P, Yao Z, Wang X, Cloutier S, You FM (2019a) Evaluation of genomic prediction for pasmo resistance in flax. Int J Mol Sci 20(2):359 He L, Xiao J, Rashid KY, Yao Z, Li P, Jia G, Wang X, Cloutier S, You FM (2019b) Genome-wide association studies for pasmo resistance in flax (Linum usitatissimum L.). Front Plant Sci 9:1982. https://doi. org/10.3389/fpls.2018.01982 Holme IB, Gregersen PL, Brinch-Pedersen H (2019) Induced genetic variation in crop plants by random or targeted mutagenesis: convergence and differences. Front Plant Sci 10:1468. https://doi.org/10.3389/fpls. 2019.01468 Howells RM, Craze M, Bowden S, Wallington EJ (2018) Efficient generation of stable, heritable gene edits in wheat using CRISPR/Cas9. BMC Plant Biol 18 (1):215. https://doi.org/10.1186/s12870-018-1433-z Ishino Y, Shinagawa H, Makino K, Amemura M, Nakata A (1987) Nucleotide sequence of the iap gene, responsible for alkaline phosphatase isozyme conversion in Escherichia coli, and identification of the gene product. J Bacteriol 169(12):5429–5433 Ito Y, Nishizawa-Yokoi A, Endo M, Mikami M, Toki S (2015) CRISPR/Cas9-mediated mutagenesis of the RIN locus that regulates tomato fruit ripening. Biochem Biophys Res Commun 467(1):76–82. https://doi.org/10.1016/j.bbrc.2015.09.117 Jaganathan D, Ramasamy K, Sellamuthu G, Jayabalan S, Venkataraman G (2018) CRISPR for crop improvement: an update review. Front Plant Sci 9:985. 
https:// doi.org/10.3389/fpls.2018.00985 Jamieson AC, Kim SH, Wells JA (1994) In vitro selection of zinc fingers with altered DNA-binding specificity. Biochemistry 33(19):5689–5695. https://doi.org/10. 1021/bi00185a004 Jones JT, Seong-Wook L, Sullenger BA (1996) Tagging ribozyme reaction sites to follow trans-splicing in mammalian cells. Nat Med 2(6):643–648 Jouanin A, Gilissen LJWJ, Schaart JG, Leigh FJ, Cockram J, Wallington EJ, Boyd LA, van den Broeck HC, van der Meer IM, America AHP, Visser RGF, Smulders MJM (2020) CRISPR/Cas9 gene editing of gluten in wheat to reduce gluten content and exposure-reviewing methods to screen for coeliac safety. Front Nutr 7:51. https://doi.org/10.3389/fnut. 2020.00051 Kaneda T, Yoshida H, Nakajima Y, Toishi M, Nugroho AE, Morita H (2016) Cyclolinopeptides,

V. Clemis et al. cyclic peptides from flaxseed with osteoclast differentiation inhibitory activity. Bioorg Med Chem Lett 26 (7):1760–1761. https://doi.org/10.1016/j.bmcl.2016. 02.040 Kazachkov M, Li Q, Shen W, Wang L, Gao P, Xiang D, Datla R, Zou J (2020) Molecular identification and functional characterization of a cyanogenic glucosyltransferase from flax (Linum usitatissimum). PLoS ONE 15(2):e0227840. https://doi.org/10.1371/ journal.pone.0227840 Kazama T, Okuno M, Watari Y, Yanase S, Koizuka C, Tsuruta Y, Sugaya H, Toyoda A, Itoh T, Tsutsumi N, Toriyama K, Koizuka N, Arimura SI (2019) Curing cytoplasmic male sterility via TALEN-mediated mitochondrial genome editing. Nat Plants 5(7):722–730. https://doi.org/10.1038/s41477-019-0459-z Kenaschuk EO (2005) High linolenic acid flax. US patent 6870077, 22 Mar 2005 Kesiraju K, Tyagi S, Mukherjee S, Rai R, Singh NK, Sreevathsa R, Dash PK (2020) An apical meristemtargeted in planta transformation method for the development of transgenics in flax (Linum usitatissimum): optimization and validation. Front Plant Sci 11:562056. https://doi.org/10.3389/fpls.2020.562056 Kieu NP, Lenman M, Wang ES, Petersen BL, Andreasson E (2021) Mutations introduced in susceptibility genes through CRISPR/Cas9 genome editing confer increased late blight resistance in potatoes. Sci Rep 11 (1):4487. https://doi.org/10.1038/s41598-021-83972-w Kim H, Choi J, Won KH (2020) A stable DNA-free screening system for CRISPR/RNPs-mediated gene editing in hot and sweet cultivars of Capsicum annuum. BMC Plant Biol 20(1):449. https://doi.org/ 10.1186/s12870-020-02665-0 Kleinstiver BP, Pattanayak V, Prew MS, Tsai SQ, Nguyen NT, Zheng Z, Joung JK (2016) High-fidelity CRISPR-Cas9 nucleases with no detectable genomewide off-target effects. Nature 529(7587):490–495 Klimek-Chodacka M, Oleszkiewicz T, Lowder LG, Qi Y, Baranski R (2018) Efficient CRISPR/Cas9-based genome editing in carrot cells. 
Plant Cell Rep 37 (4):575–586 Kowles R (2001) Variations in chromosome number and structure. In: Solving problems in genetics. Springer, New York, NY. https://doi.org/10.1007/978-1-46130205-6_5 Levin DA (1983) Polyploidy and novelty in flowering plants. Am Nat 122(1):1–25 Ling HQ, Binding H (1987) Plant regeneration from protoplasts in Linum. Plant Breed 98:312–317 Ling HQ, Binding H (1997) Transformation in protoplast cultures of Linum usitatissimum and L. suffruticosum mediated with PEG and with Agrobacterium tumefaciens. J Plant Physiol 151(4):479–488 Lino CA, Harper JC, Carney JP, Timlin JA (2018) Delivering CRISPR: a review of the challenges and approaches. Drug Deliv 25(1):1234–1257. https://doi. org/10.1080/10717544.2018.1474964 Lloyd A, Plaisier CL, Carroll D, Drews GN (2005) Targeted mutagenesis using zinc-finger nucleases in


Arabidopsis. Proc Natl Acad Sci U S A 102(6):2232– 2237. https://doi.org/10.1073/pnas.0409339102 Lorenc-Kukuła K, Zuk M, Kulma A, Czemplik M, Kostyn K, Skala J, Starzycki M, Szopa J (2009) Engineering flax with the GT family 1 Solanum sogarandinum glycosyltransferase SsGT1 confers increased resistance to Fusarium infection. J Agric Food Chem 57(15):6698–66705. https://doi.org/10. 1021/jf900833k Ludvíková M, Griga M (2015) Transgenic flax/linseed (Linum usitatissimum L.)—expectations and reality. Czech J Genet Plant Breed 51(4):123–141. https://doi. org/10.17221/104/2015-CJGPB Luo M, Li H, Chakraborty S, Morbitzer R, Rinaldo A, Upadhyaya N, Bhatt D, Louis S, Richardson T, Lahaye T, Ayliffe M (2019) Efficient TALENmediated gene editing in wheat. Plant Biotechnol J 17(11):2026–2028. https://doi.org/10.1111/pbi.13169 Majumder S, Sarkar C, Datta K, Datta SK (2020) Establishment of the ‘imbibed seed piercing’ method for Agrobacterium-mediated transformation of jute and flax bast fibre crops via phloem-specific expression of the b-glucuronidase Gene. Ind Crops Prod 154:112620. https://doi.org/10.1016/j.indcrop.2020. 112620 Mak AN-S, Bradley P, Cernadas RA, Bogdanove AJ, Barry L, Stoddard BL (2012) The crystal structure of TAL effector PthXo1 bound to its DNA target. Science 335(6069):716–719 Makarova KS, Aravind L, Grishin NV, Rogozin IB, Koonin EV (2002) A DNA repair system specific for thermophilic Archaea and bacteria predicted by genomic context analysis. Nucleic Acids Res 30 (2):482–496. https://doi.org/10.1093/nar/30.2.482 Mali P, Aach J, Stranges PB, Esvelt KM, Moosburner M, Kosuri S, Yang L, Church GM (2013) CAS9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering. Nat Biotechnol 31(9):833–838 McHughen A, Holm F (1994) Transgenic flax with environmentally and agronomically sustainable attributes. Transgenic Res 4:3–11. 
https://doi.org/10.1007/ BF01976495 McHughen A, Rowland G, Holm F, Bhatty R, Kenaschuk E (1997) CDC Triffid transgenic flax. Can J Plant Sci 77:641–643. https://doi.org/10.4141/P96-188 Miao J, Guo D, Zhang J, Huang Q, Qin G, Zhang X, Wan J, Gu H, Qu LJ (2013) Targeted mutagenesis in rice using CRISPR-Cas system. Cell Res 23 (10):1233–1236. https://doi.org/10.1038/cr.2013.123 Miele A, Dekker J (2008) Long-range chromosomal interactions and gene regulation. Mol Biosyst 4 (11):1046–1057. https://doi.org/10.1039/b803580f Mierziak J, Wojtasik W, Kulma A, Dziadas M, Kostyn K, Dymińska L, Hanuza J, Żuk SJ (2020) 3hydroxybutyrate is active compound in flax that upregulates genes involved in DNA methylation. Int J Mol Sci 21(8):2887. https://doi.org/10.3390/ ijms21082887

249 Millam S, Obert B, Pret’ová A (2005) Plant cell and biotechnology studies in Linum usitatissimum—a review. PCTOC 82(1):93–103 Miller J, McLachlan AD, Klug A (1985) Repetitive zincbinding domains in the protein transcription factor IIIA from Xenopus oocytes. EMBO J 4(6):1609–1614 Morello L, Pydiura N, Galinousky D, Blume Y, Breviario D (2020) Flax tubulin and CesA superfamilies represent attractive and challenging targets for a variety of genome- and base-editing applications. Funct Integr Genomics 20(1):163–176. https://doi. org/10.1007/s10142-019-00667-2 Moscou MJ, Bogdanove AJ (2009) A simple cipher governs DNA recognition by TAL effectors. Science 326(5959):1501. https://doi.org/10.1126/science. 1178817 Mostafavi K (2011) A study effects of drought stress on germination and early seedling growth of flax (Linum usitatissimum L.) cultivars. Adv Environ Biol 5 (10):3307–3312 Nardelli J, Gibson T, Charnay P (1992) Zinc finger-DNA recognition: analysis of base specificity by sitedirected mutagenesis. Nucleic Acids Res 20 (16):4137–4144. https://doi.org/10.1093/nar/20.16. 4137 Nekrasov V, Staskawicz WD, Jones JDG, Kamoun S (2013) Targeted mutagenesis in the model plant Nicotiana benthamiana using Cas9 RNA-guided endonuclease. Nat Biotechnol 31(8):691–693. https:// doi.org/10.1038/nbt.2655 Ntiamoah C, Rowland GG (1997) Inheritance and characterization of two low linolenic acid EMSinduced McGregor mutant flax (Linum usitatissimum L.). Can J Plant Sci 77:353–358. https://doi.org/10. 4141/P96-137 Oldenburg DJ, Bendich AJ (2015) DNA maintenance in plastids and mitochondria of plants. Front Plant Sci 6:883 Oldenburg DJ, Bendich AJ (2016) The linear plastid chromosomes of maize: terminal sequences, structures, and implications for DNA replication. 
Curr Genet 62(2):431–442 Oleszkiewicz T, Klimek-Chodacka M, Kruczek M, Godel-Jędrychowska K, Sala K, Milewska-Hendel A, Zubko M, Kurczyńska E, Qi Y, Baranski R (2021) Inhibition of carotenoid biosynthesis by CRISPR/Cas9 triggers cell wall remodelling in carrot. Int J Mol Sci 22(12):6516. https://doi.org/10.3390/ijms22126516 Pan C, Sretenovic S, Qi Y (2021) CRISPR/dCas-mediated transcriptional and epigenetic regulation in plants. Curr Opin Plant Biol 60:101980. https://doi.org/10. 1016/j.pbi.2020.101980 Park J, Bae S, Kim J-S (2015) Cas-designer: a web-based tool for choice of CRISPR-Cas9 target sites. Bioinformatics 31(24):4014–4016. https://doi.org/10.1093/ bioinformatics/btv537 Pereira A (2016) Plant abiotic stress challenges from the changing environment. Front Plant Sci 7:1123. https:// doi.org/10.3389/fpls.2016.01123

250 Pitarresi G, Tumino D, Mancuso A (2015) Thermomechanical behaviour of flax-fibre reinforced epoxy laminates for industrial applications. Materials (Basel) 8(11):7371–7388. https://doi.org/10.3390/ma8115384 Qi LS, Larson MH, Gilbert LA, Doudna JA, Weissman JS, Arkin AP, Lim WA (2013) Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell 152(5):1173–1183 Qiu T, Liu Z, Liu B (2020) The effects of hybridization and genome doubling in plant evolution via allopolyploidy. Mol Biol Rep 47(7):5549–5558. https://doi. org/10.1007/s11033-020-05597-y Rashid KY (1998) Powdery mildew on flax: a new disease in western Canada. Can J Plant Pathol 20:216 Rashid K, Duguid S (2005) Inheritance of resistance to powdery mildew in flax. Can J Plant Pathol 27(3): 404–409. https://doi.org/10.1080/0706066050950 7239 Rashid KY, Kenaschuk EO, Platford RG (1998) Diseases of flax in Manitoba in 1997 and first report of powdery mildew on flax in Canada. Can Plant Dis Surv 78:99– 100 Redman M, King A, Watson C, King D (2016) What is CRISPR/Cas9? Arch Dis Child Educ Pract Ed 101 (4):213–215. https://doi.org/10.1136/archdischild2016-310459 Ricroch A, Clairand P, Harwood W (2017) Use of CRISPR systems in plant genome editing: toward new opportunities in agriculture. Emerg Top Life Sci 1 (2):169–182. https://doi.org/10.1042/etls20170085 Roger D, David A, David H (1996) Immobilization of flax protoplasts in agarose and alginate beads. Correlation between ionically bound cell-wall proteins and morphogenetic response. Plant Physiol 112(3):1191–1199. https://doi.org/10.1104/pp.112.3.1191 Rowland GG, McHughen A, Gusta L, Bhatty RS, MacKenzie SL, Taylor DC (1995) The application of chemical mutagenesis and biotechnology to the modification of linseed (Linum usitatissimum L.). Euphytica 85:317–321. 
https://doi.org/10.1007/ BF00023961 Ryan CD, Smyth SJ (2012) Economic implications of low-level presence in a zero-tolerance European import market: the case of Canadian Triffid flax. AgBioforum 15(1):21–30 Sauer NJ, Narváez-Vásquez J, Mozoruk J, Miller RB, Warburg ZJ, Woodward MJ, Mihiret YA, Lincoln TA, Segami RE, Sanders SL, Walker KA, Beetham PR, Schöpke CR, Gocal G (2016) Oligonucleotidemediated genome editing provides precision and function to engineered nucleases and antibiotics in plants. Plant Physiol 170(4):1917–1928. https://doi. org/10.1104/pp.15.01696 Seligman LM, Chisholm KM, Chevalier BS, Chadsey MS, Edwards ST, Savage JH, Veill AL (2002) Mutations altering the cleavage specificity of a homing endonuclease. Nucleic Acids Res 30(17):3870–3879. https:// doi.org/10.1093/nar/gkf495 Sentmanat MF, Peters ST, Florian CP, Connelly JP, Pruett-Miller SM (2018) A survey of validation

V. Clemis et al. strategies for CRISPR-Cas9 editing. Sci Rep 8 (1):888. https://doi.org/10.1038/s41598-018-19441-8 Sertse D, You FM, Ravichandran S, Cloutier S (2019) The complex genetic architecture of early root and shoot traits in flax revealed by genome-wide association analyses. Front Plant Sci 10:1483. https://doi. org/10.3389/fpls Sertse D, You FM, Ravichandran S, Soto-Cerda BJ, Duguid S, Cloutier S (2021) Loci harboring genes with important role in drought and related abiotic stress responses in flax revealed by multiple GWAS models. Theor Appl Genet 134(1):191–212. https:// doi.org/10.1007/s00122-020-03691-0 Shim YY, Gui B, Arnison PG, Wang Y, Reaney MJT (2014) Flaxseed (Linum usitatissimum L.) bioactive compounds and peptide nomenclature: a review. Trends Food Sci Technol 38(1):5–20. https://doi.org/ 10.1016/j.tifs.2014.03.011 Shinoyama H, Ichikawa H, Nishizawa-Yokoi A, Skaptsov M, Toki S (2020) Simultaneous TALENmediated knockout of chrysanthemum DMC1 genes confers male and female sterility. Sci Rep 10(1):16165. https://doi.org/10.1038/s41598-020-72356-1 Shukla VK, Doyon Y, Miller JC, DeKelver RC, Moehle EA, Worden SE, Mitchell JC, Arnold NL, Gopalan S, Meng X, Choi VM, Rock JM, Wu YY, Katibah GE, Zhifang G, McCaskill D, Simpson MA, Blakeslee B, Greenwalt SA, Butler HJ, Hinkley SJ, Zhang L, Rebar EJ, Gregory PD, Urnov FD (2009) Precise genome modification in the crop species Zea mays using zinc-finger nucleases. Nature 459 (7245):437–441. https://doi.org/10.1038/nature07992 Shysha EN, Korkhovyu VI, Bayer GYA, Guzenko EV, Lemesh VA, Kartel NA, Yemets AI, Blume YB (2013) Genetic transformation of flax (Linum usitatissimum L.) with chimeric GFP-TUA6 gene for visualisation of microtubules. Cytol Genet 47(2):3–11. https://doi.org/10.3103/S0095452713020096 Sierakowska H, Sambade MJ, Agrawal S, Kale R (1996) Repair or thalassemic human beta-globin mRNA in mammalian cells by antisense oligonucleotides. 
Proc Nat Acad Sci USA 93(23):12840–12844 Silva G, Poirot L, Galetto R, Smith J, Montoya G, Duchateau P, Paques F (2011) Meganucleases and other tools for targeted genome engineering: perspectives and challenges for gene therapy. Curr Gene Ther 11(1):11–27. https://doi.org/10.2174/1566523117 94520111 Singh S, McKinney S, Green A (1994) Sequence of a cDNA from Linum usitatissimum encoding the stearoyl-ACP carrier protein desaturase. Plant Physiol 140(3):1075 Slaymaker IM, Gao L, Zetsche B, Scott DA, Yan WX, Zhang F (2016) Rationally engineered Cas9 nucleases with improved specificity. Science 351(6268):84–88 Soltis DE (1984) Autopolyploidy in Tolmiea menziesii (Saxifragaceae). Am J Bot 71(9):1171–1174. https:// doi.org/10.2307/2443640 Sommer D, Peters AE, Baumgart AK, beyer M (2015) TALEN-mediated genome engineering to generate


targeted mice. Chromosome Res 23:43–55. https://doi. org/10.1007/s10577-014-9457-1 Soto-Cerda BJ, Diederichsen A, Ragupathy R, Cloutier S (2013) Genetic characterization of a core collection of flax (Linum usitatissimum L.) suitable for association mapping studies and evidence of divergent selection between fiber and linseed types. BMC Plant Biol 13:78 Soto-Cerda BJ, Duguid S, Booker H, Rowland G, Diederichsen A, Cloutier S (2014) Genomic regions underlying agronomic traits in linseed (Linum usitatissimum L.) as revealed by association mapping. J Integr Plant Biol 56(1):75–87. https://doi.org/10. 1111/jipb.12118 Soto-Cerda BJ, Cloutier S, Quian R, Gajardo HA, Olivos M, You FM (2018) Genome-wide association analysis of mucilage and hull content in flax (Linum usitatissimum L.) seeds. Int J Mol Sci 19(10):2870. https://doi.org/10.3390/ijms19102870 Soto-Cerda BJ, Cloutier S, Gajardo HA, Aravena G, Quian R, You FM (2020) Drought response of flax accessions and identification of quantitative trait nucleotides (QTNs) governing agronomic and root traits by genome-wide association analysis. Mol Breed 40(10):15 Soto-Cerda BJ, Aravena G, Cloutier S (2021) Genetic dissection of flowering time in flax (Linum usitatissimum L.) through single- and multi-locus genome-wide association studies. Mol Genet Genomics 296(4):877– 891. https://doi.org/10.1007/s00438-021-01785-y Stebbins GL Jr (1947) Types of polyploids: their classification and significance. Adv Genet 1:403–429 Stoddard BL (2005) Homing endonuclease structure and function. Q Rev Biophys 38(1):49–95. https://doi.org/ 10.1017/S0033583505004063 Sun C, Ali K, Yan K, Fiaz S, Dormatey R, Bi Z, Bai J (2021) Exploration of epigenetics for improvement of drought and other stress resistance in crops: a review. Plants (Basel) 10(6):1226. 
https://doi.org/10.3390/ plants10061226 Tang L, Bondareva A, González R, Rodriguez-Sosa JR, Carlson DF, Webster D, Fahrenkrug S, Dobrinski I (2018) TALEN-mediated gene targeting in porcine spermatogonia. Mol Reprod Dev 85(3):250–261. https://doi.org/10.1002/mrd.22961 Titok V, Leontiev V, Shostak L, Khotyleva L (2006) Thermogravimetric analysis of the flax bast fibre bundle. J Nat Fibres 3:35–41. https://doi.org/10.1300/ J395v03n01_04 Toda E, Okamoto T (2020) Gene expression and genome editing systems by direct delivery of macromolecules into rice egg cells and zygotes. Bio Protoc 10(14): e3681. https://doi.org/10.21769/BioProtoc.3681. PMID:33659352;PMCID:PMC7842353 Toda E, Koiso N, Takebayashi A, Ichikawa M, Kiba T, Osakabe K, Osakabe Y, Sakakibara H, Kato N, Okamoto T (2019) An efficient DNA- and selectablemarker-free genome-editing system using zygotes in rice. Nat Plants 5(4):363–368. https://doi.org/10.1038/ s41477-019-0386-z

251 Travis J (2015) Making the cut—CRISPR genomeediting technology shows its power. Science 350 (6267):1456–1457. https://doi.org/10.1126/science. 350.6267.1456 Urnov FD, Rebar EJ, Holmes MC, Zhang HS, Gregory PD (2010) Genome editing with engineered zinc finger nucleases. Nat Rev Genet 11(9):636–646 US-FDA (2009) High linolenic acid flaxseed oil-grn no. 256. https://www.accessdata.fda.gov/scripts/fdcc/? set=GRASNotices&id=256. Accessed 2 Aug 2021 Verma IM, Somia N (1998) Gene therapy- promises, problems and prospects. Nature 389(6648):239–242 Vouillot L, Thélie A, Pollet N (2015) Comparison of T7E1 and surveyor mismatch cleavage assays to detect mutations triggered by engineered nucleases. G3 (Bethesda) 5(3):407–415. https://doi.org/10.1534/ g3.114.015834 Vrinten P, Hu Z, Munchinsky M-A, Rowland G, Qiu X (2005) Two FAD3 desaturase genes control the level of linolenic acid in flax seed. Plant Physiol 139(1):79– 87 Wang Z, Hobson N, Galindo L, Zhu S, Shi D, McDill J, Yang L, Hawkins S, Neutelings G, Datla R, Lambert G, Galbraith DW, Grassa CJ, Geraldes A, Cronk QC, Cullis C, Dash PK, Kumar PA, Cloutier S, Sharpe AG, Wong GK, Wang J, Deyholos MK (2012) The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads. Plant J 72(3):461–473. https://doi.org/ 10.1111/j.1365-313X.2012.05093.x Wang Y, Fofana B, Roy M, Ghose K, Yao X-H, Nixon M-S, Nair S, Nyomba GBL (2015) Flaxseed lignan secoisolariciresinol diglucoside improves insulin sensitivity through upregulation of GLUT4 expression in diet-induced obese mice. J Funct Foods 18:1–9 Wang W, Wang L, Wang L,Tan M, Ogutu CO,Yin Z, Zhou J,Wang J, Lijun Wang,Yan X (2021) Transcriptome analysis and molecular mechanism of linseed (Linum usitatissimum L.) drought tolerance under repeated drought using single-molecule long-read sequencing. BMC Genomics 22(1):109 Wiedenheft B, Sternberg SH, Doudna JA (2012) RNAguided genetic silencing systems in bacteria and archaea. 
Nature 482(7385):331–338 Wijayanto T, McHughen A (1999) Genetic transformation of Linum by particle bombardment. In Vitro Cell Dev Biol Plant 35(6):456–465. https://doi.org/10. 1007/s11627-999-0068-z Woo JW, Kim J, Kwon SI, Corvalán C, Cho SW, Kim H, Kim SG, Kim ST, Choe S, Kim JS (2015) DNA-free genome editing in plants with preassembled CRISPRCas9 ribonucleoproteins. Nat Biotechnol 33 (11):1162–1164. https://doi.org/10.1038/nbt.3389 Woolf TM (1998) Therapeutic repair of mutated nucleic acid sequences. Nat Biotechnol 16(4):341–344 Woolf TW, Chase JM, Stinchcomb D (1995) Towards the therapeutic editing of mutated RNA sequences. PNAS 92(18):8298–8302 Wróbel-Kwiatkowska M, Zuk M, Szopa J, Dymińska L, Maczka M, Hanuza J (2009) Poly-3-hydroxy butyric

252 acid interaction with the transgenic flax fibers: FT-IR and Raman spectra of the composite extracted from a GM flax. Spectrochim Acta A Mol Biomol Spectrosc 73(2):286–294. https://doi.org/10.1016/j.saa.2009.02. 034 Wróbel-Kwiatkowska M, Jabłoński S, Szperlik J, Dymińska L, Łukaszewicz M, Rymowicz W, Hanuza J, Szopa J (2015) Impact of CAD-deficiency in flax on biogas production. Transgenic Res 24(6):971– 978. https://doi.org/10.1007/s11248-015-9894-4 Wyman C, Kanaar R (2006) DNA double-strand break repair: all’s well that ends well. Annu Rev Genet 40:363–383. https://doi.org/10.1146/annurev.genet. 40.110405.090451 Xia E, Zhang Y, Cao H, Li J, Duan R, Hu J (2019) TALEN-mediated gene targeting for cystic fibrosisgene therapy. Genes (Basel) 10(1):39. https://doi.org/ 10.3390/genes10010039 Xie D, Dai Z, Yang Z, Tang Q, Sun J, Yang X, Song X, Lu Y, Zhao D, Zhang L, Su J (2018a) Genomic variations and association study of agronomic traits in flax. BMC Genomics 19(1):512. https://doi.org/10. 1186/s12864-018-4899-z Xie D, Dai Z, Yang Z, Sun J, Zhao D, Yang X, Zhang L, Tang Q, Su J (2018b) Genome-wide association study identifying candidate genes influencing important agronomic traits of flax (Linum usitatissimum L.) using SLAF-seq. Front Plant Sci 8:2232. https://doi. org/10.3389/fpls.2017.02232 Yin H, Xue W, Chen S, Bogorad RL, Benedetti E, Grompe M, Koteliansky V, Sharp PA, Jacks T, Anderson DG (2014) Genome editing with Cas9 in adult mice corrects a disease mutation and phenotype. Nat Biotechnol 32(6):551–553 You F, Li P, Kumar S, Ragupathy R, Li ZN, Fu YB, Cloutier S (2014) Genome-wide identification and characterization of the gene families controlling fatty acid biosynthesis in flax (Linum usitatissimum L). J Proteom Bioinform 7(10):310–326. https://doi.org/ 10.4172/jpb.1000334 You FM, Jia G, Xiao J, Duguid SD, Rashid KY, Booker HM, Cloutier S (2017) Genetic variability of 27 traits in a core collection of flax (Linum usitatissimum L.). Front Plant Sci 8:1636. 
https://doi.org/10. 3389/fpls.2017.01636 You FM, Xiao J, Li P, Yao Z, Jia G, He L, Kumar S, Soto-Cerda B, Duguid SD, Booker HM, Rashid KY, Cloutier S (2018a) Genome-wide association study and selection signatures detect genomic regions associated with seed yield and oil quality in flax. Int J Mol Sci 19(8):2303. https://doi.org/10.3390/ijms19082303 You FM, Xiao J, Li P, Yao Z, Jia G, He L, Zhu T, Luo MC, Wang X, Deyholos MK, Cloutier S (2018b) Chromosome-scale pseudomolecules refined by optical, physical and genetic maps in flax. Plant J 95 (2):371–384. https://doi.org/10.1111/tpj.13944

V. Clemis et al. Zaman QU, Li C, Cheng H, Hu Q (2019) Genome editing opens a new era of genetic improvement in polyploid crops. Crop J 7(2):141–150. https://doi.org/10.1016/j. cj.2018.07.004 Zetsche B, Gootenberg JS, Abudayyeh OO, Slaymaker IM, Makarova KS, Essletzbichler P, Volz SE, Joung J, van der Oost J, Regev A, Koonin EV, Zhang F (2015) Cpf1 is a single RNA-guided endonuclease of a class 2 CRISPR-Cas system. Cell 163(3):759–771 Zhang T, Lamb EG, Soto-Cerda B, Duguid S, Cloutier S, Rowland G, Diederichsen A, Booker HM (2014) Structural equation modeling of the Canadian flax (Linum usitatissimum L.) core collection for multiple phenotypic traits. Can J Plant Sci 94:1325–1332 Zhang B, Yang X, Yang C, Li M, Guo Y (2016) Exploiting the CRISPR/cas9 system for targeted genome mutagenesis in Petunia. Sci Rep 6:20315. https://doi.org/10.1038/srep20315 Zhang Y, Ma X, Xie X, Liu Y-G (2017) CRISPR/Cas9based genome editing in plants. Prog Mol Biol Transl Sci 149:133–150 Zhang Y, Massel K, Godwin ID, Gao C (2018) Applications and potential of genome editing in crop improvement. Genome Biol 19:210. https://doi.org/ 10.1186/s13059-018-1586-y Zhang Y, Iaffaldano B, Qi Y (2021) CRISPR ribonucleoprotein-mediated genetic engineering in plants. Plant Commun 2(2):100168. https://doi.org/ 10.1016/j.xplc.2021.100168 Zhou J, Xin X, He Y, Chen H, Li Q, Tang X, Zhong Z, Deng K, Zheng X, Akher SA, Cai G, Qi Y, Zhang Y (2019) Multiplex QTL editing of grain-related genes improves yield in elite rice varieties. Plant Cell Rep 38 (4):475–485. https://doi.org/10.1007/s00299-018-2340-3 Zhu C, Smith T, McNulty J, Rayla AL, Lakshmanan A, Siekmann AF, Buffardi M, Meng X, Shin J, Padmanabhan A, Cifuentes D, Giraldez AJ, Look AT, Epstein JA, Lawson ND, Wolfe SA (2011) Evaluation and application of modularly assembled zinc-finger nucleases in zebrafish. Development 138(20):4555– 4564. https://doi.org/10.1242/dev.066779 Zohary D (1999) Monophyletic vs. 
polyphyletic origin of the crops on which agriculture was founded in the Near East. Gen Res Crop Evol 46:133–142 Zuk M, Prescha A, Stryczewska M, Szopa J (2012) Engineering flax plants to increase their antioxidant capacity and improve oil composition and stability. J Agric Food Chem 60(19):5003–5012. https://doi. org/10.1021/jf300421m Zuk M, Pelc K, Szperlik J, Sawula A, Szopa J (2020) Metabolism of the cyanogenic glucosides in developing flax: metabolic analysis, and expression pattern of genes. Metabolites 10(7):288. https://doi.org/10.3390/ metabo10070288

12 Genomics Assisted Breeding Strategy in Flax

Nadeem Khan, Hamna Shazadee, Sylvie Cloutier, and Frank M. You

N. Khan, H. Shazadee, S. Cloutier (✉), F. M. You (✉)
Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, 960 Carling Avenue, Ottawa, ON K1A 0C6, Canada
e-mail: [email protected]
F. M. You, e-mail: [email protected]

© His Majesty the King in Right of Canada, as represented by the Minister of Agriculture and Agri-Food 2023. F. M. You and B. Fofana (eds.), The Flax Genome, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-16061-5_12

12.1 Introduction

Genomic selection (GS) or genomic prediction (GP) is a form of marker-assisted selection that uses genome-wide markers to predict genomic-estimated breeding values (GEBVs) for phenotypic traits (Meuwissen et al. 2001). A major purpose of plant breeding programs is to maximize genetic gain for desirable traits. Genetic gain can be calculated using the "Breeder's equation" of classical quantitative genetics (Lush 1937). In the past, genetic gain was normally calculated from phenotypic data alone. GS is now becoming a conventional approach in plant and animal breeding. Compared with traditional breeding programs, GS offers three major benefits: increased selection accuracy, shorter breeding cycles, and reduced breeding cost. The adoption of low-cost genome sequencing has enabled cost-effective genotyping on a large scale. As a result, more complicated and innovative approaches, such as genomic cross-prediction using GS methodologies, can now be investigated to achieve various breeding goals. With the development of high-throughput genotyping technology, high-density genome-wide molecular markers can be readily obtained, and many individuals in breeding populations can be genotyped at a low cost. Widely used genotyping methods include genotyping by sequencing (GBS) (He et al. 2019b), whole-genome resequencing (Malmberg et al. 2018), array-based genotyping (Ali et al. 2015), and target sequence-based genotyping (Guo et al. 2019).

GS also offers three major applications for breeding programs: (1) offspring selection, (2) evaluation of germplasm and parent selection, and (3) hybrid prediction based on general combining ability (GCA) and specific combining ability (SCA) (Zhao et al. 2015; You et al. 2022). One major challenge in hybrid breeding is the efficient selection of superior hybrids. GS has the potential to address this challenge by predicting the outcomes of crosses from the parental materials, thereby allowing the selection of the best potential parents.

Most GS models are constructed based on genome-wide random markers (Norman et al. 2018; He et al. 2019a). However, the cost associated with generating large-scale datasets limits the use of GS in practical breeding programs. Recently, several studies (He et al. 2019a; Lan et al. 2020) have discussed the strategy of using QTLs as markers to improve predictive ability and to reduce genotyping cost. Our recent studies (He et al. 2019a, b) showed that the combined use of single- and multi-locus genome-wide association study (GWAS) methods can identify both large- and minor-effect QTLs that can be used to build GS models and significantly improve genomic predictive ability. A QTL-based strategy is therefore beneficial for the successful implementation of GS in breeding programs, with the added advantage of reduced genotyping costs.

Similarly, the advent of GP and the use of computer simulation, or genomic cross-prediction, have opened the door to developing virtual crosses and using them in parent selection for two major reasons: (i) either dense markers or QTLs can be used to estimate GEBVs, and (ii) a large number of virtual crosses, such as single, double, three-way or backcrosses, can be generated cost-effectively in a short time through simulation (You et al. 2022). Thus, a major goal of genomic cross-prediction is to determine the best parental materials and to develop superior crosses, both of which are paramount to cross-breeding. Comparing different breeding strategies in large-scale field studies is both expensive and time-consuming. In contrast, genomic cross-prediction studies can be an effective way to simulate and forecast the outcomes of various breeding strategies. In addition, genomic cross-prediction studies can help in understanding the impact of factors, such as population type, on genetic gain and on other breeding target variables of interest (e.g., selection accuracy and genetic variation), thereby helping to improve breeding program efficiency (Liu et al. 2019).

The objective of this chapter is to summarize the recent findings in this area, to demonstrate them through a case study on the use of a QTL-based strategy, and to discuss genomic cross-prediction and its potential use in flax breeding programs.
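The Breeder's equation mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not code from the chapter: it assumes the two standard textbook forms, R = h²S (response equals narrow-sense heritability times selection differential) and the rate form ΔG = i·r·σ_A / L (selection intensity times selection accuracy times additive genetic standard deviation, divided by cycle length in years). The function names and all numeric values are hypothetical.

```python
"""Illustrative sketch of the breeder's equation (assumed textbook forms)."""


def response_to_selection(heritability: float, selection_differential: float) -> float:
    """Classical form R = h^2 * S."""
    if not 0.0 <= heritability <= 1.0:
        raise ValueError("heritability must lie in [0, 1]")
    return heritability * selection_differential


def gain_per_year(intensity: float, accuracy: float,
                  additive_sd: float, cycle_years: float) -> float:
    """Rate form delta_G = (i * r * sigma_A) / L, in trait units per year."""
    if cycle_years <= 0:
        raise ValueError("cycle_years must be positive")
    return intensity * accuracy * additive_sd / cycle_years


if __name__ == "__main__":
    # Hypothetical scenarios: the second has a higher accuracy and a
    # shorter cycle, two of the benefits attributed to GS in the text.
    phenotypic = gain_per_year(intensity=1.4, accuracy=0.5,
                               additive_sd=2.0, cycle_years=4.0)
    genomic = gain_per_year(intensity=1.4, accuracy=0.6,
                            additive_sd=2.0, cycle_years=2.0)
    print(f"phenotypic selection: {phenotypic:.2f} units/year")
    print(f"genomic selection:    {genomic:.2f} units/year")
```

In the rate form, selection accuracy sits in the numerator and cycle length in the denominator, so gains in either one translate directly into a higher expected gain per year.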

12.2 Factors Affecting Genomic Prediction (GP) Efficiency

Several key factors influence the predictive ability of GS, including the training population (TRP), the size of the test populations (TPs), the relationship between the TRP and the TPs, the GP models, and the markers. Here, we discuss GP models, TRPs, TPs and marker types for improving GS accuracy, and we describe several case studies in flax.

The prediction efficiency of GS depends on the predictive ability and the unit cost. There are several methods to evaluate GS predictive ability (de los Campos et al. 2013). The most popular one is cross-validation (CV), which provides an empirical assessment of predictive ability before the GS models are applied to actual selection. CV is performed when both phenotypic and genotypic data are available for a given germplasm collection. The collection is randomly divided into a TRP set and a TP set. The marker effects are estimated from the TRP data and are then used to predict the genotypic values of the plants in the TP set. Predictive ability is measured by the Pearson correlation (r) or by the square root of the average squared error (Daetwyler et al. 2013) between the estimated and observed values of the TP set. Fivefold cross-validation is the most commonly used scheme: the collection is randomly split into five subsets with equal numbers of individuals, and each subset is used in turn as the TP while the remaining four serve as the TRP to construct GS models and calculate predictive ability. This approach has been used in many crops, including flax (Lan et al. 2020), maize (Albrecht et al. 2011), wheat (Heffner et al. 2011), and barley (Heslot et al. 2012).

However, the predictive ability estimated by CV is specific to the population and may not always represent the predictive ability obtained in a practical breeding program, because population structure can have a significant impact on predictive ability estimates. Studies such as those by Hickey et al. (2014) and Lehermeier et al. (2014) used stochastic simulation data and real datasets from maize breeding programs and showed that predictive ability within and among families can differ significantly in structured populations. To boost the potential of the breeding program, the population structure should be taken into account, as it may determine whether or not GS can be applied in a given breeding program. Continuous use of closely related TRPs and TPs to improve prediction will likely narrow the genetic variation that could contribute to future selection response and consequently slow down the genetic gains required in long-term GS application (Jannink et al. 2010; Moeinizade et al. 2019).
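The fivefold cross-validation scheme described in this section can be sketched as follows. This is a minimal illustration, assuming NumPy is available and using plain ridge regression with a fixed shrinkage parameter `lam` as a stand-in for a GS model; the function names and the choice of `lam` are hypothetical, and real analyses would use a package such as those listed in Table 12.1.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge estimates of marker effects; the intercept is handled by centering."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return beta, x_mean, y_mean


def kfold_predictive_ability(X, y, k=5, lam=1.0, seed=0):
    """Mean Pearson r between predicted and observed values over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    r_values = []
    for i in range(k):
        test = folds[i]                                   # this fold is the TP
        train = np.concatenate([folds[j] for j in range(k) if j != i])  # TRP
        beta, x_mean, y_mean = ridge_fit(X[train], y[train], lam)
        y_hat = (X[test] - x_mean) @ beta + y_mean
        r_values.append(float(np.corrcoef(y_hat, y[test])[0, 1]))
    return float(np.mean(r_values)), r_values
```

Repeating the call with different `seed` values and averaging the results gives the replicated CV estimates commonly reported as predictive ability.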

12.2.1 Genomic Prediction (GP) Models

Many GS methodologies, which differ with respect to assumptions about marker effects, have been proposed to obtain GEBVs (Meuwissen et al. 2001). GS models are generally divided into two groups: parametric and nonparametric (Desta and Ortiz 2014). Parametric models, such as ridge regression best linear unbiased prediction (RR-BLUP) and Bayesian approaches, are the most popular (Endelman 2011; Habier et al. 2011). Parametric or linear models account only for additive effects and presume a linear relationship between the phenotype and genotype (Pérez-Rodríguez et al. 2012). The accuracy of GS models varies according to the assumptions and treatments of marker effects. For example, both Bayesian LASSO (BL) and ridge regression models have been found to outperform support vector machines in predicting GEBVs for host-plant resistance to wheat rusts (Ornella et al. 2012). Another study tested six statistical models for GS, namely Bayesian ridge regression, BL, reproducing kernel Hilbert spaces (RKHS) and Bayes A, B and C (Rolling et al. 2020), and found only slight variations in their predictive potentials. Different GP models, such as RR-BLUP, genomic best linear unbiased prediction (GBLUP), Bayesian regression, partial least squares regression and machine learning, can produce slight variations in predictive ability (Jannink et al. 2010). Although the accuracy of the different methods is often similar (He et al. 2019a), the prediction differences may be due to differences in the genetic architecture of the traits (Spindel et al. 2015). As a result, the optimal GS model varies depending on the trait and the crop of interest, necessitating the testing of numerous models to determine the one with the best fit, i.e., the one with the highest predictive ability.

Theoretically, models like RR-BLUP and GBLUP assume that all markers influence trait variation, whereas Bayes A and Bayes B assume a unique variance for each marker. These models therefore fall into two groups: the first group should be more appropriate for complex quantitative traits with a polygenic architecture, whereas the second group would likely be better suited for traits controlled by a small number of genes or QTLs with large effects (You et al. 2022). For traits controlled by a few genes with large effects, Bayes B has been found to perform better than GBLUP or RR-BLUP in several studies (Daetwyler et al. 2010; Jannink et al. 2010; Thavamanikumar et al. 2015).

A case study for seed yield (YLD), days to flowering (DTF) and Fusarium wilt (FW) disease resistance was performed to assess the effect of different models on the predictive ability of GS (Fig. 12.1). To evaluate the predictive ability of three datasets comprising 422, 571, and 308 QTL markers for YLD, DTF, and FW, respectively, ten GS models were tested in the R package BGLR (Pérez and de los Campos 2014) or in G2P (Tang and Liu 2019). These encompass both parametric and nonparametric regressions (Desta and Ortiz 2014), including RR-BLUP (Endelman 2011), GBLUP (Clark and Werf 2013), BRR (Heffner et al. 2009), BL (Li and Sillanpää 2012), Bayes A, Bayes B, Bayes C (Habier et al. 2011), random forest (RF) (González-Recio and Forni 2011), SVR (Moser et al. 2009), and RKHS (Gianola et al. 2006). The parametric models RR-BLUP, GBLUP and BL consistently outperformed the RF and SVR models for all traits and marker sets. The RR-BLUP model has been widely used to predict GEBVs because it often outperforms the other models and is computationally efficient.

Fig. 12.1 Genomic predictive ability (r) obtained using ten statistical models for seed yield (YLD), days to flowering (DTF) and Fusarium wilt (FW) disease resistance in the flax core collection consisting of 408 diverse genotypes. A fivefold cross-validation (CV) scheme was used. A total of 422, 571 and 308 quantitative trait loci (QTLs) for YLD, DTF and FW, respectively, were used for the construction of genomic selection (GS) models. Statistical significance of predictive ability among these models for each trait is indicated by a letter above each box plot. The letters indicate significant differences at the 5% probability level obtained using the Tukey multiple comparison method

However, other models could well perform better with different marker sets or experimental designs. For example, in alfalfa, the machine learning methods support vector machine and RF outperformed RR-BLUP (Medina et al. 2020). Therefore, to implement GS successfully, it is suggested that several models be evaluated for specific populations and traits.
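One reason RR-BLUP and GBLUP tend to give very similar predictive abilities is that, for the same shrinkage, they yield the same GEBVs. The sketch below checks this numerically. It is an illustration under stated assumptions, not the implementation used by rrBLUP or BGLR: the shrinkage parameter `lam` (the ratio of residual to marker-effect variance) is taken as known rather than estimated by REML, the mean is removed by simple centering, and the unscaled kernel K = ZZ' is used in place of a scaled genomic relationship matrix.

```python
import numpy as np


def rrblup_gebv(Z, y, lam):
    """GEBVs via marker effects: u_hat = (Z'Z + lam*I)^-1 Z'(y - mean)."""
    yc = y - y.mean()
    u_hat = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ yc)
    return Z @ u_hat


def gblup_gebv(Z, y, lam):
    """GEBVs via a relationship kernel K = ZZ': g_hat = K (K + lam*I)^-1 (y - mean)."""
    yc = y - y.mean()
    K = Z @ Z.T
    return K @ np.linalg.solve(K + lam * np.eye(len(y)), yc)
```

The practical difference is computational: the marker-effect form solves a system whose size is the number of markers, while the kernel form solves one whose size is the number of lines.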

12.2.2 Training Populations (TRPs) and Relatedness to the Test Populations (TPs)

The development of both TRP and TP datasets is also a vital step toward the implementation of GS. In GS, a TRP, which consists of germplasm that has been phenotyped for target traits and genotyped, generally with genome-wide markers, is used to train a prediction model. Once trained, this model is used to calculate the GEBVs of the individuals of the TP strictly on the basis of their genotypic information. To achieve high GS accuracy, the TRP and TP should be closely related. For example, the predictive ability in maize was remarkably improved when the two populations had a close relationship (Schulz-Streeck et al. 2012; Zhang et al. 2017). Similarly, GS for grain yield, performed by Windhausen et al. (2012) using diverse panels of maize lines, achieved lower predictive ability between groups than within groups, and more accurate predictions were obtained with closely related populations (Heslot et al. 2015).

In summary, TRP optimization has attracted the attention of the breeding community for a variety of reasons. First, GS predictions are highly dependent on the markers, and optimization of the TRP is key to improving GS efficiency and efficacy. Second, phenotyping is expensive, but accurate phenotyping is paramount to breeding and the success of GS. To better allocate resources within plant breeding programs and improve the predictive ability of GS, it is preferable to increase the size of the TRP. In breeding programs, phenotyping has always been a bottleneck because selection progress is directly proportional to the number of genotypes that can be phenotypically analyzed. Genotyping prices have decreased considerably in the genomic era, whereas phenotyping costs have remained roughly constant. In this regard, one of the key goals of TRP optimization is to improve the “selective phenotyping” process, which aims to reduce phenotyping costs while retaining models with high predictive ability.

There are a number of models (Table 12.1) that appropriately address the statistical issues in GS, the most prominent being the RR-BLUP model (Endelman 2011). Several factors, however, may have an impact on predictive ability. They occur at various levels and involve a combination of genetic, environmental, and statistical factors. In GS, some of the major issues include: (i) diverse and multiple genotype-by-environment (GE) interactions, (ii) the size of the TRP and its relationship with the TP, and (iii) the genotypic complexity and heritability of traits. The most significant challenge in GS is the influence of GE interaction, which is connected to the density of markers and the accuracy of phenotypic measurements. These complications necessitate the use of parametric and nonparametric statistical models, particularly Bayesian estimations and, more recently, deep machine learning approaches that can handle massive datasets (Crossa et al. 2017). The large amount of data that needs to be handled in GS protocols has created computational and statistical hurdles, such as model fitting and parameter optimization. As a result, efficient implementation of GS requires complete yet simple computer programs to estimate the GEBVs of the individuals to be selected under these complex scenarios.
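How closely a TRP is related to a TP can be quantified from the markers themselves. The chapter does not specify a relatedness measure; the sketch below uses one common choice, a genomic relationship matrix computed with VanRaden's first method, and then averages the relationships between training and test lines. It assumes NumPy, genotypes coded 0/1/2 as counts of one allele, and at least one polymorphic marker; the function names are hypothetical.

```python
import numpy as np


def vanraden_grm(M):
    """Genomic relationship matrix (VanRaden method 1) from 0/1/2 genotypes.

    M has one row per line and one column per marker.
    """
    p = M.mean(axis=0) / 2.0                  # allele frequency of each marker
    W = M - 2.0 * p                           # center each marker column
    return (W @ W.T) / (2.0 * np.sum(p * (1.0 - p)))


def mean_relatedness(G, trp_idx, tp_idx):
    """Average genomic relationship between training and test lines."""
    return float(G[np.ix_(trp_idx, tp_idx)].mean())
```

A candidate TRP with a higher `mean_relatedness` to the TP would, following the studies cited above, be expected to give better predictions, at the possible cost of reduced diversity.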

12.2.3 Markers

Given a practical TRP, the choice of markers is a critical factor for improving predictive ability (Meuwissen et al. 2001). In general, increasing the marker density improves predictive ability up to a plateau, beyond which additional markers no longer improve the predictive ability of the models (Xu et al. 2018; Juliana et al. 2019). The required marker density differs depending on the plant species, as well as the type and size of the populations. Generally, cross-pollinated species require a higher density of markers than self-pollinated species (Liu et al. 2018; Juliana et al. 2019). For instance, the optimization of genomic predictive ability in flax was explored using QTLs for a single trait and QTLs combined across all traits (He et al. 2019a; Lan et al. 2020). Here, the single-trait QTL set had a better genomic predictive ability than the combined QTL set and random SNPs. The accuracy of prediction explicitly tests the efficiency of a marker set in GS. The genomic predictive ability is more likely to increase for a particular trait when using a trait-specific QTL set identified from GWAS than a random set of markers, even if the latter is of a higher density (Hoffstetter et al. 2016; He et al. 2019a). Ultimately, accurate genotyping and phenotyping remain critical to the success of any large-scale analysis of genetic association and GS, as the number of false-positive and false-negative associations can be inflated by systematic biases caused by even small sources of error.

The strategy outlined below as a case study is based on the sequential use of GWAS and GS and is efficient and powerful even for complex traits such as FW, YLD, and pasmo resistance (He et al. 2019a). QTLs from a single trait or from multiple related traits can not only significantly improve predictive ability but also decrease the number of markers needed in GS models, thereby potentially reducing genotyping costs, for example through the use of a QTL-defined genotyping platform. This input cost reduction, combined with the simpler computation associated with using few markers, can tip the scale toward adoption of GS over the status quo of phenotypic selection, where cost-reduction strategies such as phenomics are still at the developmental stage for many important traits. Along with the constant reduction in sequencing costs per se, these additional cost-saving strategies make GS a viable option for many breeding programs, and we anticipate that this will translate into its increased adoption in the next few years.

Table 12.1 Summary of major R packages for genomic selection (GS)

R package | Major features | References
lme4GS | Genomic-based prediction of GS that can fit mixed models with multiple variance–covariance matrices | Caamal-Pat et al. (2021)
sommer | GS for single and multi-environments and heritability estimation | Covarrubias-Pazaran (2016)
learnMET | Deep learning model for GP using multi-environment trial data | Westhues et al. (2021)
rrBLUP | Fast maximum-likelihood algorithm for mixed models | Endelman (2011)
IBCF.MTME | GP for multi-traits and multi-environment data | Montesinos-López et al. (2018)
TrainSel | Selection of TRP | Akdemir et al. (2021)
GAPIT | Integrated tool for both GWAS and GS | Lipka et al. (2012)
GSelection | GP with an integrated model | Guha Majumdar et al. (2020)

12.3 Improving Predictive Ability by QTL Markers

The implementation of GP as a breeding tool is limited by the cost of sequencing and the challenge of laborious phenotyping of complex traits such as those related to drought, root traits, YLD, and resistance to diseases. The application of GP depends on the production of high-throughput genotyping data for large breeding populations. Here, we discuss the impact of a sequential approach that first relies on a suite of GWAS models to identify QTLs, which then become the input dataset for GS implementation. The combination of GWAS and GS is highly effective: predictive ability is more likely to increase when GS uses a QTL marker set identified from GWAS.

12.3.1 QTL Identification by Single- and Multi-locus GWAS

GWAS is a study of the association between genotype and phenotype that estimates the contribution of genetic variants across the genomes of many individuals to the measured traits. GWAS is used to identify QTLs or quantitative trait nucleotides (QTNs) that are statistically related to a target trait, such as YLD, DTF, or FW. GWAS analysis requires three sets of input data: genome-wide markers, population structure, and trait measurements. GWAS relies on two types of models: SNP-based and haplotype-based. SNP-based statistical models include single-locus statistical models (SLSMs) and multi-locus statistical models (MLSMs). GLM and MLM models are examples of SLSMs, and both tend to detect major QTLs. MLSMs are a more recent addition to GWAS statistical methods, e.g., the multi-locus random-SNP-effect mixed linear model (Wang et al. 2016). The goal of MLSMs is to identify both major- and minor-effect QTLs or QTNs. In GWAS, SLSMs and MLSMs are somewhat complementary, and their combined use enables the identification of both small- and large-effect QTNs for complex and low-heritability traits.

The primary distinction between SLSMs and MLSMs is that SLSMs evaluate the relationship between each marker and the trait one marker at a time, whereas all MLSMs use two-step algorithms. In the first step, an SLSM GWAS approach scans the complete genome, and putative QTLs are retained using a less stringent critical value, such as P < 0.005 or P < 1/n, where n is the number of markers. In the second step, an MLSM examines all putative QTLs jointly to determine their significance using a logarithm of the odds (LOD) statistic. As a result, more QTLs/QTNs potentially related to target traits can be discovered. Among haplotype-based models, the RTM-GWAS model first groups SNPs into linkage disequilibrium blocks and then uses a two-stage analysis for QTL identification, in which markers are preselected by an SLSM followed by a multi-locus multi-allele stepwise regression (He et al. 2017). These models do not require Bonferroni correction, have a higher power for QTN detection, and were found superior to SLSMs in identifying small-effect loci for complex associations (Cui et al. 2018; Zhang et al. 2018).
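The two-step logic of multi-locus methods can be sketched in a deliberately simplified form. The sketch below is not the algorithm of mrMLM or any other published MLSM: it uses ordinary fixed-effect regression instead of mixed models, ignores population structure, approximates p-values with the normal distribution, and applies an illustrative LOD threshold of 3. It assumes NumPy, polymorphic markers, and fewer first-step candidates than individuals; all function names are hypothetical.

```python
import math

import numpy as np


def single_locus_scan(X, y):
    """Marker-by-marker regression; returns approximate two-sided p-values."""
    n = len(y)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc * yc[:, None]).sum(axis=0) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    t = r * np.sqrt((n - 2) / (1.0 - r ** 2))
    # Normal approximation to the t distribution (reasonable for large n).
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in t])


def rss(X, y):
    """Residual sum of squares of a least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def two_step_gwas(X, y, lod_threshold=3.0):
    """Step 1: relaxed single-locus scan. Step 2: joint model with LOD tests."""
    n, m = X.shape
    pvals = single_locus_scan(X, y)
    candidates = np.flatnonzero(pvals < 1.0 / m)     # relaxed threshold P < 1/n
    kept = []
    if len(candidates) == 0:
        return kept
    full = rss(X[:, candidates], y)
    for j in candidates:
        others = candidates[candidates != j]
        reduced = rss(X[:, others], y)               # joint model without marker j
        lod = (n / 2.0) * math.log10(reduced / full)
        if lod >= lod_threshold:
            kept.append((int(j), lod))
    return kept
```

The second step is what lets a multi-locus analysis drop markers that were significant on their own only because they are correlated with a stronger neighbor.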

12.3.2 Genomic Heritability (h2)

The advantages of GS are more pronounced for traits that are complex, time-consuming, and/or costly to measure, particularly when it is important to analyze multiple environments. In practical GS applications, the marker effects are calculated from the TRP, whose lines are both genotyped and phenotyped, and are subsequently used to predict GEBVs. Four random SNP datasets of different sizes and one QTL dataset were used to evaluate the effects of marker density and marker type on the predictive ability for three traits (Table 12.2). The predictive ability of the QTL-based dataset was 0.96 for YLD, 0.94 for DTF, and 0.92 for FW. The random SNP datasets had significantly lower predictive abilities, for example 0.78 for YLD using 5000 SNPs and 0.60 for FW using 258,870 SNPs (Table 12.2). The QTL set from the traits per se achieved higher genomic predictive ability than the random SNP datasets. The high accuracy obtained with the QTLs is hypothesized to be a consequence of the reduction in background noise through the removal of unrelated markers (He et al. 2019a).

In breeding, trait heritability is an important factor for both phenotypic selection and GS. High heritability of a trait implies high accuracy in phenotypic selection and also likely translates into high predictive ability through GS. Genomic heritability (h2) measures the extent to which the phenotypic performance is explained by currently available markers. High h2 implies that the trait performance is more accurately predicted by the markers. Among the three traits considered in the study described above, DTF had the highest h2 of 0.83 and the highest predictive ability (r) of 0.94. YLD had the lowest h2 value of 0.33 but a moderate predictive ability of 0.78 when using a genome-wide random dataset of 258,870 SNPs. For all traits, the genomic heritability and predictive ability were highest when calculated from the QTL datasets. However, even though the QTLs and the highest-density SNP datasets had the same h2 (0.83) for DTF, the predictive ability of the QTL dataset far exceeded that of the random SNP datasets, with 0.94 versus 0.62–0.63, respectively (Table 12.2). In summary, a steep increase in genomic predictive ability was observed for all traits when using a trait-specific QTL input dataset, as was previously reported for pasmo resistance by He et al. (2019a).

In conclusion, the combined use of GWAS and GS gives the breeding community a new way to estimate breeding values by using QTLs/QTNs as markers instead of genome-wide random markers. This technique has been tested in flax for several traits (He et al. 2019a; Lan et al. 2020) and has become a viable tool for improving genomic predictive ability and reducing the associated cost.
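For reference, the two quantities compared throughout this section can be written compactly. The notation is an assumption introduced here for illustration and does not appear in the chapter: \sigma_g^2 is the additive genetic variance captured by the markers, \sigma_e^2 the residual variance, y the observed values and \hat{y} the predicted values in the test set.

```latex
h^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2},
\qquad
r = \frac{\operatorname{cov}(\hat{y},\, y)}
         {\sqrt{\operatorname{var}(\hat{y})\,\operatorname{var}(y)}}
```

Both quantities depend on the marker set used, which is why the same trait can show different h2 and r values across the QTL and random SNP datasets.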

12.3.3 A Case Study for Drought and Root Traits

A mini-core collection consisting of 101 and 106 flax accessions was assessed for drought-related and root traits using 258,708 SNP markers to test their predictive ability. To identify putative QTNs associated with both drought-stress-related and root traits, two types of GWAS models were used: the SLSMs included GLM (Price et al. 2006) and MLM (Yu et al. 2006), and the MLSMs included pLARmEB, FASTmrMLM, FASTmrEMMA, ISIS EM-BLASSO, and mrMLM, from the R packages mrMLM (https://cran.r-project.org/web/packages/mrMLM/index.html) and rMVP (https://github.com/xiaolei-lab/rMVP). In addition, another MLSM called FarmCPU was also used (Liu et al. 2016). Six drought tolerance-related traits, namely bundle weight (BDLWt), canopy temperature (CT), plant height (HT), seeds per boll (SPB), thousand seed weight (TSW), and grain yield (YLD), were evaluated under non-stress (irrigated) and stress (non-irrigated) conditions as previously described by Sertse et al. (2021). The six respective indices were used for GWAS with the


Table 12.2 Genomic heritability (h2 ± s) and genomic predictive ability (r ± s) for different types and number of markers for three traits in the flax core collection

Trait | Marker set | Genomic heritability (h2 ± s) (a) | Predictive ability (r ± s) (b)
Seed yield | 422 QTLs | 0.77 ± 0.04 | 0.96 ± 0.01 a
Seed yield | 5000 SNPs | 0.43 ± 0.09 | 0.78 ± 0.05 b
Seed yield | 10,000 SNPs | 0.41 ± 0.09 | 0.78 ± 0.05 b
Seed yield | 50,000 SNPs | 0.35 ± 0.10 | 0.78 ± 0.05 b
Seed yield | 258,870 SNPs | 0.33 ± 0.10 | 0.78 ± 0.05 b
Days to flowering | 571 QTLs | 0.83 ± 0.04 | 0.94 ± 0.01 a
Days to flowering | 5000 SNPs | 0.75 ± 0.07 | 0.59 ± 0.12 c
Days to flowering | 10,000 SNPs | 0.79 ± 0.07 | 0.60 ± 0.11 c
Days to flowering | 50,000 SNPs | 0.83 ± 0.07 | 0.63 ± 0.09 b
Days to flowering | 258,870 SNPs | 0.83 ± 0.07 | 0.62 ± 0.10 b
Fusarium wilt resistance | 308 QTLs | 0.70 ± 0.05 | 0.92 ± 0.02 a
Fusarium wilt resistance | 5000 SNPs | 0.38 ± 0.09 | 0.57 ± 0.08 c
Fusarium wilt resistance | 10,000 SNPs | 0.46 ± 0.10 | 0.60 ± 0.07 b
Fusarium wilt resistance | 50,000 SNPs | 0.43 ± 0.10 | 0.59 ± 0.07 b
Fusarium wilt resistance | 258,870 SNPs | 0.43 ± 0.10 | 0.60 ± 0.08 b
Pasmo resistance (He et al. 2019a) | 500 QTLs | 0.72 ± 0.04 | 0.92 ± 0.02 a
Pasmo resistance (He et al. 2019a) | 52,347 SNPs | 0.54 ± 0.07 | 0.67 ± 0.07 b

(a) Genomic heritability is the proportion of additive genetic variance in the total phenotypic variation and was estimated using the R package “sommer”
(b) Predictive ability is defined as the Pearson correlation coefficient (r) of observed values with predicted values using a fivefold cross-validation scheme and the GBLUP model. The predictive ability reported is the average value of 250 iterations. The letters to the right of the predictive ability values represent significant differences among marker sets at the 5% probability level using the Tukey multiple comparison test

high-density 250 K SNP dataset. Six indices, namely the drought susceptibility index, drought tolerance efficiency, stress susceptibility index, Schneider’s stress severity index, tolerance against stress, and stress tolerance index (STI), were also calculated for each of the above six traits as previously described (Sertse et al. 2021). The six traits measured under the drought stress condition (S) and their STI traits used for GS analyses were BDLWt_S and BDLWt_STI, CT_S and CT_STI, HT_S and HT_STI, SPB_S and SPB_STI, TSW_S and TSW_STI, and YLD_S and YLD_STI. The correlations have been previously described (Sertse et al. 2021). Based on high correlation values, traits were selected for the remaining GS analyses, and STI was chosen as an important indicator for evaluating both yield and stress tolerance (Fernandez et al. 1993). A similar correlation-based procedure was adopted for the root traits based on a previously reported study (Sertse et al. 2019), and the following seven traits were chosen for GS analyses: average root diameter (ARD), maximum number of roots (MaxR), network area (NWA), network length (NWL), network surface area (NWSA), network volume (NWV), and shoot dry weight (SDWt).

A total of 1057 unique QTNs were identified for six drought-related traits, and 528 QTNs were associated with the sixteen root traits. All QTNs were grouped into haplotype blocks, and within each block the QTN with the highest R2 for the trait was chosen as the tag QTN to represent the QTL. A fivefold random cross-validation was used to test the accuracy of the ten GS models. Both the genomic and phenotypic datasets for drought-related and root traits, obtained from 106 and 101 accessions, respectively, were randomly divided into five subsets. Each subset was sequentially used as test data for a given partition, while the remaining four subsets constituted the training dataset. This partitioning was replicated 50 times. Then, the Pearson correlation coefficient between the GS-predicted GEBVs and the measured phenotypic values was calculated to determine the accuracy of the genomic predictions. Further, a joint analysis of variance with Tukey’s multiple pairwise comparisons (HSD.test function) was performed using the R agricolae package (https://cran.r-project.org/web/packages/agricolae/index.html) to compare GS models constructed from QTL marker sets and to test the statistical significance of differences in r values.

Tag QTNs or QTLs identified by GWAS were used as markers to construct GS models. Two sets of QTN markers were assessed for GS of each trait: the QTL set from the traits per se and the combined QTL set from all the traits. The positive-effect allele of the tag QTN was encoded as “1” and the alternative allele as “−1” for model construction. Missing marker data were imputed using the EM algorithm implemented in the R package rrBLUP (Endelman 2011).

Three marker sets based on GWAS were used to evaluate GS models for six drought-related traits: (1) tag QTNs for the drought-related traits per se (Table 12.3), (2) tag QTNs for their respective stress indices, and (3) the overall set of 1057 QTNs identified for all drought-related traits and indices. A total of ten GS statistical models were evaluated for genomic prediction performance: seven parametric (RR-BLUP, GBLUP, BRR, BL, Bayes A, Bayes B, and Bayes C) and three nonparametric (RF, SVR, and RKHS) models. The parametric models RR-BLUP, GBLUP and BL consistently outperformed the RF and SVR models across all traits and sets of markers (Fig. 12.2a–f). The highest predictive abilities were observed using tag QTNs specific to the traits per se rather than the overall set of 1057 QTNs; the highest predictive ability range of 0.88–0.92 was observed for HT_S and the lowest of 0.60–0.75 for SPB_S. Likewise, the predictive ability for the stress indices varied from 0.88–0.93 for HT_STI to 0.64–0.87 for CT_STI, as illustrated in Fig. 12.2a–f. A slightly higher predictive ability was obtained for all index-based traits when compared to the drought-related traits determined under water-limiting stress conditions, using either the tag QTNs for the drought-related traits per se or the overall set of 1057 QTNs.
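Two of the data-preparation steps described above, choosing one tag QTN per haplotype block and coding alleles as 1/−1, can be sketched as follows. This is a minimal illustration with hypothetical names and a hypothetical input layout (one record per QTN with its block label and R2); it leaves missing calls as `None` and does not reproduce the EM imputation performed in rrBLUP.

```python
def select_tag_qtns(qtns):
    """Pick, for each haplotype block, the QTN with the highest R2.

    qtns: list of dicts with keys 'id', 'block' and 'r2' (hypothetical layout).
    Returns the tag QTN IDs ordered by block label.
    """
    best = {}
    for q in qtns:
        current = best.get(q["block"])
        if current is None or q["r2"] > current["r2"]:
            best[q["block"]] = q
    return [best[block]["id"] for block in sorted(best)]


def encode_genotype(alleles, positive_allele):
    """Code the positive-effect allele as 1, the alternative as -1, missing as None."""
    return [None if a is None else (1 if a == positive_allele else -1)
            for a in alleles]
```

For example, two QTNs in block "B1" with R2 values of 0.05 and 0.12 are represented by the second one only, which is how the numbers of tag QTNs in Table 12.3 come to be smaller than the numbers of QTNs.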

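Table 12.3 Number of tag quantitative trait nucleotides (QTNs) for drought-stress-related and root traits used for downstream genomic selection (GS)

Trait group | Trait | No. of QTNs | No. of tag QTNs
Drought-related traits | BDLWt_S | 34 | 29
Drought-related traits | CT_S | 32 | 24
Drought-related traits | HT_S | 30 | 25
Drought-related traits | SPB_S | 41 | 31
Drought-related traits | TSW_S | 42 | 37
Drought-related traits | YLD_S | 33 | 26
Drought-related traits | Total | 212 | 172
Drought-related stress index traits | BDLWt_STI | 55 | 42
Drought-related stress index traits | CT_STI | 32 | 18
Drought-related stress index traits | HT_STI | 63 | 57
Drought-related stress index traits | SPB_STI | 39 | 29
Drought-related stress index traits | TSW_STI | 38 | 29
Drought-related stress index traits | YLD_STI | 42 | 33
Drought-related stress index traits | Total | 269 | 208
Root traits | ARD | 127 | 95
Root traits | MaxR | 48 | 39
Root traits | NWA | 27 | 18
Root traits | NWL | 25 | 23
Root traits | NWSA | 31 | 22
Root traits | NWV | 76 | 60
Root traits | SDWt | 85 | 68
Root traits | Total | 419 | 325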


Fig. 12.2 Comparisons of genomic predictive ability (r) of ten genomic selection (GS) models for six drought-related traits per se and their respective indices: a BDLWt_S and BDLWt_STI; b CT_STI and CT_S; c HT_S and HT_STI; d SPB_STI and SPB_S; e TSW_STI and TSW_S; and f YLD_S and YLD_STI. Each panel displays the r values obtained from the ten GS models using three QTN marker sets: the tag QTNs identified by all GWAS models for each trait per se (left), for their corresponding stress indices (middle) and the overall set of 1057 QTNs detected for all drought-related traits (right). The number of QTNs used in the models is on the x-axis. A fivefold cross-validation was used to estimate r values. The different letters on the top of boxes represent statistical significance at the 5% probability level obtained using the Tukey multiple comparison statistical test

For root traits, two marker sets were used for genomic prediction: (1) tag QTNs for the root traits listed in Table 12.3 and (2) the overall set of 528 QTNs identified for all root traits combined. Most root traits had a significantly higher predictive ability using tag QTNs specific to the root traits compared to the overall set of 528 QTNs, except for NWA and NWL, where the difference in predictive ability between the two marker sets was only 0.01 and 0.04, respectively, across all models. For the overall marker set, the highest predictive ability of 0.90 was observed for NWSA, whereas the lowest was 0.72 for SDWt. Similarly, using the tag QTNs specific to the individual traits as markers, the highest predictive abilities were obtained for NWSA, ARD, and NWV, with ranges of 0.92–0.93, as shown in Fig. 12.3a–g.

GP has revolutionized many breeding programs by accelerating the selection process and the rate of genetic gain for highly complex traits such as drought resistance (Shikha et al. 2017; Ben Hassen et al. 2018; Velazco et al. 2019). The implementation of GP as a breeding tool is limited by the cost of sequencing and the many challenges of phenotyping the TRP for complex traits, such as those related to drought and roots. An attempt at optimizing genomic predictive ability was made by comparing the predictive ability of a trait-specific QTN marker set to that of a combined QTN marker set from related traits. In all cases, the trait-specific QTN marker-based GS models achieved higher genomic predictive ability.

12.4 A QTL-Based Genomic Selection (GS) Strategy

Based on the above-described findings, we propose the QTL-based GS strategy depicted in Fig. 12.4. This strategy incorporates GS modeling with QTL identification by SLSM and MLSM GWAS from the TRP, followed by GP for the TP and optional GS evaluation in breeding programs.

12.4.1 GS Modeling

In a breeding program using GS, the first step is to choose a suitable TRP to build GS models. A good TRP should be closely related to the TP, which usually comprises breeding lines developed from multiple crosses. Consequently, the TRP should contain the parental material or its ancestors. The TRP should also contain as many of the favorable alleles or genes for the target traits as possible. Breeders aim to accumulate as many potential favorable alleles as possible. Thus, a sufficient number of diverse genotypes are needed. In a practical GS-based breeding program, TRPs often include historical breeding lines and cultivars, some of which could be part of regional or national trials that are evaluated over multiple years and locations with replications. These provide useful data for target traits, such as yield, seed quality, and disease resistance.

12.4.2 Test Population (TP) and Genomic Selection (GS)

The test population (TP) is a set of test lines in the breeding program to be evaluated for their GEBVs. In GS, a prediction model is trained using a TRP that has been genotyped with genome-wide markers and phenotyped for target traits. By computing GEBVs, this model can then be used to predict performance in a TP solely based on genotypic data.

12.4.3 GS Evaluation

To successfully deploy GS in flax breeding programs, we propose an integrated approach that includes GS, genomic cross-prediction, and phenotypic evaluation, as shown in Fig. 12.5. In the flax breeding program, additional lines are added to the TRP every year to increase diversity and selection accuracy. For this purpose, each line needs to be evaluated using both the best GS models and traditional selection, and the two need to be compared to assess the progress made in implementing GS.
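One simple way to compare GS with traditional selection on the same set of lines is to check how many of the lines selected on GEBVs would also have been selected on field phenotypes. The sketch below is an illustration only, not a procedure prescribed in this chapter; the function names, the selection fraction, and the assumption that higher values are better are all hypothetical.

```python
def select_top(values, fraction):
    """Return the IDs of the top `fraction` of lines, ranked by value (higher is better)."""
    n_select = max(1, round(len(values) * fraction))
    ranked = sorted(values, key=values.get, reverse=True)
    return set(ranked[:n_select])


def selection_overlap(gebv, phenotype, fraction=0.2):
    """Share of GS-selected lines that phenotypic selection would also have chosen."""
    common = gebv.keys() & phenotype.keys()   # lines with both a GEBV and a phenotype
    gs = select_top({k: gebv[k] for k in common}, fraction)
    ps = select_top({k: phenotype[k] for k in common}, fraction)
    return len(gs & ps) / len(gs)
```

Tracking this overlap, together with predictive ability, as new lines enter the TRP each year gives a running indication of how well GS decisions agree with field-based ones.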

Fig. 12.3 Comparisons of genomic predictive ability (r) of ten genomic selection (GS) models for seven root traits: a ARD; b MaxR; c NWA; d NWL; e NWSA; f NWV; and g SDWt. Each panel displays the r values obtained from the ten GS models using two QTN marker sets: the tag QTNs identified by all GWAS models for the root traits per se and the overall set of 528 QTNs detected for all root traits. The number of QTNs used in the GS models is on the x-axis. A fivefold cross-validation was used to estimate r. The different letters on the top of boxes represent statistical significance at the 5% probability level obtained using the Tukey multiple comparison statistical test

Fig. 12.4 QTL-based genomic selection (GS) strategy that incorporates GS modeling with QTL identification by single- and multi-locus GWAS from training population (TRP), genomic prediction (GP) from test population (TP) and an optional GS evaluation in breeding programs

Fig. 12.5 Proposed approach for the integration of genomic selection (GS) in a flax breeding program, including cross-prediction and its evaluation through comparisons with conventional phenotypic evaluations

12.4.4 Cost of GS

The rapid advances in high-throughput sequencing technologies have significantly reduced the cost of marker data points. Marker technologies that make genotyping even more cost-effective continue to be welcomed because they promote the use of GS in practical breeding schemes. Genotyping by sequencing (GBS) is one of the most cost-effective genotyping platforms that can be employed in large populations and is extensively used in breeding programs. GBS generates thousands of SNPs at a reasonable cost per sample and can be used for GS of species regardless of the existence of a reference genome (Poland and Rife 2012). Some of the recent breakthroughs in GBS include skim-based GBS (Bayer et al. 2015), genotyping-in-thousands by sequencing (GT-seq) (Campbell et al. 2015), and rAmpSeq (Buckler et al. 2016). Skim-based GBS allows for high-resolution genotyping while requiring only low-coverage sequencing (Bayer et al. 2015). GT-seq dramatically reduces the cost of genotyping hundreds of targeted SNPs compared to existing methods (Campbell et al. 2015). Similarly, rAmpSeq in maize has been reported (Buckler et al. 2016) to cost as little as $2 per sample and has the potential of transforming breeding programs.

GBS markers adopted in an oat breeding program ran smoothly and predictably at a lower cost per sample than any other whole-genome marker system available at the time (Mellers et al. 2020). The per-sample costs (in USD) of this approach and of a few variations are outlined in Table 12.4. These costs are similar to those reported by CIMMYT, Cornell University, and Kansas State University (Poland and Rife 2012; Bassi et al. 2016). The sequencing cost varies depending on the read depth and the number of markers targeted. For GBS, the sequencing depth is lower than might be needed for an application such as GWAS, but adequate for GS (Bekele et al. 2018). The sequencing cost can also be reduced by moving to a higher-capacity sequencing system where samples are multiplexed using distinct adaptors. A subset of useful GBS tags has been assembled to develop a bait-based tag capture system designed specifically for Canadian oat breeding programs, similar to the procedure known as “rapture” (Ali et al. 2015). This method allows 384 samples to be combined per sequencing lane, which reduces the sequencing cost to $21 per sample and/or provides a higher sequencing depth that improves genotyping accuracy. Another advantage is the possibility of combining this dataset with more than ten thousand lines that have been previously genotyped. Rapture translates into low cost and high read coverage at target loci, while allowing previous TRP genotype data based on non-captured GBS data to be fully compatible with the added rapture data. Given the high proportion of homozygous sites within these lines, even low coverage can provide sufficient data for a probabilistic genotyping approach (Nielsen et al. 2011).

Phenotyping costs are high compared to genotyping. For example, in the oat program, phenotyping is replicated twice, and the total cost may reach $80–90 per genotype. In Brandon, Manitoba, phenotyping for each plot (plot size: 1.5 m × 7.5 m trimmed to a 6-m harvest area, three replicates, with data collected for heading, height, maturity, lodging, yield, test weight, and seed moisture content at harvest) was estimated at $125 per plot or $375 per genotype. For additional quality analysis, Fusarium head blight/deoxynivalenol testing adds another ~$75 per plot or $225 per genotype. In summary, the phenotypic cost is high and cost-reduction options are limited because field phenotyping and quality analyses rely heavily on labor. On the other hand, the cost of genotyping continues to drop through higher throughput or cheaper chemistries and platforms, which, assuming that sufficient genomic prediction accuracy is achieved, gives the edge to genotyping over phenotyping. However, in order to maintain accurate prediction equations, phenotyping accuracy remains paramount and efforts must be made to achieve the best quality phenotyping of the TRP.
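The per-genotype phenotyping figures quoted above follow directly from the per-plot costs and the three replicates. The short sketch below reproduces that arithmetic and sets it against the $21 per-sample rapture sequencing cost mentioned earlier. The comparison is illustrative only: the function name is hypothetical, and the $21 figure covers sequencing alone, not DNA extraction or library construction.

```python
def per_genotype_cost(cost_per_plot, replicates):
    """Cost of phenotyping one genotype grown in `replicates` plots."""
    return cost_per_plot * replicates

field_cost = per_genotype_cost(125, 3)    # agronomic traits, Brandon example
quality_cost = per_genotype_cost(75, 3)   # approximate FHB/deoxynivalenol testing
phenotyping_total = field_cost + quality_cost

rapture_sequencing = 21  # USD per sample, sequencing only (from the text above)
ratio = phenotyping_total / rapture_sequencing

print(f"field: ${field_cost}, quality: ${quality_cost}, total: ${phenotyping_total}")
print(f"phenotyping / rapture sequencing cost ratio: {ratio:.1f}")
```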

12.5 Parent Evaluation and Cross-Prediction

Plant breeding, or genetic improvement of plant species, has played an essential part in the evolution of human societies (Spiertz 2014). Farmers must improve productivity and plant tolerance to biotic and/or abiotic stresses to provide a steady food supply. Plant breeding has traditionally relied on a large number of crosses for selection of desirable traits, which is a lengthy, laborious, and resource-intensive approach. However, to boost yields and feed a fast-growing global population (Tester and Langridge 2010), to meet the needs of customers with different dietary requirements, and to produce high-quality and safe food, more effective breeding methods are needed. It is commonly acknowledged that genetic data will be used to design effective breeding methods (Bernardo 2008). The advances in next-generation sequencing technologies have resulted in the availability of a large number of low-cost molecular markers (Davey et al. 2011). This has paved the way for the widespread use of molecular markers in plant breeding. For instance, the use of genomic cross-predictions in flax breeding is a novel approach to speed up evaluation and optimize breeding programs (Khan et al. 2022).

Plant breeders make many crosses each year and analyze their progeny in fields or greenhouses. However, generally only a few progeny will outperform the parents. The use of genomic cross-prediction in breeding can help in three respects: (i) to increase accuracy and efficiency, (ii) to shorten the cycle, and (iii) to select parents and progeny. Genomic cross-prediction methods that allow plant breeders to accelerate the selection of superior parents based on genotypic and phenotypic data promise to improve the efficiency of breeding programs, not only in flax but also in other plant species (You et al. 2022). Genomic cross-prediction has indeed evolved into a vital tool in scientific research, serving both as a preliminary validation of hypotheses and as a guide for empirical studies (You et al. 2022). Many software packages are available for genomic cross-prediction. They can be used to simulate a wide range of crossing and progeny scenarios using genomic data. For instance, QuLinePlus (Hoyos-Villegas et al. 2019), QuHybrid (Wang et al. 2003), ADAM-plant (Liu et al. 2019), MareyMap (Siberchicot et al. 2017), and other simulation tools are useful for evaluating the efficiency of different selection strategies.

Table 12.4 Cost (US dollars per sample) of genotyping by target sequencing (GBTS), genotyping by sequencing (GBS), and RAD capture (rapture) for test population (TP)

| Crops                   | Maize (2.3 Gb), Guo et al. (2019) |       | Oat (12.3 Gb), Bekele et al. (2018) |         |
|-------------------------|-------|-------|-------|---------|
| Genotyping protocol     | GBTS  | GBTS  | GBS   | Rapture |
| Number of markers (K)   | 5     | 1     | ~10   | 5       |
| DNA extraction          | 0.50  | 0.50  | 3.08  | 3.08    |
| Construction of library | 6.61  | 6.61  | 8.46  | 8.46    |
| Probe hybridization     | 3.00  | 3.00  |       |         |
| Sequencing              | 2.25  | 0.75  | 12.31 | 4.62    |
| Total (per sample)      | 12.36 | 10.86 | 23.85 | 16.15   |
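As a quick consistency check, the component costs in Table 12.4 can be summed and compared with the reported totals. The sketch below assumes the column assignment implied by the table layout (two GBTS columns for maize, GBS and rapture for oat); the dictionary keys are labels introduced here for illustration. The rapture column sums to 16.16 against a reported 16.15, a difference of one cent that is presumably due to rounding of the components.

```python
# (component costs in USD per sample, reported total) for each protocol column
table_12_4 = {
    "maize GBTS 5K": ([0.50, 6.61, 3.00, 2.25], 12.36),
    "maize GBTS 1K": ([0.50, 6.61, 3.00, 0.75], 10.86),
    "oat GBS ~10K": ([3.08, 8.46, 12.31], 23.85),
    "oat rapture 5K": ([3.08, 8.46, 4.62], 16.15),
}

# Difference between the summed components and the reported total, in USD
diffs = {
    name: round(sum(parts) - total, 2)
    for name, (parts, total) in table_12_4.items()
}

for name, d in diffs.items():
    print(f"{name}: sum - reported = {d:+.2f}")
```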


12.5.1 Genomic Cross-Prediction for Flax Improvement

To date, cross-breeding remains the cornerstone of breeding programs. Its success relies heavily on access to comprehensively evaluated germplasm for parent selection, which is a major challenge because such knowledge is usually limited to a narrow set of genotypes. In flax, You et al. (2022) compiled 290 linseed accessions and identified a total of 1811 QTNs for five major traits, i.e., YLD, DTM, oil content, linolenic acid, and powdery mildew resistance. The major objective of that study was to use this 290-accession TRP to evaluate the accessions’ potential as parents through the development of GS models and the simulation of progeny populations from intercrosses using the available genomic data (SNPs). In this scheme, the developed GS models are subsequently used to predict the GEBVs of all progenies of a cross (You et al. 2022). Cross performance is evaluated according to usefulness criteria, which are a function of the progeny population mean and genetic variance. This study highlights that genomic cross-prediction in flax can be highly effective for two major reasons: (i) a large number of virtual crosses can be easily simulated, and (ii) the use of GWAS for the identification of QTNs, combined with optimized GS models, can advantageously predict the GEBVs of progenies. In the future, genomic cross-predictions will need to be evaluated on a variety of traits, including biotic and/or abiotic stress tolerance, not just in flax but also in maize, barley, and wheat, to ensure their accuracy. Finally, the use of genomic cross-prediction in breeding programs has the potential to provide an effective and low-cost breeding tool.
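The text describes usefulness criteria only as a function of the progeny mean and genetic variance. One common formulation, assumed here and not necessarily the exact criterion used by You et al. (2022), is UC = mean + i × h × sd, where i is the selection intensity for the selected fraction, h is the square root of heritability (or selection accuracy), and sd is the standard deviation of the progeny GEBVs. The sketch below uses hypothetical GEBV lists for two crosses.

```python
from statistics import NormalDist, mean, pstdev

def selection_intensity(alpha):
    """Standardized selection differential for truncation selection of the top `alpha` fraction."""
    nd = NormalDist()
    z = nd.inv_cdf(1.0 - alpha)   # truncation point on the standard normal
    return nd.pdf(z) / alpha

def usefulness(gebvs, alpha=0.10, h=1.0):
    """UC = progeny mean + i * h * progeny standard deviation (assumed formulation)."""
    return mean(gebvs) + selection_intensity(alpha) * h * pstdev(gebvs)

# Hypothetical progeny GEBVs of two crosses with the same mean but different variance
cross_a = [1.0, 2.0, 3.0, 4.0, 5.0]
cross_b = [2.5, 3.0, 3.0, 3.0, 3.5]

uc_a = usefulness(cross_a)
uc_b = usefulness(cross_b)
print(f"UC cross A = {uc_a:.2f}, UC cross B = {uc_b:.2f}")
```

Under this formulation, two crosses with equal progeny means are separated by their variance: the cross with the wider spread of GEBVs is expected to yield better top progeny, which is why both quantities enter the criterion.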

12.5.2 Future Based: Integrated Flax Breeding Improvement Strategy

For the flax breeding program, we propose the use of an integrated approach that comprises GS, genomic cross-prediction, and conventional phenotypic evaluation.

First, GS models for target traits such as YLD, DTF, and FW are developed using SNPs and QTNs identified from the flax core collection. The flax core collection (~290 linseed and 90 fiber lines) serves as the basic TRP, banking on high-quality phenotypic data already collected in multiple environments (years and locations), on a curated ~1.7 M SNP dataset, and on previously identified QTNs for all major traits. This TRP should be genetically closely related to the breeding populations because most ancestors of present-day flax cultivars are included in this core collection. This primary TRP is supplemented with new cultivars and breeding lines each year. The predictive GS models from multiple years and environments are integrated or applied independently using common criteria. Each year, multiple models are assessed from the new TRP iteration and its incremental phenotypic data. The genome-wide random SNP and the trait-specific QTN marker sets are used to evaluate models for their predictive abilities. The best models are then selected based on the cross-validation (CV) procedure and the target environments’ correlations.

Secondly, every year the best GS models are applied to the flax breeding program to estimate GEBVs of the test progenies. Every year, about 300 F6 lines are chosen from different populations. The performance of lines selected by GS is compared with that of lines chosen through traditional selection in the following generations to assess the progress and success of GS.

Thirdly, GS models are used to predict progeny performance, from which the putative best parents are selected for crossing (Fig. 12.5). Both linseed and fiber accessions of the core collection and newly added cultivars and breeding lines are used to make virtual single crosses and evaluate their progeny. For every cross, 500 recombinant inbred lines (RILs) are simulated based on an additive model. Their genotypes are generated from the QTN/SNP markers of the two parents, with recombination within individuals of the progeny populations based on previously published genetic maps (Cloutier et al. 2012). The optimal GS models for each trait are used to predict the GEBVs of the RILs for each cross and trait. The crosses and their parents thus evaluated are recommended to the breeding program based on their general and specific combining abilities. We anticipate that this integrated approach will boost flax breeding progress by accelerating the genetic gains for yield and other complex traits.
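The virtual-cross step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the simulation procedure used by the authors: a single chromosome, homozygous parents coded 0/2, marker positions in cM taken from a hypothetical map, Haldane's map function for the per-meiosis recombination fraction r, and the classical selfing-RIL relation R = 2r / (1 + 2r) applied as a first-order Markov chain along the chromosome. Marker effects stand in for a trained additive GS model.

```python
import numpy as np

def ril_switch_prob(dist_cm):
    """Probability that adjacent loci carry alleles from different parents in a selfed RIL."""
    r = 0.5 * (1.0 - np.exp(-2.0 * np.asarray(dist_cm, dtype=float) / 100.0))  # Haldane
    return 2.0 * r / (1.0 + 2.0 * r)

def simulate_rils(parent_a, parent_b, pos_cm, n_rils=500, seed=0):
    """Simulate RIL genotypes as mosaics of the two parental haplotypes."""
    rng = np.random.default_rng(seed)
    m = len(pos_cm)
    switch_p = ril_switch_prob(np.diff(pos_cm))
    origin = np.empty((n_rils, m), dtype=int)       # 0 = allele from parent A, 1 = parent B
    origin[:, 0] = rng.integers(0, 2, n_rils)
    for j in range(1, m):
        switch = rng.random(n_rils) < switch_p[j - 1]
        origin[:, j] = np.where(switch, 1 - origin[:, j - 1], origin[:, j - 1])
    return np.where(origin == 0, parent_a, parent_b)

# Hypothetical inputs: six markers on one linkage group
pos_cm = [0.0, 10.0, 25.0, 40.0, 70.0, 100.0]
parent_a = np.array([2, 2, 0, 2, 0, 2])
parent_b = np.array([0, 0, 2, 0, 2, 0])
effects = np.array([0.5, -0.2, 0.3, 0.1, 0.4, -0.1])  # stand-in additive marker effects

rils = simulate_rils(parent_a, parent_b, pos_cm, n_rils=500)
gebv = rils @ effects                                  # additive GEBV of each simulated RIL
print(f"progeny mean = {gebv.mean():.3f}, progeny variance = {gebv.var():.3f}")
```

The progeny mean and variance printed at the end are the two ingredients of a usefulness-type criterion, so repeating this for every candidate parent pair gives a ranking of virtual crosses.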

12.6 Conclusions and Future Prospects

Employing GS with a QTL-based method not only improves predictive ability, but also aids in the implementation of successful genomic cross-prediction in flax breeding for two primary reasons. First, it helps breeders in assessing parents and choosing the potentially best crosses, and second, it offers an effective and low-cost breeding tool. The predictive ability of a GS program is determined by multiple factors, such as the TRP and its relatedness with the TP, the statistical GS models, and the molecular marker type and density. The use of large- and minor-effect QTL identified by single- and multi-locus GWAS models significantly increases the predictive ability of GS models. As previously discussed, the advantage of QTL-based GS modeling over genome-wide random markers has been demonstrated in a few case studies. In summary, the two main goals of GS are to improve the accuracy of GEBVs and to improve breeding program outcomes. Another goal may be to trade part of the phenotyping effort for genotyping in order to reduce cost and/or shorten the breeding cycle, while the incorporation of target loci-based genotyping techniques such as rapture and GBTS is likely to reduce cost.

References

Akdemir D, Rio S, Isidro y Sánchez J (2021) TrainSel: an R package for selection of training populations. Front Plant Sci 12:655287
Albrecht T, Wimmer V, Auinger HJ et al (2011) Genome-based prediction of testcross values in maize. Theor Appl Genet 123:339–350
Ali OA, O’Rourke SM, Amish SJ et al (2015) RAD capture (rapture): flexible and efficient sequence-based genotyping. Genetics 202:389–400
Bassi FM, Bentley AR, Charmet G et al (2016) Breeding schemes for the implementation of genomic selection in wheat (Triticum spp.). Plant Sci 242:23–36
Bayer PE, Ruperao P, Mason AS et al (2015) High-resolution skim genotyping by sequencing reveals the distribution of crossovers and gene conversions in Cicer arietinum and Brassica napus. Theor Appl Genet 128:1039–1047
Bekele WA, Wight CP, Chao S et al (2018) Haplotype-based genotyping-by-sequencing in oat genome research. Plant Biotechnol J 16:1452–1463
Ben Hassen M, Bartholomé J, Valè G et al (2018) Genomic prediction accounting for genotype by environment interaction offers an effective framework for breeding simultaneously for adaptation to an abiotic stress and performance under normal cropping conditions in rice. G3: Genes Genom Genet 8:2319–2332
Bernardo R (2008) Molecular markers and selection for complex traits in plants: learning from the last 20 years. Crop Sci 48:1649–1664
Buckler ES, Ilut DC, Wang X et al (2016) rAmpSeq: using repetitive sequences for robust genotyping. bioRxiv:096628
Caamal-Pat D, Pérez-Rodríguez P, Crossa J et al (2021) lme4GS: an R-package for genomic selection. Front Plant Sci 12:680569
Campbell NR, Harmon SA, Narum SR (2015) Genotyping-in-thousands by sequencing (GT-seq): a cost effective SNP genotyping method based on custom amplicon sequencing. Mol Ecol Resour 15:855–867
Clark SA, van der Werf J (2013) Genomic best linear unbiased prediction (gBLUP) for the estimation of genomic breeding values. Methods Mol Biol 1019:321–330
Cloutier S, Ragupathy R, Miranda E et al (2012) Integrated consensus genetic and physical maps of flax (Linum usitatissimum L.). Theor Appl Genet 125:1783–1795
Covarrubias-Pazaran G (2016) Genome-assisted prediction of quantitative traits using the R package sommer. PLoS One 11:e0156744
Crossa J, Pérez-Rodríguez P, Cuevas J et al (2017) Genomic selection in plant breeding: methods, models, and perspectives. Trends Plant Sci 22:961–975
Cui Y, Zhang F, Zhou Y (2018) The application of multi-locus GWAS for the detection of salt-tolerance loci in rice. Front Plant Sci 9:01464
Daetwyler HD, Pong-Wong R, Villanueva B et al (2010) The impact of genetic architecture on genome-wide evaluation methods. Genetics 185:1021–1031
Daetwyler HD, Calus MP, Pong-Wong R et al (2013) Genomic prediction in animals and plants: simulation of data, validation, reporting, and benchmarking. Genetics 193:347–365
Davey JW, Hohenlohe PA, Etter PD et al (2011) Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nat Rev Genet 12:499–510
de los Campos G, Hickey JM, Pong-Wong R et al (2013) Whole-genome regression and prediction methods applied to plant and animal breeding. Genetics 193:327–345
Desta ZA, Ortiz R (2014) Genomic selection: genome-wide prediction in plant improvement. Trends Plant Sci 19:592–601
Endelman JB (2011) Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome 4. https://doi.org/10.3835/plantgenome2011.08.0024
Fernandez GCJ, Asian Vegetable R, Development C et al (1993) Effective selection criteria for assessing plant stress tolerance. In: International symposium, adaptation of food crops to temperature and water stress, 410th edn. AVRDC. Taipei, Taiwan, Taipei [unconfirmed], pp 257–270
Gianola D, Fernando RL, Stella A (2006) Genomic-assisted prediction of genetic value with semiparametric procedures. Genetics 173:1761–1776
González-Recio O, Forni S (2011) Genome-wide prediction of discrete traits using bayesian regressions and machine learning. Genet Sel Evol 43:7
Guha Majumdar S, Rai A, Mishra DC (2020) Integrated framework for selection of additive and nonadditive genetic markers for genomic selection. J Comput Biol 27:845–855
Guo Z, Wang H, Tao J et al (2019) Development of multiple SNP marker panels affordable to breeders through genotyping by target sequencing (GBTS) in maize. Mol Breed 39:37
Habier D, Fernando RL, Kizilkaya K et al (2011) Extension of the bayesian alphabet for genomic selection. BMC Bioinform 12:186
He J, Meng S, Zhao T et al (2017) An innovative procedure of genome-wide association analysis fits studies on germplasm population and plant breeding. Theor Appl Genet 130:2327–2343
He L, Xiao J, Rashid KY et al (2019a) Evaluation of genomic prediction for pasmo resistance in flax. Int J Mol Sci 20:359
He L, Xiao J, Rashid KY et al (2019b) Genome-wide association studies for pasmo resistance in flax (Linum usitatissimum L.). Front Plant Sci 9:1982
Heffner EL, Sorrells ME, Jannink J-L (2009) Genomic selection for crop improvement. Crop Sci 49:1–12
Heffner EL, Jannink J-L, Iwata H et al (2011) Genomic selection accuracy for grain quality traits in biparental wheat populations. Crop Sci 51:2597–2606
Heslot N, Yang H-P, Sorrells ME et al (2012) Genomic selection in plant breeding: a comparison of models. Crop Sci 52:146–160
Heslot N, Jannink J-L, Sorrells ME (2015) Perspectives for genomic selection applications and research in plants. Crop Sci 55:1–12
Hickey JM, Dreisigacker S, Crossa J et al (2014) Evaluation of genomic selection training population designs and genotyping strategies in plant breeding programs using simulation. Crop Sci 54:1476–1488
Hoffstetter A, Cabrera A, Huang M et al (2016) Optimizing training population data and validation of genomic selection for economic traits in soft winter wheat. G3: Genes Genom Genet 6:2919–2928
Hoyos-Villegas V, Arief VN, Yang W-H et al (2019) QuLinePlus: extending plant breeding strategy and genetic model simulation to cross-pollinated populations: case studies in forage breeding. Heredity 122:684–695
Jannink J-L, Lorenz AJ, Iwata H (2010) Genomic selection in plant breeding: from theory to practice. Brief Funct Genomics 9:166–177
Juliana P, Poland J, Huerta-Espino J et al (2019) Improving grain yield, stress resilience and quality of bread wheat using large-scale genomics. Nat Genet 51:1530–1539
Khan N, You FM, Cloutier S (2022) Designing genomic solutions to enhance abiotic stress resistance in flax. In: Kole C (ed) Genomic designing for abiotic stress resistant oilseed crops. Springer International Publishing, Cham, pp 251–283
Lan S, Zheng C, Hauck K et al (2020) Genomic prediction accuracy of seven breeding selection traits improved by QTL identification in flax. Int J Mol Sci 21. https://doi.org/10.3390/ijms21051577
Lehermeier C, Krämer N, Bauer E et al (2014) Usefulness of multiparental populations of maize (Zea mays L.) for genome-based prediction. Genetics 198:3–16
Li Z, Sillanpää MJ (2012) Overview of LASSO-related penalized regression methods for quantitative trait mapping and genomic selection. Theor Appl Genet 125:419–435
Lipka AE, Tian F, Wang Q et al (2012) GAPIT: genome association and prediction integrated tool. Bioinformatics 28:2397–2399
Liu X, Huang M, Fan B et al (2016) Iterative usage of fixed and random effect models for powerful and efficient genome-wide association studies. PLoS Genet 12:e1005767
Liu X, Wang H, Wang H et al (2018) Factors affecting genomic selection revealed by empirical evidence in maize. Crop J 6:341–352
Liu H, Tessema BB, Jensen J et al (2019) ADAM-Plant: a software for stochastic simulations of plant breeding from molecular to phenotypic level and from simple selection to complex speed breeding programs. Front Plant Sci 9:1926
Lush JL (1937) Animal breeding plans. Collegiate Press, Ames, Iowa, p 1937
Malmberg MM, Barbulescu DM, Drayton MC et al (2018) Evaluation and recommendations for routine genotyping using skim whole genome re-sequencing in canola. Front Plant Sci 9:1809
Medina CA, Hawkins C, Liu XP et al (2020) Genome-wide association and prediction of traits related to salt tolerance in autotetraploid alfalfa (Medicago sativa L.). Int J Mol Sci 21:3361
Mellers G, Mackay I, Cowan S et al (2020) Implementing within-cross genomic prediction to reduce oat breeding costs. Plant Genome 13:e20004
Meuwissen TH, Hayes BJ, Goddard ME (2001) Prediction of total genetic value using genome-wide dense marker maps. Genetics 157:1819–1829
Moeinizade S, Hu G, Wang L et al (2019) Optimizing selection and mating in genomic selection with a look-ahead approach: an operations research framework. G3: Genes Genom Genet 9:2123–2133
Montesinos-López OA, Luna-Vázquez FJ, Montesinos-López A et al (2018) An R package for multitrait and multienvironment data with the item-based collaborative filtering algorithm. Plant Genome 11. https://doi.org/10.3835/plantgenome2018.02.0013
Moser G, Tier B, Crump RE et al (2009) A comparison of five methods to predict genomic breeding values of dairy bulls from genome-wide SNP markers. Genet Sel Evol 41:56
Nielsen R, Paul JS, Albrechtsen A et al (2011) Genotype and SNP calling from next-generation sequencing data. Nat Rev Genet 12:443–451
Norman A, Taylor J, Edwards J et al (2018) Optimising genomic selection in wheat: effect of marker density, population size and population structure on prediction accuracy. G3: Genes Genom Genet 8:2889–2899
Ornella L, Singh S, Perez P et al (2012) Genomic prediction of genetic values for resistance to wheat rusts. Plant Genome 5. https://doi.org/10.3835/plantgenome2012.07.0017
Pérez P, de los Campos G (2014) Genome-wide regression and prediction with the BGLR statistical package. Genetics 198:483–495
Pérez-Rodríguez P, Gianola D, González-Camacho JM et al (2012) Comparison between linear and nonparametric regression models for genome-enabled prediction in wheat. G3: Genes Genom Genet 2:1595–1605
Poland JA, Rife TW (2012) Genotyping-by-sequencing for plant breeding and genetics. Plant Genome 5. https://doi.org/10.3835/plantgenome2012.05.0005
Price AL, Patterson NJ, Plenge RM et al (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38:904–909
Rolling WR, Dorrance AE, McHale LK (2020) Testing methods and statistical models of genomic prediction for quantitative disease resistance to Phytophthora sojae in soybean [Glycine max (L.) Merr] germplasm collections. Theor Appl Genet 133:3441–3454
Schulz-Streeck T, Ogutu JO, Karaman Z et al (2012) Genomic selection using multiple populations. Crop Sci 52:2453–2461
Sertse D, You FM, Ravichandran S et al (2019) The complex genetic architecture of early root and shoot traits in flax revealed by genome-wide association analyses. Front Plant Sci 10:1483
Sertse D, You FM, Ravichandran S et al (2021) Loci harboring genes with important role in drought and related abiotic stress responses in flax revealed by multiple GWAS models. Theor Appl Genet 134:191–212
Shikha M, Kanika A, Rao AR et al (2017) Genomic selection for drought tolerance using genome-wide SNPs in maize. Front Plant Sci 8:550
Siberchicot A, Bessy A, Guéguen L et al (2017) MareyMap online: a user-friendly web application and database service for estimating recombination rates using physical and genetic maps. Genome Biol Evol 9:2506–2509
Spiertz H (2014) Agricultural sciences in transition from 1800 to 2020: exploring knowledge and creating impact. Eur J Agron 59:96–106
Spindel J, Begum H, Akdemir D et al (2015) Genomic selection and association mapping in rice (Oryza sativa): effect of trait genetic architecture, training population composition, marker number and statistical model on accuracy of rice genomic selection in elite, tropical rice breeding lines. PLoS Genet 11:e1004982
Tang Y, Liu X (2019) G2P: a genome-wide-association-study simulation tool for genotype simulation, phenotype simulation and power evaluation. Bioinformatics 35:3852–3854
Tester M, Langridge P (2010) Breeding technologies to increase crop production in a changing world. Science 327:818–822
Thavamanikumar S, Dolferus R, Thumma BR (2015) Comparison of genomic selection models to predict flowering time and spike grain number in two hexaploid wheat doubled haploid populations. G3: Genes Genom Genet 5:1991–1998
Velazco JG, Jordan DR, Mace ES et al (2019) Genomic prediction of grain yield and drought-adaptation capacity in sorghum is enhanced by multi-trait analysis. Front Plant Sci 10:997
Wang J, Van Ginkel M, Podlich D et al (2003) Comparison of two breeding strategies by computer simulation. Crop Sci 43:1764–1773
Wang SB, Feng JY, Ren WL et al (2016) Improving power and accuracy of genome-wide association studies via a multi-locus mixed linear model methodology. Sci Rep 6:19444
Westhues CC, Simianer H, Beissinger TM (2021) learnMET: an R package to apply machine learning methods for genomic prediction using multi-environment trial data. bioRxiv:2021.2012.2013.472185
Windhausen VS, Atlin GN, Hickey JM et al (2012) Effectiveness of genomic prediction of maize hybrid performance in different breeding populations and environments. G3: Genes Genom Genet 2:1427–1436
Xu Y, Wang X, Ding X et al (2018) Genomic selection of agronomic traits in hybrid rice using an NCII population. Rice 11:32
You FM, Zheng C, Bartaula S et al (2022) Genomic cross prediction for linseed improvement. In: Gosal SS, Wani SH (eds) Accelerated Plant breeding, vol 4. Oil crops. Springer International Publishing, Cham, pp 451–480
Yu J, Pressoir G, Briggs WH et al (2006) A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet 38:203–208
Zhang A, Wang H, Beyene Y et al (2017) Effect of trait heritability, training population size and marker density on genomic prediction accuracy estimation in 22 bi-parental tropical maize populations. Front Plant Sci 8:1916
Zhang Y, Liu P, Zhang X et al (2018) Multi-locus genome-wide association study reveals the genetic architecture of stalk lodging resistance-related traits in maize. Front Plant Sci 9:611
Zhao Y, Mette MF, Reif JC (2015) Genomic selection in hybrid breeding. Plant Breed 134:1–10

13 Flax Genomic Resources and Databases

Pingchuan Li, Ismael Moumen, Sylvie Cloutier, and Frank M. You

13.1 Introduction

Flax (Linum usitatissimum L.) is one of the most important fiber and oil crops. The need to increase its quality and quantity has driven the use of biotechnology in flax genetic and breeding research during the past decades. Genomic data, such as DNA and RNA sequences, genetic maps, physical maps, reference genomes, molecular markers, quantitative trait loci (QTLs), quantitative trait nucleotides (QTNs), and candidate genes associated with important traits, as well as phenotypic data of genetic populations and germplasm collections, have been generated in the last decade, thanks to the significant advances made in high-throughput sequencing technologies. These genomic resources have benefited flax breeding selections. Flax researchers have taken advantage of these resources to increase flax seed and fiber yields, and to improve quality traits and disease resistance of new cultivars to meet the challenges associated with global climate change and the increasing need for high-quality flax seed and fiber. In this chapter, we report on some important genomic resources and databases available to the flax research community. We anticipate these resources to be useful references for flax researchers.

P. Li · I. Moumen · S. Cloutier · F. M. You (✉) Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, 960 Carling Avenue, Ottawa, ON K1A 0C6, Canada e-mail: [email protected]

13.2 Flax Genomic Resources

Large amounts of genomic data have been generated in the last decade. These data range from raw sequences to highly annotated and characterized information. The sequence data primarily comprise the genomic sequences generated from whole-genome shotgun sequencing (WGS) and genome assembly of flax varieties, RNA sequences for gene expression studies or gene annotation of flax genomes, and short reads generated from re-sequencing of large genetic populations. From these sequence data, variants such as single nucleotide polymorphisms (SNPs) can be identified for marker development, genetic diversity studies, identification of important trait-marker associations, and gene discovery. These genomic resources have been used to create genetic maps, physical maps, and reference genomes, and to identify molecular markers, QTLs, and QTNs. The processed outputs constitute important value-added genomic resources for flax genetic improvement. Here we summarize some of these genomic resources (Fig. 13.1).

© His Majesty the King in Right of Canada, as represented by the Minister of Agriculture and Agri-Food 2023 F. M. You and B. Fofana (eds.), The Flax Genome, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-16061-5_13


Fig. 13.1 Overview of flax genomic and phenotypic resources relationship

13.2.1 Sequences

A large number of flax DNA and RNA sequences have been generated, mostly since 2010. The DNA sequences include (1) the second-generation short reads (mostly Illumina platform) or the third-generation long reads (PacBio HiFi or Oxford Nanopore Technology) from flax genome sequencing and assembly projects (Wang et al. 2012b; Dmitriev et al. 2020; Zhang et al. 2020; Sa et al. 2021; Soto-Cerda et al. 2021) (Table 13.1), (2) scaffolds or pseudomolecules assembled from whole-genome sequencing data (Wang et al. 2012b; You et al. 2018b; Dmitriev et al. 2020; Zhang et al. 2020; Sa et al. 2021), (3) short reads from genetic populations and germplasm collections, generated by whole-genome re-sequencing or reduced representation methods such as genotyping by sequencing (GBS) and primarily used for variant calling (e.g., SNPs, InDels, and simple sequence repeats) (He et al. 2019; Guo et al. 2020; Zhang et al. 2020; Duk et al. 2021; You et al. 2022) (Table 13.2), and (4) early-generated RNA-derived sequences. These RNA-derived sequences were expressed sequence tags (ESTs) that were sequenced from complementary DNA (cDNA) libraries. A cDNA library is a collection of cloned cDNA fragments inserted into plasmids that are maintained in self-replicating host cells. A cDNA library represents the portion of the organism's transcriptome expressed in a given tissue, stage, condition, etc. Many cDNA libraries from flax tissues at different developmental stages were constructed and sequenced to study the tissue specificity of the flax transcriptome (Venglat et al. 2011; Long et al. 2012). With the development of sequencing technologies, high-throughput RNA sequencing (RNA-seq) has been employed for gene expression studies (Galindo-Gonzalez and Deyholos 2016; Dmitriev et al. 2017; Wu et al. 2019; Khan et al.

13

Flax Genomic Resources and Databases

275

Table 13.1 Flax whole-genome sequences and assemblies

| BioProject | Genotype | Data type | NCBI accessions | No. of sequences | Total size | References |
|---|---|---|---|---|---|---|
| PRJNA68161 | CDC Bethune | Raw sequences | SRX077940–SRX077946 | 587.4 M | 34.9 Gb | Wang et al. (2012b) |
| | | Contigs | | 48,397 | 282.2 Mb | |
| | | Pseudomolecules | CP027619.1–CP027633.1 | 15 | 316.2 Mb | You et al. (2018b) |
| PRJNA449140 | Longya-10 | Scaffolds | QMEI02000001.1–QMEI02001608.1 | 1608 | 306.4 Mb | Zhang et al. (2020) |
| | | Pseudomolecules | CM036262.1–CM036276.1 | 15 | 282.2 Mb | |
| | Heiya-14 | Scaffolds | QMEH01000001.1–QMEH01002772.1 | 2772 | 303.7 Mb | |
| | Pale flax | Scaffolds | QMEG01000001.1–QMEG01002654.1 | 2654 | 293.6 Mb | |
| PRJNA725636 | Yiya-5* | Sequences | SRX10693654 (HiFi) | 1.84 M | 21.8 Gb | Sa et al. (2021) |
| | | Pseudomolecules | https://doi.org/10.5281/zenodo.4872893 | 15 | 423.1 Mb | |
| PRJNA648016 | Atlante | Sequences | SRX8829776 (Illumina) | 20.8 M | 7.8 Gb | Dmitriev et al. (2020) |
| | | | SRX8829775 (ONT) | 1.6 M | 8.0 Gb | |
| | | Contigs | JACHUY010000001–JACHUY010002458 | 2458 | 361.8 Mb | |

ONT Oxford Nanopore Technologies; HiFi PacBio HiFi sequences
* The assembly and annotation files of Yiya-5 are deposited at Zenodo (https://doi.org/10.5281/zenodo.4872893)

2020). Specifically, RNA-seq data have been used as evidence to support the de novo gene predictions of flax genome annotations (Zhang et al. 2020; Sa et al. 2021). Table 13.3 lists the main EST and RNA-seq datasets deposited in the National Center for Biotechnology Information (NCBI). Most of these sequence data have been submitted to the NCBI Sequence Read Archive (SRA) database, which allows free downloads. Some sequences, such as the assembled genomes and their annotated genes, have been deposited in other databases, such as the well-curated Phytozome (https://phytozome-next.jgi.doe.gov/) and the free general-purpose repository Zenodo (https://zenodo.org/). For example, the assembled chromosome-based genome sequences of the fiber flax cultivar Yiya-5 are deposited only in Zenodo (https://doi.org/10.5281/zenodo.4872893) and are not available in the NCBI databases.

Publicly available biological data can be freely accessed by the community for expanded analyses and data mining. Many peer-reviewed scientific journals request authors to submit the raw and annotated data used in their studies to a public data repository upon acceptance for publication. The NCBI databases are the largest public repository for biological data. Tables 13.1, 13.2 and 13.3 list the accession numbers of the major flax sequence data deposited in the NCBI databases, which can be retrieved and downloaded using accession numbers or BioProject identification numbers (IDs).
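Because the tables report accessions as ranges (for example, SRX077940–SRX077946), a small helper can expand a range into individual identifiers before they are passed to a download tool. The following is a minimal sketch; the function name `expand_accession_range` is hypothetical, and it assumes that the range is numerically contiguous, that both ends share the same prefix, and that there is no version suffix (such as ".1").

```python
import re

# Matches a letter/underscore prefix followed by a run of digits, e.g. "SRX077940".
_ACCESSION = re.compile(r"^([A-Za-z_]+)(\d+)$")


def expand_accession_range(start: str, end: str) -> list[str]:
    """Expand a contiguous accession range into a list of accessions.

    Assumption (not guaranteed by any archive): every number between
    the two ends corresponds to an accession of the same series.
    """
    m_start = _ACCESSION.match(start)
    m_end = _ACCESSION.match(end)
    if m_start is None or m_end is None:
        raise ValueError("accessions must be a prefix followed by digits")
    prefix, first_digits = m_start.group(1), m_start.group(2)
    if m_end.group(1) != prefix:
        raise ValueError("start and end use different prefixes")
    first, last = int(first_digits), int(m_end.group(2))
    if last < first:
        raise ValueError("end accession precedes start accession")
    width = len(first_digits)  # keep leading zeros, e.g. 077940
    return [f"{prefix}{n:0{width}d}" for n in range(first, last + 1)]


if __name__ == "__main__":
    for accession in expand_accession_range("SRX077940", "SRX077946"):
        print(accession)
```

Ranges whose ends are not in ascending order are rejected rather than guessed at, since such a range cannot be assumed to be contiguous.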

13.2.2 Molecular Markers

Genetic markers are useful in studies that aim to identify causal genes and establish their mode(s) of action. Usually, a marker is a unique signpost


Table 13.2 Statistics of genomic sequences of flax genetic collections

| Collection | No. of genotypes | NCBI BioProject | NCBI accessions | Total size (Gbp) | Average size (Gbp) | Size range (Gbp) | References |
|---|---|---|---|---|---|---|---|
| Canadian flax core | 405 | PRJNA707038 | SRX10246655–SRX9920554 | 2706.0 | 6.7 | 0.3–16.2 | He et al. (2019) |
| Chinese flax | 84 | PRJNA478805 | SRX4662822–SRX4662905 | 622.6 | 7.4 | 5.6–9.1 | Zhang et al. (2020) |
| Chinese flax | 200 | PRJNA590636 | SRX7205082–SRX7205281 | 708.2 | 3.5 | 0.4–13.7 | Guo et al. (2020) |
| Russian flax | 306 | PRJEB46073* | | | | | Duk et al. (2021) |

* Single nucleotide polymorphism (SNP) data are stored in the European Variation Archive (EVA) (https://www.ebi.ac.uk/eva/). Read data information is not available where cells are blank

Table 13.3 Expressed sequence tag (EST) and RNA-seq data in flax

| Library | Genotype | Tissue | No. of sequences/reads | Total size (Gbp) | NCBI accessions | References |
|---|---|---|---|---|---|---|
| cDNA | Hermes | Stem | 927 | | CV478070–CV478996 | Day et al. (2005) |
| cDNA | CDC Bethune | Embryos, seed coats, endosperm, flowers, etiolated seedlings, leaves, stem | 261,272 | | LIBEST_026995–LIBEST_027011 | Venglat et al. (2011) |
| cDNA | Baihua | Stem (bark) | 2297 | | EU828797–EU831093 | Long et al. (2012) |
| cDNA | AC McDuff, Hermes, CDC Bethune | Bolls, stem, embryos, endosperm, seed coat | 146,611 | | | Cloutier et al. (2009) |
| RNA-seq | Longya-10 | Stem, boll | 254.8 M | 37.9 | SRX5096709–SRX5096712 | Zhang et al. (2020) |
| RNA-seq | Heiya-14 | Stem, boll | 267.8 M | 39.9 | SRX5096713, SRX5096714, SRX5096717, SRX5096718 | Zhang et al. (2020) |
| RNA-seq | Pale flax | Stem, boll | 287.2 M | 42.8 | SRX5096707, SRX5096708, SRX5096715, SRX5096716 | Zhang et al. (2020) |
| RNA-seq | Yiya-5 | Leaf, stem, flower, root and fruit | 228.2 M | 34.1 | SRX10695178–SRX10695182 | Sa et al. (2021) |
| RNA-seq | CDC Bethune | Embryos, seeds, root, ovary, anther | 1176.2 M | 118.8 | SRX9164974–SRX9164982 | Khan et al. (2020) |
| RNA-seq | CDC Bethune | Seedling | 637.4 | 15.1 | SRX400169–SRX400175, SRX400177–SRX400185 | Galindo-Gonzalez and Deyholos (2016) |


differentiating individuals at a given locus. Genetic markers can be categorized into two general classes: classical and molecular. Classical markers include morphological, cytological, and some biochemical markers. Molecular markers are mostly DNA-based and include small discrete polymorphisms such as point mutations, as well as small to large structural variations such as insertions/deletions (InDels), duplications, and translocations. An example of classical markers is the red and white eye-color markers of the fruit fly, which Thomas Morgan used to confirm the chromosome theory in the 1910s. He was also the first to show that chromosomes play a role in the inheritance of specific traits, which earned him the 1933 Nobel Prize in Physiology or Medicine. Today, with the advent of next-generation sequencing (NGS), genetic markers, and more specifically molecular markers, can be developed at relatively low cost, with high throughput and high density across chromosomes, in all species from which high-quality DNA can be extracted. Nadeem et al. (2018) compared many aspects of classical and molecular markers, including their applications in plant breeding (Table 13.4). Among these features, markers can be co-dominant or dominant: the former can be represented by two or more alleles, whereas the latter are scored only as present or absent. Co-dominance can determine the adoption of a marker because it is intrinsically linked to informativeness, cost, and ease of use. Co-dominant markers distinguish homozygous from heterozygous genotypes and are therefore more informative than dominant markers. Here, we focus on flax molecular markers and their applications in genetic map construction, QTL mapping, and association analysis.

Molecular markers can be grouped into three major categories based on their detection methods: (1) hybridization-based markers, such as restriction fragment length polymorphism (RFLP) (Botstein et al. 1980) and diversity array technology (DArT) (Jaccoud et al. 2001), (2) PCR-based markers, such as randomly amplified polymorphic DNA (RAPD) (Williams et al. 1990), amplified fragment length polymorphism (AFLP) (Vos et al. 1995), inter simple sequence repeat (ISSR) (Zietkiewicz et al. 1994), and simple sequence repeat (SSR) (Tautz 1989), and (3) sequencing-based markers, such as single nucleotide polymorphisms (SNPs), which were first used on a large scale by the Human Genome Project (Wang et al. 1998). Molecular markers have been broadly applied for genetic map construction, map-based gene cloning, diversity analysis, quantitative trait locus (QTL) mapping, genomic selection, comparative genomics, genome organization, evolutionary studies, and reference genome refinement. The first genetic map of flax, constructed with 213 AFLP and eight RFLP markers, was a moderately saturated linkage map covering 1400 cM of the flax genome in 18 linkage groups (LGs) (Spielmeyer et al. 1998). The second genetic map, covering 1000 cM in 15 linkage groups, comprised 13 RFLP and 80 RAPD markers (Oh et al. 2000). These maps were followed by a number of studies using a variety of markers (Fu et al. 2002; Allaby et al. 2005; Smykal et al. 2011; Soto-Cerda et al. 2012; Xie et al. 2018). However, these marker systems were time-consuming, laborious, and sometimes poorly reproducible (e.g., RAPD), creating a need for new types of molecular markers. In the last decade, SSR and then SNP markers have largely replaced these first generations of molecular markers.

SSRs are short tandem repeats, with unit sizes generally ranging from one to six bp, and are widely distributed in the flax genome (Tautz 1989; Temnykh et al. 2001). SSR polymorphism results from variation in the number of repeat units within a PCR amplicon spanning the SSR, and is resolved through gel electrophoresis. These markers are developed from various genomic sequences. Bacterial artificial chromosome (BAC) end sequences (BES) were used to develop 4064 putative SSRs from CDC Bethune (Ragupathy et al. 2011). SSRs have also been extracted from flax EST libraries (Cloutier et al. 2009; Soto-Cerda et al. 2011), SSR-enriched genomic libraries, and other genomic sequences (Deng et al. 2010, 2011). With a panel of 16 flax accessions, 1506 putative SSRs were evaluated,


Table 13.4 Characteristics of commonly used markers in plants

| Characteristics | RFLP | RAPD | AFLP | ISSR | SSR | SNP | DArT |
|---|---|---|---|---|---|---|---|
| Codominant/dominant | Codominant | Dominant | Dominant | Dominant | Codominant | Codominant | Dominant |
| Reproducibility | High | High | Intermediate | Medium–High | High | High | High |
| Polymorphism level | Medium | Very high | High | High | High | High | High |
| Required DNA quality | High | High | High | Low | Low | High | High |
| Required DNA quantity | High | Medium | Low | Low | Low | Low | Low |
| Marker index | Low | High | Medium | Medium | Medium | High | High |
| Genome abundance | High | Very high | Very high | Medium | Medium | Very high | Very high |
| Cost | High | Less | High | High | High | Variable | Cheapest |
| Sequencing | Yes | No | No | No | Yes | Yes | Yes |
| Status | Past | Past | Past | Present | Present | Present | Present |
| PCR requirement | No | Yes | Yes | Yes | Yes | Yes | No |
| Visualization | Radioactive | Agarose gel | Agarose gel | Agarose gel | Agarose gel | SNPVISTA | Microarray |
| Required DNA (ng) | 10,000 | 20 | 500–1000 | 50 | 50 | 50 | 50–100 |

Source: extracted from Nadeem et al. (2018)
RFLP restriction fragment length polymorphism; RAPD randomly amplified polymorphic DNA; AFLP amplified fragment length polymorphism; ISSR inter simple sequence repeat; SSR simple sequence repeat; SNP single nucleotide polymorphism; DArT diversity array technology

leading to the discovery of 818 novel polymorphic SSRs (Cloutier et al. 2012a). Wu et al. (2016) developed SSRs for flax using reduced-representation genome sequencing (RRGS). Although SSRs have significantly improved marker density over the prior generation of markers, the density is still not sufficiently high for precise QTL mapping.

A SNP is, as the name suggests, a specific nucleotide position that displays variation in the DNA sequence. SNPs are by far the most abundant type of DNA variation and are widely and evenly distributed throughout the genome (Brookes 1999). Thanks to advances in NGS technologies and high-throughput genotyping methods, millions of SNPs have been identified from various genetic populations. However, genotyping a larger number of individuals can be expensive and requires cost-effective SNP discovery methods. Thus, the discovery of SNPs from a given population often relies on reduced representation library strategies, such as genotyping-by-sequencing (GBS) and reduced representation libraries (RRL). Kumar et al. (2012) introduced genome-wide SNP discovery through the combination of NGS and the two reduced representation library strategies. Yi et al. (2017) took advantage of specific-locus amplified fragment sequencing (SLAF-seq) to identify SNP markers in an F2 population, resulting in 4145 SNPs integrated into a genetic map. In addition, structural variations have also contributed to the collection of flax markers. Recently, Jiang et al. (2022) identified novel InDel markers in flax by comparing the genome re-sequencing data of two accessions (87–3 and 84–3) with the flax reference genome (v1.0). A total of 17,110 InDel markers were identified and used to perform a principal component analysis of 69 flax accessions (Jiang et al. 2022).
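The definition of SSRs given in this section (tandem repeats with unit sizes of one to six bp) can be illustrated with a simple pattern search. The sketch below is only an illustration and not the procedure used in the cited flax studies; the function name `find_ssrs` and the threshold of four repeats are assumptions chosen for the example.

```python
import re


def find_ssrs(seq: str, min_repeats: int = 4, max_unit: int = 6):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    The lazy quantifier makes the regex try the shortest repeat unit
    first, and the backreference forces every following unit to be
    identical to it. Thresholds are illustrative assumptions.
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits


if __name__ == "__main__":
    toy_sequence = "GGG" + "AC" * 5 + "TT"
    print(find_ssrs(toy_sequence))
```

Only perfect repeats are reported; imperfect or compound SSRs would need a more tolerant search than this sketch provides.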

13.2.3 Genetic and Physical Maps

A genetic map, also known as a linkage map, represents the relative order of markers and genes on linkage groups or chromosomes. High-density genetic maps are valuable resources for QTL mapping, map-based gene cloning, comparative genomics, genome organization and evolution studies, and for guiding genome assemblies and pseudomolecule generation. The genetic distance between markers is measured by their recombination frequency, where 1 cM represents a corrected recombination frequency of 1% during meiosis. A physical map, in contrast, represents chromosomes in terms of actual physical distance in base pairs. In flax, more than ten genetic maps have been generated using an array of molecular markers, including AFLP (Vos et al. 1995; Spielmeyer et al. 1998), RAPD and RFLP (Botstein et al. 1980; Williams et al. 1990; Oh et al. 2000), SSR (Cloutier et al. 2011, 2012b; Asgarinia et al. 2013; Kumar et al. 2015) and SNP (Yi et al. 2017; Wu et al. 2018; Zhang et al. 2018) (Table 13.5). Although flax has 15 chromosomes, and hence 15 expected LGs, some linkage maps contain more than 15 LGs, as shown in Table 13.5. Possible reasons include the choice of the two parents, insufficient marker saturation, gaps between adjacent markers that exceed a given cM threshold, and even the settings of the software used for map construction. Maps have been used for association mapping (Soto-Cerda et al. 2013b, 2014), population linkage disequilibrium (LD) analysis (Soto-Cerda et al. 2012) and reference genome improvement (You et al. 2018b). Ragupathy et al. (2011) constructed the first physical map of flax using 43,776 BAC clones of the cultivar CDC Bethune. A total of 416 contigs spanning 368 Mb were anchored to a genetic map using 129 EST-SSR markers. Physical mapping has since evolved into a second generation, optical mapping, a high-resolution, recombination-independent technique that facilitates genome assembly (Lam et al. 2012; Hastie et al. 2013; Ganapathy et al. 2014; Shearer et al. 2014; Zhang et al. 2015). In optical mapping, the labeling patterns of large single DNA molecules are aligned to produce ordered restriction maps. BioNano genome (BNG) optical maps of the L. usitatissimum cultivars CDC Bethune and MacBeth and of the L. bienne accession LIN1917 have all been used to enhance the formerly assembled reference genome of flax cultivar CDC Bethune (You et al. 2018a, b). For the time being, the flax genetic and physical map information is available only in publications. An online database providing data storage and user-friendly search tools for these maps is warranted to support their reuse in other studies.
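The "corrected recombination frequency" mentioned above refers to a mapping function that converts an observed recombination fraction r into an additive map distance. As a hedged illustration, the sketch below implements the two textbook mapping functions, Haldane and Kosambi; which function was used for each flax map in Table 13.5 is not stated here, so the choice shown is an assumption.

```python
import math


def _check_fraction(r: float) -> None:
    # A recombination fraction lies in [0, 0.5); 0.5 means unlinked loci.
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")


def haldane_cm(r: float) -> float:
    """Haldane map distance in cM (assumes no crossover interference)."""
    _check_fraction(r)
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r: float) -> float:
    """Kosambi map distance in cM (allows for crossover interference)."""
    _check_fraction(r)
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


if __name__ == "__main__":
    for r in (0.01, 0.10, 0.20, 0.40):
        print(f"r={r:.2f}  Haldane={haldane_cm(r):6.2f} cM  "
              f"Kosambi={kosambi_cm(r):6.2f} cM")
```

For small r both functions return approximately 100 × r, which is why 1 cM corresponds to roughly 1% recombination; the correction only matters as markers get farther apart.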

13.2.4 Genome Assemblies

The choice of assembly algorithm greatly impacts the completeness of genome assemblies. Overlap–layout–consensus (OLC) and de Bruijn graph (DBG) approaches are currently favored in WGS assembly (Flicek and Birney 2009; Miller et al. 2010; Schatz et al. 2010; Li et al. 2012). In the last decade, many genome assemblers have been developed to address specific features of the fast-evolving sequencing technologies (Koren et al. 2017; Wang et al. 2021). The SPAdes assembler is a good example of an assembler tailored for small prokaryotic genomes (Bankevich and Pevzner 2016). For large eukaryotic genomes, Canu, which is based on the Celera assembler and was developed for noisy long-read data from PacBio and ONT, remains popular (Koren et al. 2017). For short reads only, or for short and long reads combined, assemblers such as MaSuRCA, a whole-genome assembly software combining the efficiency of the DBG and OLC approaches (Zimin et al. 2013), are recommended. Recently, Hiltunen et al. (2021) developed ARBitR (Assembly Refinement with Barcode-identity-tagged Reads), which has the following merits: (1) a single application to execute both linkage-finding and


Table 13.5 Genetic maps of flax Population

Pop size

No. LGs

Map length (cM)

No. of markers AFLP

RFLP

References SSR

19

RAPD

STS

SNP

F2

50

15

DH

59

18

1400

F2

50

15

1000

DH

78

24

834

114

F2

300

15

1241

143

RIL

243

15

1551

665

5

RIL

243

15

362

329

F2

100

15

2633

4145

Yi et al. (2017)

F2

112

15

1483

2339

Wu et al. (2018)

RIL

110– 123

15

1658

4497

Zhang et al. (2018)

213

69

Cullis et al. (1995)

8

Spielmeyer et al. (1998)

13

80

1

Oh et al. (2000) 5

Cloutier et al. (2011) Asgarinia et al. (2013) Cloutier et al. (2012a, b) Kumar et al. (2015)

DH doubled haploid; RIL recombinant inbred line; LG linkage group; AFLP amplified fragment length polymorphism; RAPD randomly amplified polymorphic DNA; RFLP restriction fragment length polymorphism; SSR simple sequence repeat; STS sequence tagged site; SNP single nucleotide polymorphism; Pop population

scaffolding steps in succession, (2) consideration of overlaps between contigs, and (3) utilization of single-tube long fragment reads (stLFR), which makes it compatible with any type of linked-read data. Many tools are available to assess the quality of a genome assembly (Meader et al. 2010; Yuan et al. 2017; Yang et al. 2019; Manchanda et al. 2020). For example, GenomeScope can estimate the genome size, abundance of repetitive elements, and rate of heterozygosity using k-mer frequencies generated from raw reads (Vurture et al. 2017; Ranallo-Benavidez et al. 2020). Assembly contamination can also be checked against UniVec, a non-redundant database of sequences commonly attached to cDNA or genomic DNA during the cloning process (http://www.ncbi.nlm.nih.gov/tools/vecscreen/univec/).

The successful construction of a BAC-based flax physical map by Ragupathy et al. (2011) significantly improved the accuracy of the first (V1.0) flax reference genome assembly, which was developed using 94× genome coverage of raw shotgun short reads (Wang et al. 2012b). Despite its substantial number of contigs, this assembly was a milestone for further flax assembly efforts. Integration of the flax consensus genetic map (Cloutier et al. 2012b), the BAC-based physical map (Ragupathy et al. 2011) and the BioNano optical map led to the publication of the first chromosome-scale pseudomolecules of the flax reference genome (V2.0) of CDC Bethune (You et al. 2018b). This second iteration had significantly better accuracy, with only 251 BioNano contigs distributed on chromosomes ranging from 15.6 to 29.4 Mb (You et al. 2018a). In recent years, with the expansion of NGS and the support of other relevant technologies, a few other flax genotypes have been sequenced using second- and third-generation sequencing technologies. The Chinese genotypes Longya-10 (oilseed type) and Heiya-14 (fiber type) and a pale flax accession (L. bienne, the wild ancestor of cultivated flax) were sequenced and assembled with Illumina paired-end short reads and Hi-C data (Zhang et al. 2020). In addition, the Russian fiber variety Atlante was sequenced using ONT and Illumina technologies (Dmitriev et al. 2020), while the Chinese fiber variety Yiya-5 was sequenced using PacBio HiFi and Hi-C technologies (Sa et al. 2021). The chromosome-scale pseudomolecules of the 15 chromosomes of Longya-10 and Yiya-5 were obtained using Hi-C data, genetic maps, and the pseudomolecules of CDC Bethune (V2.0) as a reference guide (Zhang et al. 2020; Sa et al. 2021). Consequently, chromosome-scale reference genome sequences of the oil-type varieties CDC Bethune and Longya-10 and of the fiber variety Yiya-5 are publicly available. Information concerning the above-mentioned raw reads, assembled scaffolds, pseudomolecules, and corresponding projects is summarized in Table 13.1.
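The idea behind k-mer based genome size estimation, mentioned above in connection with GenomeScope, can be sketched in a few lines. This is a deliberately naive illustration and not GenomeScope's statistical model: it assumes error-free reads, ignores reverse complements, heterozygosity, and repeats, and simply divides the total number of k-mer occurrences by the depth at the histogram peak. All function names are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_histogram(counts):
    """Map each depth to the number of distinct k-mers seen at that depth."""
    return Counter(counts.values())


def estimate_genome_size(counts, min_depth=2):
    """Naive estimate: total k-mer occurrences divided by the peak depth.

    Depths below min_depth are discarded as presumed sequencing errors
    (an assumption; real tools fit a model instead of using a cutoff).
    """
    hist = {d: n for d, n in kmer_histogram(counts).items() if d >= min_depth}
    if not hist:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(hist, key=lambda d: hist[d])
    total_kmers = sum(d * n for d, n in hist.items())
    return total_kmers / peak_depth


if __name__ == "__main__":
    toy_genome = "ACGTACGGTCAATG"
    reads = [toy_genome] * 10  # ten error-free full-length "reads"
    print(estimate_genome_size(count_kmers(reads, 4)))
```

In the toy run the estimate equals the number of distinct 4-mers in the toy sequence, which approximates its length; real data require the model-based fitting that dedicated tools provide.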

13.2.5 Quantitative Trait Loci (QTLs)/Nucleotides (QTNs)

QTLs are genomic regions associated with phenotypic variation of quantitative traits, as determined from statistical models and significance thresholds. QTLs are identified by two major methods: linkage mapping (LM) and association mapping (AM), also known as genome-wide association studies (GWAS) (Sehgal et al. 2016). LM uses genetic maps constructed from bi-parental populations to identify genomic regions that are defined by flanking markers and associated with the traits, whereas GWAS uses high-density genome-wide markers genotyped on a diversity panel of individuals, which permits resolving marker-trait associations to specific nucleotides called QTNs. As of 2020, more than 300 QTLs for 31 quantitative traits, including seed yield and other agronomic traits, seed and fiber traits, and biotic and abiotic stress-related traits, had been reported and summarized (You and Cloutier 2020). These QTLs and QTNs were mapped to the CDC Bethune flax reference genome (V2.0) (You et al. 2018b; You and Cloutier 2020). Additional trait-associated QTNs have been reported since then, including those for root and shoot traits (Sertse et al. 2019), drought tolerance traits (Sertse et al. 2021), salt tolerance (Li et al. 2022), disease resistance (He et al. 2019; Kanapin et al. 2021; You et al. 2022) and agronomic traits such as flowering time, maturity and plant height (Saroha et al. 2022). These QTLs/QTNs constitute a rich genomic resource to facilitate genomics-assisted breeding.
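To make the notion of a marker-trait association concrete, the following sketch regresses a phenotype on a single SNP coded as 0, 1, or 2 copies of one allele and reports the allele effect and the proportion of phenotypic variance explained (R²). It is a teaching example under strong assumptions: the GWAS cited in this chapter rely on statistical models that also account for population structure and relatedness, which this single-marker regression ignores. The function name and toy data are hypothetical.

```python
def single_marker_regression(genotypes, phenotypes):
    """Least-squares regression of phenotype on allele dosage (0/1/2).

    Returns (allele_effect, r_squared). No correction for population
    structure or kinship is applied, so this is illustrative only.
    """
    if len(genotypes) != len(phenotypes) or len(genotypes) < 3:
        raise ValueError("need equal-length inputs with at least 3 individuals")
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_p = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((p - mean_p) ** 2 for p in phenotypes)
    sxy = sum((g - mean_g) * (p - mean_p)
              for g, p in zip(genotypes, phenotypes))
    if sxx == 0 or syy == 0:
        raise ValueError("marker or phenotype has no variation")
    allele_effect = sxy / sxx            # phenotype change per allele copy
    r_squared = sxy * sxy / (sxx * syy)  # share of variance explained
    return allele_effect, r_squared


if __name__ == "__main__":
    dosage = [0, 0, 1, 1, 2, 2]
    trait = [1.0, 1.2, 2.1, 1.9, 3.2, 2.8]  # made-up values for illustration
    effect, r2 = single_marker_regression(dosage, trait)
    print(f"allele effect = {effect:.2f}, variance explained = {r2:.1%}")
```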

13.3 Phenotypic Evaluation of Flax Genetic Populations

Phenotyping of important traits in genetic panels is the cornerstone of QTL mapping studies and genomic selection (GS) in molecular breeding programs. A good experimental design for phenotyping a genetic population usually includes randomization and replication of the tested genotypes at multiple locations and/or in multiple years. A sufficiently large population can significantly improve the precision of QTL mapping and decrease its bias and the number of false positives (Wang et al. 2012a). To date, several flax collections or populations have been compiled for QTL mapping and breeding.

Diederichsen et al. (2013) compiled a core collection comprising 381 accessions selected from 38 countries and representing diverse geographical origins, with the aim of assisting flax breeding and genetic studies (Diederichsen et al. 2013; Soto-Cerda et al. 2013a). Twenty-six breeding lines and varieties were added to this collection to better represent the overall variation from both ancestral and current germplasm, including recently registered varieties. This flax core collection has been evaluated over multiple years and locations for 27 traits, including ten agronomic, eight seed quality, six fiber, and three disease resistance traits (You et al. 2017). These phenotypic data provide a basis for parent selection in flax cross-breeding and have also been used for association mapping to identify QTLs or QTNs associated with important traits. Some QTNs and candidate genes associated with flax pasmo and powdery mildew resistance have been identified using this flax core collection (He et al. 2019; You et al. 2022).

Guo et al. (2020) constructed a core collection of 200 accessions selected from their worldwide collection of 901 germplasm accessions, representing rich genetic diversity and a wide geographical distribution. Of the 200 accessions, 87 were from Plant Gene Resources of Canada (PGRC), 105 from the US National Plant Germplasm System (NPGS), and eight from the Chinese Crop Germplasm Resource Information System (CGRIS). With this panel, 13 candidate genes associated with seed size and weight were identified using de novo identified SNPs; the study also suggested that the oil morphotype is the ancestor of cultivated flax and that the intermediate oil-fiber morphotype is a transition state in the evolution of the species (Guo et al. 2020). The same panel was also studied for salt tolerance and endosperm imprinting (Jiang et al. 2021; Li et al. 2022). Soil salinization is a common environmental stress affecting flax seed germination. Traits such as relative germination rate (RGR), relative root length (RRL), and relative shoot length (RSL) determine the salt tolerance of flax at the seed germination stage. The GWAS for salt tolerance using these 200 diverse accessions identified 64 QTLs dispersed across all 15 chromosomes and explaining 14.48–29.38% of the phenotypic variation, as well as 268 candidate genes validated by transcriptomics (Li et al. 2022). Genomic imprinting is an epigenetic phenomenon that leads to biased expression of maternal and paternal alleles and occurs predominantly in the endosperm. Analyses of RNA-seq data derived from reciprocal crosses between flax accessions Clli2179 and Z11637 identified 248 candidate imprinted genes (Jiang et al. 2021). A phylogenetic tree built with the SNPs associated with these imprinted gene candidates clustered the accessions into three subgroups (oil, oil-fiber, and fiber), suggesting that the intraspecific morphotype is related to genomic imprinting (Sa et al. 2021).

Duk et al. (2021) generated another core collection comprising 306 accessions of different morphotypes and geographic origins that are maintained by the Russian Federal Research Center for Bast Fiber Crops (RFRCBFC). Besides showing significant differentiation between the oil and fiber flax morphotypes, the authors also unraveled the origins of the Russian heritage landrace Kryazhs and its genetic relatedness to modern fiber flax cultivars (Duk et al. 2021). RFRCBFC also phenotyped 406 samples for quantitative traits, including height, stem length, thousand-seed weight, and other agronomic traits. These data were shared via Figshare (Kanapin et al. 2021). Fusarium wilt is an important fungal disease affecting flax. To reveal the mechanisms underlying this disease, Kanapin et al. (2021) evaluated wilt resistance for three years in a flax core collection of 297 accessions, including 179 fiber, 117 linseed, and one accession of unknown morphotype. The GWAS detected a total of 15 QTNs in at least two years of observation. Ten of these QTNs were located in a 640 kb region at the start of chromosome 1 and explained 7–10% of the phenotypic variation in wilt resistance (Kanapin et al. 2021).

13.4 Genomics and Breeding Databases

Searchable databases are useful for making full use of genomic resources, phenotypic data, and breeding information. Various genomics and breeding databases have been developed in the last few decades. The NCBI database is a large, global, and comprehensive genomic data repository. Most published, raw, and annotated genomics data are submitted to NCBI databases. Some comprehensive genomics and breeding databases have been developed for individual crops, such as MaizeGDB (Portwood et al. 2019), SoyBase (Grant et al. 2010; Brown et al. 2021), and CerealsDB (wheat) (Wilkinson et al. 2020; Winfield et al. 2022), as well as for a few related crops, such as GrainGenes (Matthews et al. 2003; Blake et al. 2019) and the Triticeae toolbox (T3) (wheat, barley, and oat), and for many crops and model plant species, such as Gramene (Tello-Ruiz et al. 2022). For flax, there are convenient online tools available (Table 13.6), collating data concerning germplasm, cultivars, genomic data,


Table 13.6 Major genomic and breeding databases involving flax data

| Database | Data type | Website | Country | Contact |
|---|---|---|---|---|
| NCBI | Genomic | https://www.ncbi.nlm.nih.gov/ | USA | |
| Phytozome | Genomic | https://phytozome-next.jgi.doe.gov/ | USA | |
| UTILLdb | TILLING data, phenotype | https://bit.ly/34Fsjzm | FRA | [email protected] |
| ECPGR | Germplasm, agronomy | https://bit.ly/3h7dmbX | CZE | prokopova@agritec.cz |
| RFRC | Phenotype | https://bit.ly/3rUgJcl | RUS | m.samsonova@spbstu.ru |
| CFIA | Canada variety | https://bit.ly/3HHLc3y | CAN | |
| GRIN | USA variety | https://www.ars-grin.gov/ | USA | |
| GRIN-Global-CA | Variety | https://bit.ly/3BzgMgP | CAN | |
| PGRC | Germplasm | https://bit.ly/3hTEUlq | CAN | axel.diederichsen@agr.gc.ca |
| FlaxDB | Phenotype, pedigree, SNPs, genomic data | Not available yet | CAN | [email protected] |

NCBI National Center for Biotechnology Information; ECPGR European Cooperative Programme for Plant Genetic Resources; RFRC Russian Federal Research Center for Bast Fiber Crops; CFIA Canadian Food Inspection Agency; GRIN the Germplasm Resources Information Network; GRIN-Global-CA a Canadian adaptation of the GRIN genebank information system; PGRC Plant Gene Resources of Canada; CZE Czech Republic; CAN Canada; FRA France; RUS Russia

phenotypic data, and breeding-related information. Of these resources, NCBI, Phytozome, UTILLdb, GRIN-Global-CA, and IFDB (ECPGR) provide powerful and user-friendly tools to retrieve and download useful data for flax genetic studies and breeding.

13.4.1 NCBI

NCBI collates and provides online access to a wide range of biological information stored in 35 distinct databases, such as GenBank®, the nr/nt databases, and PubMed®, which holds literature citations and abstracts published in most life science journals (Sayers et al. 2022). NCBI provides search, view, and download functions for most of these resources. Each database supports text searching, record linking among databases based on asserted relationships, and data downloading in multiple formats. Records retrieved from Entrez can be viewed in different formats and downloaded individually or in batch mode. Various programming interfaces and tools are also provided for most of these databases. The NCBI databases can be accessed through the website's home page at https://www.ncbi.nlm.nih.gov or by using the E-utilities, the Application Programming Interface for Entrez functions.

Most flax genomic data generated for publications are submitted to NCBI databases, which constitute the largest genomic data repository for flax. As of April 4, 2022, the genomic data of the Linum genus, primarily of the species L. usitatissimum and L. bienne, were distributed across 20 separate databases in five major categories: Literature, Genomes, Genes, Proteins, and PubChem (Table 13.7). More than 4000 abstracts were found in the PubMed database using the search keyword “Linum”. Because PubMed Central collects the full texts of publications, the same query keyword returns an even greater number of hits there, extending the literature beyond publications that specifically address flax: more than 12,000 publications refer to the Linum genus. Five flax genome assemblies have been deposited into the Genome database since 2012, including one of L. bienne and four of L. usitatissimum, and their sequence data are downloadable via the accession numbers listed in Table 13.1. More than 380,000 flax nucleotide sequences have been uploaded into GenBank, representing DNA fragments, mRNA, RefSeq, transcripts, and gene sequences, and providing a foundation for flax research and discovery. The Sequence Read Archive (SRA) database, a raw data repository for next-generation sequencing (NGS) data, contains more than 3000 runs of raw NGS reads related to 2371 biological samples (BioSample) from 124 biological projects (BioProject). These projects cover genome assembly, RNA-seq and gene expression, DNA methylation, microRNA prediction, diversity and evolution, SNP identification, association mapping, and other topics.
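As a hedged illustration of programmatic access through the E-utilities, the sketch below builds an ESearch request URL for a given database and query term, such as the keyword search in PubMed or a BioProject ID in SRA described above. The helper name `esearch_url` and the default `retmax` value are assumptions made for the example; the sketch only constructs the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Public base address of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that asks for matching record IDs as JSON."""
    query = urlencode({
        "db": db,          # Entrez database name, e.g. "pubmed" or "sra"
        "term": term,      # free-text query or accession/BioProject ID
        "retmax": retmax,  # maximum number of IDs to return
        "retmode": "json",
    })
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


if __name__ == "__main__":
    print(esearch_url("pubmed", "Linum"))
    print(esearch_url("sra", "PRJNA68161"))
    # The URL can then be fetched with any HTTP client, for example
    # urllib.request.urlopen(url); request-rate limits apply.
```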

13.4.2 Phytozome

Phytozome is a comprehensive plant genome database and web portal for annotated plant genomes and gene families, accessible at http://www.phytozome.net. It provides functions for browsing the evolutionary history of gene families and individual genes, as well as tools to retrieve, view, and download sequences and functional annotations of genomes for plants and selected algae sequenced at the Joint Genome Institute, and for some non-model species sequenced elsewhere (Goodstein et al. 2012). Phytozome was first released in 2008, and version 13 hosts 261 assembled and annotated genomes from 139 Archaeplastida species, 54 Brachypodium distachyon lines from the BrachyPan pan-genome project, 20 species from the Brassicales Map Alignment Project, and eight cowpea (Vigna unguiculata) genomes from the CowpeaPan pan-genome project. To date, Phytozome hosts only the first version (V1.0) of the CDC Bethune flax reference genome sequence and its gene annotation (Wang et al. 2012b). It is the only public source of this assembly version and contains the original assemblies (scaffolds) and gene annotation. The chromosome-scale pseudomolecules (V2.0) (Table 13.1) and their gene annotation were developed from the scaffolds of this version (You et al. 2018b). The complete V2.0 sequences are available at NCBI (assembly accession: ASM22429v2; accessions: CP027619.1–CP027633.1), and the gene annotation for the pseudomolecules is available as a supplementary file in the publication by You and Cloutier (2020).
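The accession range quoted above can be expanded into individual sequence identifiers before retrieval. The sketch below assumes that CP027619.1–CP027633.1 is a contiguous run with one accession per pseudomolecule, which the range notation suggests but the text does not state outright. `expand_accessions` and `efetch_fasta_url` are hypothetical helpers, and the URLs target the E-utilities EFetch endpoint.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities EFetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def expand_accessions(first, last):
    """Expand a contiguous accession range such as CP027619.1 to CP027633.1 (hypothetical helper)."""
    prefix = first[:2]                          # two-letter prefix, e.g. "CP"
    digits, version = first[2:].split(".")      # e.g. "027619" and "1"
    start = int(digits)
    stop = int(last[2:].split(".")[0])
    width = len(digits)                         # preserve zero padding
    return [f"{prefix}{n:0{width}d}.{version}" for n in range(start, stop + 1)]


def efetch_fasta_url(accession):
    """Build an EFetch URL that returns one nucleotide record as FASTA text."""
    params = {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


accessions = expand_accessions("CP027619.1", "CP027633.1")
urls = [efetch_fasta_url(acc) for acc in accessions]
```

Under this assumption the range expands to 15 identifiers, which matches the 15 chromosome pairs of flax (2n = 30).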

Table 13.7 Major NCBI databases and entries containing Linum (as of April 4, 2022)

Category    Database                  Number of records  Description
Literature  PubMed                    4,059              Scientific abstracts and citations
Literature  PubMed Central            12,767             Free full-text journal articles
Literature  NLM Catalog               36                 An Entrez-based interface to National Library of Medicine (NLM) bibliographic records
Literature  Bookshelf                 93                 Books and reports
Literature  MeSH                      6                  Ontology used for PubMed indexing
Genomes     Nucleotide                383,227            DNA/RNA deposited in GenBank and RefSeq
Genomes     BioSample                 2,371              Biological samples
Genomes     SRA                       3,104              Runs of high-throughput DNA/RNA sequence data
Genomes     Assembly                  5                  Genome assemblies for four flax cultivars of L. usitatissimum and one L. bienne
Genomes     BioProject                124                Biological projects submitting data to NCBI
Genomes     Genome                    2                  Genome sequencing projects (L. usitatissimum and L. bienne)
Genes       Gene                      529                Gene loci
Genes       GEO DataSets              531                Gene expression datasets in the Gene Expression Omnibus (GEO) repository
Genes       PopSet                    438                DNA sequence sets derived from population, phylogenetic and mutation studies
Proteins    Protein                   14,670             Protein sequences from GenBank and RefSeq
Proteins    Identical Protein Groups  5,735              Protein sequences grouped by identity
Proteins    Structure                 13                 Experimentally-determined biomolecular structures
Chemicals   PubChem Substance         4                  Substance and chemical information
Chemicals   PubChem BioAssay          16                 Bioactivity screening studies
Chemicals   Pathways                  559                Molecular pathways

13.4.3 Flax TILLING Platform

Targeting Induced Local Lesions IN Genomes (TILLING) has been used to accelerate the functional characterization of genes related to important traits for crop improvement. TILLING is a method for probing gene function that combines random chemical mutagenesis with an efficient screening technology for identifying mutations. It complements T-DNA mutant libraries in reverse genetic studies and can speed up forward genetic studies. TILLING is more efficient, less costly, and less time-consuming than traditional methods. It employs chemical mutagens such as ethyl methanesulfonate (EMS) to create heritable genetic variation in plant genomes. After one generation of reproduction, mutations within the targeted gene(s) can be located by screening the DNA of M2 plants with conventional molecular techniques. The approach is termed reverse genetics because it characterizes gene function from genotype to phenotype. Since its first use in Arabidopsis (Colbert et al. 2001), TILLING has benefitted many crops, including wheat (Slade et al. 2005; Chen et al. 2012), rice (Wu et al. 2005; Till et al. 2007), sorghum (Xin et al. 2008), soybean (Cooper et al. 2008; Dierking and Bilyeu 2009), and Brassica rapa (Stephenson et al. 2010; Zhang et al. 2014).

The database UTILLdb was initially developed as a high-throughput forward and reverse genetics tool containing phenotypic and sequence information on mutant genes in pea (Dalmais et al. 2008). UTILLdb (http://urgv.evry.inra.fr/UTILLdb) can be searched online for TILLING alleles by sequence or by phenotypic information entered as keywords. The same platform has been applied to flax. Chantreau et al. (2013) developed it to accelerate flax genetic studies by treating seeds (M0) of cv. Diane with three concentrations of EMS (0.3, 0.6 and 0.75%). A total of 4033 M2 families were collected and phenotyped. Of these, 1552 showed visual phenotypic differences, including differences in stems, leaves, and plant architecture. More than 55% of the families showed multiple phenotypes compared to the wild type. To evaluate the quality of this TILLING population, two flax lignin genes, coumarate-3-hydroxylase (C3H) and cinnamyl alcohol dehydrogenase (CAD), were TILLed. A total of 79 and 76 point mutations in the C3H and CAD genes, respectively, were identified in the population, corresponding to an average mutation rate of one per 41 kb, or approximately 9000 mutations per genome.

The UTILLdb web portal provides basic query services based on gene name and phenotype to facilitate the search for TILLING targets of interest. For example, candidate genes in the vicinity of a QTL region can be queried, and seeds of the M2 mutant lines that carry mutations in them can be requested from the Institut national de la recherche agronomique (INRA), Unité de Recherche en Génomique Végétale (URGV). Instructions for using the INRA URGV TILLING platform and the UTILLdb project are available on the website (http://urgv.evry.inra.fr/UTILLdb).
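The estimate of roughly 9000 mutations per genome follows from dividing the genome size by the average spacing between mutations. The short calculation below is a sketch of that arithmetic. The genome size of about 370 Mb is an assumption introduced here for illustration (it is not stated in this section); it is the size implied by the two figures quoted in the text.

```python
# Average spacing between induced mutations reported for the TILLING population (kb).
KB_PER_MUTATION = 41

# Assumed flax genome size in kb (about 370 Mb); illustrative value, not taken from this section.
ASSUMED_GENOME_SIZE_KB = 370_000


def mutations_per_genome(genome_size_kb, kb_per_mutation):
    """Expected number of induced mutations carried by one mutant genome."""
    return genome_size_kb / kb_per_mutation


estimate = mutations_per_genome(ASSUMED_GENOME_SIZE_KB, KB_PER_MUTATION)
print(round(estimate))
```

A different genome size estimate scales the result proportionally, so the figure should be read as an order of magnitude rather than an exact count.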

13.4.4 Canadian National Gene Bank Information System: GRIN-Global-CA

GRIN-Global-CA, the Canadian version of the Germplasm Resources Information Network (GRIN)-Global genebank information system, is dedicated to the management and inspection of germplasm held at Plant Gene Resources of Canada (PGRC), the Canadian Clonal Genebank (CCGB) and the Canadian Potato Genetic Resources (CPGR). All information on the germplasm, including passport, characterization, evaluation, and taxonomic data, can be retrieved through GRIN-Global-CA at https://pgrc-rpc.agr.gc.ca/gringlobal/landing. PGRC, the Canadian national seed genebank and a vital plant germplasm repository, was established in 1970. Its main duty is to collect and share genetically diverse germplasm of cultivated plants and their wild relatives, with a specific emphasis on germplasm relevant to Canada. PGRC is managed and funded by the Canadian government through Agriculture and Agri-Food Canada (AAFC). To date, PGRC, located at the AAFC Research and Development Centre in Saskatoon, has preserved more than 110,000 unique seed samples representing 47 botanical families, 258 genera and nearly 1000 species. All germplasm preserved at PGRC is freely available upon request for research, plant breeding, and education purposes according to the provisions of the International Treaty on Plant Genetic Resources for Food and Agriculture (https://www.fao.org/plant-treaty/).

PGRC preserves more than 3,300 accessions belonging to the Linum genus (Diederichsen and Fu 2008; Diederichsen et al. 2013), making it one of the major collections of flax germplasm in the world. As of April 4, 2022, a total of 3529 flax entries with passport and taxonomic information, as well as phenotypic data for some important traits, were available from GRIN-Global-CA, including accessions of the cultivated flax species L. usitatissimum (L.) and of other Linum species. These were acquired from 68 countries, representing all historical and present-day regions where linseed and fiber flax have been or are being produced. More than 90% of these accessions have been evaluated for major agronomic traits, including height, fiber content, adaptation to dry growing environments, and disease resistance, and for the following seed traits: color, thousand-seed weight, oil content, vigor, lignin content, and fatty acid composition (Diederichsen 2001; Diederichsen and Fu 2006; Diederichsen and Raney 2006; Diederichsen et al. 2006; Diederichsen and Fu 2008; Diederichsen and Ulrich 2009). The Canadian flax core collection was compiled from this collection for genomic studies and breeding uses (Diederichsen et al. 2013), and has been widely evaluated in multiple environments (You et al. 2017).


13.4.5 Flax Variety Databases

Released flax varieties and their pedigree and performance information are important for flax breeding. Information on the flax varieties registered in Canada can be retrieved from four major sources: (1) the flax database maintained by the Canadian Food Inspection Agency (https://bit.ly/3gRQ7CA), (2) publications of new variety descriptions in the Journal of Agronomy and Crop Sciences and the Canadian Journal of Plant Sciences, (3) the Canadian version of the Germplasm Resources Information Network (GRIN) database (GRIN-Global-CA), and (4) the database of crop varieties registered in Canada supported by the Canadian Food Inspection Agency (CFIA) (https://inspection.canada.ca/active/netapp/regvar/regvar_lookupe.aspx). As of January 6, 2022, the CFIA variety database hosted a total of 78 released flax varieties.

13.4.6 International Flax Database (IFDB)

The European Cooperative Programme for Crop Genetic Resources (ECP/GR) network is a collaborative organization of European countries whose long-term goals are the conservation and utilization of plant genetic resources in Europe. The program operates through ten networks that deal with either a group of crops (cereals, forages, vegetables, grain legumes, fruits, minor crops, industrial crops, and potatoes) or general themes related to plant genetic resources (such as documents, information, in situ and on-farm conservation, and inter-regional cooperation). One of its resources is the International Flax Database (IFDB), which has been established and managed since 1994 by AGRITEC, a company located in the Czech Republic. The collection has grown from 1416 accessions at its inception to 11,141 in 2010. Flax seeds are stored in glass jars at −5 °C (active collection) and −15 °C (base collection), with moisture levels kept at 5% in both collections. These accessions are distributed among 22 contributing gene banks in 15 countries. A total of 28 traits, including 17 morphological, four biological, six agronomic, and one cytological, were evaluated (Maggioni et al. 2001). The database was initially accessible online and can now be retrieved as a downloadable offline Excel file (https://www.ecpgr.cgiar.org/resources/germplasm-databases/list-of-germplasm-databases/crop-databases/crop-database-windows/flax).

13.4.7 FlaxDB: A Flax Genome and Breeding Database

FlaxDB is a new and comprehensive web-based platform that integrates genomic and breeding resources, including germplasm, pedigrees, genomic data, phenotypic data, and essential bioinformatics tools, to bridge phenotypes and genotypes in support of flax breeding programs (Fig. 13.2). This database project, led by the University of Saskatchewan and Agriculture and Agri-Food Canada (AAFC), is financially supported by the Agriculture Development Fund (provincial government of Saskatchewan), the Saskatchewan Flax Development Commission (SaskFlax), SeCan, and AAFC. The database is still under development and its public release is imminent. To date, FlaxDB comprises data modules for retrieving many types of data, including germplasm, pedigrees, phenotypes, genome sequences, genetic maps, markers, SNPs, and QTLs. The web application was developed in Python on the Django framework, combined with other tools written in C++, Java, and R. All collected data are stored in MySQL and MongoDB databases, which simplifies data management and improves query performance. Additional bioinformatics tools, such as JBrowse and BLAST, have been added to the application. Genomic data, such as SNPs, markers, and reference genome sequences, have been integrated into the browser.

Fig. 13.2 The home page of FlaxDB

JBrowse is a comprehensive, pluggable, and open-source platform for biological data visualization and integration (Buels et al. 2016). It is built with JavaScript and HTML5 and is intended for use in websites or apps. JBrowse has its own data management, which does not depend on MySQL or MongoDB. It loads data as individual tracks, including the genome or coding gene sequences, which are initialized from supported file formats before being loaded into the browser. At present, JBrowse supports most common biological file formats, including GFF3, BED, VCF, BAM, and BigWig. Users can load one or many data types into the browser as individual tracks by initializing these supported file formats with the tools embedded in the JBrowse package, a process that significantly simplifies the management of a database built from disparate datasets.

Pedigree information of flax germplasm or cultivars is useful for parent selection in cross breeding. We have collected and verified the pedigree information of 82 cultivars released in Canada since 1910 (You et al. 2016). The pedigree data of these Canadian flax cultivars have been integrated into FlaxDB (Fig. 13.3a). The pedigree trees (Fig. 13.3b) and coefficients of parentage (CP) between cultivars can be interactively retrieved from FlaxDB.
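How a pairwise pedigree coefficient can be derived from parentage records is illustrated below. This is a minimal sketch, not the FlaxDB implementation. It assumes that CP denotes the coefficient of parentage in the conventional sense, that cultivars are fully homozygous lines (so a cultivar has a coefficient of 1 with itself), that founders without recorded parents are unrelated, and that each parent contributes half of its progeny’s genome. The pedigree and the names in it are invented for illustration.

```python
from functools import lru_cache

# Hypothetical toy pedigree: cultivar -> (parent 1, parent 2); founders map to None.
PEDIGREE = {
    "A": None,
    "B": None,
    "C": None,
    "X": ("A", "B"),
    "Y": ("A", "C"),
    "Z": ("X", "Y"),
}


@lru_cache(maxsize=None)
def depth(name):
    """Number of generations separating a cultivar from its most distant founder."""
    parents = PEDIGREE[name]
    if parents is None:
        return 0
    return 1 + max(depth(p) for p in parents)


@lru_cache(maxsize=None)
def coefficient_of_parentage(a, b):
    """Coefficient of parentage between two cultivars under the stated assumptions."""
    if a == b:
        return 1.0  # assumption: fully homozygous line
    # Always expand the deeper cultivar, which cannot be an ancestor of the other.
    if depth(a) < depth(b):
        a, b = b, a
    parents = PEDIGREE[a]
    if parents is None:
        return 0.0  # two distinct founders are assumed unrelated
    return 0.5 * (coefficient_of_parentage(parents[0], b)
                  + coefficient_of_parentage(parents[1], b))


half_sibs = coefficient_of_parentage("X", "Y")
parent_offspring = coefficient_of_parentage("Z", "X")
```

In this toy pedigree, X and Y share only parent A, so their coefficient is 0.25, while Z and its parent X reach 0.625 because Z is also related to X through Y.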


Fig. 13.3 Illustration of the pedigree functionality of FlaxDB (a) and an example of pedigree display illustrating the pedigree of Canadian cultivar CDC Bethune (b)

More than 250,000 flax SNPs identified from the flax core collection (He et al. 2019; You et al. 2022), selected breeding lines (You et al. 2022) and some biparental populations (You et al. 2018c), and phenotypic data of these populations from multiple years and locations (You et al. 2017, 2022; He et al. 2019) as well as related QTLs and QTNs associated with important flax traits have been integrated into FlaxDB. Additional published data will be continually added to the database in the future.

13.5 Perspectives

Over the last decade, a huge amount of genomic and phenotypic data has been generated in flax. It is imperative to merge the genomic and phenotypic data in order to capitalize on new plant breeding strategies that rely heavily on genomic data. These strategies include association mapping, where marker-phenotype associations are established to identify markers that can be used in marker-assisted backcrossing to widen the genetic variability. Genomic selection (GS) is another strategy that has been shown to be highly efficient in wheat and barley, two crops with genomes far more complex than that of flax, indicating that GS is likely to be readily applicable and highly beneficial in flax. In addition, breeding programs, cooperative testing and other programs have accumulated further information, including large amounts of phenotypic data that can be highly valuable for new cultivar development. Most resources, even though they have been published and their data submitted to NCBI, have not been fully analyzed. Their integration into a comprehensive database has value because (1) the data will be easier to utilize and (2) it will increase bioinformatics capacity. More importantly, to take full advantage of these data, they need to be integrated and organized so that they can be easily queried and retrieved to assist plant breeders in designing the most efficient breeding strategies. In-depth analyses, complete integration, and user-friendly web interfaces for efficient queries and convenient data retrieval are paramount to capitalize on these cutting-edge discoveries.

There are already several well-developed databases managing both genomic and phenotypic information for crop species. For example, GrainGenes (Matthews et al. 2003; Blake et al. 2019) focuses on grasses and cereals, storing both genetic and phenotypic information; MaizeGDB is a repository for maize sequences, stocks, phenotypes, genotypic and karyotypic variations as well as chromosomal mapping data (Portwood et al. 2019); SoyBase, a soybean genetics and genomics database with a breeder’s toolbox, is a comprehensive repository for professionally curated genetics, genomics and related data resources for soybean (Grant et al. 2010; Brown et al. 2021). CerealsDB is another online resource containing a range of genomic datasets for wheat that is dedicated to assisting plant breeders and scientists in selecting the most appropriate markers for marker-assisted selection (MAS). CerealsDB currently contains in excess of 100,000 putative varietal SNPs, DArT markers, and EST sequences linked to the wheat reference genome sequence of Chinese Spring (Wilkinson et al. 2012; Winfield et al. 2022).

Though a huge amount of genetic, genomic, genotypic, and phenotypic data has been generated for flax, no comprehensive database has been released to date, a niche we hope FlaxDB will fill. It is being developed to allow efficient management and full use of the flax genomic resources and breeding data generated in a variety of research projects and to enable plant breeders to perform efficient parent and offspring selections. The database will be released to the public as soon as it is completed and a suitable web server is available.

Acknowledgements

We thank Dr. Bourlaye Fofana for his review and editing, and Tara Edwards for English editing.

References

Allaby RG, Peterson GW, Merriwether DA, Fu YB (2005) Evidence of the domestication history of flax (Linum usitatissimum L.) from genetic diversity of the sad2 locus. Theor Appl Genet 112:58–65
Asgarinia P, Cloutier S, Duguid S, Rashid K, Mirlohi A et al (2013) Mapping quantitative trait loci for powdery mildew resistance in flax (Linum usitatissimum L.). Crop Sci 53:2462–2472
Bankevich A, Pevzner PA (2016) TruSPAdes: barcode assembly of TruSeq synthetic long reads. Nat Methods 13:248–250
Blake VC, Woodhouse MR, Lazo GR, Odell SG, Wight CP et al (2019) GrainGenes: centralized small grain resources and digital platform for geneticists and breeders. Database (Oxford) 2019:baz065
Botstein D, White RL, Skolnick M, Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314–331
Brookes AJ (1999) The essence of SNPs. Gene 234:177–186
Brown AV, Conners SI, Huang W, Wilkey AP, Grant D et al (2021) A new decade and new data at SoyBase, the USDA-ARS soybean genetics and genomics database. Nucleic Acids Res 49:D1496–D1501
Buels R, Yao E, Diesh CM, Hayes RD, Munoz-Torres M et al (2016) JBrowse: a dynamic web platform for genome visualization and analysis. Genome Biol 17:66
Chantreau M, Grec S, Gutierrez L, Dalmais M, Pineau C et al (2013) PT-Flax (phenotyping and TILLinG of flax): development of a flax (Linum usitatissimum L.) mutant population and TILLinG platform for forward and reverse genetics. BMC Plant Biol 13:159
Chen L, Huang L, Min D, Phillips A, Wang S et al (2012) Development and characterization of a new TILLING population of common bread wheat (Triticum aestivum L.). PLoS ONE 7:e41570
Cloutier S, Niu Z, Datla R, Duguid S (2009) Development and analysis of EST-SSRs for flax (Linum usitatissimum L.). Theor Appl Genet 119:53–63
Cloutier S, Ragupathy R, Niu Z, Duguid S (2011) SSR-based linkage map of flax (Linum usitatissimum L.) and mapping of QTLs underlying fatty acid composition traits. Mol Breed 28:437–451
Cloutier S, Miranda E, Ward K, Radovanovic N, Reimer E et al (2012a) Simple sequence repeat marker development from bacterial artificial chromosome end sequences and expressed sequence tags of flax (Linum usitatissimum L.). Theor Appl Genet 125:685–694
Cloutier S, Ragupathy R, Miranda E, Radovanovic N, Reimer E et al (2012b) Integrated consensus genetic and physical maps of flax (Linum usitatissimum L.). Theor Appl Genet 125:1783–1795
Colbert T, Till BJ, Tompa R, Reynolds S, Steine MN et al (2001) High-throughput screening for induced point mutations. Plant Physiol 126:480–484
Cooper JL, Till BJ, Laport RG, Darlow MC, Kleffner JM et al (2008) TILLING to detect induced mutations in soybean. BMC Plant Biol 8:9
Cullis CA, Oh TJ, MB G (1995) Genetic mapping in flax (Linum usitatissimum). In: Proceeding 3rd Meeting Int Flax Breed Res Group. St Valery-en-caux, France, pp 161–169
Dalmais M, Schmidt J, Le Signor C, Moussy F, Burstin J et al (2008) UTILLdb, a Pisum sativum in silico forward and reverse genetics tool. Genome Biol 9:R43
Day A, Addi M, Kim W, David H, Bert F et al (2005) ESTs from the fibre-bearing stem tissues of flax (Linum usitatissimum L.): expression analyses of sequences related to cell wall development. Plant Biol (Stuttg) 7:23–32
Deng X, Long S, He D, Li X, Wang Y et al (2010) Development and characterization of polymorphic microsatellite markers in Linum usitatissimum. J Plant Res 123:119–123
Deng S, Wu X, Wu Y, Zhou R, Wang H et al (2011) Characterization and precise mapping of a QTL increasing spike number with pleiotropic effects in wheat. Theor Appl Genet 122:281–289
Diederichsen A (2001) Comparison of genetic diversity of flax (Linum usitatissimum L.) between Canadian cultivars and a world collection. Plant Breed 120:360–362
Diederichsen A, Fu YB (2006) Phenotypic and molecular (RAPD) differentiation of four infraspecific groups of cultivated flax (Linum usitatissimum L. subsp. usitatissimum). Genet Resour Crop Evol 53:77–90
Diederichsen A, Fu YB (2008) Flax genetic diversity as the raw material for future success. In: 2008 international conference on flax and other bast plants Saskatoon, SK, pp 270–280

Diederichsen A, Raney JP (2006) Seed colour, seed weight and seed oil content in Linum usitatissimum accessions held by plant gene resources of Canada. Plant Breed 125:372–377
Diederichsen A, Ulrich A (2009) Variability in stem fibre content and its association with other characteristics in 1177 flax (Linum usitatissimum L.) genebank accessions. Ind Crops Prod 30:33–39
Diederichsen A, Rozhmina TA, Zhuchenko AA, Richards KW (2006) Screening for broad adaptation in 96 flax (Linum usitatissimum L.) accessions under dry and warm conditions in Canada and Russia. Plant Genet Resour Newsl 146:9–16
Diederichsen A, Kusters PM, Kessler D, Bainas Z, Gugel RK (2013) Assembling a core collection from the flax world collection maintained by plant gene resources of Canada. Genet Resour Crop Evol 60:1479–1485
Dierking EC, Bilyeu KD (2009) New sources of soybean seed meal and oil composition traits identified through TILLING. BMC Plant Biol 9:89
Dmitriev AA, Krasnov GS, Rozhmina TA, Novakovskiy RO, Snezhkina AV et al (2017) Differential gene expression in response to Fusarium oxysporum infection in resistant and susceptible genotypes of flax (Linum usitatissimum L.). BMC Plant Biol 17:253
Dmitriev AA, Pushkova EN, Novakovskiy RO, Beniaminov AD, Rozhmina TA et al (2020) Genome sequencing of fiber flax cultivar Atlant using Oxford Nanopore and Illumina platforms. Front Genet 11:590282
Duk M, Kanapin A, Rozhmina T, Bankin M, Surkova S et al (2021) The genetic landscape of fiber flax. Front Plant Sci 12:764612
Flicek P, Birney E (2009) Sense from sequence reads: methods for alignment and assembly. Nat Methods 6:S6–S12
Fu YB, Peterson G, Diederichsen A, Richards KW (2002) RAPD analysis of genetic relationships of seven flax species in the genus Linum L. Genet Resour Crop Evol 49:253–259
Galindo-Gonzalez L, Deyholos MK (2016) RNA-seq transcriptome response of flax (Linum usitatissimum L.) to the pathogenic fungus Fusarium oxysporum f. sp. lini. Front Plant Sci 7:1766
Ganapathy G, Howard JT, Ward JM, Li J, Li B et al (2014) High-coverage sequencing and annotated assemblies of the budgerigar genome. Gigascience 3:11
Goodstein DM, Shu S, Howson R, Neupane R, Hayes RD et al (2012) Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res 40:D1178–D1186
Grant D, Nelson RT, Cannon SB, Shoemaker RC (2010) SoyBase, the USDA-ARS soybean genetics and genomics database. Nucleic Acids Res 38:D843–D846
Guo D, Jiang H, Yan W, Yang L, Ye J et al (2020) Resequencing 200 flax cultivated accessions identifies candidate genes related to seed size and weight and reveals signatures of artificial selection. Front Plant Sci 10:1682
Hastie AR, Dong L, Smith A, Finklestein J, Lam ET et al (2013) Rapid genome mapping in nanochannel arrays for highly complete and accurate de novo sequence assembly of the complex Aegilops tauschii genome. PLoS ONE 8:e55864
He L, Xiao J, Rashid KY, Yao Z, Li P et al (2019) Genome-wide association studies for pasmo resistance in flax (Linum usitatissimum L.). Front Plant Sci 9:1982
Hiltunen M, Ryberg M, Johannesson H (2021) ARBitR: an overlap-aware genome assembly scaffolder for linked reads. Bioinformatics 37:2203–2205
Jaccoud D, Peng K, Feinstein D, Kilian A (2001) Diversity arrays: a solid state technology for sequence information independent genotyping. Nucleic Acids Res 29:E25
Jiang H, Guo D, Ye J, Gao Y, Liu H et al (2021) Genome-wide analysis of genomic imprinting in the endosperm and allelic variation in flax. Plant J 107:1697–1710
Jiang H, Pan G, Liu T, Chang L, Huang S et al (2022) Development and application of novel InDel markers in flax (Linum usitatissimum L.) through whole-genome re-sequencing. Genet Resour Crop Evol 69:1471–1483
Kanapin A, Bankin M, Rozhmina T, Samsonova A, Samsonova M (2021) Genomic regions associated with Fusarium wilt resistance in flax. Int J Mol Sci 22:12383
Khan N, You FM, Datla R, Ravichandran S, Jia B et al (2020) Genome-wide identification of ATP binding cassette (ABC) transporter and heavy metal associated (HMA) gene families in flax (Linum usitatissimum L.). BMC Genomics 21:722
Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH et al (2017) Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res 27:722–736
Kumar S, You FM, Cloutier S (2012) Genome wide SNP discovery in flax through next generation sequencing of reduced representation libraries. BMC Genomics 13:684
Kumar S, You FM, Duguid S, Booker H, Rowland G et al (2015) QTL for fatty acid composition and yield in linseed (Linum usitatissimum L.). Theor Appl Genet 128:965–984
Lam ET, Hastie A, Lin C, Ehrlich D, Das SK et al (2012) Genome mapping on nanochannel arrays for structural variation analysis and sequence assembly. Nat Biotechnol 30:771–776
Li Z, Chen Y, Mu D, Yuan J, Shi Y et al (2012) Comparison of the two major classes of assembly algorithms: overlap-layout-consensus and de-bruijn-graph. Brief Funct Genomics 11:25–37
Li X, Guo D, Xue M, Li G, Yan Q et al (2022) Genome-wide association study of salt tolerance at the seed germination stage in flax (Linum usitatissimum L.). Genes (Basel) 13:486

Long SH, Deng X, Wang YF, Li X, Qiao RQ et al (2012) Analysis of 2,297 expressed sequence tags (ESTs) from a cDNA library of flax (Linum ustitatissimum L.) bark tissue. Mol Biol Rep 39:6289–6296
Maggioni LPM, van Soest LJM, Lipman E (comps) (2001) Flax genetic resources in Europe. Ad Hoc Meet 7–8
Manchanda N, Portwood JL 2nd, Woodhouse MR, Seetharam AS, Lawrence-Dill CJ et al (2020) GenomeQC: a quality assessment tool for genome assemblies and gene structure annotations. BMC Genomics 21:193
Matthews DE, Carollo VL, Lazo GR, Anderson OD (2003) GrainGenes, the genome database for small-grain crops. Nucleic Acids Res 31:183–186
Meader S, Hillier LW, Locke D, Ponting CP, Lunter G (2010) Genome assembly quality: assessment and improvement using the neutral indel model. Genome Res 20:675–684
Miller JR, Koren S, Sutton G (2010) Assembly algorithms for next-generation sequencing data. Genomics 95:315–327
Nadeem MA, Nawaz MA, Shahid MQ, Doğan Y, Comertpay G et al (2018) DNA molecular markers in plant breeding: current status and recent advancements in genomic selection and genome editing. Biotechnol Biotechnol Equip 32:261–285
Oh TJ, Gorman M, Cullis CA (2000) RFLP and RAPD mapping in flax (Linum usitatissimum). Theor Appl Genet 101:590–593
Portwood JL 2nd, Woodhouse MR, Cannon EK, Gardiner JM, Harper LC et al (2019) MaizeGDB 2018: the maize multi-genome genetics and genomics database. Nucleic Acids Res 47:D1146–D1154
Ragupathy R, Rathinavelu R, Cloutier S (2011) Physical mapping and BAC-end sequence analysis provide initial insights into the flax (Linum usitatissimum L.) genome. BMC Genomics 12:217
Ranallo-Benavidez TR, Jaron KS, Schatz MC (2020) GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun 11:1432
Sa R, Yi L, Siqin B, An M, Bao H et al (2021) Chromosome-level genome assembly and annotation of the fiber flax (Linum usitatissimum) genome. Front Genet 12:735690
Saroha A, Pal D, Gomashe SS, Akash X, Kaur V et al (2022) Identification of QTNs associated with flowering time, maturity and plant height traits in Linum usitatissimum L. using genome wide association study. Front Genet 13 (in press)
Sayers EW, Bolton EE, Brister JR, Canese K, Chan J et al (2022) Database resources of the national center for biotechnology information. Nucleic Acids Res 50:D20–D26
Schatz MC, Delcher AL, Salzberg SL (2010) Assembly of large genomes using second-generation sequencing. Genome Res 20:1165–1173
Sehgal D, Singh R, Rajpal VR (2016) Quantitative trait loci mapping in plants: concepts and approaches. In: Rajpal VR, Rao SR, Raina SN (eds) Molecular breeding for sustainable crop improvement. Springer, pp 31–59
Sertse D, You FM, Ravichandran S, Cloutier S (2019) The complex genetic architecture of early root and shoot traits in flax revealed by genome-wide association analyses. Front Plant Sci 10:1483
Sertse D, You FM, Ravichandran S, Soto-Cerda BJ, Duguid S et al (2021) Loci harboring genes with important role in drought and related abiotic stress responses in flax revealed by multiple GWAS models. Theor Appl Genet 134:191–212
Shearer LA, Anderson LK, de Jong H, Smit S, Goicoechea JL et al (2014) Fluorescence in situ hybridization and optical mapping to correct scaffold arrangement in the tomato genome. G3 (Bethesda) 4:1395–1405
Slade AJ, Fuerstenberg SI, Loeffler D, Steine MN, Facciotti D (2005) A reverse genetic, nontransgenic approach to wheat crop improvement by TILLING. Nat Biotechnol 23:75–81
Smykal P, Bacova-Kerteszova N, Kalendar R, Corander J, Schulman AH et al (2011) Genetic diversity of cultivated flax (Linum usitatissimum L.) germplasm assessed by retrotransposon-based markers. Theor Appl Genet 122:1385–1397
Soto-Cerda BJ, Urbina Saavedra H, Navarro Navarro C, Mora Ortega P (2011) Characterization of novel genic SSR markers in Linum usitatissimum (L.) and their transferability across eleven Linum species. Elec J Biotechnol 14
Soto-Cerda BJ, Maureira-Butler I, Munoz G, Rupayan A, Cloutier S (2012) SSR-based population structure, molecular diversity and linkage disequilibrium analysis of a collection of flax (Linum usitatissimum L.) varying for mucilage seed-coat content. Mol Breed 30:875–888
Soto-Cerda BJ, Diederichsen A, Ragupathy R, Cloutier S (2013a) Genetic characterization of a core collection of flax (Linum usitatissimum L.) suitable for association mapping studies and evidence of divergent selection between fiber and linseed types. BMC Plant Biol 13:78
Soto-Cerda BJ, Duguid S, Booker H, Rowland G, Diederichsen A et al (2013b) Genomic regions underlying agronomic traits in linseed (Linum usitatissimum L.) as revealed by association mapping. J Integrat Plant Biol 56:75–87
Soto-Cerda BJ, Duguid S, Booker H, Rowland G, Diederichsen A et al (2014) Association mapping of seed quality traits using the Canadian flax (Linum usitatissimum L.) core collection. Theor Appl Genet 127:881–896
Soto-Cerda BJ, Aravena G, Cloutier S (2021) Genetic dissection of flowering time in flax (Linum usitatissimum L.) through single- and multi-locus genome-wide association studies. Mol Genet Genomics 296:877–891
Spielmeyer W, Green AG, Bittisnich D, Mendham N, Lagudah ES (1998) Identification of quantitative trait loci contributing to Fusarium wilt resistance on an AFLP linkage map of flax (Linum usitatissimum). Theor Appl Genet 97:633–641
Stephenson P, Baker D, Girin T, Perez A, Amoah S et al (2010) A rich TILLING resource for studying gene function in Brassica rapa. BMC Plant Biol 10:62
Tautz D (1989) Hypervariability of simple sequences as a general source for polymorphic DNA markers. Nucleic Acids Res 17:6463–6471
Tello-Ruiz MK, Jaiswal P, Ware D (2022) Gramene: a resource for comparative analysis of plants genomes and pathways. Methods Mol Biol 2443:101–131
Temnykh S, DeClerck G, Lukashova A, Lipovich L, Cartinhour S et al (2001) Computational and experimental analysis of microsatellites in rice (Oryza sativa L.): frequency, length variation, transposon associations, and genetic marker potential. Genome Res 11:1441–1452
Till BJ, Cooper J, Tai TH, Colowit P, Greene EA et al (2007) Discovery of chemically induced mutations in rice by TILLING. BMC Plant Biol 7:19
Venglat P, Xiang D, Qiu S, Stone SL, Tibiche C et al (2011) Gene expression analysis of flax seed development. BMC Plant Biol 11:74
Vos P, Hogers R, Bleeker M, Reijans M, van de Lee T et al (1995) AFLP: a new technique for DNA fingerprinting. Nucleic Acids Res 23:4407–4414
Vurture GW, Sedlazeck FJ, Nattestad M, Underwood CJ, Fang H et al (2017) GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33:2202–2204
Wang DG, Fan JB, Siao CJ, Berno A, Young P et al (1998) Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science 280:1077–1082
Wang H, Smith KP, Combs E, Blake T, Horsley RD et al (2012a) Effect of population size and unbalanced data sets on QTL detection using genome-wide association mapping in barley breeding germplasm. Theor Appl Genet 124:111–124
Wang Z, Hobson N, Galindo L, Zhu S, Shi D et al (2012b) The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads.
Plant J 72:461–473 Wang J, Chen K, Ren Q, Zhang Y, Liu J et al (2021) Systematic comparison of the performances of de novo genome assemblers for Oxford Nanopore technology reads from piroplasm. Front Cell Infect Microbiol 11:696669 Wilkinson PA, Winfield MO, Barker GL, Allen AM, Burridge A et al (2012) CerealsDB 2.0: an integrated resource for plant breeders and scientists. BMC Bioinform 13:219 Wilkinson PA, Allen AM, Tyrrell S, Wingen LU, Bian X et al (2020) CerealsDB-new tools for the analysis of the wheat genome: update 2020. Database (Oxford) 2020 Williams JG, Kubelik AR, Livak KJ, Rafalski JA, Tingey SV (1990) DNA polymorphisms amplified by arbitrary primers are useful as genetic markers. Nucleic Acids Res 18:6531–6535

Winfield M, Wilkinson P, Burridge A, Allen A, Coghill J et al (2022) CerealsDB: a whistle-stop tour of an open access SNP resource. Methods Mol Biol 2443:133–146

Wu JL, Wu C, Lei C, Baraoidan M, Bordeos A et al (2005) Chemical- and irradiation-induced mutants of indica rice IR64 for forward and reverse genetics. Plant Mol Biol 59:85–97

Wu J, Zhao Q, Wu G, Zhang S, Jiang T (2016) Development of novel SSR markers for flax (Linum usitatissimum L.) using reduced-representation genome sequencing. Front Plant Sci 7:2018

Wu J, Zhao Q, Zhang L, Li S, Ma Y et al (2018) QTL mapping of fiber-related traits based on a high-density genetic map in flax (Linum usitatissimum L.). Front Plant Sci 9:885

Wu J, Zhao Q, Wu G, Yuan H, Ma Y et al (2019) Comprehensive analysis of differentially expressed unigenes under NaCl stress in flax (Linum usitatissimum L.) using RNA-Seq. Int J Mol Sci 20:369

Xie D, Dai Z, Yang Z, Sun J, Zhao D et al (2018) Genome-wide association study identifying candidate genes influencing important agronomic traits of flax (Linum usitatissimum L.) using SLAF-seq. Front Plant Sci 8:2232

Xin Z, Wang ML, Barkley NA, Burow G, Franks C et al (2008) Applying genotyping (TILLING) and phenotyping analyses to elucidate gene function in a chemically induced sorghum mutant population. BMC Plant Biol 8:103

Yang LA, Chang YJ, Chen SH, Lin CY, Ho JM (2019) SQUAT: a sequencing quality assessment tool for data quality assessments of genome assemblies. BMC Genomics 19:238

Yi L, Gao F, Siqin B, Zhou Y, Li Q et al (2017) Construction of an SNP-based high-density linkage map for flax (Linum usitatissimum L.) using specific length amplified fragment sequencing (SLAF-seq) technology. PLoS ONE 12:e0189785

You FM, Cloutier S (2020) Mapping quantitative trait loci onto chromosome-scale pseudomolecules in flax. Methods Protoc 3:28

You FM, Duguid SD, Lam I, Cloutier S, Rashid KY et al (2016) Pedigrees and genetic base of the flax varieties registered in Canada. Can J Plant Sci 96:837–852

You FM, Jia G, Xiao J, Duguid SD, Rashid KY et al (2017) Genetic variability of 27 traits in a core collection of flax (Linum usitatissimum L.). Front Plant Sci 8:1636

You FM, Rashid KY, Yao Z, Cloutier S, Chen W et al (2018a) High-quality genome sequences of cultivated (Linum usitatissimum) and wild (L. bienne) flax. In: Plant and animal genome conference XXVI, San Diego, USA, p P0911

You FM, Xiao J, Li P, Yao Z, Gao J et al (2018b) Chromosome-scale pseudomolecules refined by optical, physical, and genetic maps in flax. Plant J 95:371–384

You FM, Xiao J, Li P, Yao Z, Jia G et al (2018c) Genome-wide association study and selection signatures detect genomic regions associated with seed yield and oil quality in flax. Int J Mol Sci 19:2303

You FM, Rashid KY, Zheng C, Khan N, Li P et al (2022) Insights into the genetic architecture and genomic prediction of powdery mildew resistance in flax (Linum usitatissimum L.). Int J Mol Sci 23:4960

Yuan Y, Bayer PE, Scheben A, Chan CK, Edwards D (2017) BioNanoAnalyst: a visualisation tool to assess genome assembly quality using BioNano data. BMC Bioinformatics 18:323

Zhang L, Wang Y, Sun M, Wang J, Kawabata S et al (2014) BrMYB4, a suppressor of genes for phenylpropanoid and anthocyanin biosynthesis, is downregulated by UV-B but not by pigment-inducing sunlight in turnip cv. Tsuda. Plant Cell Physiol 55:2092–2101

Zhang J, Li C, Zhou Q, Zhang G (2015) Improving the ostrich genome assembly using optical mapping data. Gigascience 4:24

Zhang J, Long Y, Wang L, Dang Z, Zhang T et al (2018) Consensus genetic linkage map construction and QTL mapping for plant height-related traits in linseed flax (Linum usitatissimum L.). BMC Plant Biol 18:160

Zhang J, Qi Y, Wang L, Wang L, Yan X et al (2020) Genomic comparison and population diversity analysis provide insights into the domestication and improvement of flax. iScience 23:100967

Zietkiewicz E, Rafalski A, Labuda D (1994) Genome fingerprinting by simple sequence repeat (SSR)-anchored polymerase chain reaction amplification. Genomics 20:176–183

Zimin AV, Marcais G, Puiu D, Roberts M, Salzberg SL et al (2013) The MaSuRCA genome assembler. Bioinformatics 29:2669–2677