Handbook of Chemical Biology of Nucleic Acids 9789811997754

This handbook is the first to comprehensively cover nucleic acids from fundamentals to recent advances and applications.

English Pages 2846 [2847] Year 2023

Table of contents:
Cover
Half Title
Handbook of Chemical Biology of Nucleic Acids
Copyright
Preface
Contents
About the Editor
Section Editors
Contributors
Part I. Physical Chemistry of Nucleic Acids
1. High-Pressure Single-Molecule Studies on Non-canonical Nucleic Acids and Their Interactions
Introduction
Why High-Pressure Studies on Biomolecular Systems?
Materials and Methods
Förster Resonance Energy Transfer (Lakowicz 2006)
Confocal Microscopy Setup (Patra et al. 2018, 2019)
Time-Correlated Single Photon Counting (Wahl et al. 2013)
Pulsed Interleaved Excitation (PIE) (Rüttinger et al. 2006)
Data Analysis
Pressure Setup (Patra et al. 2018)
Measuring Kinetics with Immobilized Probes
Results and Discussion
Pressure Effects on Nucleic Acid Structures
Effect of High Hydrostatic Pressure on the Conformational Dynamics of DNA Hairpins
Effect of High Pressure on the Conformational Dynamics of G-Quadruplexes
Effect of High Pressure on the Conformational Dynamics of I-Motifs
Pressure Effects on the Interaction of Proteins with Non-canonical DNA
Conclusion
References
2. Stability Prediction of Canonical and Noncanonical Structures of Nucleic Acids
Introduction
Basics of Stability Prediction of Canonical Structures of Nucleic Acids
Structure and Thermodynamics of the Canonical Structure of Nucleic Acids
Melting Behavior of Nucleic Acid Structures
Measurement of Thermodynamic Stability and Calculation of Thermodynamic Parameters
Nearest-Neighbor (NN) Parameters: Prediction of Thermodynamic Stability and Its Calculation Method
Application of Prediction to Nonmatched Base Pair and Secondary Structure Based on NN Rules
Stability Prediction of Noncanonical Structures
Applicability of Stability Prediction to Noncanonical Structures
Hairpin Loop
Triplex
G-quadruplex
i-motif
Expansion and Application of Stability Prediction
Issues in Application of Stability Prediction Under Cellular Conditions
Stability of DNA Duplex Structure in Different Cation Concentrations
Stability of DNA Duplex Structure in a Molecular Crowding Environment
Extension of Stability Prediction to the DNA Duplex Structure in Various Solution Environments
Prediction of the Stability of the DNA Duplex Under Intracellular Conditions by Measuring the Intracellular Environment
Conclusion
References
3. The Effect of Pressure on the Conformational Stability of DNA
Introduction
Structural Considerations
Thermodynamic Considerations
Pressure Effects on Canonical Duplex DNA
Changes in Volume, ΔV
Changes in Expansibility, ΔE, and Compressibility, ΔKS
Pressure-Temperature Stability Phase Diagram of Duplex DNA
Effects of Cations, Cosolvents, and Sequence and Length of Oligomeric DNA on Transition Volume, ΔV
Pressure Effects on Noncanonical DNA Structures
Hairpins
Z-DNA
Three-Stranded DNA
G-Quadruplexes
i-Motif Structures
Pressure and the Kinetics of Helix Formation
Conclusions
References
4. Quadruplexes Are Everywhere... On the Other Strand Too: The i-Motif
i-Motif Forming Strands and Characters
i-Motif Characterization Methods
Factors Affecting i-Motif Stability
i-Motif Applications
Ligands/Compounds
Physiological Roles
Conclusion
References
5. i-Motif Nucleic Acids
Introduction
i-Motif Structure
Hemi-Protonated Cytosines
Intramolecular i-Motif Formation
Topologies
Grooves
Loops
Stabilizing Cations
i-Motifs and pH
Nanotechnology
i-Motifs at Neutral pH
i-Motifs in Biology
i-Motifs in the Telomeres
i-Motifs in Gene Promoter Regions
i-Motifs in DNA Replication
i-Motifs in Cells
i-Motifs and G-Quadruplexes
i-Motif Ligands and Probes
TMPyP4 and Macrocycles
Carboxyl-Modified Nanotubes and Quantum Dots
Small Molecules
Fluorescent Probes
Synergistic Ligands for i-Motifs and G-Quadruplexes
Conclusion
References
Part II. Structural Chemistry of Nucleic Acids
6. NMR Study on Nucleic Acids
Introduction
Elements of the Structural Buildup of Nucleic Acids and Their Conformational Landscape
Assessment of the Folding Topology by NMR
Assessment of Multimeric State by Translational Diffusion Coefficients
Site-Specific Low-Isotopic Enrichment
Nucleobase Substitutions with Nucleobase Analogs
Natural Abundance Heteronuclear Experiments
Resonance Assignment Through Sequential and Interstrand Interactions
Determination of 3D Structure
Labeling with Stable Heteronuclear 15N and 13C Isotopes
NMR Structural Studies in Combination with Complementary Methods
Challenges in Structural Studies of Biologically Relevant DNA and RNA
Dynamic Processes in RNA and Corresponding NMR Methods
Conclusion
References
7. Z-DNA
Introduction
Chemical and Structural Properties of Z-DNA
Left-Handed Z-DNA
Crystal Structures of Z-DNA in Complex with Chemical Inducers
Crystal Structures of Z-DNA in Complex with Z-DNA Binding Proteins
Crystal Structures of BZ Junctions
NMR Studies of Z-DNA Transition Induced by ZBP
NMR Monitoring on Z-DNA Formation of d(CGCGCG)2 by ZBPs
NMR Monitoring on Intermolecular Interaction of ZBPs with Z-DNA
B-to-Z Transition Mechanism of DNA Induced by ZBPs
NMR Dynamics Study on B-to-Z Transition of DNA Induced by ZBPs
A-to-Z Transition Mechanism of RNA Induced by ZBPs
BZ Junction Formation of DNA Induced by ZBPs
Chemical Biology Strategies Used to Elucidate the Biological Significance of Z-DNA and Z-DNA Binding Proteins
Strategies Used to Determine the Structure and Stability of Z-DNA
Strategies for Developing a Z-DNA Sensor
Strategies Applied to Ascertain the Z-DNA Function
Strategies for Developing Therapeutics Targeting Z-DNA
Strategies Applied for Nanotechnology Applications Using Z-DNA
Disease Implications
Z-DNA Is Immunogenic
Z-DNA Forming Sequence (ZFS) Controls the Expression of the Disease-Related Genes
Z-DNA Forming Sequence (ZFS) Is a Hotspot for the Large-Scale Deletion of DNA
Disease Implications of Z-DNA Binding Proteins
Conclusion and Perspective
References
8. Structures of G-Quadruplexes and Their Drug Interactions
Introduction
DNA G-Quadruplexes
Structural Characteristics of DNA G-Quadruplexes
Intramolecular DNA G-Quadruplexes
Human Telomeric DNA G-Quadruplexes
Human Telomeric G-Quadruplex Structures
Human Promoter DNA G-Quadruplexes
Parallel DNA G-Quadruplexes in Gene Promoters
Broken-Strand DNA G-Quadruplexes in Gene Promoters
Promoter DNA G-Quadruplexes with Long Loops and Hairpin Motifs
Left-Handed DNA G-Quadruplexes
Four-Tetrad DNA G-Quadruplexes
Structural Basis of Small Molecule Interactions of DNA G-Quadruplexes
G-Quadruplex Interactions with End-Stacking Compounds
Small Molecule Recognition of G-Quadruplexes with Additional Loop and Capping Interactions
Small Molecule Recognition of Parallel G-Quadruplexes
Small Molecule Interactions with Vacancy G-Quadruplex Bound by Metabolites
Small Molecule Interactions with G-Quadruplex-Duplex Junction
G-Quadruplex Intercalation with Small Molecule
Electrostatic Interactions of G-Quadruplex-Interactive Small Molecules
Conclusion
References
9. In Cell 19F NMR for G-Quadruplex
Introduction
Results and Discussion
In Cell 19F NMR for DNA G-quadruplex
In Cell 19F NMR for RNA G-quadruplex
In-Cell 19F NMR for Hybrid DNA/RNA G-quadruplex
Conclusion
References
10. Structures and Catalytic Activities of Complexes Between Heme and DNA
Introduction
G-quadruplex DNA and RNA
Molecular Recognition Between Heme and G-Quadruplex DNA
Spectroscopic Properties of a Heme(Fe3+)-DNA Complex
NMR Characterization of Heme-DNA Complexes
CO Adducts of Heme(Fe2+)-DNA Complexes
Resonance Raman Studies of CO Adducts of Heme(Fe2+)-DNA Complexes
pH-Dependence of Heme(Fe3+)-DNA Complexes
Imidazole Adducts of Heme(Fe3+)-DNA Complexes
Peroxidase Activity of Heme(Fe3+)-DNA Complexes
Peroxidation Cycle of Heme(Fe3+)-DNA Complexes
Conclusion
References
11. Studying Nucleic Acid-Ligand Binding by X-Ray Crystallography
Introduction
The Crystallization of DNA-Ligand Complexes
The X-Ray Diffraction Experiment
Phasing of Data
Model Building and Refinement
Crystallographic Software
Some Key Features of Ligand-DNA Crystal Structures
DNA Duplexes, Junctions, and Mismatches
Classical Intercalation
Bisintercalation
Intercalative Binding to DNA Junctions
Intercalation by Metal Complexes
G-Quadruplexes
Conclusion
References
12. Predicting the 3D Structure of RNA from Sequence
Introduction
RNA 3D Structure
Experimental Determination of RNA 3D Structure
Predicting RNA 3D Structure
Protein 3D Structure Prediction
Challenges in RNA 3D Structure Prediction
Overview of the Chapter
RNA Basepairs, Loops, and Secondary Structure
Watson-Crick Basepairs and Secondary Structure
RNA Secondary Structure Prediction
Non-Watson-Crick Basepairs and Base-Backbone Interactions
RNA Hairpin, Internal, and Junction Loops
Long-Range Interactions in RNA
RNA Loop Motif Libraries and Prediction Methods
Fragment Assembly and Simulation-Based Methods for Predicting RNA 3D Structure
Fragment Assembly Methods
Computational Simulation Methods
Scoring RNA 3D Structure Predictions Using Machine Learning
RNA-Puzzles Evaluation of Blind RNA 3D Structure Prediction
Protein 3D Structure Prediction and CASP
AlphaFold in CASP13 and CASP14
Comparison of Protein and RNA 3D Structure Prediction
Machine Learning Methods for RNA 3D Structure Prediction
Prediction of RNA Contacts Using Correlation Analysis
RNA Contact Prediction Techniques Based on Machine Learning
RNA Distance Prediction and 3D Structure Prediction Based on Machine Learning
Learning a Foundation Model from RNA Sequence Databases
Conclusions
References
Part III. Organic Chemistry of Nucleic Acids
13. Hexitol Nucleic Acid (HNA): From Chemical Design to Functional Genetic Polymer
Introduction
Chemical and Enzymatic Synthesis of HNA and Related Nucleic Acids
Biophysical and Structural Properties of HNA and Related Six-Membered Nucleic Acids
In Vitro and In Vivo Synthetic Biology Applications of HNA and Akin Oligonucleotides
Biomedical Applications of HNA and Similarly Modified Oligonucleotides
Conclusion
References
14. The Effects of FANA Modifications on Non-canonical Nucleic Acid Structures
Introduction
FANA: A Historical Perspective
Synthesis of FANA: Beyond AraF-N Nucleosides
FANA/RNA Duplexes: Investigating Their Superior Stability
FANA in Triple Helical Structures
FANA in G-Quadruplexes
Structural Effects of Modifying G-Quadruplexes with AraF-G
Effects of Substituting Anti-dG and Syn-dG Residues with AraF-G
Comparing the Potency of AraF-G and RiboF-G in Stabilizing One of Two Forms of a (3 + 1) Hybrid G-Quadruplex
Comparing the Effect of Systematic Single AraF-G, RiboF-G, or LNA-G Substitutions on G-Quadruplex Topology
Structural Characterization of G-Quadruplex Stabilization by AraF-G
Studying the Compatibility of Thrombin-Binding Aptamer with AraF-N Modifications
Improving the Nuclease Resistance of Thrombin-Binding Aptamer and its Binding Affinity to Thrombin
Using Microarray Technology to Find Optimal Positions for AraF-N Stabilization of Thrombin-Binding Aptamer (TBA)
Understanding the Dramatic Effect of a Single AraF-T Substitution at Position T3 of TBA1 on the Enhancement of Binding Affinit...
AraF-G-Modified Parallel G-Quadruplexes in Telomere Biology
AraF-G-Modified Parallel G-Quadruplexes Are a Substrate of Telomerase
Probing the Mechanism of Telomerase Extension of Parallel Telomeric G-Quadruplexes
FANA in i-Motif Structures
Effects of Modifying i-Motifs with AraF-C
AraF-C Stabilizes i-Motifs at Neutral pH
Structural Insights on i-Motif Stabilization by AraF-C
Comparing the Effects of AraF-C and 5-Methyl-araF-C on i-Motif Stability
Simultaneously Stabilizing Complementary i-Motifs and G-quadruplexes Using AraF-C and AraF-G, Respectively
Conclusion and Outlook
References
15. Isomorphic Fluorescent Nucleoside Analogs
Introduction
Enzymatic Reactions of Nucleosides and Nucleobases
Enzymatic Reactions of Nucleobase-Based Cofactors
RNA and DNA Oligonucleotide Constructs
RNA Folding and Ribozyme Activity
RNA/DNA-Protein Interactions
DNA Constructs and Conformations
Conclusions
References
16. Bridged Nucleic Acids for Therapeutic Oligonucleotides
Introduction
Concept and Chemistry of Bridged Nucleic Acids
Synthesis and Biophysical Properties of Bridged Nucleic Acids
Five-Membered Bridged Nucleic Acids
Development of Parent 2′-O,4′-C-Methylene-Bridged Nucleic Acid
Phosphate Linkage Modifications in Bridged Nucleic Acids
Base Modifications in Bridged Nucleic Acids
Sugar Modification in Bridged Nucleic Acids
SeLNA and SeOLNA
6′-Mercapto-thioBNA
Amide-Bridged Nucleic Acids (AmNAs)
Guanidine-Bridged Nucleic Acid (GuNA)
2′-O, 4′-C-Spirocyclopropylene-Bridged Nucleic Acid (scpBNA)
2′,4′-BNA/LNA-2-Thiothymine: scpBNA-S2T, scpBNA-Se2T, and ThioAmNA-S2T
Methyleneoxy-Bridged 2′-Deoxyribonucleic Acid (MoDNA)
Triazole- and Tetrazole-Bridged Nucleic Acids
2′,4′-BNA/LNA Derivative with Expanded Ring Size
Six-Membered Bridged Nucleic Acid
Ethylene-Bridged Nucleic Acid (ENA)
2′,4′-BNANC
Hydroxamate-Bridged Nucleic Acid (HxNA)
Six-Membered AmNA (6-AmNA)
Sulfonamide-Bridged Nucleic Acids (SuNAs)
2′-C,4′-C-Ethyleneoxy-Bridged 2′-Deoxyribonucleic Acids (Methylene-EoDNAs)
Seven-Membered Bridged Nucleic Acids
2′-O,4′-C-Methyleneoxymethylene BNA (2′,4′-BNACOC)
Benzylidene Acetal-Type Bridged Nucleic Acids (BA-BNAs)
Urea-Type BNA
2′-O, 4′-C-Ethyleneoxy Bridged Nucleic Acid (EoNA)
Conclusion
References
17. Mesyl Phosphoramidate Oligonucleotides: A New Promising Type of Antisense Agents
Introduction
Structure and Synthesis
Duplex Formation with Complementary DNA and RNA
Circular Dichroism (CD) Spectra
Thermal Stability
Formation of G-Quadruplexes
Enzymatic Stability
RNase H Recruitment
Cellular Uptake
RNase H-Dependent Antisense Application
Splice-Switching Application
Immunomodulatory Activity
Toxicity
Future Prospects
Conclusion
References
18. Chemistry of Cyclic Dinucleotides and Analogs
Introduction
Basic Strategies for Chemical Synthesis of CDNs and Their Analogs
Chemical Synthesis of CDNs Using a Phosphotriester-Phosphotriester Approach
Chemical Synthesis of CDNs Using a H-Phosphonate-H-Phosphonate Approach
Chemical Synthesis of CDNs Using a Phosphoramidite-H-Phosphonate Approach
Chemical Synthesis of CDNs Using a Phosphoramidite-Phosphotriester Approach
Chemical Synthesis of CDNs Using a Phosphoramidite-Phosphoramidite Approach
Synthesis of Synthetic CDN Analogs
CDN Analogs with Modified Phosphodiester Linkages
Phosphorothioates
Others
Sugar-Modified CDN Analogs
Nucleobase-Modified CDN Analogs
Conclusion
References
19. Labeling and Detection of Modified Nucleic Acids
Introduction
Labeling and Detection of DNA Modifications
5mC: The Predominant DNA Modification
5hmC: The First Step Intermediate in the Active Demethylation Pathway
5fC: The Second Step Intermediate in the Active Demethylation Pathway
5caC: The Final Oxidized Derivative of 5mC
N6-Methyldeoxyadenosine (6mA): Predominantly Present in Prokaryotes and a Limited Number of Eukaryotes
5-Formyluracil (5fU)
Deoxyuridine (dU)
Labeling and Detection of RNA Modifications
N6-Methyladenosine (m6A): The Most Abundant Internal mRNA Modification
N1-Methyladenosine (m1A)
5-Methylcytosine (m5C)
N4-Acetylcytidine (ac4C)
Inosine (I): From A-to-I RNA Editing
Pseudouridine (Ψ): The Rotation Isomerization of Uridine
N7-Methylguanosine (m7G): A Well-Known mRNA Cap Modification
Conclusion and Outlook
References
20. Cross-Linking Duplex of Nucleic Acids with Modified Oligonucleotides
Introduction
Cross-Linked Double-Stranded DNA with ONs Containing Nonnatural Nucleic Acids in Both Strands
Cross-Linked Duplex for Biological Tools
Cross-Linked Duplex by Photoirradiation with ODNs Containing Nonnatural Nucleic Acids in Both Strands
Cross-Linked Duplex by Click Chemistry
ONs with Cross-Linking Reactivity Targeting Natural-Type DNA and RNA
CFOs Activated by Photoirradiation
Cross-Linking Via [2 + 2] Reaction by Photoirradiation
Photo-Cross-Linking Using Thionucleobases
Other Photo-Cross-Linking
CFOs Activated by Chemical Reactions
Cross-Linking Reactions Without External Stimuli
Vinyl Purine Derivatives as Cross-Linking Agents
Vinyl Pyrimidine Derivatives as Cross-Linking Agents
Conclusion
References
21. Enzymatic Synthesis of Base-Modified Nucleic Acids
Introduction
General Concepts on Polymerase-Mediated Synthesis of Modified Nucleic Acids
Natural Modifications of Nucleobases: Synthesis and Polymerase-Mediated Incorporation
Epigenetic Base-Modification Patterns on DNA
Epigenetic Base-Modifications on RNA
Pseudouridine (Ψ)
Methylated Adenosine Analogs m1A (8) and m6A (7)
5-methylcytosine m5C (9)
N4-acetylcytidine ac4C (10)
7-Methylguanosine m7G (11)
Base Modifications for Aptamer and Catalytic Nucleic Acid Generation Via SELEX
Aptamer Selection with Base-Modified Nucleotides
Expansion of the Genetic Alphabet and Aptamers
Selection of Base-Modified Catalytic Nucleic Acids
Overview of Other Applications
Generation of Chemically Modified mRNA Vaccines
Controlled Enzymatic Synthesis
Conclusion
References
Further Readings
22. Charge Transfer in Natural and Artificial Nucleic Acids
Introduction
What We Have Learned from Charge Transfer Studies Through DNA
Charge Transfer in Natural and Artificial Nucleic Acids
Charge Transfer in RNA
Charge Transfer in Peptide Nucleic Acids (PNA)
Charge Transfer in Locked Nucleic Acids (LNA)
Charge Transfer Inducing Moieties: Are the Donor and Acceptor Systems the Clandestine Key Players?
Conclusion
References
23. Nucleic Acid Aptamers: From Basic Research to Clinical Applications
Introduction
Aptamer Discovery Technologies
Standard SELEX
Magnetic Bead SELEX
Cell-SELEX
In Vivo SELEX
Challenges of the Clinical Application of Aptamers
Strategies to Overcome the Challenges Associated with Clinical Applications of Aptamers
Improvement of Aptamers' Nuclease Susceptibility
Ribose Modifications and Alternative Sugar Entities
Substitution of Phosphodiester Linkage
Spiegelmers
Reduction of Renal Clearance of Aptamers
Nucleobase Modifications
Aptamers Bearing an Expanded Genetic Alphabet
Aptamers as Drug Delivery Vehicles
Conclusion
References
Part IV. Ligand Chemistry of Nucleic Acids
24. Targeting Quadruplex Nucleic Acids: The Bisquinolinium Saga
Introduction
Bisquinolinium Pyridodicarboxamide (PDC) and PhenDC3: Prototypic G4 Ligands
Genesis and Design of Bisquinolinium Ligands: The Preorganization Concept
In Vitro Binding: Affinity, Selectivity, and Ligand-Induced Conformation Changes
Binding to Alternative Quadruplexes (VK2)
Biological Effects of Bisquinolinium Ligands
Telomeric Effects
Genetic Instability and Inhibition of Helicases
Miscellaneous DNA- or RNA-Related Effects
Functionalized Bisquinolinium Ligands for Detection and Manipulation of G4 Structures
Biotinylated PDC and PhenDC3 Derivatives
Fluorescent Derivatives for In Vitro Detection and Cellular Imaging of G-Quadruplexes
G4 Cross-Linking and Alkylating Agents
Immunotagged G4 Ligands
Other Bisquinolinium Derivatives as G4 Ligands
Dimeric Derivatives
Variations of the PDC Core
Variations in Linker Groups and Quinolinium Residues
Conclusion
References
25. Compound Shape and Substituent Effects in DNA Minor Groove Interactions
Introduction
AT Sequence-Specific MG Compounds That Can Also Bind at GC Sequences by Intercalation
Diversity in the Recognition of AT MG Sequences
Heterocyclic Diamidines That Recognize Some AT Sequences as Dimers
Curvature Determination for MG Binders
Out-of-Shape DNA MG Binders: Inclusion of Interfacial Water for Induced Fit Interactions of Heterocyclic Dications with DNA
Development of Heterocyclic Amidine MG Binders with GC Recognition
Pyridine Compound Design
N-Alkyl-Benzimidazole-Thiophene Compound Design
Azabenzimidazole Compound Design
MG Binders with Additional GC BP-Binding Capability: Compounds with the Same GC Recognizing Modules
MG Binders with Additional GC BP-Binding Capability: Compounds with Different GC Recognizing Modules
MG Binders with Additional GC BP-Binding Capability: Compounds That Recognize the GGAA Sequence That Is Conserved in the PU.1 ...
Conclusion
References
26. Macrocyclic G-Quadruplex Ligands of Telomestatin Analogs
Introduction
G4 and G4 Ligands
TMPyP4: A Macrocyclic G4 Ligand
Telomestatin: A Natural Macrocyclic G4 Ligand
Macrocyclic Polyoxazoles
HXDVs as Macrocyclic Polyoxazoles (Rice Group)
OTDs as Macrocyclic Polyoxazoles (Nagasawa Group)
7OTDs as G4 Ligands: Chemical-Biology Studies
6OTDs as G4 Ligands
Control of G4 Topologies by 6OTDs
G4-Forming Sequence-Selective 6OTDs
Detection of G4 by Fluorescent 6OTDs In Vitro and In Vivo
Anticancer Activity of 6OTDs
6OTD Multimers as G4 Ligands
6OTD Dimer
6OTD Tetramer
6OTD Dendrimer
Control of G4-Protein Interaction by OTD
G4-3R02 Protein with G4
Rif1 Protein with G4
hnRNPA1 Protein with RNA G4
BLM Helicase with G4
S1 Nuclease with Telomeric G4
Micelle-Type Macrocyclic 4OTDs as G4/i-Motif Ligands
Conclusion
References
27. Cyclic Naphthalene Diimide Derivatives as Novel DNA Ligands
Introduction
Polymorphism of G-Quadruplex DNA
G4 Binders
Properties of NDI and Its Binding to Double-Stranded DNA
Binding of NDI to G4
Interaction of cNDI with G4
Interaction of cNDI with G4 Under Molecular Crowding Conditions
Conversion of G4 Structure by NDI
Telomerase Inhibitory Ability of cNDI and Inhibition of Cell Growth
Ferrocenyl cNDI
cNDI Dimer
Conclusions
References
28. Imaging Study of Small Molecules to G-Quadruplexes in Cells
Introduction
Development of G4 Fluorescent Probes to Study G4s In Vivo
BMVC
o-BMVC
BMVC-nC-P and BMVC-8C3O-P
o-BMVC-nC-P
o-2B-P
Fluorescence Images for Identifying the Existence of Endogenous G4s In Vivo
Visualization of Telomeric G4s in Metaphase Chromosomes by BMVC
Detection of G4s in Live Cancer Cells by o-BMVC
Detection of G4 Foci by BG4 Antibody in Fixed Cells
o-BMVC Foci are G4 Foci in Fixed Cells
Telomeric G4s Detected in Fixed Cells by Antisense DNA
Detection of Mitochondrial G4s in Live Cancer Cells by o-BMVC-12C-P
Binding of Small Molecules to G4s in Fixed Cells
Imaging Study of G4 Ligands Binding to Exogenous G4s in Live Cells
Cellular Response to Exogenous G4s in Live Cells
G4 Dynamics of Exogenous G-Rich Oligonucleotides in Live Cells
o-BMVC Foci as a Biosensor for Clinical Cancer Diagnosis
DNA Damage May Facilitate G4 Formation
o-BMVC Test for Clinical Cancer Diagnosis
Other Fluorescent Probes for the Imaging Study of G4s in Cells
Carbazole Derivatives and BMVC Analogues
NBTE for FLIM Image
DAOTA-M2 for FLIM Image
Conjugates of G4 Fluorescent Probes
Conclusion
References
29. DNA/Metal Cluster-Based Nano-lantern
Introduction
DNA Composition, Structure, and Properties
Double-Helix DNA
Multistranded DNA Structures
Interaction Between DNA and Small Drug Molecules
Covalent Binding (Taatjes et al. 1999)
Noncovalent Binding (Rehman et al. 2015)
Intercalation (Brana et al. 2001)
Groove Binding (Baraldi et al. 2004)
Electrostatic Interaction (Rehman et al. 2015)
Other Interaction Modes
DNA Metallization
Introduction of DNA Metallization
Principle of DNA Metallization
DNA Metallization Methods
Chemical Reduction
Photoreduction
Electrochemical Deposition
DNA Metallization with a Localized Reducing Group
Construction of DNA Nano-Lantern
Applications of DNA Nanostructures in Therapeutics
The Design for DNA Nano-Lantern
The Main Achievement of the Nano-Lantern
Conclusion
References
30. Interaction of Poly(Ethylene Glycol)-b-Poly-L-Lysine Copolymers with DNA Structures: A Thermodynamic Investigation
Introduction
Materials and Methods
Result and Discussion
Conclusion
References
31. Chemical Tools to Target Noncoding RNAs
Introduction
RNA As a Therapeutic Target
Targeting Bacterial RNAs
Targeting Viral RNAs
Targeting Eukaryotic RNAs
RNA Nucleotides Repeats
MicroRNAs
Targeting of Long Noncoding RNAs
Current Trends for the Development of Innovative Chemical Tools for RNA Targeting
RIBOTAC Strategy
Targeting Pre-mRNA Splicing
Conclusion
References
32. Targeting DNA Junctions with Small Molecules for Therapeutic Applications in Oncology
Introduction
Structural Studies
Biological and Pathological Functions
DNA Junction-Targeting Anticancer Agents
Targeting TWJs
Targeting FWJs
Conclusion
References
Part V. Nucleic Acids and Gene Expression
33. DNA Damage and Repair in G-Quadruplexes Impact Gene Expression
Introduction
Reactive Oxygen Species and Endogenous DNA Damage
Oxidation of Guanine in Duplex Versus Quadruplex DNA
Initiation of Base Excision Repair After Oxidative Stress
Cell-Based Assays of Gene Expression
AP Endonuclease-1 Binding to G-Quadruplexes
Conclusion and Outlook
References
34. DNA Structural Elements as Potential Targets for Regulation of Gene Expression
Introduction
Does Gene Expression Depend on DNA Structure?
Functions of Non-canonical Structures at Gene Promoters
Targeting G-quadruplexes for Medical Purposes
The Lesson from the Studied Ligands
Alternative G4 Arrangements as More Selective Targets
Physiological Relevance of Alternative G4 Repeats
Epigenetics
Conclusions and Perspectives
References
35. Effects of Molecular Crowding on Structures and Functions of Nucleic Acids
Introduction
Physicochemical and Molecular Factors Influencing Nucleic Acid Structures and Their Stabilities
Structural Factors
Hydrogen Bonding
Stacking Interaction
Conformational Entropy
Environmental Factors
Hydration and Dehydration
Cation Binding
Specific Interaction of Biomolecular Ligands
Model Experimental Systems In Vitro to Investigate the Effects of Molecular Crowding on Biomolecules
Characteristics of Co-solutes to Mimic the Intracellular Molecular Environment
Change in Solution Properties by the Addition of Co-solutes
Effects of Molecular Crowding Environments on Nucleic Acids
Effects of Molecular Crowding on Canonical Duplexes
Structure of Large Genomic DNAs Under a Crowding Environment
Stability of Polymer Nucleotide Duplexes Under a Crowding Environment
Stability of Short Oligonucleotide Duplexes in Crowding Environments with Reduced Water Activity
Effects of Molecular Crowding on Noncanonical DNA Structures and Stabilities
Formation of Left-Handed Duplex Under a Crowding Environment
Stabilization of Branched Junction Under a Crowding Environment
Stabilization of Multistranded Helix Under a Crowding Environment
Triplex Structures Under a Crowding Environment
G-Quadruplex Structures Under a Crowding Environment
i-Motif Structures Under a Crowding Environment
Effects of Molecular Crowding on RNA Structure and Functions
Tertiary Structure Folding Under the Molecular Crowding Environments
Activities of RNA Catalyst Under the Molecular Crowding Environments
Affinities of RNA Aptamers Under the Molecular Crowding Environments
Biological Reactions Influenced by Nucleic Acid Structures and Their Stabilities
Effects of Nucleic Acid Structures on DNA Replication
Effects of Nucleic Acid Structures on RNA Transcription
Effects of Nucleic Acid Structures on Protein Translation
Effects of Nucleic Acid Structures on Concurrent Reactions
Conclusion
References
36. Structure-Guided Optimization of siRNA and Anti-miRNA Properties
Introduction
RNAi and MicroRNA Pathways
Structures of Argonaute Protein Domains and Argonaute-RNA Complexes
siRNA Modifications Whose Design Was Inspired by Ago2-RNA Complexes
siRNA and Anti-miRNA Modifications from Computational Screening
Conclusions
References
37. Tools for Understanding the Chemical Biology of the tRNA Epitranscriptome
Introduction
Current Tools to Study the tRNA Epitranscriptome and tRNA Reprogramming
Current Tools to Study Codon-Biased Translation
Conclusion
References
Further Reading
38. Sulfur- and Selenium-Modified Bacterial tRNAs
Introduction
Sulfur- and Selenium-Containing Nucleosides in the Wobble Position of the Bacterial tRNAs
Sulfur-Containing Nucleosides in the tRNA Chain
S-geranyl- and Selenonucleosides
The Modification Pathways of Thio-, S-geranyl, and Selenonucleosides
Biosynthesis of 2-Thiouridines
Biosynthesis of S-Geranyl- and 2-Selenonucleosides
Escherichia coli tRNA 2-Selenouridine Synthase (SelU), the Enzyme Modifying the R5-Substituted 2-Thiouridines in the Anticodon...
Structure of SelU
SelU Is a tRNA-Bound Nucleoprotein
Substrate Specificity of SelU
Readout of 5′-NNA-3′ and 5′-NNG-3′ Synonymous mRNA Codons by Sulfur- and Selenium-Modified tRNA Anticodons
Synonymous Codons Specific for Lys, Glu, and Gln
U-A and U-G Base Pairing Modes
Tautomeric Forms of Modified Uridines and Their Base Pairs with Guanosine
Ionizable Tautomeric Forms of 2-Thio- and 2-Selenouridines
Theoretical Modeling of U-G Base Pairs with mnm5S2Ura and mnm5Se2Ura
Crystal Structures of U*-G Base Pairs in tRNA-mRNA at the Ribosome Context
Conclusions
References
39. Chemical-Assisted Epigenome Sequencing
Background
Genomic Mapping of 5mC
Bisulfite Sequencing
The Chemistry of Bisulfite Treatment
The Mapping of Bisulfite Sequencing
Limitations of Bisulfite Sequencing
Degradation of DNA during Bisulfite Treatment
Undistinguishable between 5mC and 5hmC
Improvements on Bisulfite Sequencing
T-WGBS
RRBS
PBAT
Bisulfite-Free Methylome Sequencing
TET-Assisted Pic-Borane Sequencing (TAPS)
Enzymatic Methyl Sequencing (EM-Seq)
Genomic Mapping of Oxidized 5mCs
Genomic Mapping of 5hmC
Affinity-Based Methods
TET-Assisted Bisulfite Sequencing (TAB-Seq)
Oxidative Bisulfite Sequencing (OxBS-Seq)
Bisulfite-Free Hydroxymethylome Sequencing
Chemical-Assisted C-to-T Conversion of 5hmC Sequencing (hmC-CATCH)
Chemical-Assisted Pyridine Borane Sequencing (CAPS)
APOBEC-Seq
Aba-Seq
Genomic Mapping of 5fC and 5caC
fC-Seal and 5fC Chemical-Assisted Bisulfite Sequencing (fCAB-Seq)
Reduced Bisulfite Sequencing (redBS-Seq)
Methylase-Assisted Bisulfite Sequencing MAB-Seq and caMAB-Seq
Chemical-Enabled 5fC-to-T Sequencing (fC-CET)
Long-Read Sequencing for DNA Modifications
Single-Molecule Sequencing of 5-Hydroxymethylcytosine
Long-Read DNA Methylation and Hydroxymethylation Sequencing with TAPS
Single-Cell Profiling Methods for DNA Modifications
Single-Cell Methylome Sequencing
Single-Cell Profiling Methods for Other Cytosine Modifications
Genome Amplification in Single-Cell Profiling Methods
Prospect of Single-Cell Bisulfite-Free Methods
Biological Applications of Single-Cell Methods
Early Mammalian Development
Clinical Purposes
Conclusions
References
40. Telomerase
Introduction
Telomerase Components
Discovery of Telomerase
Telomerase Reverse Transcriptase (TERT)
Telomerase RNA
Telomerase Processivity
Human Telomerase Structure
Telomerase and Telomere Dynamics in Aging and Disease
Aging and Cancer
Telomerase Activation in Cancer
Telomerase Inhibitors
Potential Caveats of Telomerase Inhibition
BIBR1532
NU-1
Imetelstat
Conclusions and Future Directions
References
41. Telomeres: Structure and Function
Introduction
Telomeres
Telomeric DNA and Telomere Secondary Structures
Telomere Binding Proteins
Role of Telomere Binding Proteins in End Protection
Role of Telomere Binding Proteins in Telomere Length Regulation
Telomere Dynamics in Aging and Disease
Aging and Cancer
Alternative Lengthening of Telomeres (ALT)
Telomere Biology Disorders
Conclusions and Future Directions
References
42. Genetic Alphabet Expansion of Nucleic Acids
Introduction
Development of Replicable and Transcribable UBPs
Preparation of DNAs with UBPs
DNA and RNA Sequencing Involving UBPs
UBP Application to PCR Technology
UBP Application to DNA Aptamer Generation
UBP Application to Transcription Systems
Conclusion
References
43. Unnatural Base Pairs to Expand the Genetic Alphabet and Code
Introduction
Rational Design of Unnatural Nucleotides
An Approach to Developing UBPs
Development of a Replicable UBP
UBP in E. coli: Creation of the First SSO
UBP Optimization Using the SSO
UBP Decoding in an SSO
Applications of the SSO
Conclusion
References
44. OGG1 at the Crossroads Between Repair and Transcriptional Regulation
Introduction
Genome-Wide Distribution of 8-OxoG Is not Random
Role of 8-OxoG and OGG1 in Transcriptional Regulation
Regulation of Transcription Mediated by Induction of 8-OxoG at G-Quadruplex Sequences
Regulation of Transcription Mediated by 8-OxoG Induced by the Enzymatic Activity of LSD1
Role of 8-OxoG and OGG1 in the Regulation of Transcription of Inflammatory Genes
Origin of ROS as a Signaling Molecule for the Induction of 8-oxoG
Finding the 8-OxoG in the Chromatin Context: A Challenging Task for OGG1
DNA Packaging into Nucleosomes: A Barrier for OGG1 and BER Activity
BER in the Highly Compacted Nuclear Environment
Interplay Between BER, NER, and Transcription
BER Is Linked to Transcription via the Mediator Complex and the Cohesin Structure
Structure and Function of the Mediator Complex
Nuclear Condensates Related to Transcription Initiation: Mediator and Super-Enhancers
Conclusion
References
45. Phosphorothioate Nucleic Acids: Artificial Modification Envisaged by Nature
Introduction
Polydiastereomerism of PS-Oligomers
Synthesis of P-Stereodefined Phosphorothioate Oligonucleotides
Stereodefined Phosphorothioate Nucleotides
Interactions of P-Stereodefined PS-Oligomers with DNA, RNA, and Protein Molecules
Formation of the Homoduplexes DNA/DNA and RNA/RNA and Heteroduplexes DNA/RNA
Formation of Higher-Order Structures
Stereodefined PS-Oligomers As Tools in Mechanistic Studies
Interactions with Proteins
Metabolism of PS-Oligomers
Biological Activity of Synthetic Nucleic Acids Containing Phosphorothioate Backbone
Physiological Phosphorothioate Modification of Nucleic Acids
References
Part VI. Analytical Methods and Applications of Nucleic Acids
46. Aptamer Molecular Evolution for Liquid Biopsy
Introduction
Molecular Evolution of Aptamers
Generation of Oligonucleotide Library with Increased Diversity
Library Containing Modified Oligonucleotides
Expanded Libraries with Artificial Oligonucleotides
Library of High-Order Structures
Selection of Aptamer Candidates from Library
Efficient Selection Platforms
Various Target Types
Identification and Characterization of Aptamer Candidates
Effective Identification Techniques
Effective Characterization Methods
Synthesis and Modification
Aptamer-Based Detection of CTCs
CTC Isolation and Enrichment
Aptamer-Based Magnetic Isolation
Aptamer-Based Microfluidic Isolation
Microfluidics-Assisted Magnetic Isolation
Multivalent Aptamer Capture Interface
Release of CTCs
Release by Disrupting the Conformations of Aptamers
Release by Digesting Aptamers with Nucleases
Release by Detaching Aptamers from Capture Substrates
Detection of CTCs
Aptamer-Based Detection of EVs
Isolation-Free Homogeneous Detection
Detection on Solid-Liquid Interface
Electrochemical Detection
Fluorescence Detection
Visual Detection
Conclusion
References
47. Single-Molecule DNA Visualization
Introduction
Optical Mapping Based on Single-Molecule DNA Visualization
DNA Visualization Using DNA-Binding Fluorescent Proteins
Damage Visualization on Single DNA Molecules
DNA Modification Labeling
DNA Recombination
In Vitro Observation of DNA Replication
Observation of DNA Replication in Cells
Conclusion
References
48. Tissue-Specific Drug Delivery Platforms Based on DNA Nanoparticles
Introduction
Four Classes of NANPs
Drug Loading in NANPs
In Vivo Stability of NANPs for Targeted Delivery
NANPs for Passive Tumor-Targeting
NANPs for Active Tumor-Targeting
Discovery of Tumor-Specific NANPs Using Library Approach
NANPs for Lung-Targeted Delivery
NANPs for Kidney-Targeted Delivery
NANPs for Liver-Targeted Delivery
NANPs for Brain-Targeted Delivery
NANPs for Spleen-Targeted Delivery
Challenges and Outlook
References
Further Readings
49. Nanobiodevice for Nucleic Acid Sensing
Nanobiodevice for the Preparation of Nucleic Acid
Extraction of Nucleic Acids
Chemical Lysis
Mechanical Lysis
Thermal Lysis
Electrical Lysis
Isolation of Nucleic Acids
Label-Based Isolation
Label-Free and Label-Based Isolation
Nanobiodevice for Detection of Nucleic Acid
Optical-Based Detection
Electrochemical-Based Detection
Electrical-Based Detection
Conclusion
References
50. Functional Nucleic Acid-Protein Complexes: Application to Fluorescent Ribonucleopeptide Sensors
Introduction
Nucleic Acid-Protein Interactions
DNA-Binding Proteins
Nucleic Acid-Protein Complexes
Sequence-Selective DNA Binding by DNA-Binding Peptide Dimers
Preparation of DNA Nanostructure-Protein Complexes Using DNA-Binding Proteins
RNA-Binding Proteins and Peptides
Preparation of Functional Molecules Using RNA-Peptide Complexes
RNP Receptors and Catalysts
Fluorescent RNP Sensors
Conclusion
References
51. Detection Systems Using the Ternary Complex Formation of Nucleic Acids
Introduction
Branched DNA Assays
Enzyme-Free bDNA-Based Signal Amplification ECL Assay
SMART
3WJ-EXPAR
3WJ-PGRCA
SATIC
3WJ DNAzyme-Based Probe Method
Y-Shaped DNA Dual-Probe Transistor Assay
Other Related Techniques
Conclusion
References
52. Nucleic Acid Conjugates for Biosensing: Design, Preparation, and Application
Introduction
Reactions Involved in DNA/RNA Conjugate Formation
Functional Group-Specific Coupling Reactions
Bioorthogonal Reactions
Azide-Alkyne Cycloaddition
Staudinger Ligation
Inverse Electron-Demand Diels-Alder Cycloaddition
Unique Binding and Functional Switching
Kinetic and Thermodynamic Stabilization by Conjugation with Click Chemistry
Switching
pH
Metal Ions
Light
Nucleic Acid Conjugates for Sensing and as a Research Tool
Complementary DNA Probes Modified with Reporter Molecules
Spectrophotometry
Electrochemistry
Nucleic Acid Aptamer Conjugates
Spectrophotometry
Electrochemistry
Lipid/Cholesterol-Modified DNAs
Liposome/Micelle Manipulation
Sensing on the Cell Surface
Liposome Fusion
Antibody/Enzyme-Modified DNAs
Preparation of Antibody-DNA Conjugates
Refined iPCR
Proximity Assays
Autonomous DNA Assembly
Caged Nucleic Acids
DNA/RNA Backbone
Nucleobase
Ribose
Conclusions and Perspectives
References
53. Molecular Beacons With and Without Quenchers
Introduction
Molecular Beacons
Structure of Molecular Beacons
Mechanism and Principles of Molecular Beacons
Advantages and Limitations of Molecular Beacons
Modifications of Molecular Beacons
Applications of Molecular Beacons
Quencher-Free Molecular Beacons
Mono-labeled Quencher-Free Molecular Beacons
Quencher-Free Molecular Beacons with Fluorophore at the Loop or Middle of the Oligonucleotide
Quencher-Free Molecular Beacons with Fluorophore at the Stem or Strand-end
Dual-Labeled Quencher-Free Molecular Beacons
Applications of Quencher-Free Molecular Beacons
Conclusion
References
Part VII. Nanotechnology and Nanomaterial Biology of Nucleic Acids
54. Gene Nanovector for Genome Therapy
Introduction
Multiplex Gene Regulation at Different Levels
The Gene Regulation Toolbox for Genome Therapy
Gene Rescue
Gene Silencing
Gene Editing
Gene Activation
RNA Editing
Gene Read-Through
Exon Skipping
Aptamer and Riboswitch
Catalytic Nucleic Acids
The Gene Vectors for Delivery
The Delivery Approaches
The Form of Gene Vectors
Plasmid DNA
Linear DNA
Viral Vector
Nano DNA
The Presumptive Model of Archimedes Solid-like Nanostructures Assembled from Branch-PCR
The Design of ASN-TO Gene Nanovector
Size-Tunability of ASN-TO Gene Nanovector
The Applications of ASN-TO Gene Nanovector in Genome Therapy
The Gene Overexpression of ASN-TO Gene Nanovector
The Gene Silencing of ASN-TO Gene Nanovector
The Genome Editing of ASN-TO Gene Nanovector
Multiplex Gene Regulation of ASN-TO Gene Nanovector for Cancer Therapy
Prospects of ASN-TO Gene Nanovector
Systemic and Targeted Delivery
Stimulus-Responsive DNA Release
Network Target-Based Genome Therapy with Co-Branch PCR Perspective
Conclusion
References
55. The Frame-Guided Assembly of Nucleic Acids
Introduction
The Demonstration and Development of Frame-Guided Assembly
The Demonstration of Frame-Guided Assembly
The Molecular Generality of Frame-Guided Assembly
The Application of Frame-Guided Assembly in Drug Delivery
The Mechanism of Frame-Guided Assembly
Development of Structural DNA Nanotechnology
Structural DNA Nanotechnology Based on Tile-Tile Interactions
DNA Origami
Classic DNA Origami
Single-Stranded RNA and DNA Origami
Wireframe Structures
Single-Stranded Tiles (SST)
FGA Based on DNA Nanotechnology
Frame-Guided Assembly with Inner DNA Frame
Frame-Guided Assembly with Outer DNA Frame
Frame-Guided Assembly with Planar DNA Frame
Polymer Membrane
Lipid Membrane
Conclusion
References
56. Graphene Oxide and Nucleic Acids
Introduction
Nucleic Acid Interaction with GO
Fluorescence Quenching by GO
Mechanism of DNA Adsorption and Desorption on GO
GO in Nucleic Acid Amplification
Functionalization of GO
GO-Mediated Facilitation of Nucleic Acid Amplification
Applications in FRET-Based Nucleic Acid Biosensors
DNA and RNA Detection
Aptamer-Based Detection
Applications in Biomedical Therapeutics
Biopolymer-Conjugated GO for Gene and Drug Delivery
Biocompatible GO Derivatives as Photothermal Therapeutic Reagents
Conclusion
References
57. Carbon Nanotubes and Nucleic Acids
Introduction
Carbon Nanotubes: “Helical Microtubules of Graphitic Carbon”
(n,m) Notation
Other Morphologies Making Use of CNTs
Synthesis Methods and Their Evolution
Arc-Discharge Method
Laser Ablation
Chemical Vapor Deposition (CVD)
Control Parameters of CVD
Variants in CVD Technique
Properties of CNTs
Mechanical Properties
Electrical Properties
Other Properties
Applications in Nucleic Acid Research
Gene Delivery (pDNA, siRNA, and miRNA)
Biosensors
Electrochemical Biosensors
Semiconductor-Based Biosensors
Optical Biosensors
Conclusion and Future Perspectives
References
58. Artificial Genetic Switches and DNA Origami: Current Landscape and Prospects as Designer Therapeutics and Visualization Too...
Introduction
Artificial Genetic Switches for Transcription Therapy
Targeting of the Promoter Region to Control TF-Regulated Gene Expression
Targeting the Coding Region to Control Mutant Gene Expression
Biomimetic Epigenetic Control to Switch ON the Gene Regulatory Network
PIPs as DNA-Based Visualization Tools
DNA Origami to Visualize Single-Molecule Interactions and Epigenetic Events
Direct Observation of Macromolecular Events Using DNA Origami
Force Spectroscopy-Based Biophysical Studies Using DNA Origami
Conclusion
References
59. Functional Engineering of Synthetic RNA Through Circularization
Introduction
Dumbbell siRNA
Buildup siRNA with Circular RNA
Circular RNA for Prokaryotic System
Circular RNA for Eukaryotic System
Chemical Synthesis of Circular RNA
Conclusion
References
60. Stimuli-Responsive DNA Nanostructures for Biomedical Applications
Introduction
DNA Nanostructures
DNA Nanostructures for Bio-Imaging and Drug Delivery
Bio-Imaging
Fluorescent Imaging
Photoacoustic Imaging
Magnetic Resonance Imaging
Positron Emission Computed Tomography/Computed Tomography Imaging
Drug Delivery
Delivery of Small Molecule Drug
Delivery of Functional Nucleic Acid
Delivery of Functional Protein
Delivery of Multiple Therapeutic Components
Conclusion and Future Perspective
References
61. Gene-Like Precise Construction of Functional DNA Materials
Introduction
Design Principle and Assembly Strategies
The Design Principle of DNA Sequence
Assembly Strategies of DNA Materials
Base-Pairing Based Assembly Strategy
Enzyme-Promoted Synthesis and Assembly
Dynamic Assembly
Hybrid Assembly
DNA Biomaterials
DNA Hydrogel
Branched DNA Assembled Hydrogel
RCA Produced DNA Hydrogel
Polymerase Chain Reaction (PCR) Produced DNA Hydrogel
Chemical Cross-Linking DNA Hydrogel
DNA-Based Nanomaterials
Branched DNA Assembled Nanomaterials
RCA Assembled DNA Nanomaterials
HCR Assembled DNA Nanomaterials
DNA Origami Nanomaterials
The Applications of DNA Materials
Cell Engineering
Diagnosis
Therapy
Conclusions and Further Reading
References
62. Design and Self-Assembly of Therapeutic Nucleic Acid Nanoparticles (NANPs) with Controlled Immunological Properties
Introduction
Functional Nucleic Acids
Messenger RNAs
Non-coding RNAs
siRNAs
Riboswitches
Ribozymes
Aptamers
Therapeutic Nucleic Acids (TNAs)
Nucleic Acids as Nanomaterials
Rational Design and Self-Assembly
Sequence Preparation
Incorporating Function into Scaffolds
Optimization of Storage for Increased Nucleic Acid Stability
Methods to Achieve Anhydrous Sample Storage
Light-Assisted Drying (LAD)
NANPs Tolerance of Dehydration Techniques
Recognition of Nucleic Acids and NANPs by the Human Innate Immune System
Pattern Recognition Receptors (PRRs)
Pathogen-Associated Molecular Patterns (PAMPs)
Incorporating Immunological Properties into Design Parameters of NANPs
Design Strategies of NANPs
Composition and Chemical Modifications of NANPs
Purity of NANPs
Route of Delivery of NANPs
Conclusion
References
Further Readings
63. Nanomaterials for Therapeutic Nucleic Acid Delivery
Introduction
Therapeutic Nucleic Acids
ASOs
RNAi
siRNA
miRNA
mRNA
Challenges and Biological Barriers for Delivery of Therapeutic Nucleic Acids
Nanomaterials for the Delivery of Therapeutic Nucleic Acids
Polymeric Nanoparticles
LNPs
Cell-Penetrating Peptides (CPPs)
Inorganic Nanoparticles
Nucleic Acid-Based Nanoparticles
VLPs and Others
Conclusion
References
Part VIII. Nucleic Acid Therapeutics
64. Flex-Nucleosides: A Strategic Approach to Antiviral Therapeutics
Introduction
Background and Significance
Nucleobase Modifications
Distal Fleximers
Proximal Fleximers
Reverse Fleximers
Click Fleximers and Other Triazole Fleximers
Fleximer Bases
Sugar Modifications
2′ Modifications
3′ Modifications
Carboxylic Modifications
Acyclic Modifications
Conclusion
References
65. Small Molecules Targeting Repeat Sequences Causing Neurological Disorders
Introduction
Contractions of Trinucleotide Repeats in HD Targeting CAG Repeats
Alleviation of Splicing Defects in DM1 Targeting CUG Repeat
Alleviation of RNA Toxicity in SCA31 by a Small Molecule Targeting UGGAA Repeat
Conclusion
References
66. Targeted Cancer Therapy: KRAS-Specific Treatments for Pancreatic Cancer
Introduction
Sequence Determinants That Control KRAS Gene Expression
The Biological Function of the KRAS Oncogene
The Role of G4 in the KRAS Promoter
Antigene Strategies Based on G4-Binding Small Molecules
G4-Binding Compounds Binding to the 5′-UTR Region of the KRAS Gene
RG4-Binding Alkyl Porphyrins Promote Cell Death by Apoptosis and Ferroptosis
Transcription Factor Decoy G-Quadruplex Oligonucleotides against the KRAS Gene
Suppression of the KRAS Gene by miRNAs
Conclusion
References
67. Functional XNA and Biomedical Application
Introduction
Functional Nucleic Acids
Threose Nucleic Acid (TNA)
Polymerases Capable of Recognizing TNA Substrates
TNA Aptamers
TNAzymes
2′-Deoxy-2′-Fluoroarabinose Nucleic Acid (FANA)
Polymerases Capable of Recognizing FANA Substrates
FANA Aptamers
FANAzymes
Other XNAs
Polymerases for Other XNAs
Aptamers Selected from Other XNA Chemistries
Enzymes Selected from Other XNA Chemistries
XNA-Modified Existing DNAzymes
Conclusion
References
68. Controlled Intracellular Trafficking and Gene Silencing by Oligonucleotide-Signal Peptide Conjugates
Introduction
Synthesis of Oligonucleotide-Peptide Conjugate
Solution Phase Synthesis
Solid Phase Synthesis
Synthesis of Oligonucleotide Conjugates by SPFC
Syntheses of Oligonucleotide-Peptide Conjugates at the 5′-End of Oligonucleotide by SPFC
Synthesis of Oligonucleotide-Peptide Conjugates Using Modified Base Amino Modifier C2dT
Syntheses of Oligonucleotide-Peptide Conjugates Using 2′-OH Group of Ribose
Hybridization Properties of Oligonucleotide-Peptide Conjugates
Resistance of Oligonucleotide-Peptide Conjugates Against Nuclease Digestion
Activation of RNase H
Cytotoxicity
Control of Intracellular Trafficking by Oligonucleotide-Signal Peptide Conjugate
Inhibition of Telomerase and Telomere Attrition by sASO-NLS Conjugates Targeting hTERC
Silencing of BCR/ABL Chimeric Gene by siRNA-NES Conjugates
Silencing of BCR/ABL Chimeric Gene by siRNAs Bearing 5′-Amino Modifier 5
Silencing of BCR/ABL Chimeric Gene by siRNA-NES Conjugates
Nontoxic Transfection of siRNA-NES Conjugates by Designed Peptides
Conclusions
References
69. First- and Second-Generation Nucleoside Triphosphate Prodrugs: TriPPPro-Compounds for Antiviral Chemotherapy
Introduction
Earlier Nucleoside Triphosphate Prodrugs Bearing One Masking Group
Nucleoside Triphosphate Prodrugs Bearing Two Biodegradable Masking Groups First Generation Compounds
Application of the TriPPPro-Concept to Various Nucleoside Analogues
γ-Nonsymmetrically Modified TriPPPro-Prodrugs Bearing One Biodegradable Group (Second Generation Triphosphate Delivery Systems)
Primer Extension Assays
Summary and Conclusion
References
70. New Molecular Technologies for Oligonucleotide Therapeutics-1: Properties and Synthesis of Boranophosphate DNAs
Introduction
Properties of PB Oligodeoxyribonucleotides
Chemical Stability
Duplex Stability
Nuclease Resistance
RNase H Activity
Syntheses of PB Oligodeoxynucleotides
Challenges of PB Oligonucleotide Syntheses
Synthesis of PB Oligonucleotides from the Phosphoramidite Monomer Bearing Amino-Protecting Groups is Compatible with a Boronat...
Synthesis of PB Oligonucleotides Employing a Nucleoside 3′-O-H-Phosphonate
Synthesis with a P-Boronated Monomer
Stereoselective Synthesis of PB Oligonucleotides Employing an Oxazaphospholidine Monomer Bearing an Acid-Labile Chiral Auxilia...
Conclusion
References
71. Extracellular Vesicle-Mediated CRISPR/Cas Delivery: Their Applications in Molecular Imaging and Precision Biomedicine
Introduction
Extracellular Vesicle Platforms for Targeted CRISPR/Cas Delivery
Endogenous Extracellular Vesicles for CRISPR/Cas Delivery
Engineered Extracellular Vesicles for CRISPR/Cas Delivery
Hybridized Extracellular Vesicles for Targeted CRISPR/Cas Delivery
Delivery of CRISPR/Cas with Various Genome Editing Modes via Extracellular Vesicles
Extracellular Exosome Cargo Systems for CRISPR/Cas Plasmid DNA Delivery
Extracellular Exosomes for Targeted Delivery of CRISPR/Cas RNA
Extracellular Exosome Cargo Systems for Targeted Delivery of CRISPR/Cas Ribonucleoprotein (RNP)
Biomedical Applications via Extracellular Vesicle-Mediated CRISPR/Cas Delivery
Extracellular Exosome-Mediated CRISPR/Cas Delivery in Precise Gene Therapy
Extracellular Vesicle Delivery Systems of CRISPR/Cas for Precise Diagnosis
Machine Learning-Assisted EV-Based CRISPR System as Next-Generation Gene Editing Tool for Personalized Precise Medicine and Di...
Conclusion and Future Perspectives
References
72. Advancing XNAzymes as Nucleic Acid Therapeutics
Introduction
Xeno Nucleic Acids
XNA-Modified DNAzyme 10-23
RNA-Cleaving XNAzymes Isolated by SELEX
Conclusion
References
73. New Molecular Technologies for Oligonucleotide Therapeutics-2: A-Type Nucleic Acid Duplex-Specific Binding Oligocationic Mo...
Introduction
Artificial Cationic Oligosaccharides that Specifically Bind to A-Type Nucleic Acid Duplexes
Artificial Cationic Oligopeptides that Bind Specifically to A-Type Nucleic Acid Duplexes
Conclusion
References
Part IX. Biotechnology and Synthetic Biology of Nucleic Acids
74. Amides and Other Nonionic Backbone Modifications in RNA
Introduction
Amide-Modified DNA
Synthesis, Biophysical, and Structural Properties of Amide-Modified RNA
Biological Activity of Amide-Modified RNAs
Other Nonionic Backbones
Conclusion
References
75. Expanding the RNA- and RNP-Based Regulatory World in Mammalian Cells
Introduction
RNA Switch
Background
Protein-Responsive OFF Switch
miRNA-Responsive Switch
NMD-Mediated RNA Inverter
Alternative Splicing-Based ON Switch
Ribozyme-Based ON Switch
PERSIST
miRNA ON Switch
eToehold Switch
RNA Sensors by ADAR-Mediated Base Editing
CaVT System
CRISPR-Cas Technology
Fundamental Knowledge of CRISPR-Cas Systems
PAM-Altered Cas Proteins
Genome Editing Without DNA Double-Strand Break
Transcriptional Regulation with Engineered Cas Proteins
Epigenetic Regulation
RNA-Targeting CRISPR Technology
Conclusion and Future Perspective
References
Further Readings
76. Design and Biological Application of RTK Agonist Aptamers
Introduction
Main Text
Generation of RTK-Binding Aptamers
TrkB-Binding Aptamers
VEGFR-Binding Aptamers
Met-Binding Aptamers
FGFR-Binding Aptamers
IR-Binding Aptamers
Perspectives
Conclusion
References
77. G-Quadruplex-Based Aptamers in Therapeutic Applications
Introduction
G-Quadruplex (G4) Motifs in Oligonucleotide Aptamers
G4-Forming Aptamers in Therapeutic Applications
G4-Based Aptamers as Potential Drugs
Antiviral G4-Aptamers
Anti-SARS-CoV-2 Aptamers
Anti-HIV Aptamers
G4-Based Aptamers Against Other Viruses
Anticancer G4-Based Aptamers
Anticoagulant G4-Based Aptamers
G4-Based Aptamers as Drug Delivery Systems
AS1411-Drug Covalent/Noncovalent Conjugates
AS1411-Drug-Liposome Systems
AS1411-Drug-Nanoparticle Systems
Other G4-Forming Aptamers as Drug Delivery Systems
Conclusions and Perspectives
References
78. Nucleic Acids in Green Chemistry
Introduction
Environmentally Friendly Synthetic Chemistry with Nucleic Acids
Environmentally Friendly Nucleic Acid Synthesis
Atomically Economic Synthesis of Ribose and Its Precursors Using Hydroxyapatite
Connection of Nucleotides
Nucleic Acids that Provide Special Reaction Fields
Platform for Supporting Metal Catalysts
Asymmetric Reaction Field
Role of Ionic Liquids in Nucleic Acid Green Chemistry
Stability of Nucleic Acid Structures in Ionic Liquids
Solubility of Nucleic Acids in Ionic Liquids
Extraction of Nucleic Acids Using Ionic Liquids
Function of Nucleic Acids in Ionic Liquids
DNA Used as a New Material
DNA Fuel
DNA Cast Film
Bioplastics
Conclusions
References
79. G-Quadruplexes in Human Viruses: A Promising Route to Innovative Antiviral Therapies
Introduction
Baltimore Class I: Double-Stranded DNA
Herpesviridae
Alphaherpesviruses
Betaherpesviruses
Gammaherpesviruses
Papillomaviridae
Adenoviridae
Baltimore Class IV: (+) Single-Stranded RNA
Coronaviridae
Flaviviridae
Togaviridae
Picornaviridae
Baltimore Class V: (-) Single-Stranded RNA
Filoviridae
Orthomyxoviridae
Baltimore Class VI: (+) Single-Stranded RNA - RT
Baltimore Class VII: Double-Stranded DNA - RT
Discussion and Future Perspectives
References
80. Nonchromatographic Purification of Synthetic RNA
Introduction
Purification of Oligonucleotides by Polymerization of Target Strands
Purification of Oligonucleotides by Polymerization and Removal of Failure Strands
Fluorous Affinity Purification of Synthetic Oligonucleotides
Oligonucleotide Purification Using Photocleavable Biotinylated Handle
A “Catch and Release” Strategy for Nonchromatographic Purification of Oligonucleotides
Solid-Phase-Assisted Purification via Oximation Chemistry
Purification of Synthetic RNAs Using Bond-Breaking Bio-orthogonal Chemistry
Purification of Synthetic RNAs Using Inverse Electron Demand Diels-Alder Chemistry
Conclusion
References
81. Genome Editing Using CRISPR
Introduction
Canonical Methods for Genome Editing
Restriction Enzymes
ZFN
TALEN
Genome Editing Using the CRISPR-Cas System
Discovery of CRISPR Genes and Identification of Their Function
Potential of the CRISPR-Cas9 System as a Genome Editing Tool
Single RNA-Guided Endonuclease
The Mechanisms of Genome Editing Using the CRISPR-Cas9 System
Canonical Nonhomologous End Joining (c-NHEJ) Repair
Homology-Directed Repair (HDR)
Microhomology-Mediated End-Joining (MMEJ) Repair
Single-Strand Annealing (SSA)
CRISPR-Cas9-Based Genome Editing Strategies Reliant upon DSBs
Enhancing HDR
Suppression of c-NHEJ
Activating HDR Factors
Regulating the Cell Cycle
Increasing the Accessibility of Donor DNA
Using ssDNA as a Donor
Non-HDR-Based Genome Editing
Precise Integration into a Target Chromosome (PITCh)
Homology-Mediated End Joining (HMEJ)
Homology-Independent Targeted Integration (HITI)
Genome Editing Derivatives Utilizing CRISPR Without Generating DNA DSBs
Base Editors (BEs)
Cytosine Base Editor (CBE)
Adenine Base Editor (ABE)
Efforts toward Versatile Application of BEs
Prime Editors (PEs)
Development of PEs
Improvements of PEs
Transposases Associated with the CRISPR-Cas System
Insertion of Transposable Elements by Guide RNA-Assisted Targeting (INTEGRATE)
CRISPR-Associated Transposase (CAST)
Cas-Transposon (CasTn)
Cas13-Based Tool for RNA Editing
Conclusion
References
82. Biomaterials Based on DNA Conjugates and Assemblies
Introduction
DNA-Sugar Conjugates through Diazo-Coupling
DNA-Sugar and DNA-Peptide Conjugates with Spatial Arrangement
DNA-Modified Artificial Viral Capsids Self-Assembled from DNA-Peptide Conjugates
Nucleo-Spheres: Nano- and Micron-Sized Spherical DNA Assemblies
Spatiotemporal Control of Peptide Nanofiber Growth Using DNA-Peptide Conjugates
Conclusion
References
83. G-Quadruplex Resolving by Specific Helicases
Introduction
Classifications of Helicases
G4-Unwinding Helicases
G4 Helicases in DNA Replication, Transcription, and Maintenance
RecQ-Like Helicases: BLM and WRN
G4 Unwinding Helicases in mRNA Regulation
G4-Helicases in Immunity and Infection
Helicases in Viruses
Conclusion
References
84. Binding and Modulation of G-Quadruplex DNA and RNA Structures by Proteins
Introduction
RGG Domains Indicate the Potential of a Protein to Bind to G4s
Multiple Transcription Factors Bind to G4 Regions via RGG-Motif
G4-Binding Proteins and Genome Stability
Conclusion
References
Part X. Functional Nucleic Acids
85. Targeting DNA with Triplexes
Introduction
Triplets
Structures
Kinetics
Nucleotide Analogues
Base Analogues for Overcoming the pH Limitation in Parallel Triplets
Recognizing Pyrimidine Interruptions in Parallel Triplexes
Recognizing Pyrimidine Interruptions in Antiparallel Triplexes
Recognizing all Four Bases
Increasing Triplex Stability
Triplex Binding Ligands
Extended Bases
Backbones
Addition of Positive Charges
Uncharged Backbones
Triplexes in Biology
Intramolecular Triplexes
Intermolecular Triplexes in Biology
Triplexes and RNA Stability
RNA:DNA Triplexes Formed by Noncoding RNAs
Interaction of RNA with Chromatin
Triplexes and Noncoding RNA
Triplex Applications
Triplexes and DNA Nanostructures
Assembly
Triplexes for Sequence Detection
Attaching a Cargo
Conclusions
References
86. Metal Ion-Induced Changes in the Stability of DNA Duplexes
Introduction
Duplex Stabilization by Metal-Mediated Artificial Base Pairing
History of Metal-Mediated Unnatural Base Pairs
Duplex Stabilization by a Single Metal-Mediated Base Pair
Duplex Stabilization by Multiple Metal-Mediated Base Pairs
Metal-Mediated Regulation of Duplex Stability Based on the Bifacial Base Pairing of Modified Pyrimidine Bases
Effects of pH on the Duplex Stabilization by Metal-Mediated Base Pairing
Metal-Mediated Stabilization of Other DNA Structures
Metal-Mediated Stabilization of DNA Triplexes and Quadruplexes
Metal-Mediated Stabilization of DNA Three-Way Junction Structures
Recent Applications of Metal-Mediated Duplex Stabilization
Development of Metal-Responsive DNAzymes Based on the Metal-Mediated Duplex Stabilization
Development of Other Types of Metal-Responsive DNA Molecules
Conclusion
References
87. Liquid-Liquid Phase Separation and Nucleic Acids
Introduction
Structure and Stability of Nucleic Acids Depend on the Surrounding Environment
Behavior of Nucleic Acids in a Solution That Mimics the Cellular Environment Using Co-solutes
Behavior of Nucleic Acids in a Solution That Mimics the Environment Within Cellular Organelles
Behavior of Nucleic Acids in Ionic Liquids
Effects of Nucleic Acid Structure on Phase Separation
Biomolecules Contained Within Droplets
Influence of Nucleic Acid Structure on Droplet Formation and Function
Nucleic Acids in Droplets That Lead to Onset and Progression of Diseases
Nucleic Acids in Droplets Involved in Cancer
Nucleic Acid Structures of Cancer-Related Genes
Effects of Nucleic Acid Structure on Cancer-Related Gene Expression
Phase Separation Related to Cancer Progression
Nucleic Acids in Droplets Involved in Neurodegenerative Disease Mechanisms
Nucleic Acids Involved in Neurodegenerative Diseases
Noncanonical Structures of DNA and RNA in Genes Associated with Neurodegenerative Diseases
Structures of Nucleic Acids That Affect Cytotoxicity via Phase Separation in Neurodegenerative Diseases
Therapeutic Strategies Targeting Nucleic Acids That Cause Phase Separation
Nucleic Acids as Therapeutic Targets
Oligonucleotide Therapy Targeting Nucleic Acids That Cause Phase Separation
CRISPR Therapy Targeting Nucleic Acids That Cause Phase Separation
Small Molecules Targeting Nucleic Acids That Cause Phase Separation
Conclusions
References
88. Natural Riboswitches
Introduction
Natural Riboswitches as Models for RNA Structure and RNA-Small Molecule Interactions
In-Line Probing as a Method for In Vitro Riboswitch Validation
Structural Methods to Study Riboswitch-Ligand Interactions
Biophysical Methods to Study Riboswitch Dynamics
Natural Riboswitches Showcase Mechanisms of Gene Regulation at the RNA Level
Transcription Regulation
Translation Regulation
Other Mechanisms of Regulation
Sophisticated Regulation by Tandem Riboswitches
Riboswitches as Antibiotic Targets
Conclusions
References
89. External Stimulation-Responsive Artificial Nucleic Acids: Peptide Ribonucleic Acid (PRNA)-Programmed Assemblies
General Introductions
Nucleic Acid Medicines: Oligonucleotide Therapeutics
Antisense Strategies
Issues for Improving Oligonucleotide Therapeutics
Overview of Artificial/Modified Nucleic Acid Design: Nucleotide Modification for the Therapeutic Use
Stimuli-Responsive Nucleoside/Nucleotide Analogs and Oligonucleotides
Photo-Irradiation-Responsive Nucleosides/Nucleotides and Oligonucleotides
Redox-Responsive Nucleosides/Nucleotides and Oligonucleotides
Reactive Oxygen Species-Responsive Nucleosides/Nucleotides and Oligonucleotides
Enzymatic Reaction-Responsive Nucleosides/Nucleotides and Oligonucleotides
pH-Responsive Nucleosides/Nucleotides and Oligonucleotides
Development of Peptide Ribonucleic Acids (PRNAs) for the pH Change-Dependent Nucleobase Orientation Control
Conclusion
References
90. Targeting RNA with Small Molecules
Introduction
Small Molecules That Target RNA: Discovery, Design, and Modes of Action
SMIRNA Targeting of miRNAs
Traditional Methods to Identify SMIRNAs
High Throughput Screening
Identifying Chemical Matter from In Vitro Binding Assays
Phenotypic Screening
Structure-Based Drug Design to Enable Identification of SMIRNAs
General Overview of Sequence-Based Design of Structure-Specific SMIRNAs
Defining Privileged RNA Motif-SMIRNA Interactions via 2DCS and HiT-STARTS
Inforna-Enabled Sequence-Based Design of SMIRNAs that Target Human miRNAs
Design of Dimeric SMIRNAs Targeting Disease-Causing miRNAs
Neomycin-Nucleobase Conjugates Targeting Oncogenic miRNAs
Fragment-Based Approach to Identify and Optimize Bioactive SMIRNAs
Targeted Cleavage and Degradation of miRNAs
Overview of Targeted RNA Degradation by Ribonuclease Targeting Chimeras (RIBOTAC)
The Natural Activator 2′-5′ Poly(A) as the RNase L-Recruiting Module
Synthetic Small Molecule as the RNase L-Recruiting Module
SMIRNAs Targeting of mRNAs of Traditionally Undruggable Proteins
Overview of Undruggable Proteins
SMIRNAs Targeting mRNAs as Translational Inhibitors
SMIRNAs Targeting SNCA mRNA to Inhibit Translation
SMIRNAs that Modulate MAPT Splicing
Conclusion
References
Index

Naoki Sugimoto Editor

Handbook of Chemical Biology of Nucleic Acids

Handbook of Chemical Biology of Nucleic Acids With 1111 Figures and 134 Tables

Editor
Naoki Sugimoto
Frontier Institute for Biomolecular Engineering Research (FIBER), Konan University, Kobe, Japan
Graduate School of Frontiers of Innovative Research in Science and Technology (FIRST), Konan University, Kobe, Japan

ISBN 978-981-19-9775-4
ISBN 978-981-19-9776-1 (eBook)
https://doi.org/10.1007/978-981-19-9776-1

© Springer Nature Singapore Pte Ltd. 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface

Nucleic acids, DNA and RNA, are among the most important and interesting biomolecules, even though they consist of very simple components: phosphate, sugar, and organic bases. Their structures, single strands or a double helix, are also very simple in comparison with those of other biomolecules such as proteins and carbohydrates; nevertheless, nucleic acids hold very important genetic information and functions. Nucleic acids are molecules that carry genetic information, can self-replicate, and carry out gene expression. The basic chemical property underlying nucleic acid replication is base pairing between the nucleobases. In 1953, Watson and Crick reported a right-handed canonical double helix consisting of two DNA strands. Nucleic acids can also form Hoogsteen base pairs, which are used to form non-canonical structures such as triplexes and quadruplexes. Organisms may therefore be considered to utilize both the “genetic codes” of the canonical double helix and the “functional codes” of non-canonical structures. Because perturbation of the balance in the formation of non-canonical structures can cause cancer, neurodegenerative disease, and other diseases, both the fundamental mechanism by which nucleic acid structures form in cells and the therapeutic applications targeting nucleic acids related to these diseases are now among the biggest topics in the science and technology of nucleic acids. In this year (2023), which commemorates the 70th anniversary of the discovery of the DNA double helix, I am very happy to publish this Handbook of Chemical Biology of Nucleic Acids as Editor-in-Chief. It is divided into 10 sections comprising 90 chapters, in which the authors present not only basic knowledge but also recent leading research. Each section consists of extensive review chapters covering the chemistry, biology, and biophysics of nucleic acids as well as their applications in molecular medicine, biotechnology, and nanotechnology.
The sections of this handbook are Physical Chemistry of Nucleic Acids (Section Editor: Prof. Roland Winter), Structural Chemistry of Nucleic Acids (Section Editor: Prof. Janez Plavec), Organic Chemistry of Nucleic Acids (Section Editor: Prof. Piet Herdewijn), Ligand Chemistry of Nucleic Acids (Section Editor: Prof. Marie-Paule Teulade-Fichou), Nucleic Acids and Gene Expression (Section Editor: Prof. Cynthia Burrows), Analytical Methods and Applications of Nucleic Acids (Section Editor: Prof. Chaoyong Yang), Nanotechnology and Nanomaterial Biology of Nucleic Acids (Section Editor: Prof. Zhen Xi), Nucleic Acid Therapeutics (Section Editor: Prof. Katherine Seley-Radtke), Biotechnology and Synthetic Biology of Nucleic Acids (Section Editor: Prof. Eriks Rozners), and Functional Nucleic Acids (Section Editor: Prof. Keith R. Fox). The handbook is edited by outstanding leaders, with contributions written by internationally renowned experts. It is a valuable resource not only for researchers but also for graduate students working in areas related to nucleic acids who would like to learn more about their important role and potential applications. I hope all readers enjoy this handbook and come to appreciate the importance of not only Watson–Crick double-helical nucleic acids (B-form) but also non-canonical nucleic acids such as triplexes and quadruplexes. In place of the question in William Shakespeare’s Hamlet, please answer the question “To B or not to B, that is the question” in the research field of nucleic acids. I am deeply grateful to all the authors and section editors for their outstanding contributions, and to the Springer Nature editing team, especially Mr. Shinichi Koizumi and Ms. S. Shameem Aysha, for their valuable support and long-standing encouragement.

Kobe, Japan
August 2023

Naoki Sugimoto

Contents

Volume 1

Part I. Physical Chemistry of Nucleic Acids
1. High-Pressure Single-Molecule Studies on Non-canonical Nucleic Acids and Their Interactions
   Sanjib K. Mukherjee, Jim-Marcel Knop, and Roland Winter
2. Stability Prediction of Canonical and Noncanonical Structures of Nucleic Acids
   Shuntaro Takahashi, Hisae Tateishi-Karimata, and Naoki Sugimoto
3. The Effect of Pressure on the Conformational Stability of DNA
   Tigran V. Chalikian and Robert B. Macgregor, Jr.
4. Quadruplexes Are Everywhere...On the Other Strand Too: The i-Motif
   Jean-Louis Mergny, Mingpan Cheng, and Jun Zhou
5. i-Motif Nucleic Acids
   Zoë A. E. Waller

Part II. Structural Chemistry of Nucleic Acids
6. NMR Study on Nucleic Acids
   Janez Plavec
7. Z-DNA
   Doyoun Kim, Vinod Kumar Subramani, Soyoung Park, Joon-Hwa Lee, and Kyeong Kyu Kim
8. Structures of G-Quadruplexes and Their Drug Interactions
   Yichen Han, Jonathan Dickerhoff, and Danzhou Yang
9. In Cell 19F NMR for G-Quadruplex
   Yan Xu
10. Structures and Catalytic Activities of Complexes Between Heme and DNA
   Yasuhiko Yamamoto and Atsuya Momotake
11. Studying Nucleic Acid-Ligand Binding by X-Ray Crystallography
   Christine J. Cardin and Kane T. McQuaid
12. Predicting the 3D Structure of RNA from Sequence
   James Roll and Craig L. Zirbel

Part III. Organic Chemistry of Nucleic Acids
13. Hexitol Nucleic Acid (HNA): From Chemical Design to Functional Genetic Polymer
   Elisabetta Groaz and Piet Herdewijn
14. The Effects of FANA Modifications on Non-canonical Nucleic Acid Structures
   Roberto El-Khoury, Miguel Garavís, and Masad J. Damha
15. Isomorphic Fluorescent Nucleoside Analogs
   Kfir B. Steinbuch and Yitzhak Tor
16. Bridged Nucleic Acids for Therapeutic Oligonucleotides
   Md Ariful Islam and Satoshi Obika
17. Mesyl Phosphoramidate Oligonucleotides: A New Promising Type of Antisense Agents
   Dmitry A. Stetsenko
18. Chemistry of Cyclic Dinucleotides and Analogs
   Noriko Saito-Tarashima and Noriaki Minakawa
19. Labeling and Detection of Modified Nucleic Acids
   Jing Mo, Xiaocheng Weng, and Xiang Zhou
20. Cross-Linking Duplex of Nucleic Acids with Modified Oligonucleotides
   Fumi Nagatsugi
21. Enzymatic Synthesis of Base-Modified Nucleic Acids
   Marcel Hollenstein
22. Charge Transfer in Natural and Artificial Nucleic Acids
   Sabine Müller and Jennifer Frommer
23. Nucleic Acid Aptamers: From Basic Research to Clinical Applications
   David-M. Otte, Moujab Choukeife, Tejal Patwari, and Günter Mayer

Part IV. Ligand Chemistry of Nucleic Acids
24. Targeting Quadruplex Nucleic Acids: The Bisquinolinium Saga
   Daniela Verga, Anton Granzhan, and Marie-Paule Teulade-Fichou
25. Compound Shape and Substituent Effects in DNA Minor Groove Interactions
   W. David Wilson and Ananya Paul
26. Macrocyclic G-Quadruplex Ligands of Telomestatin Analogs
   Yue Ma, Keisuke Iida, and Kazuo Nagasawa
27. Cyclic Naphthalene Diimide Derivatives as Novel DNA Ligands
   Shigeori Takenaka
28. Imaging Study of Small Molecules to G-Quadruplexes in Cells
   Ting-Yuan Tseng and Ta-Chau Chang
29. DNA/Metal Cluster–Based Nano-lantern
   Can Xu and Xiaogang Qu
30. Interaction of Poly(Ethylene Glycol)-b-Poly-L-Lysine Copolymers with DNA Structures: A Thermodynamic Investigation
   Hui-Ting Lee, Alexander Lushnikov, and Luis A. Marky
31. Chemical Tools to Target Noncoding RNAs
   Maurinne Bonnet and Maria Duca
32. Targeting DNA Junctions with Small Molecules for Therapeutic Applications in Oncology
   Joanna Zell and David Monchaud

Volume 2

Part V. Nucleic Acids and Gene Expression
33. DNA Damage and Repair in G-Quadruplexes Impact Gene Expression
   Aaron M. Fleming and Cynthia J. Burrows
34. DNA Structural Elements as Potential Targets for Regulation of Gene Expression
   Manlio Palumbo and Claudia Sissi
35. Effects of Molecular Crowding on Structures and Functions of Nucleic Acids
   Tamaki Endoh, Hisae Tateishi-Karimata, and Naoki Sugimoto
36. Structure-Guided Optimization of siRNA and Anti-miRNA Properties
   Kevin M. Pham and Peter A. Beal
37. Tools for Understanding the Chemical Biology of the tRNA Epitranscriptome
   Junzhou Wu, Thomas J. Begley, and Peter C. Dedon
38. Sulfur- and Selenium-Modified Bacterial tRNAs
   B. Nawrot, M. Sierant, and P. Szczupak
39. Chemical-Assisted Epigenome Sequencing
   Dongsheng Bai, Jinying Peng, and Chengqi Yi
40. Telomerase
   Tracy M. Bryan and Scott B. Cohen
41. Telomeres: Structure and Function
   Scott B. Cohen and Tracy M. Bryan
42. Genetic Alphabet Expansion of Nucleic Acids
   Michiko Kimoto and Ichiro Hirao
43. Unnatural Base Pairs to Expand the Genetic Alphabet and Code
   Floyd E. Romesberg
44. OGG1 at the Crossroads Between Repair and Transcriptional Regulation
   Anne-Marie Di Guilmi, Nuria Fonknechten, and Anna Campalans
45. Phosphorothioate Nucleic Acids: Artificial Modification Envisaged by Nature
   Róża Pawłowska and Piotr Guga

Part VI. Analytical Methods and Applications of Nucleic Acids
46. Aptamer Molecular Evolution for Liquid Biopsy
   Lingling Wu, Qi Niu, and Chaoyong Yang
47. Single-Molecule DNA Visualization
   Xuelin Jin and Kyubong Jo
48. Tissue-Specific Drug Delivery Platforms Based on DNA Nanoparticles
   Kyoung-Ran Kim, Junghyun Kim, and Dae-Ro Ahn
49. Nanobiodevice for Nucleic Acid Sensing
   Hiromi Takahashi, Takao Yasui, and Yoshinobu Baba
50. Functional Nucleic Acid-Protein Complexes: Application to Fluorescent Ribonucleopeptide Sensors
   Arivazhagan Rajendran, Shiwei Zhang, and Takashi Morii
51. Detection Systems Using the Ternary Complex Formation of Nucleic Acids
   Hiroto Fujita and Masayasu Kuwahara
52. Nucleic Acid Conjugates for Biosensing: Design, Preparation, and Application
   Toshihiro Ihara, Yusuke Kitamura, and Yousuke Katsuda
53. Molecular Beacons With and Without Quenchers
   SueJin Lee and Byeang Hyean Kim

Part VII. Nanotechnology and Nanomaterial Biology of Nucleic Acids
54. Gene Nanovector for Genome Therapy
   Dejun Ma and Zhen Xi
55. The Frame-Guided Assembly of Nucleic Acids
   Yuanchen Dong and Dongsheng Liu
56. Graphene Oxide and Nucleic Acids
   Khushbu Chauhan, Eunbin Cho, and Dong-Eun Kim
57. Carbon Nanotubes and Nucleic Acids
   Priyannth Ramasami Sundharbaabu, Junhyuck Chang, and Jung Heon Lee
58. Artificial Genetic Switches and DNA Origami: Current Landscape and Prospects as Designer Therapeutics and Visualization Tools
   Ganesh N. Pandian, Shubham Mishra, and Hiroshi Sugiyama
59. Functional Engineering of Synthetic RNA Through Circularization
   Hiroshi Abe
60. Stimuli-Responsive DNA Nanostructures for Biomedical Applications
   Jianbing Liu and Baoquan Ding
61. Gene-Like Precise Construction of Functional DNA Materials
   Feng Li, Shuai Li, and Dayong Yang
62. Design and Self-Assembly of Therapeutic Nucleic Acid Nanoparticles (NANPs) with Controlled Immunological Properties
   Morgan Chandler, Leyla Danai, and Kirill A. Afonin
63. Nanomaterials for Therapeutic Nucleic Acid Delivery
   Shi Du, Jeffrey Cheng, and Yizhou Dong

Volume 3

Part VIII. Nucleic Acid Therapeutics
64. Flex-Nucleosides: A Strategic Approach to Antiviral Therapeutics
   Katherine L. Seley-Radtke, Christianna H. M. Kutz, and Joy E. Thames
65. Small Molecules Targeting Repeat Sequences Causing Neurological Disorders
   Bimolendu Das, Tomonori Shibata, and Kazuhiko Nakatani
66. Targeted Cancer Therapy: KRAS-Specific Treatments for Pancreatic Cancer
   Himanshi Choudhary and Luigi E. Xodo
67. Functional XNA and Biomedical Application
   Dongying Wei, Xintong Li, Yueyao Wang, and Hanyang Yu
68. Controlled Intracellular Trafficking and Gene Silencing by Oligonucleotide-Signal Peptide Conjugates
   Masayuki Fujii, Marija Krstic-Demonacos, and Constantinos Demonacos
69. First- and Second-Generation Nucleoside Triphosphate Prodrugs: TriPPPro-Compounds for Antiviral Chemotherapy
   Xiao Jia, Chenglong Zhao, and Chris Meier
70. New Molecular Technologies for Oligonucleotide Therapeutics-1: Properties and Synthesis of Boranophosphate DNAs
   Kazuki Sato and Takeshi Wada
71. Extracellular Vesicle-Mediated CRISPR/Cas Delivery: Their Applications in Molecular Imaging and Precision Biomedicine
   Dong Bingxue, Lang Wenchao, and Bengang Xing
72. Advancing XNAzymes as Nucleic Acid Therapeutics
   Yajun Wang and John C. Chaput
73. New Molecular Technologies for Oligonucleotide Therapeutics-2: A-Type Nucleic Acid Duplex-Specific Binding Oligocationic Molecules for Oligonucleotide Therapeutics
   Rintaro Iwata Hara and Takeshi Wada

Part IX. Biotechnology and Synthetic Biology of Nucleic Acids
74. Amides and Other Nonionic Backbone Modifications in RNA
   Eriks Rozners
75. Expanding the RNA- and RNP-Based Regulatory World in Mammalian Cells
   Shunsuke Kawasaki, Moe Hirosawa, and Hirohide Saito
76. Design and Biological Application of RTK Agonist Aptamers
   Ryosuke Ueki and Shinsuke Sando
77. G-Quadruplex-Based Aptamers in Therapeutic Applications
   Domenica Musumeci and Daniela Montesarchio
78. Nucleic Acids in Green Chemistry
   Akimitsu Okamoto
79. G-Quadruplexes in Human Viruses: A Promising Route to Innovative Antiviral Therapies
   Emanuela Ruggiero and Sara N. Richter
80. Nonchromatographic Purification of Synthetic RNA
   Ian McClain, Hilal Dagci, and Maksim Royzen
81. Genome Editing Using CRISPR
   Beomjong Song and Sangsu Bae
82. Biomaterials Based on DNA Conjugates and Assemblies
   Kazunori Matsuura and Hiroshi Inaba
83. G-Quadruplex Resolving by Specific Helicases
   Philipp Schult, Philipp Simon, and Katrin Paeschke
84. Binding and Modulation of G-Quadruplex DNA and RNA Structures by Proteins
   Philipp Simon, Philipp Schult, and Katrin Paeschke

Part X. Functional Nucleic Acids
85. Targeting DNA with Triplexes
   Keith R. Fox
86. Metal Ion-Induced Changes in the Stability of DNA Duplexes
   Yusuke Takezawa and Mitsuhiko Shionoya
87. Liquid-Liquid Phase Separation and Nucleic Acids
   Hisae Tateishi-Karimata, Saki Matsumoto, and Naoki Sugimoto
88. Natural Riboswitches
   Bryan Banuelos Jara and Ming C. Hammond
89. External Stimulation-Responsive Artificial Nucleic Acids: Peptide Ribonucleic Acid (PRNA)-Programmed Assemblies
   Masahito Inagaki and Takehiko Wada
90. Targeting RNA with Small Molecules
   Peiyuan Zhang, Jessica A. Bush, Jessica L. Childs-Disney, and Matthew D. Disney

Index

About the Editor

Naoki Sugimoto received his Ph.D. degree in 1985 from Kyoto University, Japan. After postdoctoral work at the University of Rochester, USA, he joined Konan University, Kobe, Japan, in 1988 and has been a full professor since 1994. Since 2003, he has also served as director of the Frontier Institute for Biomolecular Engineering Research (FIBER) at Konan University. He has been a member of the Editorial Board of Nucleic Acids Research since 2007 and of Scientific Reports since 2015, was the first President of the Japan Society of Nucleic Acids Chemistry (JSNAC) from 2017 to 2020, and is now a Fellow of JSNAC. He received the Dr. Masao Horiba’s Award in 2004; the Distinguished Scientist Award from the International Copper Association (ICA), USA, in 2005; the Hyogo Science Award from Hyogo Prefecture, Japan, in 2006; the Chemical Society of Japan (CSJ) Award for Creative Work in 2007; the JSCC Contribution Award from the Japan Society of Coordination Chemistry (JSCC) in 2014; the Iue Cultural Award in Science and Technology in 2015; the Imbach-Townsend Award from IS3NA in 2018; and the 72nd CSJ Award (the top award of the CSJ) in 2019, among others. His research interests focus on biophysical chemistry, biomaterials, bionano-engineering, molecular design, biofunctional chemistry, biotechnology, and therapeutic application of nucleic acids.


Section Editors

Cynthia J. Burrows, Department of Chemistry, University of Utah, Salt Lake City, UT, USA

Keith R. Fox, Emeritus Professor of Biochemistry, University of Southampton, Southampton, UK


Piet Herdewijn, Department of Pharmaceutical and Pharmacological Sciences, KU Leuven, Laboratory of Medicinal Chemistry, Rega Institute, Leuven, Belgium

Janez Plavec, National Institute of Chemistry, Slovenian NMR Centre, Ljubljana, Slovenia

Eriks Rozners, Department of Chemistry, Binghamton University, Binghamton, NY, USA

Katherine Seley-Radtke, Department of Chemistry and Biochemistry, University of Maryland, Baltimore County (UMBC), Baltimore, MD, USA


Marie-Paule Teulade-Fichou, Chemistry and Modelling for the Biology of Cancer (CMBC), CNRS UMR9187-INSERM U1196, Institut Curie, Paris-Saclay University, Orsay, France

Roland Winter, Department of Chemistry and Chemical Biology, Physical Chemistry I – Biophysical Chemistry, TU Dortmund University, Dortmund, Germany

Zhen Xi, State Key Laboratory of Elemento-Organic Chemistry, Department of Chemical Biology, College of Chemistry, National Engineering Research Center of Pesticide (Tianjin), Nankai University, Tianjin, China


Chaoyong Yang, Department of Chemical Biology, Xiamen University, Xiamen, China

Contributors

Hiroshi Abe, Department of Chemistry, Graduate School of Science, Institute for Glyco-Core Research (iGCORE), Nagoya University, Nagoya, Japan
Kirill A. Afonin, Nanoscale Science Program, Department of Chemistry, University of North Carolina at Charlotte, Charlotte, NC, USA
Dae-Ro Ahn, Chemical and Biological Integrative Research Center, Biomedical Research Division, Korea Institute of Science and Technology (KIST), Seoul, Republic of Korea; Division of Bio-Medical Science and Technology, KIST School, University of Science and Technology (UST), Seoul, Republic of Korea
Yoshinobu Baba, Department of Biomolecular Engineering, Graduate School of Engineering, Nagoya University, Nagoya, Japan; Institute of Nano-Life-Systems, Institutes of Innovation for Future Society, Nagoya University, Nagoya, Japan; Institute for Quantum Life Science, National Institutes for Quantum and Radiological Science and Technology, Chiba, Japan
Sangsu Bae, Genome Medicine Institute, Medical Research Center, Seoul National University College of Medicine, Seoul, Republic of Korea; Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, Republic of Korea; Department of Biochemistry and Molecular Biology, Seoul National University College of Medicine, Seoul, Republic of Korea
Dongsheng Bai, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing, China
Bryan Banuelos Jara, Department of Chemistry, Center for Cell and Genome Science, University of Utah, Salt Lake City, UT, USA; Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA


Peter A. Beal, University of California Davis, Davis, CA, USA
Thomas J. Begley, The RNA Institute, University at Albany, State University of New York, Albany, NY, USA; Department of Biological Sciences, University at Albany, State University of New York, Albany, NY, USA
Dong Bingxue, Division of Chemistry and Biological Chemistry, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
Maurinne Bonnet, Université Côte d’Azur, CNRS, Institute of Chemistry of Nice (ICN), Nice, France
Tracy M. Bryan, Children’s Medical Research Institute, Faculty of Medicine & Health, University of Sydney, Sydney, NSW, Australia
Cynthia J. Burrows, Department of Chemistry, University of Utah, Salt Lake City, UT, USA
Jessica A. Bush, Department of Chemistry, The Scripps Research Institute, Jupiter, FL, USA
Anna Campalans, Université Paris Cité, CEA, Stabilité Génétique Cellules Souches et Radiations, LCE/iRCM/IBFJ, F-92260, Fontenay-aux-Roses, France; Université Paris-Saclay, CEA, Stabilité Génétique Cellules Souches et Radiations, LCE/iRCM/IBFJ, F-92260, Fontenay-aux-Roses, France
Christine J. Cardin, Department of Chemistry, University of Reading, Reading, UK
Tigran V. Chalikian, Department of Pharmaceutical Sciences, Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, ON, Canada
Morgan Chandler, Nanoscale Science Program, Department of Chemistry, University of North Carolina at Charlotte, Charlotte, NC, USA
Junhyuck Chang, School of Advanced Materials Science & Engineering, Sungkyunkwan University (SKKU), Suwon, South Korea
Ta-Chau Chang, Institute of Atomic and Molecular Sciences, Academia Sinica, Taipei, Taiwan
John C. Chaput, Department of Pharmaceutical Sciences, University of California, Irvine, CA, USA; Department of Chemistry, University of California, Irvine, CA, USA; Department of Molecular Biology and Biochemistry, University of California, Irvine, CA, USA; Department of Chemical and Biomolecular Engineering, University of California, Irvine, CA, USA


Khushbu Chauhan, Department of Bioscience and Biotechnology, Konkuk University, Seoul, Korea
Jeffrey Cheng, Division of Pharmaceutics & Pharmacology, College of Pharmacy, The Ohio State University, Columbus, OH, USA
Mingpan Cheng, School of Engineering, China Pharmaceutical University, Nanjing, China
Jessica L. Childs-Disney, Department of Chemistry, The Scripps Research Institute, Jupiter, FL, USA
Eunbin Cho, Department of Bioscience and Biotechnology, Konkuk University, Seoul, Korea
Himanshi Choudhary, Department of Medicine, Laboratory of Biochemistry, University of Udine, Udine, Italy
Moujab Choukeife, LIMES Institute, Center of Aptamer Research & Development, University of Bonn, Bonn, Germany
Scott B. Cohen, Children’s Medical Research Institute, Faculty of Medicine & Health, University of Sydney, Sydney, NSW, Australia
Hilal Dagci, Department of Chemistry, University at Albany, SUNY, Albany, NY, USA
Masad J. Damha, Department of Chemistry, McGill University, QC, Canada
Leyla Danai, Nanoscale Science Program, Department of Chemistry, University of North Carolina at Charlotte, Charlotte, NC, USA
Bimolendu Das, Department of Regulatory Bioorganic Chemistry, SANKEN (The Institute of Scientific and Industrial Research), Osaka University, Ibaraki, Osaka, Japan
Peter C. Dedon, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA; Antimicrobial Resistance Interdisciplinary Research Group, Singapore-MIT Alliance for Research and Technology, Singapore, Singapore
Constantinos Demonacos, Faculty of Biology Medicine and Health, School of Health Science, Division of Pharmacy and Optometry, University of Manchester, Manchester, UK
Jonathan Dickerhoff, College of Pharmacy, Medicinal Chemistry and Molecular Pharmacology, Purdue University, West Lafayette, IN, USA
Baoquan Ding, CAS Key Laboratory of Nanosystem and Hierarchical Fabrication, National Center for Nanoscience and Technology, Beijing, China


Matthew D. Disney, Department of Chemistry, The Scripps Research Institute, Jupiter, FL, USA
Yizhou Dong, Division of Pharmaceutics & Pharmacology, College of Pharmacy, The Ohio State University, Columbus, OH, USA; Department of Biomedical Engineering, Center for Clinical and Translational Science, Comprehensive Cancer Center, Dorothy M. Davis Heart & Lung Research Institute, Department of Radiation Oncology, Center for Cancer Engineering, Center for Cancer Metabolism, Pelotonia Institute for Immune-Oncology, The Ohio State University, Columbus, OH, USA
Yuanchen Dong, CAS Key Laboratory of Colloid, Interface and Chemical Thermodynamics, Beijing National Laboratory for Molecular Sciences, Institute of Chemistry, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Shi Du, Division of Pharmaceutics & Pharmacology, College of Pharmacy, The Ohio State University, Columbus, OH, USA
Maria Duca, Université Côte d’Azur, CNRS, Institute of Chemistry of Nice (ICN), Nice, France
Roberto El-Khoury, Department of Chemistry, McGill University, QC, Canada
Tamaki Endoh, Frontier Institute for Biomolecular Engineering Research (FIBER), Konan University, Kobe, Japan
Aaron M. Fleming, Department of Chemistry, University of Utah, Salt Lake City, UT, USA
Nuria Fonknechten, Université Paris Cité, CEA, Stabilité Génétique Cellules Souches et Radiations, LCE/iRCM/IBFJ, F-92260, Fontenay-aux-Roses, France; Université Paris-Saclay, CEA, Stabilité Génétique Cellules Souches et Radiations, LCE/iRCM/IBFJ, F-92260, Fontenay-aux-Roses, France
Keith R. Fox, School of Biological Sciences, University of Southampton, Southampton, UK
Jennifer Frommer, School of Chemistry, University of Birmingham, Birmingham, UK
Masayuki Fujii, Department of Biological & Environmental Chemistry, School of Humanity Oriented Science and Technology, Kindai University, Fukuoka, Japan
Hiroto Fujita, Graduate School of Integrated Basic Sciences, Nihon University, Tokyo, Japan
Miguel Garavís, Instituto de Química Física Rocasolano, CSIC, Madrid, Spain


Anton Granzhan, CMIB, CNRS UMR9187, INSERM U1196, Institut Curie, PSL Research University, Orsay, France; CMIB, CNRS UMR9187, INSERM U1196, Institut Curie, Paris-Saclay University, Orsay, France
Elisabetta Groaz, KU Leuven, Rega Institute for Medical Research, Medicinal Chemistry, Leuven, Belgium; Department of Pharmaceutical and Pharmacological Sciences, University of Padova, Padova, Italy
Piotr Guga, Department of Bioorganic Chemistry, Centre of Molecular and Macromolecular Studies, Polish Academy of Sciences, Łódź, Poland
Anne-Marie Di Guilmi, Université Paris Cité, CEA, Stabilité Génétique Cellules Souches et Radiations, LCE/iRCM/IBFJ, F-92260, Fontenay-aux-Roses, France; Université Paris-Saclay, CEA, Stabilité Génétique Cellules Souches et Radiations, LCE/iRCM/IBFJ, F-92260, Fontenay-aux-Roses, France
Ming C. Hammond, Department of Chemistry, Center for Cell and Genome Science, University of Utah, Salt Lake City, UT, USA
Yichen Han, College of Pharmacy, Medicinal Chemistry and Molecular Pharmacology, Purdue University, West Lafayette, IN, USA
Rintaro Iwata Hara, Department of Neurology and Neurological Science, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
Piet Herdewijn, KU Leuven, Rega Institute for Medical Research, Medicinal Chemistry, Leuven, Belgium
Ichiro Hirao, Institute of Bioengineering and Bioimaging (IBB), A*STAR and Xenolis Pte. Ltd., Singapore, Singapore
Moe Hirosawa, Department of Life Science Frontiers, Center for iPS Cell Research and Application, Kyoto University, Kyoto, Japan
Marcel Hollenstein, Department of Structural Biology and Chemistry, Laboratory for Bioorganic Chemistry of Nucleic Acids, Institut Pasteur, Université Paris Cité, Paris, France
Toshihiro Ihara, Division of Materials Science and Chemistry, Faculty of Advanced Science and Technology, Kumamoto University, Kumamoto, Japan
Keisuke Iida, Chiba University, Chiba, Japan
Hiroshi Inaba, Tottori University, Tottori, Japan
Masahito Inagaki, Graduate School of Science, Nagoya University, Nagoya, Japan

Md Ariful Islam Graduate School of Pharmaceutical Sciences, Osaka University, Osaka, Japan Xiao Jia Organic Chemistry, Department of Chemistry, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, Hamburg, Germany Xuelin Jin College of Agriculture, Yanbian University, Yanji, China Kyubong Jo Department of Chemistry, Sogang University, Seoul, South Korea Yousuke Katsuda Division of Materials Science and Chemistry, Faculty of Advanced Science and Technology, Kumamoto University, Kumamoto, Japan Shunsuke Kawasaki Department of Life Science Frontiers, Center for iPS Cell Research and Application, Kyoto University, Kyoto, Japan Byeang Hyean Kim Bioneer, Daejeon, Republic of Korea Dong-Eun Kim Department of Bioscience and Biotechnology, Konkuk University, Seoul, Korea Doyoun Kim Therapeutics and Biotechnology Department, Drug Discovery Platform Research Center, Korea Research Institute of Chemical Technology (KRICT), Daejeon, Republic of Korea Medicinal Chemistry and Pharmacology, Korea University of Science and Technology (UST), Daejeon, Republic of Korea Junghyun Kim Chemical and Biological Integrative Research Center, Biomedical Research Division, Korea Institute of Science and Technology (KIST), Seoul, Republic of Korea Kyeong Kyu Kim Department of Precision Medicine, Sungkyunkwan University School of Medicine, Suwon, Republic of Korea Kyoung-Ran Kim Chemical and Biological Integrative Research Center, Biomedical Research Division, Korea Institute of Science and Technology (KIST), Seoul, Republic of Korea Michiko Kimoto Institute of Bioengineering and Bioimaging (IBB), A*STAR and Xenolis Pte. 
Ltd., Singapore, Singapore Yusuke Kitamura Division of Materials Science and Chemistry, Faculty of Advanced Science and Technology, Kumamoto University, Kumamoto, Japan Jim-Marcel Knop Physical Chemistry I – Biophysical Chemistry, Department of Chemistry and Chemical Biology, TU Dortmund University, Dortmund, Germany Marija Krstic-Demonacos School of Science, Engineering and Environment, University of Salford, Salford, UK Christianna H. M. Kutz Chemistry & Biochemistry, University of Maryland, Baltimore County, Baltimore, MD, USA

Masayasu Kuwahara Graduate School of Integrated Basic Sciences, Nihon University, Tokyo, Japan Joon-Hwa Lee Department of Chemistry and RINS, Gyeongsang National University, Jinju, Republic of Korea Jung Heon Lee School of Advanced Materials Science & Engineering, Sungkyunkwan University (SKKU), Suwon, South Korea Hui-Ting Lee Department of Chemistry, University of Alabama at Birmingham, Birmingham, AL, USA SueJin Lee Bioneer, Daejeon, Republic of Korea Feng Li Frontiers Science Center for Synthetic Biology, Key Laboratory of Systems Bioengineering (MOE), Institute of Biomolecular and Biomedical Engineering, School of Chemical Engineering and Technology, Tianjin University, Tianjin, China Shuai Li Frontiers Science Center for Synthetic Biology, Key Laboratory of Systems Bioengineering (MOE), Institute of Biomolecular and Biomedical Engineering, School of Chemical Engineering and Technology, Tianjin University, Tianjin, China Xintong Li Department of Oncology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China Dongsheng Liu Key Laboratory of Bioorganic Phosphorus Chemistry and Chemical Biology, Department of Chemistry, Tsinghua University, Beijing, China Jianbing Liu CAS Key Laboratory of Nanosystem and Hierarchical Fabrication, National Center for Nanoscience and Technology, Beijing, China Alexander Lushnikov Department of Pharmaceutical Sciences, University of Nebraska Medical Center, Omaha, NE, USA Dejun Ma State Key Laboratory of Elemento-Organic Chemistry and Department of Chemical Biology, National Engineering Research Center of Pesticide (Tianjin), College of Chemistry, Nankai University, Tianjin, China Yue Ma Tokyo Medical and Dental University, Tokyo, Japan Robert B. Macgregor Jr. Department of Pharmaceutical Sciences, Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, ON, Canada Luis A. 
Marky Department of Pharmaceutical Sciences, University of Nebraska Medical Center, Omaha, NE, USA Saki Matsumoto Frontier Institute for Biomolecular Engineering Research (FIBER), Konan University, Kobe, Japan Kazunori Matsuura Tottori University, Tottori, Japan Günter Mayer LIMES Institute, Center of Aptamer Research & Development, University of Bonn, Bonn, Germany

Ian McClain Department of Chemistry, University at Albany, SUNY, Albany, NY, USA Kane T. McQuaid Department of Chemistry, University of Reading, Reading, UK Chris Meier Organic Chemistry, Department of Chemistry, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, Hamburg, Germany Jean-Louis Mergny State Key Laboratory of Analytical Chemistry for Life Science, School of Chemistry & Chemical Engineering, Nanjing University, Nanjing, China Laboratoire d’Optique et Biosciences, Ecole Polytechnique, CNRS, Inserm, Institut Polytechnique de Paris, Palaiseau cedex, France Noriaki Minakawa Graduate School of Pharmaceutical Science, Tokushima University, Tokushima, Japan Shubham Mishra Department of Chemistry, Graduate School of Science, Kyoto University, Kyoto, Japan Jing Mo College of Chemistry and Molecular Sciences, Wuhan University, Wuhan, Hubei, China Atsuya Momotake Department of Chemistry, University of Tsukuba, Tsukuba, Japan David Monchaud Institut de Chimie Moléculaire de l’Université de Bourgogne, ICMUB CNRS, Dijon, France Daniela Montesarchio Department of Chemical Sciences, University of Napoli Federico II, Naples, Italy Takashi Morii Institute of Advanced Energy, Kyoto University, Kyoto, Japan Sanjib K. Mukherjee Physical Chemistry I – Biophysical Chemistry, Department of Chemistry and Chemical Biology, TU Dortmund University, Dortmund, Germany Sabine Müller Institute for Biochemistry, University of Greifswald, Greifswald, Germany Domenica Musumeci Department of Chemical Sciences, University of Napoli Federico II, Naples, Italy Kazuo Nagasawa Tokyo University of Agriculture and Technology, Tokyo, Japan Fumi Nagatsugi Institute of Multidisciplinary Research for Advanced Materials, Tohoku University, Sendai, Miyagi, Japan Kazuhiko Nakatani Department of Regulatory Bioorganic Chemistry, SANKEN (The Institute of Scientific and Industrial Research), Osaka University, Ibaraki, Osaka, Japan

B. Nawrot Department of Bioorganic Chemistry, Centre of Molecular and Macromolecular Studies, Polish Academy of Sciences, Lodz, Poland Qi Niu The MOE Key Laboratory of Spectrochemical Analysis & Instrumentation, the Key Laboratory of Chemical Biology of Fujian Province, State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, China Satoshi Obika Graduate School of Pharmaceutical Sciences, Osaka University, Osaka, Japan Akimitsu Okamoto Research Center for Advanced Science and Technology (RCAST), The University of Tokyo, Meguro-ku, Tokyo, Japan Department of Chemistry and Biotechnology, Graduate School of Engineering, The University of Tokyo, Bunkyo-ku, Tokyo, Japan David-M. Otte LIMES Institute, Center of Aptamer Research & Development, University of Bonn, Bonn, Germany Katrin Paeschke Department of Oncology, Hematology and Rheumatology, University Hospital Bonn, Bonn, Germany Manlio Palumbo Department of Pharmaceutical and Pharmacological Sciences, University of Padova, Padova, Italy Ganesh N. Pandian Institute for Integrated Cell-Material Sciences (WPI-iCeMS), Kyoto University, Kyoto, Japan Soyoung Park Immunology Frontier Research Center, Osaka University, Osaka, Japan Tejal Patwari LIMES Institute, Center of Aptamer Research & Development, University of Bonn, Bonn, Germany Ananya Paul Department of Chemistry and Center for Diagnostics and Therapeutics, Georgia State University, Atlanta, GA, USA Róża Pawłowska Department of Bioorganic Chemistry, Centre of Molecular and Macromolecular Studies, Polish Academy of Sciences, Łódź, Poland Jinying Peng State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing, China Kevin M. 
Pham University of California Davis, Davis, CA, USA Janez Plavec Slovenian NMR Centre, National Institute of Chemistry, Ljubljana, Slovenia Faculty of Chemistry and Chemical Technology, University of Ljubljana, Ljubljana, Slovenia EN-FIST Center of Excellence, Ljubljana, Slovenia

Xiaogang Qu Laboratory of Chemical Biology and State Key Laboratory of Rare Earth Resource Utilization, Changchun Institute of Applied Chemistry, Chinese Academy of Science, Changchun, Jilin, P. R. China University of Science and Technology of China, Hefei, Anhui, P. R. China Arivazhagan Rajendran Institute of Advanced Energy, Kyoto University, Kyoto, Japan Sara N. Richter Department of Molecular Medicine, University of Padua, Padua, Italy James Roll Department of Computer Science, University of Findlay, Findlay, OH, USA Floyd E. Romesberg Synthorx, a Sanofi Company, La Jolla, CA, USA Maksim Royzen Department of Chemistry, University at Albany, SUNY, Albany, NY, USA Eriks Rozners Department of Chemistry, Binghamton University, Binghamton, NY, USA Emanuela Ruggiero Department of Molecular Medicine, University of Padua, Padua, Italy Hirohide Saito Department of Life Science Frontiers, Center for iPS Cell Research and Application, Kyoto University, Kyoto, Japan Noriko Saito-Tarashima Graduate School of Pharmaceutical Science, Tokushima University, Tokushima, Japan Shinsuke Sando The University of Tokyo, Tokyo, Japan Kazuki Sato Department of Medicinal and Life Sciences, Faculty of Pharmaceutical Sciences, Tokyo University of Science, Noda, Chiba, Japan Philipp Schult Department of Oncology, Hematology and Rheumatology, University Hospital Bonn, Bonn, Germany Katherine L. Seley-Radtke Chemistry & Biochemistry, University of Maryland, Baltimore County, Baltimore, MD, USA Tomonori Shibata Department of Regulatory Bioorganic Chemistry, SANKEN (The Institute of Scientific and Industrial Research), Osaka University, Ibaraki, Osaka, Japan Mitsuhiko Shionoya Department of Chemistry, Graduate School of Science, The University of Tokyo, Tokyo, Japan M. Sierant Department of Bioorganic Chemistry, Centre of Molecular and Macromolecular Studies, Polish Academy of Sciences, Lodz, Poland

Philipp Simon Department of Oncology, Hematology and Rheumatology, University Hospital Bonn, Bonn, Germany Claudia Sissi Department of Pharmaceutical and Pharmacological Sciences, University of Padova, Padova, Italy Beomjong Song Genome Medicine Institute, Medical Research Center, Seoul National University College of Medicine, Seoul, Republic of Korea Kfir B. Steinbuch Department of Chemistry & Biochemistry, University of California, San Diego, CA, USA Dmitry A. Stetsenko Russo-Franco-Japanese Laboratory of Bionanotechnology, Department of Physics, Novosibirsk State University, Novosibirsk, Russia Laboratory of Nucleic Acid Chemistry, Institute of Cytology and Genetics, Russian Academy of Sciences, Siberian Branch, Novosibirsk, Russia Vinod Kumar Subramani Department of Precision Medicine, Sungkyunkwan University School of Medicine, Suwon, Republic of Korea Naoki Sugimoto Frontier Institute for Biomolecular Engineering Research (FIBER), Konan University, Kobe, Japan Graduate School of Frontiers of Innovative Research in Science and Technology (FIRST), Konan University, Kobe, Japan Hiroshi Sugiyama Institute for Integrated Cell-Material Sciences (WPI-iCeMS), Kyoto University, Kyoto, Japan Department of Chemistry, Graduate School of Science, Kyoto University, Kyoto, Japan Priyannth Ramasami Sundharbaabu School of Advanced Materials Science & Engineering, Sungkyunkwan University (SKKU), Suwon, South Korea P. 
Szczupak Department of Bioorganic Chemistry, Centre of Molecular and Macromolecular Studies, Polish Academy of Sciences, Lodz, Poland Hiromi Takahashi Department of Biomolecular Engineering, Graduate School of Engineering, Nagoya University, Nagoya, Japan Shuntaro Takahashi Frontier Institute for Biomolecular Engineering Research (FIBER), Konan University, Kobe, Japan Shigeori Takenaka Department of Applied Chemistry, Kyushu Institute of Technology, Kitakyushu, Japan Yusuke Takezawa Department of Chemistry, Graduate School of Science, The University of Tokyo, Tokyo, Japan Hisae Tateishi-Karimata Frontier Institute for Biomolecular Engineering Research (FIBER), Konan University, Kobe, Japan

Marie-Paule Teulade-Fichou CMIB, CNRS UMR9187, INSERM U1196, Institut Curie, PSL Research University, Orsay, France CMIB, CNRS UMR9187, INSERM U1196, Institut Curie, Paris-Saclay University, Orsay, France Joy E. Thames Chemistry & Biochemistry, University of Maryland, Baltimore County, Baltimore, MD, USA Yitzhak Tor Department of Chemistry & Biochemistry, University of California, San Diego, CA, USA Ting-Yuan Tseng Institute of Atomic and Molecular Sciences, Academia Sinica, Taipei, Taiwan Ryosuke Ueki The University of Tokyo, Tokyo, Japan Daniela Verga CMIB, CNRS UMR9187, INSERM U1196, Institut Curie, PSL Research University, Orsay, France CMIB, CNRS UMR9187, INSERM U1196, Institut Curie, Paris-Saclay University, Orsay, France Takehiko Wada Institute of Multidisciplinary Research for Advanced Materials (IMRAM), Tohoku University, Sendai, Japan Takeshi Wada Department of Medicinal and Life Sciences, Faculty of Pharmaceutical Sciences, Tokyo University of Science, Noda, Chiba, Japan Zoë A. E. 
Waller Drug Discovery, UCL School of Pharmacy, London, WC1N 1AX, UK Yajun Wang Institute of Basic Medicine and Cancer (IBMC), Chinese Academy of Sciences, The Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), Hangzhou, China College of Pharmaceutical Sciences, Soochow University, Suzhou, China Yueyao Wang State Key Laboratory of Coordination Chemistry, Department of Biomedical Engineering, College of Engineering and Applied Sciences, Chemistry and Biomedicine Innovation Center (ChemBIC), Jiangsu Key Laboratory of Artificial Functional Materials, Nanjing University, Nanjing, China Dongying Wei State Key Laboratory of Coordination Chemistry, Department of Biomedical Engineering, College of Engineering and Applied Sciences, Chemistry and Biomedicine Innovation Center (ChemBIC), Jiangsu Key Laboratory of Artificial Functional Materials, Nanjing University, Nanjing, China Lang Wenchao Division of Chemistry and Biological Chemistry, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore

Xiaocheng Weng College of Chemistry and Molecular Sciences, Wuhan University, Wuhan, Hubei, China W. David Wilson Department of Chemistry and Center for Diagnostics and Therapeutics, Georgia State University, Atlanta, GA, USA Roland Winter Physical Chemistry I – Biophysical Chemistry, Department of Chemistry and Chemical Biology, TU Dortmund University, Dortmund, Germany Junzhou Wu Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA Antimicrobial Resistance Interdisciplinary Research Group, Singapore-MIT Alliance for Research and Technology, Singapore, Singapore Lingling Wu Institute of Molecular Medicine, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China Zhen Xi State Key Laboratory of Elemento-Organic Chemistry and Department of Chemical Biology, National Engineering Research Center of Pesticide (Tianjin), College of Chemistry, Nankai University, Tianjin, China Bengang Xing Division of Chemistry and Biological Chemistry, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore School of Chemical and Biomedical Engineering, Nanyang Technological University, Singapore, Singapore Luigi E. Xodo Department of Medicine, Laboratory of Biochemistry, University of Udine, Udine, Italy Can Xu Laboratory of Chemical Biology and State Key Laboratory of Rare Earth Resource Utilization, Changchun Institute of Applied Chemistry, Chinese Academy of Science, Changchun, Jilin, P. R. 
China Yan Xu Division of Chemistry, Department of Medical Sciences, Faculty of Medicine, University of Miyazaki, Miyazaki, Japan Yasuhiko Yamamoto Department of Chemistry, University of Tsukuba, Tsukuba, Japan Chaoyong Yang Institute of Molecular Medicine, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China The MOE Key Laboratory of Spectrochemical Analysis & Instrumentation, the Key Laboratory of Chemical Biology of Fujian Province, State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, China

Danzhou Yang College of Pharmacy, Medicinal Chemistry and Molecular Pharmacology, Purdue University, West Lafayette, IN, USA Purdue Center for Cancer Research, West Lafayette, IN, USA Department of Chemistry, Purdue University, West Lafayette, IN, USA Dayong Yang Frontiers Science Center for Synthetic Biology, Key Laboratory of Systems Bioengineering (MOE), Institute of Biomolecular and Biomedical Engineering, School of Chemical Engineering and Technology, Tianjin University, Tianjin, China Takao Yasui Department of Biomolecular Engineering, Graduate School of Engineering, Nagoya University, Nagoya, Japan Japan Science and Technology Agency (JST), PRESTO, Saitama, Japan Institute of Nano-Life-Systems, Institutes of Innovation for Future Society, Nagoya University, Nagoya, Japan Chengqi Yi State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing, China Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, China Synthetic and Functional Biomolecules Center, College of Chemistry and Molecular Engineering, Peking University, Beijing, China Hanyang Yu State Key Laboratory of Coordination Chemistry, Department of Biomedical Engineering, College of Engineering and Applied Sciences, Chemistry and Biomedicine Innovation Center (ChemBIC), Jiangsu Key Laboratory of Artificial Functional Materials, Nanjing University, Nanjing, China Joanna Zell Institut de Chimie Moléculaire de l’Université de Bourgogne, ICMUB CNRS, Dijon, France Peiyuan Zhang Department of Chemistry, The Scripps Research Institute, Jupiter, FL, USA Shiwei Zhang Institute of Advanced Energy, Kyoto University, Kyoto, Japan Chenglong Zhao Organic Chemistry, Department of Chemistry, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, Hamburg, Germany R&D Center for Nucleic Acid Drug, CSPC Pharmaceutical Group Limited, Shanghai, China Jun Zhou State Key Laboratory of Analytical Chemistry for Life Science, School of 
Chemistry & Chemical Engineering, Nanjing University, Nanjing, China Xiang Zhou College of Chemistry and Molecular Sciences, Wuhan University, Wuhan, Hubei, China Craig L. Zirbel Department of Mathematics and Statistics, Bowling Green State University, Bowling Green, OH, USA

Part I Physical Chemistry of Nucleic Acids

1. High-Pressure Single-Molecule Studies on Non-canonical Nucleic Acids and Their Interactions

Sanjib K. Mukherjee, Jim-Marcel Knop, and Roland Winter

Contents
Introduction
Why High-Pressure Studies on Biomolecular Systems?
Materials and Methods
Results and Discussion
Pressure Effects on Nucleic Acid Structures
Effect of High Hydrostatic Pressure on the Conformational Dynamics of DNA Hairpins
Effect of High Pressure on the Conformational Dynamics of G-Quadruplexes
Effect of High Pressure on the Conformational Dynamics of I-Motifs
Pressure Effects on the Interaction of Proteins with Non-canonical DNA
Conclusion
References

Abstract

High hydrostatic pressure affects the structure, dynamics, and the stability of biomolecular systems. Therefore, in order to describe the entire energy and conformational landscape and the set of parameters required for a comprehensive understanding of the general phase behavior of biomolecular systems, one needs to scan the full thermodynamic parameter space, including high pressure. In addition, high hydrostatic pressures are encountered in organisms living in the deep sea and in subseafloor ecosystems, which constitute a significant portion of the Earth’s biosphere and where pressures up to the 1000 bar level or more prevail. High pressure is also a key parameter in the context of exploring the origin and the physical limits of life on Earth or on other planets and moons. In this review, we lay out the conceptual framework for exploring conformational fluctuations, dynamical properties, and the activity of biomolecular systems using pressure perturbation, focusing in particular on non-canonical nucleic acid systems, such as DNA hairpins, G-quadruplexes and i-motifs, and their interactions. Moreover, the effects of cosolutes (salts, osmolytes), macromolecular crowding, and intrinsically disordered peptides on the conformational dynamics of non-canonical nucleic acid structures at ambient and high-pressure conditions will be discussed.

S. K. Mukherjee · J.-M. Knop · R. Winter (*)
Physical Chemistry I – Biophysical Chemistry, Department of Chemistry and Chemical Biology, TU Dortmund University, Dortmund, Germany

© Springer Nature Singapore Pte Ltd. 2023
N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_1

Introduction

The genetic blueprints of all cells are deoxyribonucleic acid (DNA) and ribosomal ribonucleic acid (rRNA), and the translation of genetic information into proteins is carried out by RNAs, such as messenger and transfer RNAs (Sinden et al. 1998; Sugimoto 2021). Unlike DNA, which mostly exists in a well-defined structure, the iconic double helix (B-helix) discovered by Watson and Crick, RNAs take on many more variable conformations (Sugimoto 2021). A variety of secondary structures arises from the difference in the 2′ position of the ribose and the different conditions of cellular compartments (Sinden et al. 1998; Wang and Vasquez 2014). In addition to the structure of the B-helix, there exist several types of non-canonical DNA structures, such as triplexes, tetraplexes, and i-motifs as well as cruciforms (four-way junctions) that include a hairpin and a three-way junction, formed by either Watson–Crick-type hydrogen bonding or Hoogsteen- or reverse Hoogsteen-type hydrogen bonds (Sugimoto 2021). Strikingly, most sequences that can adopt non-canonical DNA structures are associated with diseases (Sugimoto 2021; Wells 2009). A variety of biological processes, such as replication, transcription, translation, and reverse transcription, appear to involve these non-canonical DNA structures (Sugimoto 2021). Thus, their regulatory elements and the non-canonical DNA structures themselves appear to be promising therapeutic targets (Sugimoto 2021; Neidle 2010). Moreover, in many cellular processes, the transition from folded to unfolded states of such structures occurs in a manner that can give rise to structural polymorphism of higher order (Ha 2004). The emergence of one or more intermediate states with partially folded secondary structures could be a prerequisite for a biological process to proceed uninterruptedly.
Moreover, when proteins are bound to the secondary nucleic acid structures, their explicit dynamics may play a crucial role in regulating gene expression (Myong et al. 2007). Different conformational states of biomolecules such as nucleic acids are often masked in an ensemble experiment upon averaging. By using single-molecule (sm) methods, the conformations, dynamics, and interactions of nucleic acid structures can be discriminated and determined not only qualitatively but also quantitatively at the nanometer scale (Deniz et al. 2008). In recent years, single-molecule-based fluorescence techniques have proven to be a powerful tool for measuring the heterogeneous distribution of molecular properties, while maintaining the concept of measuring one molecule at a time. Fluorescence-based detection methods have always been in the spotlight due to their high sensitivity and high spatial and temporal resolution. Förster resonance energy transfer (FRET) is considered the most general and adaptable technique for obtaining structural and dynamic information about biomolecular systems compared to other fluorescence-based detection methods (Lerner et al. 2021). With the help of sm-FRET, it is possible to obtain detailed information about the conformational and energy landscape of biomacromolecules as well as their structural intermediates (Lerner et al. 2021). Furthermore, through sm-FRET experiments it is possible to extract information about the structural dynamics of biomolecular complexes undergoing multiple conformational alternations with unprecedented resolution and sensitivity (Lerner et al. 2021; McKinney et al. 2006; Lu 2005). For this method, the biomolecules must be labeled with distinct fluorophores acting as donors (D) and acceptors (A) that have substantial overlap in their spectral signatures. In the 1–10 nm range, the extent of non-radiative energy transfer between the donor and acceptor is determined by the distance between them, rendering sm-FRET an effective one-dimensional spectroscopic ruler (Lu 2005). The conformational dynamics of the system, either under diffusing or immobilized conditions, can be studied at a molecular level because the distance sensitivity of this approach is close to the length scale of conformational changes of biomolecular processes. Orthogonal methods, like X-ray crystallography, NMR, Fourier-transform infrared (FTIR), or circular-dichroism (CD) spectroscopy, are still necessary to assign distance changes to specific conformational changes, however. The extremely low concentrations used in sm-FRET microscopy (pM range) are an advantage in terms of probe and cost efficiency, but also limit the comparability with the orthogonal methods named above, which need micromolar (CD) or even millimolar (NMR) concentrations.
In recent years, the sm-FRET technology has not only been applied in a temperature-dependent fashion, but more demanding experiments under high-pressure conditions have also been developed further and applied to probe the conformational dynamics of nucleic acid systems. This review focuses on the high-pressure sm-FRET technology and its applications.
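
As a brief illustration of the "spectroscopic ruler" idea described above, the following Python sketch evaluates the standard Förster relation between transfer efficiency and donor-acceptor distance, E = 1/(1 + (r/R0)^6), together with its inverse. The relation itself is textbook FRET theory rather than something spelled out in this section. The Förster radius R0 = 5 nm is an assumed, purely illustrative value (R0 depends on the dye pair and its environment), and the function names are hypothetical.

```python
def fret_efficiency(r_nm: float, r0_nm: float) -> float:
    """Transfer efficiency E = 1 / (1 + (r/R0)^6) for a donor-acceptor distance r."""
    if r_nm < 0 or r0_nm <= 0:
        raise ValueError("distance must be >= 0 and Foerster radius must be > 0")
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


def distance_from_efficiency(efficiency: float, r0_nm: float) -> float:
    """Invert the Foerster relation: r = R0 * (1/E - 1)^(1/6), valid for 0 < E < 1."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


# Assumed, illustrative Foerster radius in nm (not a value from this chapter).
R0_NM = 5.0

if __name__ == "__main__":
    # Scan the 1-10 nm window mentioned in the text.
    for r in (1.0, 2.0, 4.0, 5.0, 6.0, 8.0, 10.0):
        print(f"r = {r:4.1f} nm  ->  E = {fret_efficiency(r, R0_NM):.3f}")
```

Under these assumptions the efficiency drops from nearly 1 to nearly 0 between roughly 0.5 R0 and 2 R0, which is why distance changes are resolved best when the labeled positions sit close to R0. Experimentally measured efficiencies would normally need instrument-specific corrections before such an inversion is meaningful.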

Why High-Pressure Studies on Biomolecular Systems?

In order to describe the entire energy and conformational landscape and the set of parameters required to provide an understanding of the general phase behavior of biomolecular systems, one needs to examine the full parameter space. Next to the chemical potential (activity) of the constituents and the temperature, pressure is the third important thermodynamic parameter. Pressure-dependent studies help, for example, to identify the often low fractional population of conformers (conformational substates) under unperturbed ambient conditions (Akasaka and Matsuki 2015; Winter 2019; Silva et al. 2014; Harish et al. 2022). Because conformational substates generally differ only by small energy differences, separation by purely energetic means (e.g., by a temperature change) is often difficult to achieve. In such circumstances, pressure modulation can provide an efficient means to redistribute the population via volume differences. By favoring the structure with smaller partial molar volume, pressure shifts the equilibrium toward a state with smaller overall volume, in accordance with Le Châtelier’s principle (Winter 2019). Pressures relevant to studies of biochemical systems generally range from 1 bar to several kbar (for the pressure unit, either bar or MPa is used; 1 bar = 0.1 MPa = 0.9869 atm, 10 kbar = 1 GPa). Such pressures essentially alter intermolecular distances, affect conformations and supramolecular structures of biomolecular systems, but generally do not change covalent bonds (exceptions can be -S-S- bonds). The quantitative description of the effect of pressure on any chemical equilibrium and reaction rate is given by the following equations (Winter 2019):

$$\left(\frac{\partial \ln K(p)}{\partial p}\right)_T = -\frac{\Delta V}{RT}, \qquad \left(\frac{\partial \ln k(p)}{\partial p}\right)_T = -\frac{\Delta V^{\ddagger}}{RT} \tag{1}$$

where K(p) is the pressure-dependent equilibrium constant, k(p) is the rate constant of the reaction (or process), and ΔV and ΔV‡ are the associated reaction and activation volumes under standard-state conditions (e.g., infinite dilution), respectively (Fig. 1); R is the universal gas constant and T the Kelvin temperature. Volume changes result not only from changes in the biomolecule itself (including its internal cavities) but comprise also the volume change of the solvent shell surrounding the biomolecule (including interfacial solvent-voids). Moreover, formation of quaternary/higher order contacts of biomolecules can create (mostly hydrophobic) voids that exclude water molecules, which would also contribute to the folded state occupying an effectively larger volume compared to the overall van der Waals volume of the constituents. The volumetric parameters reflect and, thus, can be used to characterize intra- and intermolecular interactions stabilizing biomolecular interactions and driving their reactions. Hydration (or generally solvation) is an important component of these interactions that often contributes largely to both the energetics of the process and the volume change observed.

Fig. 1 Volume profile of a reversible bimolecular reaction A + B ⇌ AB and definition of the relevant thermodynamic and kinetic volumetric parameters (f forward, r reverse reaction; the difference between the partial molar volume of the transition state [AB]‡, V‡, and that of the reactants is the activation volume, ΔV‡ = V‡ − V_A − V_B; ΔV = V_AB − V_A − V_B is the volume change of the reaction)

The activation volume, ΔV‡, represents, stricto sensu, the difference between the partial molar volumes of the transition state and the initial state, and its magnitude informs on the rate change that can be expected when running the reaction at a given pressure. Any reaction involving a negative ΔV‡, i.e., if the transition state has a smaller volume than the reactants’ partial volumes, will be accelerated under pressure, and vice versa. It can be determined by measuring the effect of pressure on the rate constant, k(p), which characterizes the chemical process (Eq. 1). According to Eyring’s transition state theory, the rate constant (ignoring the transmission coefficient across the transition state barrier (‡) and assuming dilute solutions) is given by

$$k(p) = \frac{k_{\mathrm{B}} T}{h}\, e^{-\Delta G^{\ddagger}/(RT)} = \frac{k_{\mathrm{B}} T}{h}\, e^{-\left[\Delta G^{\ddagger}(1\ \mathrm{bar}) + \Delta V^{\ddagger}\,(p - 1\ \mathrm{bar})\right]/(RT)} \tag{2}$$

with kB the Boltzmann constant and h Planck's constant. The prefactor kBT/h can be viewed as an attempt frequency, i.e., the number of times per second that the system vibrates along the reaction coordinate. ΔG‡ = ΔH‡ − TΔS‡ is the Gibbs energy of activation of the reaction. Again, one should keep in mind that the thermodynamic response functions are determined by the changes of the entire system, which includes structural effects from both the biomolecule and the surrounding solvent. Hence, the activation volume, ΔV‡, includes contributions from steric and solvational factors in addition to changes due to bond breaking and bond making (Harish et al. 2022). The magnitude and the sign of the activation volume can therefore provide information about the mechanism of the reaction. For example, ΔV‡ is on the order of +10 mL mol⁻¹ for homolytic bond cleavage and −20 mL mol⁻¹ for an ionization reaction (e.g., an electrostrictive effect in polar solvents due to a higher density of hydration water around exposed charged residues). The effects of the medium can be diverse since they may result from electrostatic, dispersion, and repulsive forces as well as specific interactions such as hydrogen bonding. The most significant contribution originates from changes in polarity, such as by electrostriction. For example, the contraction of a dielectric medium (solvent) surrounding an ion, ΔV‡electr, is given in rough approximation by the Drude-Nernst equation: ΔV‡electr = −(N_A q²/(2rε))(d ln ε/dp), with N_A Avogadro's number, q = ze the ionic charge, r the ionic radius, and ε the dielectric permittivity of the medium (e.g., d ln ε/dp ≈ +4 × 10⁻³ bar⁻¹ for water at 20 °C). It follows that ΔV‡electr varies greatly for reactions in which charges are generated or neutralized. The volume change of the reaction, ΔV, i.e., the difference in the partial molar volumes of the products and reactants, can be determined by measuring the pressure dependence of the equilibrium constant, K(p) (Eq. 1), and provides information about changes in steric properties and hydration of the biomolecule in the course of the process (Winter 2019; Harish et al. 2022). For example, ΔV = −21 mL mol⁻¹ for


the dissociation of water (H₂O → H⁺ + OH⁻), the negative volume change being due to the electrostrictive effect. For an equilibrium between two different states (A, B) of a system with mole fractions x_A and x_B, the pressure-dependent equilibrium constant K(p) relative to that at atmospheric pressure, K(1 bar), is given to second order in pressure, p (in bar), by (Winter 2019):

$$\ln\frac{K(p)}{K(1\,\mathrm{bar})} = -\frac{\Delta V}{RT}\,(p - 1\,\mathrm{bar}) + \frac{\Delta\kappa_T}{2RT}\,(p - 1\,\mathrm{bar})^{2} \tag{3}$$

where K(p) = x_B(p)/x_A(p), x_i are the mole fractions of the two states (i = A, B), and ΔV = V_B − V_A and Δκ_T are the differences in partial molar volume and partial molar isothermal compressibility (κ_T,i = −(∂V_i/∂p)_T), respectively, of the two states at ambient conditions. (In practice, the partial molar volumes are often replaced by their apparent values, and concentrations or mole fractions are used instead of thermodynamic activities for dilute solutions.) For example, assuming no difference in compressibility between two conformers in equilibrium with each other, a volume difference of ΔV = 20 mL mol⁻¹ leads to a population shift of about a factor of five upon applying a pressure ramp of 2 kbar at 298 K. Because of the logarithmic response, significant yield increments are obtained by going to quite high pressures. The compressibility term is often negligible below 2 kbar, resulting in an approximately linear dependence of the standard Gibbs (free) energy of the reaction on p with a slope of ΔV, i.e., ΔG°(p) = −RT ln K(p) = ΔG°(1 bar) + ΔV (p − 1 bar).

A significant portion of the global biosphere is found at depths exceeding 1000 meters and is therefore subjected to considerable hydrostatic pressure of several hundred bar. Hence, besides the general physical-chemical interest in using pressure, high hydrostatic pressures (HHP) are also of biological interest, e.g., for understanding the physiology of organisms that live in high-pressure habitats and thrive at pressures of up to about 1000 bar or more in the deep sea (Akasaka and Matsuki 2015; Daniel et al. 2006). HHP biophysics studies have been used for many years to study the thermodynamics, kinetics, and dynamics of proteins and membranes, but pressure studies on the conformational dynamics of nucleic acids are rare and have only recently been performed using single-molecule techniques (Macgregor 2002; Son et al. 2014; Fan et al. 2011; Patra et al. 2018, 2019; Takahashi and Sugimoto 2013a; Fiore et al. 2009; Sung and Nesbitt 2020a). At ambient temperature, pressurization of long B-DNA causes only a small decrease of the Watson-Crick H-bond distance and stabilizes the double-stranded DNA structure, whereas non-canonical DNA structures, such as G-quadruplex DNA, have been shown to be quite sensitive to pressure (Fan et al. 2011; Patra et al. 2019). Therefore, pressure modulation of non-canonical nucleic acid structures, such as G-quadruplexes (G4Q), i-motifs, and hairpins, can help uncover the multitude of conformational states that such molecules can adopt. Conformational transitions often exhibit a distinct volumetric profile that depends sensitively on the packing arrangement of the structural units in their respective states, and they are generally sensitive to hydration changes. Here, we discuss the intricate details of different


secondary structures of DNA and RNA, such as hairpins, G-quadruplexes, and i-motifs, under high-pressure conditions, using sm-FRET as the main tool to reveal the population distribution and dynamics of their conformational substates along with various conformational attributes. In addition to the intrinsic conformational landscape of these systems, their conformational diversity is increased by intermolecular interactions with proteins as well as with cosolutes present in the cellular milieu. The effects of these interactions on non-canonical nucleic acid structures at ambient and high-pressure conditions also need to be discussed. In fact, the biological cell is an extremely crowded environment containing osmolytes, molecular chaperones, and macromolecular crowders, which is known to have a profound impact on the stability, conformational dynamics, and functionality of its biomolecules. As a result, traditional in vitro experiments with dilute buffers may not correctly represent the folding landscape and dynamics of the biomolecules in vivo. For example, the presence of crowders reduces the space available for a given biomolecule. Consequently, the biomolecule tends to adopt a more compact conformation to maximize the number of accessible microstates of the entire biomolecule-solvent system, i.e., the crowders impose an entropically driven excluded volume effect which promotes folding of biomolecules and compaction of biomolecular assemblies. This ubiquitous stabilization mechanism is commonly referred to as macromolecular crowding. An example of the variety of cosolvents also present in the cell is the osmolyte trimethylamine-N-oxide (TMAO). TMAO is commonly found in marine organisms and accumulates in deep-sea fish and amphipods to an extent that increases with ocean depth. TMAO has been shown to act as a chemical chaperone to stabilize proteins and nucleic acid structures, and to preserve enzyme function at high pressures.
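The pressure dependence of the equilibrium and rate constants described in the Introduction (Eqs. 2 and 3) can be illustrated with a short numerical sketch. The function names, the default temperature, and the unit conventions (volumes in mL mol⁻¹, pressures in bar) are assumptions made for this illustration only and are not taken from the cited studies.

```python
import math

R = 8.314462618  # universal gas constant, J mol^-1 K^-1


def equilibrium_ratio(delta_v_ml, pressure_bar, temperature_k=298.0,
                      delta_kappa_ml_per_bar=0.0):
    """Return K(p)/K(1 bar) according to Eq. 3.

    delta_v_ml: reaction volume in mL mol^-1 (hypothetical input).
    delta_kappa_ml_per_bar: compressibility difference in mL mol^-1 bar^-1.
    """
    dp_pa = (pressure_bar - 1.0) * 1.0e5          # bar -> Pa
    dv = delta_v_ml * 1.0e-6                       # mL mol^-1 -> m^3 mol^-1
    dk = delta_kappa_ml_per_bar * 1.0e-11          # -> m^3 mol^-1 Pa^-1
    ln_ratio = (-dv * dp_pa + 0.5 * dk * dp_pa ** 2) / (R * temperature_k)
    return math.exp(ln_ratio)


def rate_ratio(delta_v_act_ml, pressure_bar, temperature_k=298.0):
    """Return k(p)/k(1 bar) according to Eq. 2 (activation volume in mL mol^-1)."""
    dp_pa = (pressure_bar - 1.0) * 1.0e5
    dv_act = delta_v_act_ml * 1.0e-6
    return math.exp(-dv_act * dp_pa / (R * temperature_k))


# A 20 mL mol^-1 volume decrease and a 2 kbar pressure ramp at 298 K
print(equilibrium_ratio(-20.0, 2000.0))
# A negative activation volume accelerates the process under pressure
print(rate_ratio(-10.0, 1000.0))
```

With ΔV = −20 mL mol⁻¹ and no compressibility difference, the first call reproduces the factor of about five quoted in the text for a 2 kbar ramp at 298 K.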

Materials and Methods

Förster Resonance Energy Transfer (Lakowicz 2006)

Förster resonance energy transfer (FRET) is a radiationless energy transfer from an excited donor molecule (D) to an acceptor molecule in the electronic ground state (A). The transition dipole moment of a molecule excited by absorption of a photon can affect nearby molecules. If the excited state is energetically in the range of the excited state of a second molecule and the spatial distance is small enough (i.e., in the nm range), the energy can be transferred. This leads to three major dependencies of FRET: the distance, R, between acceptor and donor; the relative orientation of the dipole moments of the fluorophores, as described by the orientation factor, κ; and the spectral overlap, J. The rate constant of FRET depends on these quantities as follows:

$$k_\mathrm{FRET} = \mathrm{const.}\cdot\frac{\kappa^{2}\, J\, \Phi_\mathrm{D}}{n_\mathrm{s}^{4}\, R^{6}\, \tau_\mathrm{D}} \tag{4}$$


ΦD and τD are the quantum yield and the fluorescence lifetime of the donor fluorophore in the absence of an acceptor, respectively, and n_s is the refractive index of the solvent. The orientation factor, κ, corresponds to the angular part of the scalar product of the two dipole moments:

$$\kappa = \cos\varphi_\mathrm{DA} - 3\cos\varphi_\mathrm{DR}\,\cos\varphi_\mathrm{AR} \tag{5}$$

The angles φ denote the dipole orientations between donor and acceptor, φ_DA, between donor and the distance vector, φ_DR, and between acceptor and the distance vector, φ_AR. For freely diffusing particles, the squared orientation factor is κ² = 2/3. The spectral overlap, J, can be determined from the normalized fluorescence intensity of the donor, f_D(λ), and the extinction coefficient of the acceptor, ε_A(λ), both being dependent on the wavelength, λ:

$$J = \int f_\mathrm{D}(\lambda)\,\varepsilon_\mathrm{A}(\lambda)\,\lambda^{4}\,\mathrm{d}\lambda \tag{6}$$

As an example, Fig. 2 shows the normalized excitation and emission intensity of the commercial donor and acceptor fluorophores Atto 550 and Atto 647N as well as their Förster radius, R0, in water (ATTO-TEC n.d.). The fraction of energy transferred, denoted FRET efficiency, E, is defined as the number of acceptor fluorescence photons, F_A,FRET, divided by the sum of acceptor fluorescence photons, F_A,FRET, and donor fluorescence photons, F_D:

$$E = \frac{F_\mathrm{A,FRET}}{F_\mathrm{A,FRET} + F_\mathrm{D}} \tag{7}$$

The FRET efficiency can also be calculated from the lifetimes, τ, of the fluorophores:

Fig. 2 Left: Normalized excitation and emission spectra of Atto 550 (lime green and dark green) and Atto 647N (red and dark red). The red and green shaded area represents the spectral overlap, J. Right: FRET efficiency, E, vs. distance, R, with Förster radius, R0, shown as blue dot. (Plotted from ATTO-TEC data (n.d.))

$$E = 1 - \frac{\tau_\mathrm{DA}}{\tau_\mathrm{D}} \tag{8}$$

with τDA and τD being the lifetime of the donor fluorophore in the presence and absence of the acceptor fluorophore, respectively. The FRET efficiency depends on the inverse sixth power of the distance between donor and acceptor, R_DA:

$$E = \left[1 + \frac{R_\mathrm{DA}^{6}}{R_0^{6}}\right]^{-1} \tag{9}$$

R0 denotes the Förster radius, which is defined as the distance where the FRET efficiency is 50%. The Förster radius of most FRET pairs is in the range of 4–8 nm. Accordingly, the distances that can be probed with FRET typically range from about 3 to 10 nm.
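The relations between FRET efficiency, donor-acceptor distance, and donor lifetime (Eqs. 8 and 9) can be sketched in a few lines. The function names and the Förster radius of 6.5 nm used in the example call are illustrative assumptions, not values taken from the chapter.

```python
def fret_efficiency(r_da_nm, r0_nm):
    """Eq. 9: FRET efficiency from the donor-acceptor distance and Förster radius."""
    return 1.0 / (1.0 + (r_da_nm / r0_nm) ** 6)


def fret_efficiency_from_lifetimes(tau_da, tau_d):
    """Eq. 8: FRET efficiency from donor lifetimes with and without acceptor."""
    return 1.0 - tau_da / tau_d


def distance_from_efficiency(efficiency, r0_nm):
    """Invert Eq. 9 to obtain the donor-acceptor distance for 0 < E < 1."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


# Hypothetical Förster radius of 6.5 nm (assumed for illustration)
R0_NM = 6.5
for r_nm in (3.0, 5.0, 6.5, 8.0, 10.0):
    print(r_nm, round(fret_efficiency(r_nm, R0_NM), 3))
```

Because of the sixth-power dependence, the efficiency changes steeply only in a narrow window around R0, which is why the usable distance range is limited to a few nanometers on either side of the Förster radius.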

Confocal Microscopy Setup (Patra et al. 2018, 2019)

Originally developed to enhance the contrast of microscopy images, confocal microscopy has become a powerful technique in combination with correlative analysis. In confocal microscopy, only a tiny spot of the sample is illuminated. A pinhole blocks all light not arising from the small confocal volume chosen. In imaging methods, an immobilized sample is scanned spot by spot, while in correlative techniques, like fluorescence correlation spectroscopy (FCS) or single-molecule FRET (sm-FRET) microscopy, the fluorescence intensity fluctuations of a fixed spot are observed over time. Due to the small confocal volume of about 1 fL and concentrations below the nanomolar range, correlative analysis allows the calculation of physicochemical properties like the diffusion coefficient, hydrodynamic radius, and the concentration of the fluorescent molecule. Carefully chosen labeling with a FRET pair facilitates the measurement of intramolecular distances, enabling the differentiation of conformational states of the biomolecule. A common type of microscopy setup for these tasks, including measurements at high-pressure conditions, is depicted in Fig. 3. Here, the donor is excited with a ps pulsed diode laser at 560 nm, whereas the acceptor molecules are excited at 635 nm. Narrow-band filters ensure that no parasitic light reaches the sample. PIE excitation (see below) is realized with a pulse train of alternating colors. A dual-band dichroic mirror reflecting 560 nm and 635 nm guides the light to a high numerical aperture apochromatic objective (UPlanSApo 60×, NA 1.2), which finally focuses the light on a confocal volume of about 1 fL. The fluorescence from the excited molecules is collected with the same objective and focused on a 50 μm diameter pinhole to enable confocal detection. Donor and acceptor emission is separated by means of a dichroic longpass filter. Suitable bandpass filters are inserted to eliminate the respective excitation wavelength and to minimize spectral crosstalk. Fluorescence light is detected with two single-photon avalanche diodes (SPAD) using Time-Correlated Single Photon Counting (TCSPC). The data are stored in the Time-Tagged Time-Resolved mode, which records every detected photon


Fig. 3 Confocal microscopy setup for single-molecule FRET microscopy employing the PIE technique. Donor excitation laser light (560 nm) and acceptor excitation light (635 nm) are coupled into an optical fiber. The excitation light is guided to a dichroic mirror and reflected into a high-aperture objective (water immersion, 60×, NA 1.2). The objective focuses the light on the sample and collects the fluorescence light from the focal spot. After passing the dichroic mirror, excitation wavelengths are blocked by a filter and out-of-focus light is blocked by a 50 μm pinhole. The emission light is split with a longpass dichroic mirror into donor and acceptor emission light. Narrow bandpass filters minimize the crosstalk of the channels before detection by single-photon avalanche detectors (SPAD). The fused silica capillary (outer diameter 360 μm, inner diameter 50 μm) and high-precision moving stage enable measurements under high-pressure conditions. High pressure is generated by a manually operated piston screw pump. For ambient pressure conditions, high-precision thickness coverslips are used

with its individual timing and detection channel information, which is the basis for the subsequent analysis.


Time-Correlated Single Photon Counting (Wahl et al. 2013)

In diffusion-based sm-FRET measurements, the confocal volume is excited with laser pulses. With each pulse, a clock is started, which is stopped by the detection of a fluorescence photon by a SPAD. In Time-Correlated Single Photon Counting (TCSPC), a common technique to measure fluorescence decays in the time domain, each detected photon is saved together with its time information, i.e., the overall time of detection since the start of the experiment and the time between the laser pulse and the arrival of the photon at the SPAD, and with optional channel information in case of multiple detectors. This allows the calculation of FRET efficiency histograms from the photons counted as well as of the fluorescence lifetime, and enables the subsequent (auto- or cross-) correlation analysis.

Pulsed Interleaved Excitation (PIE) (Rüttinger et al. 2006)

When sm-FRET microscopy is used to analyze the conformation of single molecules, incomplete labeling of the molecules may become a problem, leading to artifacts in the form of low FRET efficiency peaks from donor-only labeled species. To exclude those artifacts from the analysis, the Pulsed Interleaved Excitation (PIE) technique can be used (Fig. 4). After every donor excitation pulse and fluorescence detection, a second pulse that directly excites the acceptor verifies the presence of the acceptor fluorophore. F_PIE denotes the number of photons detected after direct excitation of the acceptor. For a double-labeled molecule, the photon stoichiometry, S_EFF, defined as the ratio of photons emitted after excitation of the donor (donor and acceptor emission channels) and the sum of all emitted photons after donor and direct acceptor excitation,

$$S_\mathrm{EFF} = \frac{F_\mathrm{A,FRET} + F_\mathrm{D}}{F_\mathrm{A,FRET} + F_\mathrm{D} + F_\mathrm{PIE}} \tag{10}$$

Fig. 4 Principle of pulsed interleaved excitation (PIE). In the first time-gate, the donor is excited by a laser pulse (green) and the fluorescence of donor (yellow) and acceptor (pale red) is detected for calculation of the FRET efficiency. In a second time-gate, the acceptor is excited (red) and the fluorescence is detected to discard data from molecules devoid of the acceptor fluorophore (right panel)


equals 0.5 assuming ideal labeling conditions, regardless of the FRET efficiency and the fluorophore distance within the confocal volume. In the subsequent analysis, a threshold for S_EFF, typically 0.8, leads to exclusion of data from donor-only labeled species. Acceptor-only labeled molecules pose no similar problem because the applied threshold that excludes background signal in the donor-excitation channel also removes them.
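The stoichiometry filter of Eq. 10 can be sketched as follows. The representation of a burst as a tuple of photon counts, the function names, and the interpretation of the 0.8 threshold as an upper limit on S_EFF (donor-only species have S_EFF close to 1) are assumptions made for this illustration.

```python
def stoichiometry(f_a_fret, f_d, f_pie):
    """Eq. 10: photon stoichiometry of a single burst."""
    total = f_a_fret + f_d + f_pie
    if total == 0:
        raise ValueError("burst contains no photons")
    return (f_a_fret + f_d) / total


def keep_double_labeled(bursts, s_max=0.8):
    """Keep bursts whose stoichiometry lies below the assumed upper threshold.

    bursts: iterable of (F_A,FRET, F_D, F_PIE) photon counts (hypothetical format).
    """
    return [b for b in bursts if stoichiometry(*b) < s_max]


# Two made-up bursts: a double-labeled molecule and a donor-only molecule
bursts = [(30, 20, 50), (2, 60, 1)]
print([round(stoichiometry(*b), 3) for b in bursts])
print(keep_double_labeled(bursts))
```

The second burst shows almost no signal after direct acceptor excitation, so its stoichiometry approaches 1 and it is removed by the filter.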

Data Analysis

The FRET efficiency calculated using Eq. (7) is only valid under ideal conditions. Differences in the quantum yields of the donor, ϕD, and acceptor, ϕA, as well as the wavelength-dependent detector sensitivities, α_λA and α_λD, have to be taken into account. This is achieved by the correction factor, γ, leading to the corrected FRET efficiency:

$$E = \frac{F_\mathrm{A,FRET}}{F_\mathrm{A,FRET} + \gamma\, F_\mathrm{D}} \quad \text{with} \quad \gamma = \frac{\alpha_{\lambda_\mathrm{A}}\,\phi_\mathrm{A}}{\alpha_{\lambda_\mathrm{D}}\,\phi_\mathrm{D}} \tag{11}$$
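As a minimal sketch of the correction in Eq. 11, the following functions compute γ and the corrected FRET efficiency. The function names and all numerical values (quantum yields, detector sensitivities, photon counts) are hypothetical and serve only to show the arithmetic.

```python
def gamma_factor(alpha_a, phi_a, alpha_d, phi_d):
    """Correction factor of Eq. 11 from detector sensitivities and quantum yields."""
    return (alpha_a * phi_a) / (alpha_d * phi_d)


def corrected_fret_efficiency(f_a_fret, f_d, gamma=1.0):
    """Eq. 11: gamma-corrected FRET efficiency; gamma = 1 recovers Eq. 7."""
    return f_a_fret / (f_a_fret + gamma * f_d)


# Hypothetical values: equal detector sensitivities, phi_A = 0.65, phi_D = 0.80
gamma = gamma_factor(1.0, 0.65, 1.0, 0.80)
print(gamma)
print(corrected_fret_efficiency(60, 40))          # uncorrected (Eq. 7)
print(corrected_fret_efficiency(60, 40, gamma))   # corrected (Eq. 11)
```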

If distinct FRET efficiency peaks are present and display a Gaussian form, fitting and integrating over the individual peaks allows the ratio of the different conformational states to be derived. As an example, Fig. 5b shows the histograms of the telomeric G-quadruplex at different temperatures with peaks at three different positions, centered at E ≈ 0.9, E ≈ 0.6, and E ≈ 0.3, respectively. Figure 5c shows the corresponding relative population of conformers. Using least-squares methods, these data were analyzed by fitting the experimental data with the equation:

$$\text{Relative Events} = A_1\, e^{-\left(\frac{E - B_1}{C_1}\right)^{2}} + A_2\, e^{-\left(\frac{E - B_2}{C_2}\right)^{2}} + A_3\, e^{-\left(\frac{E - B_3}{C_3}\right)^{2}} \tag{12}$$

where A_x is the height of peak x, B_x is its center position, and C_x controls the width of the peak. To prevent artifacts, it can be useful to restrict certain parameters to physically meaningful values and to constrain the center positions. There are many software

Fig. 5 Time traces of diffusion-based sm-FRET measurements of a G-quadruplex construct in 20 mM Tris-HCl, pH 7.5 (DNA concentration: ~100 pM). Donor (green) and acceptor (red) traces show the counts after donor excitation. The histograms at different temperatures are calculated from the time traces (A) and fitted with Gaussian functions (B) for determining the relative population of conformers present (C). In this case, red denotes the folded antiparallel G-quadruplex, blue the folded parallel G-quadruplex, and green the unfolded DNA. (Modified from Arns et al. (2019) with permission from Elsevier (license number: 5244750826512))


options available to perform this task and visualize the result, such as Python, R, MATLAB, or OriginLab.
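As one possible way to carry out the fit of Eq. 12 in Python, the sketch below uses `scipy.optimize.curve_fit` with parameter bounds. The synthetic histogram, the starting values, the bounds, and the function names are assumptions for illustration; the populations are estimated from the peak areas A_x·C_x·√π, which assumes that each peak is integrated over the whole real axis rather than over 0 ≤ E ≤ 1.

```python
import numpy as np
from scipy.optimize import curve_fit


def three_gaussians(e, a1, b1, c1, a2, b2, c2, a3, b3, c3):
    """Eq. 12: sum of three peaks A_x * exp(-((E - B_x) / C_x)**2)."""
    total = np.zeros_like(e, dtype=float)
    for a, b, c in ((a1, b1, c1), (a2, b2, c2), (a3, b3, c3)):
        total += a * np.exp(-(((e - b) / c) ** 2))
    return total


def fit_peaks(centers, counts, p0):
    """Least-squares fit with heights >= 0, centers in [0, 1], widths in [0.01, 0.5]."""
    lower = [0.0, 0.0, 0.01] * 3
    upper = [np.inf, 1.0, 0.5] * 3
    popt, _ = curve_fit(three_gaussians, centers, counts, p0=p0,
                        bounds=(lower, upper))
    return popt


def relative_populations(params):
    """Relative peak areas; the area of one peak is A * C * sqrt(pi)."""
    p = np.asarray(params, dtype=float).reshape(3, 3)
    areas = p[:, 0] * p[:, 2] * np.sqrt(np.pi)
    return areas / areas.sum()


# Synthetic, noise-free histogram with peaks near E = 0.3, 0.6, and 0.9
true_params = (40.0, 0.3, 0.08, 25.0, 0.6, 0.07, 60.0, 0.9, 0.05)
centers = np.linspace(0.0, 1.0, 101)
counts = three_gaussians(centers, *true_params)
popt = fit_peaks(centers, counts,
                 p0=[30, 0.25, 0.1, 30, 0.55, 0.1, 50, 0.85, 0.1])
print(relative_populations(popt))
```

Bounding the center positions and widths is one way to implement the restriction to physically meaningful values mentioned above; with real, noisy histograms the bounds and starting values need to be chosen from the data.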

Pressure Setup (Patra et al. 2018)

Measuring fluorescence under high-pressure conditions with a microscopy setup is still a major challenge. Either optical cells with thick transparent material, like special quartz glass, or thin capillaries are used. Both types of sample cells have advantages and disadvantages. Compact cells with thick windows are usually more reliable regarding handling and pressure tightness. The downside is the thick glass window, which requires special objectives with a long working distance. To achieve a glass thickness similar to that of usual coverslips, square-shaped fused silica capillaries are a good option, allowing maximum light excitation and collection for a high-NA fluorescence measurement. The capillary's polyimide coating can be burned away with a standard lighter to obtain an optical window and to glue the end of the capillary to a threaded pressure plug using two-component epoxy glue. The sample solution can be loaded into the capillary with a septum. After filling, the capillary has to be closed at the other end. The easiest way is to melt the end of the capillary with a small oxy-hydrogen or oxy-propane torch. After use, the pressure plug can be recycled by exchanging the capillary. Heating the pressure plug with a Bunsen burner removes the glue, and sonication in organic solvents and in water with detergent removes any leftovers and ashes. The square-shaped capillary needs to be centered using a high-precision 3D stage. In order to align the capillary parallel to the surface of the objective, it must also be possible to rotate the capillary holder around its own axis with high precision.

Measuring Kinetics with Immobilized Probes

The single-molecule FRET microscopy technique can also be used to determine the kinetics of conformational changes. To this end, the molecule of interest has to be immobilized and the fluorescence of single molecules has to be observed as a function of time. Immobilization of biomolecules is often achieved via streptavidin/biotin interactions. The extensively cleaned glass surface of a coverslip is first functionalized with (3-aminopropyl)triethoxysilane, followed by passivation with succinimidyl polyethylene glycol (SC-PEG) containing a small fraction of biotinylated SC-PEG. Sequential addition of streptavidin solution and the biotinylated probe leads to the surface-immobilized molecules of interest. The detailed protocol for this process is described, for example, in Ha and Selvin (2008). Another technique is to adsorb bovine serum albumin (BSA) with a small fraction of biotinylated BSA for passivation. With this method, the use of square-shaped capillaries for high-pressure experiments has become feasible (Sung and Nesbitt 2020a). To reduce photobleaching and photoblinking of the fluorophores, the addition of oxygen scavenging systems and triplet quenchers to the imaging buffer is recommended (e.g., 3 wt% D-glucose, 0.02 mg mL⁻¹ catalase, 0.1 mg mL⁻¹ glucose oxidase, and 1 mM trolox). The first measurement step is to identify the position of an immobilized labeled biomolecule. To this end, a small area (e.g., 30 μm × 30 μm) of the prepared surface


Fig. 6 Anticorrelated signal pattern of donor and acceptor fluorescence time traces (B) from an immobilized DNA hairpin construct. Cumulative FRET efficiency states lead to the histogram (C) and cumulative dwell time of either state yields the rate constant kf/u (A) after fitting with a single exponential. (Modified from Patra et al. (2019) (open access, © Oxford University Press))

is scanned with low laser intensity (~0.5–1 μW). A spot of high fluorescence is selected and fluorescence is measured in a time-dependent fashion with the same PIE pulses as in diffusion-based measurements, but with lower laser intensity (~0.5–1 μW). The time trace is recorded until bleaching occurs. A sharp loss of fluorescence counts (one-step bleaching) indicates the absence of multiple molecules in the observed spot (Patra et al. 2019). In case of a FRET pair, the fluorescence of donor and acceptor after donor excitation should be anti-correlated, as shown in Fig. 6B. Low FRET efficiency corresponds to high donor fluorescence and low acceptor fluorescence, and vice versa for high FRET efficiency. To calculate the FRET efficiency (Eq. 11), the time trace is sorted into bins of 1 ms, which is close to the diffusion time of the biomolecule across the confocal volume. The calculated FRET efficiencies can be plotted in a histogram to display the overall FRET efficiency distribution (Fig. 6C). Analysis of the time-dependent FRET efficiency can be carried out via Hidden Markov modeling to generate dwell time distributions. The dwell time is the time the molecule stays in a particular FRET state, for example, the folded or unfolded conformation, before making a transition to another state. The Hidden Markov model allows these states and the probabilities of transitions between them to be identified in an otherwise noisy time series. Integrating the raw dwell time histograms leads to cumulative distribution plots, where each point represents the number of events that have a dwell time equal to or less than the specified time (Fig. 6A). These cumulative dwell time distribution plots are then fitted to a single exponential function (Eq. 13) to deduce the rate constants for folding and unfolding, respectively (McKinney et al. 2006; Lee 2009).

$$N(t) = N_0 + A\left(1 - e^{-k_\mathrm{u/f}\, t}\right) \tag{13}$$


N(t) are the cumulated normalized events of a conformational change in one direction, while A and ku/f are the extent and the rate constant of either folding or unfolding.
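The cumulative dwell-time analysis of Eq. 13 can be sketched as follows. The dwell times here are simulated from an exponential distribution with an assumed rate constant of 7.5 s⁻¹; in practice they would come from the Hidden Markov analysis of measured time traces. The function names and starting values are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit


def cumulative_model(t, n0, a, k):
    """Eq. 13: N(t) = N0 + A * (1 - exp(-k * t))."""
    return n0 + a * (1.0 - np.exp(-k * t))


def cumulative_dwell_distribution(dwell_times):
    """Normalized number of events with a dwell time <= t, at each sorted dwell time."""
    t = np.sort(np.asarray(dwell_times, dtype=float))
    n = np.arange(1, t.size + 1) / t.size
    return t, n


def fit_rate_constant(dwell_times):
    """Fit Eq. 13 to the cumulative distribution and return (N0, A, k)."""
    t, n = cumulative_dwell_distribution(dwell_times)
    p0 = (0.0, 1.0, 1.0 / np.mean(t))  # the inverse mean dwell time as a starting guess
    popt, _ = curve_fit(cumulative_model, t, n, p0=p0)
    return popt


# Simulated dwell times (in s) for an assumed rate constant of 7.5 s^-1
rng = np.random.default_rng(0)
dwell_times = rng.exponential(scale=1.0 / 7.5, size=5000)
n0, a, k = fit_rate_constant(dwell_times)
print(k)
```

Fitting the cumulative distribution rather than the binned histogram avoids the choice of a bin width, which is one reason this representation is commonly preferred for sparse single-molecule data.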

Results and Discussion

Pressure Effects on Nucleic Acid Structures

Along with breakthroughs in the fields of biochemistry, biophysics, chemical biology, and microbiology, significant progress has also been made with regard to the application of high pressure in bioscience and biotechnology (Akasaka and Matsuki 2015; Winter 2019). In high-pressure food processing, as an example of a biotechnological application, HHP has the potential to inactivate microorganisms, viruses, and enzymes, while the effect on the flavor and nutrient content of food is low compared to usual thermal treatments. In biophysics, high pressures are being used to address basic problems, such as protein folding and recognition. Pressure modulation has been used to help determine the high level of plasticity of proteins, their folding landscape, and their interactions with ligands. Further, it has been possible to isolate folding intermediates of amyloidogenic proteins upon fibril formation and to identify novel aggregation pathways and the polymorphism of amyloid. The recognition of protein and DNA or RNA binding partners plays a crucial role in cellular processes involving nucleic acids, including replication, transcription, and translation. The primary forces involved in the formation of such complexes are often of electrostatic and hydrophobic nature, and such interactions are generally destabilized by pressure (Akasaka and Matsuki 2015). Hence, pressure-dependent studies can also help provide information about the thermodynamic driving forces and stoichiometry of protein-DNA/RNA interactions. In comparison to proteins, fewer studies have examined how pressure impacts nucleic acids (Akasaka and Matsuki 2015; Macgregor 2002; Son et al. 2014; Fan et al. 2011; Patra et al. 2018, 2019; Takahashi and Sugimoto 2013a; Fiore et al. 2009; Sung and Nesbitt 2020a). Nucleic acids are highly negatively charged polymers, with one phosphate group per nucleotide. Therefore, they interact strongly with solute cations, which markedly affect their temperature and pressure stability. Canonical DNA structures differ from proteins in that they generally lack large compressible voids. Hence, previously conducted high-pressure studies on nucleic acids found that long canonical B-DNA duplexes are generally rather pressure stable at ambient temperature (Macgregor 1998; Girard et al. 2007). Girard et al. observed only a minor distortion of the double helix upon pressure perturbation (Girard et al. 2007). The level of stabilization or destabilization of helical nucleic acid structures has generally been found to be quite low, since their conformation is mainly controlled by hydrogen bonds, which are not significantly affected by high pressure. At high temperature (the melting temperature, Tm, is typically >50 °C at low salt), double-stranded DNA undergoes a helix-to-coil transition, which is accompanied by a positive volume change, leading to positive values of the Clapeyron transition


slope dTm/dp = TmΔV/ΔH. Pressurization may also alter the geometry of particular DNA duplexes. According to Krzyzaniak et al., poly[d(G-C)], which at atmospheric pressure is in the B-form, takes on the Z-form at 10 kbar (Krzyzaniak et al. 1991). On the other hand, it has been discovered recently that non-canonical DNA structures, such as DNA hairpins and G-quadruplexes, are rather sensitive to pressure, and their pressure sensitivity varies with their base sequence and the type and concentration of the counterions present (Fan et al. 2011; Patra et al. 2018, 2019; Takahashi and Sugimoto 2013a; Fiore et al. 2009; Sung and Nesbitt 2020a). Molecular dynamics simulations showed that small RNAs are destabilized by pressure in a similar fashion to proteins (Garcia and Paschek 2008). Pressure generally destabilizes G-quadruplexes, and unfolding of short oligodeoxyribonucleotides forming a G-quadruplex with a repeat of the human telomeric sequence occurs with significant uptake of water molecules (Fan et al. 2011). Negative volume changes reflect structural and hydration changes (including elimination of internal cavities and voids) accompanying the unfolding transition. Their specific values depend on the conformation of the G-quadruplexes, the loop nucleotides that link their G-quartets, and the particular solution conditions (i.e., the salt and cosolvent concentration). Not only may unfolding of G-quadruplexes occur at HHP, but different folds of quadruplexes may also be induced. Recently, applying sm-FRET measurements, Knop et al. showed that the htel G-quadruplex unfolds above 1200 bar; however, the G-quadruplex undergoes a conformational transition from an antiparallel to a parallel state at lower pressures (~400 bar) (Knop et al. 2018). In the following sections, we will discuss selected examples based on HHP-sm-FRET data in more detail.

Effect of High Hydrostatic Pressure on the Conformational Dynamics of DNA Hairpins

Hairpins are involved in the regulation of gene expression; they act as target sites for protein recognition and as nucleation sites for higher-order nucleic acid structures. In addition to their biological functions, DNA hairpins now also serve as biosensors and play a significant role in DNA nanotechnology (Sugimoto 2021). Several methods have been employed to study the stability of hairpins, including NMR and CD spectroscopy, along with different calorimetric assays (Summers et al. 1985; Senior et al. 1988). These ensemble-based techniques generally do not provide insights into the conformational dynamics and multitude of conformations accessible to these structures. Single-molecule investigations simplify the analysis by providing crucial information about the structural dynamics and population distribution of conformational substates of the system. Sm-FRET studies have been carried out on DNA loops having polyadenine repeats or random nucleotides. Sticky variable nucleotides, which can be hybridized, are often used to introduce fluorophores. The number of base pairs between the donor and acceptor fluorophore in the closed and open state is generally designed to yield high (EFRET ≈ 0.80) and low (EFRET ≈ 0.3) FRET efficiency for these states (Patra et al. 2019; Sung and


Nesbitt 2020a, b). Conformations with EFRET values of about 0.6 have also been detected in FRET efficiency spectra under particular conditions, indicating population of conformational substates with an intermediate, partially folded structure (Mukherjee et al. 2020). Depending on the sequence of nucleotides in the stem region of the hairpin and the counterion concentration, the hairpin may undergo pronounced fluctuations between closed and fully open loop structures (Patra et al. 2019). Several studies have also been carried out to study the dynamic transition rates of the conformational change from the folded to the unfolded state of DNA hairpins at a single-molecule level at ambient pressure (McKinney et al. 2006; Patra et al. 2019; Lee 2009; Tsukanov et al. 2013). Tsukanov et al. showed that although the rate of hairpin closure is not strongly affected by the stem sequence, the closing rate increases with a decrease in the intramolecular Coulomb repulsion in the DNA backbone (Tsukanov et al. 2013). Patra et al. found that the folding rate of DNA hairpins is significantly affected by cosolutes (Fig. 7) (Patra et al. 2019). The addition of Mg2+ enhances the folding rate, kf, from 7.5  0.2 s1 in neat buffer to 13.3  0.2 s1 in the presence of 6 mM Mg2+, and at the same time the unfolding rate ku decreases from 3.13  0.05 s1 in neat buffer to 1.65  0.04 s1. Further, they found that the (viscosity-corrected) folding rate increases in the presence of the compatible osmolyte trimethylamine-N-oxide (TMAO), up to 23.2  0.4 s1 in 4 M TMAO, while 4 M of the chaotropic agent urea increases the unfolding rate ku from 3.13  0.05 s1 to 14.5  0.5 s1 (Patra et al. 2019). The unfolding rate constant in 4 M TMAO is quite similar to the neat buffer data and amounts to 4.01  0.16 s1. Hence, TMAO strongly affects the folding rate constant, while the unfolding rate remains more or less the same with respect to that in neat buffer solution. 
The kinetic stabilizing mechanism found for TMAO is quite different from that of divalent cations such as Mg2+, where both rate constants are affected. The sm-FRET technique was also applied to assess the influence of temperature and pressure on the conformational dynamics of DNA hairpins. Winter and coworkers carried out extensive work to reveal the influence of HHP on the conformational dynamics of the polyA DNA hairpin at various solution conditions. They found that over the whole pressure range covered (1–1500 bar, Fig. 8), the hairpin is always in an equilibrium between the open and closed conformations (Patra et al. 2019; Arns and Winter 2019). The closed conformation is favored at ambient temperature and atmospheric pressure, where the closed-to-open ratio is 1.6:1. Upon pressurization in neat buffer solution, the structure of the DNA hairpin is destabilized via disruption of the stem region adjacent to the loop, shifting the equilibrium towards the open conformation. At 1500 bar, the fraction of the open conformation increased to 80%. The volume change, ΔV, for the hairpin-to-coil transition was found to be −17.7 mL mol⁻¹, indicating a smaller partial molar volume of the open conformer. Note that the uncertainties in measuring volume changes from K(p) data typically amount to ~2–5 mL mol⁻¹, which is actually less than one-third of the volume of a single water molecule. Furthermore, it was found that the addition of compatible osmolytes (like TMAO) and crowding agents (like the polysaccharide Ficoll) is able to significantly reduce

S. K. Mukherjee et al.

Fig. 7 Folding (kf) and unfolding (ku) rate constants of a polyadenine hairpin from cumulated dwell time analysis in 15 mM NaCl, 20 mM TrisHCl buffer, pH 7.5, in the presence of different cosolutes. (A) neat buffer, (B) 6 mM MgCl2, (C) 4 M TMAO, (D) 4 M urea. U, F, and P denote unfolded, folded, and partially folded conformations. (Modified from Patra et al. (2019) (open access, © Oxford University Press))

the pressure sensitivity of the DNA hairpin. The addition of salt, TMAO, or Ficoll counteracts the pressure-induced destabilization of the folded conformation by rendering the unfolding process volumetrically less favorable. For example, the volume change for the conformational transition from the folded to the unfolded state is significantly reduced in magnitude (by ~10 mL mol⁻¹) in the presence of TMAO or Ficoll (Fig. 9). Divalent and trivalent cations are able to modulate the energy and conformational landscape of DNA hairpins to a greater extent than monovalent cations (Patra et al. 2019). Patra et al. found that, unlike a 12 mM K+ solution, which did not show a significant effect on the conformational stability of the DNA hairpin against pressure, the addition of Mg2+ and Co3+ strongly stabilizes the folded conformation and counteracts the destabilizing effect of pressure. The magnitude of the transition volume decreases from 17.7 mL mol⁻¹ in neat buffer to about 5 mL mol⁻¹ in the presence of 1 mM Mg2+ or Co3+ (Patra et al. 2019). The pressure stability of the

Fig. 8 (A) FRET efficiency histogram of a DNA hairpin in buffer solution. Two conformations, the closed and open state, can be attributed to the high and low FRET efficiency peaks. Pressure shifts the equilibrium toward the open state. (B) Typical concentrations of TMAO and urea found in sharks. (C) FRET efficiency histogram of the DNA hairpin in 1 M TMAO. The pressure-induced shift to the open conformation is prohibited in the presence of TMAO. (Data from Patra et al. (2019) (open access, © Oxford University Press) and Yancey et al. (2002))

DNA hairpin follows the order Co3+ > Mg2+ > K+, i.e., increases with increasing charge density of the cation added (Patra et al. 2019). The results can be rationalized by invoking changes in hydration (e.g., electrostrictive effects upon hydration of the phosphate anionic backbone, which are proportional to the solvent accessible surface area, SASA) and intrinsic void volume (e.g., due to formation of water-excluding hydrophobic voids in the folded state that are filled with water upon unfolding) of the conformers involved. The addition of salt does not only lead to charge screening, but can also change volumetric properties via hydration changes. Cations shield the highly negatively charged DNA phosphate backbone, weakening the Coulomb interactions that align the water molecules and therefore allowing DNAs to adopt a more weakly bound solvation shell with a larger effective hydration volume. At high salt concentrations, such an effect will be particularly pronounced in the unfolded state of the DNA hairpin, which has a larger surface area exposed to water. Recently, Nesbitt and coworkers succeeded in carrying out pressure-dependent sm-FRET measurements with an immobilized DNA hairpin at different NaCl concentrations. The Eyring equation (Eq. 2) was used to calculate folding and unfolding rate constants from cumulative dwell time distributions. At 50 mM NaCl, the folding rate constant, kf, decreased from 1.21 s⁻¹ to 0.41 s⁻¹ and the unfolding rate, ku, increased from 0.61 s⁻¹ to 1.02 s⁻¹ upon a pressure increase from 1 bar to 1250 bar. Van’t Hoff analysis (Eq. 1) was used to calculate the

Fig. 9 Schematic representation of the volumetric profile of the DNA hairpin at 25 °C for the unfolding transition in neat buffer and in the presence of 1 M TMAO, 1 mM MgCl2, 1 M urea, and the crowding agent 20 wt% Ficoll. U and F represent the unfolded and folded states, respectively, and ΔV° = V°u − V°f is the partial molar volume change upon unfolding. (Modified from Patra et al. (2019) (open access, © Oxford University Press))

corresponding change of transition state volumes of unfolding and folding, ΔV‡u/f, with ΔV‡f = +22.1 mL mol⁻¹ and ΔV‡u = −10.3 mL mol⁻¹. ΔVu = ΔV‡u − ΔV‡f represents the volume change upon unfolding, which amounts to −32.4 mL mol⁻¹. Comparison of different salt concentrations (25 mM and 100 mM NaCl) revealed a more pronounced salt dependence of ΔV‡u compared to ΔV‡f (ΔΔV‡f = 5.6 mL mol⁻¹, ΔΔV‡u = 12.8 mL mol⁻¹), which indicates a more folded-like transition state at the higher Na+ concentration. Due to the small changes of ΔV‡u/f observed for 75 mM to 100 mM NaCl concentration (ΔΔV‡u = 1 mL mol⁻¹, ΔΔV‡f = 0.4 mL mol⁻¹), the effect is most likely saturated at in vivo salt conditions (Sung and Nesbitt 2020a). Another study by Sung and Nesbitt (2020b) focused on diffusion-based measurements to obtain sequence-dependent thermodynamic information. In temperature-dependent studies, unfolding of a 40A DNA hairpin construct was found to be endothermic (ΔH > 0) with an entropic gain (ΔS > 0), and pressure-induced unfolding experiments indicated that the DNA hairpin takes up less volume in the unfolded vs. folded state, i.e., increasing pressure destabilizes the DNA hairpin stem region. Measurements of volume changes with single basepair resolution have been achieved, allowing ΔV for unfolding of the DNA hairpin to be linearly deconstructed into stem (ΔVbp = 1.98 cm³ mol⁻¹ bp⁻¹) and loop (ΔVloop = 7.0 cm³ mol⁻¹)

contributions by analyzing the dependence on length and complementary sequence. Further, from temperature-dependent studies it was found that single-strand loop formation is endothermic (ΔHloop = +63 kJ mol⁻¹) and compensated by a positive entropic gain (TΔSloop = 31 kJ mol⁻¹ at T = 298 K), comprising translational and internal (i.e., hindered rotational) contributions (Sung and Nesbitt 2020b). Sung and Nesbitt also investigated a lysine riboswitch, driving the progress of this method to more complex and biologically relevant systems. They found that the lysine riboswitch shifts to the unfolded conformation, including the loss of the lysine ligand, upon pressurization, which is accompanied by a volume change of about −75 mL mol⁻¹ (Sung and Nesbitt 2020c). They also observed that the effect of the concentration of the ligand lysine as well as of the cations Na+ and Mg2+ on the volume change is negligible. Furthermore, Na+ had no significant influence on the stability of the folded conformation, whereas lysine and divalent Mg2+ significantly stabilized the folded conformation at elevated pressures (Sung and Nesbitt 2020c). The presence of TMAO, even at a concentration as high as 1.25 M, led to only a moderate shift of the equilibrium toward the folded state. The magnitude of the transition volume decreased accordingly, to 40 mL mol⁻¹, thereby stabilizing the riboswitch against a pressure-induced shift of the equilibrium toward the unfolded state. Every 250 mM TMAO was found to compensate roughly for a pressure-induced equilibrium shift of 200 bar, highlighting the importance of the cosolvent TMAO in helping maintain biomolecular function of deep sea organisms under adverse high-pressure conditions. It was concluded that this may signal an elegant strategy behind controlled upregulation of compatible osmolytes such as TMAO in cells.
It promotes biomolecular folding under high pressures, but also maintains the required “on–off” two-state riboswitch sensitivity to lysine over a wide range of ocean depth profiles at medium-to-high hydrostatic pressures (Sung and Nesbitt 2020c). As we have shown, the dissection into volumetric and kinetic parameters at different solution conditions generally improves our understanding of the effects of the complex cellular environment on the conformational dynamics of biomolecules such as DNA hairpins. We anticipate that the stabilizing or destabilizing effects of cosolutes in terms of volumetric and kinetic properties will be similar for other systems. Our data also demonstrate the important role of compatible osmolytes and cellular crowding in rescuing biological functions under harsh environmental conditions, such as HHP. The equilibrium and kinetic parameters under high-pressure conditions are significantly modulated by the cosolute’s impact on the volumetric properties of the dissolved biomolecule. It is important to point out that such effects do not necessarily make high salt (i.e., cation) concentrations suitable for stabilizing nucleic acid structures and thereby helping deep sea organisms maintain cellular function at extreme pressures. This is because high concentrations of cations and anions strongly affect the structures of many biomolecules in cells and thus strongly perturb their function. Hence, instead of recruiting more ions, deep sea organisms accumulate high concentrations of small organic solutes such as TMAO that are much more compatible with the biomolecule’s structure and function.
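The activation volumes discussed earlier in this section follow from the pressure dependence of the rate constants, (∂ ln k/∂p)T = −ΔV‡/RT. As a rough illustration, the Python sketch below makes a two-point estimate from the rate constants quoted for 50 mM NaCl at 1 bar and 1250 bar. It assumes 25 °C (the temperature of that experiment is not stated in the text) and a pressure-independent ΔV‡; the function name is hypothetical. A two-point estimate need not coincide exactly with values fitted to a full pressure series.

```python
import math

R = 83.14462618  # gas constant in mL bar mol^-1 K^-1
T = 298.15       # assumed temperature in K (25 °C); not stated in the text


def activation_volume(k_low, k_high, p_low, p_high, temperature=T):
    """Two-point estimate of an activation volume in mL/mol.

    Uses (d ln k / dp)_T = -dV_act / (R T) and assumes that dV_act
    is constant over the pressure interval.
    """
    return -R * temperature * math.log(k_high / k_low) / (p_high - p_low)


# Rate constants (s^-1) at 1 bar and 1250 bar in 50 mM NaCl, as quoted in the text.
dv_fold = activation_volume(1.21, 0.41, 1.0, 1250.0)
dv_unfold = activation_volume(0.61, 1.02, 1.0, 1250.0)

# Reaction volume of unfolding as the difference of the activation volumes.
dv_reaction = dv_unfold - dv_fold

print(f"dV_act(folding)   = {dv_fold:+.1f} mL/mol")
print(f"dV_act(unfolding) = {dv_unfold:+.1f} mL/mol")
print(f"dV(unfolding)     = {dv_reaction:+.1f} mL/mol")
```

The estimate gives a positive activation volume for folding and a negative one for unfolding, in line with the signs and approximate magnitudes quoted in the text.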

Effect of High Pressure on the Conformational Dynamics of G-Quadruplexes

In recent years, among various non-canonical DNA structures, G-quadruplexes (G4Qs) have attracted particular attention for their unique conformations, their roles in gene function, and their potential as targets for chemical intervention in biological functions (Sugimoto 2021; Neidle 2010; Davis 2004). G-quadruplexes are four-stranded nucleic acid structures formed by guanine-rich sequences, with G-tetrads being formed by coplanar arrangements of four Hoogsteen-paired guanines, and are known to be highly polymorphic in nature (Sugimoto 2021). G4Qs are present in living cells and mostly found in ribosomal DNAs, RNAs, and key regions of the human genome, such as gene promoters (e.g., c-MYC, BCL-2) and telomeres (Biffi et al. 2013). They regulate several cellular functions, like gene transcription and telomere lengthening, and are involved in important cancer-related biological processes, therefore requiring particular attention among other non-canonical motifs. The human telomeric G-quadruplexes are known to take up several conformations, such as antiparallel, parallel, and hybrid conformations (Fig. 10), depending on the monovalent cationic salt (like NaCl and KCl), osmolyte concentration, as well as on other environmental parameters like temperature and pressure (Arns et al. 2019; Knop et al. 2018; del Villar-Guerra et al. 2018). Sm-FRET experiments are able to reveal the folding dynamics of the G4Qs in real time and capture the interconversion scenario between the different structural forms. The length and sequence of nucleotides in the loop regions strongly affect the kinetic rate of transition from the folded to the unfolded states. Miura et al. showed that short-looped DNA G4Qs have a more persistent structure than those with loop regions consisting of more nucleotides, resulting in reduced transition rates (Miura et al. 1995).
In the presence of both Na+ and K+, parallel and antiparallel G4Q structures exist in equilibrium, the antiparallel form having the lowest free energy (Miura et al. 1995; Ren et al. 2002). Despite the small free energy difference between the antiparallel

Fig. 10 Graphical representation of different G-quadruplex conformations of the (human) telomeric G-quadruplex sequence. The red line represents the DNA strand with the guanines (blue) forming planes (grey) via Hoogsteen base pairing (right). While the intercalating potassium ions (violet) stabilize the parallel conformation, the plane-integrated sodium ions (green) stabilize the antiparallel conformation. Dependent on the environmental conditions including temperature, pressure, salt concentration, and macromolecular crowding, different conformations are often observed to be in equilibrium with each other

and parallel conformations, interconversion between these structures may occur through partially folded intermediate states (Ying et al. 2003). Along with the nature of the monovalent cation and the number of G-quartets, hydration is a major driving force for the stability of G-quadruplexes (Sugimoto 2021; Fan et al. 2011; Olsen and Marky 2009; Yu et al. 2012; Takahashi and Sugimoto 2013b). Upon formation of G4Qs, water molecules are released. Therefore, volumetric analysis of G-quadruplexes using pressure modulation has proven very useful for understanding their conformational conversions and folding dynamics. Several studies based on different analytical tools have shown that the stability of G4Qs is significantly affected by HHP (Takahashi and Sugimoto 2013a; Macgregor 1998; Miyoshi et al. 2009). Fan et al. showed that the melting temperature of the human telomeric (htel) oligonucleotide, d[A(GGGTTA)3GGG], is significantly decreased under high-pressure conditions, which suggests destabilization upon compression (Fan et al. 2011). The transition volume of unfolding is negative accordingly, amounting to −68 mL mol⁻¹ in the presence of 20 mM Na+ and −56 mL mol⁻¹ in 100 mM Na+. In another study by Sugimoto and coworkers it was found that the thermal stability of the thrombin binding aptamer (TBA; 5′-GGTTGGTGTGGTTGG-3′) decreases upon increasing the pressure up to 4000 bar. Their study also revealed that the transition volume changes significantly, from 54.6 mL mol⁻¹ to 12.5 mL mol⁻¹, in the presence of macromolecular crowders at HHP conditions, indicating stabilization of the folded state in the presence of the crowding agent (Takahashi and Sugimoto 2013b). Although the pressure stability of G4Qs is quite well established, the conformational dynamics of G4Qs at the single-molecule level and the rates of interconversion are less explored. Using pressure-assisted sm-FRET methodology, Knop et al.
showed that pressure can significantly modulate the folding dynamics of G4Qs (Knop et al. 2018). They showed that in pure buffer, at 400 bar, the parallel conformation of htel G4-DNA prevails at the expense of the antiparallel state present at ambient pressure. Finally, beyond ~1000 bar, unfolding of the G4Q structure takes place (Fig. 11). The volume change, ΔV, for

Fig. 11 Pressure-dependent FRET efficiency histograms of the telomeric G-quadruplex construct are shown in the first panel (left). Middle: Relative population of the different FRET efficiency peaks obtained by integrating over Gaussian peaks. Red represents the antiparallel conformation, blue represents the parallel conformation, and green the unfolded G-quadruplex sequence. Right: The volume change for the transition from the antiparallel to the parallel/hybrid conformation obtained from the van’t Hoff plot. (Modified from Knop et al. (2018))

Fig. 12 Graphical sketch summarizing the results from the pressure-dependent sm-FRET measurements of the telomeric G-quadruplex in the presence of different cosolvents (urea, TMAO)

the conformational transition from the antiparallel to the parallel/hybrid structure was found to be ΔV = −26.2 mL mol⁻¹. This indicates that the parallel conformation has a lower partial molar volume compared to the antiparallel one. The antiparallel G4Q-DNA is more pressure stable in the presence of 1 M TMAO, even up to pressures as high as ~750 bar. At higher pressures, partial conformational switching from the antiparallel to the parallel conformation was observed, but no unfolding was seen for the G4Q in pure buffer solution below 1000 bar. The volume change from the antiparallel to the parallel conformation changes from −26.2 mL mol⁻¹ in buffer to −17 mL mol⁻¹ in the presence of TMAO, indicating that the cosolvent is able to partially compensate for the pressure-induced effects on the conformational dynamics of the G4Q-DNA. The stabilizing effect of TMAO is even present in a mixture of 2 M of the chaotropic agent urea and 1 M TMAO (Fig. 12). As the solvent accessible surface area (SASA) is larger in the case of the parallel conformation, which leads to a larger hydration shell of slightly higher density, pressurization may result in a more negative hydration volume, ΔVhydr, and hence an overall smaller partial molar volume, favoring the parallel structure over the antiparallel one at higher pressures. Unfolding at higher pressures leads to a further decrease of ΔV owing to a further increase in SASA and hence a decrease of ΔVhydr. Upon unfolding, the hydration of the internal cations and possibly the release of void volume (packing defects) also contribute to the decrease of the molar volume.
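The van’t Hoff-type analysis referred to above extracts ΔV from the slope of ln K versus pressure, since (∂ ln K/∂p)T = −ΔV/RT. The Python sketch below illustrates the procedure on synthetic, noise-free data generated from ΔV = −26.2 mL mol⁻¹. The pressure grid, the value of K at 1 bar, and the temperature (25 °C) are assumptions chosen for illustration and are not measured values; the function name is hypothetical.

```python
import math

R = 83.14462618  # gas constant in mL bar mol^-1 K^-1
T = 298.15       # assumed temperature, K


def fit_delta_v(pressures, ln_k, temperature=T):
    """Least-squares slope of ln K vs p, converted to dV in mL/mol."""
    n = len(pressures)
    mean_p = sum(pressures) / n
    mean_y = sum(ln_k) / n
    sxy = sum((p - mean_p) * (y - mean_y) for p, y in zip(pressures, ln_k))
    sxx = sum((p - mean_p) ** 2 for p in pressures)
    slope = sxy / sxx  # d ln K / dp in bar^-1
    return -R * temperature * slope


# Synthetic data: ln K(p) = ln K(1 bar) - dV (p - 1 bar) / (R T)
DV_TRUE = -26.2   # mL/mol, value quoted in the text
K_1BAR = 0.2      # assumed parallel/antiparallel ratio at 1 bar (illustrative)
pressures = [1.0, 200.0, 400.0, 600.0, 800.0]  # bar, assumed grid
ln_k = [math.log(K_1BAR) - DV_TRUE * (p - 1.0) / (R * T) for p in pressures]

dv_fit = fit_delta_v(pressures, ln_k)
print(f"fitted dV = {dv_fit:.1f} mL/mol")
```

With real histograms, K at each pressure would come from the ratio of integrated peak areas, and the scatter of the points around the line sets the uncertainty of ΔV.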

Effect of High Pressure on the Conformational Dynamics of I-Motifs

C-rich sequences are also prevalent in genomic DNA (Sugimoto 2021; Zeraati et al. 2018). These sequences can fold into tetraplex helical structures called i-motifs and are implicated in various biological processes including gene regulation and replication (Sugimoto 2021; Choi et al. 2011). The stability of this non-canonical nucleic

Fig. 13 Sm-FRET histograms of the telomeric i-motif at 25 °C and 1 bar at pH 5 (A) and at pH 7.5 (B). The DNA concentration was ~100 pM. (C) shows a schematic representation of the dominant i-motif conformations, and (D) shows the cytosine-hemicytosine H-bonding pattern of the i-motif. The i-motif consists of two parallel-stranded duplexes which are stabilized by intercalated hemiprotonated cytosine-cytosine (C-C+) base pairs. The protonated cytosine in the formation of an i-motif results in a strong dependence of the structure on pH. (Modified with permission from Knop et al. (2020) (© 2020 American Chemical Society))

acid conformation depends on cytosine protonation, hence leading to pH-dependent H-bond formation between cytosine and hemi-cytosine (Fig. 13) (Choi et al. 2011; Knop et al. 2020). Besides pH, extrinsic factors influencing structural transitions of i-motifs include salt concentration, cosolvents, temperature, and pressure. Recent biochemical studies have demonstrated that folded i-motif structures exist not only at low pH but also at physiological pH (Wright et al. 2017). It has also been demonstrated that intracellular factors, such as macromolecular crowding, promote the formation of i-motifs in C-rich strands even at neutral pH (Cui et al. 2013; Rajendran et al. 2010). An sm-FRET analysis revealed that the C-rich sequences present in the human telomeric region adopt a fully folded state at pH 5.5. A pH of 6.5 enhances the dynamic nature of the conformational changes, resulting in increased dominance of a partially folded state and frequent interconversion between these states (Megalathan et al. 2019). Using sm-FRET, Paul et al. showed that the folding process of the c-MYC promoter-based i-motif is not a simple

two-state transition between a random coil and a folded i-motif structure; rather, it involves a partially folded conformation as an intermediate state where the bases are not well stacked (Paul et al. 2020). Limited work has been carried out on the pressure stability of i-motifs. Lepper et al. studied the thermodynamic stability of a cytosine(C)-rich i-motif tract of DNA (5′-CCC-(ATT-CCC)3) as a function of both pH and pressure. After correcting for pressure effects on the buffer pH, the i-motif was observed to become less stable as pressure is increased at pH > 4.6, giving a negative volume change for dissociation (ΔV = −54 cm³ mol⁻¹ per mole i-motif) for T < 328 K. At pH ~4.6 and T ≈ 332 K, ΔV ≈ 0, and at lower pH, ΔV becomes positive (Lepper et al. 2019). Takahashi and Sugimoto found that formation of the DNA i-motif structure was stabilized by high pressure, with pressure presumably having a volumetric effect on the phosphate buffer that increases its acid dissociation constant (Takahashi and Sugimoto 2015). Recent work employing infrared spectroscopy by Smeller and coworkers showed that the htel i-motif is destabilized by pressure and unfolds with a negative volume change (Somkuti et al. 2020). In recent work, Winter and coworkers studied the effects of pressure on the htel i-motif at physiological pH at the single-molecule level. Results from the sm-FRET measurements showed that even at 1.5 kbar there is no evidence of a conformational change, indicating that the i-motif is packed tightly and does not have cavities in its partially folded state (Mukherjee et al. 2021). A similar observation was made regarding the structure of the i-motif in the presence of 1 M TMAO (Mukherjee et al. 2021).
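To put a volume change such as the one reported by Lepper et al. into energetic terms, the free energy shift with pressure can be estimated as ΔG(p) − ΔG(p0) = ΔV (p − p0), provided ΔV is treated as independent of pressure. The Python sketch below performs this unit conversion for ΔV = −54 cm³ mol⁻¹; the constant-ΔV assumption, the chosen pressure of 1000 bar, and the function name are illustrative assumptions and are not taken from the cited work.

```python
ML_BAR_TO_J = 0.1  # 1 mL bar = 1e-6 m^3 * 1e5 Pa = 0.1 J


def pressure_free_energy_shift(delta_v_ml_per_mol, p_bar, p_ref_bar=1.0):
    """Return dG(p) - dG(p_ref) in kJ/mol for a pressure-independent dV."""
    joule_per_mol = delta_v_ml_per_mol * (p_bar - p_ref_bar) * ML_BAR_TO_J
    return joule_per_mol / 1000.0


# dV of dissociation quoted in the text (cm^3/mol = mL/mol), evaluated at 1000 bar.
shift = pressure_free_energy_shift(-54.0, 1000.0)
print(f"dG shift at 1000 bar: {shift:.2f} kJ/mol")
```

A negative shift means that the dissociated state gains stability relative to the folded i-motif as pressure rises, which is the direction described in the text for pH > 4.6.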

Pressure Effects on the Interaction of Proteins with Non-canonical DNA

Many crucial biochemical reactions result from interactions of nucleic acids with proteins, such as during DNA replication, transcription, processing and repair of damaged DNA, and DNA rearrangement (Rohs et al. 2009; Wang and Brown 2006). Next to H-bonding, π-π, and van der Waals interactions, a number of electrostatic interactions may occur between nucleic acids and proteins, resulting in conformational and hydration changes (Takahashi and Sugimoto 2013b). Since pressure can significantly modulate such interactions, additional information about the driving forces of protein-nucleic acid interactions can be obtained by pressure modulation. Sm-FRET has also proven to be a valuable tool for elucidating nucleic acid-protein interactions on a molecular level and enables one to study dynamic interactions of the system with nanometer accuracy, also under physiologically relevant conditions. Studies of DNA-protein interactions have benefited significantly from the possibility that fluorescently labeled DNA can be used to study many of these interactions (Phelps et al. 2013; Chaurasiya and Dame 2018; Schärfen and Schlierf 2019). The influence of pressure on such interactions has not yet been widely addressed, however, and the few studies carried out on DNA-protein interactions at high pressure have relied on ensemble measurements only (Robinson and Sligar 1994; Royer et al. 1990; Merrin et al. 2011; Silva et al. 2002). In a recent study, Merrin et al.

Fig. 14 Conformation of the DNA hairpin as a function of α-Syn concentration as revealed from sm-FRET experiments. While in pure buffer solution an equilibrium between folded (red) and unfolded (green) conformational states is observed, increasing α-Syn concentration leads to an increasing population of an intermediate conformation (blue). (Reprinted from Mukherjee et al. (2020) (© 2020 Authors. Published by Wiley-VCH GmbH))

showed that interactions between a highly conserved protein involved in DNA repair in prokaryotes, RecA, and single-stranded DNA (ssDNA) were significantly affected by high pressure (Merrin et al. 2011). Using sm-FRET measurements, Mukherjee et al. showed that the conformational dynamics of a DNA hairpin composed of polyadenine residues is significantly modulated in the presence of intrinsically disordered proteins, such as α-synuclein (α-Syn) (Mukherjee et al. 2020). In the FRET histogram of the DNA hairpin, a peak emerges around 0.5–0.6, which implies formation of a partially folded state of the DNA hairpin induced by non-specific interactions with monomeric α-Syn (Fig. 14). They also observed that this intermediate conformation is pressure stable over the pressure range of 1–1500 bar, which suggests formation of a compact, void-free DNA-hairpin-α-Syn complex. They also found that protein-DNA hairpin interactions may be quite different inside aqueous two-phase systems (ATPS), such as those formed by PEG and Dextran (Fig. 15), which have been used as a simple mimic for intracellular liquid-liquid phase separated conditions. In the dense droplet phase of the ATPS, the DNA hairpin takes up one single conformer at E = 0.6 in the presence of 150 μM α-Syn, which is also pressure stable in the whole pressure range covered. In many cases, stabilization of folded or partially folded conformations of noncanonical DNA structures has been observed upon interaction with conformationally

Fig. 15 Pressure dependence of the structure of a poly-A hairpin in the presence of monomeric α-Syn (150 μM) in buffer (A, B) and ATPS consisting of 11 wt% PEG and 11 wt% Dextran (C, D). Further conditions: 25 °C, 20 mM TrisHCl, pH 7.5, 15 mM NaCl, ~50 pM DNA. (Modified from Mukherjee et al. (2020) (© 2020 Authors. Published by Wiley-VCH GmbH))

disordered proteins. The role of different conformational states of proteins and protein oligomers on the conformation of DNAs and vice versa is not well understood, however. Silva et al. showed that, depending on the protein involved, nucleic acids can either stabilize or destabilize protein oligomers (Silva et al. 2002). Owing to the general high-pressure sensitivity of oligomeric proteins, such as amyloidogenic proteins, DNA-interacting proteins can also be examined under high hydrostatic pressure to reveal additional information about the complex formed. In general, pressure forces water molecules into protein-DNA complexes at their intermolecular interface, which often contains packing defects, leading to dissociation of the complex. In a recent sm-FRET study, Knop et al. showed that α-Syn at physiologically relevant local concentrations affects single noncanonical DNA structures in a sequence-specific way (Knop et al. 2020). They found that monomeric and oligomeric α-Syn influence the conformational dynamics of G4Qs and i-motifs in different ways, leading to remodeling of their conformational substates. Aggregated α-Syn destabilizes the G4Q, leading to unfolding. In contrast, both monomeric and aggregated α-Syn enhance folding of the i-motif sequence of telomeric DNA and act like a molecular chaperone (Fig. 16). Volumetric data

Fig. 16 α-Synuclein in its oligomeric form imposes different effects on non-canonical DNA motifs. (A, B) show the sm-FRET histograms and relative population of conformers of the telomeric G-quadruplex with increasing concentration of oligomeric α-Syn. (C, D) show the pressure-dependent sm-FRET histograms and relative population of conformers of the telomeric i-motif in the presence of 100 μM oligomeric α-Syn. Both measurements were performed in 20 mM TrisHCl buffer, pH 7.5, at 25 °C. (A, B are modified and reused with permission from Knop et al. (2020) (© 2020 American Chemical Society). C, D are modified and reused from Mukherjee et al. (2021) (© 2021 The Royal Society of Chemistry))

showed that the monomeric α-Syn-i-motif complex is tightly packed and resistant to pressure, unlike the oligomeric case where pressure dissociates α-Syn aggregates. The volume change of 54 mL mol⁻¹ upon pressure perturbation indicates the formation of a cavity-rich aggregated α-Syn-i-motif complex. The insensitivity to pressure of the monomeric α-Syn-i-motif complex suggests that binding involves mainly H-bonds, which are strengthened even further by high pressure. The data presented here show a strong dependence of protein-DNA interactions on the sequence and secondary structure of the different DNA constructs (hairpins, G-quadruplexes, and i-motifs), although the protein α-Syn used is intrinsically disordered and the interactions are therefore most likely nonspecific and electrostatic in nature.

Conclusion

We have seen that pressure modulation allows us to peek into hidden structures, dynamical properties, and interactions of biomolecules such as nucleic acids, and, in turn, enables us to control the function of these biomolecules. The knowledge gained from such studies might help develop drugs, e.g., for targeting G-quadruplexes involved in oncogene promoters and telomeres. Owing to their sensitivity toward packing, conformational, and hydration changes as well as their capability to modulate intermolecular forces, such single-molecule-based pressure-dependent studies may also help develop new genetic therapeutics, such as those based on modified nucleic acids embedded in synthetic lipid mesophase structures, which are very pressure sensitive as well. In analogy to industrial chemical production applications, such as the Haber-Bosch process for producing ammonia, high pressures may also be used to optimize the yield and stereospecificity of enzymatic reactions, including those based on DNA- or RNAzymes. We may expect in the very near future that the advanced experimental tools presented here will also reveal their great potential in studies of more complex and biologically relevant systems, such as the dynamic behavior of nucleic acid-based intracellular processes. To also better understand the processes associated with the physical limits of life, extensive high-pressure studies of biological systems at different levels of complexity (genome, proteome, lipidome, metabolome, ...) are still needed, which should also take into account the particular geological settings to which extremophiles are exposed and which affect their cellular milieu. These results could not only lead to a deeper understanding of fundamental life phenomena, but also reveal previously unknown adaptive mechanisms of extremophiles.
Acknowledgments This project received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No 801459 – FP-RESOMUS and was funded by the Deutsche Forschungsgemeinschaft (DFG) under Germany’s Excellence Strategy – EXC 2033 – 390677874 – RESOLV.

References

Akasaka K, Matsuki H (2015) High pressure bioscience. In: Akasaka K, Matsuki H (eds) Subcellular biochemistry, vol 72. Springer Netherlands, Dordrecht. https://doi.org/10.1007/978-94-017-9918-8
Arns L, Winter R (2019) Liquid-liquid phase separation rescues the conformational stability of a DNA hairpin from pressure-stress. Chem Commun 55(72):10673–10676. https://doi.org/10.1039/c9cc04967c
Arns L, Knop J-M, Patra S, Anders C, Winter R (2019) Single-molecule insights into the temperature and pressure dependent conformational dynamics of nucleic acids in the presence of crowders and osmolytes. Biophys Chem 251:106190. https://doi.org/10.1016/j.bpc.2019.106190
ATTO-TEC. https://www.atto-tec.com/product_info.php?info=p103_atto-550.html; https://www.atto-tec.com/product_info.php?info=p114_atto-647n.html; https://www.atto-tec.com/?cat=c45_R-0%2D%2DWerte%2D%2DFRET%2D%2Dr-0-werte-fret.html

1

High-Pressure Single-Molecule Studies on Non-canonical Nucleic Acids. . .

33

Biffi G, Tannahill D, McCafferty J, Balasubramanian S (2013) Quantitative visualization of DNA G-quadruplex structures in human cells. Nat Chem 5(3):182–186. https://doi.org/10.1038/ nchem.1548 Chaurasiya, KR, Dame, RT (2018) Single Molecule FRET Analysis of DNA Binding Proteins. In: Peterman, E (eds) Single Molecule Analysis. Methods in Molecular Biology, vol 1665. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7271-5_12 Choi J, Kim S, Tachikawa T, Fujitsuka M, Majima T (2011) PH-induced intramolecular folding dynamics of i-motif DNA. J Am Chem Soc 133(40):16146–16153. https://doi.org/10.1021/ ja2061984 Cui J, Waltman P, Le VH, Lewis EA (2013) The effect of molecular crowding on the stability of human C-MYC promoter sequence i-motif at neutral pH. Molecules 18(10):12751–12767. https://doi.org/10.3390/molecules181012751 Daniel I, Oger P, Winter R (2006) Origins of life and biochemistry under high-pressure conditions. Chem Soc Rev 35(10):858–875. https://doi.org/10.1039/b517766a Davis JT (2004) G-quartets 40 years later: from 50 -GMP to molecular biology and supramolecular chemistry. Angew Chemie Int Ed 43(6):668–698. https://doi.org/10.1002/anie.200300589 del Villar-Guerra R, Trent JO, Chaires JB (2018) G-quadruplex secondary structure obtained from circular dichroism spectroscopy. Angew Chemie Int Ed 57(24):7171–7175. https://doi.org/10. 1002/anie.201709184 Deniz AA, Mukhopadhyay S, Lemke EA (2008) Single-molecule biophysics: at the interface of biology, physics and chemistry. J R Soc Interface 5(18):15–45. https://doi.org/10.1098/rsif. 2007.1021 Fan HY, Shek YL, Amiri A, Dubins DN, Heerklotz H, MacGregor RB, Chalikian TV (2011) Volumetric characterization of sodium-induced G-quadruplex formation. J Am Chem Soc 133(12):4518–4526. https://doi.org/10.1021/ja110495c Fiore JL, Kraemer B, Koberling F, Edmann R, Nesbitt DJ (2009) Enthalpy-driven RNA folding: single-molecule thermodynamics of tetraloop-receptor tertiary interaction. 
Biochemistry 48(11): 2550–2558. https://doi.org/10.1021/bi8019788 Garcia AE, Paschek D (2008) Simulation of the pressure and temperature folding/unfolding equilibrium of a small RNA hairpin. J Am Chem Soc 130(3):815–817. https://doi.org/10. 1021/ja074191i Girard E, Prange T, Dhaussy A-C, Migianu-Griffoni E, Lecouvey M, Chervin J-C, Mezouar M, Kahn R, Fourme R (2007) Adaptation of the base-paired double-helix molecular architecture to extreme pressure. Nucleic Acids Res 35(14):4800–4808. https://doi.org/10.1093/nar/gkm511 Ha T (2004) Structural dynamics and processing of nucleic acids revealed by single-molecule spectroscopy. Biochemistry 43(14):4055–4063. https://doi.org/10.1021/bi049973s Ha T, Selvin PR (2008) The new era of biology in singulo. In: Single-molecule techniques: a laboratory manual. CSH Press, pp 1–36 Harish B, Wang J, Hayden EJ, Grabe B, Hiller W, Winter R, Royer CA (2022) Hidden intermediates in mango III RNA aptamer folding revealed by pressure perturbation. Biophys J 121(3): 421–429. https://doi.org/10.1016/j.bpj.2021.12.037 Knop J-M, Patra S, Harish B, Royer CA, Winter R (2018) The deep sea osmolyte trimethylamine N -oxide and macromolecular crowders rescue the antiparallel conformation of the human telomeric G-quadruplex from urea and pressure stress. Chem A Eur J 24(54):14346–14351. https://doi.org/10.1002/chem.201802444 Knop J-M, Mukherjee SK, Oliva R, Möbitz S, Winter R (2020) Remodeling of the conformational dynamics of noncanonical DNA structures by monomeric and aggregated α-synuclein. J Am Chem Soc 142(43):18299–18303. https://doi.org/10.1021/jacs.0c07192 Krzyzaniak A, Sałański P, Jurczak J, Barciszewski J, Krzyźaniak A, Salański P, Jurczak J, Barciszewski J (1991) B-Z DNA reversible conformation changes effected by high pressure. FEBS Lett 279(1):1–4. https://doi.org/10.1016/0014-5793(91)80235-U Lakowicz JR (2006) In: Lakowicz JR (ed) Principles of fluorescence spectroscopy. Springer US, Boston, MA. 
https://doi.org/10.1007/978-0-387-46312-4

34

S. K. Mukherjee et al.

Lee T (2009) Extracting kinetics information from single-molecule fluorescence resonance energy transfer data using hidden Markov models. J Phys Chem B 113(33):11535–11542. https://doi. org/10.1021/jp903831z Lepper CP, Williams MAK, Edwards PJB, Filichev VV, Jameson GB (2019) Effects of pressure and pH on the physical stability of an i-motif DNA structure. ChemPhysChem 20(12):1567–1571. https://doi.org/10.1002/cphc.201900145 Lerner E, Barth A, Hendrix J, Ambrose B, Birkedal V, Blanchard SC, Börner R, Chung HS, Cordes T, Craggs TD, Deniz AA, Diao J, Fei J, Gonzalez RL, Gopich IV, Ha T, Hanke CA, Haran G, Hatzakis NS, Hohng S, Hong SC, Hugel T, Ingargiola A, Joo C, Kapanidis AN, Kim HD, Laurence T, Lee NK, Lee TH, Lemke EA, Margeat E, Michaelis J, Michalet X, Myong S, Nettels D, Peulen TO, Ploetz E, Razvag Y, Robb NC, Schuler B, Soleimaninejad H, Tang C, Vafabakhsh R, Lamb DC, Seidel CAM, Weiss S, Boudker O (2021) FRET-based dynamic structural biology: challenges, perspectives and an appeal for open-science practices. elife 10: e60416. https://doi.org/10.7554/eLife.60416 Lu HP (2005) Probing single-molecule protein conformational dynamics. Acc Chem Res 38(7): 557–565. https://doi.org/10.1021/ar0401451 Macgregor RB (1998) Effect of hydrostatic pressure on nucleic acids. Biopolymers 48(4):253–263. https://doi.org/10.1002/(SICI)1097-0282(1998)48:43.0.CO;2-F Macgregor RB (2002) The interactions of nucleic acids at elevated hydrostatic pressure. Biochim Biophys Acta Protein Struct Mol Enzymol 1595(1–2):266–276. https://doi.org/10.1016/S01674838(01)00349-1 McKinney SA, Joo C, Ha T (2006) Analysis of single-molecule FRET trajectories using hidden Markov modeling. Biophys J 91(5):1941–1951. https://doi.org/10.1529/biophysj.106.082487 Megalathan A, Cox BD, Wilkerson PD, Kaur A, Sapkota K, Reiner JE, Dhakal S (2019) Singlemolecule analysis of i-motif within self-assembled DNA duplexes and nanocircles. Nucleic Acids Res 47(14):7199–7212. 
https://doi.org/10.1093/nar/gkz565 Merrin J, Kumar P, Libchaber A (2011) Effects of pressure and temperature on the binding of RecA protein to single-stranded DNA. Proc Natl Acad Sci U S A 108(50):19913–19918. https://doi. org/10.1073/pnas.1112646108 Miura T, Benevides JM, Thomas GJ (1995) A phase diagram for sodium and potassium ion control of polymorphism in telomeric DNA. J Mol Biol 248(2):233–238. https://doi.org/10.1016/ s0022-2836(95)80046-8 Miyoshi D, Nakamura K, Tateishi-Karimata H, Ohmichi T, Sugimoto N (2009) Hydration of Watson Crick base pairs and dehydration of Hoogsteen base pairs inducing structural polymorphism under molecular crowding conditions. J Am Chem Soc 131(10):3522–3531. https://doi. org/10.1021/ja805972a Mukherjee SK, Knop JM, Möbitz S, Winter RHA (2020) Alteration of the conformational dynamics of a DNA hairpin by α-synuclein in the presence of aqueous two-phase systems. Chem A Eur J 26(48):10987–10991. https://doi.org/10.1002/chem.202002119 Mukherjee SK, Knop J-M, Oliva R, Möbitz S, Winter R (2021) Untangling the interaction of α-synuclein with DNA i-motifs and hairpins by volume-sensitive single-molecule FRET spectroscopy. RSC Chem Biol 2(4):1196–1200. https://doi.org/10.1039/d1cb00108f Myong S, Bruno MM, Pyle AM, Ha T (2007) Spring-loaded mechanism of DNA unwinding by hepatitis C virus NS3 helicase. Science 317(5837):513–516. https://doi.org/10.1126/science. 1144130 Neidle S (2010) Human telomeric G-quadruplex: the current status of telomeric G-quadruplexes as therapeutic targets in human cancer. FEBS J 277(5):1118–1125. https://doi.org/10.1111/j.17424658.2009.07463.x Olsen CM, Marky LA (2009) Energetic and hydration contributions of the removal of methyl groups from thymine to form uracil in G-quadruplexes. J Phys Chem B 113(1):9–11. https://doi. 
org/10.1021/jp808526d Patra S, Anders C, Schummel PH, Winter R (2018) Antagonistic effects of natural osmolyte mixtures and hydrostatic pressure on the conformational dynamics of a DNA hairpin probed

1

High-Pressure Single-Molecule Studies on Non-canonical Nucleic Acids. . .

35

at the single-molecule level. Phys Chem Chem Phys 20(19):13159–13170. https://doi.org/10. 1039/C8CP00907D Patra S, Schuabb V, Kiesel I, Knop J-M, Oliva R, Winter R (2019) Exploring the effects of cosolutes and crowding on the volumetric and kinetic profile of the conformational dynamics of a poly DA loop DNA hairpin: a single-molecule FRET study. Nucleic Acids Res 47(2):981–996. https:// doi.org/10.1093/nar/gky1122 Paul S, Hossain SS, Samanta A (2020) Insights into the folding pathway of a C-MYC-promoterbased i-motif DNA in crowded environments at the single-molecule level. J Phys Chem B 124(5):763–770. https://doi.org/10.1021/acs.jpcb.9b10633 Phelps C, Lee W, Jose D, Von Hippel PH, Marcus AH (2013) Single-molecule FRET and linear dichroism studies of DNA breathing and helicase binding at replication fork junctions. Proc Natl Acad Sci U S A 110(43):17320–17325. https://doi.org/10.1073/pnas.1314862110 Rajendran A, Nakano SI, Sugimoto N (2010) Molecular crowding of the cosolutes induces an intramolecular i-motif structure of triplet repeat DNA oligomers at neutral pH. Chem Commun 46(8):1299–1301. https://doi.org/10.1039/b922050j Ren J, Qu X, Trent JO, Chaires JB (2002) Tiny telomere DNA. Nucleic Acids Res 30(11): 2307–2315. https://doi.org/10.1093/nar/30.11.2307 Robinson CR, Sligar SG (1994) Hydrostatic pressure reverses osmotic pressure effects on the specificity of EcoRI-DNA interactions. Biochemistry 33(13):3787–3793. https://doi.org/10. 1021/bi00179a001 Rohs R, West SM, Sosinsky A, Liu P, Mann RS, Honig B (2009) The role of DNA shape in proteinDNA recognition. Nature 461(7268):1248–1253. https://doi.org/10.1038/nature08473 Royer CA, Chakerian AE, Matthews KS (1990) Macromolecular binding equilibria in the lac repressor system: studies using high-pressure fluorescence spectroscopy. Biochemistry 29(20): 4959–4966. 
https://doi.org/10.1021/bi00472a028 Rüttinger S, Macdonald R, Krämer B, Koberling F, Roos M, Hildt E (2006) Accurate single-pair Förster resonant energy transfer through combination of pulsed interleaved excitation, time correlated single-photon counting, and fluorescence correlation spectroscopy. J Biomed Opt 11(2):024012. https://doi.org/10.1117/1.2187425 Schärfen L, Schlierf M (2019) Real-time monitoring of protein-induced DNA conformational changes using single-molecule FRET. Methods 169:11–20. https://doi.org/10.1016/j.ymeth. 2019.02.011 Senior MM, Jones RA, Breslauer KJ (1988) Influence of loop residues on the relative stabilities of DNA hairpin structures. Proc Natl Acad Sci U S A 85(17):6242–6246. https://doi.org/10.1073/ pnas.85.17.6242 Silva JL, Oliveira AC, Gomes AMO, Lima LMTR, Mohana-Borges R, Pacheco ABF, Foguel D (2002) Pressure induces folding intermediates that are crucial for protein–DNA recognition and virus assembly. Biochim Biophys Acta Protein Struct Mol Enzymol 1595(1–2):250–265. https://doi.org/10.1016/S0167-4838(01)00348-X Silva JL, Oliveira AC, Vieira TCRG, de Oliveira GAP, Suarez MC, Foguel D (2014) High-pressure chemical biology and biotechnology. Chem Rev 114:7239–7267. https://doi.org/10.1021/ cr400204z Sinden RR, Pearson CE, Potaman VN, Ussery DW (1998) DNA: structure and function. In: Advances in genome biology, vol 5, pp 1–141. https://doi.org/10.1016/S1067-5701(98)80019-3 Somkuti J, Molnár OR, Smeller L (2020) Revealing unfolding steps and volume changes of human telomeric i-motif DNA. Phys Chem Chem Phys 22(41):23816–23823. https://doi.org/10.1039/ d0cp03894f Son I, Shek YL, Dubins DN, Chalikian TV (2014) Hydration changes accompanying helix-to-coil DNA transitions. J Am Chem Soc 136(10):4040–4047. https://doi.org/10.1021/ja5004137 Sugimoto N (2021) In: Maiti D, Guin S (eds) Chemistry and biology of non-canonical nucleic acids, 1st ed. Wiley, Weinheim. 
https://doi.org/10.1002/9783527817856 Summers MF, Byrd RA, Gallo KA, Samson CJ, Zon G, Egan W (1985) Nuclear magnetic resonance and circular dichroism studies of a duplex – single-stranded hairpin loop equilibrium

36

S. K. Mukherjee et al.

for the oligodeoxyribonucleotide sequence d(CGCGATTCGCG). Nucleic Acids Res 13(17): 6375–6386. https://doi.org/10.1093/nar/13.17.6375 Sung HL, Nesbitt DJ (2020a) Single-molecule kinetic studies of DNA hybridization under extreme pressures. Phys Chem Chem Phys 22(41):23491–23501. https://doi.org/10.1039/d0cp04035e Sung H-LL, Nesbitt DJ (2020b) DNA hairpin hybridization under extreme pressures: a singlemolecule FRET study. J Phys Chem B 124(1):110–120. https://doi.org/10.1021/acs.jpcb. 9b10131 Sung H-LL, Nesbitt DJ (2020c) High pressure single-molecule FRET studies of the lysine riboswitch: cationic and osmolytic effects on pressure induced denaturation. Phys Chem Chem Phys 22(28):15853–15866. https://doi.org/10.1039/D0CP01921F Takahashi S, Sugimoto N (2013a) Effect of pressure on the stability of G-quadruplex DNA: thermodynamics under crowding conditions. Angew Chemie Int Ed 52(51):13774–13778. https://doi.org/10.1002/anie.201307714 Takahashi S, Sugimoto N (2013b) Effect of pressure on thermal stability of G-quadruplex DNA and double-stranded DNA structures. Molecules 18(11):13297–13319. https://doi.org/10.3390/ molecules181113297 Takahashi S, Sugimoto N (2015) Pressure-dependent formation of i-motif and G-quadruplex DNA structures. Phys Chem Chem Phys 17(46):31004–31010. https://doi.org/10.1039/C5CP04727G Tsukanov R, Tomov TE, Berger Y, Liber M, Nir E (2013) Conformational dynamics of DNA hairpins at millisecond resolution obtained from analysis of single-molecule FRET histograms. J Phys Chem B 117(50):16105–16109. https://doi.org/10.1021/jp411280n Wahl M, Röhlicke T, Rahn H-J, Erdmann R, Kell G, Ahlrichs A, Kernbach M, Schell AW, Benson O (2013) Integrated multichannel photon timing instrument with very short dead time and high throughput. Rev Sci Instrum 84(4):043102. https://doi.org/10.1063/1.4795828 Wang L, Brown SJ (2006) BindN: a web-based tool for efficient prediction of DNA and RNA binding sites in amino acid sequences. 
Nucleic Acids Res 34(Web Server):W243–W248. https:// doi.org/10.1093/nar/gkl298 Wang G, Vasquez KM (2014) Impact of alternative DNA structures on DNA damage, DNA repair, and genetic instability. DNA Repair 19(1):143–151. https://doi.org/10.1016/j.dnarep.2014. 03.017 Wells RD (2009) Discovery of the role of non-B DNA structures in mutagenesis and human genomic disorders. J Biol Chem 284(14):8997–9009. https://doi.org/10.1074/jbc.X800010200 Winter R (2019) Interrogating the structural dynamics and energetics of biomolecular systems with pressure modulation. Annu Rev Biophys 48(1):441–463. https://doi.org/10.1146/annurevbiophys-052118-115601 Wright EP, Huppert JL, Waller ZAE (2017) Identification of multiple genomic DNA sequences which form i-motif structures at neutral pH. Nucleic Acids Res 45(6):2951–2959. https://doi. org/10.1093/nar/gkx090 Yancey PH, Blake WR, Conley J (2002) Unusual organic osmolytes in deep-sea animals: adaptations to hydrostatic pressure and other perturbants. Comp Biochem Physiol Part A Mol Integr Physiol 133(3):667–676. https://doi.org/10.1016/S1095-6433(02)00182-4 Ying L, Green JJ, Li H, Klenerman D, Balasubramanian S (2003) Studies on the structure and dynamics of the human telomeric G quadruplex by single-molecule fluorescence resonance energy transfer. Proc Natl Acad Sci 100(25):14629–14634. USA. https://doi.org/10.1073/pnas. 2433350100 Yu H, Gu X, Nakano SI, Miyoshi D, Sugimoto N (2012) Beads-on-a-string structure of long telomeric DNAs under molecular crowding conditions. J Am Chem Soc 134(49): 20060–20069. https://doi.org/10.1021/ja305384c Zeraati M, Langley DB, Schofield P, Moye AL, Rouet R, Hughes WE, Bryan TM, Dinger ME, Christ D (2018) I-motif DNA structures are formed in the nuclei of human cells. Nat Chem 10(6):631–637. https://doi.org/10.1038/s41557-018-0046-3

2

Stability Prediction of Canonical and Noncanonical Structures of Nucleic Acids Shuntaro Takahashi, Hisae Tateishi-Karimata, and Naoki Sugimoto

Contents
Introduction
Basics of Stability Prediction of Canonical Structures of Nucleic Acids
  Structure and Thermodynamics of the Canonical Structure of Nucleic Acids
  Melting Behavior of Nucleic Acid Structures
  Measurement of Thermodynamic Stability and Calculation of Thermodynamic Parameters
  Nearest-Neighbor (NN) Parameters: Prediction of Thermodynamic Stability and Its Calculation Method
  Application of Prediction to Nonmatched Base Pair and Secondary Structure Based on NN Rules
Stability Prediction of Noncanonical Structures
  Applicability of Stability Prediction to Noncanonical Structures
  Hairpin Loop
  Triplex
  G-quadruplex
  i-motif
Expansion and Application of Stability Prediction
  Issues in Application of Stability Prediction Under Cellular Conditions
  Stability of DNA Duplex Structure in Different Cation Concentrations
  Stability of DNA Duplex Structure in a Molecular Crowding Environment
  Extension of Stability Prediction to the DNA Duplex Structure in Various Solution Environments
  Prediction of the Stability of the DNA Duplex Under Intracellular Conditions by Measuring the Intracellular Environment
Conclusion
References

S. Takahashi · H. Tateishi-Karimata
Frontier Institute for Biomolecular Engineering Research (FIBER), Konan University, Kobe, Japan
e-mail: [email protected]; [email protected]

N. Sugimoto (*)
Frontier Institute for Biomolecular Engineering Research (FIBER), Konan University, Kobe, Japan
Graduate School of Frontiers of Innovative Research in Science and Technology (FIRST), Konan University, Kobe, Japan
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2023
N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_2

Abstract

The folding and unfolding of nucleic acids (DNA and RNA) are essential for cellular functions. These structural changes are also widely exploited in technical applications of nucleic acids. The thermodynamics of these structural changes is important for understanding the biological mechanisms of nucleic acid function, as well as for the design of nucleic acid materials. The canonical structure of nucleic acids is a duplex composed of Watson-Crick base pairs. Because the thermodynamic properties of nucleic acid structures depend on the chemical interactions between nucleotides in the strands, the stability of the duplex is determined by its sequence, which means that stability is predictable. In fact, methods for predicting the stability of nucleic acid duplexes have been developed and are widely used. However, such predictions cannot always be applied under different solution conditions, especially cellular conditions, because the concentrations of cations and cosolutes in the intracellular environment, a condition termed molecular crowding, differ from those under standard experimental conditions. In addition, the crowding conditions in cells vary in space and time. Furthermore, there are noncanonical structures that differ from duplexes, such as triplexes and tetraplexes. Therefore, a method is needed to predict the stability of various nucleic acid structures under cellular conditions. This chapter guides readers through the physicochemical basis for predicting nucleic acid stability and discusses recent studies on the prediction of stability under cellular conditions.

Introduction
Nucleic acids (deoxyribonucleic acid [DNA] and ribonucleic acid [RNA]) are biomolecules used by living organisms as genetic material. The science and technology of nucleic acids have advanced dramatically in the past decade. In particular, remarkable progress has been made in applications that manipulate nucleic acids in cells, such as genome editing represented by CRISPR-Cas, chemical synthesis of genomes, and nucleic acid medicines used in vaccines against SARS-CoV-2. In addition, noncanonical structures that differ from the duplex structure proposed by Watson and Crick (Watson and Crick 1953) have been discovered one after another in cells. As a result, the complex behavior of nucleic acids in the cell is attracting growing attention, and researchers from various fields have entered nucleic acid research and are conducting numerous studies using cells. Although chemical techniques for handling nucleic acids outside the cell have become highly sophisticated, the behavior of nucleic acids inside the cell remains a black box.


Nucleic acids are anionic macromolecules built from nucleotide units, each consisting of a sugar, a base, and a phosphate. Nucleotides linked by phosphodiester bonds form a single-stranded nucleic acid (primary structure). The sugar is either deoxyribose or ribose of the D-furanose type, and nucleic acids are classified as DNA or RNA, respectively, according to this difference. Both DNA and RNA contain two purine and two pyrimidine derivatives, for a total of four bases. The purine bases adenine (6-aminopurine) and guanine (2-amino-6-oxopurine) and the pyrimidine base cytosine (4-amino-2-oxopyrimidine) are used in both DNA and RNA. The other pyrimidine base is thymine (2,4-dioxo-5-methylpyrimidine) in DNA and uracil (2,4-dioxopyrimidine) in RNA. The only difference between thymine and uracil is the presence or absence of a methyl group at position C5. The abbreviations for the bases are as follows: A, adenine; G, guanine; C, cytosine; T, thymine; and U, uracil (Fig. 1).

Fig. 1 Chemical structure of DNA (left) and RNA (right) strands. Differences in the RNA structure are highlighted in different colors

The canonical structure of nucleic acids is a duplex formed via base-pairing. The Watson-Crick base pair is the representative base pair. In this scheme, A pairs with T or U, and G pairs with C. Two hydrogen bonds are formed in the A-T and A-U base pairs, and three hydrogen bonds are formed in the G-C base pair (Fig. 2). The sequences of the two strands that form Watson-Crick base pairs must be complementary. For example, a DNA strand with the sequence "d(GCATATGC)" pairs with a second copy of itself, whereas "d(CGTGACTC)" and "d(GAGTCACG)" form Watson-Crick base pairs between two different molecules. The former is called a self-complementary sequence and the latter a non-self-complementary sequence (Fig. 3). Two oligonucleotides form a right-handed double helix linked by Watson-Crick base pairs, with the phosphates on the outside and the bases on the inside (Fig. 4a). The phosphodiester bond of an oligonucleotide links the 5′ and 3′ positions of adjacent sugars, so the nucleotide chain has a direction. Complementary strands are oriented antiparallel to each other.

Fig. 2 Chemical structure of Watson-Crick base pairs

Fig. 3 (a) Self- or (b) nonself-complementary duplex structures

While nucleic acids form canonical duplex structures, as described above, other nucleic acid structures also exist. Mismatched base pairs generate gaps in the Watson-Crick base pairs of the duplex structure. Watson-Crick base-pairing within a single strand of DNA or RNA can generate hairpin and bulge structures. Furthermore, DNA and RNA can fold into structures via non-Watson-Crick base-pairing. Hoogsteen base pairs, which were identified by single-crystal X-ray analysis after the proposal of the Watson-Crick base pair (Hoogsteen 1959, 1963), are found in triplex and tetraplex structures (Fig. 4b, c). Triplex formation is governed by sequence-specific and Watson-Crick base-pairing rules. The third nucleotide strand, known as the "triplex-forming oligonucleotide" (TFO), binds to the purine-rich strand of the duplex through Hoogsteen or reverse Hoogsteen hydrogen bonds, forming T•A*T or C•G*C+ triads (Fig. 4b). Among tetraplexes, the G-quadruplex (G4) is formed by guanine quartets, in which four guanine bases are connected by Hoogsteen base pairs (Fig. 4c). This tetraplex structure is stabilized by the stacking of more than two G-quartets. Another tetraplex is the i-motif. Cytosine has the highest pKa value among the nucleobases and can form a non-Watson-Crick base pair with protonated cytosine (C*C+) under acidic conditions. The i-motif is a tetraplex composed of mutually intercalated C*C+ pairs (Fig. 4c). These structures differ from duplex structures and are thus referred to as noncanonical structures. Because RNA preferentially forms non-Watson-Crick base-pairings, it has more diverse tertiary structures than DNA.

To understand the behavior of nucleic acids, it is important to analyze and quantify their structural stability. The Gibbs free energy change (ΔG) for structure formation is the parameter most commonly used to quantify structural stability. The structural stability of nucleic acids in the cell can be described by ΔG at 37 °C, which in turn helps explain how nucleic acids react there. Other thermodynamic parameters, such as the enthalpy change (ΔH) and entropy change (ΔS), are useful for understanding the chemical interactions and dynamics that accompany folding. The melting temperature (Tm) is also useful for evaluating stability. This chapter introduces the physicochemical determinants of nucleic acid structures and shows how these properties led to the development of methods for predicting duplex stability. Methods for predicting the stability of noncanonical structures are also explained. Finally, a new physicochemical treatment for understanding the behavior of nucleic acids in cells is described, together with findings on the new roles of nucleic acids in cells.
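The complementarity rule described in the Introduction (A pairs with T, G pairs with C, and the two strands run antiparallel) can be illustrated with a short Python sketch. This is only an illustration, not a method from this chapter: the function names are hypothetical, only the four standard DNA bases are handled, and the hydrogen-bond count simply applies the two-bonds-per-A-T and three-bonds-per-G-C rule stated above.

```python
# Watson-Crick partners for DNA bases (A-T and G-C), as stated in the text.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the partner strand of `seq`, with both strands written 5'->3'.

    The strands of a duplex are antiparallel, so the partner is obtained by
    reversing the sequence and complementing each base.
    """
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_self_complementary(seq: str) -> bool:
    """True if two copies of `seq` can pair with each other."""
    return seq.upper() == reverse_complement(seq)


def count_hydrogen_bonds(seq: str) -> int:
    """Count Watson-Crick hydrogen bonds in a fully paired duplex.

    Uses 2 bonds per A-T pair and 3 bonds per G-C pair.
    """
    return sum(3 if base in "GC" else 2 for base in seq.upper())


if __name__ == "__main__":
    # Sequences quoted in the text.
    print(is_self_complementary("GCATATGC"))
    print(reverse_complement("CGTGACTC"))
    print(count_hydrogen_bonds("GCATATGC"))
```

With the sequences quoted in the text, d(GCATATGC) is its own reverse complement, whereas the reverse complement of d(CGTGACTC) is d(GAGTCACG), in line with the self-complementary and non-self-complementary cases of Fig. 3.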

Basics of Stability Prediction of Canonical Structures of Nucleic Acids

Structure and Thermodynamics of the Canonical Structure of Nucleic Acids
The stability of a nucleic acid structure is determined by the following five major factors (Fig. 5):

1. Hydrogen bonds: Hydrogen-bonded base pairs are the major interactions that define and maintain the higher-order structure. The positional relationship between the hydrogen-bond acceptor and donor determines this interaction. As shown in Figs. 2 and 5, hydrogen bonds are formed in a sequence-specific manner. At non-base-pairing sites, a single base may form hydrogen bonds with multiple nucleotides, and the hydroxyl groups of sugars may also form hydrogen bonds.
2. Stacking interactions: These interactions result from the overlap of bases. For natural bases, the contribution of van der Waals interactions is considered larger than that of hydrophobic interactions. The stabilization energy is significantly affected by the type of neighboring base. In particular, a relatively large stabilization energy is generated by the overlap of purine bases.
3. Conformational entropy: The formation of nucleic acid structures is enthalpically stabilized by hydrogen bonding and stacking interactions (ΔH < 0), but it is accompanied by a loss of conformational entropy because the freedom of the nucleotide torsion angles decreases (ΔS < 0). Seven torsion angles define the conformation of a nucleotide, and their degrees of freedom are restricted during helix formation. Many mismatched base pairs and non-base-pairing sites do not form independently because the large entropic penalty may not be compensated by hydrogen bonding and stacking interactions.
4. Counterion condensation: When nucleotides assemble to form a helix, large electrostatic repulsion arises because the negatively charged phosphate groups come into close proximity. To suppress this repulsion, counterions condense around the phosphate groups, so that the formation of higher-order structures is accompanied by cation binding. In nucleotides that form higher-order structures, there are regions where the negative charge density increases locally, and in these cases, specific cation binding is essential for structure formation.
5. Hydration: Because the contribution of hydrophobic interactions to the structure formation of nucleic acids is not significant, the effect of the solvent (water) on the stability of nucleic acid structures has been considered small. However, hydration (or dehydration) has a significant influence on nucleic acid structure, and the structure tends to be destabilized when a nonaqueous solvent is mixed with the aqueous solution. During folding of the structured region, a large amount of hydration water is released, and the associated ΔS may contribute to structural stabilization. On the other hand, structure-specific hydration occurs after helix formation. In the case of duplexes, hydration waters align along the minor groove of the helix.

In addition, the pH of the solution may affect base pair formation. While factors 1 to 3 originate from the nucleotides themselves, factors 4 and 5 originate from the surrounding environment (Fig. 5).

Fig. 4 Canonical and noncanonical structures of nucleic acids (a) duplex, (b) triplex, and (c) tetraplex

Fig. 5 Factors determining the thermodynamic stability of a canonical duplex structure
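The balance between favorable enthalpy (ΔH < 0) and unfavorable entropy (ΔS < 0) described under factor 3 can be made concrete with the standard relation ΔG = ΔH − TΔS. The following Python sketch evaluates ΔG at 37 °C and the temperature at which ΔG becomes zero. The ΔH and ΔS values are hypothetical placeholders chosen only for illustration; they are not data from this chapter, and the function names are likewise assumptions. Note that the zero-ΔG temperature equals the melting temperature only for a unimolecular transition; for a bimolecular duplex, Tm also depends on the strand concentration.

```python
KELVIN_OFFSET = 273.15  # 0 degrees Celsius expressed in kelvin


def gibbs_free_energy(delta_h_kcal: float, delta_s_cal: float,
                      temp_celsius: float = 37.0) -> float:
    """Return dG in kcal/mol from dH (kcal/mol) and dS (cal/(mol K)).

    Implements dG = dH - T * dS, converting dS from cal to kcal.
    """
    temp_kelvin = temp_celsius + KELVIN_OFFSET
    return delta_h_kcal - temp_kelvin * delta_s_cal / 1000.0


def zero_free_energy_temperature(delta_h_kcal: float,
                                 delta_s_cal: float) -> float:
    """Return the temperature (degrees Celsius) at which dG = 0, T = dH/dS."""
    return delta_h_kcal * 1000.0 / delta_s_cal - KELVIN_OFFSET


# Hypothetical values for illustration only (not measured data).
EXAMPLE_DELTA_H = -60.0   # kcal/mol, favorable: base pairing and stacking
EXAMPLE_DELTA_S = -165.0  # cal/(mol K), unfavorable: loss of conformational freedom

delta_g_37 = gibbs_free_energy(EXAMPLE_DELTA_H, EXAMPLE_DELTA_S)
t_zero = zero_free_energy_temperature(EXAMPLE_DELTA_H, EXAMPLE_DELTA_S)

if __name__ == "__main__":
    print(f"dG at 37 C: {delta_g_37:.2f} kcal/mol")
    print(f"dG = 0 at:  {t_zero:.1f} C")
```

In this example, the entropic term offsets most of the enthalpic gain at 37 °C, leaving a comparatively small negative ΔG. This is why small changes in hydrogen bonding, stacking, cation binding, or hydration can shift the stability of a structure noticeably.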

Melting Behavior of Nucleic Acid Structures

When the thermal energy supplied by increasing temperature exceeds the stabilization energy of the nucleic acid helix, the helical structure unfolds into a single-stranded coil. This process is called the melting of nucleic acid structures (▶ Chap. 35, “Effects of Molecular Crowding on Structures and Functions of Nucleic Acids,” Section “Effects of Molecular Crowding Environments on Nucleic Acids”). Because this melting behavior reflects the stability of the helix, melting analysis is widely used to quantify the thermodynamic stability of higher-order structures. This subsection describes how melting curves are measured using spectroscopic techniques.

Nucleobases exhibit strong absorption bands in the ultraviolet (UV) region, and their structural stability can be determined by measuring their absorbance (Abs). The absorption band around 260 nm arises from the transition dipole moment of the π-π* transition of the base and has an extinction coefficient of approximately 10⁴ M⁻¹ cm⁻¹ per nucleotide. Because this absorption originates from a dipole moment, a dipole moment in the opposite direction is induced in neighboring bases. Owing to this effect, the transition dipole moment of a base is reduced when bases stack. Because the amount of light absorbed is determined by the magnitude of the transition dipole moment, the absorbance of a single-stranded oligonucleotide is smaller than that of its constituent bases alone. This reduction in absorbance is called “hypochromicity.” Because the absorbance of a base is affected by the direction and magnitude of the transition dipole moment of the neighboring base, the absorption coefficient ε of an oligonucleotide can be estimated by considering only pairs of consecutive bases, that is, nearest neighbors. For example, the absorption coefficient of the single-stranded RNA r(GCAUAUGC) can be calculated as follows:

$$
\begin{aligned}
\varepsilon_{\mathrm{r(GCAUAUGC)}} ={}& 2\{\varepsilon(\mathrm{GC}) + \varepsilon(\mathrm{CA}) + \varepsilon(\mathrm{AU}) + \varepsilon(\mathrm{UA}) + \varepsilon(\mathrm{AU}) + \varepsilon(\mathrm{UG}) + \varepsilon(\mathrm{GC})\} \\
& - \{\varepsilon(\mathrm{C}) + \varepsilon(\mathrm{A}) + \varepsilon(\mathrm{U}) + \varepsilon(\mathrm{A}) + \varepsilon(\mathrm{U}) + \varepsilon(\mathrm{G})\}
\end{aligned} \tag{1}
$$

Because the sum over dinucleotides counts every nucleotide except the two terminal ones twice, the absorption coefficients of the internal mononucleotides (monomers) must be subtracted. The parameters measured at 260 nm are listed in Table 1. These values can be used to determine the absorption coefficient of any oligonucleotide.

Table 1 Molar extinction coefficients of nucleotides at 260 nm (M⁻¹ cm⁻¹) (Crothers et al. 2000)

| Nucleotide | DNA | RNA |
|---|---|---|
| AA | 13,650 | 13,650 |
| AC | 10,670 | 10,670 |
| AG | 12,790 | 12,790 |
| AT(AU) | 11,420 | 12,140 |
| CA | 10,670 | 10,670 |
| CC | 7520 | 7520 |
| CG | 9390 | 9390 |
| CT(CU) | 7660 | 8370 |
| GA | 12,920 | 12,920 |
| GG | 11,430 | 11,430 |
| GC | 9190 | 9190 |
| GT(GU) | 10,220 | 10,960 |
| TA(UA) | 11,780 | 12,520 |
| TG(UG) | 9700 | 10,400 |
| TC(UC) | 8150 | 8900 |
| TT(UU) | 8610 | 10,110 |
| A | 15,340 | 15,340 |
| C | 7600 | 7600 |
| G | 12,160 | 12,160 |
| T(U) | 8700 | 10,210 |

If the UV absorbance of a nucleic acid sample at 260 nm is measured in a cuvette, the concentration c (M) of the nucleic acid can be determined from the path length l (cm) and the absorption coefficient ε according to the Lambert-Beer law in Eq. (2):

$$
\mathrm{Abs} = \varepsilon\, l\, c \tag{2}
$$

The absorption coefficient of the double-helix structure is smaller than that of the single-stranded nucleotides because the transition dipole moment of the bases is smaller when the Watson-Crick base pair is formed. Instead of UV measurements, other spectroscopic assays, such as circular dichroism (CD) spectroscopy, are also available to monitor the melting behavior of nucleic acid helices (CD melting). When unfolding can be monitored by fluorescence, fluorescence spectroscopy can also track the melting profile according to fluorescence energy transfer.
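The nearest-neighbor estimate of Eq. (1) and the concentration calculation of Eq. (2) can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names are hypothetical, only the RNA entries of Table 1 needed for r(GCAUAUGC) are included, and the absorbance reading of 0.50 is an invented example value.

```python
# Nearest-neighbor estimate of the 260 nm extinction coefficient (Eq. 1)
# and strand concentration from absorbance via the Lambert-Beer law (Eq. 2).
# RNA values (M^-1 cm^-1) are copied from Table 1; only the entries needed
# for the example sequence are listed here.

EPS_DIMER_RNA = {"GC": 9190, "CA": 10670, "AU": 12140, "UA": 12520, "UG": 10400}
EPS_MONOMER_RNA = {"A": 15340, "C": 7600, "G": 12160, "U": 10210}


def extinction_coefficient(seq, dimers=EPS_DIMER_RNA, monomers=EPS_MONOMER_RNA):
    """Return epsilon(260 nm): 2 * sum(dimers) - sum(internal monomers)."""
    dimer_sum = sum(dimers[seq[i:i + 2]] for i in range(len(seq) - 1))
    # Every nucleotide except the two terminal ones is counted twice.
    internal_sum = sum(monomers[base] for base in seq[1:-1])
    return 2 * dimer_sum - internal_sum


def concentration(absorbance, epsilon, path_length_cm=1.0):
    """Solve Abs = epsilon * l * c for c (in M)."""
    return absorbance / (epsilon * path_length_cm)


eps = extinction_coefficient("GCAUAUGC")
print(eps)  # 81640
# Hypothetical reading: Abs(260 nm) = 0.50 in a 1 cm cuvette.
print(concentration(0.50, eps))
```

For sequences that use other dinucleotides, the two dictionaries would need to be filled in with the remaining Table 1 entries.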


Measurement of Thermodynamic Stability and Calculation of Thermodynamic Parameters

When nucleic acid structures such as duplexes dissociate into single strands with increasing temperature, the melting can be observed as an increase in absorbance (hyperchromicity) (Fig. 6a). The melting curve is sigmoidal (S-shaped), and the midpoint of the curve is called the melting temperature (Tm). For duplex melting, Tm is the temperature at which the concentrations of single-stranded and double-stranded nucleic acids are equal, and it is used as an apparent index of the structural stability of nucleic acids. The thermodynamic parameters can be calculated from the melting curve. In the sigmoidal melting curve, the absorbance changes almost linearly in the low- and high-temperature regions (Fig. 6a) because the absorption coefficients of the double- and single-stranded forms change proportionally with temperature. Considering that the change in absorbance in the region between these baselines reflects the structural transition of the nucleic acids, the equilibrium constant K between single- and double-stranded nucleic acids at each temperature can be obtained from the melting curve. When the mole fraction of the duplex is α, the equilibrium constant K for the system can be expressed as follows:

$$
K = \frac{[\mathrm{Strand1} \cdot \mathrm{Strand2}]}{[\mathrm{Strand1}][\mathrm{Strand2}]} = \frac{2\alpha}{(1 - \alpha)^2 C_t} \tag{3}
$$

where Ct is the total concentration of nucleic acids. Denoting the Gibbs free energy change, enthalpy change, and entropy change of duplex formation as ΔG°, ΔH°, and ΔS°, and assuming that the change in molar heat capacity (ΔCp) is zero (see also the subsection “Nearest-Neighbor (NN) Parameters: Prediction of Thermodynamic Stability and Its Calculation Method”), ΔH° and ΔS° can be calculated from the temperature dependence of ln K as follows:

$$
\Delta G^\circ = \Delta H^\circ - T\Delta S^\circ = -RT \ln K \tag{4}
$$

where R is the gas constant. The standard state here is 1 atm. For a reaction in which two strands form a duplex in a two-state transition, the Tm of the melting curve varies with the nucleic acid concentration. At T = Tm, i.e., α = 1/2, substituting Eq. (3) into Eq. (4) shows that ΔH° (kcal mol⁻¹) and ΔS° (cal K⁻¹ mol⁻¹) can be obtained from the dependence of Tm (K) on the total concentration of nucleic acids, Ct:

$$
\frac{1}{T_m} = \frac{2.303\,R \log(C_t/n)}{\Delta H^\circ} + \frac{\Delta S^\circ}{\Delta H^\circ} \tag{5}
$$

In Eq. (5), R is the gas constant (1.987 cal K⁻¹ mol⁻¹), n = 1 for self-complementary sequences, and n = 4 for non-self-complementary sequences. Figure 6b shows a plot of Tm⁻¹ on the vertical axis against log(Ct/n) on the horizontal axis for Tm values obtained at various nucleic acid concentrations. Because the slope of this linear relationship is 2.303R/ΔH° and the intercept is ΔS°/ΔH°, ΔH° and ΔS° can also be calculated from this plot. Once ΔH° and ΔS° are obtained, the value of ΔG° (kcal mol⁻¹) at any temperature T can be calculated using Eq. (4). For example, for r(GAACGUUC), the Gibbs free energy change at 37 °C (ΔG°37; the value of −ΔG° is called the stabilization energy) for the formation of the self-complementary duplex is −9.03 kcal mol⁻¹ (Xia et al. 1998).

Fig. 6 Analysis of the melting profiles. (a) Typical melting curve of a duplex. Tm equals the temperature at which α = 0.5. (b) Typical plot of Tm⁻¹ versus log(Ct/n)
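The concentration-dependence analysis of Eq. (5) amounts to a straight-line fit of 1/Tm against log(Ct/n). The following Python sketch illustrates the idea under stated assumptions: the function names are hypothetical, the melting temperatures are synthetic and noise-free (generated from Eq. (5) itself with illustrative values ΔH° = −68.35 kcal mol⁻¹ and ΔS° = −191.6 cal K⁻¹ mol⁻¹), and a plain least-squares line stands in for whatever fitting software is used in practice.

```python
import math

R = 1.987e-3  # gas constant in kcal K^-1 mol^-1


def inverse_tm(ct, dh, ds, n):
    """Eq. (5): 1/Tm (K^-1) for total strand concentration ct (M).

    dh is in kcal/mol and ds in kcal/(K mol); n = 1 for self-complementary
    and n = 4 for non-self-complementary sequences.
    """
    return 2.303 * R * math.log10(ct / n) / dh + ds / dh


def fit_vant_hoff(cts, tms_kelvin, n):
    """Least-squares line of 1/Tm versus log10(Ct/n); returns (dH, dS)."""
    xs = [math.log10(ct / n) for ct in cts]
    ys = [1.0 / tm for tm in tms_kelvin]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx                  # slope = 2.303 R / dH
    intercept = y_mean - slope * x_mean  # intercept = dS / dH
    dh = 2.303 * R / slope
    ds = intercept * dh
    return dh, ds


# Synthetic data for a self-complementary strand (n = 1); values are
# illustrative assumptions, not measurements.
TRUE_DH, TRUE_DS = -68.35, -0.1916
cts = [1e-6, 5e-6, 2e-5, 1e-4]
tms = [1.0 / inverse_tm(ct, TRUE_DH, TRUE_DS, n=1) for ct in cts]

dh, ds = fit_vant_hoff(cts, tms, n=1)
dg37 = dh - (37 + 273.15) * ds  # Eq. (4) evaluated at 37 degrees C
print(f"dH = {dh:.2f} kcal/mol, dS = {ds * 1000:.1f} cal/(K mol), dG37 = {dg37:.2f} kcal/mol")
```

With real data, scatter in the measured Tm values propagates into ΔH° and ΔS°, so several concentrations spanning at least a 50- to 100-fold range are typically needed for a stable fit.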

Nearest-Neighbor (NN) Parameters: Prediction of Thermodynamic Stability and Its Calculation Method

As described above, structural melting caused by heating can be analyzed in vitro by spectroscopic methods; however, once the sequence dependence of stability is elucidated, the stability of duplexes can be predicted without melting experiments. The method developed for this purpose is called the nearest-neighbor (NN) method, and it is now widely used to predict the thermal stability of double-helix structures. The NN model, first proposed by Tinoco et al. (1971), is based on the idea that the base pair with the greatest influence on the formation of a new base pair is the neighboring base pair that has already formed. This is because the strength of hydrogen bonds in base pairs is determined by the combination of bases, and the stacking interaction is inversely proportional to the sixth power of the distance. Thus, the energetic contribution from base pairs beyond the nearest neighbors can be considered negligible. According to this idea, duplex stability is determined by the sum of the contributions of adjacent base pairs. In other words, among the five main factors that determine the stability of nucleic acid structures, the NN model considers (1) hydrogen bonding, (2) stacking interactions, and (3) conformational entropy under a given solution environment for each nearest-neighbor base pair unit. There are 10 possible nearest-neighbor sets in DNA/DNA and RNA/RNA duplexes and 16 in RNA/DNA hybrid duplexes (Tables 2, 3 and 4). Methods to predict the thermodynamic parameters (ΔH°, ΔS°, ΔG°, and Tm) of DNA/DNA, RNA/RNA, and RNA/DNA hybrid duplexes have been established for a solution containing 1 M NaCl.

Duplex stability can be calculated easily using NN base pair parameters. According to the NN model, the thermodynamic parameters of duplex formation consist of three terms. The first term is the free energy change of helix propagation, which is the sum over each successive base pair formation. The second term is the free energy change of helix initiation, that is, formation of the first base pair of the duplex. This parameter, called the “initiation factor,” differs between duplexes with at least one GC base pair and duplexes formed with AT pairs alone. The third term is the free energy change arising from the entropy of mixing (symmetry correction) for self-complementary strands. Thus, the total ΔG°37 (free energy change of duplex formation at 37 °C) is given by Eq. (6):

$$
\Delta G^\circ_{37}(\mathrm{total}) = \sum_i n_i\, \Delta G^\circ_{37,\mathrm{NN}}(i) + \Delta G^\circ_{37}(\mathrm{init}) + \Delta G^\circ_{37}(\mathrm{sym}) \tag{6}
$$

where n_i is the number of occurrences of the i-th NN set. For example, r(GAACGUUC) forms the self-complementary duplex described above. This duplex is represented by the sum of the NN base pairs as shown below (omitting the notation r):

GAACGUUC/GAACGUUC = GA/CU + AA/UU + AC/UG + CG/GC + GU/CA + UU/AA + UC/AG

(Note the description of the NN set: for example, GC/CG means 5′-GC-3′/3′-CG-5′.)

Thus, ΔG°37 for r(GAACGUUC) duplex formation in 1 M NaCl (Xia et al. 1998) (Table 2) is:

$$
\begin{aligned}
\Delta G^\circ_{37} &= \Delta G^\circ_{37}(\mathrm{init}) + \Delta G^\circ_{37}(\mathrm{sym}) + 2\Delta G^\circ_{37}(\mathrm{GA/CU}) + 2\Delta G^\circ_{37}(\mathrm{AA/UU}) + 2\Delta G^\circ_{37}(\mathrm{AC/UG}) + \Delta G^\circ_{37}(\mathrm{CG/GC}) \\
&= (4.09) + (0.4) + 2 \times (-2.35) + 2 \times (-0.93) + 2 \times (-2.24) + (-2.36) \\
&= -8.91\ \mathrm{kcal\ mol^{-1}}
\end{aligned}
$$

Here, each NN set and its counterpart read along the opposite strand (e.g., AC/UG and GU/CA) describe the same stack and share one parameter, which is why three of the sets are counted twice.

Table 2 Comparison of nearest-neighbor parameters (ΔG°NN) for RNA/RNA duplex formation in different solution conditions at 37 °C

| Nearest-neighbor set | 1 M NaCl in the absence of co-soluteᵃ (kcal mol⁻¹) | 100 mM NaCl in the absence of co-soluteᵇ (kcal mol⁻¹) | 1 M NaCl in the presence of 20 wt% PEG 200ᶜ (kcal mol⁻¹) |
|---|---|---|---|
| r(AA/UU) | −0.93 | n.d. | −0.88 |
| r(AU/UA) | −1.10 | n.d. | −1.13 |
| r(UA/AU) | −1.33 | n.d. | −1.36 |
| r(CU/GA) | −2.08 | n.d. | −2.03 |
| r(CA/GU) | −2.11 | n.d. | −1.91 |
| r(GU/CA) | −2.24 | n.d. | −2.36 |
| r(GA/CU) | −2.35 | n.d. | −2.36 |
| r(CG/GC) | −2.36 | n.d. | −2.19 |
| r(GG/CC) | −3.26 | n.d. | −3.32 |
| r(GC/CG) | −3.42 | n.d. | −3.44 |
| Initiation | 4.09 | n.d. | 4.63 |
| Per terminal AU | 0.45 | n.d. | 0.55 |
| Symmetry factor | 0.43 | n.d. | 0.68 |

ᵃ Data obtained from the report of Xia et al. (1998)
ᵇ Data have not been determined
ᶜ Data collected from the report of Adams and Znosko (2019)

Although NN parameters are often reported as ΔG°37, ΔG° at other temperatures can be easily predicted using the NN parameters for ΔS° and ΔH°. For example, the entropy change ΔS° for the formation of the GAACGUUC duplex is:

$$
\begin{aligned}
\Delta S^\circ &= \Delta S^\circ(\mathrm{init}) + \Delta S^\circ(\mathrm{sym}) + 2\Delta S^\circ(\mathrm{GA/CU}) + 2\Delta S^\circ(\mathrm{AA/UU}) + 2\Delta S^\circ(\mathrm{AC/UG}) + \Delta S^\circ(\mathrm{CG/GC}) \\
&= (-1.5) + (-1.4) + 2 \times (-32.5) + 2 \times (-19.0) + 2 \times (-29.5) + (-26.7) \\
&= -191.6\ \mathrm{cal\ mol^{-1}\ K^{-1}}
\end{aligned}
$$

where ΔS°(init) is the correction term for the entropy change of the first base pair formation during duplex formation and ΔS°(sym) is the entropy correction term for duplex formation by a self-complementary oligomer. Similarly, the enthalpy change ΔH° of duplex formation is −68.35 kcal mol⁻¹. The value of ΔG°37 is then:

$$
\Delta G^\circ_{37} = \Delta H^\circ - (37 + 273.15)\,\Delta S^\circ = -8.92\ \mathrm{kcal\ mol^{-1}}
$$

This calculated (predicted) value is in good agreement with the measured value of −9.03 kcal mol⁻¹ shown in the previous subsection (Xia et al. 1998).

Based on this NN model, different sequences with the same NN set and initiation factor should exhibit the same stability. For example, the different sequences d(ATGAGCTCAT) and d(ATCAGCTGAT) have the same NN set and the same initiation factor (Fig. 7). On the other hand, the sequence d(AGTCATGACT) has the same AT content but a different stability because its NN set differs from that of the above sequences (Fig. 7).

Fig. 7 Schematic representation of the prediction of duplex stability by NN methods

NN parameters have been developed by several groups, including our group, and have been applied to various types of DNA/DNA, RNA/RNA, and RNA/DNA hybrid duplexes, as well as peptide nucleic acid/DNA duplexes (Allawi and SantaLucia 1997; Breslauer et al. 1986; Freier et al. 1986b; Hudson et al. 2013; SantaLucia et al. 1996; Sugimoto et al. 1995, 1996, 2001; Xia et al. 1998). These examples show that the NN model can be generally applied to accurately predict the stability (ΔG°37 and Tm) of all types of duplexes.

NN parameters are determined from the thermodynamic parameters measured for the formation of various duplexes (usually more than 30 sequences). The frequencies of the NN sets across all tested sequences should be unbiased to obtain correct NN parameters. A linear least-squares program is used to fit the experimentally determined ΔG°37 and ΔH° values to a parameter set (e.g., 13 parameters for DNA/DNA: 10 Watson-Crick NN base pairs, two initiation factors, and a symmetry parameter for self-complementary sequences; Tables 2, 3 and 4). To simplify the prediction parameters, some approximations are usually made. Typically, the difference in heat capacity (ΔCp) between the two states (single strand and duplex) is assumed to be zero. This is because the Tm values for most studied sequences are not far from 37 °C, and thus a zero ΔCp approximation is expected to be acceptable for the ΔG°37


Table 3 Comparison of nearest-neighbor parameters (ΔG°NN) for DNA/DNA duplex formation in different solution conditions at 37 °C

| Nearest-neighbor set | 1 M NaCl in the absence of co-soluteᵃ (kcal mol⁻¹) | 100 mM NaCl in the absence of co-soluteᵇ (kcal mol⁻¹) | 100 mM NaCl in the presence of 40 wt% PEG 200ᶜ (kcal mol⁻¹) |
|---|---|---|---|
| d(AA/TT) | −1.00 | −0.65 | −0.55 |
| d(AT/TA) | −0.88 | −0.60 | −0.28 |
| d(TA/AT) | −0.58 | −0.36 | −0.16 |
| d(CA/GT) | −1.45 | −1.23 | −1.00 |
| d(GT/CA) | −1.44 | −1.20 | −0.89 |
| d(CT/GA) | −1.28 | −1.11 | −0.91 |
| d(GA/CT) | −1.30 | −0.93 | −0.87 |
| d(CG/GC) | −2.17 | −1.85 | −1.38 |
| d(GC/CG) | −2.24 | −2.05 | −1.31 |
| d(GG/CC) | −1.84 | −1.69 | −1.25 |
| Initiation per GCᵈ | 0.98 | 0.98 | 0.76 |
| Initiation per ATᵈ | 1.03 | 1.03 | 1.00 |
| Symmetry factorᵉ | 0.40 | 0.40 | 0.40 |

ᵃ Data obtained from the report of SantaLucia (1998)
ᵇ Values corrected as described by Huguet et al. (2010)
ᶜ Data collected from our recent report (Ghosh et al. 2020)
ᵈ Because stacking interactions determine the salt dependency of DNA stability in monovalent conditions, initiation parameters remain the same at low NaCl concentrations, as initiation does not involve a stacking interaction between base pairs (only hydrogen bonding is involved)
ᵉ The free energy change due to the entropic penalty for maintaining C2 symmetry in self-complementary sequences is independent of the environment

calculation. In general, ΔCp has been reported to be small for nucleic acids (Allawi and SantaLucia 1997). Moreover, owing to enthalpy-entropy compensation, ΔCp does not generally affect ΔG°37. Another approximation concerns the phosphate at the 5′ end. Synthetic oligonucleotides with a 5′-OH are usually used for melting assays. A 5′-phosphate slightly stabilizes duplexes (Freier et al. 1985). However, the magnitude of this stabilization is small enough to be neglected compared with that of a dangling end. Therefore, the established NN parameters were obtained from data on 5′-OH oligonucleotides without any corrections.
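The NN summation worked through above for r(GAACGUUC) can be expressed as a short Python sketch. The parameter values are the 1 M NaCl column of Table 2; the function names and the handling of the "per terminal AU" term are illustrative assumptions rather than the authors' software. Because this sketch uses the tabulated symmetry factor of 0.43 instead of the rounded 0.4 in the worked example, it returns about −8.88 rather than −8.91 kcal mol⁻¹.

```python
# NN prediction of dG37 for an RNA duplex formed by a strand and its
# Watson-Crick complement, using the 1 M NaCl parameters from Table 2.

NN_DG37 = {  # kcal/mol; key "XY/X'Y'" means 5'-XY-3'/3'-X'Y'-5'
    "AA/UU": -0.93, "AU/UA": -1.10, "UA/AU": -1.33, "CU/GA": -2.08,
    "CA/GU": -2.11, "GU/CA": -2.24, "GA/CU": -2.35, "CG/GC": -2.36,
    "GG/CC": -3.26, "GC/CG": -3.42,
}
INIT, TERMINAL_AU, SYMMETRY = 4.09, 0.45, 0.43
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def nn_key(step):
    """Map a 5'->3' dinucleotide step onto its Table 2 key."""
    bottom = "".join(COMPLEMENT[b] for b in step)  # paired strand, 3'->5'
    key = f"{step}/{bottom}"
    if key in NN_DG37:
        return key
    # Same stack read 5'->3' along the opposite strand.
    return f"{bottom[::-1]}/{step[::-1]}"


def duplex_dg37(seq):
    """Sum initiation, NN propagation, terminal-AU and symmetry terms."""
    total = INIT
    for i in range(len(seq) - 1):
        total += NN_DG37[nn_key(seq[i:i + 2])]
    total += TERMINAL_AU * sum(seq[i] in "AU" for i in (0, -1))
    reverse_complement = "".join(COMPLEMENT[b] for b in reversed(seq))
    if seq == reverse_complement:  # self-complementary strand
        total += SYMMETRY
    return total


def dg_at(dh_kcal, ds_cal, temp_celsius):
    """dG = dH - T dS, with dS given in cal/(K mol)."""
    return dh_kcal - (temp_celsius + 273.15) * ds_cal / 1000.0


print(round(duplex_dg37("GAACGUUC"), 2))     # -8.88
print(dg_at(-68.35, -191.6, 37.0))            # about -8.9 kcal/mol
```

The `dg_at` helper shows how the summed ΔH° and ΔS° quoted in the text give ΔG° at any temperature, which is how predictions away from 37 °C are usually obtained.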

Application of Prediction to Nonmatched Base Pair and Secondary Structure Based on NN Rules

Although matched Watson-Crick base pairs are selective and stable, mismatched base pairs can form hydrogen bonds between themselves and stack with NN base pairs. Thus, mismatched base pairs have a stabilizing effect on duplex formation, which


Table 4 Comparison of nearest-neighbor parameters (ΔG°NN) for RNA/DNA hybrid duplex formation in different NaCl concentrations in the absence of co-solute at 37 °C

| Nearest-neighbor setᵃ | 1 M NaClᵇ (kcal mol⁻¹) | 100 mM NaClᶜ (kcal mol⁻¹) |
|---|---|---|
| rAA/dTT | −1.0 | −0.7 |
| rAC/dTG | −2.1 | −1.5 |
| rAG/dTC | −1.8 | −1.3 |
| rAU/dTA | −0.9 | −0.4 |
| rCA/dGT | −0.9 | −1.2 |
| rCC/dGG | −2.1 | −1.7 |
| rCG/dGC | −1.7 | −1.4 |
| rCU/dGA | −0.9 | −0.4 |
| rGA/dCT | −1.3 | −1.5 |
| rGC/dCG | −2.7 | −2.0 |
| rGG/dCC | −2.9 | −2.3 |
| rGU/dCA | −1.1 | −1.4 |
| rUA/dAT | −0.6 | −0.5 |
| rUC/dAG | −1.5 | −1.4 |
| rUG/dAC | −1.6 | −1.6 |
| rUU/dAA | −0.2 | 0.2 |
| Initiation | 3.1 | 2.0ᵈ, 2.6ᵉ |

ᵃ Note the description of the NN set. For example, rAC/dTG means 5′-rArC-3′/3′-dTdG-5′
ᵇ Data obtained from our earlier report (Sugimoto et al. 1995)
ᶜ Data obtained from our earlier reports (Banerjee et al. 2020b, 2021)
ᵈ Initiation parameter for duplexes that contain at least one rG•dC or rC•dG base pair at any terminal
ᵉ Initiation parameter for duplexes that contain only rA•dT or rU•dA base pairs at both terminals

indicates that NN rules are applicable to the stability of duplexes with mismatches (Allawi and SantaLucia 1997, 1998a, b, c; Peyret et al. 1999). Eight possible single mismatches occur in DNA with varying frequencies and stabilities: A•A, A•C, C•C, C•T, G•G, G•A, G•T, and T•T (U for RNA rather than T). In the case of DNA, the effect of mismatches on duplex stability has been extensively investigated. The analysis of hybrid RNA/DNA duplexes revealed that the order of mismatch stabilities is rG•dT ≫ rU•dG ≈ rG•dG > rA•dG ≈ rG•dA ≈ rA•dC > rA•dA ≈ rU•dT ≈ rU•dC > rC•dA ≈ rC•dT, which can be slightly altered by the identity of the adjacent Watson-Crick base pairs (Sugimoto et al. 2000). In RNA, the rG•rU base pair, which is frequently found in RNA structures, forms a stable non-Watson-Crick base pair and stabilizes the RNA duplex (Mathews et al. 1999b). Furthermore, free energy increments have been reported for each double mismatch, indicating that r(GU/UG), r(GA/AG), and r(UU/UU) double mismatches stabilize a duplex, whereas other mismatches, which form an internal loop, destabilize it (Mathews et al. 1999b). Because nucleotides at the termini are less restricted by the backbone than those in the helix core, terminal mismatches can contribute favorably to the stabilization of the duplex (Bommarito et al. 2000; Sugimoto et al. 1987b). These terminal mismatches are more stable than mismatches within the chain. This is because the flexible geometry


of the ends favors stacking interactions with the NN base pair and the formation of hydrogen bonds between mismatched bases. Similarly, a dangling end stabilizes the duplex by stacking on the NN base pair (Bommarito et al. 2000; Freier et al. 1985, 1986b; Sugimoto et al. 1987a). In the case of RNA, the stabilization effect is large and 3′ dangling ends are more stabilizing than 5′ dangling ends, which differs from the case for DNA. These differences might be due to the different helix geometries: DNA forms the B-type helix, whereas RNA forms the A-type helix. Long dangling ends also have a greater stabilizing effect than short dangling ends (Ohmichi et al. 2002). This information is useful for predicting structures with long single-stranded regions, such as pseudoknot structures.

The parameters for these matched and nonmatched base pairs enable the prediction of the secondary structure of nucleic acids from sequence information. MFOLD is the representative platform for predicting DNA and RNA secondary structures (Zuker 1989, 2003; Zuker and Stiegler 1981). This algorithm calculates the minimum free energy (MFE), ΔGmin, of the secondary structure from the possible combinations of base pairs in a single-stranded sequence. Recently, the prediction accuracy of secondary structures has been improved by combining chemical footprinting and non-MFE approaches, including deep learning techniques. Table 5 summarizes the representative NN parameters and related software used to predict the stability of nucleic acid structures. Building on these predictions, methods have been further developed for three-dimensional structures such as DNA origami, DNA-guided crystallization of colloids, and DNA-mediated nanoparticle superlattices.

Stability Prediction of Noncanonical Structures

Applicability of Stability Prediction to Noncanonical Structures

The versatility of NN rules for canonical duplex structures fundamentally relies on a uniform tertiary structure that is independent of sequence. As mentioned previously, mismatched base pairs that can be part of a Watson-Crick-type duplex can be handled in the stability prediction of duplexes by NN rules. In noncanonical structures, the tertiary structures are built from non-Watson-Crick base pairs. Although their hydrogen bonding uses combinations of donors and acceptors in the bases that differ from those in canonical structures, the physicochemical properties of noncanonical structures should also be dominated by the five factors listed in the subsection “Structure and Thermodynamics of the Canonical Structure of Nucleic Acids.” Thus, the fundamental concept of NN rules should be applicable even to noncanonical structures. Noncanonical structures can be formed from intermolecular and intramolecular strands. From biological and technological perspectives, it is very important to predict the stability of intramolecular structures. In intramolecular structures, the loops and topologies of the strands exhibit various geometries (Fig. 8). Thus, the

Table 5 Overview of methods and software for estimating the stability of various nucleic acid structures

| Type of nucleic acid structure | Method or software name | Target | Method/Algorithm | Links to database | References |
|---|---|---|---|---|---|
| DNA duplex (Intermolecular) | Classical method to obtain NN parameters | Stability in 1 M Na+ and any different concentrations | Multiple regression analysis of experimental data with empirical formula based on nonsequence-specific salt correction | Dataset and equations in the literature | SantaLucia (1998) |
| DNA duplex (Intermolecular) | Classical method to obtain NN parameters | Stability in 10 mM ~ 1 M Na+ concentrations | Multiple regression analysis of experimental data with empirical formula based on sequence-specific salt correction | Dataset and equations in the literature | Huguet et al. (2010) |
| DNA duplex (Intermolecular) | Classical method to obtain NN parameters | Stability in 100 mM Na+ and 40 wt% PEG200 | Multiple regression analysis of experimental data with empirical formula based on nonsequence-specific salt correction | Dataset and equations in the literature | Ghosh et al. (2019) |
| DNA duplex (Intermolecular) | Classical method to obtain NN parameters | Stability in 10 mM ~ 1 M Na+ concentrations and various crowders | Multiple regression analysis of experimental data with empirical formula based on sequence-specific salt and crowders correction | Dataset and equations in the literature | Ghosh et al. (2020) |
| RNA duplex (Intermolecular) | Classical method to obtain NN parameters | Stability in 1 M Na+ | Multiple regression analysis of experimental data | NNDB https://rna.urmc.rochester.edu/NNDB/index.html | Turner and Mathews (2010), Xia et al. (1998) |
| RNA duplex (Intermolecular) | Classical method to obtain NN parameters | Stability in 1 M Na+ and 20 wt% PEG200 | Multiple regression analysis of experimental data | Dataset and equations in the literature | Adams and Znosko (2019) |

Table 5 (continued)

| Type of nucleic acid structure | Method or software name | Target | Method/Algorithm | Links to database | References |
|---|---|---|---|---|---|
| DNA/RNA hybrid duplex (Intermolecular) | Classical method to obtain NN parameters | Stability in 1 M Na+ | Multiple regression analysis of experimental data | Dataset and equations in the literature | Sugimoto et al. (1995) |
| DNA/RNA hybrid duplex (Intermolecular) | Classical method to obtain NN parameters | Stability in 100 mM Na+ | Multiple regression analysis of experimental data | Dataset and equations in the literature | Banerjee et al. (2020a, 2021) |
| All duplexes (Intermolecular) | Classical method to obtain NN parameters | Stability in different salt conditions | Empirical formula based on nonsequence-specific salt correction | Equations in the literature | Nakano et al. (1999) |
| All duplexes (Intermolecular) | VarGibbs | Stability in 100 mM Na+ | Melting temperature optimization (MTO) method to obtain NN parameters for DNA directly from measured Tm | https://bioinf.fisica.ufmg.br/app/comparetm.pl | Basilio Barbosa et al. (2019), Ferreira et al. (2019) |
| DNA and RNA duplexes containing bulge, hairpin, and dangling end (Intramolecular) | MFOLD | Stability in various Na+ concentrations | Minimum free energy (MFE) approach using NN parameters and energetic data derived from other secondary structures | http://unafold.rna.albany.edu/ | Zuker (1989, 2003), Zuker and Stiegler (1981) |
| DNA and RNA duplexes containing bulge, hairpin, and dangling end (Intramolecular) | RNAstructure | Stability in different salt conditions | Minimum free energy (MFE) approach modified with information of chemical mapping | https://rna.urmc.rochester.edu/RNAstructure.html | Mathews et al. (2004) |
| Triplex DNA (Intermolecular) | Classical method to obtain NN parameters | Stability in different pH conditions | Empirical formula based on sequence-specific pH correction | Dataset and equations in the literature | Lilley (2000), Roberts and Crothers (1996) |

Table 5 (continued)

| Type of nucleic acid structure | Method or software name | Target | Method/Algorithm | Links to database | References |
|---|---|---|---|---|---|
| G4 DNA (Intramolecular) | Quadpredict | Stability of G4 DNA | Machine learning based on a Bayesian learning algorithm | http://www.quadruplex.org/?view=quadpredict | Stegle et al. (2009), Wong et al. (2010) |
| G4 RNA (Intramolecular) | RNAfold 2.0 g (ViennaRNA package) | Stability of G4 RNA | Minimum free energy (MFE) approach | https://www.tbi.univie.ac.at/RNA/ | Lorenz et al. (2013) |
| i-motif DNA (Intramolecular) | G4-iM Grinder | Stability of i-motif | Evaluation of potential quadruplex sequences using several published and contrasted scoring algorithms | https://github.com/EfresBR/G4iMGrinder | Belmonte-Reche and Morales (2019) |

Reproduced from Ref. Takahashi and Sugimoto (2020)

Fig. 8 Tertiary structures of (a) hairpin and noncanonical structures, such as (b) triplex, (c) antiparallel G4, (d) mixed G4, (e) parallel G4, and (f) i-motif

three-dimensional interaction between loops and helices or loops contributes to the stability of noncanonical structures.

Hairpin Loop

Hairpin loops occur when DNA or RNA strands fold back on themselves to form base pairs via intramolecular folding, and they are often found in RNA. An increase in the length of the unpaired (coil) region increases the entropic penalty. Thus, hairpin stability decreases with increasing loop size. In addition, the base pair closing the loop affects the loop flexibility and the stability of the hairpin structure. The energetic contribution of the loop is calculated as follows (Mathews et al. 1999b):

$$
\Delta G^\circ_{37}(\mathrm{loop}) = \Delta G^\circ_{37}(\mathrm{length}) + \Delta G^\circ_{37}(\mathrm{mm}) - 0.8\ (\text{if the first mismatch is GA or UU}) \tag{7}
$$

where ΔG°37(length) is the free energy increment for the length of the hairpin loop, and ΔG°37(mm) is the free energy increment for the first mismatch in the loop (Freier et al. 1986a; Hickey and Turner 1985; Sugimoto et al. 1987b). The stem region is a duplex, whose stability can be predicted by classical NN parameters. Thus, the overall stability of the hairpin can be determined as the sum of the stability of the loop (ΔG°37(loop)) and that of the stem (ΔG°37(stem)), the latter calculated using the NN parameters for duplex regions (Fig. 9). However, the prediction often deviates from experiment because sequence-specific hydrogen bonding and stacking interactions can occur within the loop region (Mathews et al. 1999b). For example, the GNRA tetraloop (N: A, G, C, or U; R: A or G), a common hairpin loop in ribosomal RNA, shows exceptional stability compared to other hairpin loops (Antao et al. 1991).
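Eq. (7) and the loop-plus-stem summation can be written as a small Python sketch. The function names are hypothetical, and the numerical inputs in the usage lines are placeholders chosen only to show the arithmetic; real ΔG°37(length) and ΔG°37(mm) values must be taken from the cited parameter sets (e.g., Mathews et al. 1999b).

```python
# Hairpin stability as loop + stem contributions (Eq. 7), all in kcal/mol.

BONUS_MISMATCHES = {"GA", "UU"}  # first mismatches that receive the -0.8 term


def hairpin_loop_dg37(dg_length, dg_mismatch, first_mismatch):
    """Eq. (7): length term + first-mismatch term, with the GA/UU bonus."""
    bonus = -0.8 if first_mismatch in BONUS_MISMATCHES else 0.0
    return dg_length + dg_mismatch + bonus


def hairpin_dg37(dg_stem, dg_loop):
    """Overall hairpin stability: NN-predicted stem plus loop contribution."""
    return dg_stem + dg_loop


# Placeholder inputs (not tabulated parameters): a loop length penalty of
# +5.0, a first-mismatch term of -1.0, and a stem worth -7.5.
loop = hairpin_loop_dg37(5.0, -1.0, "GA")
print(loop)
print(hairpin_dg37(-7.5, loop))
```

As the text notes, special loops such as GNRA tetraloops deviate from this additive picture, so in practice they are usually handled with dedicated bonus terms.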


Fig. 9 Structural factors that affect the thermodynamic stability of a hairpin structure. The tertiary structure of an RNA hairpin having a GAAA tetra loop was obtained from PDB:1ZIF. The cyan region is the stem and the green region is the loop

Fig. 10 Structural factors that affect the thermodynamic stability of the triplex. The tertiary structure of a triplex DNA was obtained from PDB:1GN7. The cyan region is the duplex (hairpin), the orange region is the TFO, and the green region is the loop

Triplex

A triplex is formed by a Watson-Crick duplex and a third strand, named the “triplex-forming oligonucleotide (TFO),” in a sequence-specific manner. The base pairing of the TFO is governed by the formation of Hoogsteen or reverse Hoogsteen hydrogen bonds between A of the duplex and T of the TFO, or between G and C+ (Figs. 4 and 10), respectively. Thus, the TFO is pyrimidine rich and binds to the purine-rich region of the duplex. Because triplex formation, like duplex formation, is sequence dependent, stability is determined by the ability to form Hoogsteen base pairs together with the conformational entropy of the bound TFO strand. Thus, the mechanism that determines the stability gained by TFOs should be similar to that of Watson-Crick base pairs, indicating that NN rules can be adopted for the stability prediction of triplexes. The parameters for triplex stability include energetic contributions such as (i) helix initiation, (ii) formation of a T•A*T triad (“*” indicates a Hoogsteen base pair), (iii) formation of a C•G*C+ triad, and (iv) base stacking between TFO bases. However, base stacking in the TFO is weaker than that in the duplex because of steric differences. Furthermore, two additional factors derived from the protonation of the C bases in the TFO are considered for the triplex. One is (v) the repulsion between consecutive protonated C bases. This repulsion causes incomplete protonation of the C bases along the TFO, resulting in destabilization compared to T-rich triplexes. Incomplete deprotonation of an unfavorable C+•C base pair within the TFO strand also affects triplex stability (Roberts and Crothers 1991). The other is (vi) the pH dependence of the triad. As the pKa value of cytosine is approximately 4.6, the C•G*C+ triad is most stable under acidic conditions. In addition, the energetic contribution of the loop region to the intramolecular triplex should be considered (Booher et al. 1994; Prakash and Kool 1992). These factors led to the development of a stability prediction for triplexes from the sequence information and pH as follows (Lilley 2000; Roberts and Crothers 1996):

$$
\begin{aligned}
\Delta G^\circ &= \Delta G^\circ(\mathrm{triplex}) + \Delta G^\circ(\mathrm{pH}) + \Delta G^\circ(\mathrm{init}) + \Delta G^\circ(\mathrm{loop}) \\
&= [-3.00(\mathrm{C}) - 0.65(\mathrm{T}) + 1.65(\mathrm{CC})] + [(\mathrm{C})(\mathrm{pH} - 5.0)(1.26 - 0.08(\mathrm{CC}))] + 6.0
\end{aligned} \tag{8}
$$

where ΔG°(loop) is the free energy change contributed by the loop sequences of the triplex (Booher et al. 1994), C is the number of C bases, T is the number of T bases, and CC is the number of CC dinucleotides. The concept of Eq. (8) is shared with Eq. (7) based on the NN rule, although the effect of the NN base is only considered for the CC dinucleotide and helix initiation. The accuracy of this prediction is quite high (an rms residual of 0.61 kcal mol⁻¹ at 37 °C) (Roberts and Crothers 1996), which indicates that the energetic contribution from the TFO is mainly due to pH dependency and not to stacking interactions (Fig. 11).

Fig. 11 Comparison of measured versus predicted melting temperatures of triplexes. (The figure has been published previously and reproduced from Ref. Roberts and Crothers (1996) with permission from the National Academy of Sciences, copyright (1996))
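The triplex prediction of Eq. (8) depends only on base counts in the TFO and on pH, so it is straightforward to sketch in Python. This illustration rests on several assumptions that should be checked against Roberts and Crothers (1996): the C and T terms are read as negative (stabilizing) and the CC and pH terms as positive, overlapping CC dinucleotides are each counted (so CCC contains two), the constant 6.0 is taken as written in Eq. (8), and the function names and the example TFO sequence are invented.

```python
# Triplex free energy from TFO composition and pH, following Eq. (8).
# Signs and the overlapping CC count are assumptions stated in the lead-in.


def count_tfo_terms(tfo):
    """Return (C, T, CC): base counts and CC dinucleotide count in the TFO."""
    c = tfo.count("C")
    t = tfo.count("T")
    cc = sum(1 for i in range(len(tfo) - 1) if tfo[i:i + 2] == "CC")
    return c, t, cc


def triplex_dg(tfo, ph):
    """Eq. (8) in kcal/mol: composition term + pH term + constant 6.0."""
    c, t, cc = count_tfo_terms(tfo)
    dg_triplex = -3.00 * c - 0.65 * t + 1.65 * cc
    dg_ph = c * (ph - 5.0) * (1.26 - 0.08 * cc)
    return dg_triplex + dg_ph + 6.0


# Hypothetical pyrimidine TFO with two isolated C bases.
for ph in (5.0, 6.0, 7.0):
    print(ph, round(triplex_dg("TTCTTCTT", ph), 2))
```

The loop shows the qualitative behavior described in the text: the predicted free energy becomes less favorable as the pH rises above 5.0, because protonation of the C bases in the TFO becomes incomplete.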


G-quadruplex

The G-quadruplex (G4) is one of the tetraplex structures of nucleic acids. G4 is formed by stacks of guanine quartets, in which each guanine is linked to its neighbors by Hoogsteen hydrogen bonds (Figs. 1c and 12). The common G4-forming sequence consists of consecutive repeats of G bases. Such G-rich sequences assemble in an inter- or intramolecular manner. Thus, for the intramolecular G4, the potential G4-forming sequence has four tandem G repeats for the tetraplex core connected by spacing sequences for the loop regions, such as GxLyGxLyGxLyGx (x ≥ 2 and y ≥ 1). One unique property of G4 formation is the requirement of certain cations to build up the G4 structure. These cations coordinate the G-quartets inside the core of G4. Among physiological cations, Na+ and K+ bind specifically to G4. Another unique point is the structural diversity compared with other structures. The G4 topology can change owing to the glycosidic conformational patterns and loop orientations (Fig. 8). These topology changes are caused by the sequence, cation, and solution environment.

Regarding the structural aspects, G4 stability mainly depends on the following points: (i) number of G-quartets, (ii) length of loops, (iii) base composition of the loops, and (iv) topology of the G4. Compared to duplexes, these structural variations make it difficult to find the rules determining the stability of G4s. Indeed, a method to predict G4 stability from the sequence information has not been established. However, some prediction approaches have been reported based on these points. For (i), as in duplexes, an increase in the number of G-quartets directly stabilizes the G4 owing to the increase in the number of hydrogen bonds and stacking interactions between G-quartets. Each G-quartet has 8 Hoogsteen hydrogen bonds, and the large planar structure of the G-quartet generates large stabilization via π–π stacking interactions. Thus, the energetic gain per G-quartet step is larger than that per base pair step in the duplex (Pandey et al. 2013). One

Fig. 12 Structural factors that affect the thermodynamic stability of G4. The tertiary structure of G4 DNA was obtained from PDB:2HY9. The cyan region is the stem and the green region is the loop


study reported that the energetic contribution of each G-quartet could be calculated by subtracting the net stability of a two-quartet G4 DNA (d(G2T2G2TGTG2T2G2)) from that of a three-quartet G4 DNA (d(G3T3G3TGTG3T3G3)). As both sequences form an antiparallel G4 topology, this comparison minimizes the effect of the G4 topology on the stability. As a result, the thermodynamic profile for the formation of a G-quartet stack is ΔG°20 = −2.2 kcal mol⁻¹, ΔH = −14.6 kcal mol⁻¹, and TΔS = −12.4 kcal mol⁻¹ at 20 °C (Olsen et al. 2006).

For the case of (ii), the loop length basically affects the stability of G4 negatively, as found for the hairpin loop. However, the trend in stability depending on loops is more complex than that for the G-quartet number, because the loop length can restrict the possible G4 topologies. The geometries of a loop are classified into the propeller, lateral, and diagonal types. These types depend on how the G-tracts are linked by the loop and require different lengths from each other. When the loop lengths are all one (y = 1) in a G4 having three quartets, only a parallel topology can be formed. Thus, loops that are too short for a certain topology decrease the stability of G4 or induce the formation of G4 with a smaller number of G-quartets, whereas other topologies need longer loops (Hazel et al. 2004). For example, it has been reported that the stability of d(GxT2) G4 increases per stack added (Rachwal et al. 2007) but decreases when x is more than 3. In another study, the systematic permutation of thymine loops revealed drastic changes in Tm (within 17 °C) and in G4 topologies (Cheng et al. 2018). RNA G4s adopt only a parallel topology and show a simple decrease in stability with increasing loop length (Matsumoto et al. 2020).

For the case of (iii), the stability depends on the type of base in the loop region that interacts with the G-quartet. Since the loop and G-quartet interact via π–π stacking, the G4 structure is more stable when the loop region contains purine bases (Nagatoishi et al. 2011). In fact, NMR structural analysis has revealed that the G4 structure derived from human telomeres is stabilized by the stacking of bases in the TTA loop onto the G-quartet (Wang and Patel 1993). In addition, as mentioned in (iv), the topology can affect the stability owing to different orientations of the loop. The loops of G4 in a hybrid topology can interact with its G-quartets but cannot do so in a parallel topology, in which the loops spread out like a propeller. Indeed, the stability of RNA G4 shows little dependence on the loop sequence.

Furthermore, various "noncanonical G4s" have been identified. These structures contain other motifs that affect G4 stability. For example, a hairpin structure can be formed within a relatively long loop of G4, which stabilizes the structure (Onel et al. 2016). When one of the G-tracts has too few guanines to complete the quartets, a guanine base in a loop region can be recruited to complete the G4, forming a bulged G4 structure (Chen et al. 2012; Li et al. 2015; Phan et al. 2007). Furthermore, a G4 forming a parallel topology can interact with its adjacent segments, which stack onto the G-quartet and stabilize the G4 (Mathad et al. 2011). Recently, efforts have begun to classify the rules by which a specific sequence forms a specific topology (Dvorkin et al. 2018). Given these complex factors, predicting G4 stability remains difficult compared to the case of duplexes. Thus, some


studies have attempted to model the nonlinear behavior of the thermodynamic stability of G4s. As the G4 stability shows a logarithmic trend depending on the loop length (Zhang et al. 2011), the thermodynamic stability has been simply expressed as ΔG = (n − 1)ΔGq + (m − 2)ΔGL, where n is the number of G-quartets, m is the sum of the lengths of the three loops in the G4 structure, ΔGq is the free energy contribution of the G4 core, and ΔGL is the free energy contribution of the G4 loops (Fig. 13a) (Lorenz et al. 2013).

Fig. 13 Prediction of G4 stability using a G4 search algorithm. (a) Folding energies for RNA G4s with three quartets depending on the total linker length. RNAfold 2.0 g prediction, which includes the stability information of RNA G4s, enables the prediction of G4 formation in the secondary structure (red lines); (non-)Watson-Crick base pairs are shown as black lines (Lorenz et al. 2013). (b) Schematic illustration of the prediction of the Tm of DNA G4s via Quadpredict (Stegle et al. 2009). The graph is only an image, which does not reflect the results of the literature. (Reproduced from Ref. Takahashi and Sugimoto (2020) with permission from the Royal Society of Chemistry)


These relationships between the sequence and G4 stability have been tested for incorporation into previously established methods for predicting the secondary structure of RNAs. One example is RNAfold 2.0 g, an application of the ViennaRNA package (Lorenz et al. 2011). This system predicts RNA structures, including possible G4 structures, via the MFE approach (Fig. 13a) (Lorenz et al. 2013). To overcome the complexity of defining rules for G4 prediction, machine learning approaches have also been tested. Quadpredict is a computational approach that predicts G4 stability using Gaussian process (GP) regression, in which a Bayesian learning algorithm was trained on a dataset of sequence information (Stegle et al. 2009; Wong et al. 2010). The covariance functions for machine learning were designed to accommodate various sequences and cation concentrations. The sequence information was based on the determinants of G4 stability mentioned previously herein, including the number of G-quartets (from 2 to 4), each loop length, the base composition of each loop, and the total length of the sequence. The marginal GP test showed that the prediction was sufficiently accurate to give Tm values within 5 °C of the experimental data (Fig. 13b).

i-motif

The i-motif is a tetraplex structure formed by base pairs between cytosine and protonated cytosine that intercalate with each other (Figs. 4 and 14). As shown in the subsection about triplexes, cytosine has the highest pKa value of the nucleobases and is protonated under mildly acidic conditions. As with the G4 structure, successive cytosine regions assemble to form the i-motif. Since cytosine is complementary to guanine, the complementary strand of a G4-forming region on the chromosome is an i-motif-forming sequence. Such intramolecular i-motif-forming sequences have a core tetraplex and loop regions similar to G4, such as CxLyCxLyCxLyCx (x ≥ 2, y ≥ 1). As in the case of G4s, the effect of the number of core d(C*C+) base pairs and the loop length on the stability of the i-motif has been investigated (Bhavsar-Jog et al. 2014; Gurung et al. 2015; Mergny et al. 1995). Regarding the number of d(C*C+) base pairs, the structural stability of the i-motif tends to increase as the C*C+ number increases, similar to the G4 structure.

Fig. 14 Structural factors that affect the thermodynamic stability of the i-motif. The tertiary structure of the i-motif DNA was obtained from PDB:1A83. The cyan region is the stem and the green region is the loop


Fig. 15 Trend in i-motif stability depending on the sequence. (a) Tm versus the loop length of d(C3Tn)4 (Gurung et al. 2015). (b) Tm versus the number of C-tracts of d(CnT3)4 (Wright et al. 2017). (c) Transition pH of the formation of the i-motif (pHT) versus the length of dCn. (Reproduced from Ref. Fleming et al. (2017) with permission from the American Chemical Society, copyright (2017))

Since i-motifs do not form diverse topologies, loop length imposes fewer restrictions on the tetraplex structure than it does for the G4 structure. For example, analysis using d(CnT3)4 has shown that structural stability increases (increase in Tm) with an increasing number (n) of cytosines (Fig. 15a). However, for longer C-tracts of 5 or more units, the gain in stability diminished (Fig. 15b) (Wright et al. 2017). Furthermore, for sequences consisting only of cytosine (dCn), the stability of the i-motif follows a periodic pattern, increasing with the addition of up to four nucleotides and decreasing thereafter. This trend follows the "4n-1 rule," with stability maxima at dCn of 15, 19, 23, and 27 nucleotides up to dC27 (Fig. 15c) (Fleming et al. 2017). Furthermore, the i-motif was most stable when the number of C*C+ base pairs (n) was even and each loop length was one nucleotide. Regarding loop regions, longer loops destabilize the i-motif, as in hairpins and G4s, owing to entropy-dependent energetic loss, whereas i-motifs with two long loops and one short loop are stable (Gurung et al. 2015). When an i-motif has an odd number of core C-tracts, one nucleotide in the first and third loops and three nucleotides in the central loop are most stable (Fleming et al. 2018). Unlike G4, the tightly packed core of the C*C+ base pair in i-motifs does not provide


enough stacking interactions with the bases in loops, and thus the contribution of the stacking interaction is considered small. However, the base composition of the loops can marginally change the stability via possible interactions between bases within the loops (Fujii and Sugimoto 2015). For example, the NMR structure of the human telomere i-motif shows that the first and third loops interact with each other. Furthermore, bases in these loops can form base pairs via inter- or intra-loop hydrogen bonding (Fujii and Sugimoto 2015; Ruggiero et al. 2019). T bases in the first and third loops form non-Watson-Crick base pairs that stabilize the i-motif more than A bases do (Fujii and Sugimoto 2015; Wright et al. 2017). The stabilization effect on the i-motif follows the order T/T (T for the start and end bases of the first loop and T for the third loop), G/G > G/T, T/A > A/A, and G/A, at the bases closest to the C-tract. These results suggest that there are hydrogen bonding and stacking interactions between loops. In fact, among natural i-motif sequences, those of cancer-related genes such as c-MYC, BCL-2, PDGF-A, HIF-1α, and hTERT have long loops and form i-motifs that are stable at neutral pH, indicating the significance of these loop–loop interactions (Brazier et al. 2012; Dettler et al. 2010; Kang et al. 2014; Kendrick et al. 2009; Simonsson et al. 2000; Sun and Hurley 2009). These systematic data are expected to facilitate the development of future methods for predicting i-motif stability. It should also be noted that i-motifs with long C-tracts and loops can show hysteresis between the thermal annealing and melting processes, which requires a kinetic perspective (Rogers et al. 2018; Wright et al. 2017).

Although a method for predicting the stability of i-motifs has not been established, it is expected to be easier to achieve than a prediction method for G4, because i-motifs can form fewer topologies than G4s and there are fewer stacking interactions between the loops and the core. At present, it has been reported that the linear relationships between the stability of the i-motif and the C-tract length or pH are generally valid for i-motif sequences under various conditions, including inside the cell (Cheng et al. 2021; Iaccarino et al. 2021). This indicates that once the Tm and ΔG values are determined under a specific condition, the values at any pH can be predicted. Furthermore, taking advantage of the fact that the i-motif-forming sequence is complementary to that of G4, the methods used to identify G4s can be applied to search for potential i-motif-forming sequences. Quadfinder searches for CxLyCxLyCx or GxLyGxLyGx motifs in the template or nontemplate strand, respectively (CxLyCxLyCxLyCx or GxLyGxLyGxLyGx, x = 3–5, y = 1–25) (Bhavsar-Jog et al. 2014). For example, the possibility of i-motif-forming regions at transcription start sites can be analyzed. G4-iM Grinder can also evaluate potential tetraplex-forming sequences by scoring the possibility of G4 formation from the sequence (Fig. 16) (Belmonte-Reche and Morales 2019). Here, the higher the probability that the target sequence is a G4, the greater the positive score. Conversely, a large negative value suggests that the sequence is C-rich and less likely to form a G4 (i.e., it has a higher probability of forming an i-motif). Applied to a variety of i-motifs of reported stability, this algorithm revealed a correlation between the score and Tm under various pH conditions. The applicability of the G4 prediction to i-motif prediction indicates that the factors determining their stability are similar, as


Fig. 16 Correlation plot between prediction scores of G4-iM Grinder and Tm values experimentally obtained in different pH values (Belmonte-Reche and Morales 2019). The graph is only an image, which does not reflect the results of the literature. (Reproduced from Ref. Takahashi and Sugimoto (2020) with permission from the Royal Society of Chemistry)

mentioned previously herein. Therefore, these results suggest that the development of structural stability predictions for tetraplexes will provide useful information for future predictions for both the G4 and i-motif.
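The motif notation used above (GxLyGxLyGxLyGx for G4 and CxLyCxLyCxLyCx for the i-motif) translates directly into a regular expression. The following is a generic sketch of such a pattern search, not the Quadfinder or G4-iM Grinder implementation; the function names and the default bounds (x ≥ 3, y = 1–7) are illustrative assumptions that should be adjusted to the search criteria of interest.

```python
import re


def tetraplex_regex(base: str, x_min: int = 3, y_min: int = 1, y_max: int = 7) -> re.Pattern:
    """Build a pattern for four tracts of `base` separated by three loops.

    The bounds are hypothetical defaults chosen for illustration only.
    """
    if base not in ("G", "C"):
        raise ValueError("base must be 'G' (G4) or 'C' (i-motif)")
    tract = f"{base}{{{x_min},}}"                 # e.g. G{3,}
    loop = f"[ACGT]{{{y_min},{y_max}}}?"          # shortest loop that still allows a match
    return re.compile(f"{tract}(?:{loop}{tract}){{3}}")


def find_motifs(seq: str, base: str, **bounds) -> list[tuple[int, str]]:
    """Return (start index, matched sequence) for non-overlapping candidate motifs."""
    pattern = tetraplex_regex(base, **bounds)
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    telomere_g = "TTAGGGTTAGGGTTAGGGTTAGGG"   # G-rich human telomeric repeat
    telomere_c = "CCCTAACCCTAACCCTAACCC"      # complementary C-rich repeat
    print(find_motifs(telomere_g, "G"))
    print(find_motifs(telomere_c, "C"))
```

A match only flags a sequence as a candidate; whether it folds, and how stably, depends on the loop, pH, and cation factors discussed above.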

Expansion and Application of Stability Prediction

Issues in Application of Stability Prediction Under Cellular Conditions

As shown above, nucleic acid stability is highly dependent on the environmental factors of solutions. To predict the stability of nucleic acids, it is important to remember that the classical parameters for stability prediction are based on data obtained under specific solution conditions. As the structures of DNA and RNA are essential for cellular functions, prediction should be carried out under conditions that reflect the intracellular environment. Because the classical parameters do not reflect such conditions, the predicted structure unfortunately often fails to reproduce the structures in cells (Ding et al. 2014; Kwok et al. 2013; Rouskin et al. 2014). Indeed, for the structure of rationally designed antisense oligonucleotides, there are prediction errors, including a high false-positive rate (Gorodkin and Ruzzo 2014) and low accuracy for RNAs longer than 500 mer (Mathews et al. 1999a). These errors likely derive from neglecting the environmental factors in cells that affect the stability of nucleic acids. As shown in Fig. 5, cations and hydration significantly affect the stability. As the intracellular condition is far different from the standard condition for the prediction (e.g., 1 M NaCl solution), it is important to investigate how the prediction works in different cation concentrations and in molecular crowding conditions, where the presence of a co-solute alters the water activity and excludes the


molecular volume in the solution. This subsection introduces the correction of stability prediction for these environmental factors, particularly in the case of DNA.

Stability of DNA Duplex Structure in Different Cation Concentrations

Because nucleic acids are negatively charged polyelectrolytes, cations stabilize the duplex structure by screening the coulombic repulsion between the phosphate backbones according to the counterion condensation mechanism. As a result, ΔG of duplex formation decreases (becomes more favorable) with increasing cation concentration. Classical NN parameters were developed in a solution containing 1 M NaCl. However, the physiological cation concentration is much lower than 1 M and corresponds best to 100 mM NaCl. Therefore, to understand biological processes and develop nucleic acid technologies under physiological conditions, the stability of nucleic acid duplexes should be corrected for 100 mM NaCl. This correction can be achieved in two ways. The first method involves calculating the ΔG°37 value of DNA duplexes in 1 M NaCl using the classical NN parameters and then converting the value for 100 mM NaCl using a linear relation proposed by our group (Nakano et al. 1999). The experimentally measured ΔG°37 for DNA duplexes in a buffer containing 100 mM NaCl showed a linear correlation (Eq. 9) with the predicted ΔG°37 in 1 M NaCl:

$$\Delta G^{\circ}_{37}(100\ \mathrm{mM\ NaCl}) = 0.63\,\Delta G^{\circ}_{37}(1\ \mathrm{M\ NaCl}) - 1.667 \tag{9}$$
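As a quick illustration of Eq. (9), the conversion is a single linear map. The sketch below assumes that the intercept is −1.667 kcal mol⁻¹ and that both free energies are expressed in kcal mol⁻¹ at 37 °C; the function name is hypothetical.

```python
def dg37_100mM_from_1M(dg37_1M: float) -> float:
    """Convert a predicted free energy in 1 M NaCl to 100 mM NaCl using Eq. (9).

    Assumes kcal/mol at 37 degC and an intercept of -1.667 kcal/mol.
    """
    return 0.63 * dg37_1M - 1.667


if __name__ == "__main__":
    # Hypothetical duplex predicted at -10.0 kcal/mol in 1 M NaCl.
    print(round(dg37_100mM_from_1M(-10.0), 3))
```

With these assumptions, a duplex predicted at −10.0 kcal mol⁻¹ in 1 M NaCl is estimated at about −8.0 kcal mol⁻¹ in 100 mM NaCl, that is, it is less stable at the lower salt concentration.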

Equation (9) is also valid for obtaining the stability in 100 mM NaCl of RNA and RNA/DNA hybrid duplexes. The equation provides a better estimate for relatively short sequences than for longer sequences, which may be due to the non-two-state transition of longer sequences. The second method to determine DNA stability in 100 mM NaCl is to correct the NN parameters obtained in 1 M NaCl. The following relation (Eq. 10) was proposed to correct each NN parameter at lower NaCl concentrations from its value in 1 M NaCl (SantaLucia 1998):

$$\Delta G^{\circ}_{37}[\mathrm{Na^+}] = \Delta G^{\circ}_{37}[1\ \mathrm{M\ Na^+}] - 0.114 \ln([\mathrm{Na^+}]) \tag{10}$$
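To give a sense of scale for Eq. (10), the size of the correction at 100 mM Na+ can be worked out directly, assuming the concentration is expressed in mol L⁻¹ and the coefficient 0.114 is in kcal mol⁻¹:

```latex
\begin{align*}
\Delta G^{\circ}_{37}[0.1\ \mathrm{M\ Na^+}] - \Delta G^{\circ}_{37}[1\ \mathrm{M\ Na^+}]
  &= -0.114 \ln(0.1) \\
  &= -0.114 \times (-2.303) \\
  &\approx +0.26\ \mathrm{kcal\ mol^{-1}}
\end{align*}
```

Under this reading, every NN parameter becomes about 0.26 kcal mol⁻¹ less favorable on going from 1 M to 100 mM Na+, regardless of the base pairs involved.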

This correction method is clearly not sequence specific. However, the number of Na+ ions bound to a duplex upon duplex formation is significantly influenced by the base composition, suggesting that Na+ binds preferentially to particular NN base pairs as a counter cation. Therefore, a sequence-specific salt correction was required, which was achieved by mechanical unzipping of single DNA molecules (Huguet et al. 2010). By combining the experimental results with a physical model, an optimized sequence-dependent salt correction relation was proposed (Eq. 11):

$$\Delta G^{\circ}_{37}[\mathrm{Na^+}] = \Delta G^{\circ}_{37}[1\ \mathrm{M\ Na^+}] - m \ln([\mathrm{Na^+}]) \tag{11}$$

Table 6 m values for DNA NN base pairs at 37 °C

Nearest-neighbor set    m
d(AA/TT)                0.151
d(AT/TA)                0.122
d(TA/AT)                0.095
d(CA/GT)                0.095
d(GT/CA)                0.103
d(CT/GA)                0.073
d(GA/CT)                0.161
d(CG/GC)                0.137
d(GC/CG)                0.082
d(GG/CC)                0.066

The m value depends on both the temperature and the individual NN base pair. The m values for the 10 DNA NN base pairs at 37 °C are given in Table 6 and range from 0.066 to 0.161. In addition to the corrections for Na+ concentration, the effect of Mg2+ has also been determined by similar unzipping experiments (Huguet et al. 2017). It should be noted that the salt correction for Mg2+ agrees well with experiments at concentrations up to about 10 mM, where ion fluctuations and correlations are weak. Therefore, additional corrections may be required for higher salt concentrations.
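The sequence-specific correction of Eq. (11) can be sketched by summing the m values of Table 6 over the NN steps of a strand. This is an illustrative sketch only: it assumes [Na+] in mol L⁻¹, treats each dinucleotide and its reverse complement as the same NN set, and ignores any salt dependence of initiation or terminal terms. The names `M_37C` and `salt_correction` are hypothetical.

```python
import math

# m values at 37 degC from Table 6, keyed by the 5'->3' dinucleotide of one strand.
M_37C = {
    "AA": 0.151, "AT": 0.122, "TA": 0.095, "CA": 0.095, "GT": 0.103,
    "CT": 0.073, "GA": 0.161, "CG": 0.137, "GC": 0.082, "GG": 0.066,
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _nn_key(step: str) -> str:
    """Map a dinucleotide to its Table 6 key (itself or its reverse complement)."""
    if step in M_37C:
        return step
    return step.translate(_COMPLEMENT)[::-1]


def salt_correction(seq: str, na_molar: float) -> float:
    """Return dG[Na+] - dG[1 M Na+] (kcal/mol) summed over all NN steps, per Eq. (11)."""
    if na_molar <= 0:
        raise ValueError("Na+ concentration must be positive")
    seq = seq.upper()
    total_m = sum(M_37C[_nn_key(seq[i:i + 2])] for i in range(len(seq) - 1))
    return -total_m * math.log(na_molar)


if __name__ == "__main__":
    # Same length, different composition: the correction differs between sequences.
    for s in ("GAGAGA", "GCGCGC"):
        print(s, round(salt_correction(s, 0.1), 3))
```

Unlike the uniform coefficient of Eq. (10), the correction here differs between sequences of equal length, which is the point of the sequence-dependent treatment.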

Stability of DNA Duplex Structure in a Molecular Crowding Environment

The prediction methods described above are widely used in nucleic acid technologies performed in vitro, such as in the design of polymerase chain reaction (PCR) experiments. However, these predictions are not always consistent with the behavior in cells, as described above. Apart from cation concentrations, another possible reason for this deviation is the molecular environment of the cell. In contrast to the in vitro environment, the intracellular environment is a molecular crowding environment in which various biomacromolecules are highly concentrated (Fig. 17a). For example, the total amount of biomolecules in E. coli is estimated to be 300–400 mg mL⁻¹, including 200 mg mL⁻¹ of protein, 75 mg mL⁻¹ of RNA, and 10–20 mg mL⁻¹ of DNA. In eukaryotic cells, 50–400 mg mL⁻¹ of biomolecules is present in the cytoplasm. The molecular crowding environment also varies depending on the cell organelle: 50–400 mg mL⁻¹ in the cytoplasm, 100–400 mg mL⁻¹ in the nucleus, 100–200 mg mL⁻¹ in the nucleoli, and 270–560 mg mL⁻¹ in the mitochondrial matrix. Furthermore, the intracellular crowding environment varies with the cell type, differentiation stage, and cell volume. Because the physical properties of nucleic acids in solution reflect their behavior as polyelectrolytes, the stability of nucleic acid structures depends on intracellular molecular crowding (Fig. 17b, c). Compared with the in vitro environment, the molecular crowding environment differs markedly in its lower water activity, its dielectric constant, and its excluded volume effects. Therefore, among the factors that determine the stability of


Fig. 17 Intracellular environments and helix formation of nucleic acids depending on the solution physical properties. (a) Intracellular environments related to nucleic acids. The ionic environment is dynamically changed, and numerous macromolecules are present. (b, c) Counterion condensation and (de)hydration upon helix formation of nucleic acids. The excluded volume caused by crowders also affects helix stability. (Reproduced from Ref. Takahashi and Sugimoto (2020) with permission from the Royal Society of Chemistry)

nucleic acids, as described in the previous section, the effects of cations and hydration are particularly significant in the molecular crowding environment. Therefore, to elucidate the mechanism of biological reactions related to nucleic acids and to develop materials for intracellular gene therapy, it is necessary to improve the prediction methods so that they reflect intracellular molecular crowding conditions. To mimic the intracellular molecular crowding environment, the physical properties of nucleic acid structures are analyzed using solutions of neutral water-soluble polymers dissolved at high concentrations. For example, polyethylene glycol (PEG) is widely used to mimic molecular crowding conditions. Polysaccharides (Ficoll and dextran) are also used as crowding reagents. These nonionic polymers have high solubility in aqueous solutions and are thought to interact less with the target nucleic acids. In addition, polymers with different molecular weights are readily available,


Fig. 18 (a) Normalized UV melting curves of d(ATGAGCTCAT) in 0.1 M NaCl-phosphate buffer in the absence (black) and presence (red) of 40 wt% PEG 200. (b) Normalized UV melting curves of d(GATCCGGATC), d(GGATCGATCC), d(ATGAGCTCAT), and d(ATCAGCTGAT) in buffer containing 0.1 M NaCl, 10 mM Na2HPO4 (pH 7.0), and 1 mM Na2EDTA in the presence of 40 wt% PEG 200. Oligonucleotide sequences are described in the legends. The concentration of these oligonucleotides was 100 μM

which is one reason for their general use. To predict the stability of nucleic acid structures in molecular crowding environments, the stabilities of DNA duplexes are systematically analyzed in solutions containing high concentrations of these water-soluble polymers. Figure 18a shows the UV melting curves of 100 μM d(ATGAGCTCAT) in the absence and presence of 40 wt% PEG 200 (average molecular weight 200). Tm decreased from 46.4 °C in dilute solution to 34.3 °C in the crowded state. This result indicates that the DNA duplex is significantly destabilized in a molecularly crowded environment, such as a high concentration of PEG 200. This destabilization is due to the decrease in the water activity of the solution (see the details in the subsection "Extension of Stability Prediction to the DNA Duplex Structure in Various Solution Environments"). Figure 18b shows the UV melting curves of d(GATCCGGATC), d(GGATCGATCC), d(ATGAGCTCAT), and d(ATCAGCTGAT) in the presence of 40 wt% PEG 200. The melting curves of d(GATCCGGATC) and d(GGATCGATCC) were almost identical, as were those of d(ATGAGCTCAT) and d(ATCAGCTGAT). Tm was 37.3 °C and 38.2 °C for d(GATCCGGATC) and d(GGATCGATCC), respectively, and 34.3 °C and 34.0 °C for d(ATGAGCTCAT) and d(ATCAGCTGAT), respectively. Because the two sequences in each pair have the same nearest-neighbor composition, these results indicate that the NN model is valid even in crowded environments.
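The sequence pairs compared in Fig. 18b were evidently chosen so that the two members of each pair contain the same set of nearest neighbors. A short sketch can make this explicit by counting NN steps, treating each dinucleotide and its reverse complement as the same NN set; the function name `nn_composition` is hypothetical, and initiation and symmetry terms are not considered.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def nn_composition(seq: str) -> Counter:
    """Count nearest-neighbor steps of a strand, merging each step with its reverse complement."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - 1):
        step = seq[i:i + 2]
        reverse_complement = step.translate(_COMPLEMENT)[::-1]
        # Use the alphabetically smaller label as the canonical name of the NN set.
        counts[min(step, reverse_complement)] += 1
    return counts


if __name__ == "__main__":
    pairs = [("GATCCGGATC", "GGATCGATCC"), ("ATGAGCTCAT", "ATCAGCTGAT")]
    for first, second in pairs:
        same = nn_composition(first) == nn_composition(second)
        print(first, second, "same NN composition:", same)
```

Both pairs give identical counts, so the NN model predicts equal stability within each pair, consistent with the nearly identical Tm values measured in 40 wt% PEG 200.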

Extension of Stability Prediction to the DNA Duplex Structure in Various Solution Environments

To develop NN parameters applicable to a crowded environment, the NN parameters of the DNA duplex in a molecular crowding environment were determined in the presence of PEG 200 (Table 2) (Ghosh et al. 2020). Compared with the NN parameters in


dilute solutions (ΔG°37,NN), each NN parameter for the solution containing 40 wt% PEG 200 increased, indicating uniform destabilization (Table 2). However, the degree of destabilization varied depending on the base pair combination. The relative destabilization of NNs consisting of only GC pairs (d(CG/GC), d(GC/CG), and d(GG/CC)) was much higher than that of the other NN pairs. This may be because the GC pair requires more water molecules for stabilization than the AT pair, and thus the NN pairs consisting only of GC showed greater instability in the environment where the water activity was reduced by PEG 200. In addition, the most striking difference was found in the initiation factor, for which ΔH and ΔS for the initiation of duplex formation under molecular crowding differed significantly from those in the solution without PEG 200. The parameters obtained under the PEG 200 condition were further extended to develop NN parameters that can be used in various molecular crowding environments. Here, the stability (−ΔG°37,NN) per nearest base pair at 37 °C was considered to be the sum of the hydrogen bonding and stacking interactions between base pairs and the structural entropy (① + ② + ③), the contribution from cations (④), and the contribution from the environment, such as hydration due to molecular crowding (⑤) (Eq. 12):

$$\Delta G^{\circ}_{37,\mathrm{NN}} = \Delta G^{\circ}_{37,\mathrm{NN}}(\text{①}+\text{②}+\text{③}) + \Delta G^{\circ}_{37,\mathrm{NN}}(\text{④}) + \Delta G^{\circ}_{37,\mathrm{NN}}(\text{⑤}) \tag{12}$$

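In actual experiments, the contributions ① + ② + ③ (first term of Eq. 12) are not obtained separately but are included in the measured values of ④ (second term) and ⑤ (third term). Eq. (13) is therefore often used:

$$\Delta G^{\circ}_{37,\mathrm{NN}} = \Delta G^{\circ}_{37,\mathrm{NN}}(\mathrm{cation}) + \Delta G^{\circ}_{37,\mathrm{NN}}(\mathrm{crowder}) \tag{13}$$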

The ΔG°37,NN(cation) parameter at any concentration of NaCl can be calculated from the existing parameters in dilute solution environments, because the [Na+] dependence of each NN base pair has already been reported (Table 6). The ΔG°37,NN(crowder) parameter can be calculated from a linear function of the water activity change Δaw in the presence of a crowding agent, because the destabilization of the DNA duplex structure correlates linearly with the water activity change in a molecular crowding environment (Fig. 19), as shown in Eq. (14):

$$\Delta G^{\circ}_{37,\mathrm{NN}}(\mathrm{crowder}) = m_{\mathrm{cs}} \cdot \Delta a_{\mathrm{w}} \tag{14}$$

where mcs is a coefficient that depends on the structure of the crowding agent. The mcs values are largest (most destabilizing) for PEG and 1,2-dimethoxyethane (1,2 DME), smallest (least destabilizing) for ethylene glycol (EG) and glycerol (GOL), and intermediate for 1,3-propanediol (1,3 PDO) and 2-methoxyethanol (2-ME) (Fig. 19 and Table 7). Water activity can be measured using osmotic pressure measurements. Therefore, by parameterizing the cation concentration, type

[Fig. 19 plot: ΔΔG°37 / kcal mol⁻¹ (vertical axis, 0 to 5) versus Δaw (horizontal axis, 0.00 to 0.14)]

Fig. 19 Difference in ΔG°37 between crowded conditions and those without co-solute (ΔΔG°37) versus Δaw for d(ATGCGCAT) with EG (red circles); GOL (black circles); 1,3 PDO (purple triangles); 2-ME (orange triangles); 1,2 DME (cyan squares); PEG 200 (blue squares); PEG 2000 (magenta squares); and PEG 8000 (yellow squares) at 1 M NaCl and with PEG 200 at 100 mM NaCl (green squares). The PEGs of different molecular weights and 1,2 DME belong to the same group. EG and glycerol are also in one group, whereas 1,3 PDO and 2-ME are in another. (Reproduced from Ghosh et al. 2020 with permission from the National Academy of Sciences USA, copyright (2020))

Table 7 NN parameters for 100 mM NaCl and 40 wt% PEG 200 with prefactors (mcs) for different co-solutes(a). All values are in kcal mol⁻¹

NN set               ΔG°37 NN,[cation]   ΔG°37 NN,[40 wt% PEG 200]   mPEG/1,2 DME   mEG/GOL   m1,3PDO/2-ME
d(AA/TT)             –0.65               0.10                        2.0            0.7       1.3
d(AT/TA)             –0.60               0.32                        6.4            2.2       4.2
d(TA/AT)             –0.36               0.20                        4.0            1.4       2.6
d(CA/GT)             –1.23               0.23                        4.6            1.6       3.0
d(GT/CA)             –1.20               0.31                        6.2            2.2       4.1
d(CT/GA)             –1.11               0.20                        4.0            1.4       2.6
d(GA/CT)             –0.93               0.06                        1.2            0.4       0.8
d(CG/GC)             –1.85               0.47                        9.4            3.3       6.2
d(GC/CG)             –2.05               0.72                        14.4           5.0       9.5
d(GG/CC)             –1.69               0.44                        8.8            3.0       5.8
Initiation per GC    0.98                –0.22                       –4.4           –1.5      –2.9
Initiation per AT    1.03                –0.03                       –0.6           –0.2      –0.4

(a) Symmetry factor for ΔG°37 is 0.4 kcal mol⁻¹ for all co-solutes, as it is independent of the crowding environment


of co-solute, and water activity, a versatile method that can predict the stability of DNA duplexes in various molecular crowding environments was developed. In addition to the effect of water activity on duplex stability, the excluded volume effect should also be considered. For example, a 30-mer DNA duplex (5′-A27GCG-3′/5′-CGCT27-3′) is slightly stabilized by 10% PEG 8000 (0.8 °C increase in Tm), although the value predicted using Eq. (14) indicated destabilization (ΔG°37 is −29.7 kcal mol⁻¹ in the absence of PEG and −29.1 kcal mol⁻¹ in the presence of 10% PEG 8000) (Ghosh et al. 2020). The excluded volume effect on duplex stability can be calculated independently of the water activity by Hermans theory (Knowles et al. 2011), which corrects the predicted value to −30.4 kcal mol⁻¹ in this case. However, the excluded volume effect on the stability was very slight compared to that of the water activity. Therefore, the correction is needed only for long duplexes, such as 30-mers, in the presence of co-solutes of large molecular weight.
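Equations (13) and (14) together with Table 7 can be sketched as a small lookup. This is an illustration under stated assumptions, not the published implementation: the dictionary and function names are hypothetical, only the ten NN sets are included (initiation, symmetry, and excluded volume terms are omitted), and Δaw = 0.05 for 40 wt% PEG 200 is inferred here from the ratio of the two corresponding columns of Table 7 rather than quoted from the text.

```python
NN_SETS = ["AA/TT", "AT/TA", "TA/AT", "CA/GT", "GT/CA",
           "CT/GA", "GA/CT", "CG/GC", "GC/CG", "GG/CC"]


def _column(values):
    """Pair one Table 7 column with the NN sets, in table order."""
    return dict(zip(NN_SETS, values))


# Cation term at 100 mM NaCl (kcal/mol), Table 7.
DG37_CATION = _column([-0.65, -0.60, -0.36, -1.23, -1.20,
                       -1.11, -0.93, -1.85, -2.05, -1.69])

# Prefactors m_cs (kcal/mol per unit change in water activity), Table 7.
M_CS = {
    "PEG/1,2-DME": _column([2.0, 6.4, 4.0, 4.6, 6.2, 4.0, 1.2, 9.4, 14.4, 8.8]),
    "EG/GOL": _column([0.7, 2.2, 1.4, 1.6, 2.2, 1.4, 0.4, 3.3, 5.0, 3.0]),
    "1,3-PDO/2-ME": _column([1.3, 4.2, 2.6, 3.0, 4.1, 2.6, 0.8, 6.2, 9.5, 5.8]),
}


def crowder_term(nn: str, group: str, delta_aw: float) -> float:
    """Crowder contribution of Eq. (14): m_cs times the water activity change."""
    return M_CS[group][nn] * delta_aw


def nn_dg37_crowded(nn: str, group: str, delta_aw: float) -> float:
    """NN parameter of Eq. (13): cation term plus crowder term (kcal/mol)."""
    return DG37_CATION[nn] + crowder_term(nn, group, delta_aw)


if __name__ == "__main__":
    delta_aw_peg200 = 0.05  # inferred from Table 7 (assumption), 40 wt% PEG 200
    for nn in NN_SETS:
        print(nn, round(nn_dg37_crowded(nn, "PEG/1,2-DME", delta_aw_peg200), 2))
```

With Δaw = 0.05, the crowder terms for the PEG group reproduce the 40 wt% PEG 200 column of Table 7 (for example, 14.4 × 0.05 = 0.72 kcal mol⁻¹ for d(GC/CG)), which serves as a consistency check on the reconstruction.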

Prediction of the Stability of the DNA Duplex Under Intracellular Conditions by Measuring the Intracellular Environment

To predict the stability of a duplex structure in a real cell, it is necessary to characterize the intracellular environment. This subsection introduces an approach for analyzing the cellular environment. Because the stability of nucleic acids is markedly affected by water activity, as shown above, a probe that responds to the water activity of the solution is needed. G4 DNA adopts different topologies depending on the solution environment (Nakano et al. 2014). By tracking changes in the topology of a dye-labeled G4 by fluorescence resonance energy transfer (FRET), the solution properties around the probe can be estimated (Takahashi et al. 2019). For the FRET-type probe, a sequence containing a hairpin structure and a G4 structure derived from human telomeres was designed. For FRET analysis, the two ends of this sequence were modified with Cy3 and Cy5 as the FRET donor and acceptor, respectively (Fig. 20). The human telomere DNA sequence formed an antiparallel form in NaCl solution, a hybrid form in KCl solution, and a parallel form in a solution of KCl and 20 wt% PEG 200. The FRET efficiencies of the prepared fluorescent probes were analyzed as the ratio of the fluorescence intensity of Cy5 to that of Cy3 in each solution, and the efficiencies differed between solutions. These results indicate that FRET efficiency can be used to evaluate the topological differences of G4 structures depending on the environment. In addition, the G4 structure is stabilized by cations coordinated to the quadruplex. To examine how the stability of the G4 structure changes, the salt concentration dependence of the FRET efficiency was compared using the salt concentration at which the FRET efficiency reached half of its maximum (C1/2); the FRET efficiency was highest, and C1/2 lowest, in KCl + PEG 200 (Fig. 21a).
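The two quantities used in this analysis, an intensity-based FRET ratio and C1/2, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ratio is taken in the proximity-ratio form acceptor/(acceptor + donor), which the axis label of Fig. 21 appears to use, and C1/2 is taken as the concentration at which the ratio reaches the midpoint between its lowest and highest observed values, found by linear interpolation. The function names and the titration numbers are hypothetical and are not data from Takahashi et al. (2019).

```python
def fret_ratio(i_acceptor, i_donor):
    """Proximity ratio I_A / (I_A + I_D) from Cy5 (acceptor) and Cy3 (donor)."""
    return i_acceptor / (i_acceptor + i_donor)


def c_half(conc, ratio):
    """Concentration at which `ratio` reaches the midpoint of its observed range.

    Uses linear interpolation between the two titration points that bracket
    the midpoint (an assumed working definition of C1/2).
    """
    lo, hi = min(ratio), max(ratio)
    target = lo + 0.5 * (hi - lo)
    points = list(zip(conc, ratio))
    for (c0, r0), (c1, r1) in zip(points, points[1:]):
        if r0 != r1 and min(r0, r1) <= target <= max(r0, r1):
            return c0 + (target - r0) * (c1 - c0) / (r1 - r0)
    raise ValueError("ratio never crosses its midpoint")


# Hypothetical titration (illustrative numbers only).
salt_molar = [0.0, 0.02, 0.04, 0.06, 0.08, 0.10]
ratios = [0.25, 0.35, 0.45, 0.48, 0.50, 0.50]
print(f"C1/2 ~ {c_half(salt_molar, ratios):.3f} M")
```

A lower C1/2 means that less salt is needed to fold the probe, which is how the text reads the KCl + PEG 200 curve as the most stabilizing condition.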
Furthermore, different types of crowder molecules were tested: in addition to PEG 200, Ficoll 70 and bovine serum albumin (BSA) were used. These molecules


Fig. 20 G4 structures of human telomere sequence and possible conformations of the DNA probe H-telo with the following G4 topology: (a) antiparallel in NaCl solution, (b) mixed in KCl solution, and (c) parallel in KCl solution with concentrated PEG. (Reproduced from Ref. Takahashi et al. (2019) with permission from the American Chemical Society, copyright (2020))

Fig. 21 Changes in the FRET efficiency of a DNA probe, plotted as I666/(I666 + I568), in response to (a) salt concentration from 0 to 0.1 M (NaCl, KCl, and KCl with 20 wt% PEG 200) and (b) KCl concentration from 0 to 0.1 M under crowding (20 wt% PEG 200, 20 wt% Ficoll 70, and 20 wt% BSA). The solution was buffered with 30 mM Tris-HCl (pH 7.0), and the temperature was 37 °C unless otherwise indicated. (Reproduced from Ref. Takahashi et al. (2019) with permission from the American Chemical Society, copyright (2020))

are representative crowders, each of which affects G-quadruplex DNA differently. PEG 200 mainly decreases the water activity, which stabilizes G-quadruplexes. Furthermore, PEG 200 converts the human telomere G-quadruplex into the parallel form. Ficoll 70 also stabilizes G-quadruplexes but does not affect


Fig. 22 FRET analysis of living HeLa cells injected with DNA probes at 25 °C. (Reproduced from Ref. Takahashi et al. (2019) with permission from the American Chemical Society, copyright (2020))

its topology. BSA is a proteinaceous crowder that destabilizes the human telomere G-quadruplex without changing its topology, although the detailed mechanism is still unclear. In the solutions with different molecular crowding agents, the C1/2 values increased in the order PEG 200, Ficoll 70, and BSA (Fig. 21b). This confirms that the stability of the G4 structure, which depends on the topology and the molecular crowding environment, can be evaluated from the differences in FRET efficiency.

To analyze the molecular crowding environment around the probe DNA, the probe DNA was injected into cells. Differences in FRET efficiency measured by confocal microscopy reflect environment-induced changes in the topology of the probe DNA and therefore indicate differences in the local molecular environment. FRET efficiency tended to be higher in the nucleus than in the cytoplasm. Interestingly, the nucleolus showed a much higher FRET efficiency than the other parts of the nucleus (Fig. 22). This suggests that the nucleolus is a molecular crowding environment whose physical properties can be reproduced by a solution containing PEG 200.

To verify whether the stability of nucleic acids in the nucleolus could be predicted from an environment containing PEG 200, the stability of DNA in the nucleolus was analyzed using the stability prediction parameters described in the previous section. The stability of DNA duplexes had previously been analyzed in liquid-liquid phase-separated droplets formed by the intrinsically disordered protein Ddx4, which mimic nucleolar conditions (Nott et al. 2016). In that study, the destabilization (ΔΔG°25) of the DNA duplexes d(ACTG)3 and d(ACTG)4 relative to an external dilute environment was 3.0 and 2.3 kcal mol–1, respectively. These values correspond well to the destabilization in 50% PEG 200 and 100 mM NaCl in vitro, which was 2.0 and 2.8 kcal mol–1, respectively. These results suggest that the NN parameters obtained in 50% PEG 200 and 100 mM NaCl are suitable for predicting the stability of DNA in the nucleolus. Therefore, such NN parameters can be applied to predict the stability of DNA duplexes in different intracellular organelles.
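To put the ΔΔG°25 values quoted above on a more intuitive scale, a destabilization can be converted into the factor by which the duplex-formation equilibrium constant decreases, using the standard relation K'/K = exp(–ΔΔG/RT). The short Python sketch below does this for the four values given in the text. The function name is hypothetical, and the conversion is a generic thermodynamic identity rather than an analysis reported in the cited studies.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def fold_change_in_k(ddg_kcal, temp_c=25.0):
    """Factor by which K for duplex formation drops for a destabilization ddG."""
    return math.exp(ddg_kcal / (R_KCAL * (temp_c + 273.15)))


# Destabilization values quoted in the text (kcal/mol at 25 °C).
cases = [
    ("Ddx4 droplets, d(ACTG)3", 3.0),
    ("Ddx4 droplets, d(ACTG)4", 2.3),
    ("50% PEG 200, d(ACTG)3", 2.0),
    ("50% PEG 200, d(ACTG)4", 2.8),
]
for label, ddg in cases:
    print(f"{label}: K reduced about {fold_change_in_k(ddg):.0f}-fold")
```

Because the factor is exponential in ΔΔG, differences of about 1 kcal mol–1 between the two environments correspond to roughly a fivefold difference in K at 25 °C.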


Conclusion

This chapter introduced recent advances in stability prediction of canonical and noncanonical structures of nucleic acids. The canonical structures whose stability can be predicted include not only DNA duplexes but also RNA duplexes and RNA/DNA hybrid duplexes. By extending these prediction methods, stability prediction for RNA/DNA hybrids that takes into account the physiological cation concentration is now available. The modified parameters can be used in various applications. For example, it has been reported that these parameters can improve the efficiency of genome-editing technology. However, stability prediction for noncanonical structures has not yet been well established. These structures regulate gene expression and have been shown to be involved in the development of cancer and neurological diseases. Therefore, the development of methods to predict the stability of noncanonical structures in the cellular environment is an urgent issue. In addition, the cellular environment changes not only along the local spatial axis within the cell but also along the temporal axis of the cell cycle. Parameterizing these factors should make it possible to predict the behavior of nucleic acids in the cell in detail. With the recent success of the vaccines against the novel coronavirus, medical engineering technology using nucleic acids is likely to advance dramatically in the near future. Therefore, thermodynamics, which clarifies the behavior of nucleic acids in cells, is an increasingly important field of study for the accurate functioning of such cellular technologies.

References

Adams MS, Znosko BM (2019) Thermodynamic characterization and nearest neighbor parameters for RNA duplexes under molecular crowding conditions. Nucleic Acids Res 47:3658–3666
Allawi HT, SantaLucia Jr J (1997) Thermodynamics and NMR of internal G.T mismatches in DNA. Biochemistry 36:10581–10594
Allawi HT, SantaLucia Jr J (1998a) Nearest neighbor thermodynamic parameters for internal G.A mismatches in DNA. Biochemistry 37:2170–2179
Allawi HT, SantaLucia Jr J (1998b) Nearest-neighbor thermodynamics of internal A.C mismatches in DNA: sequence dependence and pH effects. Biochemistry 37:9435–9444
Allawi HT, SantaLucia Jr J (1998c) Thermodynamics of internal C.T mismatches in DNA. Nucleic Acids Res 26:2694–2701
Antao VP, Lai SY, Tinoco Jr I (1991) A thermodynamic study of unusually stable RNA and DNA hairpins. Nucleic Acids Res 19:5901–5905
Banerjee D, Tateishi-Karimata H, Ghosh S, Ohyama T, Endoh T, Takahashi S, Sugimoto N (2020a) Improved nearest-neighbor parameters for the stability of RNA/DNA hybrids under a physiological condition. Nucleic Acids Res
Banerjee D, Tateishi-Karimata H, Ohyama T, Ghosh S, Endoh T, Takahashi S, Sugimoto N (2020b) Improved nearest-neighbor parameters for the stability of RNA/DNA hybrids under a physiological condition. Nucleic Acids Res 48:12042–12054
Banerjee D, Tateishi-Karimata H, Ohyama T, Ghosh S, Endoh T, Takahashi S, Sugimoto N (2021) Correction to ‘Improved nearest-neighbor parameters for the stability of RNA/DNA hybrids under a physiological condition’. Nucleic Acids Res 49:10796–10799
Basilio Barbosa V, de Oliveira Martins E, Weber G (2019) Nearest-neighbour parameters optimized for melting temperature prediction of DNA/RNA hybrids at high and low salt concentrations. Biophys Chem 251:106189


Belmonte-Reche E, Morales JC (2019) G4-iM Grinder: when size and frequency matter. G-quadruplex, i-Motif and higher order structure search and analysis tool. NAR Genom Bioinform 2:lqz005
Bhavsar-Jog YP, Van Dornshuld E, Brooks TA, Tschumper GS, Wadkins RM (2014) Epigenetic modification, dehydration, and molecular crowding effects on the thermodynamics of i-motif structure formation from C-rich DNA. Biochemistry 53:1586–1594
Bommarito S, Peyret N, SantaLucia Jr J (2000) Thermodynamic parameters for DNA sequences with dangling ends. Nucleic Acids Res 28:1929–1934
Booher MA, Wang S, Kool ET (1994) Base pairing and steric interactions between pyrimidine strand bridging loops and the purine strand in DNA pyrimidine.purine.pyrimidine triplexes. Biochemistry 33:4645–4651
Brazier JA, Shah A, Brown GD (2012) I-Motif formation in gene promoters: unusually stable formation in sequences complementary to known G-quadruplexes. Chem Commun 48:10739–10741
Breslauer KJ, Frank R, Blocker H, Marky LA (1986) Predicting DNA duplex stability from the base sequence. Proc Natl Acad Sci U S A 83:3746–3750
Chen Y, Agrawal P, Brown RV, Hatzakis E, Hurley L, Yang D (2012) The major G-quadruplex formed in the human platelet-derived growth factor receptor beta promoter adopts a novel broken-strand structure in K+ solution. J Am Chem Soc 134:13220–13223
Cheng M, Cheng Y, Hao J, Jia G, Zhou J, Mergny JL, Li C (2018) Loop permutation affects the topology and stability of G-quadruplexes. Nucleic Acids Res 46:9264–9275
Cheng M, Qiu D, Tamon L, Ištvánková E, Víšková P, Amrane S, Guédin A, Chen J, Lacroix L, Ju H et al (2021) Thermal and pH stabilities of i-DNA: confronting in vitro experiments with models and In-Cell NMR data. Angew Chem Int Ed 60:10286–10294
Crothers DM, Bloomfield VA, Tinoco I (2000) Nucleic acids: structures, properties, and functions. University Science Books
Dettler JM, Buscaglia R, Cui J, Cashman D, Blynn M, Lewis EA (2010) Biophysical characterization of an ensemble of intramolecular i-motifs formed by the human c-MYC NHE III1 P1 promoter mutant sequence. Biophys J 99:561–567
Ding Y, Tang Y, Kwok CK, Zhang Y, Bevilacqua PC, Assmann SM (2014) In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features. Nature 505:696–700
Dvorkin SA, Karsisiotis AI, Webba da Silva M (2018) Encoding canonical DNA quadruplex structure. Sci Adv 4:eaat3007
Ferreira I, Jolley EA, Znosko BM, Weber G (2019) Replacing salt correction factors with optimized RNA nearest-neighbour enthalpy and entropy parameters. Chem Phys 521:69–76
Fleming AM, Ding Y, Rogers RA, Zhu J, Zhu J, Burton AD, Carlisle CB, Burrows CJ (2017) 4n-1 Is a “Sweet Spot” in DNA i-Motif Folding of 2′-Deoxycytidine Homopolymers. J Am Chem Soc 139:4682–4689
Fleming AM, Stewart KM, Eyring GM, Ball TE, Burrows CJ (2018) Unraveling the 4n – 1 rule for DNA i-motif stability: base pairs vs. loop lengths. Org Biomol Chem 16:4537–4546
Freier SM, Alkema D, Sinclair A, Neilson T, Turner DH (1985) Contributions of dangling end stacking and terminal base-pair formation to the stabilities of XGGCCp, XCCGGp, XGGCCYp, and XCCGGYp helixes. Biochemistry 24:4533–4539
Freier SM, Kierzek R, Caruthers MH, Neilson T, Turner DH (1986a) Free energy contributions of G.U and other terminal mismatches to helix stability. Biochemistry 25:3209–3213
Freier SM, Kierzek R, Jaeger JA, Sugimoto N, Caruthers MH, Neilson T, Turner DH (1986b) Improved free-energy parameters for predictions of RNA duplex stability. Proc Natl Acad Sci U S A 83:9373–9377
Fujii T, Sugimoto N (2015) Loop nucleotides impact the stability of intrastrand i-motif structures at neutral pH. Phys Chem Chem Phys 17:16719–16722
Ghosh S, Takahashi S, Endoh T, Tateishi-Karimata H, Hazra S, Sugimoto N (2019) Validation of the nearest-neighbor model for Watson-Crick self-complementary DNA duplexes in molecular crowding condition. Nucleic Acids Res 47:3284–3294


Ghosh S, Takahashi S, Ohyama T, Endoh T, Tateishi-Karimata H, Sugimoto N (2020) Nearest-neighbor parameters for predicting DNA duplex stability in diverse molecular crowding conditions. Proc Natl Acad Sci U S A 117:14194–14201
Gorodkin J, Ruzzo WL (2014) RNA sequence, structure, and function: computational and bioinformatic methods. Springer
Gurung SP, Schwarz C, Hall JP, Cardin CJ, Brazier JA (2015) The importance of loop length on the stability of i-motif structures. Chem Commun 51:5630–5632
Hazel P, Huppert J, Balasubramanian S, Neidle S (2004) Loop-length-dependent folding of G-quadruplexes. J Am Chem Soc 126:16405–16415
Hickey DR, Turner DH (1985) Effects of terminal mismatches on RNA stability: thermodynamics of duplex formation for ACCGGGp, ACCGGAp, and ACCGGCp. Biochemistry 24:3987–3991
Hoogsteen K (1959) The structure of crystals containing a hydrogen-bonded complex of 1-methylthymine and 9-methyladenine. Acta Crystallogr 12:822–823
Hoogsteen K (1963) The crystal and molecular structure of a hydrogen-bonded complex between 1-methylthymine and 9-methyladenine. Acta Crystallogr 16:907–916
Hudson GA, Bloomingdale RJ, Znosko BM (2013) Thermodynamic contribution and nearest-neighbor parameters of pseudouridine-adenosine base pairs in oligoribonucleotides. RNA 19:1474–1482
Huguet JM, Bizarro CV, Forns N, Smith SB, Bustamante C, Ritort F (2010) Single-molecule derivation of salt dependent base-pair free energies in DNA. Proc Natl Acad Sci U S A 107:15431–15436
Huguet JM, Ribezzi-Crivellari M, Bizarro CV, Ritort F (2017) Derivation of nearest-neighbor DNA parameters in magnesium from single molecule experiments. Nucleic Acids Res 45:12921–12931
Iaccarino N, Cheng M, Qiu D, Pagano B, Amato J, Di Porzio A, Zhou J, Randazzo A, Mergny JL (2021) Effects of Sequence and Base Composition on the CD and TDS Profiles of i-DNA. Angew Chem Int Ed 60:10295–10303
Kang H-J, Kendrick S, Hecht SM, Hurley LH (2014) The transcriptional complex between the BCL2 i-motif and hnRNP LL is a molecular switch for control of gene expression that can be modulated by small molecules. J Am Chem Soc 136:4172–4185
Kendrick S, Akiyama Y, Hecht SM, Hurley LH (2009) The i-motif in the bcl-2 P1 promoter forms an unexpectedly stable structure with a unique 8:5:7 loop folding pattern. J Am Chem Soc 131:17667–17676
Knowles DB, LaCroix AS, Deines NF, Shkel I, Record Jr MT (2011) Separation of preferential interaction and excluded volume effects on DNA duplex and hairpin stability. Proc Natl Acad Sci U S A 108:12699–12704
Kwok CK, Ding Y, Tang Y, Assmann SM, Bevilacqua PC (2013) Determination of in vivo RNA structure in low-abundance transcripts. Nat Commun 4:2971
Li XM, Zheng KW, Zhang JY, Liu HH, He YD, Yuan BF, Hao YH, Tan Z (2015) Guanine-vacancy-bearing G-quadruplexes responsive to guanine derivatives. Proc Natl Acad Sci U S A 112:14581–14586
Lilley DM (2000) Structures of helical junctions in nucleic acids. Q Rev Biophys 33:109–159
Lorenz R, Bernhart SH, Höner zu Siederdissen C, Tafer H, Flamm C, Stadler PF, Hofacker IL (2011) ViennaRNA Package 2.0. Algorithms Mol Biol 6:26
Lorenz R, Bernhart SH, Qin J, Höner zu Siederdissen C, Tanzer A, Amman F, Hofacker IL, Stadler PF (2013) 2D meets 4G: G-quadruplexes in RNA secondary structure prediction. IEEE/ACM Trans Comput Biol Bioinform 10:832–844
Mathad RI, Hatzakis E, Dai J, Yang D (2011) c-MYC promoter G-quadruplex formed at the 5′-end of NHE III1 element: insights into biological relevance and parallel-stranded G-quadruplex stability. Nucleic Acids Res 39:9023–9033
Mathews DH, Burkard ME, Freier SM, Wyatt JR, Turner DH (1999a) Predicting oligonucleotide affinity to nucleic acid targets. RNA 5:1458–1469


Mathews DH, Sabina J, Zuker M, Turner DH (1999b) Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J Mol Biol 288:911–940
Mathews DH, Disney MD, Childs JL, Schroeder SJ, Zuker M, Turner DH (2004) Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc Natl Acad Sci U S A 101:7287–7292
Matsumoto S, Tateishi-Karimata H, Takahashi S, Ohyama T, Sugimoto N (2020) Effect of molecular crowding on the stability of RNA G-quadruplexes with various numbers of quartets and lengths of loops. Biochemistry
Mergny JL, Lacroix L, Han XG, Leroy JL, Helene C (1995) Intramolecular folding of pyrimidine oligodeoxynucleotides into an I-DNA motif. J Am Chem Soc 117:8887–8898
Nagatoishi S, Isono N, Tsumoto K, Sugimoto N (2011) Loop residues of thrombin-binding DNA aptamer impact G-quadruplex stability and thrombin binding. Biochimie 93:1231–1238
Nakano S, Fujimoto M, Hara H, Sugimoto N (1999) Nucleic acid duplex stability: influence of base composition on cation effects. Nucleic Acids Res 27:2957–2965
Nakano S, Miyoshi D, Sugimoto N (2014) Effects of molecular crowding on the structures, interactions, and functions of nucleic acids. Chem Rev 114:2733–2758
Nott TJ, Craggs TD, Baldwin AJ (2016) Membraneless organelles can melt nucleic acid duplexes and act as biomolecular filters. Nat Chem 8:569–575
Ohmichi T, Nakano S, Miyoshi D, Sugimoto N (2002) Long RNA dangling end has large energetic contribution to duplex stability. J Am Chem Soc 124:10367–10372
Olsen CM, Gmeiner WH, Marky LA (2006) Unfolding of G-quadruplexes: energetic, and ion and water contributions of G-quartet stacking. J Phys Chem B 110:6962–6969
Onel B, Carver M, Wu G, Timonina D, Kalarn S, Larriva M, Yang D (2016) A new G-quadruplex with Hairpin loop immediately upstream of the human BCL2 P1 promoter modulates transcription. J Am Chem Soc 138:2563–2570
Pandey S, Agarwala P, Maiti S (2013) Effect of loops and G-quartets on the stability of RNA G-quadruplexes. J Phys Chem B 117:6896–6905
Peyret N, Seneviratne PA, Allawi HT, SantaLucia Jr J (1999) Nearest-neighbor thermodynamics and NMR of DNA sequences with internal A.A, C.C, G.G, and T.T mismatches. Biochemistry 38:3468–3477
Phan AT, Kuryavyi V, Burge S, Neidle S, Patel DJ (2007) Structure of an unprecedented G-quadruplex scaffold in the human c-kit promoter. J Am Chem Soc 129:4386–4392
Prakash G, Kool ET (1992) Structural effects in the recognition of DNA by circular oligonucleotides. J Am Chem Soc 114:3523–3527
Rachwal PA, Brown T, Fox KR (2007) Effect of G-tract length on the topology and stability of intramolecular DNA quadruplexes. Biochemistry 46:3036–3044
Roberts RW, Crothers DM (1991) Specificity and stringency in DNA triplex formation. Proc Natl Acad Sci U S A 88:9397–9401
Roberts RW, Crothers DM (1996) Prediction of the stability of DNA triplexes. Proc Natl Acad Sci U S A 93:4320–4325
Rogers RA, Fleming AM, Burrows CJ (2018) Unusual isothermal hysteresis in DNA i-motif pH transitions: a study of the RAD17 promoter sequence. Biophys J 114:1804–1815
Rouskin S, Zubradt M, Washietl S, Kellis M, Weissman JS (2014) Genome-wide probing of RNA structure reveals active unfolding of mRNA structures in vivo. Nature 505:701–705
Ruggiero E, Lago S, Sket P, Nadai M, Frasson I, Plavec J, Richter SN (2019) A dynamic i-motif with a duplex stem-loop in the long terminal repeat promoter of the HIV-1 proviral genome modulates viral transcription. Nucleic Acids Res 47:11057–11068
SantaLucia Jr J (1998) A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc Natl Acad Sci U S A 95:1460–1465
SantaLucia J, Allawi HT, Seneviratne PA (1996) Improved nearest-neighbor parameters for predicting DNA duplex stability. Biochemistry 35:3555–3562


Simonsson T, Pribylova M, Vorlickova M (2000) A nuclease hypersensitive element in the human c-myc promoter adopts several distinct i-tetraplex structures. Biochem Biophys Res Commun 278:158–166
Stegle O, Payet L, Mergny JL, MacKay DJ, Leon JH (2009) Predicting and understanding the stability of G-quadruplexes. Bioinformatics 25:i374–i382
Sugimoto N, Kierzek R, Turner DH (1987a) Sequence dependence for the energetics of dangling ends and terminal base pairs in ribonucleic acid. Biochemistry 26:4554–4558
Sugimoto N, Kierzek R, Turner DH (1987b) Sequence dependence for the energetics of terminal mismatches in ribooligonucleotides. Biochemistry 26:4559–4562
Sugimoto N, Nakano S, Katoh M, Matsumura A, Nakamuta H, Ohmichi T, Yoneyama M, Sasaki M (1995) Thermodynamic parameters to predict stability of RNA/DNA hybrid duplexes. Biochemistry 34:11211–11216
Sugimoto N, Nakano S, Yoneyama M, Honda K (1996) Improved thermodynamic parameters and helix initiation factor to predict stability of DNA duplexes. Nucleic Acids Res 24:4501–4505
Sugimoto N, Nakano M, Nakano S (2000) Thermodynamics-structure relationship of single mismatches in RNA/DNA duplexes. Biochemistry 39:11270–11281
Sugimoto N, Satoh N, Yasuda K, Nakano S (2001) Stabilization factors affecting duplex formation of peptide nucleic acid with DNA. Biochemistry 40:8444–8451
Sun D, Hurley LH (2009) The importance of negative superhelicity in inducing the formation of G-quadruplex and i-motif structures in the c-Myc promoter: implications for drug targeting and control of gene expression. J Med Chem 52:2863–2874
Takahashi S, Sugimoto N (2020) Stability prediction of canonical and non-canonical structures of nucleic acids in various molecular environments and cells. Chem Soc Rev 49:8439–8468
Takahashi S, Yamamoto J, Kitamura A, Kinjo M, Sugimoto N (2019) Characterization of Intracellular Crowding Environments with Topology-Based DNA Quadruplex Sensors. Anal Chem 91:2586–2590
Tinoco Jr I, Uhlenbeck OC, Levine MD (1971) Estimation of secondary structure in ribonucleic acids. Nature 230:362–367
Turner DH, Mathews DH (2010) NNDB: the nearest neighbor parameter database for predicting stability of nucleic acid secondary structure. Nucleic Acids Res 38:D280–D282
Wang Y, Patel DJ (1993) Solution structure of the human telomeric repeat d[AG3(T2AG3)3] G-tetraplex. Structure 1:263–282
Watson JD, Crick FH (1953) Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature 171:737–738
Wong HM, Stegle O, Rodgers S, Huppert JL (2010) A toolbox for predicting G-quadruplex formation and stability. J Nucleic Acids 2010:564946
Wright EP, Huppert JL, Waller ZAE (2017) Identification of multiple genomic DNA sequences which form i-motif structures at neutral pH. Nucleic Acids Res 45:2951–2959
Xia T, SantaLucia Jr J, Burkard ME, Kierzek R, Schroeder SJ, Jiao X, Cox C, Turner DH (1998) Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson-Crick base pairs. Biochemistry 37:14719–14735
Zhang AY, Bugaut A, Balasubramanian S (2011) A sequence-independent analysis of the loop length dependence of intramolecular RNA G-quadruplex stability and topology. Biochemistry 50:7251–7258
Zuker M (1989) On finding all suboptimal foldings of an RNA molecule. Science 244:48–52
Zuker M (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31:3406–3415
Zuker M, Stiegler P (1981) Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res 9:133–148

3. The Effect of Pressure on the Conformational Stability of DNA

Tigran V. Chalikian and Robert B. Macgregor, Jr.

Contents

Introduction
Structural Considerations
Thermodynamic Considerations
Pressure Effects on Canonical Duplex DNA
  Changes in Volume, ΔV
  Changes in Expansibility, ΔE, and Compressibility, ΔKS
  Pressure-Temperature Stability Phase Diagram of Duplex DNA
  Effects of Cations, Cosolvents, and Sequence and Length of Oligomeric DNA on Transition Volume, ΔV
Pressure Effects on Noncanonical DNA Structures
  Hairpins
  Z-DNA
  Three-Stranded DNA
  G-Quadruplexes
  i-Motif Structures
Pressure and the Kinetics of Helix Formation
Conclusions
References

Abstract

Studies into the effect of hydrostatic pressure on the thermodynamic and kinetic properties of DNA provide insights into the interactions that stabilize the canonical and noncanonical DNA structures. Under most solution conditions, double- and triple-stranded DNA molecules are stabilized to a small extent by increasing pressure regardless of their nucleotide sequence. On the other hand, the stabilities of noncanonical conformations, including hairpins, Z-DNA, and the tetrahelical DNA forms (G-quadruplexes and i-motifs), depend on pressure in more subtle ways. While the stability of the i-motif is weakly modulated by pressure, G-quadruplex structures tend to be destabilized by pressure, with the extent depending on the individual features of the DNA. The pressure sensitivity of a G-quadruplex is attributed to the existence of a void volume in the folded structure and the exposure of central ions to the solvent upon unfolding. For the duplex and G-quadruplex conformations, there are sufficient data to allow the construction of temperature-pressure stability phase diagrams.

Keywords

DNA · Pressure · Thermodynamics · Kinetics · Volumetric properties

T. V. Chalikian (*) · R. B. Macgregor, Jr.
Department of Pharmaceutical Sciences, Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, ON, Canada

© Springer Nature Singapore Pte Ltd. 2023
N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_3

Introduction

When a system is exposed to high hydrostatic pressure, the conformational equilibrium is shifted towards the states characterized by lower volumes in accordance with Le Chatelier’s principle (Akasaka 2006; Winter 2019). In this way, pressure can be instrumental in modulating and studying order-to-disorder (e.g., helix-to-coil) transitions. It can also be used as a tool for studying order-to-order transitions (e.g., helix-to-helix or duplex-to-tetraplex transitions) provided that the states under study are volumetrically distinct. The relative partial molar volumes of the interconverting conformational states of a biopolymer determine the pressure response of the equilibrium between the states (Macgregor 1998). For example, conformations involving intrinsic structural voids are destabilized by increasing pressure.

Proteins and nucleic acids are polymorphic structures capable of existing in various conformational states. For nucleic acids, the polymorphism cannot be described simply in terms of the folded and unfolded conformations; the folded states are also polymorphic, differing in handedness and the number of associating strands. Depending on the sequence of the molecule and environmental conditions, DNA may adopt, in addition to the canonical B-DNA duplex conformation, a range of noncanonical conformations including A- and Z-DNA duplexes, triplexes, three- and four-way junctions, G-quadruplexes, and i-motifs (Tateishi-Karimata and Sugimoto 2020).

The first investigations into the effect of pressure on DNA stability were carried out in the 1960s by Heden and coworkers (1964). In these investigations involving genomic DNA duplexes, a weak pressure effect was observed over a range of 1–5000 bar. The lack of a stronger effect was attributed to the absence of buried voids and to the fact that the state of ionization of DNA does not change upon denaturation. We present here an overview of the work from several laboratories that complements and supplements these early findings, while also reviewing results of more recent studies suggesting that noncanonical DNA structures exhibit an altered response to pressure. In all cases, the pressure dependence of the stability of folded DNA structures provides important insights into the role that solvent plays in modulating their stability. We will focus almost exclusively on DNA structures. There are only a few studies exploring the effect of pressure on RNA duplexes; this can be attributed to the experimental difficulties and high cost involved in studying RNA. The few studies that do exist are consistent with the picture in which pressure exerts a qualitatively similar effect on the conformational stabilities of RNA and DNA duplexes.
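The Le Chatelier argument at the start of this Introduction can be made quantitative with the standard thermodynamic relation (∂ ln K/∂P)_T = –ΔV/RT. The Python sketch below evaluates the resulting ratio K(P)/K(P_ref) under the simplifying assumption that ΔV does not itself depend on pressure (that is, compressibility changes are neglected). The function name and the example ΔV values are hypothetical and are not taken from the chapter.

```python
import math

R_CM3_BAR = 83.14462618  # gas constant in cm^3 bar mol^-1 K^-1


def k_ratio(delta_v_cm3_per_mol, p_bar, p_ref_bar=1.0, temp_k=298.15):
    """K(P)/K(P_ref) for a transition with a constant volume change delta_v.

    delta_v is the volume of the final state minus that of the initial state,
    for an equilibrium constant K = [final]/[initial].
    """
    exponent = -delta_v_cm3_per_mol * (p_bar - p_ref_bar) / (R_CM3_BAR * temp_k)
    return math.exp(exponent)


# Hypothetical volume changes, evaluated at 2000 bar and 25 °C.
for dv in (-10.0, 0.0, 10.0):
    print(f"dV = {dv:+.0f} cm^3/mol -> K(2000 bar)/K(1 bar) = {k_ratio(dv, 2000.0):.2f}")
```

A transition with a negative ΔV is favored by pressure (ratio above 1), one with a positive ΔV is disfavored, and a transition with ΔV near zero is nearly pressure-insensitive, in line with the weak pressure effect on genomic duplex DNA described above.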

Structural Considerations

Although the antiparallel B-DNA double helix is the best-known structure of DNA, several other stable conformations also exist (Sugimoto et al. 2021; Tateishi-Karimata and Sugimoto 2020). Below, we briefly describe these noncanonical structures, which include hairpins, Z-DNA, triplex DNA, and the four-stranded G-quadruplex and i-motif structures. For more detailed discussions of the structure, biology, and physical properties of these and other nucleic acid structures, the interested reader is referred elsewhere (e.g., Bloomfield et al. 2000).

Single-stranded DNA molecules with palindromic sequences can fold into hairpin structures, which are made up of a double-stranded stem and a single-stranded loop. The complementary bases in the stem interact with each other via Watson-Crick hydrogen bonds, while the bases in the loop covalently link the two strands.

At high ionic strength or under dehydrating conditions, DNA molecules with consecutive alternating GC dinucleotide steps may adopt the Z-DNA conformation (Herbert 2019). This conformation differs markedly from the canonical B-DNA structure in several ways. It is a left-handed helix and has a single groove; the edges of the bases are exposed directly to the solvent; and the distance between the phosphate residues is shorter than in other DNA conformations.

In triplex DNA, the third strand interacts with the atomic groups lining the major groove of the host duplex via Hoogsteen hydrogen bonds (Frank-Kamenetskii and Mirkin 1995; Plum et al. 1995). The third strand is generally rich in pyrimidine residues and interacts with the host duplex according to the base triplet complementarity rules.

Guanine-rich and cytosine-rich DNA molecules can form four-stranded structures.
Guanine-rich molecules fold into a G-quadruplex; the latter may be intramolecular (monomolecular) or intermolecular, formed from the association of two (bimolecular) or four (tetramolecular) separate DNA strands. Cytosine-rich molecules form i-motif structures, which also may be monomolecular, bimolecular, or tetramolecular. Structurally, G-quadruplexes are stabilized by stacked G-tetrads and centrally coordinated cations (Huppert 2010; Lane et al. 2008). The G-tetrads are planar structures consisting of four guanine residues hydrogen-bonded to each other in a cyclic manner. Stacking of two or more G-tetrads upon each other results in the formation of a central cavity lined with four O6 oxygen atoms at each tetrad. These oxygens coordinate cations within the cavity, which further stabilizes the

84

T. V. Chalikian and R. B. Macgregor Jr.

G-quadruplex structure. The physical dimensions of the cavity limit the size of the cations that can be coordinated by the O6 oxygens; sodium and potassium ions are the two most biologically relevant ions of appropriate size. Under mildly acidic pH conditions, DNA molecules with tracts of cytosines may form i-motif structures. These structures are stabilized by hydrogen bonds between hemi-protonated cytosines (C:C+) (Alba et al. 2016; Day et al. 2014). In an i-motif, two C:C+-stabilized parallel duplexes interact in an antiparallel orientation through mutual intercalation of the hemi-protonated cytosine base pairs. G-quadruplexes have been studied in vitro since the early 1960s; however, an explosion of interest among cell and molecular biologists occurred with the discovery of guanine-rich repeats in telomeric regions of eukaryotic chromosomes (Bochman et al. 2012; Gellert et al. 1962; Oganesian and Bryan 2007). In addition, guanine- and cytosine-rich DNA sequences with the potential to form G-quadruplexes and i-motifs have been found in other critical genomic loci, including promoter regions, centromeres, and mutation-prone hot spots (Huppert 2010; Varshney et al. 2020).

Thermodynamic Considerations

The differential volume, ΔV(T, P), of two conformational states of a molecule is equal to the pressure slope of their differential free energy, ΔG(T, P):

$$\Delta V(P, T) = \left(\frac{\partial \Delta G(P, T)}{\partial P}\right)_T \tag{1}$$

The partial molar volume, V, of a solute can be written as the sum of the following terms (Kharakoz 1992):

$$V = V_M + V_T + V_I + \beta_{T0} RT \tag{2}$$

where VM is the intrinsic volume of the solute; VT is the thermal volume, i.e., the void volume around the solute; VI is the interaction volume, which refers to the solute-induced change in the volume of the solvent; and βT0 is the coefficient of isothermal compressibility of the solvent. In Eq. (2), the ideal term, βT0RT, reflects the volume contribution of the translational degrees of freedom of the solute. Specifically, the VM, VT, and VI terms reflect a change in volume due to the addition of a solute molecule at a fixed position in the solvent, while the βT0RT term originates from the availability of the entire volume of the solution to the solute. The intrinsic volume, VM, of a solute molecule is equal to the van der Waals volume of its constituent atoms plus the volume of intramolecular voids. Generally, the thermal volume of a solute is proportional to its solvent-accessible surface area, SA: VT = δSA, where δ is the thickness of the thermal volume. The interaction volume, VI, depends on the number of thermodynamically altered water molecules (hydration number), nh, and the differential partial molar volume of water of hydration, Vh, and bulk water, V0: VI = nh(Vh − V0).

Water solvating a solute differs structurally and thermodynamically from bulk water in a manner that depends on the chemical nature of the solute. The structural differences arise from the solute-induced alteration of the distribution of hydrogen-bonded water species. These structural alterations result in thermodynamic changes which, in turn, are reflected in the differential volumes (Vh − V0), expansibilities (Eh − E0), and compressibilities (Kh − K0) of water of hydration and bulk water (Chalikian 2001; Kharakoz 1989, 1991). The interaction volume, VI, in Eq. (2) reflects the contribution of hydration to the partial molar volume, V. The differential partial molar volume of water of hydration and bulk water, (Vh − V0), is negative for charged and polar functional groups but positive for nonpolar groups (Chalikian 2001). Hence, for highly charged solutes, such as nucleic acids, the interaction volume, VI = nh(Vh − V0), is negative and proportional to the hydration number, nh.

For large, complex biological molecules, exposure of a solute to high hydrostatic pressure leads to decreases in the intrinsic volume, VM, by elimination of intramolecular voids, and in the interaction volume, VI, by increasing the hydration number, nh. However, pressure acts so as to cause an overall decrease in V and not necessarily in each term on the right-hand side of Eq. (2).

For a system that exists in two conformational states (in this case, the folded and unfolded states), the differential free energy is given by dΔG = ΔVdP − ΔSdT, where ΔV and ΔS are, respectively, the differential partial molar volume and entropy of the two states. Integration and a second-order Taylor series expansion of this relationship with respect to T and P in the vicinity of a reference temperature, T0, and pressure, P0, yields:

$$\Delta G(T, P) = \Delta H_M\left(1 - \frac{T}{T_M}\right) + \Delta C_P\left(T - T_M - T\ln\frac{T}{T_M}\right) + \left[\Delta V(T_0, P_0) + \Delta E\,(T - T_0)\right](P - P_0) - 0.5\,\Delta K_T\left(P^2 - P_0^2\right) \tag{3}$$

where ΔHM is the differential partial molar enthalpy of the unfolded and folded states at the transition temperature, TM, and reference pressure, P0; ΔCP, ΔE, and ΔKT are the differential partial molar heat capacity, expansibility, and isothermal compressibility, respectively; and ΔV(T0, P0) is the differential partial molar volume at reference temperature, T0, and pressure, P0. The reference temperature, T0, and pressure, P0, are customarily set to room temperature (25 °C) and ambient pressure (1 bar), respectively. As is seen from Eq. (3), differential expansibility, $\Delta E = (\partial \Delta V / \partial T)_P$, and compressibility, $\Delta K_T = -(\partial \Delta V / \partial P)_T$, are essential for describing the stability of a biopolymer as a function of temperature and pressure. Molecular insights into the hydration, intrinsic packing, and dynamic fluctuations of solute molecules provided by these parameters are complementary to those provided by volume (Chalikian 2001; Chalikian and Macgregor 2007).
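As an illustration of how Eq. (3) can be evaluated numerically, the following minimal Python sketch computes ΔG(T, P) for a two-state transition. The function name, the unit conventions (volumes in cm³ mol⁻¹, pressure in bar, energies in J mol⁻¹), and the values of ΔHM and TM in the example are assumptions made here for illustration only; ΔVref and ΔE are the per-nucleotide values quoted later in this chapter.

```python
import math

CM3_BAR_TO_J = 0.1  # 1 cm^3 * 1 bar = 0.1 J


def delta_g(T, P, dH_m, T_m, dCp=0.0, dV0=0.0, dE=0.0, dKT=0.0,
            T0=298.15, P0=1.0):
    """Differential free energy of unfolding, Eq. (3), in J/mol.

    T, T_m, T0 in K; P, P0 in bar; dH_m in J/mol; dCp in J/(mol K);
    dV0 in cm^3/mol; dE in cm^3/(mol K); dKT in cm^3/(mol bar).
    """
    # Temperature-dependent part at the reference pressure
    thermal = dH_m * (1.0 - T / T_m) + dCp * (T - T_m - T * math.log(T / T_m))
    # Pressure-dependent part, first in cm^3 bar/mol, then converted to J/mol
    pressure = (dV0 + dE * (T - T0)) * (P - P0) - 0.5 * dKT * (P**2 - P0**2)
    return thermal + CM3_BAR_TO_J * pressure


# Hypothetical duplex: T_m = 40 C and dH_m = 36 kJ/mol are illustrative guesses;
# dV0 = -3.5 cm^3/mol and dE = 0.15 cm^3/(mol K) follow the chapter's values.
params = dict(dH_m=36000.0, T_m=313.15, dCp=0.0, dV0=-3.5, dE=0.15, dKT=0.0)

dg_ambient = delta_g(298.15, 1.0, **params)      # folded state favored (dG > 0)
dg_2kbar = delta_g(298.15, 2000.0, **params)     # lowered by pressure (dV < 0)
```

With these inputs the pressure term lowers ΔG at 25 °C, which is the behavior expected for a duplex whose transition volume is negative at that temperature.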


Pressure Effects on Canonical Duplex DNA

Changes in Volume, ΔV

In the initial studies of the effect of hydrostatic pressure on the conformational stability of nucleic acids, the behavior of the molecules was not monitored directly under conditions of high pressure (Heden et al. 1964). Instead, the samples were subjected to heat at different pressures in solutions containing formaldehyde. The latter prevents unfolded structures from refolding into the double-helical state. The samples were subsequently returned to ambient conditions, and the mole fractions of the folded and pressure-induced unfolded DNA states were evaluated. It was established that hydrostatic pressure had a limited effect on the stability of the double-stranded structure.

In two subsequent publications, the effect of pressure on the conformational stability of nucleic acids was monitored directly using spectroscopy (Hughes and Steiner 1966; Weida and Gill 1966). Weida and Gill characterized the effect of pressure on calf thymus DNA based on the temperature dependence of its circular dichroism (CD) spectrum at pressures of up to 2700 bar (Weida and Gill 1966). These authors reported a positive change in volume, ΔV, of +4.5 cm³ mol⁻¹ accompanying the unfolding transition, using the Clapeyron equation:

$$\frac{dT_M}{dP} = \frac{T_M\,\Delta V}{\Delta H_M} \tag{4}$$

In this equation, TM is the transition temperature, P is the hydrostatic pressure, and ΔHM is the transition enthalpy at TM. As ΔV is positive, pressure stabilizes the duplex relative to the coil state of calf thymus DNA (Weida and Gill 1966). Hughes and Steiner explored the effect of hydrostatic pressure on double and triple-stranded RNA polymers, poly(rA)poly(rU) and poly(rU)poly(rA)poly(rU), by measuring the change in the optical absorption of these molecules as a function of temperature at elevated pressures (Hughes and Steiner 1966). These authors observed that, in solutions containing potassium ions, pressure modestly destabilizes the RNA polymers. These pioneering works led to the widespread use of spectroscopic techniques in studying the pressure dependences of the stability of nucleic acids. In general, changes in molar volume associated with helix-to-coil transitions have been calculated using the Clapeyron equation, although single-molecule methods have been increasingly employed recently. Tables 1 and 2 present compilations of literature data on the pressure effect on the stability of polymeric DNA duplexes (Gunter and Gunter 1972; Hawley and MacLeod 1974, 1977; Hughes and Steiner 1966; Macgregor et al. 1996; Nordmeier 1992; Shi and Macgregor 2007; Weida and Gill 1966; Wu and Macgregor 1993, 1995). According to Eq. (1), the positive sign of a change in volume accompanying strand separation (also called denaturation or melting) signifies pressure-induced stabilization of the folded structure and vice versa. Inspection of Table 1 reveals that the melting transitions of genomic and


Table 1 Thermodynamic data on the effect of pressure on the stability of synthetic DNA and RNA polymeric duplexes

ΔTM/ΔP (°C kbar⁻¹)

ΔV (cm³ mol⁻¹) 1.0

0.05

1.07

0.96

NaCl

0.15

+2.43

NaCl

0.02 0.05 0.2 1.0 0.02 0.05 0.20 0.01 0.05 0.20 0.02 0.05 0.2 1.0 0.02 0.05 0.20 1.0 0.052 0.107 0.30 1.0 0.075 0.27 1.0

DNA Poly(rA) poly(rU)

Salt KCl

[Salt] (M) 0.05

Poly(rA) poly(rU) Poly (dAdT) poly (dAdT) Poly (dAdT) poly (dAdT)

K+

Poly(dA) poly(dT)

NaCl

CsCl

Poly (dAdT) poly (dAdT)

KCl

CsCl

Poly (dGdC) poly (dGdC)

NaCl

Poly(dIdC) poly(dIdC)

NaCl

TM (°C) 50

48.4 ± 0.1 55.0 ± 0.2 64.3 ± 0.3 72.5 ± 0.6 52.7 59.2 73.1 61.2 ± 0.3 68.6 ± 0.2 78.2 ± 0.2 49.2 ± 0.5 54.3 ± 0.1 64.2 ± 0.1 73.4 ± 0.2 49.8 ± 0.2 54.8 63.9 73.9 106.5 ± 0.3 110.1 ± 0.4 114.7 ± 0.5 115.8 ± 0.4 51.9 59.2 63.2

0.36 ± 0.08 0.93 ± 0.17 2.26 ± 0.024 3.86 ± 0.46 2.49 ± 0.1 3.15 ± 0.2 3.86 ± 0.08 2.90 ± 0.16 3.63 ± 0.12 4.49 ± 0.11 0.61 ± 0.45 1.62 ± 0.14 2.09 ± 0.39 3.70 ± 0.25 1.02 ± 0.16 4.2 6.21 6.75 4.51 4.79 5.01 6.41 0.28 1.36 2.64

0.36 ± 0.09 0.90 ± 0.20 2.14 ± 0.35 3.57 ± 0.62 2.60 ± 0.13 3.44 ± 0.24 4.59 ± 0.15 3.46 ± 0.21 4.41 ± 0.20 5.60 ± 0.23 0.61 ± 0.45 1.58 ± 0.24 1.98 ± 0.45 3.42 ± 0.49 1.01 ± 0.20 4.1 5.9 6.2 4.80 ± 0.56 5.16 ± 0.67 5.50 ± 0.74 6.03 ± 0.76 0.26 ± 0.47 1.25 ± 0.15 2.39 ± 0.27

Reference Hughes and Steiner (1966) Gunter and Gunter (1972) Hawley and MacLeod (1974) Wu and Macgregor (1993)

Shi and Macgregor (2007) Najaf-Zadeh et al. (1995)

Wu and Macgregor (1995) Macgregor et al. (1996)

synthetic DNA duplexes are all accompanied by increases in volume as can be judged by either the values of ΔV or the positive sign of the slopes ΔTM/ΔP. In contrast, the melting of poly(rA)poly(rU) RNA duplex causes a decrease in volume. Interestingly, the unfolding volume, ΔV, for the RNA triplex poly(rU)poly(rA)poly (rU) is negative, while that for its DNA counterpart poly(dT)poly(dA)poly(dT) is positive. In general, the effect of pressure on the stability of DNA and, hence, the


Table 2 Thermodynamic data on the effect of pressure on the stability of genomic DNA duplexes Source of DNA Calf thymus

Cation concentration 0.03 M NaCl

TM (°C) 77

0.05 M K+

Cl. perfringens

Fraction GCa 0.0 0.28 0.35 0.42 0.47 0.48 0.50 0.72

[KCl] (M) 0.005 0.02 0.05 0.2 0.5 [NaCl] (M) 0.01 0.05 0.12 0.36 1.08 3.6 0.16 M Na+

63.6 71.3 76.4 84.2 89.3

ΔTM/ΔP (°C kbar⁻¹) 4.49

ΔV (cm³ mol⁻¹) 4.5

2.34

2.7

0.46 1.4 2.0 2.9 3.5

0.51 1.58 2.27 3.32 4.02

Reference Weida and Gill (1966) Gunter and Gunter (1972) Nordmeier (1992)

Hawley and MacLeod (1974) 63.5 72.6 78.5 84.9 90.7 92.1

0.54 2.0 2.6 3.8 4.1 4.6 Hawley and Macleod (1977)

64.5 78.4 84.0 86.3 89.4 89.0 90.3 97.1

2.43 2.98 3.30 3.21 3.60 3.60 3.56 4.12

a This study used genomic DNA from different organisms with differing fractional content of GC residues

value of ΔV depends on temperature as reflected in the differential partial molar expansibility, ΔE, of the folded and unfolded states. Combined studies from the Chalikian and Macgregor laboratories (Chalikian et al. 1999; Rayan et al. 2009; Wu and Macgregor 1993) have revealed that the transition volume, ΔV, for homopolymeric and alternating copolymeric DNA, RNA, and DNA/RNA hybrid duplexes containing combinations of dA, rA, dT, and rU bases changes linearly with temperature and depends only weakly on the identity of the duplex. These data are plotted against temperature in Fig. 1. The temperature dependence of ΔV in Fig. 1 can be presented as follows:

$$\Delta V(T) = \Delta V_{ref} + \Delta E\,(T - T_{ref}) \tag{5}$$


Fig. 1 The dependence of the volume change (ΔV, cm³ mol⁻¹) associated with helix-to-coil transitions on temperature (T, °C) for double-stranded polynucleotide systems. Solid squares (■) correspond to ΔV values determined for thermal denaturation of the A/U and A/T polynucleotides (Chalikian et al. 1999); solid circles (●) correspond to ΔV values derived from the titration of poly(rA) with poly(rU) (Chalikian et al. 1999); triangles correspond to ΔV values determined from high-pressure UV-melting experiments (Wu and Macgregor 1993, 1995) for poly(dAdT)poly(dAdT), poly(dA)poly(dT), and poly(dGdC)poly(dGdC). The data are expressed per nucleotide

with the average change in expansibility, ΔE, equal to 0.15 ± 0.03 cm³ mol⁻¹ K⁻¹ (expressed per mole of nucleotide) (Chalikian et al. 1999). For a reference temperature, Tref, of 25 °C, the value of ΔVref is −3.5 cm³ mol⁻¹ (expressed per mole of nucleotide) (Dubins et al. 2001). Because increasing cation concentrations raise duplex stability and, hence, TM, the ΔV of strand separation should increase with salt concentration according to Eq. (5). The nature of the stabilizing cation does not appear to play a significant role in the behavior of double-stranded polymers. Although the trend described in Eq. (5) was derived for duplexes containing only A and T or U nucleobases, inspection of the data presented in Tables 1 and 2 reveals that the poly(dGdC)poly(dGdC) and poly(dIdC)poly(dIdC) duplexes as well as genomic duplexes with mixed AT and GC content qualitatively follow this trend (Macgregor 1998). The relationship given in Eq. (5) implies that the differential volume of the coil and helix states passes through zero at ~50 °C. In accordance with Eq. (5), duplexes with a TM above ~50 °C will exhibit an increase in thermal stability with increasing pressure, while those with a TM below ~50 °C will be destabilized by pressure. Thus, the question of stabilization or destabilization of a polymeric duplex by pressure should be considered within the context of a specific temperature domain. The same duplex may be destabilized by pressure at lower temperatures, where ΔV < 0, and stabilized at higher temperatures, where ΔV > 0. In line with this notion, at experimental conditions resulting in TM < 50 °C, polymeric duplexes exhibit a decrease in TM with increasing pressure (Dubins et al.


2001; Rayan and Macgregor 2005). This is reported for the RNA duplex, poly(rA)poly(rU); the DNA duplexes, poly(dIdC)poly(dIdC) and poly(dAdT)poly(dAdT); and the DNA/RNA hybrid duplex poly(dA)poly(rU). By virtue of the negative value of ΔV associated with the denaturation of poly(dIdC)poly(dIdC), poly(dAdT)poly(dAdT), and poly(dA)poly(rU), these duplexes undergo pressure-induced helix-to-coil unfolding transitions at near-ambient temperatures under appropriate conditions (see Fig. 2) (Chalikian and Macgregor 2007; Dubins et al. 2001; Rayan and Macgregor 2005). For some polymeric DNA duplexes, experimental conditions at which the TM is lower than 50 °C do not exist. Under the experimental conditions of the works quoted in Tables 1 and 2, the melting temperatures of the polymeric duplexes are well above 50 °C, with the exception of the RNA duplex, poly(rA)poly(rU). Consequently, for these duplexes, the values of dTM/dP are positive, and the transition volumes, ΔVM, at TM are positive, ranging from +0.36 to +6.20 cm³ mol⁻¹ (expressed per mole of nucleotide). Given the relatively small magnitudes of the transition volumes, ΔV (either positive or negative), the conformational stability of double-stranded nucleic acids is only weakly sensitive to pressure. This contrasts with the behavior of globular proteins, lipid structures, and some of the other nucleic acid structures discussed below. The observed insensitivity has at least two molecular origins. Firstly, there are no functional groups on nucleic acids whose state of ionization changes upon unfolding at the pH values, pressures, and temperatures used in the measurements. Secondly, there

Fig. 2 Pressure dependences of the native (double-helical) fraction, α, of poly(dA)poly(rU) at 20 °C (○) and 25 °C (pH 6.7; 28 mM Na⁺) (●); poly(dAdT)poly(dAdT) at 25 °C (pH 6.7; 5.2 mM Na⁺) (■); and poly(dIdC)poly(dIdC) at 25 °C (pH 6.7; 5.2 mM Na⁺) (□). The horizontal axis is pressure in kbar. The plots for poly(dA)poly(rU) were calculated from the extinction coefficient-versus-pressure data presented in Dubins et al. (2001), while the plots for poly(dAdT)poly(dAdT) and poly(dIdC)poly(dIdC) are from Rayan and Macgregor (2005)


are no voids in the folded or unfolded states of duplex nucleic acids. Changes in the state of ionization of abnormally titrating groups and/or the presence of intramolecular voids contribute to the enhanced pressure sensitivity of other biomolecular systems such as globular proteins.
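The temperature crossover implied by Eq. (5) and the sign of the Clapeyron slope in Eq. (4) can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, ΔVref is taken as −3.5 cm³ mol⁻¹ (the negative sign is what places the zero crossing near 50 °C), and the transition enthalpy of 36 kJ mol⁻¹ is an assumed round value, not a number taken from the cited studies.

```python
CM3_BAR_TO_J = 0.1  # 1 cm^3 * 1 bar = 0.1 J


def delta_v(t_celsius, dv_ref=-3.5, d_e=0.15, t_ref=25.0):
    """Transition volume from Eq. (5), cm^3/mol of nucleotide."""
    return dv_ref + d_e * (t_celsius - t_ref)


def crossover_temperature(dv_ref=-3.5, d_e=0.15, t_ref=25.0):
    """Temperature (deg C) at which the transition volume changes sign."""
    return t_ref - dv_ref / d_e


def clapeyron_slope(tm_celsius, dv, dh_m):
    """dTM/dP from Eq. (4), in K per kbar.

    dv in cm^3/mol, dh_m in J/mol (assumed value supplied by the caller).
    """
    tm_kelvin = tm_celsius + 273.15
    return tm_kelvin * dv * CM3_BAR_TO_J / dh_m * 1000.0


t_zero = crossover_temperature()                          # about 48.3 deg C
slope_low = clapeyron_slope(30.0, delta_v(30.0), 36000.0)   # negative below crossover
slope_high = clapeyron_slope(59.2, 3.44, 36000.0)           # positive above crossover
```

Using the Table 1 entries for poly(dA)poly(dT) in 0.05 M NaCl (TM = 59.2 °C, ΔV = 3.44 cm³ mol⁻¹) together with the assumed enthalpy gives a slope of roughly 3.2 °C kbar⁻¹, of the same order as the tabulated ΔTM/ΔP.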

Changes in Expansibility, ΔE, and Compressibility, ΔKS

Changes in expansibility, ΔE, accompanying the unfolding transitions of polymeric and oligomeric duplexes have been determined using three experimental approaches (Chalikian et al. 1999; Dragan et al. 2009; Rayan et al. 2009; Son et al. 2014). Firstly, ΔE was calculated from the temperature dependence of the volume change accompanying duplex formation when a DNA strand is added to the complementary strand (Chalikian et al. 1999). Secondly, ΔE was determined from pressure-perturbation calorimetric (PPC) measurements (Dragan et al. 2009; Rayan et al. 2009). Thirdly, ΔE was obtained from the differential post- and pre-denaturation baselines of ΔV-versus-T melting profiles (Son et al. 2014). Table 3 presents the values of ΔE for one polymeric and one oligomeric duplex. These values are all positive, which is consistent with the positive sign of the temperature slope of ΔV in Fig. 1. Expansibility data are generally interpreted on the basis of the relationship ΔE = ΔEM + Δnh(Eh − E0), where EM is the intrinsic expansibility of the DNA molecule and Eh and E0 are the partial molar expansibilities of water of hydration and bulk water, respectively (Chalikian and Macgregor 2007). Since duplex DNA has no expandable intramolecular voids, its intrinsic expansibility, EM, is nearly zero. A change in intrinsic expansibility, ΔEM, accompanying duplex dissociation should also be close to zero and can be neglected in the analysis. The differential partial molar expansibility of water of hydration and bulk water, (Eh − E0), is positive for all atomic groups (Kharakoz 1989). Hence, the positive sign of ΔE for DNA unfolding is consistent with an increase in hydration, expressed as an increase in hydration number (Δnh > 0). Additional contributions to the value of ΔE may originate from the relaxation component that arises from the heterogeneity of the ensemble of unfolded conformations with respect to volume and enthalpy (Son et al. 2014).

When analyzing the pressure dependence of the differential volume, ΔV, of the unfolded and folded DNA states, one needs to consider the differential isothermal compressibility, ΔKT. Currently, there are no experimental methods suitable for measuring the values of ΔKT in biomolecular studies. In contrast, the change in adiabatic compressibility, ΔKS, can be determined easily by combining densimetric and ultrasonic velocimetric measurements (Sarvazyan 1991). For measuring true adiabatic compressibility, the period of the ultrasonic waves (the time interval between compression and decompression in the ultrasonic wave) should be much longer than the relaxation time of structural transitions in the solute and solvent that are accompanied by changes in volume. Otherwise, the measured property is the pseudoadiabatic compressibility. In studies conducted in aqueous solutions, ΔKT and ΔKS are often used interchangeably given the large heat capacity and small expansibility of water (Blandamer et al. 2001).

Table 3 Changes in expansibility, ΔE, and adiabatic compressibility, ΔKS, accompanying the heat-induced helix-to-coil transitions of nucleic acid duplexes or resulting from the addition of an oligomeric/polymeric single strand to a complementary single strand. The measurements were carried out at atmospheric pressure, and the values are presented in terms of per mole of nucleotide

DNA/RNA Poly(rA)poly(rU)a

Poly(rA)poly(rU)a Poly(dA)poly(rU)a Poly(rA)poly(dT)a Poly(dA)poly(dT)a Poly(dAdT)poly(dAdT)a Poly(dAdT)poly(dAdT)b d(GGCATTACGG)/d(CCGTAATGCC)c

T (°C) 20.0 30.0 40.0 48.0 30.5 55.0 56.5 51.0

ΔE (cm³ mol⁻¹ K⁻¹)

44.7

0.085

25.0 43.2

0.030 ± 0.01

ΔKS (10⁻⁴ cm³ mol⁻¹ bar⁻¹) 23.6 ± 2.0 17.3 ± 2.0 10.2 ± 2.0 6.5 ± 2.0 13.5 ± 2.0 5.5 ± 2.0 3.5 ± 4.0 1.0 ± 2.0

2.5 ± 0.3 1.4 ± 0.4

Reference Chalikian et al. (1999)

Rayan et al. (2009) Son et al. (2014)

a Determined in 10 mM cacodylic buffer and 20 mM NaCl, pH 6.8
b Determined in 5 mM cacodylic buffer and 15 mM NaCl, pH 6
c Determined in 10 mM cacodylic acid buffer and 100 mM NaCl, pH 6.7
Changes in adiabatic compressibility, ΔKS, associated with the melting of polymeric and oligomeric duplexes have been measured by Chalikian and coworkers (Chalikian et al. 1999; Son et al. 2014). These data are presented in Table 3 and plotted in Fig. 3. Inspection of Fig. 3 reveals that ΔKS is negative at low to moderate temperatures but increases with temperature and becomes positive at elevated temperatures (Chalikian et al. 1999; Son et al. 2014). Generally, compressibility data are interpreted within the framework of the relationship ΔKS = ΔKM + Δnh(Kh − K0), where KM is the intrinsic compressibility of the DNA molecule and Kh and K0 are the partial molar compressibilities of the water of hydration and bulk water, respectively (Chalikian and Macgregor 2007; Chalikian et al. 1994). Due to the absence of compressible voids in the double- and single-stranded conformations, the intrinsic compressibility, KM = βMVM, should be close to zero (Chalikian and Breslauer 1998a, b; Chalikian and Macgregor 2007; Chalikian et al. 1994). The differential partial molar adiabatic compressibility of water of hydration and bulk water, (Kh − K0), for all functional groups of biopolymers is strongly temperature-dependent (Chalikian et al. 1994; Kharakoz 1991). At room temperature, the values of (Kh − K0) are negative for most groups. Thus, the negative values of ΔKS observed at room temperature are consistent with an increase in hydration (Δnh > 0). As in the case of expansibility, the relaxation component may also contribute to ΔKS (Son et al. 2014).


Fig. 3 The dependence of the change in adiabatic compressibility (ΔKS, 10⁻⁴ cm³ mol⁻¹ bar⁻¹) associated with helix-to-coil transitions on temperature (T, °C) for double-stranded polynucleotide systems. Solid squares (■) correspond to ΔKS values determined for thermal denaturation of the A/U and A/T polynucleotides (Chalikian et al. 1999); solid circles (●) correspond to ΔKS values derived from the titration of poly(rA) with poly(rU) (Chalikian et al. 1999). The data are expressed per nucleotide

One estimate of the intrinsic coefficient of isothermal compressibility, βM, has come from pressure-dependent X-ray crystallographic studies of the double-stranded structure formed by the palindromic oligonucleotide d(GGTATACC)2 (Girard et al. 2007). The length of an average base pair step was estimated to shrink by about 6.5%, from 2.92 Å at ambient pressure to 2.73 Å at 13.9 kbar; thus, the DNA helix behaves as a molecular spring (Girard et al. 2007). The intrinsic coefficient of isothermal compressibility, βM, of the DNA at atmospheric pressure is 21.5 × 10⁻⁶ bar⁻¹ (Girard et al. 2007). This value is roughly 50% of the compressibility of water (45 × 10⁻⁶ bar⁻¹). However, the magnitude of this parameter cannot be attributed solely to the intrinsic compressibility of the DNA molecule. The compressibilities of intermolecular contacts and of the intra-crystal liquid phase, including the water of DNA hydration, also make important contributions to this value.

Pressure-Temperature Stability Phase Diagram of Duplex DNA

Using Eq. (3), one can construct a temperature-pressure stability phase diagram of duplex DNA. The temperature dependence of the pressure at which DNA denatures, PM, is described by (Chalikian and Macgregor 2021; Liu et al. 2021):

$$P_M = \frac{\Delta V(T_0) + \Delta E\,(T - T_0) - \sqrt{D}}{\Delta K_T} \tag{6}$$


Fig. 4 Phase diagram for the helix-coil transition of double-stranded nucleic acid polymers (Dubins et al. 2001). The denaturation pressure, PM (bar), is plotted as a function of temperature, T (°C), for several values of the helix-coil transition temperature, TM, at atmospheric pressure: 10 °C (solid line); 20 °C (dashed line); 30 °C (dotted line); 40 °C (dash-dot); 60 °C (dash-dot-dot); 80 °C (short dash); 100 °C (short dot). In the figure, SS denotes the single-stranded conformation and DS denotes the double-stranded conformation

where

$$D = \left[\Delta V(T_0) + \Delta E\,(T - T_0)\right]^2 + 2\,\Delta K_T\left[\Delta H_M\left(1 - \frac{T}{T_M}\right) + \Delta C_P\left(T - T_M - T\ln\frac{T}{T_M}\right)\right]$$

Figure 4 presents the temperature-pressure stability phase diagram of a polymeric DNA duplex calculated according to Eq. (6) (Dubins et al. 2001). Inspection of Fig. 4 reveals that the temperature-pressure stability diagram of duplex DNA is conspicuously distinct from the elliptical diagram characteristic of globular proteins (Hawley 1971; Scharnagl et al. 2005). The stability phase diagram of a polymeric duplex is an intricate function of temperature and pressure and critically depends on the melting temperature, TM, at atmospheric pressure. The pressure dependence of the stability of DNA duplexes with a TM below 50 °C is qualitatively different from that of duplexes with a TM above 50 °C (Dubins et al. 2001). Duplexes with TM < 50 °C become destabilized when pressure increases from 1 to 2000 bar, with ΔTM/ΔP < 0. However, they are stabilized at higher pressures, where ΔTM/ΔP changes its sign and becomes positive. In contrast, duplexes with TM > 50 °C are stabilized at pressures from 1 to 2000 bar (with ΔTM/ΔP > 0) and become destabilized at higher pressures (with ΔTM/ΔP < 0). For all polymeric duplexes, independent of TM at atmospheric pressure, there is a pressure domain within which they undergo cold denaturation (Dubins et al. 2001). However, the temperatures of cold denaturation are typically on the order of −100 °C or lower and, therefore, not experimentally attainable. In addition, although the stability diagram in Fig. 4 includes pressures above 10 kbar, a note of caution is in order. The diagram was computed under the assumption that ΔKT is independent of pressure; this is an oversimplification and may produce erroneous results at elevated pressures.
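Eq. (6) is the solution of the quadratic obtained by setting ΔG = 0 in Eq. (3) when the reference pressure is neglected relative to P. A minimal Python sketch of that calculation is given below; it returns both roots of the quadratic so that the physically relevant branch can be chosen for a given parameter set. All names and numerical values are hypothetical illustrations (in particular ΔHM, TM, and ΔKT are assumed), and the sketch inherits the pressure-independent ΔKT approximation noted above.

```python
import math

CM3_BAR_TO_J = 0.1  # 1 cm^3 * 1 bar = 0.1 J


def thermal_term(T, dH_m, T_m, dCp):
    """Temperature-dependent part of Eq. (3) at the reference pressure, J/mol."""
    return dH_m * (1.0 - T / T_m) + dCp * (T - T_m - T * math.log(T / T_m))


def delta_g(T, P, dH_m, T_m, dCp, dV0, dE, dKT, T0=298.15):
    """Eq. (3) with P0 neglected relative to P (assumption), in J/mol."""
    b = dV0 + dE * (T - T0)
    return thermal_term(T, dH_m, T_m, dCp) + CM3_BAR_TO_J * (b * P - 0.5 * dKT * P**2)


def denaturation_pressures(T, dH_m, T_m, dCp, dV0, dE, dKT, T0=298.15):
    """Pressures (bar) at which delta_g = 0; empty tuple if there is none."""
    a = thermal_term(T, dH_m, T_m, dCp) / CM3_BAR_TO_J  # in cm^3 bar/mol
    b = dV0 + dE * (T - T0)
    if dKT == 0.0:
        # Without a compressibility term the condition is linear in P
        return () if b == 0.0 else (-a / b,)
    d = b * b + 2.0 * dKT * a  # discriminant D of Eq. (6)
    if d < 0.0:
        return ()
    root = math.sqrt(d)
    return tuple(sorted(((b - root) / dKT, (b + root) / dKT)))


# Hypothetical duplex: T_m = 40 C, dH_m = 30 kJ/mol, dKT = -2e-4 cm^3/(mol bar)
args = dict(dH_m=30000.0, T_m=313.15, dCp=0.0, dV0=-3.5, dE=0.15, dKT=-2.0e-4)
roots = denaturation_pressures(298.15, **args)
```

Returning both roots rather than a single branch is a deliberate choice here: which root is physically meaningful depends on the signs of ΔKT and of ΔV(T0) + ΔE(T − T0), and a negative discriminant simply means that no denaturation pressure exists at that temperature within this model.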

Effects of Cations, Cosolvents, and Sequence and Length of Oligomeric DNA on Transition Volume, ΔV In contrast to polymeric nucleic acids, studies of synthetic oligomeric structures offer the advantage of allowing a direct assessment of the effect of sequence and length on the volumetric properties of the molecule. An important distinction between a polymer and an oligomer is the contribution of the end effects to the observed properties of the latter. The thermodynamic behavior of the two terminal base pairs at each of the two ends of double-stranded molecules differs significantly from that of the rest of the base pairs. With shortening of a double-stranded oligonucleotide, the contribution of its terminal bases to the overall properties of the molecule becomes increasingly important. Table 4 presents data on the effect of hydrostatic pressure on the stability of poly (dA)poly(dT) and three compositionally similar oligonucleotides, (dA)n(dT)n, where n ¼ 11, 15, or 19 (Macgregor 1996). The polymer and the three oligonucleotides exhibit an increase in stability within increasing pressure consistent with ΔV being positive. The extent of the pressure-induced stabilization diminishes with the length of the oligonucleotide. The decreasing effect of pressure on the stability of oligomers was attributed to the volumetric properties of the terminal bases which form lessstable pairs and display reduced stacking. The volume contribution of the unfolding of terminal base pairs, ΔVF, was estimated to be 12 cm3 mol1 based on the

Table 4 The effect of pressure on the helix-to-coil transition of dAdT oligonucleotides of different chain lengths in solutions containing 50 mM NaCl (Macgregor 1996). ΔVt is the change in molar volume for the unfolding of the base pairs distant from the ends of the oligomer. ΔVF is the volume change of the helix-coil transition of the "frayed" base pairs at the end of the oligomer. For the polymer, the volume change is calculated for the cooperative unit. For the oligonucleotides, the volume changes reflect the denaturation of the entire structure

| DNA | ΔTM/ΔP (°C kbar⁻¹) | TM (°C) | ΔVt (cm³ mol⁻¹) | ΔVF (cm³ mol⁻¹) |
|---|---|---|---|---|
| Poly(dA)poly(dT) | 3.15 ± 0.2 | 59.2 | 108 | – |
| dA19dT19 | 2.0 ± 0.1 | 39.3 | 35 | −11 |
| dA15dT15 | 1.2 ± 0.2 | 31.8 | 17 | −13 |
| dA11dT11 | 0.7 ± 0.2 | 24.1 | 7.3 | −12 |


T. V. Chalikian and R. B. Macgregor Jr.

Table 5 Changes in volume, ΔV, accompanying the heat-induced helix-to-coil transitions of nucleic acid duplexes or resulting from the addition of an oligomeric single strand to a complementary single strand. The measurements were carried out at atmospheric pressure, and the values are expressed per mole of nucleotide

| DNA | T (°C) | ΔV (cm³ mol⁻¹) | Reference |
|---|---|---|---|
| d(CCATCGCTACC)/d(GGTAGCGATGG) (a) | 20.0 | 6.5 ± 0.5 | Marky et al. (1996) |
| d(CGCCTAATCG)/d(CGATTAGGCG) (b) | 20.0 | 8.4 ± 0.6 | Zieba et al. (1991) |
| d(CGCCTATATCG)/d(CGATATAGGCG) (b) | 20.0 | 9.5 ± 0.3 | Zieba et al. (1991) |
| d(GGCATTACGG)/d(CCGTAATGCC) (c) | 25.0 | 0.9 ± 0.3 | Son et al. (2014) |
| d(GGCATTACGG)/d(CCGTAATGCC) (c) | 43.2 | 1.3 ± 0.3 | Son et al. (2014) |

(a) Determined in 20 mM sodium phosphate, 100 mM NaCl, pH 7.0
(b) Determined in 10 mM sodium phosphate, 100 mM NaCl, pH 7.0
(c) Determined in 10 mM cacodylic acid, 100 mM NaCl, pH 6.7

analysis of the length-dependence of the transition volume, ΔV. The negative sign of ΔVF suggests that increasing pressure favors dissociation of the terminal base pairs. In another study, changes in molar volume accompanying the unfolding of 24 different 22-base pair oligonucleotides were reported as part of a kinetic study discussed in more detail below (Dubins and Macgregor 2004). The observed ΔV of denaturation ranged from −14 to +22 cm³ mol⁻¹ with no apparent correlation with TM, which varies between 40.0 °C and 55.5 °C (Dubins and Macgregor 2004). In a few studies, changes in volume accompanying formation of double-stranded oligonucleotides were measured densimetrically at ambient pressure and different temperatures (see Tables 4 and 5) (Macgregor 1996; Marky et al. 1996; Son et al. 2014; Zieba et al. 1991). The results of these studies corroborate the data obtained in experiments conducted at elevated pressure; the values of ΔV for strand separation are positive even if the measurements are performed at temperatures well below 50 °C.
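The length dependence summarized in Table 4 can be made concrete with a short calculation. The sketch below takes the tabulated TM and ΔTM/ΔP values and estimates the TM expected at 1 kbar above atmospheric pressure. It assumes that TM varies linearly with pressure over this range, which is an assumption of the illustration and not a statement from the cited study; the function and variable names are hypothetical.

```python
# Linear extrapolation of the melting temperature with pressure:
#   TM(P) ~ TM(ambient) + (dTM/dP) * P
# Input values are the Table 4 entries (50 mM NaCl; Macgregor 1996).
# Linearity of TM in P is an assumption made for this illustration.

TABLE_4 = {
    # name: (TM at atmospheric pressure in deg C, dTM/dP in deg C per kbar)
    "poly(dA)poly(dT)": (59.2, 3.15),
    "dA19dT19": (39.3, 2.0),
    "dA15dT15": (31.8, 1.2),
    "dA11dT11": (24.1, 0.7),
}


def tm_at_pressure(tm_ambient_c, dtm_dp_c_per_kbar, pressure_kbar):
    """Return the estimated TM (deg C) at `pressure_kbar` above ambient."""
    return tm_ambient_c + dtm_dp_c_per_kbar * pressure_kbar


# Estimated TM of each molecule at 1 kbar above atmospheric pressure.
tm_at_1kbar = {
    name: tm_at_pressure(tm, slope, 1.0) for name, (tm, slope) in TABLE_4.items()
}

for name, value in tm_at_1kbar.items():
    print(f"{name}: TM at 1 kbar is about {value:.2f} deg C")
```

Under this assumption, the 11-mer gains less than 1 °C per kbar, whereas the polymer gains more than 3 °C, which is the trend described in the text.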

Pressure Effects on Noncanonical DNA Structures

Hairpins

As discussed above, single-stranded oligonucleotides with palindromic sequences can adopt a hairpin structure containing a single-stranded loop that covalently connects the two strands of the double-stranded stem region. The properties of the base pairs at either end of the stem differ from those of the intervening base pairs. The two base pairs adjacent to the loop are termed the nucleation base pairs, while the base pairs farthest from the loop are termed the terminal base pairs. Although the pressure dependence of the conformational preferences of internal bases is anticipated to be similar to that of a duplex, the pressure responses of the loop and the distal bases of the stem are distinct.


The Effect of Pressure on the Conformational Stability of DNA


Table 6 lists the volumetric parameters calculated from the pressure dependence of the thermal stability of hairpin oligonucleotides with the general sequence d(GGATXXL1L2L3L4YYATCC), where XX and YY denote the complementary AT/TA or AA/TT base pairs at the nucleation position of the hairpin (Amiri and Macgregor 2011). To study the response of the nucleation base pairs to hydrostatic pressure, the sequence of the DNA molecules was changed systematically. As can be seen in Table 6, the identities of the nucleation and internal bases are crucial for determining the pressure response of the hairpins. Transition volumes calculated using Eq. (4) range from −2.35 cm³ mol⁻¹ for d(GGATAATGGTTTATCC) in 10 mM NaCl (TM = 37.9 °C) to +6.47 cm³ mol⁻¹ for d(GGATATCCCCATATCC) in 100 mM NaCl (TM = 49.3 °C). Again, for these structures, there is no obvious correlation between the observed changes in volume and the transition temperatures.

NMR spectroscopy has been used to characterize the structural response of the hairpin formed by d(CTAGAGGATCCTUTTGGATCCT) to an increase in pressure (Wilton et al. 2008). Over a pressure range of 3–2000 bar, the hairpin does not undergo a pressure-induced transition, retaining its folded conformation. The high- and low-pressure structures of the hairpin are nearly identical (Wilton et al. 2008). The intrinsic volume of the hairpin decreases from 7174 Å³ at 3 bar to 7171 Å³ at 2000 bar; however, a change of 0.042% is not statistically significant. One ramification of this result is that the partial molar isothermal or adiabatic compressibility of duplex DNA in solution can be considered to result entirely from hydration, thereby reflecting the differential compressibility of water of DNA hydration and bulk water.

Single-molecule Förster resonance energy transfer (smFRET) techniques performed as a function of pressure offer an alternative way to study the volumetric properties of macromolecular structures. The smFRET studies of hairpin structures carried out to date have employed DNA hairpins with single-stranded loops consisting of 30 or more nucleotides (Arns et al. 2019; Patra et al. 2017, 2018, 2019; Sung and Nesbitt 2020a). Long loops render the timescale of folding/unfolding transitions of hairpins amenable to the use of the smFRET technique. In this respect, smFRET studies differ from studies employing ensemble measurements; in the latter, hairpin loops generally comprise no more than four or five nucleotides. Winter and colleagues have employed smFRET measurements to characterize the pressure response of a DNA hairpin with a stem of 6 GC base pairs and a loop of 32 adenine residues (Arns et al. 2019; Patra et al. 2017, 2018, 2019). At pH 7.5, 15 mM NaCl, and ambient pressure, the hairpin exists as a mixture of the folded and unfolded conformations. An increase in pressure from 1 to 1800 bar shifts the equilibrium towards the unfolded state, with the differential volume of the two states, ΔV, being equal to approximately −18 cm³ mol⁻¹ (Patra et al. 2017, 2018, 2019). A qualitatively similar result was obtained for DNA hairpins with stems of 7–10 base pairs and loops consisting of 40 consecutive adenine residues; increasing pressure destabilizes the hairpin structures (Sung and Nesbitt 2020a). The contribution of each base pair in the stem to the transition volume was evaluated to be −2.1 ± 0.4 cm³ (mol bp)⁻¹; that is, the ΔV of stem unfolding is negative (Sung and Nesbitt 2020a). These results contrast with the results of ensemble measurements of the volumetric properties of

Table 6 Changes in volume accompanying the heat-induced unfolding of DNA hairpins with heterogeneous and homogeneous loop sequences from Amiri and Macgregor (2011). TM is given in °C and ΔV in cm³ mol⁻¹

Heterogeneous loop sequences

| Nucleation stack | Na+ (mM) | TA2T TM | TA2T ΔV | TG2T TM | TG2T ΔV | TC2T TM | TC2T ΔV |
|---|---|---|---|---|---|---|---|
| AT/AT | 10 | 42.1 | 0.44 ± 0.04 | 42.8 | 1.41 ± 0.14 | 44.9 | 1.81 ± 0.29 |
| AT/AT | 20 | 43.2 | 0.18 ± 0.08 | 44.0 | 0.25 ± 0.05 | 46.1 | 2.27 ± 0.10 |
| AT/AT | 50 | 44.6 | 0.83 ± 0.04 | 45.5 | 1.55 ± 0.21 | 48.7 | 3.05 ± 0.21 |
| AT/AT | 100 | 46.1 | 1.46 ± 0.32 | 46.8 | 2.89 ± 0.20 | 51.3 | 3.76 ± 0.18 |
| AA/TT | 10 | 40.2 | 1.96 ± 0.08 | 37.9 | −2.35 ± 0.13 | 44.0 | 0.78 ± 0.08 |
| AA/TT | 20 | 41.5 | 1.15 ± 0.07 | 41.1 | 0.86 ± 0.04 | 45.4 | 1.18 ± 0.09 |
| AA/TT | 50 | 43.3 | 0.19 ± 0.04 | 43.5 | 0.85 ± 0.04 | 47.9 | 1.75 ± 0.19 |
| AA/TT | 100 | 44.7 | 0.74 ± 0.10 | 45.1 | 2.14 ± 0.09 | 49.9 | 2.35 ± 0.17 |

Homogeneous loop sequences

| Nucleation stack | Na+ (mM) | C4 TM | C4 ΔV | G4 TM | G4 ΔV | T4 TM | T4 ΔV |
|---|---|---|---|---|---|---|---|
| AT/AT | 10 | 42.4 | 3.07 ± 0.13 | 44.0 | 1.72 ± 0.09 | 43.9 | 1.38 ± 0.07 |
| AT/AT | 20 | 43.7 | 4.10 ± 0.20 | 45.1 | 0.29 ± 0.06 | 45.8 | 2.30 ± 0.13 |
| AT/AT | 50 | 46.1 | 5.58 ± 0.25 | 46.5 | 2.79 ± 0.19 | 48.2 | 3.92 ± 0.20 |
| AT/AT | 100 | 49.3 | 6.47 ± 0.38 | 47.7 | 4.52 ± 0.22 | 51.0 | 5.40 ± 0.22 |
| AA/TT | 10 | 42.9 | 1.38 ± 0.07 | 38.4 | 0.83 ± 0.04 | – | – |
| AA/TT | 20 | 44.1 | 1.67 ± 0.08 | 40.0 | 0.22 ± 0.06 | – | – |
| AA/TT | 50 | 45.6 | 2.18 ± 0.10 | 42.0 | 1.44 ± 0.09 | – | – |
| AA/TT | 100 | 47.2 | 2.45 ± 0.10 | 43.7 | 2.68 ± 0.11 | – | – |


hairpins with shorter loops, which produced positive values of ΔV for base pairs in the stem (Amiri and Macgregor 2011). The molecular basis of the discrepancy is unclear; the influence of the long dA40 loop remains to be studied. smFRET measurements have also been applied to studying the stability of DNA hairpins in solutions containing cosolvents as a function of pressure (Arns et al. 2019; Patra et al. 2017, 2018, 2019). A DNA hairpin with a stem of six base pairs and a loop of 34 adenine residues has been studied in the presence and absence of TMAO, glycine, urea, and glycine betaine (Arns et al. 2019; Patra et al. 2017, 2018, 2019). In the absence of cosolvents, pressure destabilizes the folded form of the hairpins (Patra et al. 2017). The addition of TMAO to the solution stabilizes the hairpin against pressure-induced denaturation. In contrast, urea, glycine, and glycine betaine destabilize the hairpin structure. The destabilizing effect of the preferentially bound cosolvent urea is synergistic with that of pressure, as the differential volume of the folded and unfolded states of the hairpin decreases from −17.7 cm³ mol⁻¹ in the absence of urea to −29.1 cm³ mol⁻¹ in 1 M urea (Patra et al. 2019). TMAO offsets the destabilizing influence of glycine and glycine betaine in a mixture of TMAO, glycine, and glycine betaine (Patra et al. 2018, 2019). This observation is consistent with TMAO being excluded from the surface of DNA, which leads to preferential hydration of DNA and protects it from the destabilizing influence of temperature, pressure, and denaturing cosolvents (Arns et al. 2019; Patra et al. 2017, 2018, 2019). Taken together, the data accumulated to date show that the sign and magnitude of the effect of pressure on the conformational stability of hairpins vary widely. 
Given that the hairpin stem is duplex DNA and that pressure has only a modest effect on the stability of duplex DNA, the observed variation in the values of ΔV underscores the importance of the loop to the hairpin’s structural response to pressure. The extent to which the long multi-adenine loops influence the properties of these molecules is still poorly understood.
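To illustrate the size of the smFRET effects quoted above, the following sketch converts a transition volume into the factor by which the unfolding equilibrium constant changes with pressure, using the standard relation (∂ln K/∂P)T = −ΔV/RT. It assumes that ΔV is independent of pressure and that the temperature is 25 °C; neither condition is stated in the text for these measurements, and the function names are hypothetical.

```python
import math

R_CM3_BAR = 83.145  # gas constant in cm^3 bar mol^-1 K^-1


def equilibrium_ratio(delta_v_cm3_mol, p_bar, p_ref_bar=1.0, temp_k=298.15):
    """K(P)/K(P_ref) for a reaction volume that does not change with pressure.

    Follows from integrating (d ln K / dP)_T = -delta_V / (R T).
    The default temperature of 298.15 K is an assumption of this sketch.
    """
    return math.exp(-delta_v_cm3_mol * (p_bar - p_ref_bar) / (R_CM3_BAR * temp_k))


def unfolded_fraction(k_unfold):
    """Fraction of unfolded molecules for a two-state equilibrium."""
    return k_unfold / (1.0 + k_unfold)


# Unfolding volume of about -18 cm^3/mol, pressure raised from 1 to 1800 bar.
ratio_no_cosolvent = equilibrium_ratio(-18.0, 1800.0)

# Unfolding volumes reported without urea and in 1 M urea.
ratio_without_urea = equilibrium_ratio(-17.7, 1800.0)
ratio_with_urea = equilibrium_ratio(-29.1, 1800.0)

# Hypothetical starting point: equal populations (K = 1) at 1 bar.
fraction_at_1800_bar = unfolded_fraction(1.0 * ratio_no_cosolvent)

print(f"K(1800 bar)/K(1 bar) for -18 cm^3/mol: {ratio_no_cosolvent:.2f}")
print(f"Unfolded fraction at 1800 bar if K(1 bar) = 1: {fraction_at_1800_bar:.2f}")
```

Under these assumptions, a ΔV of −18 cm³ mol⁻¹ raises the unfolding equilibrium constant several-fold between 1 and 1800 bar, and the more negative ΔV in 1 M urea amplifies the shift further.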

Z-DNA

The effect of pressure on the equilibrium between the B- and Z-forms of the poly(dGdC)poly(dGdC) duplex has been investigated experimentally and computationally (Krzyzaniak et al. 1991; Norberg and Nilsson 1996). Experiments at elevated hydrostatic pressures performed in the presence of 150 mM NaCl revealed that an increase in pressure shifts the equilibrium toward the Z-conformation (Krzyzaniak et al. 1991). In contrast, in another investigation carried out at molar concentrations of NaCl, pressure was found to shift the equilibrium toward the B-conformation, with the differential volume of the B- and Z-conformations being equal to 26 cm³ mol⁻¹ (Macgregor and Chen 1990). Intermediate salt concentrations have not been explored. Given the significant differences between the conditions of the two experiments, it is difficult to make a definitive assessment of the effect of pressure on the B-to-Z transition. Molecular dynamics simulations have been applied to investigating the conformational stability of the d(GCGCGCGCGCGC) DNA dodecamer and the


r(GCGCGCGCGCGC) RNA dodecamer at 1 and 6000 atm (6079 bar) (Norberg and Nilsson 1996). These two oligonucleotides have sequences that potentiate the formation of the left-handed Z-conformation. At atmospheric pressure, the DNA oligonucleotide was in the B-conformation, while the RNA oligonucleotide was in the A-conformation. The simulations did not produce any evidence of the A-to-Z or B-to-Z transition at elevated pressures. At the higher pressure, the conformational equilibrium of the DNA dodecamer shifts to the A-conformation, while the RNA dodecamer maintains the A-conformation.

Three-Stranded DNA

Some of the earliest measurements of the effect of hydrostatic pressure on the conformational stability of nucleic acids were carried out on three-stranded RNA polymers formed by poly(rA) and poly(rU) (Gunter and Gunter 1972; Hughes and Steiner 1966). As shown in Table 7, the poly(rU)poly(rA)poly(rU) RNA triplex is modestly destabilized when pressure increases. The TM of the triplex was near 50 °C, which is the temperature below which double-stranded DNA polymers exhibit pressure destabilization (Dubins et al. 2001). On the other hand, the poly(dT)poly(dA)poly(dT) DNA triplex with a TM above 80 °C exhibits pressure-induced stabilization (Shi and Macgregor 2007).

Table 7 Changes in volume accompanying the unfolding transitions of synthetic DNA and RNA polymeric triplexes

| Polymer | Salt | [Salt] (M) | TM (°C) | ΔTM/ΔP (°C kbar⁻¹) | ΔV (cm³ mol⁻¹) | Reference |
|---|---|---|---|---|---|---|
| Poly(rU)poly(rA)poly(rU) | KCl | 0.10 | 56 | – | −2.1 | Hughes and Steiner (1966) |
| Poly(rU)poly(rA)poly(rU) | K+ | 0.14 | – | +0.26 | +0.35 | Gunter and Gunter (1972) |
| Poly(dT)poly(dA)poly(dT) | NaCl | 2.0 | 85.1 | 4.50 ± 0.24 | 7.81 ± 0.5 | Shi and Macgregor (2007) |
| Poly(dT)poly(dA)poly(dT) | NaCl | 3.0 | 98.4 | 5.80 ± 0.22 | 10.4 ± 0.6 | Shi and Macgregor (2007) |

Table 8 presents molar changes in volume accompanying dissociation transitions of triple-stranded DNA structures formed by the DNA strands shown in Scheme 1 (Lin and Macgregor 1996; Wu and Macgregor 1993).

Scheme 1
R1: d(AAAGGAGGAGAAGAAGAAAAAA)
Y1: d(TTTTTTCTTCTTCTCCTCCTTT)
Y2: d(TTTCCTCCTCTTCTTCTTTTTT)

These oligonucleotides associate to form the R1:Y1:Y2 triplex, which can undergo two types of dissociation transitions. In the first transition, the third strand, Y2, dissociates to form the R1:Y1 duplex and a single strand. The second transition results in complete dissociation of the triplex molecule with the formation of three single-stranded oligonucleotides. Table 8 lists the changes in volume associated with both transitions (Lin and Macgregor 1996; Wu and Macgregor 1993). Inspection of the data presented in Table 8 reveals that triple-stranded and double-stranded oligomeric systems exhibit similar volumetric behavior. Both systems are stabilized by increasing pressure, with positive changes in volume accompanying unfolding transitions. It is a reasonable assumption that the molecular origins of the enhanced stability of triplex structures at elevated pressures are similar to those of DNA duplexes.

Table 8 Molar changes in volume accompanying the heat-induced conformational transitions of three-stranded DNA oligonucleotides shown in Scheme 1

| Transition | [NaCl] (M) | TM (°C) | ΔV (cm³ mol⁻¹) | Reference |
|---|---|---|---|---|
| R1:Y1:Y2 ⇌ R1 + Y1 + Y2 | 1.0 | 85.1 | 7.81 ± 0.5 | Wu and Macgregor (1993) |
| R1:Y1:Y2 ⇌ R1 + Y1 + Y2 | 3.0 | 98.4 | 10.4 ± 0.6 | Wu and Macgregor (1993) |
| R1:Y1:Y2 ⇌ R1:Y1 + Y2 | 100 | 32.5 | 2.8 | Lin and Macgregor (1996) |
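The relation between ΔTM/ΔP, TM, and ΔV for a thermally induced transition is commonly expressed through the Clapeyron equation, dTM/dP = TM ΔV/ΔH. As a consistency check, the sketch below inverts this relation for the two poly(dT)poly(dA)poly(dT) entries (TM of 85.1 and 98.4 °C) to obtain the transition enthalpy implied by the tabulated numbers. The assumption that the tabulated ΔV values were derived from this exact form, the pairing of the tabulated values, and the function name are all assumptions of this illustration; the resulting enthalpies are not reported in the text.

```python
CM3_KBAR_TO_KJ = 0.1  # 1 cm^3 kbar = 100 J = 0.1 kJ


def implied_enthalpy_kj_mol(delta_v_cm3_mol, tm_c, dtm_dp_k_per_kbar):
    """Enthalpy implied by the Clapeyron relation dTM/dP = TM * dV / dH.

    Rearranged as dH = dV * TM / (dTM/dP), with TM converted to kelvin.
    """
    tm_k = tm_c + 273.15
    return delta_v_cm3_mol * tm_k / dtm_dp_k_per_kbar * CM3_KBAR_TO_KJ


# (delta_V in cm^3/mol, TM in deg C, dTM/dP in deg C per kbar)
entries = [
    (7.81, 85.1, 4.50),
    (10.4, 98.4, 5.80),
]

implied = [implied_enthalpy_kj_mol(dv, tm, slope) for dv, tm, slope in entries]

for (dv, tm, slope), dh in zip(entries, implied):
    print(f"TM = {tm} deg C: implied enthalpy of about {dh:.1f} kJ/mol")
```

Both entries give implied enthalpies of similar magnitude, which is what one would expect if the two salt conditions differ mainly in TM and ΔV rather than in the enthalpy of the transition.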

G-Quadruplexes

Table 9 presents a compilation of literature data on changes in volume accompanying G-quadruplex to single-strand transitions. The effect of pressure has been reported only for monomolecular G-quadruplexes. All G-quadruplexes studied to date exhibit a decreasing stability with increasing pressure and, consequently, negative changes in volume, ΔV, associated with unfolding transitions (Chalikian and Macgregor 2021; Fan et al. 2011; Li et al. 2017; Molnar et al. 2020; Shek et al. 2014; Takahashi and Sugimoto 2013a, b, 2015, 2017). In addition, the G-quadruplex-to-single-strand transitions are characterized by decreases in adiabatic compressibility, ΔKS, and increases in expansibility, ΔE (Fan et al. 2011; Liu et al. 2021; Shek et al. 2014). These volumetric changes are consistent with an increase in G-quadruplex hydration upon unfolding with a possible relaxation contribution due to the temperature- and pressure-induced shifts in the conformational equilibria of the ensemble of unfolded states (Chalikian and Macgregor 2021; Fan et al. 2011; Liu et al. 2021; Shek et al. 2014). The destabilizing effect of pressure on G-quadruplex structures can be analyzed within the framework of Eq. (2) as arising from the interplay between three molecular events. In the first event, the cations coordinated by the O6 oxygens of the guanine bases are released to the bulk. The release of the cations results in their rehydration with the concomitant contraction of water. Water molecules solvating inorganic ions have a smaller partial molar volume relative to bulk water, although this effect decreases with temperature (Marcus 2011). The second source of the observed pressure sensitivity of G-quadruplex structures arises from the large increase in solvent-accessible surface area of the oligonucleotide upon its unfolding.


Table 9 Changes in volume, ΔV, calculated from the pressure dependence of the TM for the unfolding transition of G-quadruplexes of differing sequence, topology, and stabilizing cation determined at temperature, T

| Sequence (DNA) | Cation | T (°C) | ΔV (cm³ mol⁻¹) | Reference |
|---|---|---|---|---|
| TBA d(G2T2G2TGTG2T2G2) | K+ | 58.1 ± 1.4 | −54.6 ± 4.2 | Takahashi and Sugimoto (2013a) |
| Tel22 d[A(G3T2A)3G3] | Na+ | 54.6 ± 0.9 | −54.6 ± 0.9 | Li et al. (2017) |
| Tel22 d[A(G3T2A)3G3] | Na+ | 40.0 ± 0.6 | −66 ± 3 | Fan et al. (2011) |
| Tel22 d[A(G3T2A)3G3] | K+ | 64.6 ± 2.2 | −42.7 ± 6.7 | Li et al. (2017) |
| c-MYC d[TGA(G3TG3TA)2A] | K+ | 83.4 ± 1.1 | −16.9 ± 1.8 | Molnar et al. (2020) |
| KIT d(AG3AG3CGCTG3AG2AG3) | K+ | 58.5 ± 0.4 | −6.2 ± 0.9 | Molnar et al. (2020) |
| VEGF d(T2G4CG3C2G5CG4T2) | K+ | 78.8 ± 1.1 | −18.1 ± 4.6 | Molnar et al. (2020) |
| Tel26 d[A3(G3T2A)3G3A2] | K+ | 25.0 | −69 ± 7 | Shek et al. (2014) |
| c-MYC d[TGA(G3TG3TA)2A] | K+ | 25.0 | −34 ± 15 | Liu et al. (2021) |

An increase in water-DNA contacts leads to an increase in thermal volume, VT, and a decrease in interaction volume, VI. Finally, molecular voids that exist in the folded conformation of the G-quadruplex are eliminated upon its unfolding thereby contributing to a decrease in intrinsic volume, VM. The three events cumulatively lead to a negative change in volume, ΔV, accompanying the G-quadruplex-to-coil transition. Other factors such as the number and type of nucleobases in the loops, topological diversity, the stacking of G-tetrads, and the stacking of bases in the loops with G-tetrads are also likely to contribute to the volume of G-quadruplex unfolding (Li et al. 2017; Takahashi and Sugimoto 2017). Currently, however, it is difficult to estimate with any degree of confidence how these factors contribute to changes in volume accompanying the folding/unfolding transition of a G-quadruplex. Additional studies are required to elucidate the roles of loops, stacking, and folding topology. Equation (6) has been used in conjunction with the experimental values of ΔV, ΔE, and ΔKS to compute a temperature-pressure phase diagram of G-quadruplex stability (Liu et al. 2021). The region of G-quadruplex stability shown in Fig. 5 is elliptical resembling that of a globular protein (Hawley 1971; Luong et al. 2015; Scharnagl et al. 2005). On the other hand, the temperature-pressure phase diagram of G-quadruplex stability is distinct from that of duplex DNA (Dubins et al. 2001). The similarity of the phase diagram of proteins and G-quadruplexes is consistent with

Fig. 5 The pressure-temperature phase diagram for the stability of the c-MYC G-quadruplex in aqueous solution containing 50 mM CsCl and 0.1 mM KCl computed with Eq. (6) (Liu et al. 2021). Axes: P, kbar versus T, °C; the two regions of the diagram are labeled Native and Denatured

their structural similarity. Both structures are globular, are characterized by intramolecular voids in the native state that are absent in the unfolded state, and contain buried charges that become solvent-exposed upon unfolding. To probe the role of solvent in pressure-induced unfolding of G-quadruplexes, folded-unfolded equilibria have been investigated in the presence of cosolvents. In particular, the combined effect of pressure and cosolvents on the conformational stability of the human telomeric G-quadruplex has been studied with smFRET (Arns et al. 2019). In an aqueous solution with no added cosolvents, the G-quadruplex structure is destabilized by pressure. This observation is consistent with the results on G-quadruplex stability obtained using ensemble measurements (Fan et al. 2011; Takahashi and Sugimoto 2013a). TMAO was found to stabilize the G-quadruplex conformation; in the presence of 1 M TMAO, significantly higher pressures are required to disrupt the folded state (Arns et al. 2019). The smFRET data also suggest that pressure facilitates a transition of the folded G-quadruplex from the antiparallel to the parallel/hybrid topology at approximately 400 bar (Arns et al. 2019). This transition has not been observed with other experimental techniques. Cosolvents with high molecular weights are used to mimic the extreme crowding in the intracellular environment. The presence of a water-soluble polyethylene glycol, such as PEG200 or PEG4000, leads to a decrease in the effect of pressure on the stability of the folded state of the G-quadruplex formed by the thrombin-binding aptamer (TBA) and its derivatives (Takahashi and Sugimoto 2013a, b, 2017). For instance, the unfolding volume of TBA in a solution containing PEG200 is approximately 40 cm³ mol⁻¹ more positive than that observed in an aqueous buffer (Takahashi and Sugimoto 2013a). This finding has been rationalized in terms of changes in the hydration term, VI, of Eq. (2) (Takahashi and Sugimoto 2013a). 
It has been proposed that the differential hydration of the loops in the folded and unfolded


Table 10 Changes in volume, ΔV, and adiabatic compressibility, ΔKS, accompanying the unfolding transitions of i-motif DNA structures determined at temperature, T

| Sequence | pH | T (°C) | ΔV (cm³ mol⁻¹) | ΔKS (10⁻⁴ cm³ mol⁻¹) | Reference |
|---|---|---|---|---|---|
| Tel22 i-motif d(C3TA2)3C3 | 4.6 | 36 | ~0 | – | Lepper et al. (2019) |
| Tel22 i-motif d(C3TA2)3C3 | 5.15 | 45.5 | 11 ± 2 | – | Somkuti et al. (2020) |
| c-MYC i-motif d(T2AC3AC3TAC3AC3TCA) | 5.0 | 25.0 | ~0 | ~0 | Liu et al. (2018) |

conformations plays a major role in determining the differential value of VI (Takahashi and Sugimoto 2013a).
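The elliptical stability region discussed above is characteristic of a free energy of unfolding that is second order in both temperature and pressure. Equation (6) itself is not reproduced in this section, so the sketch below uses the generic Hawley-type expansion around a reference point (T0, P0) as a stand-in; whether it matches Eq. (6) term by term is an assumption. Only the reference volume change of −34 cm³ mol⁻¹ (−3.4 J mol⁻¹ bar⁻¹) is taken from Table 9 (c-MYC, 25 °C); every other parameter value is hypothetical and chosen only to make the example run.

```python
import math


def delta_g_unfold(temp_k, p_bar, *, dg0, ds0, dcp, dv0, dbeta, dalpha,
                   t0=298.15, p0=1.0):
    """Second-order (Hawley-type) expansion of the unfolding free energy.

    Units: J/mol for energies, J/(mol bar) for dv0 (1 cm^3 = 0.1 J/bar),
    J/(mol bar^2) for dbeta = d(dV)/dP, J/(mol bar K) for dalpha = d(dV)/dT.
    """
    dt = temp_k - t0
    dp = p_bar - p0
    return (dg0
            - ds0 * dt
            - dcp * (temp_k * (math.log(temp_k / t0) - 1.0) + t0)
            + dv0 * dp
            + 0.5 * dbeta * dp ** 2
            + dalpha * dp * dt)


# dv0 follows Table 9 (c-MYC, 25 deg C); all other values are hypothetical.
PARAMS = dict(dg0=10_000.0, ds0=100.0, dcp=1_000.0,
              dv0=-3.4, dbeta=1.0e-3, dalpha=1.0e-2)

# Free energy at the reference point and its pressure derivative there,
# estimated by a central finite difference.
g_ref = delta_g_unfold(298.15, 1.0, **PARAMS)
slope_p = (delta_g_unfold(298.15, 2.0, **PARAMS)
           - delta_g_unfold(298.15, 0.0, **PARAMS)) / 2.0

print(f"dG at reference point: {g_ref:.1f} J/mol")
print(f"d(dG)/dP at reference point: {slope_p:.2f} J/(mol bar)")
```

In such a model, the boundary of the stability region is the set of (T, P) points at which the function returns zero, and the pressure derivative at the reference point recovers the reference ΔV.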

i-Motif Structures

Table 10 presents the available data on changes in volume, ΔV, accompanying the unfolding transitions of i-motif structures. The values of ΔV are close to 0 cm³ mol⁻¹ (Lepper et al. 2019; Liu et al. 2018; Somkuti et al. 2020). In addition, the change in adiabatic compressibility, ΔKS, associated with unfolding of the i-motif formed by the C-rich strand from the promoter region of the c-MYC oncogene is also around zero (Liu et al. 2018). A near-zero change in compressibility suggests that the partial molar volumes of the folded (i-motif) and unfolded conformations are nearly identical not only at ambient pressure but also at elevated pressures. In accordance with these results, an increase in pressure causes no appreciable change in TM (Fig. 6) (Lepper et al. 2019; Liu et al. 2018; Somkuti et al. 2020). To the best of our knowledge, the effects of pH and molecular crowding on the pressure dependence of i-motif stability have not been investigated. Such studies would be of interest for deepening our understanding of the molecular origins of the volumetric changes associated with i-motif unfolding while also providing insights into the balance of forces governing the stability of these structures.

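Pressure and the Kinetics of Helix Formation

The influence of pressure on the rate of chemical reactions depends on the differential molar volume of the initial and transition states:

$$\left(\frac{\partial \ln k}{\partial P}\right)_T = -\frac{1}{RT}\left(\frac{\partial \Delta G^{\ddagger}}{\partial P}\right)_T = -\frac{\Delta V^{\ddagger}}{RT} \qquad (7)$$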

where k is the rate constant of a reaction step; ΔG‡ and ΔV‡ are the activation free energy and volume, respectively; P is the pressure; T is the temperature; R is the

Fig. 6 Normalized UV melting profiles monitored at 260 nm of the i-motif structure formed by the c-MYC oligonucleotide, d(TTACCCACCCTACCCACCCTCA). The solutions contained 5 μM ODN in a pH 5.0 buffer containing 10 mM acetate and 10 mM NaCl at 1, 800, 1200, and 1600 bar (Liu et al. 2018). The normalized profiles (0.0 to 1.0) are plotted against T, °C

universal gas constant. The activation volume is the differential volume of the initial and transition states, and its value may be positive or negative. Increasing pressure will decelerate a reaction with a positive activation volume and accelerate a reaction with a negative activation volume. In all cases, ΔV‡ depends on pressure, and its magnitude tends to zero with increasing pressure; for this reason, activation volumes are conventionally reported as the values extrapolated to atmospheric pressure (Isaacs 1981). In aqueous solutions, the molecular factors that influence the activation volume (ΔV‡) are similar to those influencing the reaction volume (ΔV). Thus, molecular events such as creation of a full or partial charge, a change in solute-solvent interactions, and a gain or loss of a void volume in the transition state may contribute to the activation volume. The pressure-induced changes in solvent properties, such as viscosity or polarity, may also affect ΔV‡. In addition, an increase in the molar concentration of each reactant at elevated pressures can alter the rate of a bimolecular reaction. However, because of the limited range of pressures employed in kinetic studies of nucleic acids and because of the low compressibility of water, such effects are of secondary importance and are conventionally neglected. In a kinetic study involving duplex DNA, one is interested in the effect of hydrostatic pressure on the rate constants. For a generalized helix-to-coil conformational transition we can write:

Reaction 1: ss1 + ss2 ⇌ ds (forward rate constant k1, reverse rate constant k−1)

where ss1 and ss2 are the complementary single strands that associate to form a duplex, ds. The rate constants for the forward and reverse reactions are given by k1 and k−1, respectively. Reaction 1 models duplex formation as a bimolecular single-step reaction that involves the establishment of a properly aligned, bimolecular nucleation complex of several bases. This is the rate-determining step in helix formation. This mechanism neglects the unimolecular step of helix elongation (zipping up), which is much faster than the bimolecular nucleation step. With the caveat that Reaction 1 lacks mechanistic details, one can use it to calculate the activation volume of helix formation from the effect of pressure on the two rate constants. For bulk samples, the effect of pressure on the kinetics of helix-to-coil transitions has been studied using two approaches. In the pressure-perturbation relaxation kinetics, or pressure-jump, approach, the key experimental observable is the relaxation time (τ). It describes the rate at which the system returns to equilibrium after it has been perturbed by a rapid change in pressure. By studying the concentration dependence of τ, one obtains the rate constants and their pressure derivatives, i.e., activation volumes. The second approach exploits the hysteresis between the thermally induced denaturation and renaturation profiles of double-helical oligonucleotides. For a known rate of temperature change, the difference between the denaturation and renaturation data is related to the kinetics of the helix-to-coil transition. By measuring the hysteresis at different pressures, one determines the rate constants as a function of pressure and, hence, the activation volumes. 
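The concentration dependence of τ mentioned above can be illustrated for Reaction 1. For a small perturbation of a single-step bimolecular equilibrium, the textbook relaxation result is 1/τ = k1([ss1]eq + [ss2]eq) + k−1, so a plot of 1/τ against the summed equilibrium strand concentrations is linear with slope k1 and intercept k−1. This relation is standard relaxation kinetics and is not written out in the text; the rate constants and concentrations below are synthetic, hypothetical values used only to show the fitting step.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic data generated from assumed (hypothetical) rate constants.
K1_TRUE = 1.0e6   # M^-1 s^-1, hypothetical association rate constant
KM1_TRUE = 2.0    # s^-1, hypothetical dissociation rate constant

# Summed equilibrium concentrations [ss1]eq + [ss2]eq in M.
strand_sum_eq = [2.0e-6, 4.0e-6, 8.0e-6, 1.6e-5]

# 1/tau = k1 * ([ss1]eq + [ss2]eq) + k-1 for small perturbations.
inv_tau = [K1_TRUE * c + KM1_TRUE for c in strand_sum_eq]

# Slope recovers k1 and intercept recovers k-1.
k1_fit, km1_fit = fit_line(strand_sum_eq, inv_tau)

print(f"k1 is about {k1_fit:.3e} M^-1 s^-1, k-1 is about {km1_fit:.3f} s^-1")
```

Repeating such a fit at several pressures yields k1(P) and k−1(P), from which the activation volumes follow through Eq. (7).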
In a manner analogous to the formation of a double-stranded molecule, the formation of a three-stranded molecule from the oligonucleotides shown in Scheme 1 can be modeled as a single-step bimolecular reaction:

Reaction 2: Y2 + R1:Y1 ⇌ R1:Y1:Y2

where R1:Y1 and R1:Y1:Y2 are the duplex and triplex DNA molecules, respectively. Studies of the pressure dependence of the kinetics of Reaction 2 by pressure-perturbation relaxation kinetics and hysteresis methods have produced similar results (Lin and Macgregor 1996, 1997). The activation volume for the forward reaction, i.e., the formation of the three-stranded helix, V‡f, is negative; thus, the rate of formation increases with pressure (see Table 11). For the reverse reaction, the activation volume, V‡u, is positive; thus, increasing pressure slows the dissociation of the triple-stranded molecule. The magnitude of V‡u is roughly three-fold that of V‡f. Although the two activation volumes increase with an increase in temperature (i.e., ΔV‡f/ΔT and ΔV‡u/ΔT are positive), the transition volume ΔV = V‡f − V‡u remains constant within experimental error over the experimental temperature range. To the best of our knowledge, the kinetics of the R1 + Y1 + Y2 ⇌ R1:Y1:Y2 reaction has not been studied as a function of pressure.
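The consequences of these activation volumes can be estimated by integrating Eq. (7). The sketch below uses the Table 11 values at 32.5 °C (V‡f = −11.8 cm³ mol⁻¹ and V‡u = +39.9 cm³ mol⁻¹) to compute how the formation and dissociation rate constants of the triplex would change between 1 and 1000 bar. It assumes that the activation volumes do not vary with pressure over this range, which is a simplification, since the text notes that ΔV‡ is in general pressure dependent; the function name is hypothetical.

```python
import math

R_CM3_BAR = 83.145  # gas constant in cm^3 bar mol^-1 K^-1


def rate_ratio(activation_volume_cm3_mol, p_bar, temp_c, p_ref_bar=1.0):
    """k(P)/k(P_ref) from Eq. (7) for a pressure-independent activation volume."""
    temp_k = temp_c + 273.15
    exponent = -activation_volume_cm3_mol * (p_bar - p_ref_bar) / (R_CM3_BAR * temp_k)
    return math.exp(exponent)


# Table 11 values at 32.5 deg C (Lin and Macgregor 1996), signs as in the text.
V_FORMATION = -11.8     # cm^3/mol, negative: formation is accelerated by pressure
V_DISSOCIATION = 39.9   # cm^3/mol, positive: dissociation is slowed by pressure

formation_ratio = rate_ratio(V_FORMATION, 1000.0, 32.5)
dissociation_ratio = rate_ratio(V_DISSOCIATION, 1000.0, 32.5)

# The association equilibrium constant scales as the ratio of the two factors.
equilibrium_factor = formation_ratio / dissociation_ratio

print(f"k_f(1000 bar)/k_f(1 bar) is about {formation_ratio:.2f}")
print(f"k_u(1000 bar)/k_u(1 bar) is about {dissociation_ratio:.2f}")
print(f"K_assoc changes by a factor of about {equilibrium_factor:.1f}")
```

Under this assumption, most of the pressure-induced stabilization of the triplex comes from the slower dissociation step, in line with the larger magnitude of V‡u.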


The effect of pressure on the kinetics of the formation of 24 different 22-base pair oligonucleotides was studied by measuring the hysteresis between the temperature-induced denaturation and renaturation profiles. The measured kinetics were analyzed within the framework of Reaction 1 (Dubins and Macgregor 2004). The rate constants and activation volumes evaluated from these studies were interpreted in terms of a nearest-neighbor model of duplex stability. The details of nearest-neighbor analyses are beyond the scope of this article; interested readers are referred to other sources (SantaLucia and Hicks 2004). The values of V‡f, V‡u, and ΔV = V‡f − V‡u have been determined for each of the 24 oligonucleotides and each dinucleotide step. The activation volumes for dinucleotide steps correlate with their respective contributions to changes in solvent-accessible surface area accompanying helix formation. For the 24 oligonucleotides investigated, the activation volumes for helix formation, V‡f, are positive and close to zero (Dubins and Macgregor 2004). In contrast, the activation volumes for helix unfolding, V‡u, are more substantial while also being positive. The measured transition volumes, ΔV = V‡f − V‡u, range from −20 ± 5 to +10 ± 8 cm³ mol⁻¹. The values of ΔV calculated with the Clapeyron equation agree well with the kinetics-based results and range from −22 ± 3 to +14 ± 6 cm³ mol⁻¹. For 19 of the 24 DNA molecules, the changes in volume for helix formation are

Table 11 Activation volumes for the triplex-duplex equilibrium at temperatures within the transition region (Reaction 2). These data were obtained from an analysis of the pressure dependence of the hysteresis between the temperature-induced unfolding and refolding transitions (Lin and Macgregor 1996)

| T (°C) | V‡f (cm³ mol⁻¹) | V‡u (cm³ mol⁻¹) |
|---|---|---|
| 30 | −13.4 ± 2.7 | 37.0 ± 4.6 |
| 31 | −12.7 ± 2.5 | 38.2 ± 4.8 |
| 32 | −12.1 ± 2.4 | 39.3 ± 4.9 |
| 32.5 (a) | −11.8 ± 2.4 | 39.9 ± 5.0 |
| 33 | −11.5 ± 2.3 | 40.4 ± 5.1 |
| 34 | −10.9 ± 2.2 | 41.3 ± 5.2 |
| 35 | −10.2 ± 2.1 | 42.5 ± 5.3 |
| 36 | −9.6 ± 1.9 | 43.7 ± 5.5 |
| 37 | −9.0 ± 1.8 | 44.6 ± 5.6 |
| 38 | −8.4 ± 1.7 | 45.7 ± 5.7 |
| 39 | −7.8 ± 1.6 | 46.7 ± 5.8 |
| 40 | −7.1 ± 1.4 | 47.9 ± 6.0 |

(a) TM at atmospheric pressure

Table 12 The activation volumes for the folding, V‡f, and unfolding, V‡u, reactions of the hairpin structure formed by d(5′-Cy5-TCTTCAGT-A40-Cy3-ACTGAAGA-A10-biotin-3′) at different concentrations of sodium ions (Sung and Nesbitt 2020b)

| [Na+] (mM) | V‡f (cm³ mol⁻¹) | V‡u (cm³ mol⁻¹) |
|---|---|---|
| 25 | 26 ± 2 | −19 ± 2 |
| 50 | 22.1 ± 16 | −10.3 ± 6 |
| 75 | 19.6 ± 11 | −6.6 ± 7 |
| 100 | 20.4 ± 10 | −6.2 ± 7 |

In addition to the sodium cation, the samples contained 50 mM HEPES, 25 mM K+, pH 7.5


negative (Dubins and Macgregor 2004). Thus, for most of the oligonucleotides, pressure stabilizes the helix conformation. Pressure-dependent smFRET measurements have been applied to determining the activation volumes for hairpin formation by d(5′-Cy5-TCTTCAGT-A40-Cy3-ACTGAAGA-A10-biotin-3′) (Sung and Nesbitt 2020b). Table 12 lists the activation volumes for the forward, V‡f, and reverse, V‡u, reactions at different concentrations of Na+. Inspection of the data presented in Table 12 reveals that, at every Na+ concentration, the activation volume, V‡f, for the forward reaction (folding) is positive; thus, helix formation is slowed by increasing pressure. On the other hand, the activation volume for the reverse reaction (unfolding), V‡u, is negative, i.e., the unfolding reaction is accelerated by pressure. With V‡f positive and V‡u negative, the change in molar volume associated with helix formation, ΔV = V‡f − V‡u, is positive, indicating that the hairpin is destabilized by pressure. As in the other studies using the smFRET technique, the volumetric contribution of the dA40 loop remains to be elucidated.
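The sign of the folding volume follows directly from the two activation volumes. The short calculation below evaluates ΔV = V‡f − V‡u for each Na+ concentration using the Table 12 values, with V‡f taken as positive and V‡u as negative, as described in the text. Uncertainties are omitted, and the variable names are hypothetical.

```python
# Activation volumes from Table 12 (Sung and Nesbitt 2020b), in cm^3/mol.
# Signs follow the text: folding (V_f) positive, unfolding (V_u) negative.
TABLE_12 = {
    # [Na+] in mM: (V_f, V_u)
    25: (26.0, -19.0),
    50: (22.1, -10.3),
    75: (19.6, -6.6),
    100: (20.4, -6.2),
}

# Volume change of hairpin formation: delta_V = V_f - V_u.
folding_volume = {na: v_f - v_u for na, (v_f, v_u) in TABLE_12.items()}

for na_mm, dv in folding_volume.items():
    print(f"[Na+] = {na_mm} mM: delta_V(folding) is about {dv:+.1f} cm^3/mol")
```

Every entry is positive, so the folded hairpin occupies the larger volume and is disfavored at elevated pressure, which agrees with the negative unfolding volumes obtained from the equilibrium smFRET measurements discussed earlier.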

Conclusions

Hydrostatic pressure has been used widely to investigate the forces that stabilize the folded structures of proteins. In comparison, the effect of pressure on the conformational stability of nucleic acids has received less attention. The relative scarcity of investigations of nucleic acids under pressure is due, in part, to the fact that the initial studies of double-stranded polymeric DNA or RNA molecules reported only weak pressure dependences of their stability regardless of the sequence or solution conditions. However, the effect of pressure on the stability of nucleic acids with more complex secondary structures, e.g., hairpins, G-quadruplexes, and i-motifs, is more diverse and yields interesting insights into the forces that stabilize these structures. The development of novel experimental techniques that are compatible with high-pressure studies, such as smFRET, offers great promise for deepening our understanding of the relative importance of interactions involving nucleic acids.

References

Akasaka K (2006) Probing conformational fluctuation of proteins by pressure perturbation. Chem Rev 106:1814–1835
Alba JJ, Sadurni A, Gargallo R (2016) Nucleic acid i-motif structures in analytical chemistry. Crit Rev Anal Chem 46:443–454
Amiri AR, Macgregor Jr RB (2011) The effect of hydrostatic pressure on the thermal stability of DNA hairpins. Biophys Chem 156:88–95
Arns L, Knop JM, Patra S, Anders C, Winter R (2019) Single-molecule insights into the temperature and pressure dependent conformational dynamics of nucleic acids in the presence of crowders and osmolytes. Biophys Chem 251:106190
Blandamer MJ, Davis MI, Douheret G, Reis JCR (2001) Apparent molar isentropic compressions and expansions of solutions. Chem Soc Rev 30:8–15
Bloomfield VA, Crothers DM, Tinoco Jr I (2000) Nucleic acids: structures, properties, and functions. University Science Books, Sausalito

3

The Effect of Pressure on the Conformational Stability of DNA


Bochman ML, Paeschke K, Zakian VA (2012) DNA secondary structures: stability and function of G-quadruplex structures. Nat Rev Genet 13:770–780
Chalikian TV (2001) Structural thermodynamics of hydration. J Phys Chem B 105:12566–12578
Chalikian TV, Breslauer KJ (1998a) Thermodynamic analysis of biomolecules: a volumetric approach. Curr Opin Struct Biol 8:657–664
Chalikian TV, Breslauer KJ (1998b) Volumetric properties of nucleic acids. Biopolymers 48:264–280
Chalikian TV, Macgregor RB (2007) Nucleic acid hydration: a volumetric perspective. Phys Life Rev 4:91–115
Chalikian TV, Macgregor Jr RB (2021) Volumetric properties of four-stranded DNA structures. Biology 10:813
Chalikian TV, Sarvazyan AP, Breslauer KJ (1994) Hydration and partial compressibility of biological compounds. Biophys Chem 51:89–107
Chalikian TV, Volker J, Plum GE, Breslauer KJ (1999) A more unified picture for the thermodynamics of nucleic acid duplex melting: a characterization by calorimetric and volumetric techniques. Proc Natl Acad Sci U S A 96:7853–7858
Day HA, Pavlou P, Waller ZA (2014) i-motif DNA: structure, stability and targeting with ligands. Bioorg Med Chem 22:4407–4418
Dragan AI, Russell DJ, Privalov PL (2009) DNA hydration studied by pressure perturbation scanning microcalorimetry. Biopolymers 91:95–101
Dubins DN, Macgregor Jr RB (2004) Volumetric properties of the formation of double stranded DNA: a nearest-neighbor analysis. Biopolymers 73:242–257
Dubins DN, Lee A, Macgregor Jr RB, Chalikian TV (2001) On the stability of double stranded nucleic acids. J Am Chem Soc 123:9254–9259
Fan HY, Shek YL, Amiri A, Dubins DN, Heerklotz H, Macgregor RB, Chalikian TV (2011) Volumetric characterization of sodium-induced G-quadruplex formation. J Am Chem Soc 133:4518–4526
Frank-Kamenetskii MD, Mirkin SM (1995) Triplex DNA structures. Annu Rev Biochem 64:65–95
Gellert M, Lipsett MN, Davies DR (1962) Helix formation by guanylic acid. Proc Natl Acad Sci U S A 48:2013–2018
Girard E, Prange T, Dhaussy AC, Migianu-Griffoni E, Lecouvey M, Chervin JC, Mezouar M, Kahn R, Fourme R (2007) Adaptation of the base-paired double-helix molecular architecture to extreme pressure. Nucleic Acids Res 35:4800–4808
Gunter TE, Gunter KK (1972) Pressure dependence of the helix-coil transition temperature for polynucleic acid helices. Biopolymers 11:667–678
Hawley SA (1971) Reversible pressure-temperature denaturation of chymotrypsinogen. Biochemistry 10:2436–2442
Hawley SA, MacLeod RM (1974) Pressure-temperature stability of DNA in neutral salt solutions. Biopolymers 13:1417–1426
Hawley SA, Macleod RM (1977) The effect of base composition on the pressure stability of DNA in neutral salt solution. Biopolymers 16:1833–1835
Heden CG, Lindahl T, Toplin I (1964) Stability of deoxyribonucleic acid solutions under high pressure. Acta Chem Scand 18:1150–1156
Herbert A (2019) Z-DNA and Z-RNA in human disease. Commun Biol 2:7
Hughes F, Steiner RF (1966) Effects of pressure on the helix-coil transitions of the poly A-poly U system. Biopolymers 4:1081–1090
Huppert JL (2010) Structure, location and interactions of G-quadruplexes. FEBS J 277:3452–3458
Isaacs NS (1981) Liquid phase high pressure chemistry. Wiley, New York
Kharakoz DP (1989) Volumetric properties of proteins and their analogs in diluted water solutions. 1. Partial volumes of amino acids at 15–55 °C. Biophys Chem 34:115–125
Kharakoz DP (1991) Volumetric properties of proteins and their analogs in diluted water solutions. 2. Partial adiabatic compressibilities of amino acids at 15–70 °C. J Phys Chem 95:5634–5642


Kharakoz DP (1992) Partial molar volumes of molecules of arbitrary shape and the effect of hydrogen bonding with water. J Solut Chem 21:569–595
Krzyzaniak A, Salanski P, Jurczak J, Barciszewski J (1991) B-Z DNA reversible conformation changes effected by high pressure. FEBS Lett 279:1–4
Lane AN, Chaires JB, Gray RD, Trent JO (2008) Stability and kinetics of G-quadruplex structures. Nucleic Acids Res 36:5482–5515
Lepper CP, Williams MAK, Edwards PJB, Filichev VV, Jameson GB (2019) Effects of pressure and pH on the physical stability of an i-motif DNA structure. ChemPhysChem 20:1567–1571
Li YY, Dubins DN, Le D, Leung K, Macgregor Jr RB (2017) The role of loops and cation on the volume of unfolding of G-quadruplexes related to HTel. Biophys Chem 231:55–63
Lin MC, Macgregor Jr RB (1996) The activation volume of a DNA helix-coil transition. Biochemistry 35:11846–11851
Lin MC, Macgregor Jr RB (1997) Pressure-jump relaxation kinetics of a DNA triplex helix-coil equilibrium. Biopolymers 42:129–132
Liu L, Kim BG, Feroze U, Macgregor Jr RB, Chalikian TV (2018) Probing the ionic atmosphere and hydration of the c-MYC i-motif. J Am Chem Soc 140:2229–2238
Liu L, Scott L, Tariq N, Kume T, Dubins DN, Macgregor Jr RB, Chalikian TV (2021) Volumetric interplay between the conformational states adopted by guanine-rich DNA from the c-MYC promoter. J Phys Chem B 125:7406–7416
Luong TQ, Kapoor S, Winter R (2015) Pressure - a gateway to fundamental insights into protein solvation, dynamics, and function. ChemPhysChem 16:3555–3571
Macgregor Jr RB (1996) Chain length and oligonucleotide stability at high pressure. Biopolymers 38:321–327
Macgregor Jr RB (1998) Effect of hydrostatic pressure on nucleic acids. Biopolymers 48:253–263
Macgregor Jr RB, Chen MY (1990) ΔV of the Na+-induced B-Z transition of poly[d(G-C)] is positive. Biopolymers 29:1069–1076
Macgregor Jr RB, Wu J, Najaf-Zadeh R (1996) Sequence, salt, charge, and the stability of DNA at high pressure. In: Markley J, Northrop DB, Royer CA (eds) High pressure effects in molecular biophysics and enzymology. Oxford University Press, New York, pp 274–297
Marcus Y (2011) Electrostriction in electrolyte solutions. Chem Rev 111:2761–2783
Marky LA, Rentzeperis D, Luneva NP, Cosman M, Geacintov NE, Kupke DW (1996) Differential hydration thermodynamics of stereoisomeric DNA-benzo[a]pyrene adducts derived from diol epoxide enantiomers with different tumorigenic potentials. J Am Chem Soc 118:3804–3810
Molnar OR, Somkuti J, Smeller L (2020) Negative volume changes of human G-quadruplexes at unfolding. Heliyon 6:6
Najaf-Zadeh R, Wu JQ, Macgregor Jr RB (1995) Effect of cations on the volume of the helix-coil transition of poly[d(A-T)]. Biochim Biophys Acta 1262:52–58
Norberg J, Nilsson L (1996) Constant pressure molecular dynamics simulations of the dodecamers: d(GCGCGCGCGCGC)2 and r(GCGCGCGCGCGC)2. J Chem Phys 104:6052–6057
Nordmeier E (1992) Effects of pressure on the helix-coil transition of calf thymus DNA. J Phys Chem 96:1494–1501
Oganesian L, Bryan TM (2007) Physiological relevance of telomeric G-quadruplex formation: a potential drug target. BioEssays 29:155–165
Patra S, Anders C, Erwin N, Winter R (2017) Osmolyte effects on the conformational dynamics of a DNA hairpin at ambient and extreme environmental conditions. Angew Chem Int Ed 56:5045–5049
Patra S, Anders C, Schummel PH, Winter R (2018) Antagonistic effects of natural osmolyte mixtures and hydrostatic pressure on the conformational dynamics of a DNA hairpin probed at the single-molecule level. Phys Chem Chem Phys 20:13159–13170
Patra S, Schuabb V, Kiesel I, Knop JM, Oliva R, Winter R (2019) Exploring the effects of cosolutes and crowding on the volumetric and kinetic profile of the conformational dynamics of a poly dA loop DNA hairpin: a single-molecule FRET study. Nucleic Acids Res 47:981–996


Plum GE, Pilch DS, Singleton SF, Breslauer KJ (1995) Nucleic acid hybridization: triplex stability and energetics. Annu Rev Biophys Biomol Struct 24:319–350
Rayan G, Macgregor RB (2005) Comparison of the heat- and pressure-induced helix-coil transition of two DNA copolymers. J Phys Chem B 109:15558–15565
Rayan G, Tsamaloukas AD, Macgregor RB, Heerklotz H (2009) Helix-coil transition of DNA monitored by pressure perturbation calorimetry. J Phys Chem B 113:1738–1742
SantaLucia Jr J, Hicks D (2004) The thermodynamics of DNA structural motifs. Annu Rev Biophys Biomol Struct 33:415–440
Sarvazyan AP (1991) Ultrasonic velocimetry of biological compounds. Annu Rev Biophys Biophys Chem 20:321–342
Scharnagl C, Reif M, Friedrich J (2005) Stability of proteins: temperature, pressure and the role of the solvent. Biochim Biophys Acta 1749:187–213
Shek YL, Noudeh GD, Nazari M, Heerklotz H, Abu-Ghazalah RM, Dubins DN, Chalikian TV (2014) Folding thermodynamics of the hybrid-1 type intramolecular human telomeric G-quadruplex. Biopolymers 101:216–227
Shi X, Macgregor Jr RB (2007) Effect of cesium on the volume of the helix-coil transition of dA·dT polymers and their ligand complexes. Biophys Chem 130:93–101
Somkuti J, Molnar OR, Smeller L (2020) Revealing unfolding steps and volume changes of human telomeric i-motif DNA. Phys Chem Chem Phys 22:23816–23823
Son I, Shek YL, Dubins DN, Chalikian TV (2014) Hydration changes accompanying helix-to-coil DNA transitions. J Am Chem Soc 136:4040–4047
Sugimoto N, Endoh T, Takahashi S, Tateishi-Karimata H (2021) Chemical biology of double helical and non-double helical nucleic acids: “To B or not To B, that is the question”. Bull Chem Soc Jpn 94:1970–1998
Sung HL, Nesbitt DJ (2020a) DNA hairpin hybridization under extreme pressures: a single-molecule FRET study. J Phys Chem B 124:110–120
Sung HL, Nesbitt DJ (2020b) Single-molecule kinetic studies of DNA hybridization under extreme pressures. Phys Chem Chem Phys 22:23491–23501
Takahashi S, Sugimoto N (2013a) Effect of pressure on the stability of G-quadruplex DNA: thermodynamics under crowding conditions. Angew Chem Int Ed 52:13774–13778
Takahashi S, Sugimoto N (2013b) Effect of pressure on thermal stability of G-quadruplex DNA and double-stranded DNA structures. Molecules 18:13297–13319
Takahashi S, Sugimoto N (2015) Pressure-dependent formation of i-motif and G-quadruplex DNA structures. Phys Chem Chem Phys 17:31004–31010
Takahashi S, Sugimoto N (2017) Volumetric contributions of loop regions of G-quadruplex DNA to the formation of the tertiary structure. Biophys Chem 231:146–154
Tateishi-Karimata H, Sugimoto N (2020) Chemical biology of non-canonical structures of nucleic acids for therapeutic applications. Chem Commun 56:2379–2390
Varshney D, Spiegel J, Zyner K, Tannahill D, Balasubramanian S (2020) The regulation and functions of DNA and RNA G-quadruplexes. Nat Rev Mol Cell Biol 21:459–474
Weida B, Gill SJ (1966) Pressure effect on deoxyribonucleic acid transition. Biochim Biophys Acta 112:179–181
Wilton DJ, Ghosh M, Chary KVA, Akasaka K, Williamson MP (2008) Structural change in a B-DNA helix with hydrostatic pressure. Nucleic Acids Res 36:4032–4037
Winter R (2019) Interrogating the structural dynamics and energetics of biomolecular systems with pressure modulation. Annu Rev Biophys 48:441–463
Wu JQ, Macgregor Jr RB (1993) Pressure dependence of the melting temperature of dA·dT polymers. Biochemistry 32:12531–12537
Wu JQ, Macgregor Jr RB (1995) Pressure dependence of the helix-coil transition temperature of poly[d(G-C)]. Biopolymers 35:369–376
Zieba K, Chu TM, Kupke DW, Marky LA (1991) Differential hydration of dA·dT base pairing and dA and dT bulges in deoxyoligonucleotides. Biochemistry 30:8018–8026

Quadruplexes Are Everywhere... On the Other Strand Too: The i-Motif

4

Jean-Louis Mergny, Mingpan Cheng, and Jun Zhou

Contents
i-Motif Forming Strands and Characters
i-Motif Characterization Methods
Factors Affecting i-Motif Stability
i-Motif Applications
Ligands/Compounds
Physiological Roles
Conclusion
References

Abstract

i-Motif (the name stems from “intercalated”), also known as i-DNA, is a pH-dependent four-stranded nucleic acid structure formed by cytosine-rich sequences via hemi-protonated and intercalated CC+ base pairs. Although this structure is favored at acidic pH, recent evidence has demonstrated its existence in vivo, stimulating the exploration of its biological roles. Before that, it was mostly regarded as a mere structural oddity, or a tool for bio- and nanotechnologies: its unique pH-sensitive nature makes it a remarkable candidate as a nanodevice and pH sensor. In this chapter, we provide a general panorama of this structure. The history and basic knowledge of i-motif are provided first. Then, we present the main characterization methods of i-motif and factors affecting i-motif stability. Following that, we focus on the applications of i-motif in nanotechnology and analytical chemistry. Last, the interaction between i-motif and ligands and the physiological roles of i-motif are briefly introduced. We argue that the i-motif, similar to its complementary G-quadruplex, is an attractive structure for multidisciplinary approaches. It serves as a basic component for various applications and has been proposed to play biological roles in vivo.

Keywords

i-Motif · i-DNA · Biophysical characterization · Structure and properties · DNA nanotechnology · Biosensor · Ligands interaction · Physiological significance

J.-L. Mergny
State Key Laboratory of Analytical Chemistry for Life Science, School of Chemistry & Chemical Engineering, Nanjing University, Nanjing, China
Laboratoire d’Optique et Biosciences, Ecole Polytechnique, CNRS, Inserm, Institut Polytechnique de Paris, Palaiseau cedex, France

M. Cheng
School of Engineering, China Pharmaceutical University, Nanjing, China

J. Zhou (*)
State Key Laboratory of Analytical Chemistry for Life Science, School of Chemistry & Chemical Engineering, Nanjing University, Nanjing, China

© Springer Nature Singapore Pte Ltd. 2023
N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_5

Polymorphism is a fascinating property of nucleic acids, which can adopt a plethora of unusual structures besides the classical Watson-Crick double helix. Among these curiosities, the i-motif, also known as i-DNA, is a surprising nucleic acid structure formed by cytosine-rich sequences under mildly acidic conditions. The term “i-motif” stems from “intercalated”; the motif was discovered in 1993 by Maurice Guéron and colleagues, who studied the intercalated tetramolecular quadruple-helical structure formed by d(TCCCCC) (Gehring et al. 1993). This structure is actually composed of two “double helices”: two interlocked right-handed parallel duplexes are arranged in an antiparallel (head-to-tail) orientation through the intercalation of hemi-protonated cytosine-cytosine+ (CC+) base pairs (Fig. 1a and b). This “double-duplex” structure is extremely unusual; its discovery challenged several preconceptions, such as the impossibility of intercalating between every base pair of an antiparallel duplex: clearly, the geometry of the parallel duplexes considered here allows this feature. As each base pair requires the protonation of one cytosine at the N3 position, pH is critical for i-motif stability. The pKa of cytosine is around 4.6 in pure water at 25 °C; thus, i-motif stability should be weak at pH values far from this one, for instance, pH 6.6 or 3.

Provided the pH is appropriate, four cytosine-rich tracts can associate intermolecularly, with two or four independent strands (bimolecular or tetramolecular), or intramolecularly in a monomeric structure (Fig. 2). This scheme is reminiscent of G-quadruplexes, for which mono-, bi-, and tetramolecular structures may also be formed. In both cases, these structures involve zero, two, and three loops (shown in red) for tetra-, bi-, and intramolecular complexes, respectively. The effect of C-tracts and loops on the stability of i-motif will be discussed in the following section. Bulges are allowed in i-motif (Leroy 2003), as observed for G-quadruplexes (Mukundan and Phan 2013). Furthermore, a G-quadruplex can be formed by three independent strands, leading to a tri-G-quadruplex (Zhou et al. 2012); a similar “tri-i-motif” has yet to be described.
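The pH argument made above (a cytosine pKa of around 4.6, with weak stability expected at pH 6.6 or 3) can be made concrete with the Henderson-Hasselbalch relation. The sketch below is only an illustration for free cytosine: the function names are hypothetical, and the "hemiprotonation score" 4f(1 − f) is an ad hoc heuristic introduced here to show where half-protonation is reached, not a thermodynamic model of i-motif folding.

```python
def fraction_protonated(ph, pka=4.6):
    """Fraction of free cytosine protonated at N3 (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def hemiprotonation_score(ph, pka=4.6):
    """Heuristic 4*f*(1-f): equals 1 when exactly half the cytosines are
    protonated (pH = pKa) and falls toward 0 on either side."""
    f = fraction_protonated(ph, pka)
    return 4.0 * f * (1.0 - f)


if __name__ == "__main__":
    for ph in (3.0, 4.6, 6.6):
        print(ph, fraction_protonated(ph), hemiprotonation_score(ph))
```

At pH 4.6 half of the cytosines are protonated, whereas at pH 6.6 only about 1% are and at pH 3 nearly all are, so in both cases one partner of the hemi-protonated pair is scarce. Folded i-motifs can show apparent pKa values shifted from that of the free base, which this sketch deliberately ignores.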


Fig. 1 (a) Hemi-protonated cytosine-cytosine base pair (CC+). (b-e) Mini tetramolecular i-DNA (PDB 105D, 3′E conformer) formed by d(TCC), used as an example to describe the structural features (Leroy and Guéron 1995). The dT is omitted. (b) Simplified schematic diagram of 105D. Two parallel duplexes are intercalated in a head-to-tail orientation. (c) Side views of the wide and narrow grooves. (d) Stacking between two adjacent CC+ base pairs from different duplexes. (e) Two consecutive, non-stacking CC+ base pairs covalently linked within the same duplex

Fig. 2 Tetramolecular, bimolecular, and intramolecular i-motifs involving four CC+ base pairs. Connecting loops are shown in red; CC+ base pairs are shown as dark or orange lines connecting two cytosines shown as orange circles


After its discovery by NMR, i-motif became a magnet for biologists. However, as described above, i-motif stability is pH-dependent, as half of the cytosines need to be protonated to form CC+ base pairs. As a result, i-motif stability should be optimal around the pKa of cytosine (between 4 and 5; the exact value may depend on ionic conditions and temperature), which is clearly below physiological pH. A long-lasting debate concerned whether such a structure would form in vivo around pH 7, because most i-motif-prone sequences remain single-stranded under physiological conditions. This was the case for the human telomeric (CCCTAA)n repeats, but sequences involving longer C-runs were somewhat more stable, giving hope that a few motifs would survive at near-neutral pH. Nevertheless, research on i-motif stalled for a while, and this structure remained in the shadow of the “famous” G-quadruplex. i-Motif was described as “a structure in search of a function.” During the fourth international meeting on quadruplex nucleic acids, held in Singapore in 2013, i-motif was only a marginal topic.

i-Motif nonetheless remained an attractive object as a nanodevice and biosensor because of its exquisite pH sensitivity. Phan and Mergny studied the duplex-quadruplex equilibrium as a function of pH, while the Balasubramanian and Krishnan groups designed DNA nanomachines based on i-motif (Phan and Mergny 2002; Liu and Balasubramanian 2003; Modi et al. 2009). Even if (perhaps) not relevant at neutral pH, i-motif could be used for biotech and nanotech applications, and the concerted protonation/deprotonation that accompanies i-motif formation or dissociation provided a superb opportunity for the design of unique pH-responsive structures or materials.

In parallel, some scientists never gave up the idea that i-motif could exist in vivo. Evidence started to accumulate that i-motif could be formed at neutral and even slightly alkaline pH, although with a relatively low melting temperature and slow kinetics of association (Zhou et al. 2010a). In addition, while pH is of course a critical parameter for i-motif formation, other factors such as temperature, ionic strength, molecular crowding, confinement, and sequence composition could also play a role. i-Motif formation under these conditions, or for certain cytosine-rich sequences (long C-tracts), at neutral and even slightly alkaline pH was reported, and these circumstances will be discussed below. Furthermore, i-motif-prone sequences are believed to be widely distributed in the human genome (Wright et al. 2017): for instance, they are predicted to exist at the end of telomeres, at centromeres, and in the promoter region of some oncogenes, as was found for G-quadruplexes.

A significant double breakthrough was achieved in 2018, when two completely independent works strongly suggested the existence of i-motif in human cells (Dzatko et al. 2018; Zeraati et al. 2018). One line of evidence came from in-cell NMR experiments, demonstrating that natural i-motif sequences are stable and persist in the nuclei of living human cells (Dzatko et al. 2018). The other took advantage of a high-affinity i-motif-specific antibody (iMab), which demonstrated that i-motif structures were formed in specific regions of the human genome (Zeraati et al. 2018).


These two reports gave new momentum to i-motif studies in biology. Given its unique structure, pH sensitivity, and folding and unfolding characteristics, i-motif may become a “new” favorite molecule in multidisciplinary fields, including biology, biochemistry, bioinformatics, physical chemistry, analytical chemistry, and bionanotechnology. i-Motif is no longer seen as a structural oddity, and there is a clear need to decipher its roles in vivo and the potential therapeutic applications that can be envisioned from these results.

i-Motif Forming Strands and Characters

A clear difference between i-motif and G-quadruplexes is that i-motif formation has mostly been studied at the DNA level; we explain below why i-motif is also known as “i-DNA.” In reality, besides DNA, i-motif can also be formed by other types of nucleic acids, such as RNA, peptide nucleic acids (PNA), locked nucleic acids (LNA), and chimeras (Snoussi et al. 2001; Krishnan-Ghosh et al. 2005; Modi et al. 2006; Kumar et al. 2007). We can divide these i-motif forming sequences into two categories: natural sequences, including DNA and RNA, and artificial ones, that is, chemically modified sequences that are not produced by living organisms. The latter category is mainly used to investigate the effect of nucleic acid composition, such as bases (cytosine), sugar, and phosphate backbone, on the properties (mainly the stability) of i-motif, as well reviewed recently by González and Damha (Assi et al. 2018). Generally, cytosine-rich sequences (and sequences rich in cytosine analogues) can form i-motif structures. In this chapter, we will focus on natural sequences and provide the details of the i-motifs formed by these sequences only. More interestingly, several recent reports show that i-motif can coexist with other DNA structures at neutral pH, such as G-quadruplex and duplex, and may be formed in vivo; a new AC-motif has also been described.

Until 2020, the list of i-motif structures investigated in vitro was limited to about 240 sequences, and only dozens of structures were deposited in the Protein Data Bank (PDB). These figures are modest given that the i-motif was discovered nearly three decades ago, and they reveal that i-motif was actually a neglected structure. These i-motif structures can be classified by the number of strands involved (giving uni-, bi-, or tetramolecular complexes, as shown in Fig. 2). i-Motif is polymorphic, but not as diverse as G-quadruplexes, as strand orientation is fixed.

The sequence forming an intramolecular i-motif can be described by the following formula: $5'\text{-}C_{w \ge 2}\,N_{L_1 \ge 0}\,C_{x \ge 2}\,N_{L_2 \ge 0}\,C_{y \ge 2}\,N_{L_3 \ge 0}\,C_{z \ge 2}\text{-}3'$, where C corresponds to a series of continuous cytosines (also called a C-tract) for the formation of CC+ base pairs; N is any nucleobase involved in the formation of a loop region; w, x, y, and z are the numbers of continuous cytosines in each block; and L1, L2, and L3 are the numbers of nucleobases in the three loop regions, respectively.

A subtle distinction between conformations can be made depending on the intercalation mode and the spatial arrangement of the terminal CC+ base pairs. For sequences whose four C-tracts have equal length (w = x = y = z), the i-motif should ideally contain an even number of CC+ base pairs and can fold into 5′E or 3′E conformers: the terminal base pairs on each side of the structure may correspond to cytosines at either the 5′ or the 3′ side of the strands, respectively (Fig. 3, upper). For sequences with an odd number of CC+ base pairs, one duplex should involve one more base pair than the other, and both the 5′ and 3′ terminal base pairs are oriented to maximize stacking interactions. Two different arrangements are observed depending on the primary sequence (Fig. 3, lower) (Cheng et al. 2021a). In the 53E arrangement (not to be confused with 5′E), the long (n+1) C-tracts are the first and third runs of cytosines (w = y = x + 1 = z + 1). In the 35E conformer, the (n+1) C-tracts correspond to the second and fourth positions (w + 1 = y + 1 = x = z).

Fig. 3 Structural polymorphism of i-motif: 5′E, 3′E, 53E, and 35E

As shown in Fig. 1c-e, the i-motif is a severely underwound and extended structure, clearly different from a duplex or a G-quadruplex. The i-motif structure has four grooves: two very flat wide (major) ones and two extremely narrow (minor) ones. The axial distance between two stacked CC+ base pairs from different duplexes is 0.31 nm (Fig. 1d, right), which is shorter than the distance between two base pairs or quartets in a duplex or a G-quadruplex (around 0.34 nm). However, these two intercalated base pairs belong to two different duplexes, so the true distance between two consecutive base pairs in the same duplex is 0.62 nm (Fig. 1e, right). The right-handed helical twist angle between two non-stacking CC+ base pairs from the same duplex is around 9–20° (Fig. 1e, left), to be compared with 34° for a duplex and 36° for a G-quadruplex. If one now considers two stacked CC+ base pairs from different duplexes, the twist angle is around 75° (Fig. 1d, left). The structural features of i-motif (Mergny and Sen 2019) and their comparison with those of duplex and G-quadruplex are listed in Table 1.
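The sequence formula and the conformer rules given above (four C-tracts of lengths w, x, y, and z separated by three loops; equal tracts giving 5′E or 3′E, and n+1 tracts in positions 1 and 3 or 2 and 4 giving 53E or 35E) lend themselves to a short illustration. The sketch below is not a published search algorithm: the regular expression, the restriction of loops to 1–12 non-cytosine nucleotides (so that tract boundaries are unambiguous), the function names, and the assumption that every tract cytosine is paired are all simplifications introduced here.

```python
import re

# Assumed pattern: four C-tracts (>= 2 cytosines each) separated by three
# loops of 1-12 non-cytosine nucleotides. Excluding C from loops is a
# simplification so that tract boundaries are unambiguous.
I_MOTIF = re.compile(
    r"(C{2,})([AGT]{1,12})(C{2,})([AGT]{1,12})(C{2,})([AGT]{1,12})(C{2,})"
)


def c_tract_lengths(seq):
    """Return (w, x, y, z) if the whole sequence fits the pattern, else None."""
    match = I_MOTIF.fullmatch(seq.upper())
    if match is None:
        return None
    return tuple(len(match.group(i)) for i in (1, 3, 5, 7))


def conformer_class(w, x, y, z):
    """Map C-tract lengths onto the arrangements described in the text."""
    if w == x == y == z:
        return "5'E or 3'E"   # even number of base pairs
    if w == y and x == z and w == x + 1:
        return "53E"          # long tracts are the first and third
    if w == y and x == z and x == w + 1:
        return "35E"          # long tracts are the second and fourth
    return "unclassified"


def max_base_pairs(w, x, y, z):
    """Upper bound on CC+ pairs, assuming every tract cytosine is paired."""
    return (w + x + y + z) // 2


if __name__ == "__main__":
    for seq in ("CCCTAACCCTAACCCTAACCC", "CCCCTAACCCTAACCCCTAACCC"):
        tracts = c_tract_lengths(seq)
        print(seq, tracts, conformer_class(*tracts), max_base_pairs(*tracts))
```

For the telomeric-type sequence with four C3 tracts, the sketch reports six possible base pairs and the 5′E/3′E class; lengthening the first and third tracts by one cytosine gives seven base pairs and the 53E class. Which of 5′E or 3′E is actually adopted cannot be decided from tract lengths alone.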


Table 1 Key structural summary of i-motif, G-quadruplex and duplex (B-DNA)

| Parameter | i-Motif | G-quadruplex | B-DNA |
|---|---|---|---|
| Type of pairing | CC+ base pair | G-quartet | A/T and G/C base pairs |
| Nucleic acid | DNA | DNA and RNA | DNA |
| Sequence requirement | C-tracts | G-tracts | Any sequence |
| Predictive algorithm | No^a | Yes | Yes |
| pH sensitivity | Extreme | Limited | Limited |
| Cation sensitivity (Na+, K+) | No | Yes | No |
| Electrostatic | Polyanion outer surface with inner H+ channel | Polyanion outer surface with inner cation channel | Polyanion |
| Mechanical property | Stiff | Topology dependent; parallel G-quadruplex may be very stiff | Flexible |
| Polymorphism | Medium | Extensive | Very limited |
| Specific ligands | A few^b | Many | Yes |
| Molecular crowding | Not clear | Stabilizes | Destabilizes |
| Specific antibody | Yes: iMab | Yes: BG4 | Yes |
| Helicase/chaperone | Possibly | Yes/Yes | Yes/Yes |
| Stability | Stable at mild acid pH | Generally stable | Predictable with length and GC content |
| Formation in cells | Yes | Yes | Yes |

a G-quadruplex-specific algorithms have been used, but limited efforts have been made to develop a specific search engine selective for i-DNA
b The selectivity of many of these ligands has been questioned (Bonnet et al. 2022)

RNA can form i-motif structures as well, but the corresponding assemblies are significantly less stable than their DNA counterparts. For example, the DNA i-motif formed by d(CCCTCCCTTTTCCCTCCC) melts at 54  C, while the corresponding RNA i-motif has a Tm of 25  C only under identical conditions (Lacroix et al. 1996). This difference in Tm (29  C) is huge and nearly independent on pH, questioning the relevance of RNA i-motif, given that its Tm at physiological pH would be below the temperature of 0  C! The potential reason for the difference is the presence of a 20 -OH in RNA: uracil (in RNA) played no detrimental role, as a DNA sequence in which all dT nucleotides were replaced by deoxyuracyl has a near identical Tm as the natural thymine-containing DNA sequence (Lacroix et al. 1996). This DNA-RNA difference

120

J.-L. Mergny et al.

is the reason why i-motif is also called i-DNA, as the relevance of i-RNA is questionable. This is also a fundamental difference with G-quadruplexes, for which RNA are not less stable than DNA ones and are actually generally more stable. The i-motif RNA/DNA difference can be seen as an advantage when trying to deconvolute biological effects: while RNA G-quadruplex cannot be discarded when considering a transcribed G/C-rich DNA sequence, a similar consideration is less likely to be relevant when considering an i-motif prone motif. Even if unstable under current physiological conditions, RNA i-motif has been proposed to participate in the self-organization of cytosine nucleobase in the primordial world, a conjecture based on the RNA-world hypothesis. RNA i-motif formation can protect the cytosine nucleobase, as it is unstable compared to adenine, guanine, and uracil, by increasing the lifetime of cytosine residues by slowing their deamination rate (Wang 2019). There is a certain level of symmetry (sequence complementarity) at the DNA level when considering G-quadruplex-prone and i-motif-prone sequences: if a sequence is G-rich and G-quadruplex-prone, its complementary C-rich DNA sequence should also be i-motif-prone. As a result, when considering a skewed duplex sequence with one G-rich and one C-rich strand, the equilibrium to be considered is rather duplex , G4 + i-motif, at least in vitro (Phan and Mergny 2002). At physiological pH, the duplex tends to predominate in vitro, but the formation of these quadruplexes may delay hybridization (Alberti and Mergny 2003). Preventing hybridization between complementary G-quadruplex-prone and i-motif-prone to obtain a Watson-Crick base duplex is challenging, but we have demonstrated that these two robust structural motifs are capable of coexistence in solution (Zhou et al. 2013). 
An alternative and interesting mode of coexistence was recently revealed by the crystal structure of the DNA oligonucleotide d(CCAGGCTGCAA), which can form a hybrid G-quadruplex/i-motif. This strand self-associates to form a quadruplex structure containing two central antiparallel G-quartets and six CC+ base pairs, implying that these two distinct quadruplex elementary motifs may coexist (Chu et al. 2019). Interestingly, besides the i-motif, a structure called the AC-motif, for adenine:cytosine-motif, can be formed by adenine and cytosine repeats. The AC-motif is composed of CC+ base pairs intercalated within putative A+C base pairs between protonated adenine and cytosine. Magnesium ion plays a critical role in stabilizing the AC-motif structure, which is stable at physiological pH and may be involved in many cellular events (Hur et al. 2021).

i-Motif Characterization Methods

Confirmation of i-motif formation in vitro is an important step in any study. A wide variety of methods have been used to evidence i-motif formation and investigate its properties, and only the most commonly applied techniques will be summarized here. For other biophysical methods (differential scanning calorimetry, Raman spectroscopy, sedimentation analysis, footprinting, optical tweezers, etc.) and

4

Quadruplexes Are Everywhere. . .On the Other Strand Too: The i-Motif

molecular dynamics simulations of i-motifs, we suggest referring to an excellent review that presents these methods in detail (Benabou et al. 2014), as we will not cover them here. In the literature, several of the methods described below have been used in combination to demonstrate the formation of i-motif and characterize its properties. It is indisputable that the classical 2D NMR and X-ray crystallography approaches are the most robust techniques to provide high-resolution structural information on i-motif at the atomic level. In fact, the first i-motif structure was revealed by NMR in 1993 in the Maurice Guéron lab at Ecole Polytechnique (Gehring et al. 1993) and soon confirmed by crystallography in 1994 by Rich and colleagues (Chen et al. 1994). The i-motif structure is currently too small to be analyzed by cryo-electron microscopy. Unfortunately, in the vast majority of i-motif papers, no high-resolution technique was available, and circular dichroism (CD) served as a convenient and sensitive method to investigate i-motif formation. CD offers several advantages: it requires limited amounts of sample and provides direct proof of i-motif formation, given that i-motif has a characteristic signature, with a positive peak around 288 nm and a negative band around 260 nm (Fig. 4a). On the other hand, the relatively high cost of a spectropolarimeter (≈ $50k) may be a hindrance for some groups. In that case, classical UV absorbance may be an ideal alternative. A thermal difference spectrum (TDS) is obtained by subtracting the UV absorbance spectrum of the folded nucleic acid from that of the same sequence unfolded at high temperature. This can be achieved by recording absorbance below and above the melting temperature, respectively. This TDS is specific for i-motif, with positive and negative peaks at 240 and 295 nm, respectively (Fig. 4b) (Mergny et al. 2005).
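The TDS calculation described above amounts to a point-by-point subtraction, which the following Python sketch illustrates. The function name, the normalization to the largest absolute value, and the synthetic Gaussian bands are assumptions made for illustration only; the input spectra are made-up numbers, not measured data, and were simply chosen so that the difference spectrum shows the signs expected for an i-motif.

```python
import numpy as np


def thermal_difference_spectrum(wavelengths_nm, abs_unfolded, abs_folded):
    """Return wavelengths and the normalized TDS (unfolded minus folded).

    Both spectra must be sampled at the same wavelengths.
    Normalization to the largest absolute value is an assumption.
    """
    wl = np.asarray(wavelengths_nm, dtype=float)
    tds = np.asarray(abs_unfolded, dtype=float) - np.asarray(abs_folded, dtype=float)
    scale = np.max(np.abs(tds))
    return wl, (tds / scale if scale > 0 else tds)


# Synthetic example spectra (illustrative only, not experimental data)
wl = np.arange(220, 331, 1.0)  # 220-330 nm in 1 nm steps


def band(center, width, height):
    """Gaussian absorbance band used to build the toy spectra."""
    return height * np.exp(-0.5 * ((wl - center) / width) ** 2)


abs_unfolded = band(262, 18, 1.0) + band(240, 8, 0.15)  # high temperature
abs_folded = band(262, 18, 1.0) + band(295, 6, 0.08)    # low temperature

wl_out, tds_norm = thermal_difference_spectrum(wl, abs_unfolded, abs_folded)
print("positive peak at", wl_out[np.argmax(tds_norm)], "nm")
print("negative peak at", wl_out[np.argmin(tds_norm)], "nm")
```

With real data, the two input spectra would be recorded on the same sample and buffer, below and above the melting temperature.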
A number of papers have covered this topic, and we have recently described the features of i-motif CD and TDS signatures in detail: interested readers can refer to the article and references therein (Iaccarino et al. 2021). We conclude that data analysis methods, such as multivariate analysis and principal component analysis, can provide a deeper understanding of the spectra of i-motif, as well as other DNA forms. For example, in our recent paper, these analyses reveal the relationship between the sequence composition of i-motif sequences and the spectroscopic features of their TDS and CD spectra (Iaccarino et al. 2021). We advise those who have more data in hand to analyze their results in depth. Besides the spectroscopic techniques presented above, well-equipped labs can perform 1D proton NMR analysis, as imino proton peaks of CC+ in 1H NMR spectra resonate in the 15 to 16 p.p.m. range (Fig. 4c), which is significantly different from duplexes (12–14 p.p.m.) and G-quadruplexes (10–12 p.p.m.), making them a characteristic feature of i-motif formation. Besides these differences in chemical shifts, the exchange times of these imino protons also differ, with typical values of minutes/hours for CC+, milliseconds for duplex base pairs, and up to hours or days for G-quartets (Phan and Mergny 2002). It is worth emphasizing here that in-cell NMR can also demonstrate i-motif formation in vivo (Dzatko et al. 2018; Cheng et al. 2021b). In analytical chemistry and DNA nanotechnology, fluorescence resonance energy transfer (FRET) has commonly been used to characterize i-motif formation, as the folding of

Fig. 4 Representative (a) CD, (b) TDS, and (c) 1H NMR spectra of i-motif. The sequence used here is d(C5T2C5T5C5T2C5). The CD spectrum was recorded at 20 °C in solution at pH 5.5, TDS at pH 5.5, and NMR at 20 °C and pH 7

i-motif results in the quenching of the donor and the sensitized emission of the acceptor. The advantage of FRET is that very low concentrations of nucleic acids are required to study i-motif formation: in one study, 100 nM was found to be more than sufficient (Mergny 1999). The FRET technique can even be used at the single-molecule level. However, as i-motif is a pH-sensitive structure and as the fluorescence properties of most dyes may be affected by pH, one should be very careful in the choice of appropriate fluorescent dyes to label i-motif sequences. Furthermore, fluorescence anisotropy and time-resolved fluorescence can also be used to analyze the conformational switching or reveal the conformational polymorphism of i-motif (Huang et al. 2015; Benabou et al. 2019). In most studies, fluorescent dyes were covalently attached at the ends of cytosine-rich sequences, but internal labeling is also possible to detect i-motif formation. In addition to covalent approaches, some groups used free ligands to demonstrate i-motif formation, given that ligands such as crystal violet and Thioflavin T (ThT) become fluorescent upon non-covalent interaction with i-motif (these compounds are also covered in the section on i-motif ligands) (Ma et al. 2011; Lee et al. 2015).


Nevertheless, there is a current shortage of highly specific light-up probes for i-motif: ThT, for example, also stains G-quadruplexes. For more information about fluorescence spectroscopy on i-motif, the review by Choi and Majima is recommended (Choi and Majima 2013). Native polyacrylamide gel electrophoresis (PAGE) has also often been used to study i-motif sequences, but any conclusion regarding i-motif formation based solely on gel migration should be taken with caution. i-Motif is not very stable at near-neutral pH, and the band does not necessarily reflect the nature of the structure: it may correspond to the unfolded species, as the pH and temperature may change during the migration step. Therefore, specific dyes and control experiments are recommended for PAGE experiments, and complementary techniques are necessary. On the other hand, PAGE can easily reveal if several species are present, as shown by two or more bands. To analyze strand molecularity, we recommend using size-exclusion chromatography (SEC)-HPLC: direct information on the molecularity of i-motif can be revealed by the structure index (Largy and Mergny 2014). As this characterization is performed under “native” conditions over a large strand concentration range, it provides a unique understanding of nucleic acid structural polymorphism when species of different molecularities are present, which is very important information for the full biophysical characterization of quadruplex nucleic acids. Therefore, whenever the characterization of the molecularity is needed, we suggest considering the SEC-HPLC experiment, which is directly applicable to i-motif, as well as G-quadruplexes and other structures. In addition, i-motif can be identified by mass spectrometry and electrochemistry. The former technique was pioneered by Gabelica, who found that CC+ base pairs were preserved in electrosprayed i-motif structures (Rosu et al. 2010; Khristenko et al. 2019).
The latter was based on the folding and unfolding of i-motif between working and reference electrodes, leading to variations in potential (Yang et al. 2010).

Factors Affecting i-Motif Stability

i-Motif stability has been the focus of a number of studies, as for other DNA structures. However, i-motif characterization is not straightforward, as its formation is associated with a significant proton uptake. Obviously, this concerted protonation may alter the pH of a solution if it is not properly buffered. In addition, this protonation further affects the spectroscopic and thermodynamic properties of the i-motif; for example, the spectral properties of C and C+ are significantly different, even in single strands. In many reports, the mid-points of the acid-alkaline (pHT) and thermal (T1/2 or Tm) transitions are the two parameters most often used to determine the stability of i-motifs. However, comparing results from different papers is not trivial, as conditions may vary. For example, pHT is often determined at a given temperature, and pHT values will depend on temperature (the higher the temperature, the lower the pHT). Similarly, the T1/2 or Tm of i-motif is often measured at only a few pH values, and a comparison of thermal stabilities is only possible if these T1/2 or Tm values were recorded at the exact same pH and ionic conditions, as changes in salt concentration

Fig. 5 Example of i-motif fraction folded as a function of temperature and pH deduced from both UV melting and annealing profiles between pH 5 and 8. The DAP i-motif forming sequence d(CCCCCGCCCCCGCCCCCGCCCCCGCCCCC) is used. The figure was slightly modified from a previous publication (Cheng et al. 2021a) and reprinted with authorization by the American Chemical Society

may affect i-motif stability. Recently, we developed a robust systematic four-dimensional analysis of UV melting and annealing, referred to as 4DUVMA, to solve this problem (Cheng et al. 2021a). The method consists in measuring the UV absorbance spectra of i-motifs between 220 and 330 nm at different pH values between 5 and 8 and temperatures between 5 and 90 °C (Fig. 5). Recording these spectra under these conditions allows the construction of formation diagrams for i-motif structures, in which the fraction folded can be determined for all pH and temperature combinations. We could infer from these curves clear correlations between pHT and temperature and between T1/2 (or Tm) and pH. Besides the points listed above, another phenomenon is often neglected in the characterization of i-motif structures, i.e., the possible presence of a hysteresis (Fig. 6). This corresponds to the dependence of the state of a system (e.g., fraction folded) on its history (e.g., whether it results from a heating or a cooling experiment). In many reports, the Tm values given correspond only to the determination made upon heating (a “melting” profile). This is problematic, both in absolute and relative terms: first, as soon as a hysteresis is present, no simple thermodynamic analysis is possible, as these curves do not correspond to equilibrium phenomena. Second, when comparing samples, hysteresis may depend on a variety of factors such as pH or oligonucleotide sequence (the length of cytosine tracts, nature and length of loops, chemical modifications). Therefore, values obtained from heating profiles only may lead to inaccurate conclusions, as discussed in detail recently (Cheng et al. 2021b). To provide a better picture of i-motif stability, one should rather report the apparent melting temperature (T1/2) for both the melting and annealing processes.
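As a minimal sketch of this bookkeeping, the Python snippet below reads the apparent T1/2 off a heating and a cooling profile by linear interpolation at a fraction folded of 0.5 and reports the gap between the two. The helper name, the interpolation approach, and the synthetic sigmoidal curves (mid-points of 48 and 42 °C) are assumptions for illustration and are not the 4DUVMA procedure itself; real profiles would first need to be converted to fraction folded and checked for monotonicity.

```python
import numpy as np


def t_half(temps_c, fraction_folded):
    """Temperature (°C) at which the fraction folded crosses 0.5.

    Assumes a noise-free curve that decreases monotonically with temperature.
    """
    t = np.asarray(temps_c, dtype=float)
    f = np.asarray(fraction_folded, dtype=float)
    order = np.argsort(t)
    t, f = t[order], f[order]
    # np.interp needs increasing x values, so reverse the decreasing curve
    return float(np.interp(0.5, f[::-1], t[::-1]))


# Synthetic profiles (illustrative only): same sample, heating vs. cooling
temps = np.arange(5.0, 90.5, 0.5)                              # 5-90 °C
folded_heating = 1.0 / (1.0 + np.exp((temps - 48.0) / 3.0))    # melting
folded_cooling = 1.0 / (1.0 + np.exp((temps - 42.0) / 3.0))    # annealing

t_half_melt = t_half(temps, folded_heating)
t_half_anneal = t_half(temps, folded_cooling)
hysteresis = t_half_melt - t_half_anneal

print(f"T1/2 (melting)   = {t_half_melt:.1f} °C")
print(f"T1/2 (annealing) = {t_half_anneal:.1f} °C")
print(f"hysteresis       = {hysteresis:.1f} °C")
```

A non-zero gap between the two values is the signature of hysteresis; reporting both numbers, together with the temperature gradient used, makes results comparable between studies.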
A reasonable estimate of the true equilibrium Tm can be obtained by simply averaging these two T1/2 values, provided they

Fig. 6 Example of thermal hysteresis between melting and annealing processes for the sequence d(C5T2C5T5C5T2C5) at pH 6.8

were obtained with identical temperature gradients (e.g., 0.5 °C/min for both heating and cooling). Below, we will discuss the external and internal factors affecting i-motif stability. As some of these factors have previously been described in other reviews, we only briefly introduce them to avoid repetition and only insist on the points which may have been neglected before. As i-motifs are pH-dependent structures, pH obviously plays a critical role in their stability. In the introduction, we mentioned that cytosine should be protonated to form CC+ base pairs and that the optimal stability of the i-motif structure should be obtained around the pKa value of cytosine, which is around pH 4.6. This optimal pH (which is not a pH of mid-transition or pHT) is the one at which Tm is the highest. The optimal pH will therefore depend on the experimental conditions such as salt concentration. An example of such determination may be found in our previous work (Mergny et al. 1995). A second critical parameter is the pH at which 50% of the i-motif is unfolded at a given temperature, the pHT. Obviously, pHT depends on the sequence and is affected by chemical modifications of the bases or the sugar-phosphate backbone. In general, the higher the number of CC+ base pairs, the higher the pHT value. Our recent quantitative analyses of the relationship between pHT and C-tract length reveal that as the CC+ base pairs increase from 3 to 6, the pHT values are 6.11, 6.39, 6.56, and 6.68, respectively, corresponding to a monotonic but non-linear increase (Cheng et al. 2021b). As RNA i-motif is less stable than DNA, its pHT will be lower. Finally, pHT depends on temperature: lower values are expected when determined at 37 °C rather than at 20 or 4 °C (Cheng et al. 2021a). Little consensus has been reached on the role of cations on i-DNA stability.
i-Motif, as all nucleic acid structures, is a polyelectrolyte, and cations are expected to stabilize it via screening the negative charges found on each phosphate. But the

situation is actually not as simple as it seems, as (i) the sugar phosphate backbone is extended, as a result of intercalation, increasing the distance between adjacent phosphate (1.45–1.61 nm and 0.63–0.94 nm across the wide grooves and the narrow grooves, respectively) and (ii) each CC+ base pair bears a positive charge. As a result, the net linear charge along the i-motif axis is far less negative than expected and is actually lower than the one of a duplex. This property explains the weak influence of cations, except Ag+ and Cu+ (see below), on the stability of i-motifs. This influence, favorable or unfavorable, actually depends on pH, and whether experiments are performed under very acidic or mildly acidic/neutral conditions. We showed in 1995 that the pKa of the cytosine was increased by around 0.3–0.5 pH units under low salt conditions, meaning that, at pH 5.5–7.0, protonation is somewhat easier (Mergny et al. 1995). This is explained by a polyelectrolyte effect: as cations tend to condense around polyanions, in the quasi-absence of other cations, the local concentration of protons around DNA or RNA is higher than in bulk solution, with a decrease in apparent local pH around the biomolecule at low ionic strength. This effect is only present at relatively low salt concentration, and disappears when working above a few tens of mM of salt. For instance, increasing Na+ concentration from 0 to 100 mM leads to a decrease in Tm at pH 6.4, but a further increase in Na+ concentrations to 300 mM has no significant effect. Similarly, at pH 6.4 in the presence of 100 mM Na+, the i-motif stability showed no differences upon the addition of 5 mM Mg2+, Ca2+, Zn2+, Li+, or K+ (Mergny et al. 1995). A recent article reported a paradoxical effect of potassium on i-motif stability. The results showed that K+ disrupts the i-motif structures in the MES and Bis-Tris buffer, in agreement with previous conclusions. 
However, a stabilization effect was found in phosphate, citrate, and sodium cacodylate buffers, perhaps as a result of the formation of other structures like C-hairpins or intermolecular duplexes formed by CC+ base pairs (Gao and Hou 2021). This work further demonstrates the complexity of measuring the stability of i-motif and the importance of buffer choice, as noted at the beginning of the section: extreme caution must be taken when analyzing the melting and annealing profiles of i-motif/C-rich sequences. These electrostatics are also completely different from those of the complementary G-quadruplex, which can accommodate cations such as Na+, K+, or Sr2+ between quartets in a very specific manner (Largy et al. 2016). A few ions may have positive effects on i-motif stability. Waller and colleagues found that the silver cation (Ag+) can stabilize i-motif through the formation of C-Ag+-C base pairs and even induce the formation of i-motif structures at physiological pH (Day et al. 2013). Furthermore, they also found that, unlike Cu2+, which is detrimental to i-motif, Cu+ can promote the folding of the i-motif structure. The Cu+-mediated CC+ base pairing is sufficient to preserve the i-motif structure (Abdelhamid et al. 2018). High concentrations of biomolecules such as proteins in the cellular environment alter the properties of molecules by reducing the volume of solvent available. This contrasts with the typical dilute conditions used in the test tube. The study of DNA and RNA under realistic conditions is important, and molecular crowding can be mimicked by adding high concentrations of (supposedly) inert molecules such as

polyethylene glycol (PEG). Conflicting results have been reported regarding crowding effects on i-motif stability. A first difficulty stems from the fact that these mimics affect the nature of the physiological environment, not only through the excluded volume effect but also through dehydration and changes in viscosity, and these factors can all alter the structures and stability of nucleic acids, as extensively reviewed by Sugimoto and colleagues (Nakano et al. 2014; Matsumoto and Sugimoto 2021). Most of the results support the folding of i-motif structures at neutral pH in the presence of PEG, indicating that these compounds raise the pKa of the cytosine. These results suggest that molecular crowding is favorable to i-motif formation. However, other mimics besides PEG should be investigated to get a general picture. For example, the behavior of the telomeric G-quadruplex in Xenopus laevis egg extract or Ficoll 70 (another molecular crowding mimic) is notably different from what is observed in PEG (Hänsel et al. 2011). Along the same lines, our very recent work showed that the stability of i-motif in the presence of PEG (PEG200 and PEG8000) increased by 6–10 °C, whereas Ficoll 70 did not have a notable effect (less than 1 °C) (Cheng et al. 2021a), raising questions on which conditions are appropriate to recapitulate what happens in the cell. Similar results were obtained by other groups, and PEG might not be a good mimic for crowding due to its potential interaction with i-motif or C-rich strands (Jamroskovic et al. 2022). Overall, it seems that i-motif formation is perhaps slightly more favorable under physiological conditions than in the test tube (Dzatko et al. 2018; Cheng et al. 2021a), arguing for a better assessment of in vitro conditions. A further difficulty is that nucleic acids are predominantly located in the nucleus, in which conditions are different than in the cytoplasm.
We have used reverse micelles (RMs) to mimic a confined environment and found that G-quadruplexes can form and are stabilized in nanosized water pools (Zhou et al. 2010b). Subsequently, Sugimoto et al. observed i-motif formation in RMs at 37 °C at neutral pH (Pramanik et al. 2012). These results confirm that a confined space favors i-motif folding, as observed for DNA origami nanocages and ion channels (see the discussion below). Furthermore, high pressure has been reported to stabilize the i-motif: at atmospheric pressure, the melting temperature of an intramolecular i-motif is 38.8 °C; it increases to 61.5 °C at 400 MPa (Takahashi and Sugimoto 2015). A more recent report concluded that the thermal stability of i-motif was not affected by pressure (Liu et al. 2018). A recent careful study clearly demonstrated that pH could be affected by variations in pressure and presented the effect of corrected and uncorrected pH on i-motif stabilization (Lepper et al. 2019). After correction for pH changes, the authors concluded that the i-motif was actually destabilized when pressure increased, whereas the opposite conclusion would be reached without correcting pH. These phenomena illustrate that one should be very cautious when analyzing the properties of i-motif, as very subtle changes in pH may result in huge effects. All the above experiments were performed in aqueous solutions. While G-quadruplexes have been widely studied in many anhydrous solutions (Zhao and Qu 2013; Tateishi-Karimata and Sugimoto 2018), such as ionic liquids or deep

eutectic solvents, we found a single report analyzing i-motif formation in anhydrous solutions. Interestingly, i-motif can also form under these conditions and is actually stabilized. Sugimoto et al. have observed i-motif formation in ionic liquids, composed of choline dhp, even at physiological pH. More interestingly, it is more stable than G-quadruplex due to choline ions bound to loop regions and grooves of the i-motifs (Tateishi-Karimata et al. 2015). These observations should stimulate studies of i-motif in anhydrous environments. Besides these external factors, the intrinsic sequence properties of the nucleic acids such as the primary sequence (number of cytosines and maximum number of CC+ base pairs, length, and composition of the loops, presence of flanking nucleotides, chemical modifications of the sugar-phosphate backbone, etc.) are fundamental factors affecting the structure and stability of the i-motifs which have been reviewed elsewhere (Benabou et al. 2014; Assi et al. 2018). Since these reports, our group performed a systematic analysis of hundreds of different sequences, to obtain a global picture of the sequence effects, allowing us to propose general rules on the impact of loops and the number of CC+ base pairs. With this massive set of sequences, we could demonstrate that i-motif stability was weakly dependent on total loop length, in contrast with G-quadruplexes (Cheng et al. 2021a; Cheng et al. 2021b).

i-Motif Applications

As mentioned in the introduction, i-motif has been widely used in nanotechnology and analytical chemistry (biosensors), even at the time when it was only considered as “a structure in search of a function.” These applications took advantage of its pH sensitivity and/or adopted it as the basic unit to construct nanodevices. In these two fields, i-motif formation can be in competition with duplex formation with its complementary G-rich strand, which may adopt a G-quadruplex structure. In many cases, intramolecular i-motif structures were chosen as they fold faster than intermolecular ones. However, intermolecular ones have also been used successfully: as all the cytosine-rich tracts are in close spatial proximity, they can be regarded as “apparent” intramolecular structures. The pH-dependent stability of i-motif (high stability at acidic pH and low stability at neutral or alkaline pH) allows the structure to form one-, two-, or three-dimensional nanostructures at acidic pH, which will collapse into single strands at neutral or alkaline pH (Ghodke et al. 2007; Yang et al. 2012; Li and Famulok 2013). Interestingly, as this assembly and disassembly are based on non-covalent interactions, the process can be reversed, leading to pH-dependent cycling, adjusted by the addition of acids/bases or a chemical oscillator (Liedl and Simmel 2005). It should be noted that the response sensitivity and transition midpoint of i-motif can be rationally designed and manipulated precisely (Nesterova and Nesterov 2014). For example, very elaborate molecular systems show that the transition midpoint is tunable with 0.1 pH unit precision, and a response range of 0.2 pH units.
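To give a feel for what a response range of 0.2 pH units implies, the following Python sketch uses a generic cooperative (Hill-type) two-state transition. This model, the function names, the definition of the response range as the interval between 90% and 10% folded, and the midpoint of 6.5 are all assumptions chosen for illustration; they are not taken from the cited design work.

```python
import math


def fraction_folded(ph: float, ph_t: float, n: float) -> float:
    """Hill-type fraction folded at a given pH (assumed two-state model).

    ph_t is the transition midpoint and n the apparent cooperativity.
    """
    return 1.0 / (1.0 + 10.0 ** (n * (ph - ph_t)))


def response_range(n: float) -> float:
    """pH interval over which the fraction folded goes from 0.9 to 0.1."""
    return 2.0 * math.log10(9.0) / n


def hill_for_range(delta_ph: float) -> float:
    """Cooperativity needed to obtain a given 90%-10% response range."""
    return 2.0 * math.log10(9.0) / delta_ph


PH_T = 6.5                   # hypothetical transition midpoint
n = hill_for_range(0.2)      # cooperativity giving a 0.2 pH unit range

print(f"required cooperativity n = {n:.2f}")
for ph in (PH_T - 0.1, PH_T, PH_T + 0.1):
    print(f"pH {ph:.1f}: fraction folded = {fraction_folded(ph, PH_T, n):.2f}")
```

Under these assumptions, a 0.2 pH unit range corresponds to an apparent cooperativity of roughly 9.5, which illustrates why the concerted protonation of several CC+ base pairs is what makes such sharp switching possible.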

Furthermore, the free ends of the four main stretches of i-motif provide additional levels of freedom in the design of higher-order nanostructure (Yang et al. 2012). In addition, i-motif can be used as one of the basic elements to form hydrogels, molecular motors, or devices (Dong et al. 2014; Mergny and Sen 2019). Significantly, i-motif can be conjugated to other nanomaterials, such as graphene oxide, carbon nanotubes, metal-organic frameworks (MOFs), or gold nanoparticles (AuNPs). We cannot cover every aspect here, and one can refer to recent reviews and cited papers therein (Mergny and Sen 2019; Debnath et al. 2019). Interestingly, as i-motif structure/sequences embedded in the confined space (e.g., within DNA origami nanocages, ion channels, or reverse micelles) are stabilized even at neutral pH, one may propose that the physiological conditions are indeed compatible with i-motif formation (Jonchhe et al. 2018; Pramanik et al. 2012; Wang et al. 2017). In fact, the cellular (nuclear) environment is crowded, so it is reasonable that i-motif can be formed in the cell nucleus, as experimentally verified recently for the most stable motifs (Cheng et al. 2021b; Dzatko et al. 2018; Zeraati et al. 2018). The reversible folding and unfolding of i-motif can be regarded as an “on/off” signal in analytical chemistry to develop sensors. In the introduction part, we mentioned that i-motif has been widely used as a nanodevice and biosensor. i-Motif is not only pH-dependent; it is a structure in which concerted protonation or deprotonation is observed (multiple CC+ base pairs are simultaneously broken and created upon i-motif melting or formation), making i-motif an ideal candidate for measuring pH changes. A pioneering contribution was made by Krishnan’s group, who detected for the first time pH changes in living cells based on i-motif using fluorescence resonance energy transfer (FRET) (Modi et al. 2009). 
In this work, the FRET constructed nanodevice is based on an intermolecular i-motif, which is different from a general intramolecular one. This device exhibited FRET efficiency of 54–60% at pH 5. The designed i-motif sensor displayed a dynamic range between pH 5.5 and 6.8 and could map spatial and temporal pH variations associated with endosome maturation. The Krishnan group further improved this i-motif-based sensor and expanded the applications to other organelles, demonstrating the promising vista of i-motif in monitoring pH (Surana et al. 2011; Modi et al. 2013). As i-motif can precisely reflect pH changes when designed rationally (Nesterova and Nesterov 2014), the contributions from Krishnan and colleagues open new avenues for i-motif-based sensing. Besides direct pH measurement, i-motif can also serve as a bridge to adjust the distance between gold nanoparticles (AuNPs) based on the pH response of i-motif folding and unfolding. The i-motif sequences are chemically linked to the surface of AuNPs, and the aggregation or dispersion of AuNPs can be achieved by changing pH. This effect can be observed with the naked eye or simple colorimetric methods and may be constructed to detect other analytes. Furthermore, i-motif sequences can be chemically modified at the surface of electrodes, allowing the development of electrochemical methods for pH analysis (Xu et al. 2010; Yang et al. 2010). A previous review has surveyed the usages of i-motif in analytic chemistry (Alba et al. 2016), so we will not cover all aspects here.

Ligands/Compounds

It has been known for over three decades that one can specifically recognize an unusual nucleic acid structure, as initially shown for DNA triple helices (Mergny et al. 1992) and later for G-quadruplexes. Hundreds of articles have now reported more than 3,000 G-quadruplex ligands, listed in the G4LDB (http://www.g4ldb.com), showing that recognition of non-B structures has attracted a lot of interest. i-Motif, given its unique shape and electrostatic properties, may also provide the basis for specific recognition by small molecules. Hurley and colleagues reported the development of ligands/compounds targeting the i-motif (Fedoroff et al. 2000). Unfortunately, finding “good” i-motif ligands is very challenging and actually much harder than identifying G-quadruplex ligands. In addition, the i-DNA ligands reported so far often exhibit modest selectivity toward this structure (Bonnet et al. 2022). Moreover, many current i-DNA ligands have also been reported to interact with G-quadruplexes, meaning that these compounds will bind to both G-quadruplex and i-DNA structures. Several reasons may explain why i-motif recognition is so challenging. The i-DNA structure is indeed unique, and some of its specific features may complicate recognition. First of all, i-motif is intercalated, meaning that drugs acting via intercalation are unlikely to bind, as a sufficient further extension of the sugar-phosphate backbone is impossible. Nevertheless, while intercalation is also very difficult into a G-quadruplex, planar aromatic molecules have found a very convenient way of interacting with a G-quadruplex structure via terminal stacking or “half-interaction.” Unfortunately, while a G-quartet provides an attractive platform for stacking, this is not the case for a small, electron-deficient CC+ base pair. An alternative way of interaction is provided by groove binding: with its four strands, i-DNA provides four grooves (two large and two narrow ones as shown in Fig.
1) offering potential binding pockets. While these grooves are very different from the minor and major grooves of a B-DNA duplex, they may not be optimal for specific recognition for the following reasons: (i) the two narrow grooves are actually so narrow that they are nearly non-existent; the sugar-phosphate backbones of the two different duplexes are in close contact, and one cannot easily insert a compound between them; (ii) the two large grooves are shallow and do not offer a proper binding pocket for classical ligands. In addition, each duplex is extended and severely underwound, with a right-handed helical twist angle of 12 to 20°. Finally, the limited structural polymorphism of i-DNA will make the identification of a compound able to recognize one specific i-motif and not the others even more challenging. These difficulties explain why we do not currently have excellent i-motif probes available. This is true also when considering fluorescent light-up compounds. While many molecules exhibit a huge and selective increase in fluorescence emission upon G4 binding (e.g., N-methyl mesoporphyrin IX), equivalent fluorescent i-motif probes do not currently exist, even if compounds such as crystal violet, thiazole orange, quinaldine red, and neutral red have been reported to interact with i-DNA. The fluorescence changes observed upon interaction with i-motif sequences/structures

can be used to detect i-motif formation in vitro but are unlikely to be sufficient for cellular applications. The effect of reported ligands on the stability of i-DNA is summarized in Fig. 7; some of these compounds are also G-quadruplex ligands. Other ligands, like IMC-76 (Kendrick et al. 2014), Fisetin (Takahashi et al. 2020), and a Chromomycin complex (Tseng et al. 2017), can rather induce a structural switch from i-DNA to a hairpin conformation. In any case, the investigation of i-DNA ligands is far behind that of its complementary G-quadruplex, and the development of high-affinity and selective i-motif ligands acting as anticancer/antiviral agents or probes will still require important efforts. This whole chapter may give the impression that the investigations on i-motif are lagging behind what has been done on G-quadruplexes. This is not the case for all applications: for example, Qu and colleagues reported the interaction between carbon nanomaterials and i-motif. The Qu group observed that single-walled carbon nanotubes (SWNTs) can bind to the major groove of the human telomeric i-motif and remarkably enhance its stability. The most significant finding was that SWNTs could interact with i-motif selectively and induce i-motif formation under physiological

Fig. 7 Ligands were reported to modulate i-motif stability, either increasing (red) or decreasing (blue) i-DNA Tm (> 5  C). Compounds with a limited influence on thermal stability (< 5  C) are shown in green. “Switch” refers to a situation in which i-DNA is switched to a hairpin structure by the ligand

132

J.-L. Mergny et al.

conditions and even at pH 8, which constitutes an important breakthrough (Li et al. 2006). In addition, the same group used i-motif to distinguish single- and multiwalled carbon nanotubes or utilized SWNTs to inhibit telomerase activity on account of the strong affinity of SWNTs for i-motif. Interestingly, another kind of carbon nanomaterial, graphene quantum dot, was also shown to stabilize and induce i-motif formation, as SWNTs (Chen et al. 2013). The interaction between carbon nanomaterials and DNA shows great potential in a variety of fields such as sensing, nanodevices, or biomedicine and has been reviewed by Qu and colleagues (Sun et al. 2016). Nevertheless, one should not forget that these carbon nanomaterials are actually not small molecules and are different from the traditional duplex or G-quadruplex ligands: more efforts are needed in the future to find the ideal i-motif ligand.
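The color coding used in Fig. 7 amounts to a simple threshold rule on the ligand-induced change in melting temperature (ΔTm). As a minimal sketch, assuming ΔTm is defined as the Tm with ligand minus the Tm without ligand, and treating a change of exactly 5 °C as "limited" (an assumption, since the caption does not say on which side of the boundary this value falls), the rule could be written as follows. The function name and the example values are hypothetical and are not taken from the figure.

```python
from typing import Literal

Effect = Literal["stabilizing", "destabilizing", "limited"]


def classify_ligand(delta_tm_celsius: float, threshold: float = 5.0) -> Effect:
    """Classify a ligand by its effect on i-DNA melting temperature.

    delta_tm_celsius: Tm(with ligand) - Tm(without ligand), in degrees Celsius.
    threshold: cutoff in degrees Celsius (5 is the value quoted in the caption).
    """
    if delta_tm_celsius > threshold:
        return "stabilizing"      # shown in red in the figure
    if delta_tm_celsius < -threshold:
        return "destabilizing"    # shown in blue in the figure
    return "limited"              # shown in green in the figure


# Hypothetical values, for illustration only.
for name, delta in [("ligand_A", 8.0), ("ligand_B", -6.5), ("ligand_C", 2.0)]:
    print(name, classify_ligand(delta))
```

Ligands that switch i-DNA to a hairpin form a separate category in Fig. 7 and cannot be identified from a ΔTm threshold alone.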

Physiological Roles

As mentioned in the introduction, the long-standing question of whether i-motifs can form in vivo was answered in 2018 (Dzatko et al. 2018; Zeraati et al. 2018). Although many sequences cannot form stable i-motif structures at physiological pH in vitro, certain sequences with long cytosine tracts may actually form i-motifs at neutral or even slightly alkaline pH. In fact, physiological conditions are very complex, as discussed earlier: molecular crowding and confinement effects may help in vivo. The in-cell NMR results from Trantirek and colleagues clearly demonstrated the presence of folded i-motif structures at physiological pH, as the imino signals of the i-motif persisted at near-physiological temperature (Dzatko et al. 2018). Our recent work also indicates that the stabilities of i-motifs in vitro are consistent with those found in vivo: the most stable i-motifs in vitro are those we can observe in cells. Interestingly, a stabilizing effect was found in vivo: for example, the imino signals of the selected i-motif disappeared when the temperature was higher than 30 °C in vitro but remained at 40 °C in vivo (Cheng et al. 2021b). These results imply that physiological conditions may still be suitable for i-motif formation, despite the relatively low pKa of cytosine. In addition, the high-affinity, i-motif-specific antibody iMab, developed by the Christ and Dinger groups, has been reported to recognize i-motif structures (Zeraati et al. 2018), providing independent support for the existence of i-motif structures in living human cells. Based on these observations, one may study the physiological roles of this structure in key biological processes, as i-motif-prone sequences may be widespread in the human genome. Indeed, since the discovery of the i-motif, many efforts have been devoted to the search for specific proteins that interact with this structure. Many excellent review papers address this point (Assi et al. 2018; Benabou et al. 2014; Day et al. 2014; Debnath et al. 2019), so we only give a brief introduction here. It is well established that over 700,000 potential G-quadruplex-forming sequences are present in the human genome. The number of potential i-motif sequences is debatable: given the


relative instability of this structure, one could imagine that this figure could be much smaller (not all cytosine-rich strands complementary to G-quadruplex-forming sequences are expected to adopt stable i-motifs at physiological pH). Even with this limitation in mind, there are a number of dispersed candidate i-motif sequences in the genome, and a very recent report from the Christ lab using high-affinity i-motif immunoprecipitation followed by sequencing identified over 650,000 i-motif structures in human genomic DNA (Martinez et al. 2022). In parallel, Zhang’s lab recently developed a Mab antibody-based immunoprecipitation method coupled with high-throughput sequencing to map i-motif-forming sequences among different cultivated rice subpopulations or species, providing the first report on the biology of i-motifs in plants (Ma et al. 2022). Furthermore, the identification of proteins and ligands interacting with the i-motif may reveal its physiological roles in processes such as telomere maintenance, regulation of gene expression, and DNA biosynthesis. i-DNA may also serve as a target for anticancer drugs and contribute to genomic instability. Martella et al. very recently analyzed (TCCC) repeats of various lengths (between 5 and 15 repeats). i-Motif formation at neutral pH increased over the range d(TCCC)5 to d(TCCC)15, and genomic deletions required the ability to form a stable i-motif at neutral pH (Martella et al. 2022).

One last aspect, currently poorly covered, is i-motif-mediated liquid-liquid phase separation (LLPS). LLPS is an emerging concept in cell biology that plays an important role in various cellular processes. It is regarded as a fundamental mechanism that regulates multiple processes and mediates the condensation of biomolecules, shaping coherent structures for concentrating and compartmentalizing intracellular biochemical reactions. Besides proteins, DNAs, including quadruplex nucleic acids, also play vital roles in LLPS. However, most attention has so far been paid to G-quadruplexes, and little emphasis has been placed on i-motif. A recent work showed that i-motif, in a manner similar to G-quadruplexes, was capable of promoting the formation of liquid-like droplets with histone H1 via LLPS, suggesting that quadruplex nucleic acids are able to regulate LLPS-mediated dynamic chromatin condensation in the nucleus (Mimura et al. 2021). This exciting report should stimulate efforts to understand the contribution of i-motif to LLPS and its possible synergy with G-quadruplexes.
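One rough way to enumerate candidate i-motif sequences of the kind counted above is a pattern search for four cytosine tracts separated by short loops. The sketch below assumes a consensus of at least three cytosines per tract and loops of 1 to 7 nucleotides, mirroring the pattern often used for G-quadruplexes on the complementary strand. These parameters and the function names are illustrative assumptions, not the criteria used in the studies cited here. A match only flags a candidate and says nothing about whether the sequence folds at physiological pH.

```python
import re


def imotif_pattern(min_tract: int = 3, min_loop: int = 1, max_loop: int = 7) -> re.Pattern:
    """Build a regex for four C-tracts separated by three loops (assumed consensus)."""
    tract = f"C{{{min_tract},}}"                 # e.g. C{3,}
    loop = f"[ACGT]{{{min_loop},{max_loop}}}"    # e.g. [ACGT]{1,7}
    return re.compile(f"{tract}(?:{loop}{tract}){{3}}")


def find_putative_imotifs(sequence: str, pattern: re.Pattern = imotif_pattern()):
    """Return (start, end, matched_sequence) for non-overlapping candidate sites."""
    sequence = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(sequence)]


# Human telomeric C-rich repeat flanked by arbitrary bases (illustrative input).
telomeric = "GGAT" + "CCCTAACCCTAACCCTAACCC" + "GGA"
print(find_putative_imotifs(telomeric))
```

Because `finditer` reports non-overlapping matches and the quantifiers are greedy, long repeats may be reported as a single site; counting every overlapping candidate would need a lookahead-based search instead.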

Conclusion

Three decades after its discovery, i-motif remains a fascinating structure that has not revealed all of its secrets. It still lags behind G-quadruplexes in terms of the number of publications and general interest. There are important efforts to be made to “reconnect” these two structures: are they mutually exclusive, as suggested by some reports, or can they coexist on two complementary strands? Do they play synergistic or antagonistic roles in transcription? Are they formed at the same stage of the cell cycle? No doubt the forthcoming years will see a plethora of new studies dedicated to the roles of unusual nucleic acid motifs in cell biology.


Acknowledgements

This work was financially supported by the National Natural Science Foundation of China (No. 22177047), State Key Laboratory of Analytical Chemistry for Life Science (5431ZZXM2202), ANR grant ANR-21-CE44-0005-01 “ICARE,” and Scientific Research Foundation for High-level Faculty, China Pharmaceutical University (3150110048).

References

Abdelhamid MA, Fábián L, MacDonald CJ, Cheesman MR, Gates AJ, Waller ZA (2018) Redox-dependent control of i-motif DNA structure using copper cations. Nucleic Acids Res 46(12):5886–5893
Alba JJ, Sadurní A, Gargallo R (2016) Nucleic acid i-motif structures in analytical chemistry. Crit Rev Anal Chem 46(5):443–454
Alberti P, Mergny JL (2003) DNA duplex-quadruplex exchange as the basis for a nanomolecular machine. Proc Natl Acad Sci 100(4):1569–1573
Assi HA, Garavís M, González C, Damha MJ (2018) i-motif DNA: structural features and significance to cell biology. Nucleic Acids Res 46(16):8038–8056
Benabou S, Aviñó A, Eritja R, González C, Gargallo R (2014) Fundamental aspects of the nucleic acid i-motif structures. RSC Adv 4(51):26956–26980
Benabou S, Ruckebusch C, Ml S, Aviñó A, Eritja R, Gargallo R, de Juan A (2019) Study of conformational transitions of i-motif DNA using time-resolved fluorescence and multivariate analysis methods. Nucleic Acids Res 47(13):6590–6605
Bonnet H, Morel M, Devaux A, Boissieras J, Granzhan A, Elias B, Lavergne T, Dejeu J, Defrancq E (2022) Assessment of presumed small-molecule ligands of telomeric i-DNA by biolayer interferometry (BLI). Chem Commun 58(33):5116–5119
Chen L, Cai L, Zhang X, Rich A (1994) Crystal structure of a four-stranded intercalated DNA, d(C4). Biochemistry 33(46):13540–13546
Cheng M, Chen J, Ju H, Zhou J, Mergny JL (2021a) Drivers of i-DNA formation in a variety of environments revealed by four-dimensional UV melting and annealing. J Am Chem Soc 143(20):7792–7807
Cheng M, Qiu D, Tamon L, Ištvánková E, Víšková P, Amrane S, Guédin A, Chen J, Lacroix L, Ju H, Trantírek L, Sahakyan AB, Zhou J, Mergny JL (2021b) Thermal and pH stabilities of i-DNA: Confronting in vitro experiments with models and in-cell NMR data. Angew Chem Int Ed 60(18):10286–10294
Chen X, Zhou X, Han T, Wu J, Zhang J, Guo S (2013) Stabilization and induction of oligonucleotide i-motif structure via graphene quantum dots. ACS Nano 7(1):531–537
Choi J, Majima T (2013) Reversible conformational switching of i-motif DNA studied by fluorescence spectroscopy. Photochem Photobiol 89(3):513–522
Chu B, Zhang D, Paukstelis PJ (2019) A DNA G-quadruplex / i-motif hybrid. Nucleic Acids Res 47(22):11921–11930
Day HA, Huguin C, Waller ZAE (2013) Silver cations fold i-motif at neutral pH. Chem Commun 49(70):7696–7698
Day HA, Pavlou P, Waller ZAE (2014) i-Motif DNA: Structure, stability and targeting with ligands. Bioorg Med Chem 22(16):4407–4418
Debnath M, Fatma K, Dash J (2019) Chemical regulation of DNA i-motifs for nanobiotechnology and therapeutics. Angew Chem Int Ed 58(10):2942–2957
Dong Y, Yang Z, Liu D (2014) DNA nanotechnology based on i-motif structures. Acc Chem Res 47(6):1853–1860
Dzatko S, Krafcikova M, Hansel-Hertsch R, Fessl T, Fiala R, Loja T, Krafcik D, Mergny JL, Foldynova-Trantirkova S, Trantirek L (2018) Evaluation of the stability of DNA i-motifs in the nuclei of living mammalian cells. Angew Chem Int Ed 57(8):2165–2169


Fedoroff OY, Rangan A, Chemeris VV, Hurley LH (2000) Cationic porphyrins promote the formation of i-motif DNA and bind peripherally by a nonintercalative mechanism. Biochemistry 39(49):15083–15090
Gao B, Hou XM (2021) Opposite effects of potassium ions on the thermal stability of i-motif DNA in different buffer systems. ACS Omega 6(13):8976–8985
Gehring K, Leroy JL, Guéron M (1993) A tetrameric DNA structure with protonated cytosine-cytosine base pairs. Nature 363(6429):561–565
Ghodke HB, Krishnan R, Vignesh K, Kumar G, Narayana C, Krishnan Y (2007) The i-tetraplex building block: Rational design and controlled fabrication of robust 1D DNA scaffolds through non Watson-Crick interactions. Angew Chem Int Ed 46(15):2646–2649
Hänsel R, Löhr F, Foldynová-Trantírková S, Bamberg E, Trantírek L, Dötsch V (2011) The parallel G-quadruplex structure of vertebrate telomeric repeat sequences is not the preferred folding topology under physiological conditions. Nucleic Acids Res 39(13):5768–5775
Huang H, Hong X, Liu F, Li N (2015) A simple approach to study the conformational switching of i-motif DNA by fluorescence anisotropy. Analyst 140(17):5987–5991
Hur JH, Kang CY, Lee S, Parveen N, Yu J, Shamim A, Yoo W, Ghosh A, Bae S, Park C, Kim KK (2021) AC-motif: a DNA motif containing adenine and cytosine repeat plays a role in gene regulation. Nucleic Acids Res 49(17):10150–10165
Iaccarino N, Cheng M, Qiu D, Pagano B, Amato J, Di Porzio A, Zhou J, Randazzo A, Mergny JL (2021) Effects of sequence and base composition on the CD and TDS profiles of i-DNA. Angew Chem Int Ed 60(18):10295–10303
Jamroskovic J, Deiana M, Sabouri N (2022) Probing the folding pathways of four-stranded intercalated cytosine-rich motifs at single base-pair resolution. Biochimie 199:81–91
Jonchhe S, Pandey S, Emura T, Hidaka K, Hossain MA, Shrestha P, Sugiyama H, Endo M, Mao H (2018) Decreased water activity in nanoconfinement contributes to the folding of G-quadruplex and i-motif structures. Proc Natl Acad Sci 115(38):9539–9544
Kendrick S, Kang HJ, Alam MP, Madathil MM, Agrawal P, Gokhale V, Yang D, Hecht SM, Hurley LH (2014) The dynamic character of the BCL2 promoter i-motif provides a mechanism for modulation of gene expression by compounds that bind selectively to the alternative DNA hairpin structure. J Am Chem Soc 136(11):4161–4171
Khristenko N, Amato J, Livet S, Pagano B, Randazzo A, Gabelica V (2019) Native ion mobility mass spectrometry: When gas-phase ion structures depend on the electrospray charging process. J Am Soc Mass Spectrom 30(6):1069–1081
Krishnan-Ghosh Y, Stephens E, Balasubramanian S (2005) PNA forms an i-motif. Chem Commun 42:5278–5280
Kumar N, Nielsen JT, Maiti S, Petersen M (2007) i-motif formation with locked nucleic acid (LNA). Angew Chem Int Ed 46(48):9220–9222
Lacroix L, Mergny JL, Leroy JL, Hélène C (1996) Inability of RNA to form the i-motif: Implications for triplex formation. Biochemistry 35(26):8715–8722
Largy E, Mergny JL (2014) Shape matters: size-exclusion HPLC for the study of nucleic acid structural polymorphism. Nucleic Acids Res 42(19):e149
Largy E, Mergny JL, Gabelica V (2016) Role of alkali metal ions in G-quadruplex nucleic acid structure and stability. Met Ions Life Sci 16:203–258
Lee IJ, Patil SP, Fhayli K, Alsaiari S, Khashab NM (2015) Probing structural changes of self-assembled i-motif DNA. Chem Commun 51(18):3747–3749
Lepper CP, Williams MAK, Edwards PJB, Filichev VV, Jameson GB (2019) Effects of pressure and pH on the physical stability of an i-motif DNA Structure. Chemphyschem 20(12):1567–1571
Leroy JL (2003) TT pair intercalation and duplex interconversion within i-motif tetramers. J Mol Biol 333(1):125–139
Leroy JL, Guéron M (1995) Solution structures of the i-motif tetramers of d(TCC), d(5methylCCT) and d(T5methylCC): novel NOE connections between amino protons and sugar protons. Structure 3(1):101–120


Li T, Famulok M (2013) i-Motif-programmed functionalization of DNA nanocircles. J Am Chem Soc 135(4):1593–1599
Li X, Peng YH, Ren J, Qu X (2006) Carboxyl-modified single-walled carbon nanotubes selectively induce human telomeric i-motif formation. Proc Natl Acad Sci 103(52):19658–19663
Liedl T, Simmel FC (2005) Switching the conformation of a DNA molecule with a chemical oscillator. Nano Lett 5(10):1894–1898
Liu D, Balasubramanian S (2003) A proton-fuelled DNA nanomachine. Angew Chem Int Ed 42(46):5734–5736
Liu L, Kim BG, Feroze U, Macgregor Jr RB, Chalikian TV (2018) Probing the ionic atmosphere and hydration of the c-MYC i-motif. J Am Chem Soc 140(6):2229–2238
Ma DL, Kwan HT, DSH C, Lee P, Yang H, VPY M, Bai LP, Jiang ZH, Leung CH (2011) Crystal violet as a fluorescent switch-on probe for i-motif: label-free DNA-based logic gate. Analyst 136(13):2692–2696
Ma X, Feng Y, Yang Y, Li X, Shi Y, Tao S, Cheng X, Huang J, Wang X, Chen C, Monchaud D, Zhang W (2022) Genome-wide characterization of i-motifs and their potential roles in the stability and evolution of transposable elements in rice. Nucleic Acids Res 50(6):3226–3238
Martella M, Pichiorri F, Chikhale RV, Abdelhamid MAS, Waller ZAE, Smith SS (2022) i-Motif formation and spontaneous deletions in human cells. Nucleic Acids Res 50(6):3445–3455
Martinez CDP, Zeraati M, Rouet R, Mazigi O, Gloss B, Chan CL, Bryan TM, Smith NM, Dinger ME, Kummerfeld S, Christ D (2022) Human genomic DNA is widely interspersed with i-motif structures. https://doi.org/10.1101/2022.04.14.488274
Matsumoto S, Sugimoto N (2021) New insights into the functions of nucleic acids controlled by cellular microenvironments. Top Curr Chem 379(3):17
Mergny JL (1999) Fluorescence energy transfer as a probe for tetraplex formation: the i-motif. Biochemistry 38(5):1573–1581
Mergny JL, Duval-Valentin G, Nguyen CH, Perrouault L, Faucon B, Rougée M, Montenay-Garestier T, Bisagni E, Hélène C (1992) Triple-helix specific ligands. Science 256(5064):1681–1684
Mergny JL, Lacroix L, Han XG, Leroy JL, Hélène C (1995) Intramolecular folding of pyrimidine oligodeoxynucleotides into an I-DNA motif. J Am Chem Soc 117(35):8887–8898
Mergny JL, Li J, Lacroix L, Amrane S, Chaires JB (2005) Thermal difference spectra: a specific signature for nucleic acid structures. Nucleic Acids Res 33(16):e138
Mergny JL, Sen D (2019) DNA quadruple helices in nanotechnology. Chem Rev 119(10):6290–6325
Modi S, Nizak C, Surana S, Halder S, Krishnan Y (2013) Two DNA nanomachines map pH changes along intersecting endocytic Pathways inside the same cell. Nat Nanotechnol 8(6):459–467
Modi S, Swetha MG, Goswami D, Gupta GD, Mayor S, Krishnan Y (2009) A DNA nanomachine maps spatiotemporal pH changes in living cells. Nat Nanotechnol 4(5):325–330
Modi S, Wani AH, Krishnan Y (2006) The PNA-DNA hybrid i-motif: implications for sugar-sugar contacts in i-motif tetramerization. Nucleic Acids Res 34(16):4354–4363
Mimura M, Tomita S, Shinkai Y, Hosokai T, Kumeta H, Saio T, Shiraki K, Kurita R (2021) Quadruplex folding promotes the condensation of linker histones and DNAs via liquid–liquid phase separation. J Am Chem Soc 143(26):9849–9857
Mukundan VT, Phan AT (2013) Bulges in G-quadruplexes: broadening the definition of G-quadruplex-forming sequences. J Am Chem Soc 135(13):5017–5028
Nakano S, Miyoshi D, Sugimoto N (2014) Effects of molecular crowding on the structures, interactions, and functions of nucleic acids. Chem Rev 114(5):2733–2755
Nesterova IV, Nesterov EE (2014) Rational design of highly responsive pH sensors based on DNA i-motif. J Am Chem Soc 136(25):8843–8846
Phan AT, Mergny JL (2002) Human telomeric DNA: G-quadruplex, i-motif and Watson-Crick double helix. Nucleic Acids Res 30(21):4618–4625


Pramanik S, Nagatoishi S, Sugimoto N (2012) DNA tetraplex structure formation from human telomeric repeat motif (TTAGGG):(CCCTAA) in nanocavity water pools of reverse micelles. Chem Commun 48(40):4815–4817
Rosu F, Gabelica V, Joly L, Gregoire G, De Pauw E (2010) Zwitterionic i-motif structures are preserved in DNA negatively charged ions produced by electrospray mass spectrometry. Phys Chem Chem Phys 12(41):13448–13454
Snoussi K, Nonin-Lecomte S, Leroy JL (2001) The RNA i-motif. J Mol Biol 309(1):139–153
Sun H, Ren J, Qu X (2016) Carbon nanomaterials and DNA: from molecular recognition to applications. Acc Chem Res 49(3):461–470
Surana S, Bhat JM, Koushika SP, Krishnan Y (2011) An autonomous DNA nanomachine maps spatiotemporal pH changes in a multicellular living organism. Nat Commun 2:340
Takahashi S, Sugimoto N (2015) Pressure-dependent formation of i-motif and G-quadruplex DNA structures. Phys Chem Chem Phys 17(46):31004–31010
Takahashi S, Bhattacharjee S, Ghosh S, Sugimoto N, Bhowmik S (2020) Preferential targeting cancer-related i-motif DNAs by the plant flavonol fisetin for theranostics applications. Sci Rep 10(1):2504
Tateishi-Karimata H, Nakano M, Pramanik S, Tanaka S, Sugimoto N (2015) i-Motifs are more stable than G-quadruplexes in a hydrated ionic liquid. Chem Commun 51(32):6909–6912
Tateishi-Karimata H, Sugimoto N (2018) Biological and nanotechnological applications using interactions between ionic liquids and nucleic acids. Biophys Rev 10(3):931–940
Tseng WH, Chang CK, Wu PC, Hu NJ, Lee GH, Tzeng CC, Neidle S, Hou MH (2017) Induced-fit recognition of CCG trinucleotide repeats by a nickel-chromomycin complex resulting in large-scale DNA deformation. Angew Chem Int Ed 56(30):8761–8765
Wang B (2019) The RNA i-motif in the primordial RNA world. Orig Life Evol Biosph 49(1-2):105–109
Wang J, Fang R, Hou J, Zhang H, Tian Y, Wang H, Jiang L (2017) Oscillatory reaction induced periodic C-quadruplex DNA gating of artificial ion channels. ACS Nano 11(3):3022–3029
Wright EP, Huppert JL, Waller ZAE (2017) Identification of multiple genomic DNA sequences which form i-motif structures at neutral pH. Nucleic Acids Res 45(6):2951–2959
Xu X, Li B, Xie X, Li X, Shen L, Shao Y (2010) An i-DNA based electrochemical sensor for proton detection. Talanta 82(4):1122–1125
Yang Y, Liu G, Liu H, Li D, Fan C, Liu D (2010) An electrochemically actuated reversible DNA switch. Nano Lett 10(4):1393–1397
Yang Y, Zhou C, Zhang T, Cheng E, Yang Z, Liu D (2012) DNA pillars constructed from an i-motif stem and duplex branches. Small 8(4):552–556
Zeraati M, Langley DB, Schofield P, Moye AL, Rouet R, Hughes WE, Bryan TM, Dinger ME, Christ D (2018) I-motif DNA structures are formed in the nuclei of human cells. Nat Chem 10(6):631–637
Zhao C, Qu X (2013) Recent progress in G-quadruplex DNA in deep eutectic solvent. Methods 64(1):52–58
Zhou J, Amrane S, Korkut DN, Bourdoncle A, He HZ, Ma DL, Mergny JL (2013) Combination of i-motif and G-quadruplex structures within the same strand: formation and application. Angew Chem Int Ed 52(30):7742–7746
Zhou J, Bourdoncle A, Rosu F, Gabelica V, Mergny JL (2012) Tri-G-quadruplex: Controlled assembly of a G-quadruplex structure from three G-rich strands. Angew Chem Int Ed 51(44):11002–11005
Zhou J, Wei C, Jia G, Wang X, Feng Z, Li C (2010a) Formation of i-motif structure at neutral and slightly alkaline pH. Mol Biosyst 6(3):580–586
Zhou J, Wei C, Jia G, Wang X, Feng Z, Li C (2010b) Formation and stabilization of G-quadruplex in nanosized water pools. Chem Commun 46(10):1700–1702

5

i-Motif Nucleic Acids

Zoë A. E. Waller

Contents
Introduction
i-Motif Structure
Hemi-Protonated Cytosines
Intramolecular i-Motif Formation
Topologies
Grooves
Loops
Stabilizing Cations
i-Motifs and pH
Nanotechnology
i-Motifs at Neutral pH
i-Motifs in Biology
i-Motifs in the Telomeres
i-Motifs in Gene Promoter Regions
i-Motifs in DNA Replication
i-Motifs in Cells
i-Motifs and G-Quadruplexes
i-Motif Ligands and Probes
TMPyP4 and Macrocycles
Carboxyl-Modified Nanotubes and Quantum Dots
Small Molecules
Fluorescent Probes
Synergistic Ligands for i-Motifs and G-Quadruplexes
Conclusion
References

Z. A. E. Waller (*)
Drug Discovery, UCL School of Pharmacy, London, WC1N 1AX, UK
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2023
N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_97


Abstract

i-Motifs are four-stranded nucleic acid secondary structures formed from sequences rich in cytosine and are stabilized by hemi-protonated cytosine–cytosine base pairs. In this chapter, we take you through an overview of i-motif nucleic acids: the structure of i-motif DNA, their special relationship with pH, their potential functions in biology, and targeting with ligands and probes.

Introduction

GC-rich regions of the genome have long been known to be subject to instability, mutations, and deletions. Minisatellites, a class of tandem repeats, have been shown to cause disease by influencing gene expression, modifying coding sequences within genes, or generating fragile sites. Instability at GC-rich minisatellites involves distinct mutation processes operating in somatic and germline cells. In particular, CpG dinucleotides mutate at a high rate because cytosine is vulnerable to deamination. Moreover, cytosines in CpG dinucleotides are often methylated, and deamination of 5-methylcytosine (5mC) produces thymine. GC-rich regions of DNA have a propensity to form alternative structures (Fig. 1) (Choi and Majima 2011; Gilbert and Feigon 1999); the best studied of these is the G-quadruplex, formed in sequences rich in guanine. For a detailed overview of this structure, please refer to Quadruplex DNA. Wherever G-rich sequences occur in the genome, C-rich sequences are also present on the complementary strand; some of these DNA sequences are capable of forming intercalated motifs, or i-motifs. While a great deal of research has focussed on structures formed from G-rich DNA and RNA sequences, much less has focussed on their C-rich counterparts, and consequently on i-motif structure and function. Here, we provide an overview of the subject of i-motif nucleic acids.

Fig. 1 Comparison of B-form double-helical DNA, G-quadruplex, and i-motif structures

i-Motif Structure

Hemi-Protonated Cytosines

i-Motifs are four-stranded secondary structures composed of two parallel-stranded DNA duplexes zipped together in an antiparallel orientation by intercalated, hemi-protonated cytosine–cytosine base pairs (Fig. 2) (Gehring et al. 1993). The building block of i-motifs is the CC+ base pair, formed by the planar arrangement of two cytosines interacting through three hydrogen bonds. The central hydrogen bond is stronger than the other two, as it involves a proton oscillating between the two nitrogens (Lieblein et al. 2012a). Cytosine–cytosine base pairs (CC+) were first observed in crystals of acetyl cytosine (Marsh et al. 1962), at a time when other alternative (non-Watson-Crick) base pairings were being discovered. However, it was over 30 years before such base pairing was determined to be central to a novel type of DNA structure. In 1993, Gehring, Leroy, and Guéron revealed the i-motif by NMR spectroscopy, using the sequence 5′-d(TCCCCC) at acidic pH (Gehring et al. 1993). This was the first example of a nucleic acid secondary structure with intercalated base pairs. Prior to this seminal publication, sequences containing tracts of cytidine forming hemi-protonated base pairs at acidic pH had been considered to be double-stranded.

Fig. 2 Hemi-protonated CC-base pair and overall nomenclature for an i-motif sequence and structure
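The link between hemi-protonation and pH can be illustrated with a simple Henderson–Hasselbalch estimate. A CC+ pair needs one protonated and one neutral cytosine, so in a naive model where two cytosines protonate independently, the chance of finding that combination scales with f(1 − f), where f is the protonated fraction. This product peaks when pH equals the pKa. The sketch below assumes a pKa of about 4.6 for cytosine N3 (an approximate value for the free base, not a figure given in this chapter) and ignores cooperativity and any pKa shift within a folded structure; the function names are hypothetical.

```python
def fraction_protonated(ph: float, pka: float = 4.6) -> float:
    """Henderson-Hasselbalch fraction of cytosine that is protonated at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def hemiprotonated_pair_weight(ph: float, pka: float = 4.6) -> float:
    """Relative weight of one protonated plus one neutral cytosine.

    Naive independent-site model, normalized so the maximum (at pH = pKa) is 1.
    """
    f = fraction_protonated(ph, pka)
    return 4.0 * f * (1.0 - f)


for ph in (4.6, 5.5, 7.0):
    print(f"pH {ph}: protonated fraction {fraction_protonated(ph):.4f}, "
          f"pair weight {hemiprotonated_pair_weight(ph):.4f}")
```

In this toy model the weight falls steeply toward neutral pH, which is why i-motif formation is expected to be favored under acidic conditions unless other factors stabilize the fold.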


Intramolecular i-Motif Formation

Shortly after the initial discovery of the i-motif, examples of intramolecular i-motif species were presented, including the C-rich sequence from the human telomere (Ahmed et al. 1994), with the realization that, although they form preferentially under acidic conditions, they had the potential to form in vivo (Leroy et al. 1994). The first NMR structure of an i-motif was intermolecular, composed of four separate strands. It has since been joined by several intermolecular crystal structures (e.g., PDB IDs 1CN0 (Weil et al. 1999), 1BQJ (Weil et al. 1999), 6TQI (Zhang et al. 2020)). Although intramolecular NMR structures of i-motifs are available (e.g., PDB IDs 1EL2, 1ELN; see Fig. 3 for an example), they are of modified fragments from the telomeres, and these modifications (necessary to enable structure determination by NMR) have been shown to alter the widths of the grooves in the structure (Weil et al. 1999; Zhang et al. 2020). To date, no intramolecular crystal structures of i-motifs have been published. The i-motif structures most relevant to biology are intramolecular, formed from a single strand, as would be the case when forming within the genomic context. These structures will have a core of stacked C-C+ base pairs but also loops. However, similar to recent evidence of proteins binding intermolecular G-quadruplex structures, intermolecular i-motif structures may also be biologically relevant (https://pubs.acs.org/doi/10.1021/jacs.1c10745).

Fig. 3 Solution structure of an intramolecular i-motif from a modified sequence from the human telomere, d(CCCTA₂5mCCCTA₂CCCUA₂CCCT) (PDB: 1ELN)

Topologies

By virtue of their structure, all i-motifs are antiparallel, making their topologies much simpler than those of G-quadruplex DNA. There are two general topologies of i-motifs, the 5′E and 3′E forms (Phan et al. 2000), named for the position of the terminal CC+ base pair. In the 3′E topology, the outermost CC+ pairs of the i-motif are formed by the cytosines at the 3′ end of each stretch of cytosines. In the 5′E topology, the outermost cytosines are those at the 5′ end. These topologies may differ in stability. For example, Schwalbe and co-workers showed by pH-jump experiments that the 3′E form folds rapidly and is kinetically favored, whereas the 5′E form is thermodynamically more stable (Lieblein et al. 2012b). Abdelhamid and Waller also showed that the two forms can interconvert over time when samples are stored in the fridge. Their work indicates that annealing conditions, and any subsequent incubation times after annealing, should always be kept consistent for i-motif-forming sequences; otherwise, interconversion over time could affect the observed results and introduce inconsistencies (Abdelhamid and Waller 2020) (Fig. 4).

Grooves

The main features of i-motifs include the loops, the grooves, and the CC+ base pair stack. The i-motif structure has four grooves: two wide grooves and two narrow grooves. The wide grooves are wider than those of B-form DNA, providing a large surface different to that of other DNA structures. The minor grooves in the i-motif are very

Fig. 4 5′E and 3′E topologies in i-motif structures

144

Z. A. E. Waller

narrow (3.1 Å), just over half the width found in B-DNA (Abou Assi et al. 2018). Consequently, there is repulsion between the adjacent negatively charged phosphate backbones that define the minor groove. The 2′-OH groups in RNA destabilize i-motifs because of the close juxtaposition of sugar phosphates in this structure, making formation of RNA i-motifs disfavored (Lacroix et al. 1996). It has been proposed that sugar–sugar contacts are important in stabilizing i-motifs in general and that solvation is the cause of the instability of RNA i-motif structures (Fenna et al. 2008). Many different artificial backbones have been used to alter the stability of i-motif structures as a result. For example, Mergny and Lacroix investigated phosphorothioate and methylphosphonate modifications to the sugar–phosphate backbone. In this case, it was found that phosphorothioate modifications resulted in less stable i-motifs, whereas methylphosphonate-modified equivalents were unable to form i-motifs at all (Mergny and Lacroix 1998). Peptide nucleic acids are also capable of forming i-motif structures, but only at much lower pHs compared to DNA (Krishnan-Ghosh et al. 2005). Another notable example is where Damha, González, and co-workers used 2-deoxy-2-fluoroarabinose as a sugar alternative in their oligonucleotides, showing significant advantages in offering stability across a wide pH range, including stability at neutral pH (Assi et al. 2016). Although the overall i-motif structure remains the same, the modification alters the conformation of the sugar from the C3′-endo to the C2′-endo form which, combined with the electronegative fluorines, provides favorable sequential and interstrand electrostatic interactions.

Loops

The hypervariable loops situated between the tracts of cytosines provide sequence differences and unique features to each i-motif. They are among the specific positions at which different i-motifs can potentially be distinguished from each other. In 2011, Hurley and Kendrick reviewed the i-motif structures known at the time and summarized that i-motif-forming sequences with long loops (such as those from cMYC and BCL2) had a higher pH stability than others with short loops (VEGF, RET, and Rb) (Brooks et al. 2010). They defined Class I as i-motifs with short loops and Class II as i-motifs with long loops, the latter being noticeably more stable. Although long loops may provide additional stabilizing interactions within the loop region, they are not strictly necessary for high stability: in 2012, Brazier and co-workers showed that the i-motif structure formed by a HIF-1α promoter sequence has unexpected stability near neutral pH (Brazier et al. 2012). Both the sequence from HIF-1α and the sequence from hTERT are capable of forming two i-motif structures within the same oligonucleotide, and the study also found evidence of interaction between multiple i-motif structures in solution. The unexpectedly high stability of the sequence from HIF-1α challenged the belief that long loops are required to form i-motif structures that are stable near physiological conditions. Subsequent work indicated, in a sequence-independent manner, that shorter loops offer more stability than longer ones (Gurung et al. 2015). Wadkins

and co-workers have also studied loop regions within i-motif-forming sequences and found that intramolecular i-motifs with longer central loops form at higher pHs and temperatures than i-motifs with longer outer loops (Reilly et al. 2015). This was corroborated by our own work in 2017, where we observed a few sequences with longer “middle loops” that were stable (Wright et al. 2017). Burrows’ studies on polycytosine sequences also showed that stable combinations could be found with a slightly longer central loop (Fleming et al. 2018). Additional stabilizing interactions in loop regions are more likely to be possible and favorable if found within the second (middle) loop.

Stabilizing Cations

As DNA is charged, cations are an important additional environmental consideration when studying DNA structures. G-quadruplex topologies are strongly influenced by the type of cations present, and metal cations alter DNA structure by inducing helix-to-coil transitions, right- to left-handed helical transitions, as well as aggregation and condensation. Mergny et al. found that the pKa of the cytosine N3 was reduced in low-salt conditions, favoring i-motif formation at lower ionic concentrations (Mergny et al. 1995). At pH values near the pKa of cytosine, increasing the concentration of NaCl to 100 mM resulted in destabilization, though further increases to 300 mM did not alter the stability thereafter. No differences in i-motif stability were observed upon the addition of 5 mM Mg2+, Ca2+, Zn2+, Li+, or K+ cations. Similar work on sequences from n-MYC also showed a decrease in i-motif stability with increasing ionic concentration (Benabou et al. 2014). Hong and co-workers demonstrated that Li+ cations destabilize i-motifs. Kinetic analysis indicated that although the size of Li+ ions allows them to occupy the space between the cytosine–cytosine base pairs, in direct competition with protons, they do not adequately replace the role of protons in mediating the hydrogen bonding of cytosine pairs (Kim et al. 2014). Inspired by work indicating that Ag+ cations can mediate C–C mismatches in duplex DNA, we observed that Ag+ cations can fold i-motif DNA at neutral pH (Day et al. 2013). Following this discovery, other metal cations were screened, and it was found that copper cations could alter the structure of i-motifs (Abdelhamid et al. 2018; Day et al. 2015). Cu2+ was examined initially and was found to shift the equilibrium of structures in solution from i-motif toward a hairpin-like structure. This switch was possible even under acidic conditions, suggesting that Cu2+ was capable of competing with H+ within a CC base pair.
Further computational studies showed that Cu+ can act as a substitute for H+ to support the formation of cytosine dimers with similar conformation to the hemi-protonated base pair found in i-motif DNA (Gao et al. 2016). Through a range of biophysical methods, we provided experimental evidence to support the hypothesis that Cu+ can mediate CC base pairing in i-motif DNA, either by substitution of H+ under acidic conditions or formation at neutral pH. Given the relationship between Cu+ and Cu2+, we also studied the effects of redox sensitivity. Folding of i-motif using Cu+ can be reversed using a metal chelator or exposure to ambient oxygen in the air that drives

oxidation of Cu+ to Cu2+. Thus, the use of copper cations provides redox-sensitive conditions for conformational control of i-motif-forming DNA sequences. It is likely that other cations can also fold C-rich DNA into i-motif or other competing structures, and this area could prove important in providing alternative switching conditions for both biological and nanotechnological applications.

i-Motifs and pH

The hydrogen bonding of the CC+ base pair is promoted by acidic conditions: at low pH, the N3 of cytosine becomes protonated, driving rapid i-motif formation. As such, depending on the particular sequence, i-motif-forming sequences can be very responsive to pH (Mergny et al. 1995). The pH stability of the structure is defined by its transitional pH (pHT), the pH at which 50% of the population is folded into i-motif. For the structure to form, the cytosines need to be 50% protonated, as only one cytosine in each CC+ base pair needs to be protonated for the hydrogen bonding to be complete. The pKa of cytosine is around 4.5, but the local pKa changes depending on the surrounding sequence and the environmental conditions. This is why some i-motif-forming sequences are more stable than others. As the structure requires hemiprotonation of cytosines, when the pH is too low, both cytosines may protonate, and when the pH is too high, neither will protonate. Neither extreme provides the conditions necessary for the formation of i-motif structure. While the issues surrounding pH have made i-motifs more challenging to work with in biological investigations, the sensitivity of the structure to pH is a relatively unique property that has been exploited in nanotechnology.
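
The hemiprotonation argument above can be made concrete with a small numerical sketch. It assumes, purely for illustration, that the two cytosines of a pair protonate independently and share a single pKa (the value of about 4.5 quoted above); real i-motifs have shifted local pKa values and fold cooperatively, so the numbers are qualitative only. The function names are hypothetical.

```python
def protonated_fraction(ph, pka=4.5):
    """Henderson-Hasselbalch fraction of cytosine N3 sites carrying a proton."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def hemiprotonated_fraction(ph, pka=4.5):
    """Probability that exactly one cytosine of a pair is protonated.

    Assumption (not from the chapter): the two sites protonate
    independently and have the same pKa.
    """
    p = protonated_fraction(ph, pka)
    return 2.0 * p * (1.0 - p)


if __name__ == "__main__":
    for ph in (2.5, 3.5, 4.5, 5.5, 6.5):
        print(f"pH {ph:.1f}: protonated {protonated_fraction(ph):.3f}, "
              f"hemiprotonated pairs {hemiprotonated_fraction(ph):.3f}")
```

Under these assumptions the hemiprotonated population peaks when the pH equals the pKa, where half of the cytosines are protonated, and falls away symmetrically on either side. This mirrors the qualitative statement that both very low and very high pH disfavor the CC+ base pair.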

Nanotechnology

The first example of the use of i-motifs in a nanotechnological device came in 2003, when Liu and Balasubramanian invented what they termed a “proton-fueled nanomachine” (Liu and Balasubramanian 2003). In this simple device, they utilized the changes in i-motif structure with pH. At pH 5, one of the oligonucleotides folds into the i-motif (closed state), and at pH 8, it unfolds and forms an extended duplex. This process was shown to be reversible and opened up new possibilities for exploiting pH-driven motion in nanoscale structures. Shortly after, this was further developed into the first demonstration of an artificial DNA molecular motor that could perform micromechanical work with tunable control using buffer pH and ionic strength. Since then, there have been numerous inventions using C-rich DNA, and i-motif pH-switching properties have been exploited in DNA-based scaffolds for molecular engineering and synthetic biology. The breadth of this field is extensive, and it is worthwhile visiting some of the reviews in this area to appreciate the variety of applications that have been made (Mergny and Sen 2019; Debnath et al. 2019; Zheng et al. 2021; Dong et al. 2014). Examples include DNA nanomachines

that can map pH changes inside living cells, the assembly of graphene oxide (with useful applications for chemical sensors, energy storage, catalysis, and optoelectronics), and logic gates with switches based on altering the pH. Much work has been done on adjusting the properties of i-motif DNA and its pH responsiveness, such that fine-tuning of the sequence can result in highly responsive pH sensors. A notable example of the use of i-motifs in nanotechnology is a pH-driven bipedal walker and stepper (Wang et al. 2011). Willner and co-workers created a “bipedal walker” and a “bipedal stepper” using DNA activated by H+/OH− triggers. The forward “walking” of the DNA on the template track could be activated by H+ ions, which folded the DNA into the i-motif structure and contributed toward the driving forces for DNA translocation. Backward “walking” was activated by OH− ions, which increase the pH and unfold the i-motif. The progress of the DNA machines was followed optically by fluorescent labelling. Another early example came from the Krishnan group, who used i-motifs in an autonomous DNA nanomachine that maps spatiotemporal pH changes in cells and then in a multicellular living organism (Surana et al. 2011), demonstrating the operation of a pH-triggered DNA nanomachine inside the nematode model organism C. elegans. The nanomachine used FRET to map spatiotemporal pH changes associated with endocytosis. This was the first demonstration of the independent functionality of a DNA nanomachine in vivo and showed the utility of DNA nanodevices as tools to interrogate complex biological phenomena. Another early application was the use of i-motifs in drug release, in one of the first examples of pH-controlled reversible drug binding and release using a cytosine-rich DNA sequence (Xu et al. 2011). These have since been combined with other materials such as nanoparticles and polymers for the delivery of nucleic acids and drug molecules.
Such nanodevices that can load small molecules for controlled release could be utilized in intelligent drug delivery systems. The sensitivity of i-motifs to changes in pH makes them ideal for probes and triggers based on changes in these conditions. However, the additional cations that have been shown to stabilize and destabilize the structure, as well as others that are yet to be discovered, may also be used as triggers in these devices and systems.
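
The pH-triggered behavior of such devices can be illustrated with a two-state sketch. It assumes a sigmoidal folding transition centred on a transitional pH; the default pHT of 6.5 and the cooperativity exponent n = 2 are hypothetical values chosen for illustration, not parameters taken from any of the devices cited above.

```python
def fraction_folded(ph, ph_t=6.5, n=2.0):
    """Fraction of strands folded into i-motif in a two-state model.

    ph_t (transitional pH) and n (cooperativity) are illustrative assumptions.
    """
    return 1.0 / (1.0 + 10.0 ** (n * (ph - ph_t)))


def device_state(ph, ph_t=6.5, n=2.0):
    """Label the dominant state of a pH-driven i-motif switch."""
    folded = fraction_folded(ph, ph_t, n)
    return "closed (i-motif)" if folded >= 0.5 else "open (unfolded)"


if __name__ == "__main__":
    # Alternate between acidic and basic buffer, as in a pH-cycled device.
    for ph in (5.0, 8.0, 5.0, 8.0):
        print(f"pH {ph:.1f}: {fraction_folded(ph):.3f} folded -> {device_state(ph)}")
```

In this toy model the population is almost fully folded at pH 5 and almost fully unfolded at pH 8, so cycling the buffer between the two values toggles the device. Shifting `ph_t` stands in for the sequence fine-tuning described above.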

i-Motifs at Neutral pH

Early work on i-motif DNA showed that even though the structures were more stable at acidic pH, they were still present at physiological pH. In 1995, Mergny et al. carried out a broad study of C-rich DNA and showed, using gel electrophoresis and spectroscopic studies, that i-motif structures persist at physiological pH (Mergny et al. 1995). Despite this and other early work, there was skepticism as to whether i-motifs were physiologically relevant. This arguably led to a lack of research in this area outside nanotechnology. Researchers needed more convincing that i-motifs were physiologically relevant. Slowly but surely, more evidence came.

First, Hurley and co-workers showed using plasmid DNA that negative superhelicity folded i-motif structures at pH 7 (Sun and Hurley 2009). Their studies sought to mimic the effect of transcriptionally induced negative superhelicity in the promoter region of the cMYC oncogene, and they incorporated the G-quadruplex/i-motif-forming region into a supercoiled plasmid. Using enzymatic and chemical footprinting, they provided evidence that negative superhelicity facilitates the formation of secondary DNA structures under physiological conditions and that the structures formed are not the same as those formed in single-stranded DNA templates. This was good early evidence that i-motif structures were viable potential drug targets, comparable to G-quadruplex structures. In 2010, Li and co-workers showed that i-motifs can fold at neutral and even slightly alkaline pH at 4 °C (Zhou et al. 2010). They used a mixture of biophysical techniques to examine i-motif-forming sequences from the human telomere as well as the promoter regions of the genes RET and Rb. Their studies indicated that the sequences can form i-motif structure at pH 7.0 and 4 °C. Then, Sugimoto and his team demonstrated that molecular crowding conditions, consistent with those within the cellular environment, stabilized i-motif structures so that they would form under physiological conditions (Rajendran et al. 2010). Their study was the first example in which C-rich triplet repeat DNA sequences were shown to adopt an i-motif structure at neutral pH under molecular crowding. They showed that molecular crowding stabilized the i-motif by altering the pKa of the N3 of cytosine. It was proposed that in the crowded environment the pKa could increase and thus increase the stability of any i-motif structures.
Molecular crowding was already known to accelerate the formation of multistranded alternative DNA structures, but in this case, it was shown that this was also applicable to intramolecular i-motif formation. In 2012, Brazier and co-workers showed that the C-rich sequence from the promoter region of hypoxia-inducible factor 1α (HIF-1α) had a transitional pH of 7.2, even in the absence of additional stabilizing conditions. This was the first example of a “Class I” i-motif (short loops) with high pH stability and the first example of a native promoter i-motif-forming sequence with a transitional pH above pH 7. This demonstrated that, even in the absence of additional stabilizing conditions, some i-motif-forming sequences would naturally be stable at neutral pH. In 2013, our own work showed that silver(I) cations can fold i-motifs at neutral pH (Day et al. 2013). These conditions, although not directly physiologically relevant, were useful as switches and have since been used as conditions for many nanotechnological and self-assembled devices, including logic gates (Shi et al. 2014) and antibacterial gels for wound healing (Bhattacharyya et al. 2019). Following the discovery that silver cations could stabilize i-motifs at neutral pH, other metal cations were screened, and it was found that copper cations could alter the structure of i-motifs (Abdelhamid et al. 2018; Day et al. 2015). Copper(I) cations stabilize a metal-mediated i-motif structure at neutral pH, whereas copper(II) cations have been shown to shift the structure from i-motif to hairpin, even at acidic pH. This has not only revealed that it is possible to change the shape of DNA using redox conditions but also provides conditions to fold i-motifs at neutral pH if required.

In 2017, our own work revisited that of Mergny et al. (1995), who had looked at i-motif stability with cytosine tract lengths of between 2 and 5 cytosines. This work extended the C-tracts involved in i-motifs up to 10 cytosines in length and found that tract lengths of 5 and above resulted in structures that were naturally stable at neutral pH (Wright et al. 2017). Searching the human genome revealed thousands of potentially stable i-motifs, and characterization of a subset of these using spectroscopic methods revealed multiple i-motifs from the human genome that were naturally stable at neutral pH. This showed that the sequence from HIF-1α was not an isolated case and that there were likely many different ways in which i-motif structures could gain stability under neutral conditions. In this case, it is evident that as the stack of CC+ base pairs gets longer, the pKa of the nitrogens within the CC+ base pairs changes. Above five cytosines, two separate structures emerged. This was initially attributed to a mixture of i-motif and hairpin forms, but subsequent work by Vorlíčková and co-workers indicated that, once a sequence gets beyond a certain length, it has enough cytosines to produce two i-motif structures side by side, rather like “beads on a string” (Školáková et al. 2019). Separate studies by Burrows and co-workers also showed that polyC sequences could fold into neutral-stable i-motif structures (Fleming et al. 2017). Their work revealed that the number of CC+ base pairs and their folding affect the overall stability of the structures formed in polyC sequences and that 4n–1 cytosines in a poly-cytosine homopolymer sequence is a “sweet spot” for stable i-motif folding. Further studies indicated that i-motifs with an even number of base pairs in the CC+ base pair stack and a single nucleotide in each loop were the most stable.
An i-motif with an odd number of CC+ base pairs in the stack can also have higher stability if loops 1 and 3 have a single nucleotide and loop 2, the central loop, has three nucleotides (Fleming et al. 2018). Given the number and position of potential sequences across the human genome, these studies cemented the idea that stable i-motifs are abundant throughout the human genome and are biologically relevant. In addition to traditional i-motif structure formation, there are possible alternatives that are also stable at neutral pH. Additional stabilizing interactions can help i-motif formation; for example, it has been demonstrated that a minimal i-motif structure with just two CC+ base pairs can be formed, even at neutral pH, through the formation of additional G:T:G:T tetrads at the top and bottom of the CC+ base pair stack (Escaja et al. 2012). Moreover, it is possible to form AC-motifs if a tract of cytosines is replaced with adenines (Hur et al. 2021). These additional features and potential stabilizing interactions within the loops currently make it difficult to predict i-motif formation, especially when compared to prediction tools for G-quadruplex DNA.
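
A very simple sequence-pattern search illustrates both how candidate i-motif-forming sequences can be located and why such a search is only a first filter. The sketch below looks for four cytosine tracts separated by three loops. The default thresholds (tracts of at least five cytosines, loops of 1 to 19 nucleotides) and the function name are assumptions made for illustration and are not presented as the search criteria of any study cited here. A pattern match says nothing about loop interactions, flanking tetrads, or competing structures, which is the difficulty described above.

```python
import re


def find_putative_imotifs(seq, min_tract=5, min_loop=1, max_loop=19):
    """Return (start, end, match) for four C-tracts separated by three loops.

    All thresholds are illustrative assumptions. Only the given strand is
    scanned; the complementary strand would need a separate pass.
    """
    tract = f"C{{{min_tract},}}"
    loop = f"[ACGT]{{{min_loop},{max_loop}}}?"
    pattern = re.compile(f"{tract}(?:{loop}{tract}){{3}}")
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    long_tracts = "TT" + "CCCCCT" * 3 + "CCCCC" + "GG"
    telomeric = "CCCTAACCCTAACCCTAACCC"
    print(find_putative_imotifs(long_tracts))
    print(find_putative_imotifs(telomeric))               # tracts too short
    print(find_putative_imotifs(telomeric, min_tract=3))  # relaxed threshold
```

Relaxing `min_tract` to 3 picks up the telomere-like repeat that the stricter default misses, which shows how strongly the hit list depends on thresholds that the pattern itself cannot justify.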

i-Motifs in Biology

i-Motif structures were once thought to be unstable at physiological pH, which precluded substantial biological investigation. Work toward the study of i-motifs in biology was limited initially due to the presumption that i-motifs were not

physiologically relevant. This was compounded as researchers focussed on G-quadruplexes instead, and i-motif-focussed researchers had difficulty getting their work published for the same reason. Advances in the field have now displaced most of these concerns, as there is overwhelming evidence that i-motif structures are stable under physiological conditions, have native binding partners, and can affect biology. There have also been studies identifying both DNA and RNA i-motifs in cellulo.

i-Motifs in the Telomeres

Early work by the groups of Hélène and Henderson showed using NMR that four consecutive C stretches of a C-rich telomeric strand can fold into an i-motif (Ahmed et al. 1994; Leroy et al. 1994). Each group hypothesized that there was biological relevance: that i-motifs may be relevant to the formation of G-quadruplexes by complementary G-rich sequences and that i-motif formation could occur in vivo. The evidence came much later, in 2012, when insights into the biomedical effects of carboxyl-modified single-walled carbon nanotubes on telomerase and telomeres were published in Nature Communications (Chen et al. 2012). Qu and co-workers reported that carboxyl-modified single-walled carbon nanotubes inhibit telomerase activity through stabilization of the i-motif structure formed in the human telomere. This was the first example showing that stabilization of i-motif structure can inhibit telomerase activity and interfere with telomere functions in cancer cells. Using a G-quadruplex TRAP assay, they showed that stabilization of the i-motif and the G-quadruplex leads to telomere uncapping and displaces telomere-binding proteins from the telomere. The dysfunctional telomere triggers a DNA damage response and elicits upregulation of p16 and p21 proteins. Their hypothesis was that formation of the i-motif stabilized the opposing G-quadruplex, and this in turn gave rise to the observed biological effects. The work was important in showing the capability of targeting i-motif structures in biology.

i-Motifs in Gene Promoter Regions

Xodo and co-workers first investigated the potential of i-motif structure formation in gene promoter regions, providing evidence for intramolecularly folded i-motif DNA structures in biologically relevant CCC-repeat sequences (Manzini et al. 1994). Alongside the sequence from the human telomere, they also studied a sequence from a critical functional stretch of the promoter of the KRAS proto-oncogene, indicating a larger significance of i-motif structures in the functioning of DNA. Relatively soon after this, other i-motif-forming sequences were studied from other regulatory regions in the human genome, including the promoter regions of c-Myc, c-Kit, PDGF-A, Rb, and RET as well as examples such as those

from the insulin-linked polymorphic region, a minisatellite repetitive region within the promoter region of the insulin gene (INS). The number of i-motifs in the human (and other) genomes is not fully understood, as prediction of i-motif formation is somewhat more challenging than for G-quadruplexes, though there are now plenty of examples of stable i-motif-forming sequences from gene promoters (Wright et al. 2017). Critically, in 2014, two seminal back-to-back papers from Hurley’s lab presented a full picture of the C-rich sequence within the cis-regulatory element in the promoter region of BCL2 (Kang et al. 2014; Kendrick et al. 2014). They showed that the C-rich sequence was highly dynamic and existed in an equilibrium between an i-motif and a flexible hairpin. Using small molecules as probes that shifted the equilibrium toward i-motif (IMC-48) or hairpin (IMC-76), they observed that stabilization of the i-motif structure increased BCL2 gene expression, whereas stabilization of the hairpin decreased it. The companion papers also put forward the hypothesis that the protein hnRNP LL targets the loops of the i-motif to act as an initiating site for transcription. The protein unfolds the i-motif structure into a single strand of DNA, and the ligand probes were shown to have antagonistic effects on the formation of the hnRNP LL–i-motif complex. This was the first example in which i-motifs were implicated as a molecular switch that may control gene expression and that could effectively be targeted with small-molecule ligands. Their careful work supported the hypothesis that loops in i-motif structures may be important regions for protein recognition and that this may enable specific targeting of individual gene promoters, something the G-quadruplex field has found challenging. This work also provided a detailed examination of protein–i-motif DNA interactions.
Even since then, very little has been understood about the types of proteins that interact with i-motifs. Early work showed that some nuclear proteins (hnRNP K protein and ASF/SF2 splicing factor) could bind C-rich DNA, but it was not initially understood whether they bind the i-motif or the single strand (Lacroix et al. 2000). Since then, only hnRNP LL (Kang et al. 2014) and hnRNP K (Shu et al. 2019) have been indicated to bind i-motif-forming DNA sequences from humans, and BmILF those from silkworms (Niu et al. 2017). The scarcity of work in this area results from the need to stabilize most i-motif structures previously described in the literature using acidic pH and the resulting complications with studying proteins under physiological conditions. With the accumulation of examples of i-motifs that fold at neutral pH, this is likely to change and to enable exploration of the effects of modulating DNA structure on transcription (Fig. 5).

i-Motifs in DNA Replication

In 2017, Sugimoto and co-workers described the topological impact of i-motif DNA structures on DNA polymerase. They investigated replication of DNA sequences capable of forming i-motif structures from the human telomere, the promoter region of HIF-1α, and a sequence repeat from the insulin-linked polymorphic region. The

Fig. 5 Schematic overview of the hypothesis of how DNA structures influence transcription. Left: Assumed normal level of gene expression. Right: Stabilization of DNA secondary structures such as G-quadruplexes and i-motifs gives rise to changes in DNA structure, affects the action of the transcriptional machinery, and results in alterations in transcription and gene expression

formation of i-motif resulted in inhibition of replication. When compared alongside G-quadruplex structures, the i-motif was shown to be a better inhibitor of replication than mixed-type G-quadruplexes or hairpin structures, even though all had similar thermodynamic stabilities. These results gave a clear indication that both the stability and the topology of DNA structures within DNA templates can affect DNA polymerase progression. This suggested that i-motif formation may trigger genomic instability by stalling the replication of DNA and in turn may play a role in disease.

i-Motifs in Cells

Papers showing key biological roles of i-motifs in telomere function, transcription, and replication were a step toward showing that i-motifs were biologically relevant and that it was worth learning more about their functions. More direct evidence for the existence of i-motifs in cells came in 2018 in the form of in-cell NMR (Dzatko et al. 2018) and visualization with an i-motif-specific antibody (Zeraati et al. 2018). Trantírek and co-workers provided important support for i-motif formation in cells using state-of-the-art in-cell NMR spectroscopy. They evaluated the stability of DNA i-motifs in the nuclei of living mammalian cells using naturally occurring sequences from the human genome and showed that they were not only stable but persisted in the nuclei of living cells. They revealed that i-motif-forming sequences from the promoter regions of DAP and JAZF1 were actually more stable in cellulo than they were in vitro, supporting earlier work by Sugimoto’s group indicating that the conditions present in the intracellular space result in molecular crowding, which then increases the pH and thermal stability of i-motifs. Christ and co-workers generated and characterized an antibody fragment, named iMab, that selectively recognizes i-motif structures. They used this as a tool to visualize i-motifs in the nuclei of human cells (Zeraati et al. 2018). This work

indicated that in vivo formation of i-motifs is both cell-cycle and pH dependent. Their work also indicated that RNA i-motifs may form in cells, contrary to previous studies suggesting that they are highly unstable and unlikely to form at all in vivo (Lacroix et al. 1996). The highest level of i-motif formation was found to occur in the late G1 phase, which is characterized by high levels of transcription and cellular growth. This is consistent with earlier findings that i-motifs may have a regulatory role in transcription, complementing Hurley’s earlier work on BCL2 (Kang et al. 2014; Kendrick et al. 2014). Remarkably, the timing of i-motif formation during the cell cycle was found to differ from that of G-quadruplex formation, which occurs mostly during the S phase. This corroborates earlier work that suggested that i-motifs and G-quadruplexes are mutually exclusive, i.e., they do not fold at the same time opposite each other in genomic DNA (Dhakal et al. 2012; Cui et al. 2016). Even prior to 2018, the amount of evidence suggesting that i-motifs were biologically relevant was substantial. The in-cell NMR work and the visualization of i-motifs using iMab together provide more proof of their existence in cells. This completes the picture and refutes any suggestion that i-motifs are not physiologically relevant. i-Motif structures exist in cells, are stable under physiological conditions, have native protein binding partners, and can effect biological changes relevant to disease development and treatment. In addition to the evidence from experiments performed in human cells, there have been an increasing number of examples of biologically relevant i-motifs in other organisms and viruses. For example, in yeast, it has been shown that GC-rich motifs associated with meiosis-specific double-strand breaks are able to fold into intramolecular G-quadruplex and i-motif structures, both in vitro and in vivo.
This revealed important relationships between non-B-form DNA structures and Hop1 in meiotic chromosome synapsis and recombination (Kshirsagar et al. 2017). Sara Richter and team, interested in viral genomes, discovered that a dynamic i-motif with a duplex stem loop in the long terminal repeat (LTR) promoter of the HIV-1 proviral genome modulates viral transcription (Ruggiero et al. 2019). They found that hnRNP K silencing resulted in increased LTR promoter activity, confirming the ability of the protein to stabilize the i-motif-forming sequence, which in turn regulates LTR-mediated HIV-1 transcription. These findings demonstrated the complexity of the HIV-1 virus but also provided a foundation from which antivirals could be designed, based on targeting the HIV-1 LTR i-motif. During the global COVID-19 pandemic, many computational studies on the SARS-CoV-2 genome were performed, including one to determine the potential G-quadruplexes and i-motifs in the RNA sequence. Extensive bioinformatic analysis of the SARS-CoV-2 genome and related viruses was performed using an upgraded version of the open-source algorithm G4-iM Grinder. This revealed opportunities to target the G-quadruplex structure, but the putative i-motif-forming sequence investigated in the study did not fold into an i-motif structure. This is unsurprising, given the inherent instability of RNA i-motifs (Belmonte-Reche et al. 2021). Genome-wide characterization of i-motifs and their potential roles in the stability and evolution of transposable elements has also been performed in plants using rice (Ma et al. 2022). This work revealed that potential i-motif-forming

154

Z. A. E. Waller

sequences potentially have intrinsic subgenomic distributions, cis-regulatory functions, and an intricate relationship with DNA methylation. Such unique insights into the biology of i-motifs in plants could well be exploited in plant biotechnology for improving rice crops. There have also been a number of papers that have used the domestic silk moth, Bombyx mori, as a model to study i-motifs in insects. The protein BmILF and i-motif structure are involved in transcriptional regulation of the transcription factor gene BmPOUM2 in Bombyx mori (Niu et al. 2017). In vivo visualization of i-motif DNA secondary structure has been performed in Bombyx mori testis as well as large-scale screening of i-motif-binding compounds. The latter study provided a list of compounds that have potential applications in functional analysis of i-motif structure and in pesticide and drug development through gene transcription regulation by i-motif structure. Although there is a significant amount of literature covering i-motifs from the human genome. There are great opportunities in other organisms, as the work outside humans is scant.

i-Motifs and G-Quadruplexes

With each i-motif-forming sequence, there will always be an opposing G-quadruplex-forming sequence (Brooks et al. 2010). Although the general sequence requirements are similar, the stability of each type of structure differs depending on the underlying sequence of bases. G-quadruplex-forming sequences tend to be more stable when the loops are shorter, forming parallel structures, whereas i-motif-forming sequences need longer loops as they are always antiparallel. This means that a stable G-quadruplex does not necessarily mean a stable i-motif and vice versa. Careful biophysical work by Mao and co-workers has indicated that the two structures are mutually exclusive; i.e., only one structure may be present at a time (Dhakal et al. 2012) unless they are offset in a repetitive sequence. They suggested that this may be due to steric hindrance and would potentially have important, distinct biological consequences. Cell-cycle-specific experiments with iMab have since shown that i-motifs form at a different point in the cell cycle (G1 phase) compared to G-quadruplexes (S phase) (Zeraati et al. 2018). This has also been probed in cells using the i-motif-specific antibody iMab and small molecules that stabilize G-quadruplexes and i-motifs: Smith and co-workers showed that stabilization of G-quadruplexes using small molecules results in fewer i-motifs visualized in cells using iMab (King et al. 2020). The reverse was also observed, with stabilization of i-motifs resulting in fewer G-quadruplexes visualized in cells using the G-quadruplex antibody BG4. This work has important implications: G-quadruplexes and i-motifs have previously been targeted separately, but these results indicate that they should be considered together as important gene regulatory switches. Recently, Smith and co-workers have shown that i-motif formation may result in spontaneous deletions in human cells (Martella et al. 2022). They studied concatemers of d(TCCC) that were first detected through their association with deletions at the RACK7 locus but are also widespread throughout the human genome. They compared both the G-rich and C-rich sequences, capable of forming G-quadruplexes and i-motifs, respectively. They found a correlation between deletion frequency and d(TCCC)n folding at neutral pH, providing strong evidence that i-motif structures are linked to deletions at d(TCCC)n elements in the human genome. Although G-quadruplex formation may contribute to spontaneous mutation at the studied sites, deletions actually require the potential for the i-motif to form and remain unresolved at neutral pH.
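
The strand complementarity described in this section, in which every C-rich i-motif-forming sequence sits opposite a G-rich G-quadruplex-forming sequence, can be illustrated with a minimal Python sketch. The function names and the tract-length threshold of three bases are illustrative assumptions, not taken from the studies cited; the example uses a four-repeat d(TCCC)n concatemer only because that repeat is discussed above.

```python
import re

# Watson-Crick complements for the four DNA bases.
_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the opposing strand, written 5' to 3'."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def count_tracts(seq: str, base: str, min_len: int = 3) -> int:
    """Count runs of `base` at least `min_len` long (hypothetical threshold)."""
    return len(re.findall(f"{base}{{{min_len},}}", seq.upper()))


# Four repeats of d(TCCC); the repeat number is chosen only for illustration.
c_rich = "TCCC" * 4
g_rich = reverse_complement(c_rich)

print(c_rich, "->", g_rich)
print("C-tracts:", count_tracts(c_rich, "C"), "G-tracts:", count_tracts(g_rich, "G"))
```

The sketch shows only that the number of C-tracts on one strand equals the number of G-tracts on the other; it says nothing about whether either structure folds, which, as noted above, depends on loop length, sequence, and conditions.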

i-Motif Ligands and Probes

i-Motifs are a desirable drug design target as they are prevalent in regulatory regions of genes and are linked with the development of different types of diseases from cancer to diabetes. i-Motifs are known to be diverse in the structures they can adopt and are highly responsive to conditions and environmental triggers. i-Motif-binding ligands have previously been reviewed (Day et al. 2014; Sedghi Masoud and Nagasawa 2018). Despite previous advances, specific i-motif-stabilizing compounds are few, and very little is understood about the binding properties and modes of i-motif ligands. This is in part because the structure is understudied, owing to the previous assumption that it was not physiologically relevant, but also because there is not enough structural information known about intramolecular i-motifs, and only a few structures are known from which structure-based drug design can be performed. In terms of sites for ligand binding, the loops are unique to each individual i-motif, enabling us to target a particular i-motif specifically (Kang et al. 2014; Spence et al. 2020), but as loop sequences can also be present in other DNA structures such as G-quadruplexes or hairpins, this may not be enough to ensure specificity to the i-motif structure itself. The top/bottom faces of the stack of CC+ base pairs are relatively small, and molecules that interact in this region will be reliant on stacking interactions, which will result in similar binding interactions with double-helical and G-quadruplex DNA. Traditional small-molecule probes have so far been unsuccessful at achieving effective, specific binding to i-motifs in general. It is possible to generically stabilize i-motif DNA with ellipticine (King et al. 2020), but this molecule does still bind to other DNA structures, even if it does not stabilize them. The narrow grooves are likely to be too narrow for ligands to interact with and stabilize i-motifs. Even the OH groups in RNA i-motifs destabilize i-motifs because of the close juxtaposition of sugar phosphates in this structure (Lacroix et al. 1996). The wide grooves are shallow and could be compared to targeting a protein–protein interaction surface, with all the challenges that come with targeting a flat “featureless” surface. All of this has made targeting i-motifs challenging, especially in the early years when acidic pH was required for the formation of i-motifs in in vitro experiments. Here, we discuss some examples of ligands that target i-motifs.

TMPyP4 and Macrocycles

The first compound identified to interact with i-motif was TMPyP4 (Fig. 6) (Fedoroff et al. 2000). This classical nucleic acid-interacting ligand has been demonstrated to target many different types of structures. Contrary to the intuitive hypothesis that TMPyP4 has a preference for G-quadruplex DNA because of the good overlap with the size of a G-tetrad, its preferred binding modes are to unpaired bases and loops. Initial studies indicated that TMPyP4 can induce i-motif formation; however, the responsiveness and the exact interaction between TMPyP4 and the i-motif depend on the sequences and specific conditions used. TMPyP4 was found to increase the stability of the i-motif from the promoter region of cMYC and was found to block the binding of the transcription factor HNRNPK (Bialis et al. 2007). However, subsequent studies have shown that it can either destabilize or stabilize i-motif structure (Pagano et al. 2018; Abdelhamid et al. 2019), depending on the sequence and pH conditions. Feng and co-workers used TMPyP4 as an i-motif inhibitor probe in BmPOUM2, where it was used to downregulate transcription in the silkworm Bombyx mori (Niu et al. 2017; Yu et al. 2022). TMPyP4 clearly has utility as a probe, but caution should be used, and biophysical characterization of binding against the target of interest should be performed to determine the effects for that particular system. Inspired by telomestatin, Nagasawa and co-workers had previously synthesized macrocycles to mimic the surface area coverage of the G-tetrad in G-quadruplexes. From this, they designed and synthesized a smaller macrocyclic tetraoxazole compound, L2H2-4OTD, with two aminoalkyl side chains (Sedghi Masoud et al. 2018). They studied the interaction of this macrocycle with the i-motif-forming sequence from the human telomere using electrophoretic mobility shift assay, circular dichroism spectroscopy, mass spectrometry, and NMR spectroscopy analyses. They proposed that the macrocycle interacts with a 2:1 stoichiometry, with two molecules bound to the i-motif structure. NMR spectroscopy analysis indicated that the preferred binding site comprised loops 1 and 3 and the surrounding environments. Given TMPyP4’s previous binding preferences, it is likely that the binding mode for this compound is within a similar region.

Fig. 6 Structure of the porphyrin TMPyP4 (right) and tetraoxazole compound, L2H2-4OTD

Carboxyl-Modified Nanotubes and Quantum Dots

Carboxyl-modified single-walled carbon nanotubes (SWNTs) were the first example of an i-motif-selective stabilizing agent (Fig. 7). SWNTs were shown to be able to stabilize i-motif structure at pH 5.5 with a change in melting temperature of +22 °C. They were also found to be able to induce the formation of i-motif in the C-rich sequence from the human telomere and inhibit the formation of duplexes at pH 8.0. Carboxyl-modified multi-walled carbon nanochannels, larger analogues of SWNTs, do not bind i-motif DNA in the same manner and have no impact on the stability of i-motifs, supporting the initial hypothesis that SWNTs target the major groove of the i-motif structure. SWNTs are i-motif selective: they were shown not to bind either duplex or G-quadruplex structures and were shown to be able to inhibit telomerase (see also section “i-Motifs in the Telomeres”), showing that even a large i-motif-interacting agent may have utility as a probe. Carboxyl-modified graphene quantum dots (GQDs) have also been shown to intercalate with DNA and were used to study i-motif-forming sequences from the human telomere and the promoter region of cMYC. At pH 8, when the i-motif-forming sequence is in the unfolded state, addition of GQDs was observed to promote the folding of i-motif structure, as followed by CD spectroscopy and confirmed by NMR spectroscopy, where the characteristic i-motif proton signals between 15 and 16 ppm appear upon addition of GQDs. Further work indicated that the GQDs target the loop regions of the i-motif in an end-stacking mode. Both SWNTs and GQDs are larger than traditional small molecules but offer the opportunity to strongly stabilize i-motifs. Both of these agents are carboxyl modified, suggesting that they may alter the local pH on binding the DNA, which may be why they work even above physiological pH.

Fig. 7 Carboxyl-modified single-walled carbon nanotubes

Small Molecules

Similar to how the first small molecules that were found to target G-quadruplexes were inspired by ligands that bound double-helical DNA, the first i-motif-interacting compounds arose from the study of ligands interacting with G-quadruplex DNA. Initial i-motif-binding compounds were found by studying DNA-binding ligands against G-quadruplex DNA and using i-motif-forming sequences as a control alongside duplex DNA. Examples of phenanthroline-based ligands were studied alongside G-quadruplex structures and were shown to stabilize the i-motif-forming sequence from the human telomere. The compounds were found to change the melting temperature of the i-motif structure by 7–10 °C at pH 5.5 at concentrations of up to 20 μM, and the measured dissociation constants were between 4 and 8 μM. Although the compounds have a weaker affinity for double-helical DNA, they were found to bind to G-quadruplexes with higher affinity. Indeed, there are many studies of the interaction of small molecules with both G-quadruplexes and i-motifs, and many of the early compounds described to bind to i-motifs were actually G-quadruplex ligands rather than i-motif ligands (Day et al. 2014). In 2014, Hurley, Hecht, and co-workers described targeting the i-motif-forming sequence from the promoter region of BCL2 (Kang et al. 2014; Kendrick et al. 2014). Using a FRET-based assay, the team screened a library of compounds and identified two leads: IMC-48 as an i-motif stabilizer (Kd = 0.49 μM) and IMC-76 as an i-motif-destabilizing ligand (Kd = 1.01 μM); see Fig. 8 (Choi and Majima 2011). ¹H NMR showed that the C-rich sequence from BCL2 exists in an equilibrium between i-motif and hairpin, and the pair of ligands was shown to shift the equilibrium to the i-motif (IMC-48) or the hairpin (IMC-76). These ligands were hypothesized to bind within the central loop region of the i-motif, perhaps forming a capping structure within the central loop. Hurley and co-workers tested this hypothesis by creating mutant loop sequences. The study of the effects of the compounds in cells indicated that the i-motif-stabilizing compound resulted in an increase in BCL2 gene expression, whereas the hairpin-stabilizing compound gave rise to a decrease in gene expression. This was the first time that targeting i-motifs was studied in detail, and it provided an example showing that targeting i-motifs may have a different biological effect from targeting G-quadruplexes. Further study with the protein hnRNP LL showed that IMC-48, which stabilizes the i-motif, enabled an increase in protein complex formation. The current hypothesis is that IMC-48 stabilizes the i-motif, and this provides a scaffold from which the loops are available for hnRNP LL to bind, which in turn activates transcription by unfolding the i-motif to the single strand; i.e., the interaction of either IMC-48 or IMC-76 with BCL2 modulates the amount of i-motif available for hnRNP LL to bind, and this controls the amount of transcription at this gene. These compounds, as they target the loop regions, are relatively specific to the BCL2 gene and are not generic i-motif-binding compounds. They revealed a great opportunity for targeting C-rich DNA as well as G-rich promoter sequences. Further studies followed this first example, including examples from the same team targeting PDGFR and KRAS. Other research teams have also examined the effects of small-molecule ligands on promoter activity, including peptidomimetics, also for BCL2 (Debnath et al. 2017), and acridones for targeting cMYC (Shu et al. 2018).

Fig. 8 Hypothesis of ligands IMC-48 and IMC-76 targeting the C-rich region in the promoter of BCL2. The C-rich region is indicated to exist as an equilibrium between hairpin and i-motif. Stabilization of i-motif enables binding of hnRNP LL and gives rise to transcriptional activation. Stabilization of the flexible hairpin results in transcriptional repression
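
To give a feel for what the dissociation constants quoted in this section mean in practice, the following minimal Python sketch evaluates the standard single-site binding isotherm, fraction bound = [L] / (Kd + [L]). It assumes 1:1 binding and a free ligand concentration approximately equal to the total ligand concentration; neither assumption is established for these compounds in the text, and the function name and the 1 μM test concentration are purely illustrative.

```python
def fraction_bound(ligand_uM: float, kd_uM: float) -> float:
    """Fraction of DNA target occupied under a simple 1:1 binding model.

    Assumes a single binding site and ligand in excess, so that the free
    ligand concentration is approximately the total (an assumption).
    """
    if ligand_uM < 0 or kd_uM <= 0:
        raise ValueError("ligand must be >= 0 and Kd must be > 0")
    return ligand_uM / (kd_uM + ligand_uM)


# Kd values as quoted in the text (in micromolar).
KD_UM = {"IMC-48": 0.49, "IMC-76": 1.01}

# Illustrative ligand concentration; not a condition reported in the text.
LIGAND_UM = 1.0

for name, kd in KD_UM.items():
    print(f"{name}: fraction bound at {LIGAND_UM} uM = {fraction_bound(LIGAND_UM, kd):.2f}")
```

Under this model, half of the target is occupied when the ligand concentration equals Kd, so a lower Kd means that less ligand is needed to reach the same occupancy. Real systems with multiple sites or competing structures, such as the i-motif and hairpin equilibrium described above, would need a more detailed treatment.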

Fluorescent Probes

There are a number of ligands for i-motif that may be used as fluorescent probes. Crystal violet and berberine have been shown to bind i-motif DNA (Day et al. 2014; Pagano et al. 2018; Abdelhamid et al. 2019) and have been used in applications in DNA-based logic gates and other light-up effects. We used thiazole orange to develop a displacement-based assay (Sheng et al. 2017), and there are other fluorescent probes that target i-motif DNA in a nonspecific manner. Recent work from my group reported for the first time the interactions of the three isomers of a ruthenium complex, [Ru(bqp)2]2+ (Fig. 9), with i-motif, G-quadruplex, and double-stranded DNA (Spence et al. 2020). Each isomer was found to have vastly different light-switching properties: mer was found to be fluorescent, trans was nonfluorescent, and the cis isomer was found to switch from “off” to “on” in the presence of all types of DNA. Using emission lifetime measurements, we explored the potential of cis to “light up” and identify i-motifs, even when other DNA structures are present, using a sequence from the promoter region of the death-associated protein (DAP). Moreover, separated cis enantiomers revealed Λ-cis to have a preference for the i-motif, whereas Δ-cis has a preference for double-helical DNA. The light-switching mechanism is hypothesized to originate from steric compression and electronic effects in a tight binding site, as opposed to the solvent exclusion previously reported for other ruthenium complexes binding DNA. Our work suggests that many published non-emissive Ru complexes could potentially “switch on” in the presence of biological targets, so in addition to the isomers described, there may be many more potential i-motif “light-up” complexes to discover.

Fig. 9 Structures of the different analogues of [Ru(bqp)2]2+

Synergistic Ligands for i-Motifs and G-Quadruplexes

Mitoxantrone, a clinically used drug and a compound whose analogues have been found to bind G-quadruplex DNA, was found to bind i-motif-forming sequences with a higher affinity compared to double-helical DNA (Wright et al. 2016). It was also shown using FRET melting that mitoxantrone could induce folding of i-motif structures at neutral pH and could stabilize i-motif-forming DNA sequences. However, similar to TMPyP4, mitoxantrone has also displayed differing properties depending on the sequence and conditions used, as further studies using UV and CD melting showed that mitoxantrone in fact destabilized i-motif structures (Pagano et al. 2018; Abdelhamid et al. 2019). This property was also found for other widely used G-quadruplex ligands such as berberine, BRACO-19, Phen-DC3, pyridostatin, RHPS4, and TMPyP4 (Fig. 10). Each of these G-quadruplex-specific ligands was actually found to also interact with i-motif structures, generally leading to their destabilization.

Fig. 10 Structures of ligands that have been studied for their synergistic effects between G-quadruplexes and i-motifs

Crucially, these results have implications both for the search for G-quadruplex-binding compounds and for the interpretation of the effects of compounds reported to have G-quadruplex specificity without examination of their effects on i-motifs. Furthermore, it has been indicated that, in addition to synergistic ligands, i-motif and G-quadruplex structure formation are interdependent in human cells. Smith and co-workers showed that stabilization of G-quadruplexes using small molecules destabilizes i-motifs (King et al. 2020). In the absence of generic i-motif-stabilizing agents, the team used ellipticine, a DNA-intercalating alkaloid. Using biophysical experiments, they showed that although ellipticine is already known to bind G-quadruplex and double-helical DNA, it did not strongly stabilize either of these structures but was found to stabilize i-motif DNA in FRET melting, CD titration, and CD melting experiments. The team used ellipticine as an i-motif-stabilizing agent, and treatment of MCF7 cells with ellipticine at 10 μM resulted in a statistically significant increase in the number of nuclear i-motif foci when visualized with the i-motif-specific antibody iMab. This was the first example of ligand-induced stabilization of i-motifs within the cellular environment. The ligands in Fig. 10 show that there are no perfectly specific ligands, and it is important to consider all potential effects of ligand-DNA interactions and their biological consequences when developing and testing hypotheses. They also demonstrate that ligands do not necessarily need to be perfectly specific to be useful tools in the study of DNA structure and function. However, there is a need for more general i-motif-stabilizing compounds (i.e., ligands that can stabilize multiple types of i-motifs) as well as i-motif ligands that have higher binding affinities. Thus, there is a lot of scope for discovery and development of new and better i-motif-binding ligands in the future.

Conclusion

Interest in and research into i-motif structures are ever-increasing, but several significant gaps remain in the field that need more attention. As described earlier in this chapter, a better structural understanding of i-motif formation is important. Crystal structures of intramolecular i-motifs are so far unknown, and when revealed, they will enable rational structure-based drug discovery. This, in turn, would facilitate and underpin better ligand design for i-motifs and, thus, any downstream applications of interacting agents in nanotechnology or healthcare. The choice of i-motif ligands that are commercially available is limited; improvements in this area would enable researchers to study i-motifs in different ways and open up the field in general. Another area that is ripe for exploration is that of i-motif-binding proteins. There are very few proteins that have been shown to interact with i-motifs; although there is evidence that i-motifs are widespread and likely biologically important, there are few supporting studies on protein–i-motif DNA interactions. Given that there are now plenty of examples of i-motifs that form at neutral pH, there are no longer as many challenges with studying i-motifs at an appropriate pH. With the extensive range of researchers interested in G-quadruplexes, there is much more to do with the complementary strand. There is now plenty of evidence showing not only that i-motifs can be targeted and may play roles in biological functions but also that they may be part of a larger switching complex, in combination with G-quadruplexes. i-Motifs have been shown to be much more than an acidic switch and may play roles in the modulation of gene expression, the ageing process, and the development of disease. With the potential applications of both using and targeting these structures in disease and beyond, and with many topics surrounding i-motifs still unexplored, there are significant opportunities and discoveries still to be made in studying i-motif nucleic acids.

Acknowledgments

Dr. Waller’s research is currently funded by Diabetes UK (18/0005820) and the BBSRC (BB/S008942/1 and BB/W001616/1).

References

Abdelhamid MAS, Waller ZAE (2020) Tricky topology: persistence of folded human Telomeric i-Motif DNA at ambient temperature and neutral pH. Front Chem 8:40
Abdelhamid MA, Fabian L, MacDonald CJ, Cheesman MR, Gates AJ, Waller ZA (2018) Redox-dependent control of i-Motif DNA structure using copper cations. Nucleic Acids Res 46(12):5886–5893
Abdelhamid MAS, Gates AJ, Waller ZAE (2019) Destabilization of i-Motif DNA at neutral pH by G-Quadruplex ligands. Biochemistry 58(4):245–249
Abou Assi H, Garavís M, González C, Damha MJ (2018) i-Motif DNA: structural features and significance to cell biology. Nucleic Acids Res 46(16):8038–8056
Ahmed S, Kintanar A, Henderson E (1994) Human telomeric C-strand tetraplexes. Nat Struct Biol 1(2):83–88
Assi HA, Harkness RWV, Martin-Pintado N, Wilds CJ, Campos-Olivas R, Mittermaier AK, González C, Damha MJ (2016) Stabilization of i-Motif structures by 2′-β-fluorination of DNA. Nucleic Acids Res 44(11):4998–5009
Belmonte-Reche E, Serrano-Chacón I, Gonzalez C, Gallo J, Bañobre-López M (2021) Potential G-quadruplexes and i-Motifs in the SARS-CoV-2. PLoS One 16(6):e0250654
Benabou S, Ferreira R, Aviñó A, González C, Lyonnais S, Solà M, Eritja R, Jaumot J, Gargallo R (2014) Solution equilibria of cytosine- and guanine-rich sequences near the promoter region of the n-myc gene that contain stable hairpins within lateral loops. Biochim Biophys Acta 1840(1):41–52
Bhattacharyya T, Chaudhuri R, Das KS, Mondal R, Mandal S, Dash J (2019) Cytidine-derived hydrogels with tunable antibacterial activities. ACS Appl Bio Mater 2(8):3171–3177
Bialis T, Dexheimer T, Gleason-Guzman M, Yang D, Hurley L (2007) Transcriptional consequences of targeting the i-Motif structure of the c-Myc promoter with TMPyP4. Cancer Res 67(9_Supplement):3169–3169
Brazier JA, Shah A, Brown GD (2012) i-Motif formation in gene promoters: unusually stable formation in sequences complementary to known G-quadruplexes. Chem Commun 48(87):10739–10741
Brooks TA, Kendrick S, Hurley L (2010) Making sense of G-quadruplex and i-Motif functions in oncogene promoters. FEBS J 277(17):3459–3469
Chen Y, Qu K, Zhao C, Wu L, Ren J, Wang J, Qu X (2012) Insights into the biomedical effects of carboxylated single-wall carbon nanotubes on telomerase and telomeres. Nat Commun 3:1074
Choi J, Majima T (2011) Conformational changes of non-B DNA. Chem Soc Rev 40(12):5893–5909
Cui Y, Kong D, Ghimire C, Xu C, Mao H (2016) Mutually exclusive formation of G-Quadruplex and i-Motif is a general phenomenon governed by steric hindrance in duplex DNA. Biochemistry 55(15):2291–2299
Day HA, Huguin C, Waller ZAE (2013) Silver cations fold i-Motif at neutral pH. Chem Commun (Camb) 49(70):7696–7698
Day HA, Pavlou P, Waller ZAE (2014) i-Motif DNA: structure, stability and targeting with ligands. Bioorg Med Chem 22(16):4407–4418
Day HA, Wright EP, MacDonald CJ, Gates AJ, Waller ZAE (2015) Reversible DNA i-Motif to hairpin switching induced by copper(ii) cations. Chem Commun 51(74):14099–14102
Debnath M, Ghosh S, Chauhan A, Paul R, Bhattacharyya K, Dash J (2017) Preferential targeting of i-Motifs and G-quadruplexes by small molecules. Chem Sci 8(11):7448–7456
Debnath M, Fatma K, Dash J (2019) Chemical regulation of DNA i-Motifs for Nanobiotechnology and therapeutics. Angew Chem Int Ed Engl 58(10):2942–2957
Dhakal S, Yu Z, Konik R, Cui Y, Koirala D, Mao H (2012) G-quadruplex and i-Motif are mutually exclusive in ILPR double-stranded DNA. Biophys J 102(11):2575–2584
Dong Y, Yang Z, Liu D (2014) DNA nanotechnology based on i-Motif structures. Acc Chem Res 47(6):1853–1860

Dzatko S, Krafcikova M, Hansel-Hertsch R, Fessl T, Fiala R, Loja T, Krafcik D, Mergny JL, Foldynova-Trantirkova S, Trantirek L (2018) Evaluation of the stability of DNA i-Motifs in the nuclei of living mammalian cells. Angew Chem Int Ed Engl 57(8):2165–2169
Escaja N, Viladoms J, Garavís M, Villasante A, Pedroso E, González C (2012) A minimal i-Motif stabilized by minor groove G:T:G:T tetrads. Nucleic Acids Res 40(22):11737–11747
Fedoroff OY, Rangan A, Chemeris VV, Hurley LH (2000) Cationic porphyrins promote the formation of i-Motif DNA and bind peripherally by a nonintercalative mechanism. Biochemistry 39(49):15083–15090
Fenna CP, Wilkinson VJ, Arnold JRP, Cosstick R, Fisher J (2008) The effect of 2′-fluorine substitutions on DNA i-Motif conformation and stability. Chem Commun 30:3567–3569
Fleming AM, Ding Y, Rogers RA, Zhu J, Zhu J, Burton AD, Carlisle CB, Burrows CJ (2017) 4n–1 is a “sweet spot” in DNA i-Motif folding of 2′-deoxycytidine Homopolymers. J Am Chem Soc 139(13):4682–4689
Fleming AM, Stewart KM, Eyring GM, Ball TE, Burrows CJ (2018) Unraveling the 4n − 1 rule for DNA i-Motif stability: base pairs vs. loop lengths. Org Biomol Chem 16(24):4537–4546
Gao J, Berden G, Rodgers MT, Oomens J (2016) Interaction of Cu+ with cytosine and formation of i-Motif-like C–M+–C complexes: alkali versus coinage metals. Phys Chem Chem Phys 18(10):7269–7277
Gehring K, Leroy J-L, Guéron M (1993) A tetrameric DNA structure with protonated cytosine·cytosine base pairs. Nature 363(6429):561–565
Gilbert DE, Feigon J (1999) Multistranded DNA structures. Curr Opin Struct Biol 9(3):305–314
Gurung SP, Schwarz C, Hall JP, Cardin CJ, Brazier JA (2015) The importance of loop length on the stability of i-Motif structures. Chem Commun 51(26):5630–5632
Hur JH, Kang CY, Lee S, Parveen N, Yu J, Shamim A, Yoo W, Ghosh A, Bae S, Park C-J, Kim KK (2021) AC-motif: a DNA motif containing adenine and cytosine repeat plays a role in gene regulation. Nucleic Acids Res 49(17):10150–10165
Kang HJ, Kendrick S, Hecht SM, Hurley LH (2014) The transcriptional complex between the BCL2 i-Motif and hnRNP LL is a molecular switch for control of gene expression that can be modulated by small molecules. J Am Chem Soc 136(11):4172–4185
Kendrick S, Kang HJ, Alam MP, Madathil MM, Agrawal P, Gokhale V, Yang D, Hecht SM, Hurley LH (2014) The dynamic character of the BCL2 promoter i-Motif provides a mechanism for modulation of gene expression by compounds that bind selectively to the alternative DNA hairpin structure. J Am Chem Soc 136(11):4161–4171
Kim SE, Lee I-B, Hyeon C, Hong S-C (2014) Destabilization of i-Motif by submolar concentrations of a monovalent cation. J Phys Chem B 118(18):4753–4760
King JJ, Irving KL, Evans CW, Chikhale RV, Becker R, Morris CJ, Peña Martinez CD, Schofield P, Christ D, Hurley LH, Waller ZAE, Iyer KS, Smith NM (2020) DNA G-Quadruplex and i-Motif structure formation is interdependent in human cells. J Am Chem Soc 142(49):20600–20604
Krishnan-Ghosh Y, Stephens E, Balasubramanian S (2005) PNA forms an i-Motif. Chem Commun 42:5278–5280
Kshirsagar R, Khan K, Joshi MV, Hosur RV, Muniyappa K (2017) Probing the potential role of non-B DNA structures at yeast meiosis-specific DNA double-strand breaks. Biophys J 112(10):2056–2074
Lacroix L, Mergny JL, Leroy JL, Helene C (1996) Inability of RNA to form the i-Motif: implications for triplex formation. Biochemistry 35(26):8715–8722
Lacroix L, Liénard H, Labourier E, Djavaheri-Mergny M, Lacoste J, Leffers H, Tazi J, Hélène C, Mergny JL (2000) Identification of two human nuclear proteins that recognise the cytosine-rich strand of human telomeres in vitro. Nucleic Acids Res 28(7):1564–1575
Leroy JL, Guéron M, Mergny JL, Hélène C (1994) Intramolecular folding of a fragment of the cytosine-rich strand of telomeric DNA into an i-Motif. Nucleic Acids Res 22(9):1600–1606
Lieblein AL, Krämer M, Dreuw A, Fürtig B, Schwalbe H (2012a) The nature of hydrogen bonds in cytidine···H+···cytidine DNA Base Pairs. Angew Chem Int Ed 51(17):4067–4070

Lieblein AL, Buck J, Schlepckow K, Fürtig B, Schwalbe H (2012b) Time-resolved NMR spectroscopic studies of DNA i-Motif folding reveal kinetic partitioning. Angew Chem Int Ed 51(1):250–253
Liu D, Balasubramanian S (2003) A proton-fuelled DNA nanomachine. Angew Chem Int Ed Engl 42(46):5734–5736
Ma X, Feng Y, Yang Y, Li X, Shi Y, Tao S, Cheng X, Huang J, Wang XE, Chen C, Monchaud D, Zhang W (2022) Genome-wide characterization of i-Motifs and their potential roles in the stability and evolution of transposable elements in rice. Nucleic Acids Res 50(6):3226–3238
Manzini G, Yathindra N, Xodo LE (1994) Evidence for intramolecularly folded i-DNA structures in biologically relevant CCC-repeat sequences. Nucleic Acids Res 22(22):4634–4640
Marsh RE, Bierstedt R, Eichhorn EL (1962) The crystal structure of cytosine-5-acetic acid. Acta Crystallogr 15(4):310–316
Martella M, Pichiorri F, Chikhale RV, Abdelhamid MAS, Waller ZAE, Smith SS (2022) i-Motif formation and spontaneous deletions in human cells. Nucleic Acids Res 50(6):3445–3455
Mergny J-L, Lacroix L (1998) Kinetics and thermodynamics of i-DNA formation: phosphodiester versus modified oligodeoxynucleotides. Nucleic Acids Res 26(21):4797–4803
Mergny JL, Sen D (2019) DNA quadruple helices in nanotechnology. Chem Rev 119(10):6290–6325
Mergny J-L, Lacroix L, Han X, Leroy J-L, Helene C (1995) Intramolecular folding of pyrimidine Oligodeoxynucleotides into an i-DNA motif. J Am Chem Soc 117(35):8887–8898
Niu K, Zhang X, Deng H, Wu F, Ren Y, Xiang H, Zheng S, Liu L, Huang L, Zeng B, Li S, Xia Q, Song Q, Palli SR, Feng Q (2017) BmILF and i-Motif structure are involved in transcriptional regulation of BmPOUM2 in Bombyx mori. Nucleic Acids Res 46(4):1710–1723
Pagano A, Iaccarino N, Abdelhamid MAS, Brancaccio D, Garzarella EU, Di Porzio A, Novellino E, Waller ZAE, Pagano B, Amato J, Randazzo A (2018) Common G-Quadruplex binding agents found to interact with i-Motif-forming DNA: unexpected multi-target-directed compounds. Front Chem 6:281
Phan AT, Guéron M, Leroy J-L (2000) The solution structure and internal motions of a fragment of the cytidine-rich strand of the human telomere. J Mol Biol 299(1):123–144
Rajendran A, Nakano S, Sugimoto N (2010) Molecular crowding of the cosolutes induces an intramolecular i-Motif structure of triplet repeat DNA oligomers at neutral pH. Chem Commun 46(8):1299–1301
Reilly SM, Morgan RK, Brooks TA, Wadkins RM (2015) Effect of interior loop length on the thermal stability and pK(a) of i-Motif DNA. Biochemistry 54(6):1364–1370
Ruggiero E, Lago S, Šket P, Nadai M, Frasson I, Plavec J, Richter SN (2019) A dynamic i-Motif with a duplex stem-loop in the long terminal repeat promoter of the HIV-1 proviral genome modulates viral transcription. Nucleic Acids Res 47(21):11057–11068
Sedghi Masoud S, Nagasawa K (2018) i-Motif-binding ligands and their effects on the structure and biological functions of i-Motif. Chem Pharm Bull 66(12):1091–1103
Sedghi Masoud S, Yamaoki Y, Ma Y, Marchand A, Winnerdy FR, Gabelica V, Phan AT, Katahira M, Nagasawa K (2018) Analysis of interactions between Telomeric i-Motif DNA and a cyclic Tetraoxazole compound. Chembiochem 19(21):2268–2272
Sheng Q, Neaverson JC, Mahmoud T, Stevenson CEM, Matthews SE, Waller ZAE (2017) Identification of new DNA i-Motif binding ligands through a fluorescent intercalator displacement assay. Org Biomol Chem 15(27):5669–5673
Shi Y, Sun H, Xiang J, Chen H, Yang Q, Guan A, Li Q, Yu L, Tang Y (2014) Construction of DNA logic gates utilizing a H+/Ag+ induced i-Motif structure. Chem Commun (Camb) 50(97):15385–15388
Shu B, Cao J, Kuang G, Qiu J, Zhang M, Zhang Y, Wang M, Li X, Kang S, Ou TM, Tan JH, Huang ZS, Li D (2018) Syntheses and evaluation of new acridone derivatives for selective binding of oncogene c-myc promoter i-Motifs in gene transcriptional regulation. Chem Commun (Camb) 54(16):2036–2039


Shu B, Zeng P, Kang S, Li PH, Hu D, Kuang G, Cao J, Li X, Zhang M, An LK, Huang ZS, Li D (2019) Syntheses and evaluation of new Quinoline derivatives for inhibition of hnRNP K in regulating oncogene c-myc transcription. Bioorg Chem 85:1–17 Školáková P, Renčiuk D, Palacký J, Krafčík D, Dvořáková Z, Kejnovská I, Bednářová K, Vorlíčková M (2019) Systematic investigation of sequence requirements for DNA i-Motif formation. Nucleic Acids Res 47(5):2177–2189 Spence P, Fielden J, Waller ZAE (2020) Beyond solvent exclusion: i-Motif detecting capability and an alternative DNA light-switching mechanism in a Ruthenium(II) Polypyridyl complex. J Am Chem Soc 142(32):13856–13866 Sun D, Hurley LH (2009) The importance of negative superhelicity in inducing the formation of G-quadruplex and i-Motif structures in the c-Myc promoter: implications for drug targeting and control of gene expression. J Med Chem 52(9):2863–2874 Surana S, Bhat JM, Koushika SP, Krishnan Y (2011) An autonomous DNA nanomachine maps spatiotemporal pH changes in a multicellular living organism. Nat Commun 2:340 Wang Z-G, Elbaz J, Willner I (2011) DNA machines: bipedal walker and stepper. Nano Lett 11(1): 304–309 Weil J, Min T, Yang C, Wang S, Sutherland C, Sinha N, Kang C (1999) Stabilization of the i-Motif by intramolecular adenine-adenine-thymine base triple in the structure of d(ACCCT). Acta Crystallogr D Biol Crystallogr 55(Pt 2):422–429 Wright EP, Day HA, Ibrahim AM, Kumar J, Boswell LJE, Huguin C, Stevenson CEM, Pors K, Waller ZAE (2016) Mitoxantrone and analogues bind and stabilize i-Motif forming DNA sequences. Sci Rep 6(1):39456 Wright EP, Huppert JL, Waller ZAE (2017) Identification of multiple genomic DNA sequences which form i-Motif structures at neutral pH. Nucleic Acids Res 45(6):2951–2959 Xu C, Zhao C, Ren J, Qu X (2011) pH-controlled reversible drug binding and release using a cytosine-rich hairpin DNA. 
Chem Commun 47(28):8043–8045 Yu G, Niu K, Peng Y, Liu Z, Song Q, Feng Q (2022) Large-scale screening of i-Motif binding compounds in the silkworm, Bombyx mori. Biochem Biophys Res Commun 589:9–15 Zeraati M, Langley DB, Schofield P, Moye AL, Rouet R, Hughes WE, Bryan TM, Dinger ME, Christ D (2018) i-Motif DNA structures are formed in the nuclei of human cells. Nat Chem 10(6):631–637 Zhang Y, El Omari K, Duman R, Liu S, Haider S, Wagner A, Parkinson GN, Wei D (2020) Native de novo structural determinations of non-canonical nucleic acid motifs by X-ray crystallography at long wavelengths. Nucleic Acids Res 48(17):9886–9898 Zheng LL, Li JZ, Li YX, Gao JB, Dong JX, Gao ZF (2021) pH-responsive DNA motif: from rational design to analytical applications. Front Chem 9:732770 Zhou J, Wei C, Jia G, Wang X, Feng Z, Li C (2010) Formation of i-Motif structure at neutral and slightly alkaline pH. Mol BioSyst 6(3):580–586

Part II Structural Chemistry of Nucleic Acids

6

NMR Study on Nucleic Acids Janez Plavec

Contents
Introduction
Elements of the Structural Buildup of Nucleic Acids and Their Conformational Landscape
Assessment of the Folding Topology by NMR
Assessment of Multimeric State by Translational Diffusion Coefficients
Site-Specific Low-Isotopic Enrichment
Nucleobase Substitutions with Nucleobase Analogs
Natural Abundance Heteronuclear Experiments
Resonance Assignment Through Sequential and Interstrand Interactions
Determination of 3D Structure
Labeling with Stable Heteronuclear 15N and 13C Isotopes
NMR Structural Studies in Combination with Complementary Methods
Challenges in Structural Studies of Biologically Relevant DNA and RNA
Dynamic Processes in RNA and Corresponding NMR Methods
Conclusion
References

Abstract

Nucleic acid structures and their interactions with cellular constituents continue to offer surprises despite decades of structural, biophysical, and biochemical studies. Knowledge of the structure and dynamics of nucleic acids is important not only for understanding biological mechanisms, but also for developing new therapeutics. NMR (nuclear magnetic resonance) spectroscopy has been used for many years to determine the structure of nucleic acids as well as their dynamics and interactions with proteins, other nucleic acids, low molecular weight ligands, cations, and solvent molecules. Recent studies use nucleic acids to create new materials or focus on the interactions of small molecule ligands with large entities such as the ribosome, while novel in vivo methods enable probing of RNA structure and proteins that remodel nucleic acid structures, correct for chemical damage to DNA, modulate gene expression by binding to RNAs, etc. 1H NMR experiments allow determination of NOE effects and scalar coupling constants between nearby protons of nucleobases and sugar units. The low proton density of nucleic acids allows for rapid detection and identification of hydrogen bonds, which enable assessment of folding and provide constraints for defining base pair arrangements and assessing secondary structure. Sequential resonance assignment is typically followed by collection of structural restraints and structure determination at high resolution. Conventionally, the structure determination process is based on interpretation of magnetization transfer between protons through space mediated by carbon, nitrogen, and phosphorus atoms. Numerous advances including the introduction of ingenious pulse sequences and refined sample preparation strategies have enabled NMR structure determination of RNAs larger than 100 nt. These advances, coupled with improved workflows that incorporate hybrid methods of structure determination, have pushed the boundaries for studying larger, more complex, and biologically relevant systems into new dimensions. With multidimensional NMR experiments, we can measure the dynamics of constituent nuclei along the entire DNA and RNA structure and characterize functionally important motions that range from picoseconds to seconds and longer.

Keywords

Nuclear Magnetic Resonance · Structure · Dynamics · Interactions · Folding

J. Plavec (*)
Slovenian NMR Centre, National Institute of Chemistry, Ljubljana, Slovenia
Faculty of Chemistry and Chemical Technology, University of Ljubljana, Ljubljana, Slovenia
EN-FIST Center of Excellence, Ljubljana, Slovenia
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2023
N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_8

Introduction

It is becoming increasingly clear in the broader scientific community that structural information with atomistic detail provides a different level of insight than nucleotide sequence alone. In the search for sequence-structure-function relationships, high-resolution NMR spectroscopy has proven extremely useful. Of course, results related to equilibrium populations and dynamics between different polymorphic structures need to be complemented by other spectroscopic techniques and biophysical methods of structural characterization such as X-ray crystallography and cryo-electron microscopy. In particular, NMR spectroscopy provides dynamic information in solution, making it an excellent method to study DNA and RNA and their interactions with small molecule ligands and macromolecules with high-resolution structural insight. Since the determination of the B-form DNA duplex structure several decades ago, DNA has continued to surprise with new and polymorphic structures. Apart from its enormous biological impact, DNA is used as a material for the creation of novel nanodevices that rely on building complex higher-order shapes and nano-objects.


It is the ability of multiple strands to assemble that allows the construction of complex higher-order DNA structures. Uncommon structures of DNA have been established recently, together with diverse interactions with its ligands. New G-quadruplex structures continue to emerge and provide insight into their biological roles in the context of telomere function and other regulatory roles. Complexes with G-quadruplex-binding small molecules are promising for the modulation of telomere capping and therefore protection against exonucleases. Other ligand complexes with DNA include structures of minor-groove binding molecules that recognize the minor groove in a sequence-specific way. Duplex DNA continues to exhibit polymorphism, and certain sequences have the propensity to form unusual base pairs, which may facilitate sequence-specific protein recognition. In addition, a number of structures have emerged of chemically modified DNA, which is important for understanding the effects of mutagenic lesions and how they are recognized by the DNA repair machinery. An increasing list of nonnatural DNA and RNA analogs includes both non-native backbone and nonstandard nucleobases. NMR spectroscopy is a unique method to screen for formation of nonstandard structural elements and, if quality of a sample allows, establish local details and stability as well as dynamic behavior of noncanonical helical-type structures with single-residue resolution. It is possible to unambiguously identify hydrogen-bonding schemes in G-C and A-T Watson–Crick base pairs and non-Watson–Crick base pairs that involve imino to nitrogen hydrogen bonds, such as reverse A-T Hoogsteen and head-to-head G-A base pairs. Traditionally, hydrogen bonding schemes were identified based on their characteristic nuclear Overhauser effect (NOE) patterns. NMR has proven itself as an essential tool in studies of nucleic acids. 
Motivations to study DNA and RNA oligonucleotides by NMR range from simple resonance assignment for the study of interactions with ligands, through determination of topology from sequential assignment, to full 3D structure determination. RNAs play essential roles in a diverse range of biological processes that are not limited to their established capacity as messengers of genetic information from DNA to protein: noncoding RNAs are being recognized in a variety of processes including transcription and translation of the genetic code, epigenetic modulation, and RNA turnover/decay. Additionally, RNAs can act as ribozymes, riboswitches, viral genomes, and micro-RNAs. Importantly, the three-dimensional structure of a DNA or RNA molecule is a crucial factor in specificity and function.

In NMR spectroscopy, the chemical shift is the resonance frequency of an atomic nucleus relative to a standard in the magnetic field of the spectrometer used. An atomic nucleus possessing a magnetic moment (nuclear spin) gives rise to different energy levels and resonance frequencies in a magnetic field. Importantly, however, the total magnetic field experienced by a nucleus includes local magnetic fields induced by currents of electrons in the molecular orbitals (note that electrons have a magnetic moment themselves). The electron distribution around the same type of nucleus (e.g., 1H, 13C, 15N, and 31P) usually varies according to the local geometry (bond lengths, angles between bonds, orientation of substituent groups in a rotamer, binding partners, and so on), and with it the local magnetic field at each nucleus. This is reflected in the spin energy levels and resonance frequencies. The variations of NMR frequencies of the same kind of nucleus, due to variations in the electron distribution, are called the chemical shift. Often the position and number of signals with distinct chemical shifts are diagnostic of the structure of a molecule. In this respect, perusal of a 1D NMR spectrum gives (preliminary) insight into the quality of the oligonucleotide sample under study, its folding state, and structural heterogeneity. A better insight is obtained by acquiring a 2D NMR spectrum.

Several high-resolution DNA and RNA structures have been determined using solution-state NMR and their coordinates deposited in the PDB database. In addition, NMR studies have offered insight into the stability of individual structural elements, dynamic aspects, and intermolecular interactions. In principle, analyses of NMR spectra of oligonucleotides benefit from their low proton density, which enables immediate recognition of nucleobases involved in complementary base pairs. Canonical base pairs, including those involved in Watson-Crick pairing, typically constitute helical stem regions of a structure. Recognition of secondary structural elements is followed, if the quality of the data permits, by a more detailed and comprehensive analysis that leads to determination of the 3D structure. This procedure is more time demanding, as it requires assignment of individual sugar moieties as well as of the phosphate groups covalently bound at their 3′ and 5′ ends. The structure determination process is traditionally based on NOE measurements for short-range distance restraints and scalar coupling measurements for torsion angle restraints. For any application of NMR, assignment of individual resonances is a fundamental process on which the quality of the interpretation depends. Chemical shifts are a fundamental indicator of molecular structure, and resolution on the chemical shift scale is at the heart of all NMR studies.
However, chemical shift assignment remains a tedious and often rate-limiting step in determining 3D structure and characterizing biomolecular dynamics and ligand interactions. Great efforts are being made to develop and improve methods for predicting chemical shifts of nuclei in oligonucleotides. Robust prediction, coupled with linked interactive data analysis software and AI, can significantly reduce the time required to assign chemical shifts. However, the accuracy of these algorithms depends on a robust database of chemical shifts representing a wide variety of sequence and structural motifs, and the small number of such databases limits the accuracy of the available tools.

Methods of assigning proton NMR spectra of duplex DNA and RNA have been known for some time. Proton NMR experiments on duplex DNA allow determination of NOE effects and scalar coupling constants between nearby protons of nucleobase and sugar moieties. The original strategies rely on the use of NOE contacts between nonexchangeable protons in a regular B-form duplex. In this chapter, which builds on earlier contributions (Adrian et al. 2012; Plavec 2012; Webba da Silva 2007), we review approaches and challenges of NMR studies of DNA beyond double-stranded structures. NMR methods include simple resonance assignment for determining topology, which is typically followed by sequential assignment and structure determination at high resolution. The low proton density of nucleic acids allows for quick recognition and identification of hydrogen bonds, which enable evaluation of folding and offer restraints for defining alignments of base pairs. Complete structure determination requires assignment of the sugar–phosphate backbone.


Conventionally, the structure determination process is based on the interpretation of through-space magnetization transfer between protons that is mediated through carbon, nitrogen, and phosphorus atoms. The power and limitations of NOE-derived information lie in the fact that the NOE is a function of many factors, including motion, relative orientation, and mutual effects, in addition to the internuclear distance of a pair of protons. The scalar coupling constants are weighted averages over all conformations that DNA and RNA can adopt. In general, this limits the conversion of NOE and coupling constant information into a single structure of a duplex DNA or any other form of DNA and RNA in solution.
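The averaging described above can be illustrated with a short Python sketch. It assumes fast exchange between a small set of conformers, a simple population-weighted mean for scalar couplings, and r^-6 averaging for NOE-derived distances (other averaging regimes exist). The function names, populations, couplings, and distances are hypothetical values chosen for illustration, not data from this chapter.

```python
def averaged_coupling(populations, couplings_hz):
    """Population-weighted mean of scalar couplings (fast-exchange assumption)."""
    total = sum(populations)
    return sum(p * j for p, j in zip(populations, couplings_hz)) / total


def noe_effective_distance(populations, distances_angstrom):
    """Effective distance obtained from the population-weighted mean of r^-6."""
    total = sum(populations)
    mean_inv6 = sum(p * r ** -6 for p, r in zip(populations, distances_angstrom)) / total
    return mean_inv6 ** (-1.0 / 6.0)


# Hypothetical two-state example: 70 % conformer A, 30 % conformer B.
populations = [0.7, 0.3]
j_obs = averaged_coupling(populations, [1.0, 8.0])        # couplings in Hz
r_eff = noe_effective_distance(populations, [4.0, 2.5])   # distances in angstrom
r_mean = 0.7 * 4.0 + 0.3 * 2.5                            # plain average, for comparison

print(f"observed J = {j_obs:.2f} Hz")
print(f"NOE-effective distance = {r_eff:.2f} A, arithmetic mean = {r_mean:.2f} A")
```

In this sketch the NOE-effective distance is pulled toward the shorter of the two distances, so no single conformer reproduces both the averaged coupling and the averaged NOE. This is the sense in which the averaged observables cannot simply be converted into one structure.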

Elements of the Structural Buildup of Nucleic Acids and Their Conformational Landscape

In terminological terms, DNA and RNA are polymers made up of repeating units, nucleotides, comprising three components (Fig. 1): a sugar (2′-β-D-deoxyribose or β-D-ribose), a phosphodiester group, and one of the heterocyclic bases (guanine, G; cytosine, C; adenine, A; and thymine, T, or uracil, U). The glycosidic bond links a (deoxy)ribose sugar and a nucleobase, being the C1′-N9 bond for purine and the C1′-N1 bond for pyrimidine nucleotides. Glycosidic torsion angles define the relative orientation of the O4′-C1′ bond with respect to N9-C4 for purines and N1-C2 for pyrimidines. The torsion angle χ around the glycosidic bond is

Fig. 1 Chemical structure of nucleos(t)ides: (a) Schematic presentation of 2′-deoxyribose with a phosphate group connected to the C5′ and a nucleobase to the C1′ in β-D-configuration; (b) chemical structure of the five common purine and pyrimidine bases found in DNA and RNA


Fig. 2 Rotation along the glycosidic bond: anti-syn equilibrium of 2′-deoxyguanosine. The two orientations are defined quantitatively with the use of torsion angle χ[O4′-C1′-N9-C4], which adopts values around 180° in anti and 0° in syn conformation (anti in panel A and syn in panel B)

usually confined to two low-energy regions. The anti conformation has the N1, C2 face of purines and the C2, N3 face of pyrimidines directed away from the sugar ring so that the hydrogen atoms attached to C8 of purines and C6 of pyrimidines are lying over the sugar ring (Fig. 2). In Watson-Crick hydrogen-bonded pairs, exocyclic substituents on the purine and pyrimidine rings are directed away from the sugar ring. These orientations are reversed for the syn conformation, with the hydrogen-bonding groups oriented toward the sugar. In particular, the O5′ atom can then form additional hydrogen bonds (e.g., between O5′ and N3). The five-membered sugar ring is for steric reasons not planar (Altona and Sundaralingam 1972). The pentofuranosyl moieties of natural nucleosides and nucleotides adopt a variety of distinct puckered conformations. There are five internal sugar torsions (i.e., ν0, . . ., ν4) that unequivocally define a specific conformation on a pseudorotational circle in terms of the phase angle of pseudorotation, P, and the maximum puckering amplitude, Ψm. Extensive studies by several methods including NMR spectroscopy have established that the sugar moieties of nucleos(t)ides are in solution involved in a two-state conformational equilibrium between two distinctly identifiable North (N) and South (S) conformations (Fig. 3). The two-state N ⇌ S pseudorotational equilibrium is controlled by the competing anomeric and gauche effects (Plavec et al. 1993). The 2′-OH and 3′-OH groups have been demonstrated to drive the N ⇌ S pseudorotational equilibrium through the


Fig. 3 Pseudorotational wheel with 20 distinct twist (T) and envelope (E) conformations in the North (P ≈ 0°), East (P ≈ 90°), South (P ≈ 180°), and West (P ≈ 270°) regions. P refers to the phase angle of pseudorotation. The shaded sections indicate preferred regions in the North and South regions of conformational space with the representative C3′-endo (3E) and C2′-endo (2E) conformers shown on the sides, respectively

tendency to adopt a gauche arrangement of C2′-O2′ and C3′-O3′ bonds with C1′-O4′ and C4′-O4′ bonds, respectively. The heterocyclic base in N-nucleosides drives the two-state N ⇌ S pseudorotational equilibrium of the constituent β-D-pentofuranosyl moieties by the anomeric effect (i.e., electronic interaction between one of the lone-pair orbitals of O4′ and the σ* orbital of the glycosidic bond), which places the aglycone in the pseudoaxial orientation (i.e., N-type conformation). The hypothesis of the two-state model in solution has been experimentally evidenced by the NMR observations of two distinctly identifiable and dynamically interconverting N and S conformations of the sugar moieties in oligonucleotides, as in B ⇌ Z DNA, A ⇌ Z RNA, or A-form ⇌ B-form lariat RNA transitions. Assuming that the two-state model is valid, it has been shown that the conformation of the sugar in nucleos(t)ides is driven by a balance of stereoelectronic and steric effects (Plavec et al. 1993). The N to S (and vice versa) interconversion is fast on the NMR timescale and results in time-averaged coupling constants and chemical shifts of sugar moieties in nucleosides. The N ⇌ S pseudorotational equilibrium of the sugar moiety is in solution energetically controlled by various competing stereoelectronic effects which are determined by the stereochemical position of the


Fig. 4 Conformation along the sugar-phosphate backbone is defined with six torsion angles: α [O3′-P-O5′-C5′], β [P-O5′-C5′-C4′], γ [O5′-C5′-C4′-C3′], δ [C5′-C4′-C3′-O3′], ε [C4′-C3′-O3′-P], and ζ [C3′-O3′-P-O5′]

heterocyclic base, (4′-)CH2OH group, and other substituents on the alpha or beta face of the sugar moiety, as well as by the entropy of the system (Plavec et al. 1996). The sugar pucker is closely related to whether the helix will exist in the A-form or in the B-form. In the β-D-ribofuranose, the plane C1′-O4′-C4′ is fixed. Endo-pucker means that C2′ or C3′ is turned out of this plane in the direction of O5′. Exo-pucker describes a shift in the opposite direction. C3′-endo (N) and C2′-endo (S) pseudorotamers are involved in dynamic equilibrium. DNA structures exhibit predominantly conformations in the S region of the pseudorotational circle, with C2′-endo being the representative conformer. In RNA, we find predominantly the C3′-endo conformation. Both DNA and RNA may adjust and are able to take up both conformations. There are pronounced correlations between sugar pucker and glycosidic angle, which reflect the changes in nonbonded interactions produced by C2′-endo versus C3′-endo puckers. The phosphodiester backbone of an oligonucleotide has six variable torsion angles per residue, which are designated α to ζ (Fig. 4). Steric considerations alone dictate that the backbone angles are restricted to discrete ranges. Several torsion angles have highly correlated values and therefore exhibit correlated motions in a solution environment. This restricts their conformational space and does not allow the individual torsion angles to adopt any value between 0° and 360°. Nevertheless, the fact that the α, β, γ, and ζ torsion angles each have three allowed ranges, together with the two staggered regions for ε, leads to a large number of possible low-energy conformations for a nucleotide residue. From the NMR spectroscopy point of view, intranucleotide proton-proton distances depend on the ν0-ν4, χ, and γ torsion angles. A description of the conformation of an individual nucleotide strand is best given with the use of helix parameters.
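The pseudorotation description of the sugar pucker used in this section can be sketched numerically. The Python snippet below assumes the commonly used relation in which the five endocyclic torsions follow ν_j = ν_max · cos(P + 144°·(j − 2)), so that tan P = [(ν4 + ν1) − (ν3 + ν0)] / [2 ν2 (sin 36° + sin 72°)]. The function names are hypothetical, the example torsions are generated from the same model rather than measured, and the N/S split by hemisphere is a simplification of the preferred regions shown in Fig. 3.

```python
import math

# Constant that appears in the pseudorotation relation: sin 36 deg + sin 72 deg
_S = math.sin(math.radians(36.0)) + math.sin(math.radians(72.0))


def torsions_from_pseudorotation(p_deg, nu_max):
    """Endocyclic torsions (nu0..nu4, degrees) for a given phase angle and amplitude."""
    return tuple(
        nu_max * math.cos(math.radians(p_deg + 144.0 * (j - 2))) for j in range(5)
    )


def pseudorotation(nu):
    """Phase angle P (0-360 deg) and puckering amplitude from torsions nu0..nu4."""
    nu0, nu1, nu2, nu3, nu4 = nu
    num = (nu4 + nu1) - (nu3 + nu0)
    den = 2.0 * nu2 * _S
    p_deg = math.degrees(math.atan2(num, den)) % 360.0
    nu_max = math.hypot(num, den) / (2.0 * _S)  # avoids dividing by cos(P) near 90 deg
    return p_deg, nu_max


def hemisphere(p_deg):
    """Crude N/S label: northern half of the wheel if cos(P) >= 0, otherwise southern."""
    return "N" if math.cos(math.radians(p_deg)) >= 0.0 else "S"


# Illustrative round trip for a C3'-endo-like (P = 18 deg) and a C2'-endo-like (P = 162 deg) sugar.
for p_in in (18.0, 162.0):
    p_out, amp = pseudorotation(torsions_from_pseudorotation(p_in, 38.0))
    print(f"P = {p_out:6.1f} deg, amplitude = {amp:4.1f} deg, type = {hemisphere(p_out)}")
```

Using `atan2` instead of a plain arctangent places P in the correct quadrant without a separate sign check on ν2.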
The handedness describes the sense of the helix. The pitch relates the number of nucleobases per turn (n) and the height (h) per base along the helix axis. The twist per base pair is equal to 360°/n. Typical double helical structures such as B- and


Fig. 5 Watson-Crick A-T and G-C base pairing

A-type DNA and RNA are right-handed. Both helices contain Watson-Crick base pairs (Fig. 5). On the other hand, left-handed Z-DNA is unique in that it contains a dinucleotide repeat unit. The base pairs are not centered on the helix axis but are displaced from it to a variable extent. Because of this axis displacement and the fact that the glycosidic bonds branch off from one side of the base pairs, the outer envelope of the double helices is not cylindrically smooth but displays two grooves, which have variable width and depth in the different forms. Different functionalities of base pairs are exposed at the minor and major groove edges. For the A-T base pair, the major groove edge contains the protons AH8, AC6-NH2, TC5-Me, and TH6, whereas A2H and T3NH are in the minor groove (Fig. 5). For the G-C base pair in Watson-Crick orientation, GH8, CC4-NH2, CH5, and CH6 protons are in the major groove, and GC2-NH2 is in the minor groove. It should be noted here that the canonical Watson-Crick base pairs in the DNA double helix exist in dynamic equilibrium with short-lived (6 Hz) and weak J2″3′ cross-peaks is observed, the sugar pucker geometries are predominantly populated by the S-type conformation. Evaluation of, and restraints on, the conformation along the sugar-phosphate backbone contribute to the precision and accuracy of 3D structure calculation. The 3JH3′P coupling constant (across the C3′–O3′ bond, ε torsion angle) and the 3J4′5′ and 3J4′5″ coupling constants (across the C4′–C5′ bond, γ torsion angle) are derived from 1H–31P HSQC and 1H–31P COSY-type experiments. A 3JH3′P coupling constant of over 5 Hz results in ε being involved in equilibrium between trans and gauche− conformations. If the 3JH3′P coupling constant was greater than 15 Hz, the torsion angle ε could be assigned to the gauche+/trans region.
For residues with C2′-endo sugar puckers and 31P signals resonating within the normal 1 ppm range, both α[O3′–P–O5′–C5′] and ζ[C3′–O3′–P–O5′] torsion angles can be excluded from the trans domain. Information on phosphate backbone conformations is of crucial relevance for DNA and RNA structure (Lankhorst et al. 1985; Schwalbe et al. 1994; Webba da Silva 2007). Inclusion of restraints for backbone torsion angles in structure calculation protocols has not been widespread. For derivation of the 3JH3′P coupling constant (across the C3′–O3′ bond, ε torsion angle) and the 3J4′5′ and 3J4′5″ coupling constants (across the C4′–C5′ bond, γ torsion angle), it is convenient to use 1H–31P HSQC- and 1H–31P COSY-type experiments (Fig. 18) (Hoogstraten and Pardi 1998; Szyperski et al. 1997). A 3JH3′P coupling constant of over 5 Hz results in ε being involved in equilibrium between trans and gauche conformations. A series of heteronuclear INEPT transfer delays in a 1H–31P HSQC was used to determine scalar couplings from 7 to 25 Hz, which were used to assign the torsion angle ε based on the range of H3′(n)–P(n + 1) coupling constants. For 3JH3′P coupling constants greater than 15 Hz, the torsion angle ε was assigned to the gauche+/trans region. A 3JH3′P coupling constant smaller than 10 Hz indicates that ε does not adopt the gauche+ region. For C2′-endo sugar puckers, the ability to detect 4J4′P (>5 Hz) indicates that γ torsion angles are in the gauche+ domain. Absence of 3J5′/5″P, representing a coupling smaller than 5 Hz, indicates that β is in the trans conformation (Ippel et al. 1996; Tisne et al. 1996). Various methods have been developed to measure backbone coupling constants in 13C-labeled and unlabeled molecules. The implementation of the CT-HMQC-J experiment was demonstrated on a 13C-labeled sample of the quadruplex-forming sequence d(GGAGGAT)4. The 3J3′(i)P(i+1) and 3J5′(i)P(i) couplings were derived from the intensity difference of 1H–13C cross-peaks in the presence and absence of the proton–phosphorus coupling interaction during the constant-time period in the HMQC experiment.
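The link between a three-bond coupling constant and a torsion angle that underlies these assignments can be sketched with a generic Karplus-type relation, J(θ) = A cos²θ + B cos θ + C, where θ is the torsion between the two coupled nuclei. In the Python sketch below the function name is hypothetical, and the default coefficients (A = 15.3, B = −6.1, C = 1.6 Hz) are a parameterization often quoted for H–C–O–P couplings; they are an assumption here and should be checked against the primary literature before any quantitative use.

```python
import math


def karplus(theta_deg, a=15.3, b=-6.1, c=1.6):
    """Generic Karplus relation J = A*cos^2(theta) + B*cos(theta) + C, in Hz.

    theta_deg is the torsion between the coupled nuclei (e.g. H-C-O-P).
    The default coefficients are assumed illustrative values, not chapter data.
    """
    cos_t = math.cos(math.radians(theta_deg))
    return a * cos_t * cos_t + b * cos_t + c


# Couplings predicted by this parameterization for idealized staggered torsions.
for label, theta in (("gauche+", 60.0), ("trans", 180.0), ("gauche-", -60.0)):
    print(f"H-C-O-P torsion {label:8s} ({theta:6.1f} deg): J = {karplus(theta):5.2f} Hz")
```

With these assumed coefficients, a proton that is gauche to phosphorus gives a coupling of only a few hertz, while an antiperiplanar proton gives a coupling above 20 Hz. This is consistent with reading a 3J5′/5″P coupling below 5 Hz as β in the trans conformation, because both H5′ and H5″ are then gauche to phosphorus.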

Labeling with Stable Heteronuclear 15N and 13C Isotopes

Unlike 1H nuclei with a chemical shift span of 0–15 ppm, 13C nuclei in RNA have chemical shifts from 61 ppm (C5′ of ribose) to 170 ppm (nonprotonated pyrimidine nucleobase C4), while the 15N chemical shift range spans from 70 ppm (amino nitrogen) to 240 ppm (nonprotonated purine nucleobase N7) (Wijmenga and van Buuren 1998). Notably, carbon and nitrogen atoms are distributed in the major and minor grooves of nucleic acids, which are interaction sites. The ribose 13C carbon atoms are much better resolved than 1H, and 13C direct detection can contribute toward increasing the number of resonance assignments using 3D experiments such as (H)CC-TOCSY, (H)CPC-, and (H)CPC-CCH-TOCSY (Marino et al. 1995). Assignments of C2′ and C3′ chemical shifts can also be used to assess the predominant ribose conformation. Given its wider chemical shift dispersion and narrower linewidths compared with 13C and 1H nuclei, 15N is better suited to monitoring groove interactions. The availability of appropriate cryogenic NMR probes on high magnetic field spectrometers enables detection of low-γ nuclei (15N), thus opening new avenues (Schnieders et al. 2020) that, together with isotope labeling technologies and tailored NMR experiments, enable studies of large RNAs by NMR (Dayie et al. 2022). For the study of RNA oligonucleotides with more than 25 nucleotides, labeling with 13C and 15N isotopes is required. Sequential assignment is based on heteronuclear double and triple resonance experiments. The basic strategy combines information from two- and three-dimensional heteronuclear through-bond correlation experiments with sequential NOE connectivities. Isotopic labeling enables 13C direct detection experiments that can contribute to a complete resonance assignment of nucleobases, including their quaternary carbon or tertiary nitrogen atoms.
NMR experiments may exploit 1H-excited-13C-detected experiments, or alternatively 13C-excited and 13C-detected experiments.


Structure determination of RNAs longer than 50–70 nt, where spectral overlap and rapid relaxation are limiting, follows the “divide and conquer” approach (Barnwal et al. 2017). Spectral overlap, particularly in the sugar region, makes resonance assignment difficult. Selective nucleobase isotope labeling by residue or resonance type leads to simpler NMR spectra and requires only one of many labeled nucleotide triphosphates (NTPs). Multiple residue-specific 13C- and 15N-labeled and specifically deuterated nucleotide-labeled samples have facilitated structural studies of RNAs larger than 50 nt. This approach can involve the preparation of RNA with site-specific (H6/H8, H1′, H2′, 2H3′, 2H4′, 2H5′/2H5″, and 2H5) ribose and nucleobase deuteration, which simplifies spectra and significantly reduces dipolar relaxation. The largest overlap in RNA spectra occurs for ribose protons. The H2′ to H5″ protons resonate between δ 4 and 5 ppm and their carbons at δ 62–85 ppm. Deuteration of H3′, H4′, and H5′/5″ in the sugar and of the H5 positions in C/U can be performed using commercially available NTPs. Another challenge in studying larger RNAs is slow molecular tumbling (leading to efficient dipolar 1H-13C relaxation), which increases proton and carbon line widths and makes it difficult to maintain coherence. Long extended structures such as stem loops often do not have interactions between individual segments, and even complex structures with multiple stem-loop fragments do not always have interactions between individual stem loops. This can be determined experimentally by comparing NMR spectra of isolated fragments or by looking at linewidths and relaxation rates (in the absence of long-range interactions, domains tumble independently, leading to inefficient relaxation and sharp NMR lines). In this case, there is no reason to pursue any of the much more sophisticated and less efficient labeling strategies.
The NMR method for structure determination relies on distance restraints derived from NOEs between nearby hydrogen atoms. NOEs are the essential NMR data for defining the secondary and tertiary structure of an oligonucleotide because they connect pairs of hydrogen atoms that are separated by less than ca. 5 Å in space but can be far apart in the primary sequence. In principle, all hydrogen atoms of an oligonucleotide form a single network of spins that are coupled by dipole-dipole interactions. Magnetization can be transferred from one spin to another not only directly but also indirectly via other spins in the vicinity (spin diffusion). The approximation of isolated spin pairs is valid only for very short mixing times in the NOESY experiment. However, the mixing time cannot be made arbitrarily short because, in this regime, the intensity of an NOE is proportional to the mixing time, so very short mixing times give weak cross-peaks. The quantification of an NOE amounts to determining the volume of the corresponding cross-peak in a NOESY spectrum. Since the linewidths can vary appreciably for different resonances, cross-peak volumes are determined by integration over the peak area rather than by measuring peak heights. While integration might be straightforward for resolved cross-peaks, deconvolution methods are needed for overlapping cross-peaks. Fortunately, because the distance depends on the inverse sixth root of the volume, the relative error of the distance estimate is only about one-sixth of the relative error of the volume determination. Vicinal 3JHH and 3JHX coupling constants are related to H-X-X-H and H-X-X-X torsion angles (X being a heteronucleus such as C, N, or P). An appropriate Karplus equation is selected for the analysis and interpretation of experimental coupling
constants. In contrast to NOEs, scalar coupling constants give information only on the local conformation of the sugar-phosphate backbone. They are nevertheless important to accurately define the local conformation, to obtain stereospecific assignments for diastereotopic protons, and to detect torsion angles that occur in multiple states. Hydrogen atoms involved in hydrogen bonds exchange slowly with the solvent. As the acceptor oxygen or nitrogen atom cannot be identified directly by NMR, one has to rely on NOEs in the vicinity of the postulated hydrogen bond or on assumptions about regular secondary structure to define the acceptor. The standard hydrogen bonds in regular canonical geometry can be identified with much higher certainty than hydrogen bonds with amino or other groups. Hydrogen bond restraints are introduced into the structure calculation as distance restraints, typically by restraining the distance between the acceptor and the hydrogen atom and the distance between the acceptor and the atom to which the hydrogen atom is bound. Chemical shifts are very sensitive probes of the molecular environment of a nucleus. However, their dependence on structure is complicated and not yet understood to the point where reliable conformational constraints could be derived. A quantitative approach to defining and refining details of 3D structure relies on satisfying distance restraints between different hydrogen atoms, hydrogen bonding, and spin-spin coupling constants, which is usually done through molecular model calculations. A refinement procedure usually starts with a trial structure with suitable topology that is subjected to restrained least-squares refinement. NOE enhancements are interpreted in terms of relatively accurate proton-proton distances, which are used as restraints in the refinement procedure. The inherently low proton density of nucleic acids and the poor dispersion of their chemical shifts pose challenges for solution-state NMR studies.
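The conversion of NOESY cross-peak volumes into distances under the isolated spin-pair approximation can be sketched as follows. This is a minimal illustration, not the procedure used in any particular study: it assumes that the cross-peak volume scales as r^-6 and that a proton pair of fixed, known separation is available for calibration. The reference distance (a cytosine H5-H6 pair at roughly 2.45 Å) and all volumes are illustrative assumptions, and the function name is hypothetical.

```python
def noe_distance(volume, ref_volume, ref_distance):
    """Isolated spin-pair approximation: V ~ r**-6,
    so r = r_ref * (V_ref / V)**(1/6)."""
    if volume <= 0 or ref_volume <= 0:
        raise ValueError("cross-peak volumes must be positive")
    return ref_distance * (ref_volume / volume) ** (1.0 / 6.0)


# Assumed calibration pair (illustrative values, not from the chapter)
REF_DISTANCE = 2.45   # Å, fixed H5-H6 separation used as reference
REF_VOLUME = 1000.0   # arbitrary integration units

# A cross-peak eight times weaker than the reference
true_volume = 125.0
r = noe_distance(true_volume, REF_VOLUME, REF_DISTANCE)

# The same cross-peak with its volume overestimated by 30 %
r_off = noe_distance(1.3 * true_volume, REF_VOLUME, REF_DISTANCE)
rel_err = abs(r_off - r) / r

print(f"distance: {r:.2f} Å, relative distance error: {100 * rel_err:.1f} %")
```

A 30 % error in the integrated volume changes the distance estimate by only a few percent, which is why NOE-derived distances are usually applied as loose upper and lower bounds rather than as exact values. The sketch ignores spin diffusion, so it is only reasonable for short mixing times.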
A particularly significant hurdle for larger RNAs is the typical lack of experimental information defining the relative orientation of secondary structure elements. Although local structural features of larger RNAs can be well defined using 2H-edited NMR approaches, segmental labeling, or site-specific labeling, the interproton distances between adjacent helices are typically too large to be measured with NOEs, and until recently, rapid 1H-13C NMR relaxation and other technical issues have limited the detection of residual dipolar couplings (RDCs) to relatively small RNAs. These limitations have led to the use of hybrid methods that complement high-resolution local structural information with lower-resolution global information. A newer conformational restraint originates from residual dipolar couplings in partially aligned molecules (Chiliveri et al. 2022). RDC values give information on angles between covalent bonds and globally defined axes in the molecule (magnetic susceptibility tensor) (Lipsitz and Tjandra 2004). RDCs can provide information on long-range order that is not directly accessible from other commonly used NMR parameters. In this way, a precise structure depends on obtaining global structural constraints, including helical bend, end-to-end orientation, and the interhelical angles between the helical motifs within a given molecule. Partial alignment of RNA molecules in solution can be achieved with Pf1 filamentous phage, polyethylene glycol, bicelles doped with charged amphiphilic molecules, and charged polyacrylamide gels. Pf1 filamentous phage remains a popular RNA alignment medium because its high negative surface charge prevents RNA precipitation.
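The orientational information carried by an RDC can be illustrated with a commonly used expression for the coupling of a rigid bond vector in the principal axis frame of the alignment tensor, D(θ, φ) = Da[(3cos²θ − 1) + (3/2)R sin²θ cos 2φ]. The sketch below is illustrative only: the function name, the magnitude Da, and the rhombicity R are assumptions chosen for the example and are not taken from the chapter.

```python
import math


def rdc(theta_deg, phi_deg, d_a, rhombicity=0.0):
    """RDC (Hz) of a rigid bond vector with polar angles (theta, phi)
    in the principal axis frame of the alignment tensor.

    d_a        -- axial component of the alignment tensor (Hz), assumed
    rhombicity -- rhombicity R of the alignment tensor, assumed
    """
    theta = math.radians(theta_deg)
    phi = math.radians(phi_deg)
    axial = 3.0 * math.cos(theta) ** 2 - 1.0
    rhombic = 1.5 * rhombicity * math.sin(theta) ** 2 * math.cos(2.0 * phi)
    return d_a * (axial + rhombic)


# Hypothetical axially symmetric alignment with Da = 10 Hz
for theta in (0.0, 54.7356, 90.0):
    print(f"theta = {theta:7.3f} deg -> D = {rdc(theta, 0.0, 10.0):6.2f} Hz")
```

Because the coupling depends on the orientation of each bond vector relative to a common molecular frame, and not on the distance between the reporting groups, bond vectors in two remote helices can be related to each other even when no NOE connects them.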

Measuring RDCs becomes more difficult as RNA size increases, and methods have been developed for measuring 1H-15N imino and a variety of 1H–13C nucleobase RDCs in nucleic acids using an improved TROSY-HSQC method and Amide RDCs by TROSY Spectroscopy (ARTSY). In the ARTSY experiment, RDC values are determined from the ratio of TROSY peak intensities in an attenuated spectrum and a reference spectrum. The precision of the RDC measurement may be limited by a low signal-to-noise ratio, which depends on the effective correlation time of the RNA studied. An alternative approach uses a variable flip angle HMQC (VF-HMQC). In VF-HMQC spectra, RDCs are likewise calculated from ratios of measured peak intensities, with higher sensitivity and resolution. A limitation of this approach is the need for helical regions with many adenosine residues in order to obtain a sufficient number of restraints. Paramagnetic spin labeling is another approach for obtaining long-range RNA structural constraints, in which paramagnetic probes are incorporated into a larger RNA. Paramagnetic labeling enables observation of dipolar interactions between an unpaired electron within the probe and the nearby nuclei of the RNA molecule, which can be reflected in both line broadening and chemical shift changes, depending on the spin label. Paramagnetic relaxation enhancement and the long-range distances derived from it can also reveal transient protein-RNA interactions and help to investigate protein-RNA recognition mechanisms. The evaluation of the quality of NMR structures is usually based on the assessment of precision (how much the structures differ from each other) and accuracy (how close the structures are to the “correct” structure). The precision is easy to calculate for an ensemble of NMR structures (e.g., using the root mean square deviation (RMSD) between different structures).
Nevertheless, the RMSD value is not an objective criterion for assessing the quality of the structure per se, since it may reflect not only the quality of the NMR data but also the way in which NMR constraints were applied and which force fields, computational protocols, and solvent models were used. Accuracy is more important than precision in defining the quality of a structure. However, it is difficult to define quantitatively because the “real” structure is usually unknown. There are descriptive parameters and features of DNA and RNA structure that can be calculated, for example, using X3DNA and MolProbity software. Validation based on chemical shifts can also be performed using NUCHEMICS. However, these descriptive parameters are not yet integrated into structure validation software that would compare the structural features of nucleic acids with a high-quality reference database and indicate potential problems.
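The precision of an ensemble, as described above, can be expressed as the mean pairwise RMSD between its members. The sketch below is a simplified illustration under two loudly stated assumptions: the structures are given as lists of Cartesian coordinates in the same atom order, and they have already been superimposed (no fitting step is performed here). The function names and the three-atom coordinates are hypothetical.

```python
import math
from itertools import combinations


def rmsd(a, b):
    """RMSD (same units as the input) between two structures given as
    equally long lists of (x, y, z) tuples in the same atom order."""
    if len(a) != len(b) or not a:
        raise ValueError("structures need the same, nonzero number of atoms")
    squared = sum(
        (xa - xb) ** 2
        for atom_a, atom_b in zip(a, b)
        for xa, xb in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(a))


def mean_pairwise_rmsd(ensemble):
    """Average RMSD over all pairs of structures in the ensemble."""
    pairs = list(combinations(ensemble, 2))
    if not pairs:
        raise ValueError("at least two structures are required")
    return sum(rmsd(a, b) for a, b in pairs) / len(pairs)


# Hypothetical, pre-superimposed three-atom "structures" (Å)
ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 3.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 4.0, 0.0)],
]
precision = mean_pairwise_rmsd(ensemble)
print(f"mean pairwise RMSD: {precision:.2f} Å")
```

A low value obtained this way indicates only that the calculated structures agree with each other. It says nothing about how close they are to the true structure, which is the distinction between precision and accuracy drawn in the text.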

NMR Structural Studies in Combination with Complementary Methods

Structure determination of large RNA structures by NMR has benefited from advances in new methods of sample preparation, isotope labeling strategies, and NMR data acquisition. However, the use of complementary structural techniques
facilitates structure determination, as the determination of large RNA structures by NMR remains a major technical challenge. A number of recent studies have used small-angle X-ray scattering (SAXS), cryo-electron microscopy (cryo-EM), and small-angle neutron scattering (SANS) in combination with NMR to investigate the overall structure of large RNAs (Kotar et al. 2020). SAXS has been the most commonly used complementary technique to gain global structural insight. While NMR data provide high-resolution local structural information, SAXS provides more global structural information, albeit at lower resolution. RNA structures have been determined using hybrid NMR/SAXS approaches, confirming the compatibility of these methods in the study of large RNAs. Higher resolution can be achieved with wide-angle X-ray scattering (WAXS). The inclusion of WAXS data in addition to SAXS may prove to be a new frontier for hybrid methods, as it is easy to implement and improves resolution. SANS is effective in determining global structural constraints at low resolution. A neutron beam is directed at a sample, which may be an aqueous solution. The neutrons are elastically scattered by nuclear interaction with the nuclei or by interaction with the magnetic moment of unpaired electrons. A key feature of SANS, which makes it particularly useful for the life sciences, is the special behavior of hydrogen, especially compared to deuterium. In nucleic acids, hydrogen can be exchanged for deuterium, which usually has a minimal effect on the sample but dramatic effects on the scattering. The NMR-derived structure can be checked against SANS data to verify agreement of the global structure. Cryo-electron microscopy and tomography can be used in combination with NMR to refine RNA structures and, in particular, to confirm global topology. A combination of NMR and cryo-EM map restraints has been applied to refine the structures of large RNAs of more than 100 nt.
Recent studies demonstrate the advantages of combining NMR and cryo-EM techniques in determining RNA structures. NMR can provide accurate local structural information at atomic resolution, even for highly flexible molecules, while cryo-EM provides data on global topology. In addition, cryo-EM in combination with chemical probing and computational approaches has already enabled rapid structure determination of highly structured RNAs of up to 388 nucleotides at resolutions of 5–14 Å.

Challenges in Structural Studies of Biologically Relevant DNA and RNA

NMR has proven itself as a powerful tool for studying the structure and dynamics of RNAs. However, its successes have been limited mostly to small and medium-sized RNAs consisting of a few tens of residues. Only a handful of structures consist of more than 100 nucleotides. Two main factors have been recognized that must be addressed to advance the field and to study more complex systems by NMR as molecular size increases. First, DNA and RNA in principle consist of only four nucleotide building blocks, which severely limits the resolution and dispersion of NMR signals. Obviously, this problem is exacerbated as the number of nucleotides in DNA or RNA increases
and NMR signal overlap hinders unambiguous spectral assignment. Second, biomolecules including DNA and RNA tumble more and more slowly as their molecular weight increases. Their slower tumbling in solution leads to shorter and shorter transverse relaxation times (T2). Shortening of T2 leads to severe line broadening that decreases signal intensity and resolution. On the positive side, numerous advances in sample preparation, data acquisition, and subsequent data analysis of multidimensional NMR spectra have been developed that address these issues successfully. A key advancement relates to isotope labeling strategies. NMR signal overlap for large RNAs may be reduced by isotopic labeling strategies that simultaneously simplify and improve the quality of the spectra. Incorporation of NMR-active isotopes into RNA enables the use of novel pulse sequences to measure specific structural parameters. Incorporation of isotopically labeled rNTPs in a nucleotide-specific manner involves replacing the unlabeled rNTP with the desired labeled rNTP in an in vitro transcription reaction. This approach is commonly used for incorporation of 15N, 13C, and 2H labels within RNAs. Because a high density of 15N and 13C nuclei can cause detrimental T2 relaxation effects, a deuterium-edited approach may be used, in which several RNA samples are prepared with unique 2H-labeling patterns in the ribose and/or nucleobase. Incorporation of deuterium at nonexchangeable positions has the potential to greatly reduce the number of signals from overlapping resonances of different nucleotides and results in narrowed linewidths due to a reduction in 1H spin diffusion. Nucleotide-specific deuterium labeling has the potential to greatly simplify and improve 1H–1H NOESY spectra of large RNAs.
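The link between a shortened T2 and line broadening can be made concrete with the standard relation for a Lorentzian line, whose full width at half height is 1/(πT2). The short sketch below only evaluates this relation. The T2 values are hypothetical and chosen for illustration, and contributions such as field inhomogeneity or unresolved couplings are ignored.

```python
import math


def linewidth_hz(t2_seconds):
    """Full width at half height (Hz) of a Lorentzian line: 1 / (pi * T2)."""
    if t2_seconds <= 0:
        raise ValueError("T2 must be positive")
    return 1.0 / (math.pi * t2_seconds)


# Hypothetical T2 values, from a small to a large, slowly tumbling molecule
for t2_ms in (50.0, 20.0, 5.0):
    width = linewidth_hz(t2_ms / 1000.0)
    print(f"T2 = {t2_ms:4.1f} ms -> linewidth = {width:5.1f} Hz")
```

A tenfold reduction in T2 broadens the line tenfold and lowers the peak height accordingly, which is the loss of intensity and resolution that deuteration and related labeling schemes are meant to counteract.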
Magnetically active nuclei such as 19F have valuable spectroscopic properties that confer clear advantages in the study of macromolecular structure and conformational changes (Becette et al. 2020; Li et al. 2020). 19F is sensitive to changes in its local chemical environment, making it a useful probe of conformational changes in in vitro studies of medically important RNAs and in cell mimics of biological systems. Both the strength and the limitations of the information derived from the NOE lie in the fact that the NOE is a function of many factors, including motion, relative orientation, and mutual effects, in addition to the separation of a proton pair. The scalar coupling constants are weighted averages over all conformations that DNA or RNA can adopt. This generally limits the conversion of NOE and coupling constant information into a single structure in solution. Importantly, however, the biological functions of DNA and RNA rely on changes in their structure and thus on a dynamic equilibrium between different states. NMR experiments provide insight, with atomic resolution, into higher-energy structures and into dynamic exchange on timescales from picoseconds to hours.
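The statement that scalar couplings are population-weighted averages can be illustrated with a two-state sugar pucker equilibrium, J_obs = p_S J_S + (1 − p_S) J_N. The sketch below is a hedged example: the limiting couplings for the N-type and S-type conformers (1 Hz and 8 Hz for 3J(H1′,H2′)) are rough illustrative assumptions rather than values from this chapter, and the function names are hypothetical.

```python
def averaged_coupling(p_south, j_north=1.0, j_south=8.0):
    """Population-weighted coupling (Hz) for a fast two-state equilibrium."""
    if not 0.0 <= p_south <= 1.0:
        raise ValueError("population must lie between 0 and 1")
    return p_south * j_south + (1.0 - p_south) * j_north


def south_population(j_obs, j_north=1.0, j_south=8.0):
    """Invert the two-state average to estimate the S-type population.
    The result is clamped to [0, 1] to absorb experimental scatter."""
    if j_south == j_north:
        raise ValueError("limiting couplings must differ")
    p = (j_obs - j_north) / (j_south - j_north)
    return min(1.0, max(0.0, p))


# An observed coupling halfway between the assumed limiting values
p_s = south_population(4.5)
print(f"estimated S-type population: {p_s:.2f}")
```

An observed coupling of 4.5 Hz matches neither assumed conformer and is instead consistent with an equal mixture of both. Forcing such a value into a single static structure would produce a sugar geometry that may not be populated at all, which is the limitation described in the text.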

Dynamic Processes in RNA and Corresponding NMR Methods

Early investigations of RNA dynamics, which were limited to the use of 1D NMR methods, showed that NMR can probe the structural dynamics of RNA in solution at atomic resolution, opening the door to their functional understanding. Development
of 1D and 2D heteronuclear polarization transfer methods allowed the dynamics of ribose, nucleobase, and phosphorus nuclei distributed along the entire RNA structure to be probed. Changes in RNA secondary structure play fundamental roles in the cellular functions of a growing number of noncoding RNAs. NMR can characterize motions over a wide time range, from picoseconds to seconds (and slower), and visualize conformers that are transient and sparsely populated. For these low-populated states, chemical shifts (structure), rates (kinetics), and populations (thermodynamics) can be extracted under various physiological conditions of temperature, salt, pH, and cellular environment. Typically, single-stranded regions and nucleotides within dynamic secondary structure elements such as loops and bulges can be studied with 13C-detected experiments, as their imino proton signals are broadened beyond detection by exchange with bulk water. Residue-specific information on the exchange process between the energetically more favorable ground state and transient states (referred to as “excited states”) may be obtained through a “chemical shift fingerprint” (Dethoff et al. 2012). The chemical shift fingerprints can be used to infer which residues become more or less helical upon formation of the excited state(s). Transitions toward RNA excited states are orders of magnitude faster than secondary structural rearrangements, occur without assistance from external cofactors, and result in smaller, yet significant, changes in RNA secondary structure, which can include changes in base protonation state. Slower RNA dynamics can be analyzed through successive collection of NMR spectra at selected time intervals. This real-time approach is highly intuitive because the progress of the process under study is directly reflected in the disappearance and appearance of signals (Marušič et al. 2019).
Example processes that can be followed by real-time NMR spectroscopy are RNA (re-)folding by reshuffling of base-paired nucleotides and catalytic reactions of ribozymes. Slow base-pair opening, which results in exchange of labile protons with solvent, can be followed with H-D exchange experiments. Intermediate dynamics involves slow-intermediate to slow exchange (ms–s) and may be studied with CEST (chemical exchange saturation transfer). In this experiment, a low-power spin lock is used to saturate different regions of a spectrum by varying its offset. The technique is especially useful if the population of one of the two states is low and its chemical shift therefore cannot be observed directly. If the spin lock offset matches the frequency of the detectable signal, the major state signal is saturated and the intensity in the CEST NMR spectrum is close to zero. If the spin lock is applied to a region far from the chemical shifts of both the ground state and the invisible excited state, no saturation effect is visible and the detected signal in the CEST spectrum has the same intensity as the original signal. The most interesting effect occurs when the spin lock is on resonance with the chemical shift of the minor (excited) state: the saturation of the minor state is then transferred by chemical exchange to the major state, whose signal loses intensity and thereby reveals the chemical shift of the otherwise invisible state. An alternative method relies on CPMG experiments that employ INEPT or selective Hartmann–Hahn transfers. To extract the parameters of interest that describe the exchange process, the data are usually acquired at two or more magnetic field strengths, and solutions to the Bloch–McConnell equations have to be found. Similar to using CPMG to detect exchange contributions on top of R2 relaxation rates, it is possible to replace the train of hard 180° pulses by a spin lock pulse and detect
exchange as a contribution to the measured R1ρ values. Cross-correlated relaxation studies can also be used to access dynamics data. Cross-correlation may occur between the fluctuations of dipole-dipole interactions between nitrogen and protons and the chemical shift anisotropy interactions of the nitrogen nuclei involved in Watson–Crick base pairs. Fast dynamics involves internal motions such as bond vibrations and librations, that is, changes in bond angles, angular orientation, and bond lengths, that are faster than the overall tumbling rate of the molecule and directly influence local relaxation rates R1 and R2 as well as cross-relaxation rates and NOEs. The observed relaxation rates at a specific site/nucleus are then a function not only of the overall molecular rotational correlation time, τC, but also of the local, effective correlation time and an order parameter. RDC-based analysis of dynamics is one of the few methods that allow the determination of larger-scale motions such as the movement of helices; it covers a wide range of timescales and is independent of τC (Xue et al. 2015).
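How a CPMG pulse train suppresses the exchange contribution to transverse relaxation can be sketched with the Luz–Meiboom expression for two-site exchange. This is a deliberately simplified stand-in for the Bloch–McConnell treatment mentioned above and is valid only in the fast-exchange limit (k_ex much larger than the chemical shift difference). The function name and all parameter values are hypothetical and chosen for illustration.

```python
import math


def r2_eff_fast_exchange(nu_cpmg, r2_0, p_minor, delta_omega, k_ex):
    """Effective R2 (s^-1) from the Luz-Meiboom fast-exchange expression.

    nu_cpmg     -- CPMG frequency (Hz), 1 / (2 * spacing of the 180° pulses)
    r2_0        -- exchange-free transverse relaxation rate (s^-1)
    p_minor     -- population of the minor (excited) state
    delta_omega -- chemical shift difference between states (rad/s)
    k_ex        -- exchange rate constant, k_forward + k_reverse (s^-1)
    """
    if nu_cpmg <= 0 or k_ex <= 0:
        raise ValueError("nu_cpmg and k_ex must be positive")
    phi = (1.0 - p_minor) * p_minor * delta_omega ** 2
    x = k_ex / (4.0 * nu_cpmg)
    return r2_0 + (phi / k_ex) * (1.0 - math.tanh(x) / x)


# Hypothetical parameters: 2 % excited state, 100 Hz shift difference
params = dict(r2_0=10.0, p_minor=0.02,
              delta_omega=2.0 * math.pi * 100.0, k_ex=5000.0)

for nu in (50.0, 200.0, 1000.0):
    r2 = r2_eff_fast_exchange(nu, **params)
    print(f"nu_CPMG = {nu:6.1f} Hz -> R2,eff = {r2:5.2f} s^-1")
```

The decrease of R2,eff with increasing pulsing frequency is the relaxation dispersion profile that is fitted in practice. Because the shift difference in rad/s scales with the magnetic field, recording profiles at two or more fields helps to separate populations from shift differences, in line with the text.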

Conclusion

DNA and RNA are two different nucleic acids found in the cells of every living organism. Both play significant roles in cell biology. DNA contains the genetic instructions used in the development and functioning of all known living organisms, whereas RNA molecules are involved in protein synthesis and in the transmission of genetic information. Several methods, including NMR, increasingly show that DNA oligonucleotides are more polymorphic than typically considered and far from being associated only with the double helix described in the early 1950s. One of the major differences between DNA and RNA is the sugar moiety, with the 2′-deoxyribose of DNA being replaced by ribose in RNA. Hydrogen bond complementarity is at the heart of the recognition and stabilization of double-stranded structural elements. There exists a plethora of structural motifs that are different from the double helix. Detailed knowledge of their structural characteristics is necessary for understanding their biological function. NMR spectroscopy allows the study of the structures of nucleic acid oligomers in solution under close to physiological and in-cell conditions. In order to delineate the relationships between the structures, thermodynamic stabilities, and biological function of nucleic acids, it is highly desirable to complement the structural studies by characterization of equilibrium conformational fluctuations. NMR is a powerful method to study molecular motions and sequence-dependent flexibility at atomic resolution on a wide range of timescales and to visualize structures that are transient and sparsely populated.

Acknowledgments

The author is grateful to the colleagues named in the cited papers from his laboratory at the Slovenian NMR Center, especially Drs. Trajkovski, Šket, Lenarčič Živković, Kocman, Kotar, Marušič, Podbevšek, Pavc, Toplishek, and Cevec. The help of Klemen Pečnik, Dr. Marko Trajkovski, Matic Kovačič, and Dr. Primož Šket in the preparation of the artwork is gratefully acknowledged. This work was supported partly by the Slovenian Research Agency (ARRS, grants P1-0242 and J1-1704).

References

Adrian M, Heddi B, Phan AT (2012) NMR spectroscopy of G-quadruplexes. Methods 57(1):11–24
Altona C (1982a) Conformational analysis of nucleic acids. Determination of backbone geometry of single-helical RNA and DNA in aqueous solution. Recl Trav Chim Pays-Bas 101(12):413–433
Altona C (1982b) High resolution NMR studies of nucleic acids. NATO Adv Stud Inst 45:161
Altona C, Haasnoot CAG (1980) Prediction of anti and gauche vicinal proton-proton coupling constants in carbohydrates: a simple additivity rule for pyranose rings. Org Magn Reson 13(6):417–429
Altona C, Sundaralingam M (1972) Conformational analysis of the sugar ring in nucleosides and nucleotides. A new description using the concept of pseudorotation. J Am Chem Soc 94(23):8205–8212
Altona C, Francke R, de Haan R, Ippel JH, Daalmans GJ, Westra Hoekzema AJA, van Wijk J (1994) Empirical group electronegativities for vicinal NMR proton-proton couplings along a C-C bond: solvent effects and reparametrization of the Haasnoot equation. Magn Reson Chem 32:670–678
Barnwal RP, Yang F, Varani G (2017) Applications of NMR to structure determination of RNAs large and small. Arch Biochem Biophys 628:42–56
Becette OB, Zong G, Chen B, Taiwo KM, Case DA, Dayie TK (2020) Solution NMR readily reveals distinct structural folds and interactions in doubly 13C- and 19F-labeled RNAs. Sci Adv 6(41):eabc6572
Berman HM, Olson WK, Beveridge DL, Westbrook J, Gelbin A, Demeny T, Hsieh SH, Srinivasan AR, Schneider B (1992) The nucleic acid database. A comprehensive relational database of three-dimensional structures of nucleic acids. Biophys J 63(3):751–759
Boelens R, Koning TMG, Vandermarel GA, Vanboom JH, Kaptein R (1989) Iterative procedure for structure determination from proton proton NOEs using a full relaxation matrix approach – application to a DNA octamer. J Magn Reson 82(2):290–308
Boisbouvier J, Brutscher B, Pardi A, Marion D, Simorre JP (2000) NMR determination of sugar puckers in nucleic acids from CSA-dipolar cross-correlated relaxation. J Am Chem Soc 122(28):6779–6780
Božič T, Zalar M, Rogelj B, Plavec J, Šket P (2020) Structural diversity of sense and antisense RNA hexanucleotide repeats associated with ALS and FTLD. Molecules 25(3):525
Chiliveri SC, Robertson AJ, Shen Y, Torchia DA, Bax A (2022) Advances in NMR spectroscopy of weakly aligned biomolecular systems. Chem Rev 122(10):9307–9330
Dayie TK, Olenginski LT, Taiwo KM (2022) Isotope labels combined with solution NMR spectroscopy make visible the invisible conformations of small-to-large RNAs. Chem Rev 122(10):9357–9394
de Leeuw FAAM, Altona C (1982) Conformational analysis of β-D-Ribo-, β-D-Deoxyribo-, β-D-Arabino-, β-D-Xylo-, and β-D-Lyxo-nucleosides from proton-proton coupling constants. J Chem Soc Perkin Trans 2:375–384
Dethoff EA, Petzold K, Chugh J, Casiano-Negroni A, Al-Hashimi HM (2012) Visualizing transient low-populated structures of RNA. Nature 491(7426):724–728
Every AE, Russu IM (2007) Probing the role of hydrogen bonds in the stability of base pairs in double-helical DNA. Biopolymers 87(2–3):165–173
Feigon J, Koshlap KM, Smith FW (1995) 1H NMR spectroscopy of DNA triplexes and quadruplexes. Methods Enzymol 261:225–255
Flynn PF, Kintanar A, Reid BR, Drobny G (1988) Coherence transfer in deoxyribose sugars produced by isotropic mixing: an improved intraresidue assignment strategy for the two-dimensional NMR spectra of DNA. Biochemistry 27(4):1191–1197
Furtig B, Richter C, Wohnert J, Schwalbe H (2003) NMR spectroscopy of RNA. Chembiochem 4(10):936–962

Gaffney BL, Wang C, Jones RA (1992) Nitrogen-15-labeled oligodeoxynucleotides. 4. Tetraplex formation of d[G(15N-7)GTTTTTGG] and d[T(15N7)GGGT] monitored by 1H detected 15N NMR. J Am Chem Soc 114(11):4047–4050
Glaser SJ, Remerowski ML, Drobny GP (1989) Complete assignment of the deoxyribose 5′/5″ proton resonances of the EcoRI DNA sequence using isotropic mixing. Biochemistry 28(4):1483–1487
Gorenstein DG (1994) Conformation and dynamics of DNA and protein-DNA complexes by 31P NMR. Chem Rev 94(5):1315–1338
Groves P, Webba da Silva M (2010) Rapid stoichiometric analysis of G-quadruplexes in solution. Chem Eur J 16(22):6451–6453
Haasnoot CAG, de Leeuw FAAM, Altona C (1980) The relationship between proton-proton NMR coupling constants and substituent electronegativities. I. An empirical generalization of the Karplus equation. Tetrahedron 36:2783–2792
Haasnoot CAG, de Leeuw FAAM, de Leeuw HPM, Altona C (1981) Relationship between proton-proton NMR coupling constants and substituent electronegativities. III. Conformational analysis of proline rings in solution using a generalized Karplus equation. Biopolymers 20:1211–1245
Hänsel R, Luh LM, Corbeski I, Trantirek L, Dötsch V (2014) In-cell NMR and EPR spectroscopy of biomacromolecules. Angew Chem Int Ed 53(39):10300–10314
Hoogstraten CG, Pardi A (1998) Measurement of carbon-phosphorus J coupling constants in RNA using spin-echo difference constant-time HCCH-COSY. J Magn Reson 133(1):236–240
Ippel JH, Wijmenga SS, de Jong R, Heus HA, Hilbers CW, de Vroom E, van der Marel GA, van Boom JH (1996) Heteronuclear scalar couplings in the bases and sugar rings of nucleic acids: their determination and application in assignment and conformational analysis. Magn Reson Chem 34:S156–S176
Kaptein R (2013) NMR studies on protein-nucleic acid interaction. J Biomol NMR 56(1):1–2
Karplus M (1959) Contact electron-spin coupling of nuclear magnetic moments. J Chem Phys 30(1):11–15
Kemmink J, Boelens R, Koning T, Vandermarel GA, van Boom JH, Kaptein R (1987) H-1-NMR study of the exchangeable protons of the duplex d(GCGTTGCG)·d(CGCAACGC) containing a thymine photodimer. Nucl Acids Res 15(11):4645–4653
Koning TMG, Boelens R, Vandermarel GA, Vanboom JH, Kaptein R (1991) Structure determination of a DNA octamer in solution by NMR spectroscopy – effect of fast local motions. Biochemistry 30(15):3787–3797
Kotar A, Foley HN, Baughman KM, Keane SC (2020) Advanced approaches for elucidating structures of large RNAs using NMR spectroscopy and complementary methods. Methods 183:93–107
Kovačič M, Podbevšek P, Tateishi-Karimata H, Takahashi S, Sugimoto N, Plavec J (2020) Thrombin binding aptamer G-quadruplex stabilized by pyrene-modified nucleotides. Nucleic Acids Res 48(7):3975–3986
Kovanda A, Zalar M, Šket P, Plavec J, Rogelj B (2015) Anti-sense DNA d(GGCCCC)n expansions in C9ORF72 form i-motifs and protonated hairpins. Sci Rep 5:17944
Lankhorst PP, Haasnoot CAG, Erkelens C, Westerink HP, van der Marel GA, van Boom JH, Altona C (1985) Carbon-13 NMR in conformational analysis of nucleic acid fragments 4. The torsion angle distribution about the C3′-O3′ bond in DNA constituents. Nucleic Acids Res 13(3):927–942
Latham MR, Brown DJ, McCallum SA, Pardi A (2005) NMR methods for studying the structure and dynamics of RNA. Chembiochem 6(9):1492–1505
Li Q, Chen J, Trajkovski M, Zhou Y, Fan C, Lu K, Tang P, Su X, Plavec J, Xi Z, Zhou C (2020) 4′-fluorinated RNA: synthesis, structure, and applications as a sensitive 19F NMR probe of RNA structure and function. J Am Chem Soc 142(10):4739–4748
Lipsitz RS, Tjandra N (2004) Residual dipolar couplings in NMR structure analysis. Annu Rev Biophys Biomol Struct 33:387–413

Marino JP, Schwalbe H, Anklin C, Bermel W, Crothers DM, Griesinger C (1994) A three-dimensional triple-resonance 1H, 13C, 31P experiment: sequential through-bond correlation of ribose protons and intervening phosphorus along the RNA oligonucleotide backbone. J Am Chem Soc 116(14):6472–6473
Marino JP, Schwalbe H, Anklin C, Bermel W, Crothers DM, Griesinger C (1995) Sequential correlation of anomeric ribose protons and intervening phosphorus in RNA oligonucleotides by a 1H, 13C, 31P triple-resonance experiment: HCP-CCH-TOCSY. J Biomol NMR 5(1):87–92
Marino JP, Schwalbe H, Glaser SJ, Griesinger C (1996) Determination of gamma and stereospecific assignment of H5′ protons by measurement of 2J and 3J coupling constants in uniformly C-13 labeled RNA. J Am Chem Soc 118(18):4388–4395
Marino JP, Schwalbe H, Griesinger C (1999) J-coupling restraints in RNA structure determination. Acc Chem Res 32(7):614–623
Marušič M, Schlagnitweit J, Petzold K (2019) RNA dynamics by NMR spectroscopy. Chembiochem 20(21):2685–2710
Pardi A, Hare DR, Wang C (1988) Determination of DNA structures by NMR and distance geometry techniques: a computer simulation. Proc Natl Acad Sci U S A 85:8785–8789
Pavc D, Wang B, Spindler L, Drevenšek-Olenik I, Plavec J, Šket P (2020) GC ends control topology of DNA G-quadruplexes and their cation-dependent assembly. Nucleic Acids Res 48(5):2749–2761
Phan AT (2000) Long-range imino proton 13C J-couplings and the through-bond correlation of imino and non-exchangeable protons in unlabeled DNA. J Biomol NMR 16(2):175–178
Phan AT (2001) Through-bond correlation of sugar and base protons in unlabeled nucleic acids. J Magn Reson 153(2):223–226
Phan AT, Patel DJ (2002) A site-specific low-enrichment N-15,C-13 isotope-labeling approach to unambiguous NMR spectral assignments in nucleic acids. J Am Chem Soc 124(7):1160–1161
Phan AT, Luu KN, Patel DJ (2006) Different loop arrangements of intramolecular human telomeric (3+1) G-quadruplexes in K+ solution. Nucleic Acids Res 34(19):5715–5719
Pikkemaat JA, Altona C (1996) Fine structure of the P-H5′ cross-peak in 31P-1H correlated 2D NMR spectroscopy. An efficient probe for the backbone torsion angles β and γ in nucleic acids. Magn Reson Chem 34(Special Issue):S33–S39
Plavec J (2012) DNA. In: Bertini I, McGreevy KS, Parigi G (eds) NMR of biomolecules: towards mechanistic systems biology. Wiley-VCH Verlag, Singapore, pp 97–116
Plavec J, Tong W, Chattopadhyaya J (1993) How do the gauche and anomeric effects drive the pseudorotational equilibrium of the pentofuranose moiety of nucleosides? J Am Chem Soc 115(21):9734–9746
Plavec J, Thibaudeau C, Chattopadhyaya J (1996) How do the energetics of the stereoelectronic gauche and anomeric effects modulate the conformation of nucleos(t)ides? Pure & Appl Chem 68:2137–2145
Richter C, Reif B, Worner K, Quant S, Marino JP, Engels JW, Griesinger C, Schwalbe H (1998) A new experiment for the measurement of nJ(C,P) coupling constants including 3J(C4′i,Pi) and 3J(C4′i,Pi+1) in oligonucleotides. J Biomol NMR 12(2):223–230
Rinkel LJ, Altona C (1987) Conformational analysis of the deoxyribofuranose ring in DNA by means of proton-proton coupling constants: a graphical method. J Biomol Struct Dyn 4(4):621–649
Schnieders R, Keyhani S, Schwalbe H, Fürtig B (2020) More than proton detection – new avenues for NMR spectroscopy of RNA. Chem Eur J 26(1):102–113
Schwalbe H, Marino JP, King GC, Wechselberger R, Bermel W, Griesinger C (1994) Determination of a complete set of coupling constants in 13C-labeled oligonucleotides. J Biomol NMR 4:631–644
Schwalbe H, Marino JP, Glaser SJ, Griesinger C (1995) Measurement of H,H-coupling constants associated with ν1,ν2 and ν3 in uniformly 13C-labeled RNA by HCC-TOCSY-CCH-E.COSY. J Am Chem Soc 117(27):7251–7252

212

J. Plavec

Sket P, Crnugelj M, Kozminski W, Plavec J (2004) 15NH4+ ion movement inside d(G4T4G4)2 G-quadruplex is accelerated in the presence of smaller Na+ ions. Org Biomol Chem 2(14): 1970–1973 Sket P, Crnugelj M, Plavec J (2005) Identification of mixed di-cation forms of G-quadruplex in solution. Nucleic Acids Res 33(11):3691–3697 Šket P, Pohleven J, Kovanda A, Štalekar M, Župunski V, Zalar M, Plavec J, Rogelj B (2015) Characterization of DNA G-quadruplex species forming from C9ORF72 G4C2-expanded repeats associated with amyotrophic lateral sclerosis and frontotemporal lobar degeneration. Neurobiol Aging 36(2):1091–1096 Sklenar V, Bax A (1987) Measurement of 1H-31P NMR coupling constants in double-stranded DNA fragments. J Am Chem Soc 109(24):7525–7526 Sklenar V, Miyashiro H, Zon G, Miles HT, Bax A (1986) Assignment of the 31P and 1H resonances in oligonucleotides by two-dimensional NMR spectroscopy. FEBS Lett 208(1):94–98 Szyperski T, Ono A, Fernandez C, Iwai H, Tate S, Wuthrich K, Kainosho M (1997) Measurement of 3JC20 P scalar couplings in a 17 kDa protein complex with 13C,15N-labeled DNA distinguishes the B-I and B-II phosphate conformations of the DNA. J Am Chem Soc 119(41):9901–9902 Takahashi S, Kotar A, Tateishi-Karimata H, Bhowmik S, Wang Z-F, Chang T-C, Sato S, Takenaka S, Plavec J, Sugimoto N (2021) Chemical modulation of DNA replication along G-quadruplex based on topology-dependent ligand binding. J Am Chem Soc 143(40): 16458–16469 Tisne C, Simenel C, Hantz E, Schaeffer F, Delepierre M (1996) Backbone conformational study of a non-palindromic 16 base pair DNA duplex exploring 2D 31P-1H heteronuclear inverse spectroscopy: assignment of all NMR phosphorus resonances and measurement of 3J31P-1H30 coupling constants. Magn Reson Chem 34:S115–S124 van Wijk J, Huckriede BD, Ippel JH, Altona C (1992) Furanose sugar conformations in DNA from NMR coupling constants. Methods Enzymol 211:286–306 Webba da Silva M (2007) NMR methods for studying quadruplex nucleic acids. 
Methods 43: 264–277 Webba da Silva M, Trajkovski M, Sannohe Y, Hessari NM, Sugiyama H, Plavec J (2009) Design of a G-quadruplex topology through glycosidic bond angles. Angew Chem Int Ed 48(48): 9167–9170 Weber PL, Drobny G, Reid BR (1985) 1H NMR studies of Lambda-Cro repressor. 1. Selective optimization of two-dimensional relayed coherence transfer spectroscopy. Biochemistry 24(17): 4549–4552 Wijmenga SS, van Buuren BNM (1998) The use of NMR methods for conformational studies of nucleic acids. Prog Nucl Magn Reson Spectrosc 32:287–387 Wüthrich K (1986) NMR of proteins and nucleic acids. Wiley, New York Xue Y, Kellogg D, Kimsey IJ, Sathyamoorthy B, Stein ZW, McBrairty M, Al-Hashimi HM (2015) Characterizing RNA excited states using NMR relaxation dispersion. Methods Enzymol S A Woodson and F H T Allain 558:39–73 Zhu LM, Reid BR, Drobny GP (1995) Errors in measuring and interpreting values of coupling constants J from PE.COSY experiments. J Magn Res A 115(2):206–212

7. Z-DNA

Doyoun Kim, Vinod Kumar Subramani, Soyoung Park, Joon-Hwa Lee, and Kyeong Kyu Kim

Contents
Introduction
Chemical and Structural Properties of Z-DNA
  Left-Handed Z-DNA
  Crystal Structures of Z-DNA in Complex with Chemical Inducers
  Crystal Structures of Z-DNA in Complex with Z-DNA Binding Proteins
  Crystal Structures of BZ Junctions
NMR Studies of Z-DNA Transition Induced by ZBP
  NMR Monitoring on Z-DNA Formation of d(CGCGCG)2 by ZBPs
  NMR Monitoring on Intermolecular Interaction of ZBPs with Z-DNA
  B-to-Z Transition Mechanism of DNA Induced by ZBPs
  NMR Dynamics Study on B-to-Z Transition of DNA Induced by ZBPs
  A-to-Z Transition Mechanism of RNA Induced by ZBPs
  BZ Junction Formation of DNA Induced by ZBPs
Chemical Biology Strategies Used to Elucidate the Biological Significance of Z-DNA and Z-DNA Binding Proteins
  Strategies Used to Determine the Structure and Stability of Z-DNA
  Strategies for Developing a Z-DNA Sensor
  Strategies Applied to Ascertain the Z-DNA Function
  Strategies for Developing Therapeutics Targeting Z-DNA
  Strategies Applied for Nanotechnology Applications Using Z-DNA
Disease Implications
  Z-DNA Is Immunogenic
  Z-DNA Forming Sequence (ZFS) Controls the Expression of the Disease-Related Genes
  Z-DNA Forming Sequence (ZFS) Is a Hotspot for the Large-Scale Deletion of DNA
  Disease Implications of Z-DNA Binding Proteins
Conclusion and Perspective
References

Doyoun Kim and Vinod Kumar Subramani contributed equally with all other contributors.

D. Kim
Therapeutics and Biotechnology Department, Drug Discovery Platform Research Center, Korea Research Institute of Chemical Technology (KRICT), Daejeon, Republic of Korea
Medicinal Chemistry and Pharmacology, Korea University of Science and Technology (UST), Daejeon, Republic of Korea

V. K. Subramani · K. K. Kim (*)
Department of Precision Medicine, Sungkyunkwan University School of Medicine, Suwon, Republic of Korea

S. Park (*)
Immunology Frontier Research Center, Osaka University, Osaka, Japan

J.-H. Lee (*)
Department of Chemistry and RINS, Gyeongsang National University, Jinju, Republic of Korea

© Springer Nature Singapore Pte Ltd. 2023
N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_9

Abstract

Z-DNA, a left-handed form of double-stranded DNA formed by CG repeat sequences, was discovered more than 50 years ago. Since then, the identification of several Z-DNA-binding proteins and Z-DNA-forming sequences in the genome has confirmed the biological relevance of Z-DNA. Over the last five decades, considerable progress has been made in revealing the chemical nature of Z-DNA and the roles of Z-DNA and Z-DNA-binding proteins in cells. In this chapter, various chemical biology approaches used to evaluate the physicochemical characteristics and biological roles of Z-DNA/Z-RNA and their binding proteins are comprehensively reviewed. Finally, the clinical implications and perspectives regarding Z-DNA are discussed.

Keywords

Z-DNA · Z-RNA · BZ junction · Structure · Chemical biology

Introduction

Double-stranded DNA (dsDNA) adopts many different conformations, including B-DNA, A-DNA, H-DNA, G-quadruplex, i-motif, AC-motif, and Z-DNA (Hur et al. 2021; Ghosh and Bansal 2003). Unlike canonical B-DNA, Z-DNA adopts a left-handed conformation (Wang et al. 1979). The first crystal structure of Z-DNA at high salt concentrations revealed that the phosphate backbones of Z-DNA are arranged in a zigzag pattern, an energetically unstable conformation (Wang et al. 1979). Many structural, chemical, and biological studies of left-handed Z-DNA have revealed the sequence preference for Z-DNA formation in particular physicochemical environments (Dickerson et al. 1982). In 1981, Alexander Rich and colleagues generated an anti-Z-DNA antibody, which made it possible to explore Z-DNA formation in biological systems (Lafer et al. 1981; Rich and Zhang 2003). Z-DNA is known to be formed and stabilized by Z-DNA-binding proteins (ZBPs), which commonly contain a Zα domain. Zα binds nucleic acids in the Z-conformation under physiological conditions. Several ZBPs, including adenosine deaminase acting on RNA 1 (ADAR1), viral E3L, DNA-dependent activator of IFN-regulatory factors (DAI, also known as ZBP1 or DLM-1), protein kinase containing Zα domain (PKZ), ORF112 in fish herpesvirus, and mitochondrial RBP7910, have been identified and studied (Herbert et al. 1995; Schwartz et al. 2001; Kim et al. 2014; Kus et al. 2015; Nikpour and Salavati 2019; Ha et al. 2004). These insights have led to intriguing questions regarding the biological functions of energetically unstable Z-DNA. Although the discovery of Z-DNA-binding proteins suggests a role for Z-DNA in the innate immune system, several bioinformatic predictions and immunostaining analyses also implicate Z-DNA in genetic processes, such as gene regulation. Z-DNA-forming regions (ZFRs) in genomic DNA are linked to large-scale deletions and are significantly associated with genetic instability (Wang and Vasquez 2007). Z-DNA was originally discovered under high-salt conditions (Pohl and Jovin 1972), which have little relevance to the physiological environment. Furthermore, Z-DNA chemical biology has been limited by the handful of chemicals that have been identified in individual studies. These studies have yet to yield a Z-DNA-targeted therapeutic drug (Fig. 1). The research mostly consists of in vitro studies using synthetic DNA sequences.
This limitation has been partly due to the scarcity of new Z-DNA-modulating chemicals that target either the sequences directly or the players and factors that stabilize Z-DNA. The discovery of new Z-DNA modulators has been limited by a lack of good screening strategies, even though present small-molecule libraries run into the thousands and current DNA-encoded chemical libraries run into the millions of possible chemical candidates. There is thus a large gap between the size of the available small-molecule libraries and the strategies and methods available to evaluate them. Some of these libraries and candidate chemicals are FDA-approved drugs that can be used in humans. Therefore, effective testing strategies are needed to identify the role of Z-DNA. This would ensure relatively quick exploitation of targeted therapeutic strategies. For example, the FDA-approved drug curaxin (CBL0137) was identified for its Z-DNA stabilization role, which has been shown to be effective in cancer therapy (Zhang et al. 2022). We believe that this is only the beginning of the emergence of many more potent candidates targeting Z-DNA.

Prof. Alexander Rich, a key discoverer of the structure of the unusual left-handed Z-DNA, strongly believed that Z-DNA plays a key role in biological systems, and he investigated the functional roles of Z-DNA. However, owing to serious limitations in the technologies available for studying this unique form and to the highly dynamic nature of this conformation of DNA, which remains challenging to study even today, the biological role of Z-DNA has emerged slowly. Despite such limitations, the discovery of the functional roles of Z-DNA has progressed steadily. Key landmarks include the generation of an anti-Z-DNA antibody (Lafer et al. 1981); the role of Z-DNA in biological processes, such as negative supercoiling (Rich and Zhang 2003) and base modification (Behe and Felsenfeld 1981); the discovery of Z-DNA-binding proteins (Rich and Zhang 2003); and the role of Z-DNA in immune response, viral pathogenicity, genomic instability, chromatin remodeling, promoter activity, and gene regulation (Rich and Zhang 2003). There is also a growing list of diseases that are associated with Z-DNA either directly via Z-DNA-forming sequences (Renciuk et al. 2010; Ravichandran et al. 2019) or indirectly via the regulation of proteins that contain the Z-DNA-binding Zα domain (Ravichandran et al. 2019; Herbert 2019). Emerging evidence has highlighted the roles of Z-DNA in bacterial biofilm composition (Buzzo et al. 2021), as well as Z-DNA-targeted cancer therapeutic strategies (Zhang et al. 2022) (Fig. 1). Thus, we are at an exciting juncture where further advances in understanding the role of Z-DNA in biology and disease are likely to be made. The current focus is on Z-DNA-targeted chemical biology, with strategies to employ Z-DNA science in therapeutic efforts, and the field is now stepping into the next phase of Z-DNA science.

Fig. 1 Z-DNA chemical biology (infographic). Schematic illustration of the Z-DNA chemical biology detailed in this chapter. Horizontal middle section: Surface representation model of a BZ junction DNA. The B-DNA (orange) and Z-DNA (green) structures are joined at the junction (blue), showing extruded adenine (bottom) and thymine (top) bases. This structural model was drawn using ChimeraX. Top right: Z-DNA structural characterization involves modification at the C5 position in cytosine (pyrimidine), the C8 position in guanine (purine), and the amino group. These are highlighted within dotted circles on the structure diagram of the cytosine and guanine bases. Vertical middle region: The chemicals involved in the characterization of the BZ junction are shown on top. The chemical structures of a purine base and thymine are shown along the bottom. 2AP is 2-aminopurine, a base analogue of adenine that has fluorescence properties. Bottom left: Names of chemicals that have been successfully applied as sensors using the B-DNA to Z-DNA transition mechanism. Bottom right: A pill-shaped box showing a list of chemicals that have been demonstrated to work as therapeutic strategies using the Z-DNA or the B-to-Z transition mechanism. The list in the green box indicates the chemicals that work via stabilizing the Z-DNA, while the list in the red box indicates the chemicals that work via destabilizing the Z conformation and cause the Z-DNA to B-DNA transition. Right side: The key functions of Z-DNA mediated either directly or indirectly via other biological or chemical factors. Refer to sections “NMR Studies of Z-DNA Transition Induced by ZBP” and “Chemical Biology Strategies Used to Elucidate the Biological Significance of Z-DNA and Z-DNA Binding Proteins” for details on the chemicals and evidence of their functions. (We also pay homage to the late Prof. Alexander Rich, who always strongly believed in the functional role of Z-DNA)

As evidence of the biological role of Z-DNA in the genome has accumulated, questions about the formation of BZ junctions, where Z-DNA is surrounded by B-DNA in genomic sequences, have been raised. Ha et al. solved the BZ junction structure in complex with four Zα domains from ADAR1, revealing a distinct base-extruded structure (Ha et al. 2005). The structural discrepancies between heterogeneous helical segments result in the formation of distinct junction structures, including BZ, ZZ, and AZ junctions (Ha et al. 2005; Kim et al. 2009). Junction formation with distinct structural features suggests the existence of junction-recognizing proteins.

The existence of Z-DNA was first proposed by Pohl and Jovin in 1972, and its atomic-level structure was determined, and its name coined, by Rich and colleagues in 1979 (Wang et al. 1979; Pohl and Jovin 1972). However, due to technical limitations and a lack of genomic information, the role of Z-DNA in biological systems has been controversial. Despite this, studies on Z-DNA, specifically in the cellular context, are flourishing, and many interesting new results are emerging (Zhang et al. 2022; Sun et al. 2022; Herbert et al. 2022).
For example, there is now an annual meeting of Z-nucleic acid scientists and enthusiasts, known as the ABZ Meeting, during which researchers discuss new findings within this small but growing community (https://www.abz.bio/). This meeting has enabled the collaborative exploration of the functional aspects of Z-nucleic acid biology, which has culminated in several recent findings (Zhang et al. 2022; Herbert et al. 2022). In this context, we believe that it is appropriate and useful to thoroughly review Z-DNA from different perspectives.

In this chapter, we first discuss the chemical and structural properties of Z-DNA, BZ junctions, and their binding proteins explored by X-ray crystallography, NMR, and other spectroscopic methodologies. We then review the ZBP-induced Z-DNA transition mechanism elucidated by NMR studies. Next, the chemical biology strategies used to study Z-DNA and Z-DNA-binding proteins are thoroughly reviewed. Finally, the clinical implications and further perspectives on Z-DNA research are discussed.

Chemical and Structural Properties of Z-DNA

Left-Handed Z-DNA

Z-DNA, or Z-form DNA, is structurally distinct from other dsDNA conformations (Dickerson et al. 1982). Z-DNA is distinguished by its left-handed helical shape, whereas other DNA conformations, such as B-DNA and A-DNA, adopt a right-handed conformation (Fig. 2). The term Z-DNA is derived from the zigzag arrangement of the phosphate backbone, which was identified in the first crystal structure of the deoxy-hexamer d(CGCGCG)2 in the presence of two spermine molecules (Wang et al. 1979). Z-DNA has a smaller diameter (18 Å) than B-DNA (20 Å) and A-DNA (23 Å) (Dickerson et al. 1982). Whereas B-DNA and A-DNA display distinct major and minor grooves, Z-DNA has a narrow and deep minor groove and a relatively long helical pitch, with 12 bp per turn and a rise of 3.7 Å per base-pair step (Dickerson et al. 1982) (Fig. 2). In the crystal structure of Z-DNA, all guanine bases (dGs) were in the syn conformation, whereas all cytosine bases (dCs) were in the anti conformation, as in right-handed B-DNA (Wang et al. 1979). The sugar puckers of purines and pyrimidines in Z-DNA are C3′-endo and C2′-endo, respectively, whereas C3′-endo and C2′-endo sugar puckers are found in A-DNA and B-DNA, respectively (Fig. 3) (Dickerson et al. 1982). Left-handedness was first characterized by circular dichroism (CD) signals from DNA molecules under high-salt conditions (3 M NaCl) (Pohl and Jovin 1972). Energetically unstable Z-DNA is stabilized under particular physicochemical conditions, such as high salt concentration (3 M NaCl), cytosine methylation, millimolar concentrations of cations, spermine, and spermidine (Rich and Zhang 2003). Comprehensive computational and biochemical research revealed that alternating CG repeats are energetically more favorable for forming the Z-DNA conformation than alternating CT or GT repeats (Ho et al. 1986). The energies required to form an anti-syn conformation for CC, CA, and CG dinucleotides are 2.4, 1.3, and 0.7 kcal/mol, respectively (Ho et al. 1986). Remarkably, the alternation of adenine (A) and thymine (T) did not favor the adoption of the Z-DNA conformation. Because purines are more likely to adopt the syn conformation than pyrimidines, sequences formed by alternating purine-pyrimidine repeats are likely to adopt a Z-DNA conformation (Ho et al. 1986).
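The two quantitative points above, the helical pitch implied by 12 bp per turn and a 3.7 Å rise, and the preference for alternating purine-pyrimidine sequences, can be illustrated with a minimal Python sketch. The function names and the `min_len` cutoff are hypothetical and chosen only for illustration. The scan flags alternation alone; it does not rank tracts by the dinucleotide energies of Ho et al. (1986), so an alternating AT run, which the text notes is unfavorable, would still be reported.

```python
# Helical parameters for Z-DNA as quoted in the text.
Z_BP_PER_TURN = 12
Z_RISE_PER_BP_ANGSTROM = 3.7

PURINES = set("AG")
PYRIMIDINES = set("CT")


def helical_pitch(bp_per_turn, rise_per_bp):
    """Pitch (Å per full turn) = base pairs per turn x rise per base pair."""
    return bp_per_turn * rise_per_bp


def alternating_tracts(seq, min_len=6):
    """Return (start, tract) for maximal runs in which purines and pyrimidines alternate.

    min_len is an illustrative cutoff (an assumption), not a value from the chapter.
    """
    seq = seq.upper()
    if set(seq) - (PURINES | PYRIMIDINES):
        raise ValueError("sequence may contain only A, C, G, and T")
    tracts = []
    start = 0
    for i in range(1, len(seq) + 1):
        # A run ends at the sequence end or where two neighbours share a base class.
        same_class = i < len(seq) and (
            (seq[i] in PURINES) == (seq[i - 1] in PURINES)
        )
        if i == len(seq) or same_class:
            if i - start >= min_len:
                tracts.append((start, seq[start:i]))
            start = i
    return tracts


if __name__ == "__main__":
    print(helical_pitch(Z_BP_PER_TURN, Z_RISE_PER_BP_ANGSTROM))  # roughly 44.4 Å
    print(alternating_tracts("AACGCGCGTT"))
```

The pitch follows directly from the two numbers in the text; the tract scan is only a first-pass filter that would need an energy model to separate CG-rich tracts from less favorable alternating sequences.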

Fig. 2 Structural properties of A-, B-, and Z-conformations. The crystal structures of A-RNA (PDB ID 1SDR), B-DNA (PDB ID 1BNA), and Z-DNA (PDB ID 4OCB) are represented as sphere models, and their top views are shown as stick models with a ribbon-style backbone. The backbones in the sphere models are shown in red. The different strands of the conformers in the top views are shown in blue and red. The structural properties of each conformation are described in the table below

Crystal Structures of Z-DNA in Complex with Chemical Inducers

Owing to the higher energy status of Z-DNA, external factors, such as polyamines, multivalent metal ions, inorganic salts, and base modifications, are required for Z-DNA stabilization (Wang and Vasquez 2007). The first crystal structure of dsDNA revealed an unexpected left-handed Z-DNA in a complex with two molecules of spermine (Wang et al. 1979). The spermines bind to the narrow minor grooves in the structure and appear to relieve the electrostatic repulsion between the closely spaced, negatively charged phosphate backbones (Fig. 3a) (Drozdzal et al. 2013). Consistent with the role of the positively charged amine groups of spermine and spermidine as Z-DNA stabilizers, oligopeptides with basic side chains, (Lys-Ala-Ala)n and (Lys-Leu-Ala)n, induce the B-to-Z transition of poly(dG-dC) sequences (Votavova et al. 1991). In addition to polyamines, divalent metal ions, such as Mn2+, Co2+, Mg2+, Zn2+, Cd2+, and Ni2+, convert poly d(CG)n DNA from B-DNA to Z-DNA (van de Sande et al. 1982). The crystal structures of Z-DNA in high concentrations of MgCl2 (500 mM) and CaCl2 (500 mM) revealed that the positive charge of magnesium ions neutralizes the charge repulsion in the phosphate backbone of Z-DNA (Fig. 3b) (Chatake and Sunami 2013). The ultrahigh-resolution crystal structure of Z-DNA in complex with Mn2+ and Zn2+ ions in the presence of the protonated sperminium tetracation (Spk) revealed that divalent metals coordinate with the guanine bases of Z-DNA and support its conformational stability (Fig. 3b) (Drozdzal et al. 2013). The crystal structure of d(CG)3 with trivalent chromium (Cr3+) showed that Cr3+ ions bind the phosphate backbones and stabilize the Z-conformation (Fig. 3c) (Drozdzal et al. 2015). A micromolar concentration of cobalt hexamine ([Co(NH3)6]3+) effectively converted poly(dG-dC) from B-DNA to Z-DNA (Behe and Felsenfeld 1981).

The AT base pairs in Z-DNA forming sequences (ZFS), which are energetically less favorable for adopting Z-DNA, destabilize the Z-DNA conformation and favor a DNA cruciform (Wang et al. 1984). In the crystal structure of Z-DNA containing the adenine-thymine (AT) base pair, dehydration in the narrow grooves was detected, largely due to solvent disorder, while the helical grooves of GC base-paired Z-DNA were filled with two water molecules per base pair (Wang et al. 1984). The crystal structure of d(CGCGCG)2 in complex with a ruthenium hexamine ([Ru(NH3)6]3+) salt revealed that Z-DNA is stabilized by base contacts with ruthenium (III) and cobalt (III) hexamine (Ho et al. 1987).

In conclusion, the structures of Z-DNA in complex with charged small organic molecules, inorganic cations, and transition metal ions show that these ligands reduce the unfavorable electrostatic repulsion between the negatively charged phosphate backbones of Z-DNA. Some chemicals, including ruthenium and cobalt hexamine, have demonstrated sequence-specific B-to-Z conversion ability.

Fig. 3 Crystal structures of Z-DNA bound to small molecules: (a) spermine binds to the narrow minor groove (PDB ID 1D48); (b) the crystal structure of Z-DNA is stabilized by ZnCl2 (PDB ID 4HIF) and MnCl2 (PDB ID 4HIG). The divalent metals are shown as violet spheres, and the chloride ions are shown as green spheres; (c) the trivalent Cr3+ ions stabilize Z-DNA (PDB ID 4R15). The Cr3+ ions are shown as yellow spheres; (d) cobalt hexamine binds and stabilizes Z-DNA (PDB ID 1XA2); and (e) the non-APP sequence is stabilized in the Z-DNA conformation (PDB ID 1VTW). All cytosines in the structure were modified by C5 methylation

Fig. 4 Crystal structures of Z-DNA binding domains in complex with Z-DNA. (a–d) The crystal structures of the human Zα domain of ADAR1 (PDB 1QBJ) (a), the Zα domain of mouse DAI (PDB 1J75) (b), the Zα domain of viral E3L (PDB 1SFU) (c), and the Zα domain of goldfish PKZ (PDB 4KMF) (d) are shown in blue, cyan, purple, and yellow, respectively; (e) the crystal structure of cyprinid herpesvirus 3 ORF112 (hvORF112) in complex with Z-DNA (PDB 4WCG) is shown as a gray ribbon model; and (f) structural superimposition of the Zα families. The superimposition shows that they share a conserved winged helix-turn-helix (wHTH) structure

Crystal Structures of Z-DNA in Complex with Z-DNA Binding Proteins

Adenosine deaminase acting on RNA 1 (ADAR1) was first identified as an enzyme that converts adenosine to inosine at specific sites in RNA (RNA editing), which can result in the expression of proteins with different amino acid sequences from the same mRNA (Herbert 2019). Band shift analysis using a Z-DNA probe with chicken lung extract and blood nuclei revealed that ADAR1 has Z-DNA-specific binding activity mediated via its Zα domain (ZαADAR1) (Herbert et al. 1995). The crystal structure of human ZαADAR1 in complex with d(CGCGCGCG)2 dsDNA revealed that it binds to and stabilizes Z-DNA under physiological conditions (Schwartz et al. 1999). In the crystal structure, two Zα domains bind to one palindromic dsDNA in the Z-DNA conformation. The Zα domain adopts a winged helix-turn-helix (wHTH) fold, which is commonly found in DNA-binding proteins (Fig. 4a) (Schwartz et al. 1999). Currently, seven protein families are known to have Zα domains: ADAR1, DAI (also known as ZBP1 or DLM-1), PKZ in fishes, E3L in viruses, ORF112 in fish herpesviruses, I73R in African swine fever virus, and RBP7910 in mitochondria (Schwartz et al. 2001; Kim et al. 2014; Kus et al. 2015; Nikpour and Salavati 2019; Ha et al. 2004; Sun et al. 2022; Schwartz et al. 1999). The crystal structure of the Zα domain of mouse DAI (mZαDAI) in complex with dT(CGCGCGCG)2 repeats revealed that mZαDAI stabilizes a Z-DNA conformation essentially identical to that stabilized by ZαADAR1 (Fig. 4b) (Schwartz et al. 2001).

The crystal structures of Zα domains in complex with Z-DNA revealed that the interactions between Zα domains and Z-DNA are structurally conserved, with three distinct features: (1) the hydrophobic core, which comprises Tyr from α3 and Trp from the β-wing, is functionally significant; (2) the conserved Tyr residue from the α3 core interacts with the guanine base in the syn conformation; and (3) Zα recognizes the zigzag arrangement of the phosphate backbone rather than the sequence (Fig. 4) (Kim et al. 2010). The major difference in Z-DNA recognition among the Zα domains lies in the structure of the β-wing motif. Structural comparison between mZαDAI and hZαADAR1 in complex with Z-DNA revealed a relatively shorter loop in mZαDAI (Fig. 4b) (Schwartz et al. 2001). The Zα domain of Yaba-like disease virus (YLDV) E3L (yabZαE3L) shows structural similarity to the mammalian Zα proteins, ZαADAR1 and ZαDAI, and recognizes Z-DNA through the conserved surface of yabZαE3L (Fig. 4c) (Ha et al. 2004). In fish PKZ, the structural diversity in the β-wing region of the Zα domain correlates with its B-to-Z transition activity. The Zα domain of Carassius auratus PKZ (caZαPKZ) has a positively charged β-wing motif comprising Trp, Pro, Pro, and Lys residues. The caZαPKZ showed faster B-to-Z transition activity than other Zα domains, largely due to the charge interaction between the positively charged β-wing and Z-DNA (Fig. 4d) (Kim et al. 2014). An HTH with a large wing motif, which contains an extra basic motif in the β-wing, was found in Danio rerio PKZ. Larger and more basic interactions have been found to correlate with fast B-to-Z transition kinetics (Subramani et al. 2016). The crystal structure of the Zα domain from Cyprinid herpesvirus 3 ORF112 in complex with 18-bp CpG DNA, which forms Z-DNA, revealed that it forms contacts similar to those of other Zα family members through the α3 core helix, while containing a short wing structure comprising only Trp and Pro residues (Fig. 4e) (Kus et al. 2015).

Although most structural studies of protein–Z-DNA complexes have been conducted with alternating cytosine-guanine repeat DNA, Ha et al. solved the crystal structure of hZαADAR1 in complex with non-CG repeat DNA in the Z-DNA conformation (Ha et al. 2009). Three non-CG-repeat DNAs, d(CACGTG)2, d(CGTACG)2, and d(CGGCCG)2, adopt the conventional Z-DNA conformation in crystal structures in complex with hZαADAR1 (Fig. 5f–h) (Ha et al. 2009). These results showed that the conformational restraints originate from the binding pocket of the Zα domain, which recognizes the structural features of the left-handed zigzag arrangement of the phosphate backbone (Ha et al. 2009).

Fig. 5 Crystal structures of BZ and ZZ junctions. The crystal structures of the BZ junctions (PDB 2ACJ, 5ZUP, 5ZUO, and 5ZU1) are shown. The bases extruded into the solvent at the junction (A or T of the A-T base pair) are labeled. In all four crystal structures, the A-T base pair at the junction was broken and extruded out of the helical stacking axis. The crystal structure of the ZZ junction in complex with four Zα domains (PDB 3IRQ) was modeled, and the junction between the two Z-DNA parts is labeled. Neither base extrusion nor base-pair breaking was observed in the ZZ junction structure

Crystal Structures of BZ Junctions The formation of noncanonical structures of DNA, such as Z-DNA, H-DNA, and G-quadruplex, has biological consequences, such as gene regulation, recombination, genetic instability, and various diseases (Ravichandran et al. 2019). ZFS in longstretched dsDNA or genomic DNA can be surrounded by canonical B-DNA (Rich and Zhang 2003). Because Z-DNA is a higher energy conformer of the double helix, it is stabilized by negative supercoiling during transcription during gene expression. Therefore, the formation of Z-DNA inside the B-DNA environment leads to successive heterogeneous helical structures (Rich and Zhang 2003). In 2005, Ha et al. crystallized such a BZ junction forming sequence for the first time in complex with four hZαADAR1 molecules (Ha et al. 2005). The structure revealed that the purinepyrimidine tracts in the DNA sequence adopted the Z-DNA conformation when stabilized by ZαADAR1, whereas the remaining sequences, except the A-T base pair at the junction, were in the right-handed B-DNA conformation (Fig. 5a). Surprisingly, the A-T base pair at the BZ junction was broken and extruded from the base stacking, while leaving two heterogeneous helical segments, B-DNA and Z-DNA, stacked continuously (Fig. 5a) (Ha et al. 2005). The three crystal structures of the BZ junction with various Z-forming sequences combined with fluorescent probe-based structural analysis revealed a sequence preference for BZ junction formation (Kim et al. 2018). The base extrusion structure was consistently observed when the A-T base pair at the junction was substituted with a T-A base pair (Fig. 5b). The crystal structures of the BZ junction composed of d(CGCGCGCG)2 and d(CGCGCGCA)/d (TGCGCGCG) sequences in the Z-DNA segment showed that these step variations prior to the BZ junction site also adopted the conventional Z-DNA conformation and the accompanying base-extruded junction structure (Fig. 5c, d). 
Crystallographic and fluorescent probe-based structural analyses revealed common structural features: (1) base extrusion is a common structural feature of BZ junctions; (2) only an A-T base pair, and not a G-C base pair, following a completed Z-DNA dinucleotide unit is a target site for BZ junction formation; and (3) the CCGT step before the BZ junction site is not favorable for forming stable Z-DNA (Kim et al. 2018). In addition to the BZ junction, other helical discontinuities involving left-handed Z conformations have been identified. Guided by the BZ junction structures with extruded A-T base pairs at the junction site, a fluorescent adenine analog was used to analyze junction formation in AZ and ZZ junction-generating sequences (Kim et al. 2009). Unlike the BZ junction structure, the ZZ junction crystal structure displayed no base extrusion (Fig. 5e) (de Rosa et al. 2010). Notably, in the ZZ junction structure, helical stacking between the Z-DNA segments, which were separated by a single base pair, was partially or completely disrupted (de Rosa et al. 2010).

7

Z-DNA


NMR Studies of Z-DNA Transition Induced by ZBP

NMR Monitoring on Z-DNA Formation of d(CGCGCG)2 by ZBPs

CG-repeat short DNA duplexes, such as d(CGCGCG)2, can form stable left-handed Z-DNA upon binding to various ZBPs, such as ADAR1 (Schwartz et al. 1999), DAI (Schwartz et al. 2001), viral E3L (Ha et al. 2004), and fish PKZ (Kim et al. 2014). Z-DNA formation can be monitored by 31P NMR chemical shifts, which are a sensitive probe of backbone conformational changes in nucleic acids, and/or by 1H imino proton resonances (Kang et al. 2009). An NMR investigation of the B-to-Z transition of d(CGCGCG)2 revealed five new resonances in the 31P NMR spectrum and two new imino resonances in the 1H NMR spectrum at 35 °C, indicative of Z-form helix generation in DNA induced by hZαADAR1 (Fig. 6a) (Kang et al. 2009). The relative Z-DNA populations of d(CGCGCG)2, which were determined by integrating these new resonances, showed that the DNA was completely converted to a Z-form helix when the protein/DNA (P/N) molar ratio was >2.0 (Fig. 6b) (Kang et al. 2009). Similar results have been reported for the B-to-Z transition of d(CGCGCG)2 induced by yabZαE3L (Fig. 6b) (Lee et al. 2010). In the case of hZα2DAI (previously known as ZβDAI), only 62% of d(CGCGCG)2 is in the Z-conformation, even at a P/N ratio of 6.4 (Fig. 6b) (Kim et al. 2011); in contrast, most of the DNA is converted to Z-DNA by hZαADAR1 and yabZαE3L (Kang et al. 2009; Lee et al. 2010). Interestingly, hZα2DAI induced a Z-DNA conformation different from that induced by hZαADAR1, as confirmed by the 31P NMR and imino proton spectra of Z-DNA (Kim et al. 2011). Unlike other ZBPs, caZαPKZ exhibited lower B-to-Z transition activity for d(CGCGCG)2 than for d(TCGCGCG)2 (Fig. 6b) (Lee et al. 2016), because caZαPKZ requires additional H-bonding interactions between its K56 side chain and the T0-C1 phosphate of DNA (Kim et al. 2014).
Interestingly, the B-to-Z transition activities of d(CGCGCG)2 and d(TCGCGCG)2 induced by ZαPKZ can be modulated by varying the salt concentration (Lee et al. 2016, 2017). Non-CG-repeat short DNA duplexes are also converted to Z-DNA by hZαADAR1 (Ha et al. 2009). NMR studies revealed that 89% of d(CACGTG)2 was converted to Z-DNA by hZαADAR1 at P/N = 5.3, whereas only 76% of d(CGTACG)2 adopted the Z-conformation even when the P/N ratio was increased to 7.4 (Seo et al. 2010). In the case of the RNA duplex r(CGCGCG)2, hZαADAR1 induces an A-to-Z transition of the CG-repeat RNA to an extent similar to the B-to-Z transition in CG-repeat DNA, although it exhibits distinct imino proton spectra (Lee et al. 2019). The crystal structure of the hZαADAR1–d(CGCGCG)2 complex revealed that the interaction is mediated by five residues in the α3 core and four residues in the β-hairpin (Schwartz et al. 1999). Among these residues, K169, N173, Y177, and W195 showed a high degree of conservation among ZBPs. The N173 and Y177 mutants of hZαADAR1 were unable to convert d(CGCGCG)2 to Z-DNA (Jeong et al. 2014). The K169A and W195F mutants induce a complete B-to-Z transition only at a much higher P/N ratio (Jeong et al. 2014). Thus, these residues play important roles in ZBP function, specifically in the B-to-Z transition of DNA. In the case of the mutants of the other five residues (K170A, R174A, T191A, P192A, and P193A), 100% of d(CGCGCG)2 was converted to Z-DNA at P/N < 3.0 (Jeong et al. 2014).

D. Kim et al.

Fig. 6 NMR studies on Z-DNA induced by ZBPs: (a) 31P NMR (left) and 1H imino proton spectra (right) of d(CGCGCG)2 at 35 °C upon titration with hZαADAR1 (Kang et al. 2009). New resonances from the Z-form are denoted as Z; (b) relative Z-form populations in the various Z-DNA–ZBP complexes as a function of the P/N ratio (Kang et al. 2009; Lee et al. 2010; Kim et al. 2011; Lee et al. 2016). Solid lines represent the best fit to the correlation between Z-form population and the P/N ratio; (c) superposition of 1H/15N HSQC spectra of free hZαADAR1 (blue), hZαADAR1–d(CGCGCG)2 (green), and hZαADAR1–r(CGCGCG)2 complexes at 35 °C (Kang et al. 2009; Lee et al. 2019); and (d) chemical shift changes in hZαADAR1 upon binding to d(CGCGCG)2 (upper) (Kang et al. 2009) and r(CGCGCG)2 (lower) (Lee et al. 2019)
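The population analysis described above, in which Z-form fractions are obtained by integrating the new resonances, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the example peak areas are hypothetical, and the sketch assumes that a matched pair of resonances (the same nucleus in the B- and Z-forms) is integrated under conditions where peak area is proportional to population. The second function encodes an idealized tight-binding limit of one Z-duplex per two proteins, which is an assumption made here for comparison and not a fitted model from the cited studies.

```python
def z_population(area_z: float, area_b: float) -> float:
    """Z-form fraction from the integrated areas of a matched Z/B resonance pair."""
    if area_z < 0 or area_b < 0 or area_z + area_b == 0:
        raise ValueError("peak areas must be non-negative and not both zero")
    return area_z / (area_z + area_b)


def ideal_z_fraction(pn_ratio: float) -> float:
    """Idealized limit (assumption): one Z-duplex per two proteins, saturating at P/N = 2."""
    return min(1.0, max(0.0, pn_ratio) / 2.0)


# Hypothetical peak areas (arbitrary units) at three titration points.
titration = {0.5: (0.24, 0.76), 1.0: (0.49, 0.51), 2.5: (1.00, 0.00)}

for pn, (area_z, area_b) in titration.items():
    observed = z_population(area_z, area_b)
    print(f"P/N = {pn:.1f}: observed {observed:.2f}, ideal {ideal_z_fraction(pn):.2f}")
```

Comparing measured fractions with the idealized line is one simple way to see how far a given ZBP, such as hZα2DAI at 62% conversion, falls short of stoichiometric conversion.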

NMR Monitoring on Intermolecular Interaction of ZBPs with Z-DNA

Chemical shift perturbation (CSP), which is represented as the weighted average 1H/15N chemical shift change (Δδavg), is a simple NMR technique for studying the binding of a protein to various ligands, such as small molecules, nucleic acids, and other proteins. The binding of hZαADAR1 to the Z-conformation of d(CGCGCG)2 causes significant chemical shift changes in all the residues of the α3 helix, with Δδavg > 0.1 ppm (Fig. 6c, d) (Kang et al. 2009). In addition, significant chemical shift changes were observed in the β1-α2 and β2-loop-β3 regions (Fig. 6d) (Kang et al. 2009). These results are consistent with the direct interaction of their side chains with the phosphate backbone of Z-DNA, as reported in the crystal structure (Schwartz et al. 1999). The binding of hZαADAR1 to RNA [r(CGCGCG)2] (Lee et al. 2019) or 2′-OMe-containing DNA [d(CGCmGCG)2] (Lee et al. 2019) also results in significant chemical shift changes in the α3 helix, as well as in the β1-α2 and β2-loop-β3 regions. Similar CSP results have been reported for Z-DNA binding of yabZαE3L (Lee et al. 2010) and caZαPKZ (Lee et al. 2016). In the case of hZα2DAI, residues in the α3 helix underwent chemical shift changes of only 0.02–0.11 ppm upon binding to d(CGCGCG)2, consistent with its lower B-to-Z transition activity (Kim et al. 2011). Similarly, residues of hZαADAR1 showed Δδavg 2.0 (Kang et al. 2009), consistent with the crystal structure of the hZαADAR1–d(CGCGCG)2 complex (Schwartz et al. 1999). NMR titration studies revealed that the amount of Z-DNA produced equals half the total amount of added hZαADAR1 until the P/N ratio reaches 2.0 (Kang et al. 2009). Based on the crystal structure, this can be considered a mixture of the Z-DNA–(hZαADAR1)2 complex and free B-DNA. However, a dynamic study suggested that the conformational state of hZαADAR1 in the complex at P/N = 1.0 is not the Z-DNA–(hZαADAR1)2 complex alone but rather a mixture of two conformational states, the Z-DNA–hZαADAR1 and Z-DNA–(hZαADAR1)2 complexes (Kang et al. 2009). This hypothesis is supported by the observation that the complex at P/N = 1.0 had a larger molecular weight than the free protein but a smaller one than the DNA–(hZαADAR1)2 complex, as determined by gel-filtration chromatography and diffusion-coefficient measurements (Kang et al. 2009). Taken together, the active mechanism of the B-to-Z transition of a DNA duplex, d(CGCGCG)2, by hZαADAR1 is suggested as follows (Fig.
7): (i) Initially, hZαADAR1 (P) preferentially binds to B-DNA (B) to form BP (Kd,BP = [B][P]/[BP]); (ii) simultaneously, the B-form helix of the BP complex is converted to left-handed Z-DNA with KBZ to form ZP; and (iii) finally, the ZP2 complex is produced by the addition of P to ZP (Kd,ZP2 = [ZP][P]/[ZP2]) (Kang et al. 2009). Based on this hypothesis, NMR hydrogen exchange data were analyzed to obtain KBZ of ~1 and Kd,ZP2/Kd,BP of ~87 in the B-to-Z transition of d(CGCGCG)2 induced by hZαADAR1 (Kang et al. 2009). Similarly, yabZαE3L converted the B-form helix of d(CGCGCG)2 to Z-DNA with KBZ of 1.02 and Kd,ZP2/Kd,BP of 6.5 (Lee et al. 2010). The non-CG-repeat DNA, d(CACGTG)2, can be converted to Z-DNA by hZαADAR1 with KBZ of 0.4, Kd,BP of 256 μM, and Kd,ZP2 of 182 μM (Seo et al. 2010). In the case of d(CGTACG)2, hZαADAR1 induced a B-to-Z transition with KBZ of 6.3, Kd,BP of 400 μM, and Kd,ZP2 of 29 μM (Seo et al. 2010). Thus, during the B-to-Z transition, hZαADAR1 exhibits the sequence preference of d(CGCGCG)2 ≫ d(CACGTG)2 > d(CGTACG)2 through multiple sequence discrimination parameters (KBZ, Kd,BP, and Kd,ZP2) (Seo et al. 2010). When the Z-DNA binding of ZBPs exhibits fast exchange behavior on the NMR timescale, the CSP data can be analyzed to obtain the KBZ, Kd,BP, and Kd,ZP2 values. Goldfish ZαPKZ induced the B-to-Z transition of d(CGCGCG)2 with KBZ of 0.87, Kd,BP of 28 nM, and Kd,ZP2 of 345 nM at 10 mM NaCl (Lee et al. 2016). Increasing [NaCl] to 100 mM led to ~600- and 25-fold larger Kd,BP and Kd,ZP2, respectively, and a 4.6-fold smaller KBZ (Lee et al. 2016). Titration data at 250 mM NaCl could not be analyzed precisely because of the extremely small KBZ (Lee et al. 2017). The CSP data were also analyzed to separate the chemical shift changes caused by Z-DNA binding from those caused by B-DNA binding (Lee et al. 2016). At 10 mM NaCl, B-DNA and Z-DNA binding of caZαPKZ caused similar chemical shift perturbations, with significant chemical shift changes in the α3 helix, as well as in the β1-α2 and β-hairpin regions (Lee et al. 2016). However, at higher [NaCl], the B-DNA binding conformation of caZαPKZ exhibited a chemical shift pattern similar to that of free ZαPKZ but completely different from that of the Z-DNA binding conformation (Lee et al. 2016, 2017). These studies suggest that the structure of the intermediate complex formed by caZαPKZ and B-DNA can be modulated by varying the salt concentration; thus, it could be used as a molecular ruler to determine the degree of B-to-Z transition (Lee et al. 2016, 2017).

Fig. 7 Mechanisms for the B-to-Z conformational transition of a 6-bp DNA duplex by two hZαADAR1 proteins. The DNAs in the B-conformation and Z-conformation are shown as blue and red ball-and-stick models, respectively. The hZαADAR1 proteins are shown in cyan. B-DNA, Z-DNA, B-DNA bound to a single hZαADAR1, Z-DNA bound to a single hZαADAR1, B-DNA bound to two hZαADAR1 proteins, and Z-DNA bound to two hZαADAR1 proteins are labeled B, Z, BP, ZP, BP2, and ZP2, respectively. (These images were adapted with permission from Fig. 7 of the report on the B-Z transition mechanism (Kang et al. 2009). Copyright 2022 American Chemical Society)
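The three-step scheme (B + P ⇌ BP, BP ⇌ ZP, ZP + P ⇌ ZP2) can be turned into a small numerical sketch that predicts the Z-form fraction at a given P/N ratio. The Python below is an illustration rather than the fitting procedure used in the cited studies. It assumes that KBZ = [ZP]/[BP], neglects free Z-DNA and the BP2 species shown in Fig. 7, and uses a hypothetical duplex concentration; only the equilibrium constants are taken from the text.

```python
def z_fraction(p_total, n_total, kd_bp, k_bz, kd_zp2):
    """Fraction of duplex in the Z-form (ZP + ZP2) for the B -> BP -> ZP -> ZP2 scheme.

    All concentrations and dissociation constants share one unit (here uM).
    Assumes K_BZ = [ZP]/[BP]; free Z-DNA and BP2 are neglected.
    """
    def species(p_free):
        # Statistical weights of BP, ZP, and ZP2 relative to free B-DNA.
        w_bp = p_free / kd_bp
        w_zp = k_bz * w_bp
        w_zp2 = w_zp * p_free / kd_zp2
        b_free = n_total / (1.0 + w_bp + w_zp + w_zp2)
        return b_free * w_bp, b_free * w_zp, b_free * w_zp2

    # Bound protein rises monotonically with free protein, so bisect on [0, P_total].
    lo, hi = 0.0, p_total
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        bp, zp, zp2 = species(mid)
        if mid + bp + zp + 2.0 * zp2 > p_total:
            hi = mid
        else:
            lo = mid
    bp, zp, zp2 = species(0.5 * (lo + hi))
    return (zp + zp2) / n_total


# Constants quoted in the text for hZalphaADAR1 (Seo et al. 2010); Kd values in uM.
params = {
    "d(CACGTG)2": dict(kd_bp=256.0, k_bz=0.4, kd_zp2=182.0),
    "d(CGTACG)2": dict(kd_bp=400.0, k_bz=6.3, kd_zp2=29.0),
}
N_TOTAL_UM = 500.0  # hypothetical duplex concentration, not taken from the text

for name, p in params.items():
    for pn in (1.0, 2.0, 5.0):
        f = z_fraction(pn * N_TOTAL_UM, N_TOTAL_UM, **p)
        print(f"{name}  P/N = {pn:.1f}  Z fraction = {f:.2f}")
```

Because the predicted fraction depends on the assumed duplex concentration, the printed values should be read as qualitative trends and not compared directly with the percentages reported from the NMR titrations.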


NMR Dynamics Study on B-to-Z Transition of DNA Induced by ZBPs

The conformational exchange of proteins on a millisecond timescale can be analyzed using NMR Carr–Purcell–Meiboom–Gill (CPMG) relaxation dispersion (RD) experiments based on a two-state kinetic model. caZαPKZ exhibited conformational exchange with kex of 550–684 s⁻¹ (Lee et al. 2016). 15N CPMG-RD experiments at two different magnetic fields revealed that hZαADAR1 also showed conformational exchange, with a kex of 5784 s⁻¹ at 35 °C between states “A” (89% population) and “B” (11% population) (Go et al. 2021). During the B-to-Z transition, the rate constants for the association (kFB) and dissociation (kBF) of ZBPs with DNA were determined using 15N CPMG-RD experiments based on a pseudo-three-state model (Lee et al. 2019). The CPMG data sets of the hZαADAR1–d(CGCGCG)2 complex were analyzed to obtain a kFB of 157 s⁻¹ and kBF of 1050 s⁻¹, as well as an exchange rate constant from the Z-DNA-bound to the B-DNA-bound state (kZN) of 12.7 s⁻¹ (Lee et al. 2019). For the caZαPKZ–d(TCGCGCG)2 complex, the CPMG data set at 10 mM NaCl was analyzed to obtain a kFB of 98 s⁻¹ and kBF of 675 s⁻¹ (Lee et al. 2016). As [NaCl] increased to 100 mM, caZαPKZ exhibited a 13-fold slower kFB and 2-fold faster kBF, resulting in a 25-fold larger Kd,ZP2 (Lee et al. 2016).
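Two pieces of arithmetic behind these numbers can be made explicit. For a two-state exchange, kex is the sum of the forward and reverse rate constants, and the minor-state population equals the forward rate constant divided by kex. In addition, if a dissociation constant is taken as proportional to the ratio of the dissociation and association rates, the reported fold changes in kFB and kBF can be combined into an expected fold change in Kd. The Python sketch below applies both relations to the values quoted in the text; the function names are hypothetical, and treating kFB as proportional to the association rate constant is an assumption of this sketch.

```python
def two_state_rates(k_ex: float, pop_minor: float):
    """Split k_ex = k_AB + k_BA using p_B = k_AB / k_ex for a two-state exchange."""
    if not 0.0 <= pop_minor <= 1.0:
        raise ValueError("population must lie between 0 and 1")
    k_ab = pop_minor * k_ex          # major -> minor state
    k_ba = (1.0 - pop_minor) * k_ex  # minor -> major state
    return k_ab, k_ba


def kd_fold_change(assoc_fold_slower: float, dissoc_fold_faster: float) -> float:
    """Fold increase in Kd when Kd is taken as proportional to k_off / k_on."""
    return assoc_fold_slower * dissoc_fold_faster


# hZalphaADAR1 values quoted in the text (Go et al. 2021).
k_ab, k_ba = two_state_rates(k_ex=5784.0, pop_minor=0.11)
print(f"k_AB = {k_ab:.0f} s^-1, k_BA = {k_ba:.0f} s^-1")

# caZalphaPKZ on raising NaCl from 10 to 100 mM (Lee et al. 2016).
print(f"expected Kd increase: about {kd_fold_change(13.0, 2.0):.0f}-fold")
```

Under these assumptions the 13-fold and 2-fold changes multiply to a 26-fold increase, in line with the reported 25-fold larger Kd,ZP2.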

A-to-Z Transition Mechanism of RNA Induced by ZBPs

Similar to DNA, the CG-repeat RNA duplex, r(CGCGCG)2, is completely converted to the Z-form by hZαADAR1 at P/N  2.0 (Lee et al. 2019). Biolayer interferometry experiments have shown that hZαADAR1 has a twofold smaller Kd for a 12-bp CG-repeat RNA than for the corresponding DNA (Lee et al. 2019). In the hZαADAR1–r(CGCGCG)2 complex, significant CSP upon binding to RNA was observed in the α3 helix, as well as in the β1-α2 and β-hairpin regions, similar to Z-DNA binding (Lee et al. 2019). However, several residues exhibited significantly different CSP results when bound to Z-RNA versus Z-DNA (Fig. 6d). For example, T191 in the β-hairpin showed a larger upfield shift of its 15N resonance upon binding to Z-RNA than to Z-DNA (Fig. 6d) (Lee et al. 2019). In addition, the Trp195 side chain showed little CSP when binding to Z-RNA but a large perturbation when binding to Z-DNA (Fig. 6d) (Lee et al. 2019). The CPMG data set of the hZαADAR1–r(CGCGCG)2 complex was analyzed based on the pseudo-three-state model to obtain a kFB of 3.0 s⁻¹ and kBF of 34.9 s⁻¹, as well as a kZN of 2.2 s⁻¹ (Lee et al. 2019). These rate constants for slow exchange were confirmed by 2D TROSY-based ZZ-exchange experiments, in which kFB and kBF were determined as 14.7 s⁻¹ and 41.8 s⁻¹, respectively (Lee et al. 2019). These unique structural and dynamic features of hZαADAR1 in Z-RNA binding may be involved in targeting ADAR1 for RNA recognition.
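The CSP comparisons in this chapter rest on a weighted-average shift per residue. The text does not state the weighting used, so the Python sketch below adopts one common convention, in which the 15N shift change is divided by a scale factor before being combined with the 1H shift change. The scale factor of 5.88, the function names, the residue labels, and the shift values are all assumptions made for illustration; only the 0.1 ppm significance threshold comes from the text.

```python
import math


def weighted_csp(delta_h_ppm: float, delta_n_ppm: float, n_scale: float = 5.88) -> float:
    """Weighted-average 1H/15N shift change; n_scale (assumed) down-weights the 15N axis."""
    return math.hypot(delta_h_ppm, delta_n_ppm / n_scale)


def significant_residues(shifts, threshold_ppm: float = 0.1):
    """Residues whose weighted CSP exceeds the threshold (0.1 ppm in the text)."""
    return [
        residue
        for residue, (d_h, d_n) in shifts.items()
        if weighted_csp(d_h, d_n) > threshold_ppm
    ]


# Hypothetical (delta 1H, delta 15N) values in ppm, for illustration only.
example_shifts = {
    "residue_1": (0.12, 0.45),
    "residue_2": (0.03, -0.90),
    "residue_3": (0.01, 0.05),
}
print(significant_residues(example_shifts))
```

Keeping the scale factor as a parameter makes it easy to repeat the analysis with whatever weighting a given study reports.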


BZ Junction Formation of DNA Induced by ZBPs

Z-DNA is produced in long genomic DNA by ZBPs through the formation of two BZ junction structures. An X-ray structural study showed that DNA helices are stabilized at one end in the Z-conformation by hZαADAR1 proteins, while the other end remains as B-DNA (Ha et al. 2005). An NMR titration of a 13-bp DNA, which contains a CG-repeat region as well as non-CG-repeat AT-rich regions, with hZαADAR1 showed that the DNA maintains a B-form helix when P/N < 1 (Lee et al. 2012b). The 31P NMR spectrum at P/N = 4.5 indicated that, in this complex, the DNA adopted the BZ junction structure by binding to four molecules of hZαADAR1 (Lee et al. 2012b). The CSP data of hZαADAR1 upon binding to this 13-bp DNA at P/N = 1.0 indicated that hZαADAR1 adopts a unique conformation (known as the “initial contact” conformation), which is distinct from the Z-DNA-bound conformation, similar to that observed for the hZαADAR1–d(CGCGCG)2 complex (Lee et al. 2012b). Interestingly, when P/N = 2.0, hZαADAR1 exhibits properties of both the initial contact conformation and the Z-DNA-bound conformation (Lee et al. 2012b). An NMR hydrogen exchange study at P/N < 1.0 revealed that, in the initial contact conformation, the GC base pairs in the CG-repeat region open more slowly and close considerably more slowly than those in free DNA (Lee et al. 2012b). At the same time, these GC base pairs in the complex become slightly less stable than the corresponding base pairs in free DNA (Lee et al. 2012b). In addition, NMR studies revealed that, in the initial contact conformation, hZαADAR1 significantly destabilized the AT base pairs in the AT-rich regions, even though it physically interacts with the CG-repeat region (Lee et al. 2012b). Based on these findings, a mechanism for BZ junction formation in long genomic DNA was suggested: (i) the ZBP specifically interacts with a CG-repeat DNA segment via the initial contact conformation; (ii) the neighboring AT-rich regions become very unstable and form a bubble-like structure, and the CG-repeat segment easily converts to Z-DNA; and (iii) the neighboring AT-rich regions base-pair again, and the BZ junction structure is formed (Lee et al. 2012b).
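The observation that GC base pairs open more slowly yet become less stable follows from how opening equilibria are defined: the apparent opening constant is the ratio of the opening and closing rate constants, so a larger slowdown in closing than in opening raises that ratio. The Python sketch below works through this with hypothetical rate constants; the numerical values, the function names, and the use of 35 °C are assumptions for illustration and are not data from the cited study.

```python
import math

GAS_CONSTANT = 1.987e-3  # kcal/(mol*K)


def opening_constant(k_open: float, k_close: float) -> float:
    """Apparent base-pair opening equilibrium constant K_op = k_open / k_close."""
    return k_open / k_close


def opening_free_energy(k_open: float, k_close: float, temp_k: float = 308.15) -> float:
    """Free energy of opening in kcal/mol; a smaller value means a less stable pair."""
    return -GAS_CONSTANT * temp_k * math.log(opening_constant(k_open, k_close))


# Hypothetical rate constants in s^-1, chosen only to show the direction of the effect.
free_dna = dict(k_open=100.0, k_close=1.0e7)
complexed = dict(k_open=50.0, k_close=1.0e6)  # opens 2x slower, closes 10x slower

for label, rates in (("free DNA", free_dna), ("initial contact complex", complexed)):
    k_op = opening_constant(**rates)
    dg = opening_free_energy(**rates)
    print(f"{label}: K_op = {k_op:.1e}, dG_op = {dg:.2f} kcal/mol")
```

In this example the opening constant rises fivefold in the complex, so the free energy of opening decreases even though both rate constants are smaller.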

Chemical Biology Strategies Used to Elucidate the Biological Significance of Z-DNA and Z-DNA Binding Proteins

Strategies Used to Determine the Structure and Stability of Z-DNA

Initial efforts to characterize the structure of Z-DNA and its unique features predominantly involved chemical reagents that react specifically at vulnerable/exposed sites within the Z conformation structure, including at the junctions. Powerful oxidizing agents, such as osmium tetroxide (OsO4) and potassium permanganate (KMnO4), react with unpaired thymine (Johnston and Rich 1985). Dimethyl sulfate (DMS) and diethyl sulfate (DES) alkylate the N7 position of guanine in Z-DNA. Subjecting DNA to these chemical reactions, followed by piperidine cleavage, gel electrophoresis, and sequencing, is helpful in characterizing Z-DNA and the nature of BZ junctions (Johnston and Rich 1985). Apart from the abovementioned chemical reagents that modify Z-DNA junctions, naturally occurring or chemically synthesized base modifications also help decipher structural details and heterogeneity in Z-DNA and Z-DNA junctions. Classically, pyrimidine bases in DNA are modified at the C5 position and purines at the C8 position to facilitate the B-to-Z transition. Sugiyama and colleagues demonstrated that m8G incorporation in a Z-DNA-forming ODN dramatically stabilized the Z conformation, even at physiological salt concentrations (Sugiyama et al. 1996). They also found that 8-methylguanosine (m8rG) is superior to m8G, as the incorporation of m8rG could stabilize Z-DNA in the absence of NaCl (Xu et al. 2003). Xu and colleagues synthesized oligonucleotides containing 2′-O-methyl-8-methylguanosine (2′-OMe-m8G) and demonstrated that these oligonucleotides formed Z-DNA with high thermal stability under physiological salt conditions. They also demonstrated that trifluoromethylation at the C8 position of 2′-deoxyguanosine (F8G) significantly stabilized Z-DNA at physiological salt concentrations, to a degree comparable to m8G (Bao et al. 2020). F8G can function not only as a Z-DNA inducer but also as a 19F NMR reporter to detect Z-DNA structure and interactions, owing to its low background signal and high sensitivity. Rich et al. investigated bromination of poly(dG-dC) and found that bromine substitution at the C8 position of guanine and the C5 position of cytosine stabilized the Z conformation under low-salt conditions (Moller et al. 1984) (Fig. 8). Nadler and Diederichsen combined base and sugar modifications in a single nucleotide by synthesizing 8-bromo-2′-ethynyl-arabino-deoxyguanosine and showed a synergistic effect with stronger Z-DNA stabilization (Nadler and Diederichsen 2008). Train et al.
synthesized hairpin oligonucleotides containing various C8-arylguanine adducts and demonstrated that C8-arylguanine modification steered the B/Z-DNA equilibrium toward Z-DNA (Train et al. 2014). This study has potential biological significance as a new carcinogenesis model mediated via Z-DNA, with prominent involvement of the C8-arylguanine adducts formed by carcinogenic chemicals such as arylhydrazines. While 5-methylcytosine promoted Z-DNA stabilization, all its oxidation products hindered the B-to-Z transition. The fluorescent adenine base analog 2-aminopurine (2AP) has been used extensively to study base stacking within Z-DNA and base flipping at the BZ junction (Kim et al. 2009, 2018). In addition to base modification, the acetylation of histones that constitute the chromatin assembly of the genome has been explicitly linked to promotion of the B-to-Z transition (Zhang et al. 2016). As described above, base modification is an invaluable strategy for inducing Z-DNA in the intracellular environment. The exploitation of Z-DNA inducers can facilitate the discovery of Z-DNA functions in living systems and drug development targeting Z-DNA. Thus, chemical reactions with DNA bases, base modifications, and epigenetic modifications can influence the stability of Z-DNA and Z-DNA junctions. A combination of chemical and biological approaches has led to the mapping of Z-DNA-forming sequences in the genome.


Fig. 8 Modified guanine at the C8 position and energy-minimized models of d(CGCXCG)2. X = G, m8G, F8G, and Ph8G. Molecular models were generated using the Amber force field implemented in Molecular Operating Environment (MOE)

Strategies for Developing a Z-DNA Sensor

While chemical biology efforts have focused on the study of Z-DNA structure and stability, parallel efforts have been made to develop compounds that could work as detectors/sensors of this unique left-handed DNA conformation. Initial efforts explored chiral metal complexes as probes. Pioneering work in the development of chiral DNA probes involved phenanthroline ruthenium stereoisomers that are specific to either B-DNA or Z-DNA. Z-DNA has been detected by hyperchromicity and enhanced luminescence (Barton et al. 1984). Similarly, helicene enantiomers were developed as chiral probes; (P)-helicene converts B-DNA to Z-DNA with specific and stable binding (Xu et al. 2004). Porphyrins bind to DNA as intercalators, outside groove binders, or stackers along the external surface of the DNA, resulting in induced negative, positive, and bisignate signature profiles in CD spectroscopy (Balaz et al. 2005). Metal porphyrins and their derivatives have been used as Z-DNA sensors (Balaz et al. 2005). Such a system was adapted to sense the B-to-Z transition in response to pH variations and was therefore proposed for use in reversible information storage systems and logic gates (D’Urso et al. 2009). In addition to the remarkable structural information generated by the use of 2-aminopurine (2AP), probes incorporating 2AP were designed to test the activity of the Z-DNA binding Zα protein and its mutants (Subramani et al. 2016). Park and Sugiyama synthesized a thienopyrimidine-based deoxyguanosine analogue (thdG) with visible light emission as a Z-DNA fluorescent probe and demonstrated direct visualization of the B-to-Z transition in the presence of Zαβ protein (Park et al. 2014) (Fig. 9). They also devised the first Watson-Crick base-pairable FRET system with the thdG-tC pair and applied this nucleobase FRET pair to monitor the B-to-Z transition by color change under increasing salt concentrations (Han et al. 2017) (Fig. 10). Very recently, Park and Liang verified that a Z-B chimera formed the Z-DNA conformation by topological constraint of circular DNA using highly emissive 2′-OMe-thdG (Liu et al. 2022).

Fig. 9 Visual detection of B-to-Z transition by Zαβ interaction. 4 eq. (left) or 0 eq. (middle) of Zαβ was added to Z-DNA solution. The photo was taken under UV irradiation (365 nm) (right)

Fig. 10 Nucleobase FRET pair and color changes by B-to-Z transition under various NaClO4 concentrations
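A FRET readout of this kind responds to a conformational change because transfer efficiency depends steeply on the donor-acceptor separation. The Python sketch below evaluates the standard Förster distance dependence for two hypothetical separations. The Förster radius and the distances are assumed values chosen only for illustration, and the sketch ignores the orientation factor, which can matter considerably for nucleobase probes held rigidly inside a helix.

```python
def fret_efficiency(distance: float, forster_radius: float) -> float:
    """Forster transfer efficiency E = 1 / (1 + (r / R0)^6), both lengths in the same unit."""
    if distance <= 0 or forster_radius <= 0:
        raise ValueError("distance and Forster radius must be positive")
    return 1.0 / (1.0 + (distance / forster_radius) ** 6)


R0_ANGSTROM = 25.0  # hypothetical Forster radius, not taken from the text

# Hypothetical donor-acceptor separations in two helical conformations.
for label, distance in (("conformation 1", 20.0), ("conformation 2", 30.0)):
    efficiency = fret_efficiency(distance, R0_ANGSTROM)
    print(f"{label}: r = {distance:.0f} A, E = {efficiency:.2f}")
```

Even a modest change in separation around the Förster radius shifts the efficiency substantially, which is the kind of sensitivity that allows a conformational transition to be read out as a change in emission color.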

Strategies Applied to Ascertain the Z-DNA Function

Because Z-DNA is a higher-energy structure, stabilizing factors are a prerequisite for it to influence physiological functions. These can be physical factors, such as torsional stress and negative supercoiling, or chemical modulators, including charged organic molecules, salts, metals, metal complexes, and adducts. Furthermore, ZBPs are biological stabilizing factors that contribute to the composition and nature of Z-DNA. These factors have been described in detail in sections “Chemical and Structural Properties of Z-DNA” and “NMR Studies of Z-DNA Transition Induced by ZBP” of this chapter, along with the structural details of their interactions with Z-DNA. Chemical biology approaches to establish the functional roles of Z-DNA and its applications involve the careful selection of the structural modifications described above. For example, epigenetic regulation can be achieved by the dynamic regulation of 5-methylcytosine and its oxidation products (5-hydroxymethylcytosine, 5-formylcytosine, and 5-carboxycytosine), whose formation is catalyzed by ten-eleven translocation (TET) family proteins and which hinder the B-to-Z transition. This suggests a role for Z-DNA as an on/off epigenetic regulatory switch (Vongsutilers et al. 2020). A report described the reversible control of the B-to-Z transition using a cucurbit[7]uril-based supramolecular approach. Cucurbit[7]uril can encapsulate the central butanediamine moiety [HN(CH2)4NH] of spermine and revert the spermine-induced Z-DNA to B-DNA. Treatment with 1-adamantanamine disassembled the cucurbit[7]uril/spermine complex and readily induced the reconversion of B-DNA to Z-DNA (Shaoru et al. 2018). Such synthetic modulators enable the dynamic control of the B-to-Z transition in a much-sought reversible manner for functional studies.

Strategies for Developing Therapeutics Targeting Z-DNA

A considerable number of Z-DNA-stabilizing compounds have been used as anticancer chemotherapy drugs. Cis-diamminedichloroplatinum(II) (cisplatin), a well-known anticancer drug, is known to stabilize methylated and brominated Z-DNA-forming sequences via the formation of a monodentate adduct (Malinge and Leng 1984). The anticancer drug daunorubicin, implicated in the destruction of multidrug-resistant cancer cells, is also a curious case regarding the BZ equilibrium. WP900, the (-)-enantiomer of daunorubicin, favors the conversion of B-DNA into Z-DNA (Qu et al. 2000), whereas (+)-daunorubicin destabilizes Z-DNA by binding selectively to B-DNA, which shifts the equilibrium from Z-DNA to B-DNA. (+)-Daunorubicin, which is used in cancer therapy, is known to intercalate into DNA (Fuertes et al. 2006). Several compounds bind to the B-DNA conformation via intercalation, consequently preventing Z-DNA formation. Alternatively, they can intercalate into preformed Z-DNA, thereby driving the Z-to-B transition. The list includes amyloid beta aggregates (Geng et al. 2010); aluminum (Hegde et al. 2004); the popular antimalarial drug chloroquine (Kwakye-Berko and Meshnick 1990); the popular DNA-intercalating stain ethidium bromide (EtBr), which is also commonly used as a drug for the treatment and prophylaxis of trypanosomiasis in cattle (Fuertes et al. 2006); the antitumor antibiotic elsamicin A; the tetrapeptide KWGK (Fuertes et al. 2006); the anticancer compound NC-182 (Fuertes et al. 2006); netropsin and distamycin (Zimmer et al. 1983); histones H1 and H5 (Russell et al. 1983); and the guanine analog 7-deazaguanine (7daG), in which the N7 atom is substituted (Seela and Driller 1989). Although several anticancer drugs have shown a B-DNA or Z-DNA preference, there is no clear evidence supporting a correlation between anticancer activity and B-DNA or Z-DNA binding preference. Further studies are needed before Z-DNA-targeted chemotherapeutic strategies can be applied against cancer.

Strategies Applied for Nanotechnology Applications Using Z-DNA

The B-to-Z conformational transition has been exploited in nanomechanical molecular machines in the Seeman laboratory (Mao et al. 1999). They constructed a supramolecular device using DNA double-crossover (DX) molecules in which the B-to-Z transition, triggered by cobalt hexammine, caused an atomic displacement of 20–60 Å that was detected by FRET spectroscopy (Mao et al. 1999). The stability of the Z conformation of DNA and RNA is known to be altered by temperature. While Z-RNA is stable at higher temperatures (Hall et al. 1984), Z-DNA is stable at lower temperatures (Sugiyama et al. 1996). Tashiro and Sugiyama used this property to construct a temperature-sensitive molecular switch in which the fluorescence of the adenine base analog 2AP, which responds to differences in base stacking between Z-DNA and B-DNA, was employed to measure the temperature change (Tashiro and Sugiyama 2003). When the temperature changed from low to high, the RNA switch changed from an “off” (A-RNA) to an “on” signal (Z-RNA), whereas the DNA device switched from “on” (Z-DNA) to “off” (B-DNA). Park and Sugiyama constructed a more advanced system in the form of a visible nanothermometer based on Z-DNA stabilization and the strong visible emission of 2′-OMe-thG (Yamamoto et al. 2015). Spermine-functionalized carbon dots (SC-dots) can induce a B-to-Z DNA transition under physiological salt conditions. SC-dots were further adapted to develop DNA logic gates for potential use in DNA computing and DNA nanotechnology (Feng et al. 2013). Lanthanum, cerium, and praseodymium ions stabilize Z-DNA within self-assembled Y-shaped branched DNA structures. The transition was easily reversed using metal chelators, such as EDTA (Nayak et al. 2016; Bhanjadeo et al. 2016; Bhanjadeo and Subudhi 2019). Such dynamic control of the conformational change of DNA will enable the construction of complex nanomechanical machines and devices in the future. The dynamic helicity change between the B and Z forms was used to build artificial viral coat structures using a mixture of pyridine and methylpyridinium, triggered by a pH change from pH 7.4 to 5.5 (Kim et al. 2017). This study highlights the potential role of Z-DNA in the creation of nanomachines that can have various applications.

Disease Implications

Z-DNA Is Immunogenic

Interest in the role of Z-DNA in disease began with the finding that Z-DNA is immunogenic: anti-Z-DNA antibodies are produced by patients with autoimmune diseases, including systemic lupus erythematosus (SLE), Crohn’s disease, polyradiculoneuritis, and amyotrophic lateral sclerosis (ALS) (Ravichandran et al. 2019).

Z-DNA Forming Sequence (ZFS) Controls the Expression of the Disease-Related Genes

Z-DNA-forming sequence elements within the genome regulate promoter activity. One such element in the ADAM-12 promoter exerts tight control over its activity. The loss of this control has been shown to result in overactivity of the promoter, a characteristic feature of many human cancers (Ravichandran et al. 2019). Similarly, HIF1α-mediated Z-DNA stabilization in the SLC11A1 promoter regulates human susceptibility to infectious diseases, such as tuberculosis and leprosy, as well as inflammatory diseases, such as rheumatoid arthritis and Crohn’s disease (Ravichandran et al. 2019).

Z-DNA Forming Sequence (ZFS) Is a Hotspot for the Large-Scale Deletion of DNA

Z-DNA stabilization in the genome induces large-scale deletions. This includes several cancer-related genes, such as BCL2, C-MYC, and SCL (Ravichandran et al. 2019). Z-DNA sequences are strongly associated with chromosomal translocation regions implicated in human lymphoid tumors and with recombination hotspots (Ravichandran et al. 2019). Recent evidence points to a Z-DNA-mediated role of the nucleotide excision repair complex Rad10-Rad1 (ERCC1-XPF) and the mismatch repair complex Msh2-Msh3 in genetic instability in yeast and human cells. Both ERCC1-XPF and MSH2-MSH3 bind to Z-DNA-forming sequences, implying a strong role in Z-DNA-induced genetic instability (Ravichandran et al. 2019). This further implicates Z-DNA in translocation-related diseases and cancer etiology. The Z-DNA-forming sequence in the DM2 (myotonic dystrophy type 2) gene locus (in the first intron of the ZNF9 gene) has a protective effect by reducing the potential for slipped-strand DNA formation in (CCTG)n·(CAGG)n repeats (Ravichandran et al. 2019). Thus, ZFS can function as a torsional sink, absorbing and reducing negative supercoiling and preventing alternative DNA structure formation. However, sequences associated with triplet nucleotide repeats have a strong potential to be stabilized in the Z-DNA conformation (Renciuk et al. 2010). This is critically important, because such repeats induce genomic instability, leading to disease phenotypes. These include CAG repeats implicated in Huntington’s disorder and several spinocerebellar ataxias, CGG repeats associated with fragile X chromosomes, and GAC repeats associated with skeletal dysplasia. In addition, Z-DNA occurrence in the hippocampus of the Alzheimer’s brain is also implicated in neurodegenerative disorders (Ravichandran et al. 2019).

Disease Implications of Z-DNA Binding Proteins
Loss-of-function Zα variants of ADAR1 cause dysregulation of innate interferon responses to double-stranded RNA (dsRNA). The residues altered in the associated diseases are known Z-DNA-interacting residues of Zα (Herbert 2019; Schwartz et al. 1999), suggesting that Z-DNA binding plays a critical role in disease initiation. Zα domains are enriched in cytoplasmic stress granules because of their nucleic acid-binding ability (Herbert 2019). This could be another potent host defense mechanism to overcome unfavorable stressful environments. The viral interferon response inhibitor E3L comprises a single Zα domain. Studies on the E3L of Yatapox virus and Vaccinia virus have revealed that Z-DNA binding activity is essential for viral pathogenicity (Kus et al. 2015; Ha et al. 2004). E3 proteins prevent the sensing of viral RNA by the host DAI, which blocks the defensive necroptosis mechanism (Takaoka and Taniguchi 2008). The disease mechanism involves outcompeting the host dsRNA sensors by sheer abundance, which helps the viral E3L protein suppress the interferon (IFN) response and enables pathogenicity. Similarly, in the case of influenza A virus, viral RNA is sensed by the host DAI, which triggers necroptosis as a host defense mechanism (Takaoka and Taniguchi 2008). A slightly different mechanism was elucidated in Variola virus, in which viral pathogenesis was linked to E3L inhibition of PKR activity rather than to the Z-DNA binding activity of the Zα domain (Takaoka and Taniguchi 2008). Recently, Herbert et al. found that in severe acute respiratory syndrome (SARS) coronavirus, Nsp13 helicase activity modulates the Z-RNA-dependent necroptosis pathway by preventing Z-RNA sensing by DAI. Thus, the coronavirus Nsp13 is considered a potential new target in SARS virus pathogenesis whose inhibition would allow host DAI activity (Herbert et al. 2022). Recently, the crystal structure of the African swine fever virus (ASFV) protein I73R, a potential new class of Zα family protein with a Z-DNA-binding Zα fold, was reported (Sun et al. 2022). However, the functional aspects of this protein remain unclear. Considering the role and mechanism of the viral Zα protein family, the I73R protein may play a role in antagonizing the host antiviral response. In bacteria, Z-DNA was found to accumulate as the biofilm matures and to confer structural integrity to the biofilm matrix.
Thus, Z-DNA plays a significant role in bacterial biofilm pathogenesis, innate immune response, and immune evasion (Buzzo et al. 2021).

Conclusion and Perspective
Since the 1972 discovery of a peculiarly inverted CD profile of DNA in high-salt conditions, a unique DNA form has been investigated. The first-ever crystal structure of this “Z” form of DNA was solved at atomic resolution, even before the experimental atomic details of canonical Watson–Crick B-DNA were revealed. Since then, greater detail regarding the noncanonical and uniquely left-handed helical conformation of DNA has been elucidated. In this chapter, we have summarized the major chemical biology approaches that have thus far enabled us to understand Z-DNA, the B-to-Z transition, and the BZ junction. Going forward, next-generation functional studies on Z-DNA are needed to screen ever more chemical modulators and expand the chemical biology vocabulary of Z-DNA science. This would enable functional interventional strategies targeting Z-DNA, which would be useful in modulating the role of Z-DNA and its binding partners in human genome maintenance and disease. The investigation of Z-DNA in cells is extremely challenging owing to the dynamic nature of its formation and its instability. However, we predict that advanced chemical biology efforts will enable real-time dynamic studies to examine the role of Z-DNA in cell physiology in vivo. These efforts will pave the way for the study of other noncanonical DNA structures, enabling a better understanding of noncanonical DNA structure-mediated genome regulation. Finally, it is impossible to discuss Z-DNA without mentioning Prof. Alexander Rich, who initiated the field of Z-DNA and developed it into an important scientific frontier. He hypothesized that Z-DNA plays important roles in cells and worked toward proving this hypothesis throughout his career. At present, growing evidence on Z-DNA and Z-RNA strongly supports his hypothesis, and the field of Z-DNA research continues to expand.

Acknowledgments
This work was supported by National Research Foundation of Korea (NRF) grants funded by the Ministry of Education, Science, and Technology (MSIT) of the Korean government (2020RC1C1C1007371 to D.K.; 2020R1A4A1018019 and 2021R1A2C3011644 to K.K.; 2020R1A2C1006909 and 2022R1A4A1021817 to J.-H.L.).

References
Balaz M et al (2005) A cationic zinc porphyrin as a chiroptical probe for Z-DNA. Angew Chem Int Ed Engl 44:4006–4009
Bao HL et al (2020) Oligonucleotides DNA containing 8-trifluoromethyl-2′-deoxyguanosine for observing Z-DNA structure. Nucleic Acids Res 48(13):7041–7051
Barton JK et al (1984) Chiral probes for the handedness of DNA helices: enantiomers of tris(4,7-diphenylphenanthroline)ruthenium(II). Proc Natl Acad Sci 81:1961–1965
Behe M, Felsenfeld G (1981) Effects of methylation on a synthetic polynucleotide: the B–Z transition in poly(dG-m5dC)poly(dG-m5dC). Proc Natl Acad Sci U S A 78(3):1619–1623
Bhanjadeo MM, Subudhi U (2019) Praseodymium promotes B–Z transition in self-assembled DNA nanostructures. RSC Adv 9(8):4616–4620
Bhanjadeo MM, Nayak AK, Subudhi U (2016) Cerium chloride stimulated controlled conversion of B-to-Z DNA in self-assembled nanostructures. Biochem Biophys Res Commun 482(4):916–921
Buzzo JR et al (2021) Z-form extracellular DNA is a structural component of the bacterial biofilm matrix. Cell 184(23):5740–5758
Chatake T, Sunami T (2013) Direct interactions between Z-DNA and alkaline earth cations, discovered in the presence of high concentrations of MgCl2 and CaCl2. J Inorg Biochem 124:15–25
de Rosa M et al (2010) Crystal structure of a junction between two Z-DNA helices. Proc Natl Acad Sci U S A 107(20):9088–9092
Dickerson RE et al (1982) The anatomy of A-, B-, and Z-DNA. Science 216(4545):475–485
Drozdzal P et al (2013) Ultrahigh-resolution crystal structures of Z-DNA in complex with Mn(2+) and Zn(2+) ions. Acta Crystallogr D Biol Crystallogr 69(Pt 6):1180–1190
Drozdzal P et al (2015) High-resolution crystal structure of Z-DNA in complex with Cr(3+) cations. J Biol Inorg Chem 20(3):595–602
D’Urso A et al (2009) Interactions of a tetraanionic porphyrin with DNA: from a Z-DNA sensor to a versatile supramolecular device. J Am Chem Soc 131:2046–2047
Feng L et al (2013) Lighting up left-handed Z-DNA: photoluminescent carbon dots induce DNA B to Z transition and perform DNA logic operations. Nucleic Acids Res 41:7987–7996

Fuertes MA et al (2006) Molecular mechanisms for the B–Z transition in the example of poly[d(G-C)·d(G-C)] polymers. A critical review. Chem Rev 106(6):2045–2064
Geng J et al (2010) Alzheimer’s disease amyloid beta converting left-handed Z-DNA back to right-handed B-form. Chem Commun (Camb) 46:7187–7189
Ghosh A, Bansal M (2003) A glossary of DNA structures from A to Z. Acta Crystallogr D Biol Crystallogr 59(Pt 4):620–626
Go Y et al (2021) Conformational exchange of the Zalpha domain of human RNA editing enzyme ADAR1 studied by NMR spectroscopy. Biochem Biophys Res Commun 580:63–66
Ha SC et al (2004) A poxvirus protein forms a complex with left-handed Z-DNA: crystal structure of a Yatapoxvirus Zalpha bound to DNA. Proc Natl Acad Sci U S A 101(40):14367–14372
Ha SC et al (2005) Crystal structure of a junction between B-DNA and Z-DNA reveals two extruded bases. Nature 437(7062):1183–1186
Ha SC et al (2009) The structures of non-CG-repeat Z-DNAs co-crystallized with the Z-DNA-binding domain, hZ alpha(ADAR1). Nucleic Acids Res 37(2):629–637
Hall K et al (1984) ‘Z-RNA’ – a left-handed RNA double helix. Nature 311(5986):584–586
Han JH et al (2017) Development of a vivid FRET system based on a highly emissive dG-dC analogue pair. Chemistry 23(31):7607–7613
Hegde ML et al (2004) First evidence for helical transitions in supercoiled DNA by amyloid beta peptide (1-42) and aluminum: a new insight in understanding Alzheimer’s disease. J Mol Neurosci 22:19–31
Herbert A (2019) Z-DNA and Z-RNA in human disease. Commun Biol 2(1):7
Herbert A et al (1995) Chicken double-stranded RNA adenosine deaminase has apparent specificity for Z-DNA. Proc Natl Acad Sci U S A 92(16):7550–7554
Herbert A, Shein A, Poptsova M (2022) Z-RNA and the flipside of the SARS Nsp13 helicase. Front Immunol 13:912717. bioRxiv. p. 2022.03.03.482810
Ho PS et al (1986) A computer aided thermodynamic approach for predicting the formation of Z-DNA in naturally occurring sequences. EMBO J 5(10):2737–2744
Ho PS et al (1987) The interactions of ruthenium hexaammine with Z-DNA: crystal structure of a Ru(NH3)6+3 salt of d(CGCGCG) at 1.2 A resolution. J Biomol Struct Dyn 4(4):521–534
Hur JH et al (2021) AC-motif: a DNA motif containing adenine and cytosine repeat plays a role in gene regulation. Nucleic Acids Res 49(17):10150–10165
Jeong M et al (2014) NMR study of the Z-DNA binding mode and B-Z transition activity of the Z alpha domain of human ADAR1 when perturbed by mutation on the alpha3 helix and beta-hairpin. Arch Biochem Biophys 558:95–103
Johnston BH, Rich A (1985) Chemical probes of DNA conformation: detection of Z-DNA at nucleotide resolution. Cell 42:713–724
Kang YM et al (2009) NMR spectroscopic elucidation of the B-Z transition of a DNA double helix induced by the Z alpha domain of human ADAR1. J Am Chem Soc 131(32):11485–11491
Kim D et al (2009) Base extrusion is found at helical junctions between right- and left-handed forms of DNA and RNA. Nucleic Acids Res 37(13):4353–4359
Kim D et al (2010) Z-DNA binding proteins as targets for structure-based virtual screening. Curr Drug Targets 11(3):335–344
Kim HE et al (2011) The Z beta domain of human DAI binds to Z-DNA via a novel B-Z transition pathway. FEBS Lett 585(5):772–778
Kim D et al (2014) Distinct Z-DNA binding mode of a PKR-like protein kinase containing a Z-DNA binding domain (PKZ). Nucleic Acids Res 42(9):5937–5948
Kim Y et al (2017) Collective helicity switching of a DNA–coat assembly. Nat Nanotechnol 12(6):551–556
Kim D et al (2018) Sequence preference and structural heterogeneity of BZ junctions. Nucleic Acids Res 46(19):10504–10513
Kus K et al (2015) The structure of the cyprinid herpesvirus 3 ORF112-Zalpha.Z-DNA complex reveals a mechanism of nucleic acids recognition conserved with E3L, a poxvirus inhibitor of interferon response. J Biol Chem 290(52):30713–30725

Kwakye-Berko F, Meshnick S (1990) Sequence preference of chloroquine binding to DNA and prevention of Z-DNA formation. Mol Biochem Parasitol 39:275–278
Lafer EM et al (1981) Antibodies specific for left-handed Z-DNA. Proc Natl Acad Sci U S A 78(6):3546–3550
Lee EH et al (2010) NMR study of hydrogen exchange during the B-Z transition of a DNA duplex induced by the Z alpha domains of yatapoxvirus E3L. FEBS Lett 584(21):4453–4457
Lee AR et al (2012a) NMR dynamics study of the Z-DNA binding domain of human ADAR1 bound to various DNA duplexes. Biochem Biophys Res Commun 428(1):137–141
Lee YM et al (2012b) NMR study on the B-Z junction formation of DNA duplexes induced by Z-DNA binding domain of human ADAR1. J Am Chem Soc 134(11):5276–5283
Lee AR et al (2016) Solution structure of the Z-DNA binding domain of PKR-like protein kinase from Carassius auratus and quantitative analyses of the intermediate complex during B-Z transition. Nucleic Acids Res 44(6):2936–2948
Lee AR et al (2017) NMR elucidation of reduced B-Z transition activity of PKZ protein kinase at high NaCl concentration. Biochem Biophys Res Commun 482(2):335–340
Lee A-R et al (2019) NMR dynamics study reveals the Zα domain of human ADAR1 associates with and dissociates from Z-RNA more slowly than Z-DNA. ACS Chem Biol 14(2):245–255
Malinge JM, Leng M (1984) Reaction of cis-diamminedichloroplatinum (II) and DNA in B or Z conformation. EMBO J 3:1273–1279
Mao C et al (1999) A nanomechanical device based on the B–Z transition of DNA. Nature 397(6715):144–146
Mengqin Liu YC, Zhang Y, An R, Li L, Park S, Sugiyama H, Liang X (2022) Single base-modification reports and locates Z-DNA conformation on a Z-B-chimera formed by topological constraint. Bull Chem Soc Jpn 95(3):433–439
Moller A et al (1984) Bromination stabilizes poly(dG-dC) in the Z-DNA form under low-salt conditions. Biochemistry 23(1):54–62
Nadler A, Diederichsen U (2008) Guanosine analog with respect to Z-DNA stabilization: nucleotide with combined C8-Bromo and C2′-Ethynyl modifications. Eur J Org Chem 2008(9):1544–1549
Nayak AK et al (2016) Lanthanum induced B-to-Z transition in self-assembled Y-shaped branched DNA structure. Sci Rep 6:26855
Nikpour N, Salavati R (2019) The RNA binding activity of the first identified trypanosome protein with Z-DNA-binding domains. Sci Rep 9(1):5904
Park S et al (2014) Highly emissive deoxyguanosine analogue capable of direct visualization of B-Z transition. Chem Commun (Camb) 50(13):1573–1575
Pohl FM, Jovin TM (1972) Salt-induced co-operative conformational change of a synthetic DNA: equilibrium and kinetic studies with poly (dG-dC). J Mol Biol 67:375–396
Qu X et al (2000) Allosteric, chiral-selective drug binding to DNA. Proc Natl Acad Sci 97:12032–12037
Ravichandran S, Vinod Kumar S, Kyeong Kyu K (2019) Z-DNA in the genome: from structure to disease. Biophys Rev 11(3):383–387
Renciuk D et al (2010) CGG repeats associated with fragile X chromosome form left-handed Z-DNA structure. Biopolymers 95:174–181
Rich A, Zhang S (2003) Timeline: Z-DNA: the long road to biological function. Nat Rev Genet 4(7):566–572
Russell WC et al (1983) Differential promotion and suppression of Z leads to B transitions in poly[d(G-C)] by histone subclasses, polyamino acids and polyamines. EMBO J 2:1647–1653
Schwartz T et al (1999) Crystal structure of the Z alpha domain of the human editing enzyme ADAR1 bound to left-handed Z-DNA. Science 284(5421):1841–1845
Schwartz T et al (2001) Structure of the DLM-1-Z-DNA complex reveals a conserved family of Z-DNA-binding proteins. Nat Struct Biol 8(9):761–765
Seela F, Driller H (1989) Alternating d(G-C)3 and d(C-G)3 hexanucleotides containing 7-deaza-2′-deoxyguanosine or 8-aza-7-deaza-2′-deoxyguanosine in place of dG. Nucleic Acids Res 17(3):901–910

Seo YJ et al (2010) Sequence discrimination of the Z alpha domain of human ADAR1 during B-Z transition of DNA duplexes. FEBS Lett 584(20):4344–4350
Shaoru W et al (2018) The Cucurbit[7]Uril-based supramolecular chemistry for reversible B/Z-DNA transition. Adv Sci 5(7):1800231
Subramani VK et al (2016) Structural and functional studies of a large winged Z-DNA-binding domain of Danio rerio protein kinase PKZ. FEBS Lett 590:2275–2285
Sugiyama H et al (1996) Synthesis, structure and thermodynamic properties of 8-methylguanine-containing oligonucleotides: Z-DNA under physiological salt conditions. Nucleic Acids Res 24:1272–1278
Sun L et al (2022) Structural insight into African swine fever virus I73R protein reveals it as a Z-DNA binding protein. Transbound Emerg Dis 69(5):e1923–e1935. https://doi.org/10.1111/tbed.14527
Takaoka A, Taniguchi T (2008) Cytosolic DNA recognition for triggering innate immune responses. Adv Drug Deliv Rev 60:847–857
Tashiro R, Sugiyama H (2003) A nanothermometer based on the different pi stackings of B- and Z-DNA. Angew Chem Int Ed Engl 42:6018–6020
Train BC et al (2014) Single C8-arylguanine modifications render oligonucleotides in the Z-DNA conformation under physiological conditions. Chem Res Toxicol 27(7):1176–1186
van de Sande JH, McIntosh LP, Jovin TM (1982) Mn2+ and other transition metals at low concentration induce the right-to-left helical transformation of poly[d(G-C)]. EMBO J 1(7):777–782
Vongsutilers V, Shinohara Y, Kawai G (2020) Epigenetic TET-catalyzed oxidative products of 5-methylcytosine impede Z-DNA formation of CG decamers. ACS Omega 5(14):8056–8064
Votavova H et al (1991) Effect of basic oligopeptides on the B-Z transition of poly(dG-dC).poly(dG-dC) in water-methanol solutions. Biopolymers 31(3):275–283
Wang G, Vasquez KM (2007) Z-DNA, an active element in the genome. Front Biosci 12:4424–4438
Wang AH et al (1979) Molecular structure of a left-handed double helical DNA fragment at atomic resolution. Nature 282(5740):680–686
Wang AH et al (1984) AT base pairs are less stable than GC base pairs in Z-DNA: the crystal structure of d(m5CGTAm5CG). Cell 37(1):321–331
Xu Y, Ikeda R, Sugiyama H (2003) 8-Methylguanosine: a powerful Z-DNA stabilizer. J Am Chem Soc 125(44):13519–13524
Xu Y et al (2004) (P)-helicene displays chiral selection in binding to Z-DNA. J Am Chem Soc 126:6566–6567
Yamamoto S, Park S, Sugiyama H (2015) Development of a visible nanothermometer with a highly emissive 2′-O-methylated guanosine analogue. RSC Adv 5:104601–104605
Zhang F et al (2016) Histone acetylation induced transformation of B-DNA to Z-DNA in cells probed through FT-IR spectroscopy. Anal Chem 88(8):4179–4182
Zhang T et al (2022) ADAR1 masks the cancer immunotherapeutic promise of ZBP1-driven necroptosis. Nature 606:1–9
Zimmer C, Marck C, Guschlbauer W (1983) Z-DNA and other non-B-DNA structures are reversed to B-DNA by interaction with netropsin. FEBS Lett 154:156–160

8. Structures of G-Quadruplexes and Their Drug Interactions
Yichen Han, Jonathan Dickerhoff, and Danzhou Yang

Contents
Introduction
DNA G-Quadruplexes
Structural Characteristics of DNA G-Quadruplexes
Intramolecular DNA G-Quadruplexes
Human Telomeric DNA G-Quadruplexes
Human Telomeric G-Quadruplex Structures
Human Promoter DNA G-Quadruplexes
Parallel DNA G-Quadruplexes in Gene Promoters
Broken-Strand DNA G-Quadruplexes in Gene Promoters
Promoter DNA G-Quadruplexes with Long Loops and Hairpin Motifs
Left-Handed DNA G-Quadruplexes
Four-Tetrad DNA G-Quadruplexes
Structural Basis of Small Molecule Interactions of DNA G-Quadruplexes
G-Quadruplex Interactions with End-Stacking Compounds
Small Molecule Recognition of G-Quadruplexes with Additional Loop and Capping Interactions
Small Molecule Recognition of Parallel G-Quadruplexes
Small Molecule Interactions with Vacancy G-Quadruplex Bound by Metabolites
Small Molecule Interactions with G-Quadruplex-Duplex Junction
G-Quadruplex Intercalation with Small Molecule
Electrostatic Interactions of G-Quadruplex-Interactive Small Molecules
Conclusion
References

Y. Han · J. Dickerhoff
College of Pharmacy, Medicinal Chemistry and Molecular Pharmacology, Purdue University, West Lafayette, IN, USA

D. Yang (*)
College of Pharmacy, Medicinal Chemistry and Molecular Pharmacology, Purdue University, West Lafayette, IN, USA
Purdue Center for Cancer Research, West Lafayette, IN, USA
Department of Chemistry, Purdue University, West Lafayette, IN, USA

© Springer Nature Singapore Pte Ltd. 2023
N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_10

Abstract

G-quadruplexes are noncanonical four-stranded DNA and RNA secondary structures formed in guanine-rich sequences. DNA G-quadruplexes are found in specific human genomic locations of functional significance, such as telomeres, promoters, and replication initiation sites. They are involved in essential cellular processes, such as genome stability, gene transcription, and DNA replication. DNA G-quadruplexes readily form under physiologically relevant solution conditions. Their globular shape sets them apart from the thread-like double-helix DNA. As such, DNA G-quadruplexes are considered a new class of molecular targets for drug development. Additionally, there is considerable interest in the use of G-quadruplexes for biomaterials, biosensors, and biocatalysts. The DNA G-quadruplex has therefore emerged as one of the most exciting nucleic acid secondary structures. Structural information on DNA G-quadruplexes is essential for understanding their biological functions and for developing therapeutic interventions and biomaterial applications. This chapter discusses the structural characteristics of DNA G-quadruplexes and their drug interactions.

Introduction
DNA G-Quadruplexes
DNA, or deoxyribonucleic acid, in its double-stranded helical form (B-DNA) is the storage form of genetic information. However, only 3% of the human genome encodes protein, and DNA can form noncanonical secondary structures that are functionally important (ENCODE Project Consortium 2012). Among non-B DNA structures, the G-quadruplex is perhaps the most exciting class (Yang and Okamoto 2010; Chen et al. 2022). G-quadruplexes are four-stranded structures formed by guanine-rich DNA and RNA sequences (Fig. 1). DNA G-quadruplexes are found in functionally important genomic regions, such as human telomeres, promoters, and replication initiation sites, and are involved in important cellular processes including genome stability, gene transcription, and DNA replication. DNA G-quadruplexes can readily form under physiologically relevant solution conditions and have a globular structure that is distinct from B-DNA. As such, DNA G-quadruplexes have emerged as a new class of molecular targets for drug development (Qin and Hurley 2008; Yang and Okamoto 2010; Neidle 2017; Spiegel et al. 2020; Chen et al. 2022; Xu and Hurley 2022). In addition, there is considerable interest in using DNA G-quadruplexes for biomaterials, biosensors, and biocatalysts (Yang 2019; Mergny and Sen 2019). This chapter discusses the structural characteristics of DNA G-quadruplexes and their drug interactions.

Fig. 1 (a) Schematic G-tetrad. Hoogsteen hydrogen bonds (dashed lines) and the centrally coordinated K+ are shown. (b) Intermolecular G-quadruplexes: dimeric (left) and tetrameric (right). (c) Intramolecular G-quadruplexes. (top) Folding topologies of different classes and types; (bottom) example molecular structures. Loop types are labeled. (d) A G-tetrad molecular structure showing different types of groove widths and anti- and syn-glycosidic conformations

Structural Characteristics of DNA G-Quadruplexes
DNA G-quadruplexes are four-stranded secondary structures that arise from guanine (G)-rich sequences (Fig. 1). They can be formed either by multiple DNA molecules, i.e., intermolecular (dimeric/trimeric/tetrameric), or by one DNA molecule, i.e., intramolecular (monomeric) (Fig. 1b, c). The building block of a G-quadruplex is the G-tetrad, which is a plane of four circularly arranged guanines (Fig. 1a). Hoogsteen hydrogen bonds connect the guanines head to tail, in contrast to the Watson-Crick hydrogen bonds of duplex DNA base pairs. Two or more G-tetrads stack upon each other to form the G-quadruplex core (G-core). The G-core is critically stabilized by monovalent cations (K+ or Na+) that are coordinated by guanine O6 carbonyl groups within the central pore (Fig. 1a–c) (Hud et al. 1996). Because K+ is the dominant intracellular cation and stabilizes G-quadruplexes more strongly, most studies are performed in the presence of K+ cations. In contrast to the two antiparallel strands of B-DNA with opposite 5′–3′ directionality, tracts of guanines (G-tracts) in a G-core can have both parallel and antiparallel orientations. Whereas B-DNA only includes anti-residues, DNA G-quadruplexes can have both anti- and syn-residues (Fig. 1d). Importantly, guanines from codirectional or parallel G-tracts adopt the same glycosidic conformation within a tetrad, whereas those from antidirectional or antiparallel G-tracts adopt different glycosidic conformations (Fig. 1c). Two stacked tetrads can have the same guanine arrangements, i.e., “homopolar stacking,” or different guanine arrangements, i.e., “heteropolar stacking.” The anti-/syn-glycosidic conformations within a G-quadruplex core further correlate with the groove width of a G-quadruplex, i.e., the distance between the sugar-phosphate backbones of two adjacent G-tracts. Two codirectional or parallel G-tracts always flank a medium-width groove. In contrast, the groove width between two antidirectional or antiparallel G-tracts is narrow or wide depending on the anti-/syn-arrangement: narrow for syn-G → anti-G and wide for anti-G → syn-G (hydrogen bond donor to acceptor) (Fig. 1d).
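The groove-width rules stated above can be written down as a small lookup. The following Python sketch is only an illustration of those rules; the function names and the convention of listing the four guanines of a tetrad in hydrogen-bond donor to acceptor order are assumptions made here, not notation from the chapter.

```python
def groove_width(codirectional, donor=None, acceptor=None):
    """Classify the groove between two adjacent G-tracts.

    codirectional: True if the two G-tracts run in the same direction.
    donor, acceptor: "syn" or "anti" for the guanine donating and the
    guanine accepting the Hoogsteen hydrogen bonds across this groove.
    """
    if codirectional:
        return "medium"
    if (donor, acceptor) == ("syn", "anti"):
        return "narrow"
    if (donor, acceptor) == ("anti", "syn"):
        return "wide"
    raise ValueError("antidirectional neighbours must differ in conformation")


def tetrad_grooves(directions, conformations):
    """Return the four groove widths around one G-tetrad.

    Both lists hold one entry per G-tract, ordered donor -> acceptor
    around the tetrad (an assumed convention for this sketch).
    """
    grooves = []
    for i in range(4):
        j = (i + 1) % 4  # next guanine around the tetrad
        grooves.append(
            groove_width(directions[i] == directions[j],
                         conformations[i], conformations[j])
        )
    return grooves


# Alternating strand directions with a syn-anti-syn-anti tetrad
chair = tetrad_grooves(["up", "down", "up", "down"],
                       ["syn", "anti", "syn", "anti"])
# Three codirectional tracts plus one antidirectional tract
three_plus_one = tetrad_grooves(["up", "up", "up", "down"],
                                ["anti", "anti", "anti", "syn"])
print(chair)
print(three_plus_one)
```

With these inputs, the first tetrad yields alternating narrow and wide grooves, and the second yields two medium grooves, one wide groove, and one narrow groove.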

Intramolecular DNA G-Quadruplexes
In the human genome, most quadruplex-forming regions are spatially distant. Therefore, biologically relevant DNA G-quadruplexes are usually intramolecular and readily form under physiologically relevant solution conditions (Yang and Okamoto 2010; Chen et al. 2022). Most intramolecular G-quadruplexes have three G-tetrads and are formed by G-rich DNA sequences containing closely spaced, continuous runs of guanines (Fig. 1c). In contrast to the structurally uniform B-DNA, G-quadruplexes exhibit great conformational diversity in folding topology, loop conformation, and capping structure. The topology of intramolecular DNA G-quadruplexes can be categorized by the directionalities of their G-tracts (Fig. 1c). In parallel G-quadruplexes, all G-tracts are codirectional and all G-core guanines adopt the anti conformation. Therefore, all G-tetrads show homopolar stacking, and all grooves are of medium width. In antiparallel G-quadruplexes, i.e., the chair structure, neighboring G-tracts have antidirectional 5′–3′ orientations with alternating wide and narrow grooves. Consequently, G-tetrads in chair-type topologies have syn-anti-syn-anti patterns and exhibit both homopolar and heteropolar stacking. On the other hand, mixed parallel/antiparallel G-quadruplexes (hybrid and basket type) have both codirectional and antidirectional G-tracts. Hybrid-type G-quadruplexes are formed by three codirectional G-tracts and one antidirectional G-tract ([3 + 1]). Within each G-tetrad, guanines from the three codirectional G-tracts have the same glycosidic conformation, whereas the guanine of the antidirectional G-tract adopts the opposite conformation. Usually, the 5′ G-tetrad has 3 syn + 1 anti guanines, whereas the middle and 3′ G-tetrads have 3 anti + 1 syn guanines. Therefore, the 5′ G-tetrad has the opposite guanine glycosidic arrangement compared to the middle and 3′ G-tetrads and exhibits heteropolar stacking. Basket-type G-quadruplexes contain two codirectional and two antidirectional G-tracts ([2 + 2]), and each strand has both a parallel and an antiparallel neighbor. As such, each G-tetrad exhibits a syn-syn-anti-anti pattern, and all G-tetrads have alternate guanine glycosidic arrangements, hence heteropolar stacking. Both [3 + 1] hybrid and [2 + 2] basket G-quadruplexes have two medium grooves flanked by codirectional G-tracts as well as a narrow and a wide groove between neighboring antidirectional G-tracts.

Furthermore, whereas most G-quadruplexes are formed by continuous G-tracts, usually G3-runs, broken-strand G-quadruplexes contain discontinuous G-tracts. For example, the vacancy that a G2-run creates in a three-tetrad core can be filled in by an intramolecular distal guanine (Fig. 1c). Most broken-strand G-quadruplexes have parallel G-tracts and can be considered variants of parallel G-quadruplexes (Chen et al. 2012; Onel et al. 2018). Intriguingly, the intramolecular fill-in G is dynamic and can detach from the G-core, resulting in a vacancy-bearing G-quadruplex that can be bound by cellular guanine metabolites (Wang et al. 2020).

Sequential G-tracts of intramolecular G-quadruplexes are connected by different loop motifs (Fig. 1c). Codirectional (parallel) G-tracts are connected by a “propeller loop” bridging the two outer G-tetrads. This is also known as the parallel structural motif, and a three-tetrad G-quadruplex particularly favors a 1-nt propeller loop, i.e., the G3NG3 parallel motif (Chen et al. 2022). Adjacent antidirectional (antiparallel) G-tracts are connected by a “lateral loop” that runs along the edge of an outer G-tetrad, while antidirectional non-neighboring G-tracts on opposite corners of a G-core are connected by a “diagonal loop” that runs across the outer G-tetrad.
Although some principles of G-quadruplex folding are known, sequence-based prediction of a G-quadruplex structure is often difficult and experimental structure determination is required. Such complexity is reflected in the fact that different DNA sequences may form G-quadruplexes of the same topology or a given sequence can fold into different topologies, as in the case of the human telomeric DNA (Yang and Okamoto 2010; Chen et al. 2022). The structural diversity of G-cores is further expanded by a wide variety of capping structures formed by flanking and loop residues. Based on the composition and length of these segments, motifs of different complexity are formed and often involve unusual hydrogen-bonded base pairs. Moreover, long loop sequences with complementary bases can form hairpins stabilized by Watson-Crick hydrogen bonds.
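The topology and loop definitions given in this section can be summarized as a short classifier. The Python sketch below is a simplified model written for illustration: the representation of a fold as four strand directions plus four corner positions around the G-core, and both function names, are assumptions of this sketch rather than a method from the chapter, and the classifier says nothing about which fold a given sequence will actually adopt.

```python
from collections import Counter


def loop_types(directions, corners):
    """Name the three loops linking consecutive G-tracts.

    directions: "+" or "-" for G-tracts 1-4 in sequence order.
    corners: corner index (0-3, counted around the G-core) of each tract.
    """
    if sorted(corners) != [0, 1, 2, 3]:
        raise ValueError("corners must be a permutation of 0-3")
    loops = []
    for i in range(3):
        if directions[i] == directions[i + 1]:
            loops.append("propeller")   # codirectional tracts
        elif (corners[i] - corners[i + 1]) % 4 == 2:
            loops.append("diagonal")    # antidirectional, opposite corners
        else:
            loops.append("lateral")     # antidirectional, adjacent corners
    return loops


def classify_topology(directions, corners):
    """Classify a four-tract fold from its strand directions."""
    around = [None] * 4
    for direction, corner in zip(directions, corners):
        around[corner] = direction
    counts = sorted(Counter(directions).values())
    if counts == [4]:
        return "parallel"
    if counts == [1, 3]:
        return "hybrid [3+1]"
    # Two up, two down: chair if directions alternate around the core,
    # basket if each tract has one parallel and one antiparallel neighbour.
    alternating = all(around[i] != around[(i + 1) % 4] for i in range(4))
    return "chair" if alternating else "basket [2+2]"


# Lateral-diagonal-lateral arrangement (hypothetical corner assignment)
basket_dirs, basket_corners = ["+", "-", "+", "-"], [0, 1, 3, 2]
print(classify_topology(basket_dirs, basket_corners),
      loop_types(basket_dirs, basket_corners))
```

In this model, a fold whose four tracts all share one direction is classified as parallel with three propeller loops, whereas the lateral-diagonal-lateral example is classified as a basket.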

Human Telomeric DNA G-Quadruplexes
The human telomere, located at chromosomal ends, is a DNA-protein complex that is closely associated with genome stability, cell senescence, and cancer (Blackburn et al. 2006, 2015). Telomeres protect chromosomes from nuclease degradation, nonhomologous end joining, and erosion from DNA replication. In human somatic cells, telomeres shorten with each replication, leading to apoptosis or senescence after a critical length has been reached. However, telomeres are maintained in cancer cells to realize limitless replication potential, a hallmark of cancer. In 85–90% of cancers, telomere length is maintained by activation of telomerase, a reverse transcriptase. The other 10–15% of cancers, which lack detectable telomerase activity, use the alternative lengthening of telomeres (ALT) mechanism to maintain telomere integrity (Shay et al. 2012). Human telomeric DNA consists of TTAGGG tandem repeats of 5–10 kilobases and ends in a single-stranded 3′-overhang of several hundred bases (Sfeir et al. 2005). Importantly, the human telomeric DNA forms G-quadruplexes that can be stabilized by small molecules to stall telomerase or ALT activity. Therefore, telomeric G-quadruplexes are attractive anticancer drug targets (Temime-Smaali et al. 2009; Neidle 2017).
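To make the link between the TTAGGG repeat and G-quadruplex formation concrete, the following Python sketch counts how many non-overlapping four-G-tract units fit into a run of telomeric repeats. The pattern (four runs of at least three guanines separated by loops of 1–7 nucleotides) is a commonly used sequence heuristic and is an assumption of this sketch, not a rule stated in the chapter; a match indicates only that a sequence has the potential to fold, not that it does.

```python
import re

# Four G-tracts of >= 3 guanines joined by three loops of 1-7 nt.
# The loop-length limits are an assumed heuristic, not from the chapter.
PUTATIVE_G4 = re.compile(r"G{3,}(?:[ACGT]{1,7}?G{3,}){3}")


def putative_g4_units(seq):
    """Return (start, end) spans of non-overlapping putative G4 units."""
    return [match.span() for match in PUTATIVE_G4.finditer(seq.upper())]


# Eight telomeric repeats (48 nt) as a hypothetical stretch of overhang
overhang = "TTAGGG" * 8
units = putative_g4_units(overhang)
print(units)
```

Each unit here spans four consecutive GGG tracts joined by three TTA linkers, so the eight-repeat example accommodates two such units side by side.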

Human Telomeric G-Quadruplex Structures

Despite the repetitive character of the (TTAGGG)n human telomeric DNA sequence, the corresponding G-quadruplex structures are remarkably polymorphic because the TTA linker segment can adopt all three loop types (propeller, lateral, and diagonal) (Fig. 2). The various telomeric G-quadruplexes may interact with different proteins to fulfill distinct biological functions. The wild-type 22-nt human telomeric DNA wt-Tel22 (5′-AGGG(TTAGGG)3-3′) forms a basket-type, intramolecular G-quadruplex in Na+ solution (Fig. 2a) (Wang and Patel 1993). This mixed parallel/antiparallel structure contains three G-tetrads connected by one diagonal and two lateral TTA loops. The same 22-nt sequence forms a parallel intramolecular G-quadruplex when crystallized in the presence of K+ (Fig. 2b) (Parkinson et al. 2002). However, in K+ solution, this wt-Tel22 sequence forms hybrid-type structures even in the presence of a high concentration of Na+ (Ambrus et al. 2006; Xu et al. 2006). Evidently, different cations can modulate the folding of human telomeric G-quadruplexes, a feature not observed for other DNA secondary structures. Intramolecular hybrid G-quadruplex structures (Fig. 2c–d) were identified as the major conformations formed by human telomeric sequences under physiologically relevant K+ solution conditions (Ambrus et al. 2006; Phan et al. 2006, 2007b; Luu et al. 2006; Dai et al. 2007a, b), as well as in X. laevis egg extract (ex vivo) (Hänsel et al. 2013). Furthermore, a parallel folding was shown for a 23-nt wild-type telomeric DNA in K+-containing solution in the presence of polyethylene glycol (PEG) (Miyoshi et al. 2006; Heddi and Phan 2011). More recently, the wt-Tel21 variant sequence (5′-GGG(TTAGGG)3-3′) with brominated G8 and G20 formed an antiparallel chair-type G-quadruplex in K+-containing crystals, where guanine bromination enforced a syn-conformation of the modified residues (Fig. 2f) (Geng et al. 2019).
In K+ solution, the extended human telomeric DNA exists in an equilibrium of two [3 + 1] hybrid G-quadruplexes, the minor “hybrid-1” and major “hybrid-2” conformation (Ambrus et al. 2006; Dai et al. 2007a, b). Both the hybrid-1 and

8

Structures of G-Quadruplexes and Their Drug Interactions

Fig. 2 Human telomeric DNA G-quadruplex structures (Wang and Patel 1993; Parkinson et al. 2002; Dai et al. 2007a, b; Zhang et al. 2010; Geng et al. 2019). Each structure is accompanied by its sequence (tetrad Gs underlined), condition, folding topology, PDB ID, and hydrogen-bonded capping residues. Color code: A = green, T = blue, G = red, I and BrG = pink. Color code in schematics: anti-G = red, syn-G = purple

hybrid-2 structure have two lateral loops and one propeller loop but in different order (Fig. 2c, d). As the 5′ and 3′ ends of hybrid telomeric G-quadruplexes are located at opposite G-tetrads, hybrid G-quadruplexes provide an efficient way to pack the telomeric DNA in a multimer format (Dai et al. 2007a). The structure of the major

hybrid-2 telomeric G-quadruplex was determined by NMR using the wild-type 26-nt DNA wt-Tel26 (5′-TTA(GGGTTA)3GGGTT-3′) in K+ solution (Fig. 2d) (Dai et al. 2007a). Here, a hydrogen-bonded T8:A9:T25 triple specific to the hybrid-2 folding caps the 3′ G-tetrad. This hydrogen-bonded triple is crucial for the hybrid-2 folding and requires the wild-type 3′-flanking T25. The structure of the minor conformer, the hybrid-1 G-quadruplex, was determined by NMR using the modified 26-nt DNA Tel26 (5′-AAA(GGGTTA)3GGGAA-3′) in K+ solution (Fig. 2c) (Dai et al. 2007b). This modified DNA has all-adenine flanking segments in place of the 5′-TTA and 3′-TT flanking segments of wt-Tel26. An adenine triple, A3:A9:A21, containing only wild-type residues covers the 5′ G-tetrad, whereas the 3′-capping T14:A25 base pair utilizes the mutated 3′-flanking A25. The hybrid-1 structure is also formed by the modified 24-nt DNA Tel24 (5′-TT(GGGTTA)3GGGA-3′), which again bears the 3′ T-to-A mutation and forms the 3′-capping T:A base pair (Luu et al. 2006). These observations indicate that the 3′-flanking residues drive the folding of the human telomeric G-quadruplex. The hybrid-2 and hybrid-1 telomeric G-quadruplexes demonstrate the role of specific capping structures in driving selective G-quadruplex folding. The coexistence of two hybrid G-quadruplexes in the same telomeric DNA can be attributed to the length and asymmetrical nature of the TTA linker sequence. While the 3-nt length is versatile in forming different loop types, it is the adenine breaking the linker sequence symmetry that promotes specific capping structures. A TTT or AAA linker would not be able to form the hybrid-1-specific adenine triple or the hybrid-2-specific 3′-capping T:A:T triple (Dai et al. 2007a, b). Although hybrid-1 and hybrid-2 telomeric G-quadruplexes have a small energy difference, their slow exchange kinetics indicates the presence of high-energy intermediates during the structural conversion (Dai et al. 2007a).
Partial unfolding of the three-tetrad hybrid G-quadruplexes, necessary for their interconversion, may generate unusual high-energy topologies which could be the rate-limiting step. A novel two-tetrad, basket-type G-quadruplex species was found for telomeric DNA (Lim et al. 2009; Zhang et al. 2010). Using the 23-nt wt-Tel23 DNA with a G14-to-inosine mutation, the structure of a two-tetrad basket-type telomeric G-quadruplex was determined by NMR in K+ solution and showed extensive capping structures (Fig. 2e) (Zhang et al. 2010). The top G-tetrad is capped by a hydrogen-bonded G4:A7:A19 triple and a hydrogen-bonded T6:T18 base pair, while the bottom G-tetrad is capped by a hydrogen-bonded G10:I14:G22 triple and a hydrogen-bonded T12:T23 base pair. Intriguingly, the G10:I14:G22 triple can be viewed as a vacancy-bearing G-tetrad.
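The repeat shorthand used for the telomeric constructs above, such as 5′-TTA(GGGTTA)3GGGTT-3′, can be expanded and checked mechanically. The following is a minimal Python sketch, not part of the cited studies: the helper names `expand` and `residue_matches` are hypothetical, and only sequences and residue labels quoted in the text are used. It confirms the stated lengths and that the capping residues named for wt-Tel26 and Tel26 sit at the stated positions.

```python
import re

def expand(shorthand: str) -> str:
    """Expand repeat shorthand such as 'AGGG(TTAGGG)3' into a full sequence."""
    return re.sub(r"\(([ACGT]+)\)(\d+)",
                  lambda m: m.group(1) * int(m.group(2)),
                  shorthand)

def residue_matches(seq: str, label: str) -> bool:
    """Check a 1-based residue label such as 'T25' against the sequence."""
    base, position = label[0], int(label[1:])
    return seq[position - 1] == base

# Shorthand sequences quoted in the text, written 5' to 3'.
CONSTRUCTS = {
    "wt-Tel21": "GGG(TTAGGG)3",
    "wt-Tel22": "AGGG(TTAGGG)3",
    "Tel24": "TT(GGGTTA)3GGGA",
    "wt-Tel26": "TTA(GGGTTA)3GGGTT",
    "Tel26": "AAA(GGGTTA)3GGGAA",
}
SEQUENCES = {name: expand(s) for name, s in CONSTRUCTS.items()}

for name, seq in SEQUENCES.items():
    print(f"{name}: {len(seq)} nt  {seq}")

# Capping residues named in the text for the two 26-nt constructs.
print(all(residue_matches(SEQUENCES["wt-Tel26"], r) for r in ("T8", "A9", "T25")))
print(all(residue_matches(SEQUENCES["Tel26"], r)
          for r in ("A3", "A9", "A21", "T14", "A25")))
```

Both checks print `True`. Position 25 is T in wt-Tel26 but A in Tel26, which is the 3′-flanking difference that the text links to the choice between the hybrid-2 and hybrid-1 capping structures.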

Human Promoter DNA G-Quadruplexes

G-quadruplexes are also enriched in gene promoters, for example, of the high-profile oncogenes MYC, HIF1A, KRAS, c-KIT, VEGF, BCL-2, PDGFA, and PDGFRB (Qin and Hurley 2008; Yang and Okamoto 2010; Balasubramanian et al. 2011; Chen et al. 2022). In contrast to G-quadruplexes formed by single-stranded telomeric

Fig. 3 Sequences of human genomic DNA G-quadruplexes grouped by folding topologies. The 1-nt propeller parallel motifs (G3NG3, G2NG3) are boxed. Tetrad Gs are shaded in pink with broken-strand Gs in brown

overhangs, the presence of a complementary strand makes promoter G-quadruplex folding compete with the B-DNA duplex formation. Therefore, promoter G-quadruplex formation requires duplex melting, which is driven by negative supercoiling in the promoter proximal region induced by active transcription (Kouzine et al. 2008; Zheng et al. 2017). In contrast to the uniform and repetitive telomeric DNA, promoter DNA exhibits large sequence diversity and frequently contains more than four G-runs of unequal lengths, as well as spacing segments of various length and composition (Fig. 3). Thus, G-rich promoter sequences often form multiple G-quadruplexes of distinct folding. Determination of the major or physiologically relevant conformation is important to understand the function of promoter G-quadruplexes and their drug targeting. Strikingly, the G3NG3 element that robustly forms a parallel structure motif with a 1-nt propeller loop is prevalent in many promoter G-quadruplexes (Fig. 3) (Yang and Okamoto 2010; Chen et al. 2022). By inducing a local parallel topology, the G3NG3 motif was likely evolutionarily selected to establish a stable foundation for promoter DNA G-quadruplex formation and drive the parallel folding.
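As an illustration of how G3NG3 elements can be located in a sequence, the sketch below scans for two GGG runs separated by a single non-G nucleotide. It is a minimal Python example under stated assumptions: the function name `find_g3ng3` is hypothetical, N is restricted to A, C, or T, and the first test sequence is an invented toy sequence, not a promoter sequence from Fig. 3. The second sequence is wt-Tel22, whose 3-nt TTA linkers contain no such element.

```python
import re

# Lookahead so that overlapping sites are all reported.
# Assumption: the single linker nucleotide N is A, C, or T (not G).
G3NG3 = re.compile(r"(?=(GGG[ACT]GGG))")

def find_g3ng3(seq: str):
    """Return (1-based start, motif) for each G3NG3 site in seq."""
    return [(m.start() + 1, m.group(1)) for m in G3NG3.finditer(seq)]

toy_sequence = "TTGGGAGGGTTAGGGCGGGAA"    # hypothetical, for illustration only
wt_tel22 = "AGGGTTAGGGTTAGGGTTAGGG"        # 5'-AGGG(TTAGGG)3-3'

print(find_g3ng3(toy_sequence))  # [(3, 'GGGAGGG'), (13, 'GGGCGGG')]
print(find_g3ng3(wt_tel22))      # []
```

A hit only marks a sequence element that could support a 1-nt propeller loop; whether a G-quadruplex actually forms, and with which topology, still has to be established experimentally.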

Parallel DNA G-Quadruplexes in Gene Promoters

Parallel G-quadruplexes are commonly found in promoter sequences, for example, for the MYC, c-KIT, VEGF, and KRAS genes. They often contain two G3NG3 motifs with 1-nt linkers and a middle loop of variable length (Fig. 3). One of the best-studied parallel promoter G-quadruplexes is found in the MYC gene promoter. The Myc transcription factor is a master regulator that promotes

Fig. 4 (a) The human MYC promoter NHE schematics. (b) The 28-nt NHE sequence MycPu28 has five G-runs (I-V, underlined). The Myc2345 and Myc1245 sequences are also shown with tetrad Gs underlined. (c) The major G-quadruplex Myc2345 utilizes G-runs II-V to adopt a parallel folding with 1:2:1 loop arrangement in K+ solution. The folding and molecular structure are shown (Ambrus et al. 2005). (d) The minor G-quadruplex Myc1245 utilizes G-runs I, II, IV, and V to adopt a parallel folding with 1:6:1 loop arrangement in K+ solution. Folding and molecular structure are shown (Dickerhoff et al. 2019). Color code: G = red, A = green, T = blue

cancer growth, proliferation, and survival. Its overexpression in both blood and solid tumors makes it a very attractive anticancer drug target (Dang 2012). The MYC G-quadruplex-forming region is a 28-nt sequence (MycPu28) located in the G-rich nuclease hypersensitive element (NHE) upstream of the transcription start site that controls ~85% of the transcriptional activity (Fig. 4a) (Simonsson et al. 1998; Siddiqui-Jain et al. 2002). MycPu28 consists of three G4-runs (I, III, and V) and two G3-runs (II and IV). The major MYC G-quadruplex utilizes G-runs II-V and is hence termed “Myc2345” (Fig. 4b) (Siddiqui-Jain et al. 2002). The Myc2345 G-quadruplex structure in K+ solution was determined by NMR and has three tetrads with three propeller loops of 1:2:1 length (Fig. 4c) (Ambrus et al. 2005). In this structure, the connecting points of two parallel G3-tracts are close due to the right-handed twist of the DNA, and the 1-nt propeller loop spanning the three-tetrad G-core is energetically favored. In fact, the Myc2345 G-quadruplex and other parallel structures with short propeller loops exhibit remarkable stability, and 1-nt loops are

prevalent in promoter G-quadruplex sequences and recognized as the most preferred propeller loop conformer (Chen et al. 2022). Besides the major Myc2345, the MYC promoter NHE can also form two alternative minor parallel-stranded G-quadruplexes. Using G-runs I–IV, the “Myc1234” conformer also has a 1:2:1 loop size and is connected to supercoiled DNA (Mathad et al. 2011). In addition, a parallel MYC G-quadruplex with 1:6:1 loop length was found to form with G-runs I, II, IV, and V involved in the G-core, named “Myc1245” (Fig. 4b) (Dickerhoff et al. 2019). This 6-nt loop is stretched toward the 5′ G-tetrad to form a hydrogen-bonded T2:G3:A16 capping with the 5′-flanking segment (Fig. 4d). Propeller loops longer than 3 nt in other sequences show similar interactions with flanking bases to form a joint capping, as observed for VEGF, c-KIT2, and KRAS G-quadruplexes (Kuryavyi et al. 2010; Agrawal et al. 2013; Kerkour et al. 2017).
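The dependence of loop arrangement on which G-runs enter the G-core can be illustrated with a short calculation. The sketch below is a minimal Python example under stated assumptions: the helper names are hypothetical, and `TOY` is an invented sequence with five GGG runs, chosen only so that the arithmetic reproduces 1:2:1 and 1:6:1 patterns. It is not the MycPu28 sequence, which contains G4-runs in which only three guanines participate in the three-tetrad core.

```python
import re

def find_g_runs(seq, min_len=3):
    """Return (start, end) spans (0-based, end exclusive) of G-runs."""
    return [m.span() for m in re.finditer(r"G{%d,}" % min_len, seq)]

def loop_lengths(runs, chosen):
    """Loop lengths between consecutive chosen runs (1-based run numbers).

    Any skipped run, together with its neighboring linkers, is counted
    as part of the loop that bridges the two chosen runs around it.
    """
    picked = [runs[i - 1] for i in chosen]
    return [nxt[0] - cur[1] for cur, nxt in zip(picked, picked[1:])]

# Hypothetical toy sequence: runs I-V separated by T, A, TA, and T.
TOY = "TAGGGTGGGAGGGTAGGGTGGGAA"
runs = find_g_runs(TOY)

print(loop_lengths(runs, (2, 3, 4, 5)))  # [1, 2, 1]
print(loop_lengths(runs, (1, 2, 4, 5)))  # [1, 6, 1]
```

Skipping run III turns its three guanines and both adjacent linkers into a single 6-nt loop, which mirrors how an unused G-run lengthens the central loop when the run selection changes.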

Broken-Strand DNA G-Quadruplexes in Gene Promoters

The G-quadruplex core can also contain a discontinuous or broken strand, a feature mostly observed in parallel structures. Broken-stranded G-quadruplexes have been found for c-KIT and PDGFRB promoter sequences (Phan et al. 2007a; Chen et al. 2012; Onel et al. 2018). Among these, the PDGFRB gene promoter has been the most studied. PDGFR-β (platelet-derived growth factor receptor beta) is a cell surface receptor tyrosine kinase whose upregulation is pivotal for cancer metastasis, cardiovascular diseases, and fibrotic disorders (Andrae et al. 2008). The G-rich NHE upstream of the transcription start site of the PDGFRB promoter can form G-quadruplexes that modulate PDGFRB gene transcription (Qin et al. 2010). The major PDGFR-β promoter G-quadruplex adopts a broken-strand parallel fold with all single-nt propeller loops (Fig. 5a) (Chen et al. 2012). The third G-tract is a broken strand, with a G2 segment that is completed by the distal G from the 3′-flanking segment delivered through a lateral loop. Interestingly, all known PDGFR-β promoter G-quadruplexes are broken-stranded structures with a G2-run (Chen et al. 2012; Onel et al. 2018). In a broken-stranded G-quadruplex, the intramolecularly filled-in guanine is dynamic and can detach from the G-core to create a vacancy G-quadruplex (vG4) (Fig. 5b). For example, in the major PDGFR-β G-quadruplex, the fill-in G can dislodge from the 3′ G-tetrad (Wang et al. 2020). This leads to two coexisting vG4s in equilibrium with a vacancy in either the 5′ or the 3′ G-tetrad because the G2-run can move up and down within the G-core (Fig. 5b). Interestingly, external guanine derivatives, such as dGMP or cGMP, can fill the vacancy to complete the G-tetrad because they match the required Hoogsteen hydrogen bond pattern (Fig. 5c, d) (Wang et al. 2020). This suggests that guanine metabolites may regulate PDGFRB gene expression through binding and stabilizing the PDGFR-β promoter vG4.
The NMR K+ solution structure of the dGMP-filled-in PDGFR-β vG4 shows a preference of dGMP for the 5′ vacancy, likely due to the steric hindrance and electrostatic repulsion between the dGMP phosphate and the DNA backbone at the 3′ vacancy

Fig. 5 (a) The major G-quadruplex formed in the human PDGFR-β promoter in K+ solution has a broken strand with an intramolecular fill-in G via a lateral loop (Chen et al. 2012). DNA sequence is shown with G-tetrad guanines underlined. (b) Folding of the two equilibrating vG4s that can be formed by detaching of the intramolecular fill-in G. vG4 DNA sequence is shown with G-tetrad guanines underlined. (c) Folding (top) and the K+ solution NMR structure (bottom left) of the 5′-vacancy vG4 filled in with dGMP. dGMP complements the Hoogsteen hydrogen bonds of the vacancy tetrad (bottom right) (Wang et al. 2020). G-tetrad guanines are underlined. Color code: G = red, A = green, C = yellow, dGMP = cyan

site (Fig. 5d). In contrast, cGMP can fill in at both the 5′- and 3′-vacancy sites of the PDGFR-β vG4. Furthermore, vG4s can be induced by cellular factors such as oxidative damage. In a recent structural study of the BLM (Bloom syndrome protein) promoter (Croteau et al. 2014), oxidation-induced 8-oxoguanine was found to destabilize the native parallel G-quadruplex structure and dislodge itself from the G-core to form vG4s (Wang et al. 2022). Like the PDGFR-β vG4, the 8-oxoguanine-induced vG4s can be filled in by guanine metabolites in K+ solution, as shown by the NMR structure of the cGMP-filled-in BLM vG4. However, in contrast to the nonselective fill-in of cGMP into the PDGFR-β vG4, cGMP prefers the 5′ vacancy of the BLM vG4. In this case, cGMP filling of the 3′ vacancy could be impeded by the 3′-flanking segment because the BLM vG4 has the G2 strand within the 3′-terminal G-tract.

Promoter DNA G-Quadruplexes with Long Loops and Hairpin Motifs

Most promoter G-quadruplexes have closely spaced G-runs and therefore contain short loops, and their folding is also kinetically favored. In contrast, longer connecting segments between G-runs slow down folding kinetics and increase the likelihood of duplex formation with the complementary strand (Zhang and Balasubramanian 2012). Nevertheless, promoter G-quadruplexes with long loops exist, which can be stabilized by hairpin structures with hydrogen-bonded base pairs in addition to capping structures. The BCL2 gene encodes an important inhibitor of apoptosis, and BCL2 overexpression promotes cancer resistance (Kale et al. 2018). Multiple G-quadruplexes with long loops can form in the BCL2 promoter within two adjacent regions that regulate transcription, i.e., Pu39 and P1G4 (Fig. 6a) (Dexheimer et al. 2006; Onel et al. 2016). The presence of multiple G-quadruplexes suggests different protein interactions for transcriptional regulation. The Pu39 region contains six G-runs and forms an equilibrium of two three-tetrad G-quadruplexes with distinct folding (Fig. 6a right) (Dai et al. 2006; Agrawal et al. 2014). The G-quadruplex conformer “Pu39-mid” adopts a hybrid-2 topology using G-runs II-V as shown by the NMR structure in K+ solution (Dai et al. 2006). In addition to a 1-nt propeller loop parallel motif and a 3-nt lateral loop above the 3′ G-tetrad, the central 7-nt lateral loop forms a reversed Watson-Crick A10-T15 base pair capping the 5′ G-tetrad. The second “Pu39-1245” G-quadruplex utilizes G-runs I, II, IV, and V and forms a parallel G-quadruplex with a long 13-nt central propeller loop (Agrawal et al. 2014). Long loops within G-quadruplexes can form hairpin structures with multiple Watson-Crick base pairs. The P1G4 region is located 11 bases downstream of the Pu39 region and contains five G3 runs.
It uses G-runs I, II, IV, and V to form two coexisting parallel G-quadruplexes of three tetrads (Onel et al. 2016). The first P1G4 species has two 1-nt propeller loops and a central 12-nt propeller loop. The second P1G4 species has a broken strand with three 1-nt propeller loops and an 11-nt loop (Fig. 6a left). NMR studies showed that the long loops of both P1G4 species form a hairpin stabilized by two Watson-Crick G:C base pairs (Onel et al. 2016). Multiple

Fig. 6 (a) The two G-quadruplex-forming regions of the human BCL2 promoter, P1G4 and Pu39 (Dexheimer et al. 2006; Onel et al. 2016). Left: The P1G4 region in K+ solution forms two coexisting parallel G-quadruplexes using G-runs I, II, IV, and V: one with a 1:12:1 loop arrangement, and one with a broken strand, three 1-nt propeller loops, and an 11-nt loop. Both long loops form a hairpin structure with Watson-Crick base pairs (dashed lines) (Onel et al. 2016). Right: The Pu39 species in K+ solution include the Pu39-1245 G-quadruplex (G-runs I, II, IV, and V) that adopts a parallel folding with 1:13:1 loop arrangement (Agrawal et al. 2014). The other species is the Pu39-mid G-quadruplex (G-runs II-V) with a hybrid-2 folding with a 3-nt lateral, a 7-nt lateral, and a 1-nt propeller loop (Dai et al. 2006). (b) The K+ solution NMR structures of two coexisting human PIM1 promoter G-quadruplexes (Tan et al. 2020). DNA sequences are shown with G-tetrad guanines underlined. Form 1 adopts a hybrid-1 folding with a 1:10:2 loop arrangement (left) and Form 2 adopts a two-tetrad, chair-type folding with a 2:10:4 loop arrangement (right). Both contain the same 10-nt lateral duplex hairpin. Watson-Crick base pairs are shown in bold. Color code: A = green, T = blue, C = yellow, G = red, with syn-G = purple in folding schematics

other sequences in the human genome exhibit this property, as shown by the NMR structures of the PIM1 promoter G-quadruplexes (Lim et al. 2015; Tan et al. 2020). The major three-tetrad hybrid-1 G-quadruplex (Form 1) of 1:10:2 loop length and a minor two-tetrad antiparallel chair-type G-quadruplex (Form 2) of 2:10:4 loop length coexist within the PIM1 promoter (Fig. 6b). Interestingly, the 10-nt lateral loop in both structures forms the same hairpin structure with three Watson-Crick G:C base pairs despite their distinct topologies (Tan et al. 2020).

Left-Handed DNA G-Quadruplexes

Although physiologically relevant DNA secondary structures are dominated by right-handed forms, such as the A- and B-DNA duplex, left-handed structures have been reported for DNA G-quadruplexes (Chung et al. 2015). AS1411 5′-(GGT)4TG(TGG)4-3′ is a synthetic G-quadruplex-forming DNA that exhibits anticancer activity (Bates et al. 2017) and forms a mixture of different conformations (Dailey et al. 2010). A modified AS1411 sequence, 5′-T(GGT)4TG(TGG)3TGT2-3′, forms a left-handed G-quadruplex in both K+-containing solution and crystalline state (Chung et al. 2015). This structure adopts a left-handed parallel folding with two two-tetrad G-quadruplex units stacked upon each other (Fig. 7a). The two units are connected by a 2-nt linker and coordinate a K+ between them. Interestingly, the first G-core guanine adopts a syn-conformation despite all G-tracts of this structure being parallel, an exception to the structural pattern concluded from right-handed G-quadruplexes. In addition, all bases of the single-nt propeller loops of this left-handed structure point toward the G-core and stack over the external G-tetrads. This is in stark contrast to single-nt propeller loops of a right-handed G-quadruplex that are oriented away from the G-core and located in the grooves.

Fig. 7 (a) The K+ crystal structure of a left-handed G-quadruplex formed by the modified AS1411 DNA (Chung et al. 2015). DNA sequence is shown with G-tetrad guanines underlined. The two left-handed, parallel structures stack over each other. (b) The K+ solution NMR structure of a G-quadruplex formed in the modified ALS/FTD-related G4CC tandem repeat with an 8-bromo-G21 (Brcic and Plavec 2018). DNA sequence is shown with G-tetrad guanines underlined. This structure adopts a four-tetrad, chair-type folding with three 2-nt lateral loops. Color code: T = blue, C = yellow, G = red, BrG = pink; in folding schematics: anti-G = red, syn-G = purple

Four-Tetrad DNA G-Quadruplexes

While most intramolecular G-quadruplexes contain three G-tetrads, structures with four tetrads have been reported. For example, a GGGGCC tandem repeat DNA is found in the intron region of the C9orf72 gene and expands with the progression of ALS and FTD diseases (Haeusler et al. 2014). A modified sequence (GGGGCC)3GGBrGG with an 8-brominated G21 forms a four-tetrad G-quadruplex in K+ solution (Fig. 7b) (Brcic and Plavec 2018). This structure exhibits an antiparallel chair-type folding with the two top lateral-loop cytosines C6:C18 potentially engaging in a hydrogen-bonded capping and the bottom lateral-loop cytosines C11:C12 capping the other terminal G-tetrad. Interestingly, all four G-tetrads in this antiparallel G-quadruplex have alternating G-core guanine arrangements and show only heteropolar stacking, in contrast to a three-tetrad antiparallel G-quadruplex with two homopolar stacking G-tetrads (Fig. 2f) (Geng et al. 2019).

Structural Basis of Small Molecule Interactions of DNA G-Quadruplexes

In contrast to duplex DNA, intramolecular G-quadruplexes are globular structures with distinctive structural motifs that can be more selectively targeted by small molecules. The cancer relevance of DNA G-quadruplexes in human telomeres and oncogene promoters has motivated intensive efforts to develop G-quadruplex-interactive compounds for cancer therapy (Fig. 8). Whereas early efforts focused on maximizing the ligand stacking with the external G-tetrads, recent ligands established more specific binding to G-quadruplex structures by recognizing structural features such as flanking and loop residues. End-stacking is strongly preferred as a binding mode for G-quadruplexes over intercalation within G-tetrads. Unlike duplex DNA, groove binding is rare for G-quadruplex ligands and no high-resolution structure has yet been shown for an intramolecular G-quadruplex bound by a sole groove-binder. High-resolution structures of G-quadruplexes in complex with small molecules provide important information for small molecule recognition of G-quadruplexes and structure-based rational drug design. This section discusses different G-quadruplex drug binding modes using representative G-quadruplex-ligand complex structures.

Fig. 8 Examples of G-quadruplex-binding small molecules

G-Quadruplex Interactions with End-Stacking Compounds

External G-tetrads of G-quadruplexes are unique structural features accessible for ligand stacking. Because a G-tetrad consists of four bases, compared with the two bases of a duplex base pair, its surface is notably larger and enables selective G-quadruplex recognition by small molecules with extended hydrophobic surfaces. Earlier G-quadruplex drug design was focused on maximizing end-stacking interactions using large aromatic moieties of multiple fused or interconnected aromatic rings, such as TMPyP4, telomestatin, and naphthalene diimide (Fig. 8) (Yang and Okamoto 2010; Micco et al. 2013). In the K+ crystal structure of a naphthalene diimide compound in complex with the human telomeric DNA wt-Tel22, the DNA forms a 5′-5′ dimeric parallel G-quadruplex (Fig. 9a) (Micco et al. 2013). One naphthalene diimide molecule stacks on each 3′ external G-tetrad. The positively charged side chains of each ligand are positioned near the G-quadruplex grooves, with one chain located within the groove and engaging in electrostatic interactions. In an NMR solution structure of a telomestatin derivative in complex with a human hybrid-1 telomeric G-quadruplex in K+-containing solution, the ligand stacks on the 5′ G-tetrad (Fig. 9b) (Chung et al. 2013). In this case, ligand binding disrupts the 5′-capping structure observed for the free hybrid-1 telomeric G-quadruplex. Similar extensive end stacking with G-quadruplexes has also been observed for other large aromatic compounds, such as TMPyP4 (Phan et al. 2005), DAOTA-M2 (Kotar et al. 2016), Auoxo6 (Wirmer-Bartoschek et al. 2017), and a ruthenium complex [Ru(phen)2(dppz)]2+ (McQuaid et al. 2022).

Fig. 9 (a) The K+ crystal structure showing the 1:1 binding of naphthalene diimides (yellow spheres) to human telomeric wt-Tel21 DNA, which adopts a 5′-5′ dimerized parallel G-quadruplex with naphthalene diimide stacking on the 3′ G-tetrad (Micco et al. 2013). wt-Tel21 DNA sequence is shown with G-tetrad guanines underlined. (b) The K+ solution NMR structure showing the 1:1 binding of a telomestatin derivative (yellow spheres) to the human telomeric Tel24 hybrid-1 G-quadruplex via stacking on the 5′ G-tetrad (Chung et al. 2013). Tel24 DNA sequence is shown with G-tetrad guanines underlined. Color scheme: A = green, T = blue, G = red, ligands = yellow

Small Molecule Recognition of G-Quadruplexes with Additional Loop and Capping Interactions

Compared to end-stacking ligands with large symmetric aromatic rings, compounds with smaller and asymmetric structures often recruit a DNA base to form a joint plane stacking over the external G-tetrad. In addition, they can interact with specific structural features, such as loop and capping motifs, to enhance binding selectivity. The resulting drug binding sites can be extensive and involve flanking segments, lateral and diagonal loops, as well as long propeller loops. One example is epiberberine (EPI) binding to the human hybrid-2 telomeric G-quadruplex (Fig. 10a) (Lin et al. 2018). As shown by the NMR structure in K+ solution, EPI binds with a 1:1 ratio at the 5′ G-tetrad through an extensive four-layer binding pocket. This binding pocket involves specific interactions with the 5′ G-tetrad, the second lateral loop, and the 5′-flanking segment. Specifically, EPI recruits the 5′-flanking A3 to form a hydrogen-bonded ligand-base plane stacking over the 5′ G-tetrad. The EPI-A3 co-plane is covered by a hydrogen-bonded T2:T13:A15 triad, which is further capped by a hydrogen-bonded T1:T14 base pair, both involving

Fig. 10 (a) The K+ solution NMR structure showing the 1:1 binding of EPI (yellow spheres) to the human telomeric wt-Tel26 hybrid-2 G-quadruplex (Lin et al. 2018). wt-Tel26 DNA sequence is shown with G-tetrad guanines underlined. The binding pocket consists of four stacked layers: EPI recruits A3 and stacks on the 5′ G-tetrad (bottom), followed by an A15:T2:T13 triad (middle), and covered by a T1:T14 base pair (top). (b) The K+ solution NMR structure showing the 1:1 binding of Pt-tripod (yellow spheres) to the human telomeric Tel26 hybrid-1 G-quadruplex (Liu et al. 2018). Tel26 DNA sequence is shown with G-tetrad guanines underlined. Pt-tripod binds in the three grooves with three arms and recruits A21, which is capped by an A3:A9:T20 triad. Hydrogen bonds are shown by dashed lines. Color code: A = green, T = blue, G = red, ligands = yellow

residues from the 5′-flanking segment and the second lateral loop. These molecular interactions are specific to the hybrid-2 folding and human telomeric sequence. In fact, this molecular recognition is so specific that EPI converts other telomeric DNA topologies to the hybrid-2 conformation, including the hybrid-1 and basket G-quadruplexes, as well as unfolded telomeric DNA (Lin et al. 2018). Another example is the binding of the Pt-tripod compound to the human hybrid-1 telomeric G-quadruplex (Fig. 10b) (Liu et al. 2018). The K+ solution NMR structure shows that the Pt-tripod binds at the 5′ G-tetrad of the G-quadruplex at 1:1 ratio. Specifically, the three arms carrying the positively charged platinum ions of the Pt-tripod reach into the three accessible grooves to facilitate electrostatic interactions with the negatively charged sugar-phosphate backbone. The Pt-tripod recruits A21 from the third lateral loop to form a ligand-base co-plane stacking over the 5′ G-tetrad; the ligand-base plane is further covered by an A3:A9:T20 triad formed by the 5′-flanking segment, the first propeller loop, and the third lateral loop, respectively. Interestingly, the Pt-tripod can further bind to the 3′ G-tetrads at higher drug-DNA ratios and induces a dimeric structure with a 4:2 binding stoichiometry (Liu et al. 2018). These examples demonstrate that ligand interactions with more complex binding pockets formed by specific flanking and loop residues contribute to high G-quadruplex binding selectivity.

Small Molecule Recognition of Parallel G-Quadruplexes

Parallel G-quadruplexes with short (1- or 2-nt) propeller loops are common among promoter G-quadruplexes. These G-quadruplexes lack extended loop structures, and selective small molecule recognition can be achieved by recruiting the adjacent flanking bases to form a ligand-base co-plane that stacks on the external G-tetrads. Due to the major role of the flanking recruitment in short-looped parallel G-quadruplexes, their selective recognition becomes dependent on the variable flanking sequences. The most-studied system is the recognition of the major MYC promoter G-quadruplex (Myc2345) by small molecules (Fig. 11). NMR structures of ligand complexes in K+ solution have been determined for quindoline-i, DC-34, BMVC, PEQ, and berberine (Fig. 8) (Dai et al. 2011; Calabrese et al. 2018; Liu et al. 2019; Dickerhoff et al. 2021a, b). All these small molecules bind the Myc2345 G-quadruplex with a 2:1 stoichiometry and one ligand at each external G-tetrad, where the adjacent flanking residues A6 and T23 are recruited to form a ligand-base co-plane covering the external G-tetrads (Fig. 11). Interestingly, the conformations of A6 and T23 in the drug-base co-plane are almost independent of the recruited compounds, indicating a sequence-specific and conserved binding pocket. Additionally, the solution structure of the Myc2345 G-quadruplex bearing the wild-type G23 in complex with PEQ reveals a different recruited G23 orientation and PEQ position compared to the mutated T23. This highlights the importance of the identity of the recruited base for selectivity because it modulates both the drug-DNA interface

Fig. 11 The K+ solution NMR structure of the 2:1 complexes of quindoline-i and PEQ (yellow) with the Myc2345 parallel G-quadruplex (Dai et al. 2011; Dickerhoff et al. 2021b). Myc2345 DNA sequence is shown with G-tetrad guanines underlined. Right: Ligands recruit the adjacent base to form a ligand-base co-plane stacking over the external G-tetrads, with the recruited A6 at the 5′ end (top) and T23/G23 at the 3′ end (bottom), respectively. Color code: A = green, T = blue, G = red, ligands = yellow

available for hydrogen bonds and the position of the small molecule functional groups in a sequence-dependent manner (Dickerhoff et al. 2021b). Moreover, the observed sequence-dependent conserved binding pockets justify using determined complex structures for in silico and structure-based rational design of drugs targeting short-looped parallel G-quadruplexes.

Small Molecule Interactions with Vacancy G-Quadruplex Bound by Metabolites

Vacancy G-quadruplexes (vG4s) have recently been found in the human genome, and they often adopt parallel folding (Fig. 1). The PDGFRB gene promoter can form vG4s that can be filled in by guanine metabolites such as dGMP (Fig. 5c, d) (Wang et al. 2020). This PDGFR-β vG4-dGMP binary complex can be further stabilized by small molecules, such as berberine (BER), and the structure of the ternary complex of BER, dGMP, and the PDGFR-β vG4 has been determined by NMR in K+ solution (Fig. 12) (Wang et al. 2021). In the ternary complex, berberine binds at both ends and recruits flanking residues to cover the external tetrads, significantly stabilizing the dGMP-filled-in PDGFR-β vG4. The 5′-bound berberine directly covers the fill-in dGMP in the 5′ G-tetrad. This ternary structure indicates that small molecule binding can shift the vG4 equilibrium toward the dGMP-fill-in species. Furthermore, this structure suggests that specific recognition of the PDGFR-β vG4 may be achieved by ligand-guanine conjugates and provides a molecular basis for the rational design of such compounds.


Fig. 12 The K+ solution NMR structure of the ternary complex formed by berberine (cyan) and the dGMP-fill-in (yellow spheres) PDGFR-β vG4 (Wang et al. 2021). vG4 DNA sequence is shown with G-tetrad guanines underlined. Right: dGMP complements the Hoogsteen hydrogen bonds of the vacancy tetrad (bottom), while the 5′-bound berberine recruits A2 and covers the dGMP fill-in tetrad (top). Color code: A = green, C = blue, G = red, dGMP = yellow, berberine = cyan

Small Molecule Interactions with G-Quadruplex-Duplex Junction

DNA G-quadruplexes with a hairpin loop contain a duplex-quadruplex junction that offers a unique binding interface for drug interactions. For example, a K+ solution NMR structure shows the binding of the platinum-containing L1Pt(dien) ligand to the MYT1L G-quadruplex-duplex hybrid (Fig. 13) (Liu et al. 2021). The MYT1L G-quadruplex forms in the intron of the myelin transcription factor 1 like gene. It adopts a hybrid folding topology with a 13-nt lateral loop that folds into a hairpin structure consisting of two A-T and two G-C Watson-Crick base pairs, with the first hairpin base pair stacking on the 3′ external G-tetrad. In the complex structure, the fused aromatic ring system of L1Pt(dien) intercalates between the 3′ G-tetrad and the adjacent hairpin base pair, while the platinum side chain is positioned within the minor groove of the duplex hairpin, where it can engage in electrostatic interactions. Additionally, pyridostatin derivatives (Rodriguez et al. 2008) and an indoloquinoline derivative have been shown to recognize DNA G-quadruplex-duplex junctions by intercalating between the external G-tetrad and the adjacent duplex base pair (Vianney and Weisz 2022; Liu et al. 2022).


Fig. 13 The K+ solution NMR structure of L1Pt(dien) (yellow sphere) binding to the human MYT1L G-quadruplex (Liu et al. 2021). DNA sequence is shown with G-tetrad guanines underlined. The MYT1L G-quadruplex has a hybrid folding with a 13:2:1 loop arrangement. The 13-nt lateral loop forms a duplex hairpin containing four Watson-Crick base pairs. L1Pt(dien) intercalates between the first hairpin base pair and the 3′ G-tetrad. Color code: A = green, T = blue, C = cyan, G = red, L1Pt(dien) = yellow

G-Quadruplex Intercalation with Small Molecule

Very recently, the compound Phen-DC3 was shown to convert the hybrid-type G-quadruplex formed by a 23-nt human telomeric DNA to a chair-type G-quadruplex (Ghosh et al. 2022). Interestingly, the NMR structure in K+ solution shows that Phen-DC3 intercalates between the middle and bottom G-tetrads of this chair-type G-quadruplex, stacking extensively with both of them (Fig. 14). Because of the intercalation by Phen-DC3, these two G-tetrads are spaced further apart and no longer coordinate a K+ ion. This is the first structure showing intercalative binding of a small molecule within a G-quadruplex.


Fig. 14 The K+ solution NMR structure showing the 1:1 binding of Phen-DC3 (yellow spheres) to the wild-type 23-nt human telomeric G-quadruplex (Ghosh et al. 2022). The wt-Tel23 DNA sequence is shown with G-tetrad guanines underlined. Upon binding to Phen-DC3, wt-Tel23 forms a chair-type G-quadruplex with the ligand intercalating between the middle and bottom G-tetrads. The Phen-DC3 molecule extensively stacks with the middle and bottom G-tetrads (top right and bottom right). Color code: A = green, T = blue, G = red, Phen-DC3 = yellow

Electrostatic Interactions of G-Quadruplex-Interactive Small Molecules

Many G-quadruplex-interactive compounds, such as the quinoline derivative PEQ, berberine (BER), and epiberberine (EPI), contain centrally located positive charges (Fig. 8) (Lin et al. 2018; Dickerhoff et al. 2021a, b). Upon binding, the centrally located positive charge is often positioned above the center of the G-core, where it can engage in attractive electrostatic interactions with the negatively polarized pore of the G-core, analogous to the central K+ coordination. Because the DNA backbone is polyanionic, electrostatic interactions potentially drive the binding of positively charged compounds. Further, as the G-core backbones are close to one another, the G-quadruplex grooves show a high charge density and thereby become hotspots for electrostatic interactions. G-quadruplex drugs often have alkyl side chains bearing amines or permanently positively charged functional groups to strengthen G-quadruplex binding. For example, the amine side chain of quindoline-i is protonated and positively charged at physiological pH and is located close to the G-quadruplex groove for electrostatic interactions (Fig. 11) (Dai et al. 2011). Likewise, the Pt-tripod has three cationic arms for electrostatic anchoring within three grooves (Fig. 10b) (Liu et al. 2018).

Conclusion

This chapter provides an overview of DNA G-quadruplexes and their small molecule interactions from a structural perspective. DNA G-quadruplexes are noncanonical four-stranded secondary structures formed in guanine-rich regions of functional significance. They are globularly folded with multiple levels of structural diversity, distinctly different from double-helical DNA. DNA G-quadruplexes have emerged as a promising class of molecular targets for drug development. Structural knowledge of DNA G-quadruplexes and their small molecule interactions can provide essential information for understanding their biological functions and drug targeting.

Acknowledgments

This research was supported by National Institutes of Health grants R01CA177585 and U01CA240346 (D.Y.) and P30CA023168 (Purdue Center for Cancer Research).

References Agrawal P, Hatzakis E, Guo K, Carver M, Yang D (2013) Solution structure of the major G-quadruplex formed in the human VEGF promoter in K+: insights into loop interactions of the parallel G-quadruplexes. Nucleic Acids Res 41:10584–10592. https://doi.org/10.1093/nar/ gkt784 Agrawal P, Lin C, Mathad RI, Carver M, Yang D (2014) The major G-quadruplex formed in the human BCL-2 proximal promoter adopts a parallel structure with a 13-nt loop in K+ solution. J Am Chem Soc 136:1750–1753. https://doi.org/10.1021/ja4118945 Ambrus A, Chen D, Dai J, Jones RA, Yang D (2005) Solution structure of the biologically relevant G-quadruplex element in the human c-MYC promoter. Impli G-quadruplex stabil Biochem 44: 2048–2058. https://doi.org/10.1021/bi048242p Ambrus A, Chen D, Dai J, Bialis T, Jones RA, Yang D (2006) Human telomeric sequence forms a hybrid-type intramolecular G-quadruplex structure with mixed parallel/antiparallel strands in potassium solution. Nucleic Acids Res 34:2723–2735. https://doi.org/10.1093/nar/gkl348 Andrae J, Gallini R, Betsholtz C (2008) Role of platelet-derived growth factors in physiology and medicine. Genes Dev 22:1276–1312. https://doi.org/10.1101/gad.1653708 Balasubramanian S, Hurley LH, Neidle S (2011) Targeting G-quadruplexes in gene promoters: a novel anticancer strategy? Nat Rev Drug Discov 10:261–275. https://doi.org/10.1038/nrd3428 Bates PJ, Reyes-Reyes EM, Malik MT, Murphy EM, O’Toole MG, Trent JO (2017) G-quadruplex oligonucleotide AS1411 as a cancer-targeting agent: uses and mechanisms. Biochim Biophys Acta Gen Subj 1861:1414–1428. https://doi.org/10.1016/j.bbagen.2016.12.015 Blackburn EH, Greider CW, Szostak JW (2006) Telomeres and telomerase: the path from maize, Tetrahymena and yeast to human cancer and aging. Nat Med 12:1133–1138. https://doi.org/10. 1038/nm1006-1133


Blackburn EH, Epel ES, Lin J (2015) Human telomere biology: a contributory and interactive factor in aging, disease risks, and protection. Science 350:1193–1198. https://doi.org/10.1126/science. aab3389 Brcic J, Plavec J (2018) NMR structure of a G-quadruplex formed by four d(G4C2) repeats: insights into structural polymorphism. Nucleic Acids Res 46:11605–11617. https://doi.org/10.1093/nar/ gky886 Calabrese DR, Chen X, Leon EC, Gaikwad SM, Phyo Z, Hewitt WM, Alden S, Hilimire TA, He F, Michalowski AM, Simmons JK, Saunders LB, Zhang S, Connors D, Walters KJ, Mock BA, Schneekloth JS (2018) Chemical and structural studies provide a mechanistic basis for recognition of the MYC G-quadruplex. Nat Commun 9:4229. https://doi.org/10.1038/s41467-01806315-w Chen Y, Agrawal P, Brown RV, Hatzakis E, Hurley L, Yang D (2012) The major G-quadruplex formed in the human platelet-derived growth factor receptor β promoter adopts a novel brokenstrand structure in K+ solution. J Am Chem Soc 134:13220–13223. https://doi.org/10.1021/ ja305764d Chen L, Dickerhoff J, Sakai S, Yang D (2022) DNA G-quadruplex in human telomeres and oncogene promoters: structures, functions, and small molecule targeting. Acc Chem Res 55: 2628–2646. https://doi.org/10.1021/acs.accounts.2c00337 Chung WJ, Heddi B, Tera M, Iida K, Nagasawa K, Phan AT (2013) Solution structure of an intramolecular (3 + 1) human telomeric G-quadruplex bound to a telomestatin derivative. J Am Chem Soc 135:13495–13501. https://doi.org/10.1021/ja405843r Chung WJ, Heddi B, Schmitt E, Lim KW, Mechulam Y, Phan AT (2015) Structure of a left-handed DNA G-quadruplex. Proc Natl Acad Sci U S A 112:2729–2733. https://doi.org/10.1073/pnas. 1418718112 Croteau DL, Popuri V, Opresko PL, Bohr VA (2014) Human RecQ helicases in DNA repair, recombination, and replication. Annu Rev Biochem 83:519–552. 
https://doi.org/10.1146/ annurev-biochem-060713-035428 Dai J, Chen D, Jones RA, Hurley LH, Yang D (2006) NMR solution structure of the major G-quadruplex structure formed in the human BCL2 promoter region. Nucleic Acids Res 34: 5133–5144. https://doi.org/10.1093/nar/gkl610 Dai J, Carver M, Punchihewa C, Jones RA, Yang D (2007a) Structure of the Hybrid-2 type intramolecular human telomeric G-quadruplex in K+ solution: insights into structure polymorphism of the human telomeric sequence. Nucleic Acids Res 35:4927–4940. https://doi.org/10. 1093/nar/gkm522 Dai J, Punchihewa C, Ambrus A, Chen D, Jones RA, Yang D (2007b) Structure of the intramolecular human telomeric G-quadruplex in potassium solution: a novel adenine triple formation. Nucleic Acids Res 35:2440–2450. https://doi.org/10.1093/nar/gkm009 Dai J, Carver M, Hurley LH, Yang D (2011) Solution structure of a 2:1 quindoline-c-MYC G-quadruplex: insights into G-quadruplex-interactive small molecule drug design. J Am Chem Soc 133:17673–17680. https://doi.org/10.1021/ja205646q Dailey MM, Miller MC, Bates PJ, Lane AN, Trent JO (2010) Resolution and characterization of the structural polymorphism of a single quadruplex-forming sequence. Nucleic Acids Res 38: 4877–4888. https://doi.org/10.1093/nar/gkq166 Dang CV (2012) MYC on the path to cancer. Cell 149:22–35. https://doi.org/10.1016/j.cell.2012. 03.003 Dexheimer TS, Sun D, Hurley LH (2006) Deconvoluting the structural and drug-recognition complexity of the G-quadruplex-forming region upstream of the bcl-2 P1 promoter. J Am Chem Soc 128:5404–5415. https://doi.org/10.1021/ja0563861 Dickerhoff J, Onel B, Chen L, Chen Y, Yang D (2019) Solution structure of a MYC promoter G-quadruplex with 1:6:1 loop length. ACS Omega 4:2533–2539. https://doi.org/10.1021/ acsomega.8b03580


Dickerhoff J, Brundridge N, McLuckey SA, Yang D (2021a) Berberine molecular recognition of the parallel MYC G-quadruplex in solution. J Med Chem 64:16205–16212. https://doi.org/10.1021/ acs.jmedchem.1c01508 Dickerhoff J, Dai J, Yang D (2021b) Structural recognition of the MYC promoter G-quadruplex by a quinoline derivative: insights into molecular targeting of parallel G-quadruplexes. Nucleic Acids Res 49:5905–5915. https://doi.org/10.1093/nar/gkab330 ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489:57–74. https://doi.org/10.1038/nature11247 Geng Y, Liu C, Zhou B, Cai Q, Miao H, Shi X, Xu N, You Y, Fung CP, Din RU, Zhu G (2019) The crystal structure of an antiparallel chair-type G-quadruplex formed by Bromo-substituted human telomeric DNA. Nucleic Acids Res 47:5395–5404. https://doi.org/10.1093/nar/gkz221 Ghosh A, Trajkovski M, Teulade-Fichou M, Gabelica V, Plavec J (2022) Phen-DC 3 induces refolding of human telomeric DNA into a chair-type antiparallel G-quadruplex through ligand intercalation. Angew Chem Int Ed 61:e202207384. https://doi.org/10.1002/anie.202207384 Haeusler AR, Donnelly CJ, Periz G, Simko EAJ, Shaw PG, Kim M-S, Maragakis NJ, Troncoso JC, Pandey A, Sattler R, Rothstein JD, Wang J (2014) C9orf72 nucleotide repeat structures initiate molecular cascades of disease. Nature 507:195–200. https://doi.org/10.1038/nature13124 Hänsel R, Löhr F, Trantirek L, Dötsch V (2013) High-resolution insight into G-overhang architecture. J Am Chem Soc 135:2816–2824. https://doi.org/10.1021/ja312403b Heddi B, Phan AT (2011) Structure of human telomeric DNA in crowded solution. J Am Chem Soc 133:9824–9833. https://doi.org/10.1021/ja200786q Hud NV, Smith FW, Anet FA, Feigon J (1996) The selectivity for K+ versus Na+ in DNA quadruplexes is dominated by relative free energies of hydration: a thermodynamic analysis by 1H NMR. Biochemistry 35:15383–15390. 
https://doi.org/10.1021/bi9620565 Kale J, Osterlund EJ, Andrews DW (2018) BCL-2 family proteins: changing partners in the dance towards death. Cell Death Differ 25:65–80. https://doi.org/10.1038/cdd.2017.186 Kerkour A, Marquevielle J, Ivashchenko S, Yatsunyk LA, Mergny J-L, Salgado GF (2017) Highresolution three-dimensional NMR structure of the KRAS proto-oncogene promoter reveals key features of a G-quadruplex involved in transcriptional regulation. J Biol Chem 292:8082–8091. https://doi.org/10.1074/jbc.M117.781906 Kotar A, Wang B, Shivalingam A, Gonzalez-Garcia J, Vilar R, Plavec J (2016) NMR structure of a triangulenium-based long-lived fluorescence probe bound to a G-quadruplex. Angew Chem Int Ed Eng 55:12508–12511. https://doi.org/10.1002/anie.201606877 Kouzine F, Sanford S, Elisha-Feil Z, Levens D (2008) The functional response of upstream DNA to dynamic supercoiling in vivo. Nat Struct Mol Biol 15:146–154. https://doi.org/10.1038/ nsmb.1372 Kuryavyi V, Phan AT, Patel DJ (2010) Solution structures of all parallel-stranded monomeric and dimeric G-quadruplex scaffolds of the human c-kit2 promoter. Nucleic Acids Res 38: 6757–6773. https://doi.org/10.1093/nar/gkq558 Lim KW, Amrane S, Bouaziz S, Xu W, Mu Y, Patel DJ, Luu KN, Phan AT (2009) Structure of the human telomere in K+ solution: a stable basket-type G-quadruplex with only two G-tetrad layers. J Am Chem Soc 131:4301–4309. https://doi.org/10.1021/ja807503g Lim KW, Jenjaroenpun P, Low ZJ, Khong ZJ, Ng YS, Kuznetsov VA, Phan AT (2015) Duplex stem-loop-containing quadruplex motifs in the human genome: a combined genomic and structural study. Nucleic Acids Res 43:5630–5646. https://doi.org/10.1093/nar/gkv355 Lin C, Wu G, Wang K, Onel B, Sakai S, Shao Y, Yang D (2018) Molecular recognition of the hybrid-2 human telomeric G-quadruplex by epiberberine: insights into conversion of telomeric G-quadruplex structures. Angew Chem Int Ed Eng 57:10888–10893. 
https://doi.org/10.1002/ anie.201804667 Liu W, Zhong Y-F, Liu L-Y, Shen C-T, Zeng W, Wang F, Yang D, Mao Z-W (2018) Solution structures of multiple G-quadruplex complexes induced by a platinum(II)-based tripod reveal dynamic binding. Nat Commun 9:3496. https://doi.org/10.1038/s41467-018-05810-4


Liu W, Lin C, Wu G, Dai J, Chang T-C, Yang D (2019) Structures of 1:1 and 2:1 complexes of BMVC and MYC promoter G-quadruplex reveal a mechanism of ligand conformation adjustment for G4-recognition. Nucleic Acids Res 47:11931–11942. https://doi.org/10.1093/nar/ gkz1015 Liu L-Y, Wang K-N, Liu W, Zeng Y-L, Hou M-X, Yang J, Mao Z-W (2021) Spatial matching selectivity and solution structure of organic-metal hybrid to quadruplex-duplex hybrid. Angew Chem Int Ed Eng 60:20833–20839. https://doi.org/10.1002/anie.202106256 Liu L-Y, Ma T-Z, Zeng Y-L, Liu W, Mao Z-W (2022) Structural basis of pyridostatin and its derivatives specifically binding to G-quadruplexes. J Am Chem Soc 144:11878–11887. https:// doi.org/10.1021/jacs.2c04775 Luu KN, Phan AT, Kuryavyi V, Lacroix L, Patel DJ (2006) Structure of the human telomere in K+ solution: an intramolecular (3 + 1) G-quadruplex scaffold. J Am Chem Soc 128:9963–9970. https://doi.org/10.1021/ja062791w Mathad RI, Hatzakis E, Dai J, Yang D (2011) c-MYC promoter G-quadruplex formed at the 50 -end of NHE III1 element: insights into biological relevance and parallel-stranded G-quadruplex stability. Nucleic Acids Res 39:9023–9033. https://doi.org/10.1093/nar/gkr612 McQuaid KT, Takahashi S, Baumgaertner L, Cardin DJ, Paterson NG, Hall JP, Sugimoto N, Cardin CJ (2022) Ruthenium polypyridyl complex bound to a unimolecular chair-form G-quadruplex. J Am Chem Soc 144:5956–5964. https://doi.org/10.1021/jacs.2c00178 Mergny J-L, Sen D (2019) DNA quadruple helices in nanotechnology. Chem Rev 119:6290–6325. https://doi.org/10.1021/acs.chemrev.8b00629 Micco M, Collie GW, Dale AG, Ohnmacht SA, Pazitna I, Gunaratnam M, Reszka AP, Neidle S (2013) Structure-based design and evaluation of naphthalene diimide G-quadruplex ligands as telomere targeting agents in pancreatic cancer cells. J Med Chem 56:2959–2974. 
https://doi.org/ 10.1021/jm301899y Miyoshi D, Karimata H, Sugimoto N (2006) Hydration regulates thermodynamics of G-quadruplex formation under molecular crowding conditions. J Am Chem Soc 128:7957–7963. https://doi. org/10.1021/ja061267m Neidle S (2017) Quadruplex nucleic acids as targets for anticancer therapeutics. Nat Rev Chem 1: 0041. https://doi.org/10.1038/s41570-017-0041 Onel B, Carver M, Wu G, Timonina D, Kalarn S, Larriva M, Yang D (2016) A new G-quadruplex with hairpin loop immediately upstream of the human BCL2 P1 promoter modulates transcription. J Am Chem Soc 138:2563–2570. https://doi.org/10.1021/jacs.5b08596 Onel B, Carver M, Agrawal P, Hurley LH, Yang D (2018) The 30 -end region of the human PDGFRβ core promoter nuclease hypersensitive element forms a mixture of two unique end-insertion G-quadruplexes. Biochim Biophys Acta Gen Subj 1862:846–854. https://doi.org/10.1016/j. bbagen.2017.12.011 Parkinson GN, Lee MPH, Neidle S (2002) Crystal structure of parallel quadruplexes from human telomeric DNA. Nature 417:876–880. https://doi.org/10.1038/nature755 Phan AT, Kuryavyi V, Gaw HY, Patel DJ (2005) Small-molecule interaction with a five-guaninetract G-quadruplex structure from the human MYC promoter. Nat Chem Biol 1:167–173. https://doi.org/10.1038/nchembio723 Phan AT, Luu KN, Patel DJ (2006) Different loop arrangements of intramolecular human telomeric (3+1) G-quadruplexes in K+ solution. Nucleic Acids Res 34:5715–5719. https://doi.org/10. 1093/nar/gkl726 Phan AT, Kuryavyi V, Burge S, Neidle S, Patel DJ (2007a) Structure of an unprecedented G-quadruplex scaffold in the human c-kit promoter. J Am Chem Soc 129:4386–4392. https:// doi.org/10.1021/ja068739h Phan AT, Kuryavyi V, Luu KN, Patel DJ (2007b) Structure of two intramolecular G-quadruplexes formed by natural human telomere sequences in K+ solution. Nucleic Acids Res 35:6517–6525. https://doi.org/10.1093/nar/gkm706


Qin Y, Hurley LH (2008) Structures, folding patterns, and functions of intramolecular DNA G-quadruplexes found in eukaryotic promoter regions. Biochimie 90:1149–1171. https://doi. org/10.1016/j.biochi.2008.02.020 Qin Y, Fortin JS, Tye D, Gleason-Guzman M, Brooks TA, Hurley LH (2010) Molecular cloning of the human platelet-derived growth factor receptor beta (PDGFR-beta) promoter and drug targeting of the G-quadruplex-forming region to repress PDGFR-beta expression. Biochemistry 49:4208–4219. https://doi.org/10.1021/bi100330w Rodriguez R, Müller S, Yeoman JA, Trentesaux C, Riou J-F, Balasubramanian S (2008) A novel small molecule that alters shelterin integrity and triggers a DNA-damage response at telomeres. J Am Chem Soc 130:15758–15759. https://doi.org/10.1021/ja805615w Sfeir AJ, Chai W, Shay JW, Wright WE (2005) Telomere-end processing the terminal nucleotides of human chromosomes. Mol Cell 18:131–138. https://doi.org/10.1016/j.molcel.2005.02.035 Shay JW, Reddel RR, Wright WE (2012) Cancer and telomeres – an ALTernative to telomerase. Science 336:1388–1390. https://doi.org/10.1126/science.1222394 Siddiqui-Jain A, Grand CL, Bearss DJ, Hurley LH (2002) Direct evidence for a G-quadruplex in a promoter region and its targeting with a small molecule to repress c- MYC transcription. Proc Natl Acad Sci U S A 99:11593–11598. https://doi.org/10.1073/pnas.182256799 Simonsson T, Pecinka P, Kubista M (1998) DNA tetraplex formation in the control region of c-myc. Nucleic Acids Res 26:1167–1172. https://doi.org/10.1093/nar/26.5.1167 Spiegel J, Adhikari S, Balasubramanian S (2020) The structure and function of DNA G-quadruplexes. Trend Chem 2:123–136. https://doi.org/10.1016/j.trechm.2019.07.002 Tan DJY, Winnerdy FR, Lim KW, Phan AT (2020) Coexistence of two quadruplex-duplex hybrids in the PIM1 gene. Nucleic Acids Res 48:11162–11171. 
https://doi.org/10.1093/nar/gkaa752 Temime-Smaali N, Guittat L, Sidibe A, Shin-ya K, Trentesaux C, Riou J-F (2009) The G-quadruplex ligand telomestatin impairs binding of topoisomerase IIIalpha to G-quadruplexforming oligonucleotides and uncaps telomeres in ALT cells. PLoS One 4:e6919. https://doi.org/ 10.1371/journal.pone.0006919 Vianney YM, Weisz K (2022) Indoloquinoline ligands favor intercalation at quadruplex-duplex interfaces. Chemistry 28:e202103718. https://doi.org/10.1002/chem.202103718 Wang Y, Patel DJ (1993) Solution structure of the human telomeric repeat d[AG3(T2AG3)3] G-tetraplex. Structure 1:263–282. https://doi.org/10.1016/0969-2126(93)90015-9 Wang K-B, Dickerhoff J, Wu G, Yang D (2020) PDGFR-β promoter forms a vacancy G-quadruplex that can be filled in by dGMP: solution structure and molecular recognition of guanine metabolites and drugs. J Am Chem Soc 142:5204–5211. https://doi.org/10.1021/jacs.9b12770 Wang K-B, Dickerhoff J, Yang D (2021) Solution structure of ternary complex of berberine bound to a dGMP-fill-in vacancy G-quadruplex formed in the PDGFR-β promoter. J Am Chem Soc 143:16549–16555. https://doi.org/10.1021/jacs.1c06200 Wang K-B, Liu Y, Li Y, Dickerhoff J, Li J, Yang M-H, Yang D, Kong L-Y (2022) Oxidative damage induces a vacancy G-quadruplex that binds guanine metabolites: solution structure of a cGMP fill-in vacancy G-quadruplex in the oxidized BLM gene promoter. J Am Chem Soc 144: 6361–6372. https://doi.org/10.1021/jacs.2c00435 Wirmer-Bartoschek J, Bendel LE, Jonker HRA, Grün JT, Papi F, Bazzicalupi C, Messori L, Gratteri P, Schwalbe H (2017) Solution NMR structure of a ligand/Hybrid-2-G-quadruplex complex reveals rearrangements that affect ligand binding. Angew Chem Int Ed Eng 56: 7102–7106. https://doi.org/10.1002/anie.201702135 Xu H, Hurley LH (2022) A first-in-class clinical G-quadruplex-targeting drug. The bench-tobedside translation of the fluoroquinolone QQ58 to CX-5461 (Pidnarulex). Bioorg Med Chem Lett 77:129016. 
https://doi.org/10.1016/j.bmcl.2022.129016 Xu Y, Noguchi Y, Sugiyama H (2006) The new models of the human telomere d[AGGG(TTAGGG) 3] in K+ solution. Bioorg Med Chem 14:5584–5591. https://doi.org/10.1016/j.bmc.2006.04.033 Yang D (2019) G-Quadruplex DNA and RNA. In: Yang D, Lin C (eds) G-Quadruplex nucleic acids. Springer, New York, pp 1–24


Yang D, Okamoto K (2010) Structural insights into G-quadruplexes: towards new anticancer drugs. Future Med Chem 2:619–646. https://doi.org/10.4155/fmc.09.172 Zhang AYQ, Balasubramanian S (2012) The kinetics and folding pathways of intramolecular G-quadruplex nucleic acids. J Am Chem Soc 134:19297–19308. https://doi.org/10.1021/ ja309851t Zhang Z, Dai J, Veliath E, Jones RA, Yang D (2010) Structure of a two-G-tetrad intramolecular G-quadruplex formed by a variant human telomeric sequence in K+ solution: insights into the interconversion of human telomeric G-quadruplex structures. Nucleic Acids Res 38:1009–1021. https://doi.org/10.1093/nar/gkp1029 Zheng K, He Y, Liu H, Li X, Hao Y, Tan Z (2017) Superhelicity constrains a localized and R-loopdependent formation of G-quadruplexes at the upstream region of transcription. ACS Chem Biol 12:2609–2618. https://doi.org/10.1021/acschembio.7b00435

In Cell 19F NMR for G-Quadruplex

9

Yan Xu

Contents
Introduction
Results and Discussion
In Cell 19F NMR for DNA G-quadruplex
In Cell 19F NMR for RNA G-quadruplex
In-Cell 19F NMR for Hybrid DNA/RNA G-quadruplex
Conclusion
References

Abstract

G-quadruplexes are four-stranded DNA/RNA structures formed by G-rich sequences. Their structure and function in basic genetic processes are an active area of research in telomere biology, gene regulation, and functional genomics. Investigation of G-quadruplex structures associated with biological events is therefore essential to understanding the functions of these molecules. Antibodies and some small molecules have been used to investigate DNA G-quadruplex structures in living cells. However, these methods cannot distinguish the detailed topologies of G-quadruplexes. Very recently, it was demonstrated that 19F NMR spectroscopy can distinguish different nucleic acid structures by their corresponding 19F signals. The simplicity and sensitivity of the 19F NMR approach make it possible to directly observe DNA, RNA, and hybrid DNA/RNA G-quadruplexes in vitro and in living cells and to quantitatively characterize their thermodynamic properties. These findings provide new insight into the structural behavior of G-quadruplexes in living cells and open new avenues for the investigation of G-quadruplex structures in vitro and in living cells.

Y. Xu (*)
Division of Chemistry, Department of Medical Sciences, Faculty of Medicine, University of Miyazaki, Miyazaki, Japan
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2023
N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_11


Keywords

DNA G-quadruplex · RNA G-quadruplex · Hybrid DNA/RNA G-quadruplex · Telomere · 19F NMR

Introduction

Besides encoding genetic information, DNA can adopt conformations other than right-handed B-DNA. The left-handed Z form, triplex, and tetraplex structures of DNA are also known to exist. G-quadruplexes (tetraplex structures) are four-stranded DNA structures formed by G-rich sequences. Such “non-B-type structures” have been suggested to be biologically important in processes such as DNA replication, gene expression and regulation, and the repair of DNA damage (Bochman et al. 2012; Rhodes and Lipps 2015). For example, human telomeric DNA, which forms G-quadruplex structures, protects the cell from recombination and degradation. The gene C9orf72, associated with the neurodegenerative disease amyotrophic lateral sclerosis (ALS), has been shown to form a G-quadruplex structure that alters gene function and increases the risk of ALS (Kertesz et al. 2010; Haeusler et al. 2014). RNA structure influences the functions of nearly all classes of RNAs (Kertesz et al. 2010; Wan et al. 2011; Ding et al. 2014). Among these structures, RNA G-quadruplexes, which are four-stranded RNA structures, have emerged as potential targets for drug design because of their biological importance (Collie and Parkinson 2011; Xu 2011; Balk et al. 2013). Recently, it has been shown that human telomere RNA forms G-quadruplex structures that play an important role in providing a protective structure for telomere ends (Xu et al. 2008; Martadinata and Phan 2009). The understanding of G-quadruplex structure is expected to provide major insights into genome stability, cancer, and related diseases. Fundamental knowledge from this area has opened up the possibility of targeting G-quadruplexes in therapeutic strategies against cancer and other diseases. However, the detailed topologies of G-quadruplexes in cells have not yet been obtained. Therefore, a more effective chemical approach for obtaining structural information on telomere RNA G-quadruplexes in living cells is desired.

Although recent developments in NMR technology have made it easier to investigate biological macromolecules in living cells (Hansel et al. 2009; Selenko et al. 2006; Sakai et al. 2006; Sakakibara et al. 2009; Serber et al. 2006), strong background from the cellular environment often leads to complicated or poor-quality in-cell NMR spectra. For example, 1H NMR analysis of a telomere DNA G-quadruplex in the cellular environment yields a low-resolution spectrum (Hansel et al. 2009). Thus, distinguishing the resonance signals of the molecules of interest from the background arising from the other molecules present in cells remains a challenge. Because there is no natural intracellular concentration of fluorine in cells, there is no background noise in in-cell 19F NMR spectra (Chen et al. 2013; Ye et al. 2015). Therefore, 19F NMR spectroscopy is an ideal tool for studying RNA G-quadruplex structures in living cells (Bao et al. 2017). An additional advantage of 19F NMR spectroscopy is that the 19F nucleus, which has 100% natural abundance (13C is 1.1% and 15N is 0.37%), offers a chemical shift dispersion that is more than 100-fold larger than that of 1H (Chen et al. 2013). Furthermore, its broader chemical shift range makes the 19F nucleus quite sensitive to the local environment, even in living cells (Ye et al. 2015; Bao et al. 2017). As 19F NMR signals are strongly dependent on the structural environment of the 19F label, it should be possible to distinguish different structures formed by the same sequence, such as the single strand and G-quadruplexes, by their corresponding resonances (Fig. 1). Taken together, the absence of cellular background and the sensitivity of the 19F nucleus to its environment make 19F NMR spectroscopy an ideal tool for studying G-quadruplex structures in living cells.

Fig. 1 Concept for the detection of different structures by a 19F label. Two 19F resonances with different chemical shifts are expected, corresponding to the single-stranded and G-quadruplex forms
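The two-resonance concept of Fig. 1 can be illustrated with a small numerical sketch. It assumes slow exchange on the NMR time scale, so that the integral of each 19F peak is proportional to the population of its species, and a simple two-state equilibrium between the single strand and the G-quadruplex. The function names and the example integrals are hypothetical and are not taken from the chapter.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol*K)


def folded_fraction(integral_g4: float, integral_ss: float) -> float:
    """Fraction of strands in the G-quadruplex state, from two 19F peak integrals.

    Assumes slow exchange, so each integral is proportional to a population.
    """
    total = integral_g4 + integral_ss
    if total <= 0:
        raise ValueError("peak integrals must sum to a positive value")
    return integral_g4 / total


def folding_free_energy(fraction: float, temperature_k: float) -> float:
    """Two-state folding free energy in kJ/mol: dG = -RT ln K, with K = f / (1 - f)."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    k_fold = fraction / (1.0 - fraction)
    return -R * temperature_k * math.log(k_fold)


# Hypothetical integrals in arbitrary units (illustration only, not measured data)
f = folded_fraction(integral_g4=3.0, integral_ss=1.0)
dg = folding_free_energy(f, temperature_k=310.15)
print(f"folded fraction = {f:.2f}, dG = {dg:.2f} kJ/mol")
```

Repeating such an estimate at several temperatures would give the temperature dependence of the equilibrium constant, which is one plausible route to the thermodynamic characterization mentioned in the abstract. If the two species exchange quickly, or if more than two species are present, this simple two-state treatment no longer applies.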

Results and Discussion

In Cell 19F NMR for DNA G-quadruplex

Structural investigations have shown that telomere sequences form G-quadruplexes with various folding topologies (Xu 2011). For example, a 22-mer human telomeric sequence d[AGGG(TTAGGG)3] was reported to form a basket-type antiparallel G-quadruplex structure in Na+ solution (Wang and Patel 1993), while it primarily forms the hybrid-1 and hybrid-2 G-quadruplex structures in K+ solution (Xu et al. 2006; Ambrus et al. 2006; Luu et al. 2006; Dai et al. 2007). The same 22-mer sequence was also indicated to adopt a completely different parallel G-quadruplex structure in a crystal grown in the presence of K+ ions, as well as molecular crowding mimics (Parkinson et al. 2002; Nakano et al. 2014). Furthermore, another telomeric sequence d[GGG(TTAGGG)3T] was reported to adopt an antiparallel basket-type G-quadruplex with only two G-tetrad layers in K+ solution (Zhang et al. 2010; Lim et al. 2009; Hansel et al. 2013). Recently, some small molecules and antibodies have been used to investigate DNA G-quadruplex structures in living cells (Biffi et al. 2013;


Y. Xu

Fig. 2 19F NMR spectra of 19F-labeled 22-mer ODN 1 d[AGGG(TTAGGG)3] in Na+ and K+ solutions. (a) Chemical structure of 19F-labeled DNA bearing the 19F group at the 5′ terminus. (b) 19F NMR spectra of 19F-labeled DNA at different temperatures in Na+ solution. (c) 19F NMR spectra of 19F-labeled DNA at different temperatures in K+ solution. Blue and black spots indicate the antiparallel G-quadruplex and the single strand, respectively. Green spots indicate the hybrid-1 and hybrid-2 G-quadruplex conformations. Temperatures are indicated on the right. Conditions: 0.1 mM DNA in (b) 300 mM NaCl and 20 mM Na-PO4 buffer (pH 7.0) or (c) 100 mM KCl and 20 mM K-PO4 buffer (pH 7.0). The sample was kept for 10 min at each temperature before 19F NMR detection

Zhang et al. 2018). However, these methods cannot distinguish the different G-quadruplex topologies. Although they have provided some structural information, the topology of the telomeric DNA G-quadruplex in living human cells has not yet been determined. To investigate the human telomeric DNA G-quadruplex in a more realistic environment, the in-cell 19F NMR experiment was performed in HeLa cells, a significant step toward assessing the structure of human telomeric DNA in living human cells. To achieve this goal, a 3,5-bis(trifluoromethyl)benzene moiety was introduced at the 5′ terminus of the oligonucleotide d[AGGG(TTAGGG)3] (ODN 1) using phosphoramidite chemistry (Fig. 2a). Terminal labeling is a convenient way to attach the 19F NMR sensor to the target oligonucleotide because it requires only one step of chemical synthesis. The label carries six equivalent 19F atoms and was therefore expected to afford high 19F signal intensity.


A 19F NMR experiment was performed to investigate ODN 1 in Na+ solution (an aqueous solution containing 300 mM NaCl and 20 mM Na-PO4 buffer). One sharp peak at −62.57 ppm was observed (Fig. 2b), consistent with the 1H NMR and CD results and with a previous report that the ODN 1 sequence forms an antiparallel G-quadruplex in Na+ solution (Wang and Patel 1993). As the temperature increased, the intensity of this signal decreased; upon heating to 50 °C, a new peak corresponding to the unfolded single strand appeared, and at 75 °C only this peak remained, with strong intensity. In K+ solution, two major peaks at −62.97 and −63.06 ppm were observed (Fig. 2c), which were assigned to the hybrid-1 and hybrid-2 G-quadruplex conformations according to previous studies (Xu et al. 2006; Ambrus et al. 2006; Luu et al. 2006; Dai et al. 2007). Similarly, upon heating the sample, a clear two-state structural transition from the hybrid-type G-quadruplex conformations to the single strand was observed. It has been reported that PEG 200 can induce a structural transition from the hybrid-type to the parallel G-quadruplex (Nakano et al. 2014). Using 19F NMR spectroscopy, we monitored this transition (Fig. 3a). Upon addition of 10% (v/v) PEG 200 to the dilute solution, a new peak at −62.83 ppm appeared, and at 30% PEG 200 its intensity became markedly greater than that of the hybrid-type G-quadruplex peaks.

Fig. 3 19F NMR spectra of 19F-labeled ODN 1 in PEG 200. (a) 19F NMR spectra of 19F-labeled 22-mer DNA at different concentrations of PEG 200 in K+ solution. (b) 19F NMR spectra of 19F-labeled ODN 1 at different temperatures in 40% PEG 200. Green spots indicate the hybrid-1 and hybrid-2 G-quadruplex conformations. Orange and black spots indicate the parallel G-quadruplex and the single strand, respectively. PEG 200 ratios and temperatures are indicated on the right. Conditions: 0.1 mM DNA in 100 mM KCl and 20 mM K-PO4 buffer (pH 7.0). The sample was kept for 10 min at each temperature before 19F NMR detection


The original peaks for the hybrid-type G-quadruplex completely disappeared at 40% and 50% PEG 200. These 19F NMR results are consistent with the previous finding that PEG 200 induces a structural transition from the hybrid-type to the parallel G-quadruplex (Nakano et al. 2014). Therefore, we assigned the new peak to the parallel G-quadruplex conformation. Figure 3b shows that as the temperature increased from 23 to 90 °C in a crowded solution of 40% (v/v) PEG 200, a new peak corresponding to the unfolded single strand appeared, and at 85 °C only this peak remained, with strong intensity. CD experiments were also performed to confirm the formation of parallel G-quadruplexes under molecular crowding conditions. An in-cell NMR study has been performed by injecting 19F-labeled telomere RNA into Xenopus laevis oocytes and detecting its in-cell conformations using 19F NMR spectroscopy (Hansel et al. 2014; Dzatko et al. 2018; Salgado et al. 2015). For HeLa cells, a different method was used: streptolysin O (SLO) was used to transfect 19F-labeled DNA into the cells (Fig. 4a) (Ogino et al. 2009; Yamaoki et al. 2018). SLO binds cholesterol in the plasma membrane at low temperature (4 °C) and forms pores in the membrane at higher temperature (37 °C), allowing biomolecules to enter the cells. The pores formed by SLO can be resealed by the addition of Ca2+. The SLO-treated cell system can thus be used as a kind of cellular test tube to study the structures, properties, and functions of biomolecules. Here, 19F-labeled

Fig. 4 In-cell 19F NMR of 19F-labeled ODN 1. (a) Schematic overview of the SLO treatment cell system for transfection of DNA into HeLa cells. (b) Comparison with the peak positions in the reference in vitro spectra provides a reliable determination of the intracellular conformation of 19F-labeled DNA. (c) Comparison of 19F NMR spectra of ODN 1 in K+ solution, in HeLa cells, and in the supernatant, the difference spectrum between HeLa cells and supernatant, and spectra in K+ solution (23 and 60 °C) and in 30% PEG 200


22-mer telomeric ODN 1 was transfected into HeLa cells using the SLO treatment system for in-cell 19F NMR measurements. Comparing the in-cell 19F NMR spectrum with the in vitro 19F NMR signals of the different G-quadruplex conformations (antiparallel, parallel, hybrid-1, and hybrid-2) provides a reliable determination of the telomeric DNA G-quadruplex conformation in living human cells (Fig. 4b). The in-cell 19F NMR spectrum showed two major peaks at approximately −62.97 and −63.06 ppm, almost identical to those observed in K+ solution (Fig. 4c). Thus, the in-cell 19F NMR spectrum demonstrates that 19F-labeled telomeric DNA can adopt the hybrid-1 and hybrid-2 G-quadruplex structures in living human cells. After the NMR measurement, the outer solution of the cell suspension was collected and examined by 19F NMR spectroscopy (Fig. 4c). Almost no signal was observed from the supernatant, indicating that almost all of the NMR signal originated from 19F-labeled DNA within the HeLa cells. A difference spectrum between the HeLa cells and the supernatant was produced to eliminate the signal from the supernatant (Fig. 4c). The clear signal in the difference spectrum supports the observation of two hybrid-type G-quadruplex structures in living human cells. A 19F-labeled sequence, ODN 2 d[GGG(TTAGGG)3T], which has been reported to form a two-tetrad G-quadruplex, was further examined by 19F NMR spectroscopy. Only one 19F NMR signal was observed after incorporation into HeLa cells, and its chemical shift was the same as that of the peak obtained for the two-tetrad G-quadruplex in K+ solution, consistent with previous studies (Zhang et al. 2010; Lim et al. 2009; Hansel et al. 2013). These results indicate that the telomeric DNA sequence can adopt the two-tetrad G-quadruplex conformation in living human cells. Together, these in-cell 19F NMR studies demonstrate that telomeric DNA sequences can form two hybrid-type G-quadruplex structures and a two-tetrad antiparallel G-quadruplex structure in living human cells.
This result provides valuable information for understanding the structures of human telomeric DNA in living human cells and for the design of new drugs that target telomeric DNA.
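
The difference-spectrum step described above can be illustrated with a minimal sketch. It assumes, hypothetically, that the cell-suspension and supernatant spectra have already been processed onto the same chemical-shift axis. The function names, the `scale` parameter, and the synthetic Lorentzian peaks are illustrative only and are not taken from the original study; the peak positions are placeholders chosen near the shifts discussed in the text.

```python
import numpy as np

def difference_spectrum(ppm, cells, supernatant, scale=1.0):
    """Subtract the supernatant spectrum from the cell-suspension spectrum.

    Both spectra must be sampled on the same chemical-shift axis `ppm`.
    `scale` is a hypothetical factor for matching acquisition conditions.
    """
    ppm = np.asarray(ppm, dtype=float)
    cells = np.asarray(cells, dtype=float)
    supernatant = np.asarray(supernatant, dtype=float)
    if not (ppm.shape == cells.shape == supernatant.shape):
        raise ValueError("spectra must share one chemical-shift axis")
    return cells - scale * supernatant

def lorentzian(ppm, center, width, height):
    """Lorentzian line with full width at half maximum `width` (in ppm)."""
    half = width / 2.0
    return height * half**2 / ((ppm - center) ** 2 + half**2)

# Synthetic demonstration data (not measured values).
ppm = np.linspace(-64.0, -62.0, 2001)
in_cell = lorentzian(ppm, -62.97, 0.05, 1.0) + lorentzian(ppm, -63.06, 0.05, 0.8)
leak = lorentzian(ppm, -62.60, 0.03, 0.05)  # small signal leaked into the medium
cells = in_cell + leak          # what the cell suspension would show
supernatant = leak              # what the collected outer solution would show

diff = difference_spectrum(ppm, cells, supernatant)
strongest = ppm[np.argmax(diff)]  # position of the most intense remaining peak
```

In this toy case the subtraction recovers the intracellular contribution exactly. With real data, the two spectra would first need comparable phasing, baseline correction, and scaling, which this sketch does not attempt.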

In Cell 19F NMR for RNA G-quadruplex

RNA structure influences the functions of nearly all classes of RNAs (Kertesz et al. 2010; Wan et al. 2011; Ding et al. 2014), including RNA G-quadruplexes, which are four-stranded RNA structures that have emerged as potential targets for drug design because of their biological importance (Hu et al. 2021; Balk et al. 2013). For example, RNA G-quadruplexes were recently reported to cause protein-dependent oncogene translation in cancer and neurodegenerative diseases (Jain and Vale 2017; Wolfe et al. 2014). It has been shown that 12-nt human telomere RNA forms a parallel two-molecule G-quadruplex structure (Xu et al. 2008). It was also found that a parallel human telomeric RNA G-quadruplex can be stabilized by a U-quartet at the 3′ end of the structure. Recently, it was revealed that an eight-stranded helical fragment containing A-, G-, and U-tetrads provided a central intercalated scaffold that connected two G-quadruplex units in an alternating antiparallel arrangement,


giving rise to a novel RNA architecture (Xiao et al. 2017). This architecture is very stable, with a melting temperature over 90 °C, even under denaturing conditions. Its unique structural features add considerably to our understanding of the diversity of RNA architectures. This information is important for research on the biological functions of telomere RNA and for the design of new drugs that target telomere RNA. Some studies have also suggested that telomere RNA G-quadruplexes may form polymorphic higher-order G-quadruplexes comprising two stacked G-quadruplex subunits (Collie et al. 2010; Martadinata and Phan 2013). However, direct evidence that human telomeric RNA higher-order G-quadruplexes exist in cells has not yet been obtained. Therefore, a more effective chemical approach for obtaining structural information on telomere RNA G-quadruplexes in living cells is desired. To achieve this goal, a 3,5-bis(trifluoromethyl)benzene moiety was introduced at the 5′ terminus of the oligonucleotide ORN-1 using phosphoramidite chemistry (Fig. 5a). Next, a concentration-dependent experiment was performed to investigate G-quadruplex formation by ORN-1 using 19F NMR (Fig. 5b). The 19F NMR spectrum was first obtained at an RNA concentration of 0.2 mM. One signal was observed at −62.91 ppm in the presence of 50 mM KCl, indicating dimeric G-quadruplex formation, consistent with the CD result. As shown in Fig. 5b, a new signal at −62.57 ppm appears as the RNA concentration increases. The new signal is clearly observed at an RNA concentration of 1.5 mM, and at 3.0 and 5.0 mM its intensity becomes markedly greater than that of the initial peak. Each 19F NMR signal arises from a unique fluorine environment; thus, the presence of two signals confirms the existence of two conformers of the telomere RNA.
Accordingly, the two signals were assigned to the dimeric and two-subunit stacked G-quadruplexes, in accord with previous studies suggesting that two RNA G-quadruplex subunits are most likely to form a two-subunit stacked G-quadruplex (Collie et al. 2010; Martadinata and Phan 2013). A temperature-dependent experiment was then performed to confirm the assignment of the two signals observed in the 19F NMR spectrum (Fig. 5c). As the temperature increased from 23 °C to 75 °C, the intensity of the signal at −62.57 ppm for the two-subunit stacked G-quadruplex decreased, while that of the signal at −62.91 ppm for the dimeric G-quadruplex increased, indicating conversion of the two-subunit stacked G-quadruplex to the dimeric G-quadruplex at higher temperatures. Upon heating to 50 °C, a new peak at −62.64 ppm corresponding to the unfolded single strand appeared, and at 75 °C only this peak remained, with strong intensity, whereas the peaks of the dimeric G-quadruplex (at −62.91 ppm) and the two-subunit stacked G-quadruplex (at −62.57 ppm) completely disappeared. This result indicates that at high temperatures both the dimeric and the two-subunit stacked G-quadruplexes unfold to a single strand. To further confirm that two G-quadruplex subunits stack together to form a two-subunit stacked G-quadruplex structure, 1H imino proton NMR analysis was performed. Figure 5e shows the spectrum for a 0.5 mM RNA solution. Six peaks assignable to the dimeric G-quadruplex are observed at 11.0–12.0 ppm; they correspond to the peak of the dimeric G-quadruplex in the 19F NMR spectrum (Fig. 5d)


Fig. 5 19F NMR and 1H NMR spectra of 19F-labeled RNA. (a) Chemical structure of 19F-labeled telomere RNA bearing a 3,5-bis(trifluoromethyl)phenyl group at the 5′ terminus. (b) 19F NMR spectra of 19F-labeled telomere RNA at different concentrations. The peaks of the dimeric and two-subunit stacked G-quadruplexes are marked with red and green spots, respectively. RNA concentrations are indicated on the right. (c) 19F NMR spectra of 19F-labeled RNA at different temperatures. Red and green spots indicate the dimeric and two-subunit stacked G-quadruplexes, respectively. The peaks of the single strand are marked with black spots. Temperatures are indicated on the right. Conditions: 3 mM RNA in 50 mM KCl and 10 mM Tris-HCl buffer (pH 7.0). The sample was kept for 10 min at each temperature. (d) 19F NMR spectra of 19F-labeled RNA at different temperatures and concentrations. The peaks of the dimeric G-quadruplex are marked with red spots, the peak of the two-subunit stacked G-quadruplex with a green spot, and the peaks of the single strand with black spots. (e) 1H imino proton NMR spectra of 19F-labeled RNA corresponding to the 19F NMR spectra at different temperatures and concentrations. The peaks characteristic of the dimeric and two-subunit stacked G-quadruplexes are marked with red and green spots, respectively


and are consistent with the results of previous studies (Xu et al. 2008; Martadinata and Phan 2009). In the 1H NMR spectrum of a 3 mM RNA solution at 23 °C, new signals were observed in addition to the major ones (Fig. 5e). This agrees with the presence of two signals, for the dimeric and two-subunit stacked G-quadruplexes, in the 19F NMR spectrum (Fig. 5d) and suggests that the new signals arise from the two-subunit stacked G-quadruplex structure. Upon heating to 60 °C, the new signals disappeared and only the dimeric G-quadruplex peaks were detected in the 1H NMR spectrum (Fig. 5e). These observations agree well with the 19F NMR spectrum at 60 °C (Fig. 5d), in which the signal for the dimeric G-quadruplex appeared. Although a signal for the unstructured single strand was observed in the 19F NMR spectrum at 60 °C (Fig. 5d), the single strand gives no resonances at 11.0–12.0 ppm in the 1H NMR spectrum because this region corresponds only to the imino protons of the G-quartets. After complete denaturation (75 °C and 85 °C), the signals for the dimeric and two-subunit stacked G-quadruplexes were no longer observed in the 1H NMR spectrum, in accordance with the 19F NMR spectrum, in which only the peak for the unstructured single strand was observed. As shown in Fig. 6a, profiles of the relative peak areas of the 19F resonance signals versus concentration revealed that a higher RNA concentration promotes formation of the two-subunit stacked G-quadruplex, suggesting that the stacking of two G-quadruplex subunits is concentration dependent. Encouraged by the ability to use 19F NMR spectroscopy to monitor the conformational transitions of the telomere RNA via observation of pronounced

Fig. 6 19F NMR shift versus concentration and temperature profiles. (a) Profiles of the relative peak areas of the 19F resonance signals versus concentration. Conversion between the dimeric and two-subunit stacked G-quadruplexes followed by 19F NMR spectroscopy. (b) Profiles of the relative peak areas of the 19F resonance signals versus temperature. Conversions among the two-subunit stacked G-quadruplex, the dimeric G-quadruplex, and the single strand followed by 19F NMR spectroscopy. To obtain the relative peak signal of each conformation, the total of the relative peak signals for the three conformations was set to 1.0. Plotting the relative peak signals against temperature gives two melting curves, one for the two-subunit stacked G-quadruplex and one for the dimeric G-quadruplex


changes in the 19F resonances as a function of temperature, the melting process was characterized by plotting the relative peak areas of the 19F resonance signals at various temperatures (Fig. 6b). The Tm values for conversion of the two-subunit stacked G-quadruplex to the dimeric G-quadruplex and finally to the single strand were estimated to be 52.4 °C and 67.8 °C, respectively, suggesting stable two-subunit stacked G-quadruplex formation. Importantly, the corresponding 19F signal curve gave a precise Tm value for the two-subunit stacked G-quadruplex, which is difficult to obtain by other methods such as CD and ultraviolet spectroscopy, because CD and ultraviolet spectra cannot discriminate the higher-order G-quadruplex from the other structures. The in-cell NMR study was performed by injecting 19F-labeled telomere RNA into Xenopus laevis oocytes and detecting its in-cell conformations using 19F NMR spectroscopy. The strategy used for in-cell 19F NMR spectroscopy is shown in Fig. 7a. Nuclei of Xenopus oocytes are transferred directly by inserting the glass

Fig. 7 In-cell 19F NMR of telomere RNA G-quadruplex. (a) Schematic overview of in-cell 19F NMR experiments. Xenopus laevis oocytes are sorted and collected for microinjection. For in-cell 19F NMR applications in Xenopus oocytes, the telomere RNA sample was injected into the oocytes. Comparison with the peak position in the reference in vitro spectrum provides a reliable determination of the intracellular telomere RNA conformation. (b) Comparison of 19F NMR spectra of the in vitro sample of telomere RNA (top) with those in Xenopus egg lysates (middle) and in Xenopus oocytes (bottom)


nuclear transfer pipette into the center of the animal pole of the oocyte (Allen et al. 2007; Halley-Stott et al. 2010). The reference in vitro spectrum was compared with the in-cell 19F NMR spectrum, enabling reliable determination of the intracellular telomere RNA conformation. Figure 7b compares the in vitro and in-cell NMR spectra for the RNA in pure form (top panel) and after oocyte injection (bottom panel). Only one signal was observed in the bottom-panel spectrum, and its chemical shift is identical to that of the corresponding higher-order G-quadruplex in the in vitro 19F NMR spectrum. These results demonstrate that the higher-order G-quadruplex structure is present in living cells. The line width of the signal was larger in the in-cell spectrum than in the in vitro spectrum, partly because of the higher viscosity of the cellular environment (Selenko and Wagner 2007; Hansel et al. 2011). NMR signals depend on whether the molecule of interest tumbles freely in solution. In general, molecules in cells tumble slowly because of their size, intermolecular interactions, intramolecular interactions with subsets of residues, and the high viscosity of the intracellular environment, which leads to fast relaxation and broad NMR lines with reduced overall signal intensity. To further investigate the influence of molecular crowding, cellular lysates were used, since they are a good model of molecular crowding and provide native-like conditions. A 19F NMR spectrum of the crushed oocytes containing the injected RNA was recorded. Notably, the spectral pattern for the lysate is similar to those observed for both the corresponding in vitro and in-cell samples (Fig. 7b), indicating higher-order G-quadruplex formation under cell-like conditions.
Therefore, by using 19F NMR spectroscopy, it was demonstrated that the telomere RNA G-quadruplexes preferentially adopt a stacked two-subunit G-quadruplex conformation rather than remaining as single-unit G-quadruplex in living cells, providing the in-cell evidence for the presence of the higher-order G-quadruplex in human RNA. These findings provide new insight into the structural behavior of telomere RNA G-quadruplexes in living cells and open new avenues for the investigation of G-quadruplex structures in living cells.
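
The relative-peak-area analysis behind the melting profiles (Fig. 6b) can be sketched as follows. The normalization to a total of 1.0 follows the figure legend. The source does not state how the Tm values were extracted from the curves, so the midpoint rule used here (linear interpolation to the temperature at which a fraction crosses 0.5) is an assumption, and the function names and peak areas are hypothetical.

```python
import numpy as np

def relative_areas(areas):
    """Normalize peak areas so that each row (one temperature) sums to 1.0."""
    areas = np.asarray(areas, dtype=float)
    totals = areas.sum(axis=1, keepdims=True)
    if np.any(totals <= 0):
        raise ValueError("each spectrum needs a positive total peak area")
    return areas / totals

def midpoint_temperature(temps, fraction, level=0.5):
    """Temperature at which `fraction` first crosses `level`.

    Uses linear interpolation between neighboring points (an assumed rule).
    """
    temps = np.asarray(temps, dtype=float)
    shifted = np.asarray(fraction, dtype=float) - level
    for i in range(len(temps) - 1):
        a, b = shifted[i], shifted[i + 1]
        if a == 0.0:
            return float(temps[i])
        if a * b < 0.0:
            return float(temps[i] + (temps[i + 1] - temps[i]) * a / (a - b))
    if shifted[-1] == 0.0:
        return float(temps[-1])
    raise ValueError("fraction never crosses the requested level")

# Made-up integrated peak areas; columns: stacked, dimer, single strand.
temps = [20.0, 40.0, 60.0, 80.0]
areas = [[16, 4, 0],
         [6, 4, 0],
         [2, 12, 6],
         [0, 1, 9]]

fractions = relative_areas(areas)
tm_stacked = midpoint_temperature(temps, fractions[:, 0])  # stacked form half lost
tm_single = midpoint_temperature(temps, fractions[:, 2])   # single strand half formed
```

Because each conformation gives its own resolved 19F signal, each column yields a separate curve, which is what allows the two transitions to be followed individually.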

In-Cell 19F NMR for Hybrid DNA/RNA G-quadruplex

Recently, it was discovered that human telomere DNA and RNA molecules form a DNA/RNA hybrid G-quadruplex (HQ) conformation (Xu et al. 2009). This hybrid structure may provide a protective structure for chromosomes and be a valuable target for anticancer drug design. The investigation of telomeric HQ conformations associated with biological events is essential for understanding the functions of human telomeres. Despite the biological importance of the HQ structure, its structural properties are not well understood, and molecular probes for observing the telomeric HQ structure inside living cells have not yet been developed. It is technically difficult to study the properties of telomeric HQ by traditional methods such as CD, 1H NMR, and crystallography because the telomeric RNA G-quadruplex (RQ), the DNA G-quadruplex (DQ), and the HQ may coexist as a mixture in solution. RNA and DNA may result


in a mixture of the three G-quadruplex types, giving rise to a complicated state. For example, it is not easy to obtain precise thermodynamic parameters for HQ in this complex chemical environment. A 3,5-bis(trifluoromethyl)benzyl group was attached to the 5′ end of a 12-mer telomeric RNA sequence, (UAGGGU)2, and a 19F NMR experiment was performed by combining an unmodified 12-mer DNA sequence with the 19F-labeled 12-mer RNA (Fig. 8a, b). As shown in Fig. 8b, the signal at −63.09 ppm was assigned to the telomeric RQ according to previous reports (Xu et al. 2008; Martadinata and Phan 2009). A new peak at −62.62 ppm appeared upon addition of the unmodified 12-mer DNA, and its intensity gradually increased as the DNA/RNA ratio increased. Each 19F NMR signal indicates a unique fluorine environment. Therefore, the new 19F NMR signal suggested that a new structure is formed by telomere RNA and DNA. Accordingly, the new signal was assigned to the DNA-RNA hybrid G-quadruplex (Xu et al. 2009; Xu 2018). To further confirm the HQ structure, a 12-mer telomeric DNA sequence 19F-labeled with a 3-(trifluoromethoxy)benzyl group was examined by 19F NMR to

Fig. 8 19F NMR of DNA/RNA hybrid G-quadruplex. (a) Concept for detecting the HQ structure formed by 19F-labeled telomeric RNA and unmodified DNA using 19F NMR spectroscopy. Two 19F resonances with different chemical shifts are expected, for the RQ and HQ structures, with no signal for DQ. (b) Chemical structure of 19F-labeled telomere RNA bearing a 3,5-bis(trifluoromethyl)phenyl group at the 5′ end, and 19F NMR spectra of 19F-labeled 12-mer RNA at different RNA/DNA ratios. RNA/DNA ratios are indicated on the right. Yellow and purple spots indicate the parallel RNA G-quadruplex and the DNA-RNA hybrid G-quadruplex conformations, respectively. Conditions: 0.1 mM RNA in 100 mM KCl and 20 mM K-PO4 buffer (pH 7.0)


Fig. 9 19F NMR of DNA/RNA hybrid G-quadruplex. (a) Concept for detecting the HQ structure formed by 19F-labeled telomeric DNA and unmodified RNA using 19F NMR spectroscopy. Two 19F resonances with different chemical shifts are expected, for the DQ and HQ structures, with no signal for RQ. (b) Chemical structure of 19F-labeled telomere DNA bearing a 3-(trifluoromethoxy)benzyl group at the 5′ end, and 19F NMR spectra of 19F-labeled 12-mer DNA at different DNA/RNA ratios. DNA/RNA ratios are indicated on the right. Green and blue spots indicate the antiparallel and parallel DNA G-quadruplexes, respectively. Purple spots indicate the HQ conformation. Conditions: 0.2 mM DNA in 200 mM KCl and 20 mM K-PO4 buffer (pH 7.0)

investigate structural behavior with an increasing quantity of unmodified 12-mer RNA (Fig. 9a, b). One major peak at −58.13 ppm and a minor peak at −58.21 ppm were observed for the DNA sequence, consistent with the previous report that the 12-mer telomeric DNA sequence can simultaneously form parallel and antiparallel G-quadruplex structures in K+ solution (Fig. 9b) (Phan and Patel 2003). After the addition of unmodified 12-mer RNA, a new peak at −58.03 ppm appeared, and its intensity gradually increased as the RNA/DNA ratio increased. This new 19F NMR signal also confirmed the formation of an HQ structure composed of telomere DNA and RNA. Encouraged by the simplicity and sensitivity of 19F NMR spectroscopy for studying telomere HQ in vitro, 19F NMR was used to investigate whether this telomeric HQ can form in human cells. To introduce the designed biomolecules for in-cell 19F NMR detection, pores were formed in the membranes of HeLa cells using the bacterial toxin streptolysin O (SLO). The SLO approach had already allowed the important DQ structures to be investigated successfully in living human cells.


Fig. 10 In-cell 19F NMR of DNA/RNA hybrid G-quadruplex. (a) Comparison with the peak positions in the reference in vitro spectrum provides a reliable determination of the intracellular 19F-labeled HQ conformation in living cells. (b) Comparison of 19F NMR spectra of 19F-labeled 12-mer telomeric RNA and unmodified DNA in K+ solution, in HeLa cell extract, in HeLa cells, and in the supernatant, and the difference spectrum between HeLa cells and supernatant

The HQ conformation in living human cells was assessed by comparing the in vitro 19F NMR spectrum with the in-cell 19F NMR signals (Fig. 10a). A solution of 19F-labeled 12-mer RNA and unmodified DNA at a 1:3 ratio was prepared, and in-cell 19F NMR experiments were performed (Fig. 10b). Two major peaks were identified at −63.09 ppm and −62.61 ppm, nearly identical to those observed in vitro. The 19F NMR signal at −62.61 ppm, which was assigned to the telomeric HQ conformation, was clearly observed. Thus, the in-cell 19F NMR experiment demonstrates that the telomere HQ can exist as a stable structure in living human cells.
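
The assignment logic used throughout this chapter, in which an in-cell peak is attributed to a conformation by comparing its position with in vitro reference shifts, can be written as a short sketch. The reference and observed values are those quoted for RQ and HQ, written as negative shifts in the same way as the single-strand peak reported earlier. The function name `assign_peaks`, the 0.02 ppm tolerance, and the third observed peak are hypothetical additions for illustration.

```python
def assign_peaks(observed_ppm, reference_ppm, tolerance=0.02):
    """Label each observed shift with the nearest reference conformation.

    A peak stays unassigned (None) when no reference shift lies within
    `tolerance` ppm. The tolerance is an assumed value, not one from the text.
    """
    assignments = {}
    for shift in observed_ppm:
        # Find the reference conformation whose shift is closest to this peak.
        name, ref = min(reference_ppm.items(), key=lambda kv: abs(kv[1] - shift))
        assignments[shift] = name if abs(ref - shift) <= tolerance else None
    return assignments

# In vitro reference shifts (ppm) for the two conformations.
reference = {"RQ": -63.09, "HQ": -62.62}

# In-cell peak positions; the last value is a made-up unmatched peak.
observed = [-63.09, -62.61, -62.30]

result = assign_peaks(observed, reference)
```

A nearest-match rule of this kind is only as reliable as the separation between reference shifts. Here the RQ and HQ references differ by about 0.47 ppm, far more than the assumed tolerance.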


After the in-cell 19F NMR measurement, the cell suspension solution was collected and examined by 19F NMR. Almost no signal was observed from the supernatants of either the RQ or the HQ experiment (Fig. 10b), indicating that the NMR signals originated from structures within the HeLa cells. A difference 19F NMR spectrum between the HeLa cells and the supernatant was produced to eliminate the signal from the suspension solution (Fig. 10b). The clear signals in the difference spectrum support the conclusion that telomeric HQ structures can form in living human cells. Finally, to verify that the 19F NMR chemical shifts of telomeric RQ and HQ are identical under in vitro and in-cell conditions, 19F NMR experiments were performed on 19F-labeled RNA and DNA-RNA hybrid structures in HeLa cell extracts (Fig. 10b), which were expected to approximate ex vivo conditions. The positions of the 19F NMR signals obtained from the HeLa cell extracts are identical to those observed in living HeLa cells and in vitro, indicating that telomeric RQ and HQ formation occurs in a cell-like environment.

Conclusion

19F NMR could directly distinguish the parallel, antiparallel, hybrid-1, and hybrid-2 G-quadruplex structures and determine the thermodynamic properties of the different types of G-quadruplexes. Importantly, in-cell 19F NMR demonstrated, for the first time, that the telomeric DNA sequence forms two hybrid-type and two-tetrad antiparallel G-quadruplex structures in living human cells. This result provides valuable information for understanding the structures of human telomeric DNA in living human cells. Because there is no natural intracellular concentration of fluorine, in-cell 19F NMR spectra are free of background noise, which makes 19F NMR spectroscopy an ideal tool for studying RNA G-quadruplex structures in living cells. Using 19F NMR spectroscopy, the two-subunit stacked G-quadruplex can be directly observed in living cells. This study is the first direct observation of a higher-order RNA G-quadruplex in a cellular environment and provides new insight into the structural behavior of telomere RNA G-quadruplexes in living cells. These results open new avenues for the investigation of G-quadruplex structures in vitro and in living cells. Using the 19F NMR approach, the telomeric HQ structure was successfully investigated and the thermodynamic properties of the HQ conformation were determined. Furthermore, in-cell 19F NMR demonstrated that telomeric DNA and RNA can form a hybrid G-quadruplex in the environment of HeLa cells. These results provide important information for understanding the properties of HQ and new insight into the structural behavior of the telomeric DNA-RNA hybrid G-quadruplex in living cells.


References

Allen TD, Rutherford SA, Murray S et al (2007) A protocol for isolating Xenopus oocyte nuclear envelope for visualization and characterization by scanning electron microscopy (SEM) or transmission electron microscopy (TEM). Nat Protoc 2:1166–1172
Ambrus A, Chen D, Dai J et al (2006) Human telomeric sequence forms a hybrid-type intramolecular G-quadruplex structure with mixed parallel/antiparallel strands in potassium solution. Nucleic Acids Res 34:2723–2735
Balk B, Maicher A, Dees M et al (2013) Telomeric RNA-DNA hybrids affect telomere-length dynamics and senescence. Nat Struct Mol Biol 20:1199–1205
Bao HL et al (2017) Characterization of human telomere RNA G-quadruplex structures in vitro and in living cells using 19F NMR spectroscopy. Nucleic Acids Res 45:5501–5511
Biffi G, Tannahill D, McCafferty J et al (2013) Quantitative visualization of DNA G-quadruplex structures in human cells. Nat Chem 5:182–186
Bochman ML, Paeschke K, Zakian VA (2012) DNA secondary structures: stability and function of G-quadruplex structures. Nat Rev Genet 13:770–780
Chen H, Viel S, Ziarelli F et al (2013) 19F NMR: a valuable tool for studying biological events. Chem Soc Rev 42:7971–7982
Collie GW, Parkinson GN (2011) The application of DNA and RNA G-quadruplexes to therapeutic medicines. Chem Soc Rev 40:5867–5892
Collie GW, Parkinson GN, Neidle S et al (2010) Electrospray mass spectrometry of telomeric RNA (TERRA) reveals the formation of stable multimeric G-quadruplex structures. J Am Chem Soc 132:9328–9334
Dai J, Carver M, Punchihewa C et al (2007) Structure of the hybrid-2 type intramolecular human telomeric G-quadruplex in K+ solution: insights into structure polymorphism of the human telomeric sequence. Nucleic Acids Res 35:4927–4940
Ding Y, Tang Y, Kwok CK et al (2014) In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features. Nature 505:696–700
Dzatko S, Krafcikova M, Hansel-Hertsch R et al (2018) Evaluation of the stability of DNA i-motifs in the nuclei of living mammalian cells. Angew Chem Int Ed 57:2165–2169
Haeusler AR, Donnelly CJ, Periz G et al (2014) C9orf72 nucleotide repeat structures initiate molecular cascades of disease. Nature 507:195–200
Halley-Stott RP, Pasque V, Astrand C et al (2010) Mammalian nuclear transplantation to Germinal Vesicle stage Xenopus oocytes - a method for quantitative transcriptional reprogramming. Methods 51:56–65
Hansel R et al (2009) Evaluation of parameters critical for observing nucleic acids inside living Xenopus laevis oocytes by in-cell NMR spectroscopy. J Am Chem Soc 131:15761–15768
Hansel R, Lohr F, Foldynova-Trantirkova S et al (2011) The parallel G-quadruplex structure of vertebrate telomeric repeat sequences is not the preferred folding topology under physiological conditions. Nucleic Acids Res 39:5768–5775
Hansel R, Lohr F, Trantirek L et al (2013) High-resolution insight into G-overhang architecture. J Am Chem Soc 135:2816–2824
Hansel R, Luh LM, Corbeski I et al (2014) In-cell NMR and EPR spectroscopy of biomacromolecules. Angew Chem Int Ed 53:10300–10314
Hu XX, Wang SQ, Gan SQ et al (2021) A small ligand that selectively binds to the G-quadruplex at the human vascular endothelial growth factor internal ribosomal entry site and represses the translation. Front Chem 9:781198
Jain A, Vale RD (2017) RNA phase transitions in repeat expansion disorders. Nature 546:243–247
Kertesz M, Wan Y, Mazor E et al (2010) Genome-wide measurement of RNA secondary structure in yeast. Nature 467:103–107
Lim KW, Amrane S, Bouaziz S et al (2009) Structure of the human telomere in K+ solution: a stable basket-type G-quadruplex with only two G-tetrad layers. J Am Chem Soc 131:4301–4309


Luu KN, Phan AT, Kuryavyi V et al (2006) Structure of the human telomere in K+ solution: an intramolecular (3 + 1) G-quadruplex scaffold. J Am Chem Soc 128:9963–9970
Martadinata H, Phan AT (2009) Structure of propeller-type parallel-stranded RNA G-quadruplexes, formed by human telomeric RNA sequences in K+ solution. J Am Chem Soc 131:2570–2578
Martadinata H, Phan AT (2013) Structure of human telomeric RNA (TERRA): stacking of two G-quadruplex blocks in K+ solution. Biochemistry 52:2176–2183
Nakano S, Miyoshi D, Sugimoto N (2014) Effects of molecular crowding on the structures, interactions, and functions of nucleic acids. Chem Rev 114:2733–2758
Ogino S, Kubo S, Umemoto R et al (2009) Observation of NMR signals from proteins introduced into living mammalian cells by reversible membrane permeabilization using a pore-forming toxin, streptolysin O. J Am Chem Soc 131:10834–10835
Parkinson GN, Lee MP, Neidle S (2002) Crystal structure of parallel quadruplexes from human telomeric DNA. Nature 417:876–880
Phan AT, Patel DJ (2003) Two-repeat human telomeric d(TAGGGTTAGGGT) sequence forms interconverting parallel and antiparallel G-quadruplexes in solution: distinct topologies, thermodynamic properties, and folding/unfolding kinetics. J Am Chem Soc 125:15021–15027
Rhodes D, Lipps HJ (2015) G-quadruplexes and their regulatory roles in biology. Nucleic Acids Res 43:8627–8637
Sakai T et al (2006) In-cell NMR spectroscopy of proteins inside Xenopus laevis oocytes. J Biomol NMR 36:179–188
Sakakibara D et al (2009) Protein structure determination in living cells by in-cell NMR spectroscopy. Nature 458:102–105
Salgado GF, Cazenave C, Kerkour A et al (2015) G-quadruplex DNA and ligand interaction in living cells using NMR spectroscopy. Chem Sci 6:3314–3320
Selenko P, Wagner G (2007) Looking into live cells with in-cell NMR spectroscopy. J Struct Biol 158:244–253
Selenko P, Serber Z, Gadea B et al (2006) Quantitative NMR analysis of the protein G B1 domain in Xenopus laevis egg extracts and intact oocytes. Proc Natl Acad Sci USA 103:11904–11909
Serber Z et al (2006) Investigating macromolecules inside cultured and injected cells by in-cell NMR spectroscopy. Nat Protoc 1:2701–2709
Wan Y, Kertesz M, Spitale RC et al (2011) Understanding the transcriptome through RNA structure. Nat Rev Genet 12:641–655
Wang Y, Patel DJ (1993) Solution structure of the human telomeric repeat d[AG3(T2AG3)3] G-tetraplex. Structure 1:263–282
Wolfe AL, Singh K, Zhong Y et al (2014) RNA G-quadruplexes cause eIF4A-dependent oncogene translation in cancer. Nature 513:65–70
Xiao CD, Ishizuka T, Zhu XQ et al (2017) Unusual topological RNA architecture with an eight-stranded helical fragment containing A-, G-, and U-tetrads. J Am Chem Soc 139:2565–2568
Xu Y (2011) Chemistry in human telomere biology: structure, function and targeting of telomere DNA/RNA. Chem Soc Rev 40:2719–2740
Xu Y (2018) Recent progress in human telomere RNA structure and function. Bioorg Med Chem Lett 28:2577–2584
Xu Y, Noguchi Y, Sugiyama H (2006) The new models of the human telomere d[AGGG(TTAGGG)3] in K+ solution. Bioorg Med Chem 14:5584–5591
Xu Y, Kaminaga K, Komiyama M (2008) G-quadruplex formation by human telomeric repeats-containing RNA in Na+ solution. J Am Chem Soc 130:11179–11184
Xu Y, Suzuki Y, Komiyama M (2009) Click chemistry for the identification of G-quadruplex structures: discovery of a DNA-RNA G-quadruplex. Angew Chem Int Ed 48:3281–3284
Xu Y, Ishizuka T, Kimura T et al (2010) A U-tetrad stabilizes human telomeric RNA G-quadruplex structure. J Am Chem Soc 132:7231–7233
Xu Y, Suzuki Y, Ito K et al (2010) Telomeric repeat-containing RNA structure in living cells. Proc Natl Acad Sci USA 107:14579–14584


Xu Y, Ishizuka T, Yang J et al (2012) Oligonucleotide models of telomeric DNA and RNA form a Hybrid G-quadruplex structure as a potential. J Biol Chem 287:41787–41796
Yamaoki Y, Kiyoishi A, Miyake M et al (2018) The first successful observation of in-cell NMR signals of DNA and RNA in living human cells. Phys Chem Chem Phys 20:2982–2985
Ye Y, Liu X, Xu G et al (2015) Direct observation of Ca2+-induced calmodulin conformational transitions in intact Xenopus laevis oocytes by 19F NMR spectroscopy. Angew Chem Int Ed 54:5328–5330
Zhang Z, Dai J, Veliath E et al (2010) Structure of a two-G-tetrad intramolecular G-quadruplex formed by a variant human telomeric sequence in K+ solution: insights into the interconversion of human telomeric G-quadruplex structures. Nucleic Acids Res 38:1009–1021
Zhang S, Sun H, Wang L et al (2018) Real-time monitoring of DNA G-quadruplexes in living cells with a small-molecule fluorescent probe. Nucleic Acids Res 46:7522–7532

Structures and Catalytic Activities of Complexes Between Heme and DNA

10

Yasuhiko Yamamoto and Atsuya Momotake

Contents
Introduction
G-quadruplex DNA and RNA
Molecular Recognition Between Heme and G-Quadruplex DNA
Spectroscopic Properties of a Heme(Fe3+)-DNA Complex
NMR Characterization of Heme-DNA Complexes
CO Adducts of Heme(Fe2+)-DNA Complexes
Resonance Raman Studies of CO Adducts of Heme(Fe2+)-DNA Complexes
pH-Dependence of a Heme(Fe3+)-DNA Complexes
Imidazole Adducts of Heme(Fe3+)-DNA Complexes
Peroxidase Activity of Heme(Fe3+)-DNA Complexes
Peroxidation Cycle of Heme(Fe3+)-DNA Complexes
Conclusion
References

Abstract

Both heme and G-quadruplex DNA are ubiquitous in living systems and play a variety of vital roles in cellular functions. Hence, elucidation of the interaction between them at the atomic level is expected to provide valuable information for revealing the molecular mechanism responsible for the regulation of diverse biological processes through their interaction. Heme binds selectively to the 3′-terminal G-quartet of a parallel G-quadruplex DNA to form a stable complex, which exhibits not only peroxidase activity but also various spectroscopic and functional properties remarkably similar to those of hemoproteins such as myoglobin. Mechanistic studies on the peroxidation cycle of the complex indicated that its catalytic cycle involves the iron(IV)-oxo porphyrin π-cation radical intermediate known as compound I, formed through heterolytic O-O bond cleavage of an Fe3+-bound hydroperoxo ligand (Fe3+-OOH) in compound 0, like those of peroxidases such as horseradish peroxidase (HRP), and that the formation of compound I in the complex is promoted by mechanisms that are reminiscent of the “push” and “pull” mechanisms in the catalytic cycle of HRP. These findings allow not only a deeper understanding of the functional properties of heme bound to a G-quartet but also insight into how to control the heme reactivity of the complex. In addition, since heme is believed to be an ancient compound, the catalytic activities of complexes between heme and G-quadruplex nucleic acids could possibly help us to conceptualize redox-catalyzing ribozymes in a primordial “RNA world.”

Y. Yamamoto (*) · A. Momotake
Department of Chemistry, University of Tsukuba, Tsukuba, Japan
e-mail: [email protected]; [email protected]
© Springer Nature Singapore Pte Ltd. 2023
N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_12

Introduction

Understanding the interactions between small compounds and DNA at the atomic level is important for designing drugs capable of recognizing specific DNA sequences and structural motifs. Various compounds have been shown to interact with DNA in a variety of manners, such as intercalation and groove binding, in addition to nonspecific binding on the DNA exterior. G-quadruplex DNAs found in telomeres play crucial biological roles and hence are considered to be potential therapeutic targets for drugs that interfere with processes such as telomere maintenance and function. G-quadruplex DNAs are stabilized by a unique structural motif, known as the G-quartet, which involves the planar association of four guanine (G) bases held together by eight Hoogsteen hydrogen bonds in the presence of K+ or Na+ (Fig. 1A) (Sen and Gilbert 1988). Various π-conjugated macrocyclic compounds have been shown to exhibit specific binding to the G-quartet through a π-π stacking interaction. The size and planarity of the porphyrin moiety of the iron(II) (or iron(III))-protoporphyrin IX complex (heme(Fe2+) or heme(Fe3+)) (Fig. 1B) are well suited for interaction with a G-quartet through π-π stacking. Heme is a ubiquitous and abundant cellular cofactor responsible for a variety of metabolic functions. Heme is also involved in various catabolic and regulatory processes in living systems such as circadian pathways (Kaasik and Lee 2004), ion channels (Burton et al. 2016), transcription (Nam et al. 2020), and translation (Nishitani et al. 2019). Sen and coworkers (Travascio et al. 1998; Sen and Geyer 1998; Sen and Poon 2011; Poon et al. 2011) demonstrated that heme(Fe3+) binds to G-quadruplex DNAs to form complexes that exhibit peroxidase and peroxygenase activities. This finding triggered exploration of the biological versatility of heme as the prosthetic group of deoxyribozymes (DNAzymes).
Likewise, interaction between heme and G-quadruplex DNAs has been investigated extensively to elucidate the structure-function relationship of DNAzymes possessing heme (heme-DNAzymes) and also to improve the catalytic activity of a heme-DNAzyme. We found that heme and a parallel G-quadruplex DNA of a single repeat sequence of the human telomere, (d(TTAGGG))4 (Fig. 1C), form a stable complex (Fig. 1E), which exhibits not only peroxidase activity (Shinomiya et al. 2018) but also various spectroscopic and functional properties remarkably similar to those of hemoproteins such as myoglobin (Mb) (Mikuma et al. 2003; Ohyama et al. 2006; Saito et al. 2012a; Suzuki et al. 2014; Yamamoto et al. 2015), implying that the complex possesses the structural essentials of heme-DNAzymes that mimic hemoproteins in function. Therefore, the heme-(d(TTAGGG))4 complex could be considered an excellent model for elucidating the interaction between the heme and G-quartet in a heme-DNAzyme. In addition, since heme is believed to be an ancient compound (Yarus 2002), expansion of the functional diversity of ribozymes through incorporation of heme into RNAs is expected to provide useful information for evaluating the RNA world hypothesis (Gilbert 1986). Furthermore, although a variety of heme-DNAzymes have been studied in vitro, Gray et al. (2019) demonstrated that the molecular partnership between heme and G-quadruplex DNA is not confined to the test tube but extends to living cells. It was found that G-quadruplex DNA sequesters labile heme in living cells to protect cells from oxidative damage caused by the heme. G-quadruplex DNAs are found in transcriptionally active euchromatin regions and in critical regulatory regions of the genome such as oncogene promoters. In fact, G-quadruplex DNA is formed within nuclease hypersensitive element III1, located upstream of the c-MYC gene’s P1 promoter that regulates the majority of its transcription (González and Hurley 2010). Thus, G-quadruplex DNA is ubiquitous in living systems and plays a variety of crucial roles in the genome. Hence, elucidation of the interaction between heme and G-quadruplex DNA at the atomic level is expected to provide valuable information for revealing the molecular mechanism responsible for the regulation of diverse biological processes through the heme-DNA interaction (Canesin et al. 2020).

Fig. 1 Molecular structures of the G-quartet (A) and heme(Fe3+) (B–B””), i.e., Proto(Fe3+) (B), Meso(Fe3+) (B’), 3,8-DMD(Fe3+) (B”), 7-PF(Fe3+) (B”’), and 2,8-DPF(Fe3+) (B””), and schematic illustrations of a parallel G-quadruplex DNA, (d(TTAGGG))4 (C), a dimer of (d(TTAGGG))4 (D), and a complex between heme and (d(TTAGGG))4 (E). In (C), G-quartets are stacked on top of each other through hydrophobic π–π stacking interactions, and cation-dipole interactions between monovalent cations (yellow spheres), such as K+ and Na+, sandwiched between the G-quartet planes and the carbonyl groups of the two adjacent quartets contribute to stabilization of the G-quartet stacking. In (E), heme, represented by a red disk, binds selectively to the 3′-terminal G-quartet of the DNA

G-quadruplex DNA and RNA

G-rich DNA and RNA sequences have a propensity to form G-quadruplexes. In G-quadruplex DNAs, G-quartets are stacked on top of each other through hydrophobic π–π stacking interactions (Fig. 1C). In addition, monovalent cations, such as K+ and Na+, sandwiched between the G-quartet planes undergo cation-dipole interactions with the carbonyl groups of the two adjacent quartets, which contribute to stabilization of the G-quartet stacking (Fig. 1C). G-quadruplex DNA can adopt several morphologies that include parallel or antiparallel configurations through either intramolecular or intermolecular organization (Fig. 2); the four polynucleotide strands are oriented in the same direction in a parallel G-quadruplex, whereas in an antiparallel one, two strands are oriented in one direction and the other two in the opposite direction (Brooks et al. 2010). In contrast, most G-quadruplex RNAs adopt a parallel configuration because the anti-conformation of the base, with respect to the sugar moiety, is stabilized relative to the syn-conformation due to the C3′-endo sugar pucker that is favored by the steric constraints imposed by 2′-OH groups (Guiset Miserachs et al. 2016). A single repeat sequence of the human telomere, d(TTAGGG), forms a parallel G-quadruplex DNA, (d(TTAGGG))4, in the presence of low K+ concentrations (Fig. 1C) (Wang and Patel 1992), and (d(TTAGGG))4 dimerizes in the presence of high K+ concentrations through end-to-end stacking of the 3′-terminal G-quartets (G6 G-quartets) (Fig. 1D) (Kato et al. 2005). The dimer of (d(TTAGGG))4 is stabilized through the intermolecular π–π stacking between the G6 G-quartets and the interaction of K+ at the interface of the dimer with the electric dipoles of the eight carbonyl groups of the two adjacent G-quartets. On the other hand, parallel G-quadruplex DNAs formed from d(TTAGGGT) and d(TTAGGGA) ((d(TTAGGGT))4 and (d(TTAGGGA))4, respectively (Figs. 2A and A’)), do not dimerize even if the K+ concentration is raised, because an extra base, other than a guanine, attached to the 3′-terminus of the sequence prevents the dimerization.

Fig. 2 Schematic illustration of parallel G-quadruplex DNAs, (d(TTAGGGT))4 (A), (d(TTAGGGA))4 (A’), (d(TAGGGTTAGGGT))2 (B), and d(TAGGGTGGGTTGGGTGIG) (C), and an antiparallel one, (d(TAGGGTTAGGGT))2 (D). The bases of T6, T7, and A8 in (B) and (D) and those of T6, T10, T11, and T15 in (C) are omitted for clarity
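The strand stoichiometries quoted in this section, (d(TTAGGG))4, (d(TAGGGTTAGGGT))2, and the single-stranded d(TAGGGTGGGTTGGGTGIG), can be rationalized by counting G-tracts per strand. The following Python sketch is only an illustration under stated assumptions: a G-tract is taken to be a run of three or more G residues, inosine (I) is treated as G-like, and the rule "four tracts per quadruplex" is a simplification that ignores loop lengths, cations, and folding topology. The function names are hypothetical and not taken from the chapter.

```python
import re

# Runs of three or more consecutive G (or inosine, I) residues.
# Treating I as G-like is an assumption made for this sketch only.
G_TRACT = re.compile(r"[GI]{3,}")


def count_g_tracts(sequence: str) -> int:
    """Count runs of >= 3 consecutive G/I residues in one strand."""
    return len(G_TRACT.findall(sequence.upper()))


def strands_needed(sequence: str):
    """Strands required to supply four G-tracts, or None if no whole number works."""
    tracts = count_g_tracts(sequence)
    if tracts == 0 or 4 % tracts != 0:
        return None
    return 4 // tracts


if __name__ == "__main__":
    # Sequences taken from Figs. 1 and 2 of this chapter.
    for seq in ("TTAGGG", "TTAGGGT", "TAGGGTTAGGGT", "TAGGGTGGGTTGGGTGIG"):
        print(seq, count_g_tracts(seq), strands_needed(seq))
```

Under these assumptions the single-repeat sequences need four strands, the two-repeat sequence needs two, and the inosine-containing sequence can fold intramolecularly, in line with the formulas used in the text. The sketch says nothing about whether the resulting quadruplex is parallel or antiparallel.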

Molecular Recognition Between Heme and G-Quadruplex DNA

The complexation of G-quadruplex DNAs with porphyrin derivatives or their metal complexes has been studied extensively to characterize their molecular recognition of each other. Heme(Fe3+) and heme(Fe2+) bind selectively to the 3′-terminal G-quartet of (d(TTAGGG))4 through π-π stacking interactions between the porphyrin moiety of the heme and the G-quartet to form a stable heme-(d(TTAGGG))4 complex (Fig. 1E) (Mikuma et al. 2003; Saito et al. 2012a, b; Suzuki et al. 2014). Similarly, heme binds selectively to the 3′-terminal G-quartets (G6 G-quartets) of (d(TTAGGGT))4 and (d(TTAGGGA))4 to form the heme-(d(TTAGGGT))4 and heme-(d(TTAGGGA))4 complexes, respectively (Figs. 3A and A’) (Shimizu et al. 2015; Okamoto et al. 2021). A comparative study between the heme-(d(TTAGGG))4 and heme-(d(TTGAGGG))4 complexes indicated that the π-π stacking interaction between the porphyrin moiety of the heme and the 3′-terminal G-quartet of the DNA is affected by the nature of the stacked G-quartets (Shibata et al. 2017).

Upon formation of the heme-DNA complex, the pseudo-C2 symmetric heme stacks onto the C4 symmetric G-quartet in two different orientations differing by a 180° rotation about the meso 5-H–15-H axis, with respect to the DNA, i.e., the obverse and reverse heme forms of the heme-DNA complex (Fig. 4) (Saito et al. 2012a; Suzuki et al. 2014; Shimizu et al. 2015). In the case of b-type hemoproteins, such heterogeneity of the heme orientation, with respect to the protein, is well known as “heme orientational disorder” (La Mar et al. 1978). The orientation of the porphyrin moiety of the heme(Fe3+) relative to the G6 G-quartet depends upon which side of the heme(Fe3+) plane, i.e., either the obverse or reverse heme(Fe3+) orientation, interacts with the G6 G-quartet, because the steric contacts between heme vinyl side chains and DNA G6 sugars could be slightly altered by the 180° rotation of the pseudo-C2 symmetric heme around the meso 5-H–15-H axis. In addition, considering the contacts between the heme and the G-quartet in the heme-DNA complex, the thermodynamic stabilities of the obverse and reverse heme forms are almost identical. Consequently, the heme orientational disorder in the heme-DNA complex is manifested in the observation of two sets of comparably intense 1H NMR signals due to heme bound to a G-quartet (see below).

G-quadruplex DNA formed from a two-repeat human telomeric sequence, d(TAGGGTTAGGGT), provides a unique opportunity to obtain clues for elucidating the molecular recognition of G-quadruplex DNA by heme, because d(TAGGGTTAGGGT) exists predominantly as parallel and antiparallel G-quadruplexes in the presence of K+ and Na+, respectively (Figs. 2B and D, respectively) (Phan and Patel 2003). As in the heme-(d(TTAGGG))4, heme-(d(TTAGGGT))4, and heme-(d(TTAGGGA))4 complexes, heme binds selectively to the 3′-terminal G-quartet of a parallel G-quadruplex DNA of d(TAGGGTTAGGGT) (Fig. 3B) (Okamoto et al. 2021). On the other hand, binding of heme to an antiparallel G-quadruplex of d(TAGGGTTAGGGT) (Fig. 2D) does not occur (Okamoto et al. 2021). The selective binding of heme to the 3′-terminal G-quartet also holds for a parallel G-quadruplex DNA formed from a sequence containing the nonstandard base inosine (I), d(TAGGGTGGGTTGGGTGIG) (Fig. 3C), indicating that the preferential binding of heme to the 3′-terminal G-quartet of parallel G-quadruplex DNAs is a general feature of the molecular recognition between them (Yamamoto et al. 2018).

The preferential binding of heme to the 3′-terminal G-quartets of parallel G-quadruplex DNAs could be explained in terms of the orderly arrangement of the constituent guanine deoxyribose rings, with respect to the G-quartet plane (Fig. 5) (Okamoto et al. 2021). In the G-quartets of a parallel G-quadruplex, the guanine deoxyribose moieties are all oriented with their ring oxygen atoms (Orings) on the 5′-terminal side of the G-quartet plane (Fig. 5A) (Okamoto et al. 2021), and hence the negative electrostatic potential due to the electron-rich Orings interacts unfavorably with the negative charges of the heme propionate groups, making the 5′-terminal G-quartet inferior to the 3′-terminal one as an end-stacking site for heme (Fig. 5B). On the other hand, in the outer G-quartets of an antiparallel G-quadruplex DNA, the orientation of a pair of diagonally situated Orings, with respect to the G-quartet plane, is opposite to that of the other pair, and hence the negative electrostatic potential due to two electron-rich Orings on the outer G-quartets of the antiparallel G-quadruplex DNA possibly hampers the binding of heme.

Fig. 3 Schematic illustration of complexes between heme and parallel G-quadruplex DNAs, heme-(d(TTAGGGT))4 (A), heme-(d(TTAGGGA))4 (A’), heme-(d(TAGGGTTAGGGT))2 (B), heme-d(TAGGGTGGGTTGGGTGIG) (C), and heme-d(TAGGGTGGGTTGGGTGIGA) complexes (D). In (A)–(D), heme is represented by a red disk

Fig. 4 Two different orientations of Proto(Fe3+). These orientations can be interconverted through 180° rotation of Proto(Fe3+) around the pseudo-C2 axis passing through the meso 5-H and 15-H hydrogens

Fig. 5 Conformation of d(GGG) in a parallel G-quadruplex DNA (A), and electrostatic potential maps of the 5′-terminal (left) and 3′-terminal sides (right) of a G-quartet in a parallel G-quadruplex DNA (B). In (A), guanine deoxyribose moieties are all oriented with their ring oxygen atoms (Orings), indicated by downward arrows, on the 5′-terminal side of the G-quartet plane. Due to the orderly arrangement of electron-rich Orings, with respect to the G-quartet plane, in a parallel G-quadruplex, the electrostatic potential of the 5′-terminal surface is highly negative, as indicated in red, and that of the 3′-terminal surface is relatively positive, as indicated in blue (B)
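The statement that nearly identical stabilities of the obverse and reverse heme forms lead to two sets of comparably intense NMR signals follows from a two-state Boltzmann distribution. The Python sketch below makes that step explicit. It assumes a simple two-state equilibrium at a chosen temperature; the free-energy differences passed to it are illustrative inputs, not values reported in the chapter, and the function name is hypothetical.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1


def orientation_populations(delta_g_j_per_mol: float, temperature_k: float = 298.15):
    """Return (obverse, reverse) fractions for a two-state equilibrium.

    delta_g_j_per_mol is G(reverse) - G(obverse); zero means equal stability.
    """
    # Equilibrium ratio [reverse]/[obverse] from the Boltzmann factor.
    ratio = math.exp(-delta_g_j_per_mol / (R * temperature_k))
    obverse = 1.0 / (1.0 + ratio)
    return obverse, 1.0 - obverse


if __name__ == "__main__":
    # Illustrative free-energy differences in J/mol (assumed, not measured).
    for delta_g in (0.0, 500.0, 2000.0):
        obverse, reverse = orientation_populations(delta_g)
        print(f"dG = {delta_g:6.0f} J/mol -> obverse {obverse:.2f}, reverse {reverse:.2f}")
```

With a free-energy difference of zero the two forms are equally populated, which corresponds to the ~1:1 signal intensities described in the text; the ratio departs from 1:1 only as the difference becomes comparable to RT.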

Spectroscopic Properties of a Heme(Fe3+)-DNA Complex

The UV-vis absorption spectra of native heme(Fe3+), i.e., Proto(Fe3+), at pH 6.80 in the presence of various stoichiometric ratios of (d(TTAGGG))4 showed that, upon addition of the DNA, the Soret band of heme(Fe3+) exhibited a red-shift from 395 nm to 403 nm associated with ~210% hyperchromism, and isosbestic points at 378, 419, and 490 nm (Fig. 6) (Shinomiya et al. 2018). The absorption spectrum of the heme(Fe3+)-DNA complex closely resembles that of aquometmyoglobin (metMb), i.e., Mb possessing a hexa-coordinated high-spin heme(Fe3+) with a histidyl imidazole and a water molecule as axial ligands, suggesting that the heme(Fe3+) environments in the heme(Fe3+)-DNA complex and metMb are alike. The heme-binding constant (Ka) of 6.5 ± 0.7 μM−1 was obtained for the Proto(Fe3+)-(d(TTAGGG))4 complex through analysis of the absorption change (Hagiwara et al. 2021). The Ka for the heme(Fe3+)-DNA complex is affected by chemical modification of the heme and by the DNA sequence (see below) (Shinomiya et al. 2018). The circular dichroism (CD) spectrum of the heme(Fe3+)-DNA complex exhibited a positive maximum at ~260 nm (Saito et al. 2012b), which is characteristic of a parallel G-quadruplex DNA (Miyoshi et al. 2001), and induced negative intensity at ~400 nm, which is indicative of a π-π stacking interaction between the porphyrin ring of heme(Fe3+) and the base pairs of DNA (Pasternack et al. 1983).

The 400 MHz 1H NMR spectrum of the Proto(Fe3+)-(d(TTAGGG))4 complex exhibited well-resolved paramagnetically shifted signals arising from Proto(Fe3+) side chain protons (Fig. 7A) (Mikuma et al. 2003). The specific complexation between Proto(Fe3+) and the DNA was manifested in the appearance of clear paramagnetically shifted signals, which reflected a unique Proto(Fe3+) electronic structure in the complex. The relatively large peaks at 60–80 ppm were reasonably assignable to Proto(Fe3+) side chain methyl protons. Furthermore, the low-temperature EPR spectrum of the Proto(Fe3+)-DNA complex exhibited signals at g = ~2 and ~6 (Fig. 8A), which are characteristic of high-spin Proto(Fe3+) with the spin quantum number S = 5/2 (Ikeda-Saito et al. 1992), and was similar to that of metMb.

Fig. 6 Soret absorption, 300–500 nm, of (d(TTAGGG))4, 0–6 μM, titrated against 2.0 μM Proto(Fe3+) in 300 mM KCl and 50 mM potassium phosphate buffer, pH 6.80, together with 0.08 w/v% Triton X-100 and 0.5 v/v% dimethyl sulfoxide, at 25 °C (Hagiwara et al. 2021). The inset plots the absorbance at 403.0 nm against [DNA]/μM. The heme-binding constant (Ka) of 6.5 ± 0.7 μM−1 was obtained for the complex

Fig. 7 400 MHz 1H NMR spectra of the Proto(Fe3+)-(d(TTAGGG))4 complex in 100% 2H2O, 200 mM KCl and 100 mM potassium phosphate buffer, pH 7.00 (A), and metMb in 100% 2H2O, pH 7.40 (B) (Mikuma et al. 2003). The spectra were recorded at room temperature

Fig. 8 EPR spectra of the Proto(Fe3+)-(d(TTAGGG))4 (A) and Proto(Fe3+)-(d(TTAGGGA))4 complexes (B) at 5 K. Spectra recorded for samples prepared in 50 mM potassium phosphate buffer, pH 6.80, with addition of a 150-fold stoichiometric excess of H2O2 to the Proto(Fe3+)-(d(TTAGGG))4 complex in the absence (A’) and presence of the substrate, Amplex Red (A”), and one recorded for the Proto(Fe3+)-(d(TTAGGGA))4 complex after addition of a 150-fold stoichiometric excess of H2O2 (B’) (Shinomiya et al. 2018). In (A’) and (B’), the Y-gains of signals near g = 2 are expanded by a factor of ~2.5 and convex peaks at g = ~2 due to compound I are indicated by downward-pointing arrows
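To give a feel for what a heme-binding constant of 6.5 μM−1 means under the titration conditions of Fig. 6 (2.0 μM heme, 0–6 μM DNA), the following Python sketch evaluates a simple 1:1 binding isotherm. The 1:1 stoichiometry and the closed-form quadratic solution are assumptions made here for illustration; the chapter does not spell out the fitting model used to obtain Ka, and the function name is hypothetical.

```python
import math


def complex_concentration(heme_total: float, dna_total: float, ka: float) -> float:
    """Concentration of a 1:1 heme-DNA complex from total concentrations.

    Solves Ka = [C] / (([H]t - [C]) * ([D]t - [C])) for the physical root.
    Units must be consistent, e.g., uM for concentrations and uM^-1 for Ka.
    """
    b = heme_total + dna_total + 1.0 / ka
    return (b - math.sqrt(b * b - 4.0 * heme_total * dna_total)) / 2.0


if __name__ == "__main__":
    KA = 6.5          # uM^-1, value quoted in the text
    HEME_TOTAL = 2.0  # uM, as in the titration of Fig. 6
    for dna in (0.0, 1.0, 2.0, 4.0, 6.0):
        bound = complex_concentration(HEME_TOTAL, dna, KA) / HEME_TOTAL
        print(f"[DNA] = {dna:.1f} uM -> fraction of heme bound = {bound:.3f}")
```

In such a model the absorbance change at 403 nm would be proportional to the complex concentration, so a curve of this shape is what one would fit to the inset of Fig. 6; a real analysis would also need to propagate the ±0.7 μM−1 uncertainty.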

NMR Characterization of Heme-DNA Complexes

NMR is a powerful tool for elucidating the interaction between heme and DNA in a heme-DNA complex at the atomic level. The heme Fe atom in a heme-DNA complex is generally in either the ferrous (Fe2+) or ferric (Fe3+) state. The numbers of electrons in the 3d orbitals of Fe2+ and Fe3+ are six and five, respectively. Hence, the spin quantum number S is an integer for Fe2+ and a half-integer for Fe3+. Depending upon the degree of spin pairing of electrons in the 3d orbitals, Fe2+ can have 4, 2, or 0 unpaired electrons, corresponding to S = 2, 1, or 0, respectively, and Fe3+ can have 5, 3, or 1 unpaired electron(s), corresponding to S = 5/2, 3/2, or 1/2, respectively. In an octahedral ligand field, the energy levels of the five 3d orbitals are split into two groups in such a way that the levels of the dz2 and dx2−y2 orbitals are higher than those of the other three, dxy, dyz, and dxz. The spin state of the heme Fe atom depends on the chemical nature of its axial ligands. For example, in the case of Mb, heme(Fe2+) in the deoxy form is penta-coordinated with a high-spin configuration, S = 2, and the oxy form or carbonmonoxy form (MbCO) possesses a low-spin configuration, S = 0. The binding of ligands of relatively weak field strength such as a water molecule (H2O) to heme(Fe3+) gives the high-spin state, S = 5/2, whereas the low-spin state, S = 1/2, is obtained with a strong-field ligand such as cyanide ion (CN−). In addition, the binding of ligands of intermediate field strength such as imidazole (Im) and hydroxide ion (OH−) causes a thermal equilibrium between the S = 1/2 and S = 5/2 states. Complexes with S = 1 and 3/2 are less common but have been obtained in some particular systems such as heme(Fe4+)-peroxo species (S = 1) and heme(Fe3+) species coupled with the superoxide ion (O2−) (S = 3/2) (Hong et al. 2014).

Thus, the heme Fe atom can exhibit a variety of oxidation, ligation, and spin states, and hence NMR characterization of heme complexes is greatly affected by the electronic nature of the heme Fe atom. The gyromagnetic ratio of an electron is ~660 times greater than that of 1H. The large magnetic moment due to the unpaired electron(s) on the heme Fe atom has a great influence on the NMR spectra of paramagnetic heme-DNA complexes. In paramagnetic heme-DNA complexes, resonances arising from 1H nuclei located in close proximity to the heme Fe atom exhibit paramagnetic shifts due to the unpaired electron(s) and hence appear outside of the diamagnetic envelope, i.e., 0–10 ppm, where signals often severely overlap. Paramagnetically shifted signals are extremely sensitive to the structural properties of molecules. The basic concepts of the techniques useful for NMR characterization of paramagnetic compounds, together with the theoretical framework for interpreting NMR parameters, have been well established (Bertini et al. 2016). One of the major drawbacks in the NMR study of paramagnetic compounds is fast nuclear relaxation. Paramagnetic-induced relaxation substantially hinders the development of the connectivities that would be observed if the compounds were diamagnetic.
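The electron counting in this section can be checked with a few lines of Python. The sketch below tabulates S for the numbers of unpaired electrons listed in the text and verifies the quoted electron-to-proton gyromagnetic ratio of ~660. The spin-only magnetic moment, sqrt(n(n+2)) Bohr magnetons, is a standard textbook expression added here for orientation and is not taken from the chapter; the gyromagnetic ratios are CODATA 2018 values, and all names are hypothetical.

```python
import math

# CODATA 2018 gyromagnetic ratios in rad s^-1 T^-1.
GAMMA_ELECTRON = 1.76085963023e11
GAMMA_PROTON = 2.6752218744e8

# Numbers of unpaired electrons quoted in the text for each oxidation state.
SPIN_STATES = {
    "Fe2+ (3d6)": (4, 2, 0),
    "Fe3+ (3d5)": (5, 3, 1),
}


def total_spin(unpaired_electrons: int) -> float:
    """Spin quantum number S = n/2 for n unpaired electrons."""
    return unpaired_electrons / 2


def spin_only_moment(unpaired_electrons: int) -> float:
    """Spin-only magnetic moment in Bohr magnetons, sqrt(n(n+2))."""
    n = unpaired_electrons
    return math.sqrt(n * (n + 2))


if __name__ == "__main__":
    for ion, counts in SPIN_STATES.items():
        for n in counts:
            print(f"{ion}: {n} unpaired -> S = {total_spin(n)}, "
                  f"spin-only moment = {spin_only_moment(n):.2f} Bohr magnetons")
    print(f"gamma_e / gamma_1H = {GAMMA_ELECTRON / GAMMA_PROTON:.1f}")
```

The spin-only expression neglects orbital contributions, so it should be read as a rough guide to how strongly each spin state perturbs nearby 1H resonances rather than as a prediction of measured moments.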

CO Adducts of Heme(Fe2+)-DNA Complexes Elucidation of the interaction between Proto(Fe3+) (or Proto(Fe2+)) and the G-quartet is of particular importance for controlling and enhancing the heme Fe reactivity within the scaffold of the G-quadruplex structure. The fourfold rotationally symmetric structure of (d(TTAGGG))4 is stabilized through the stacking of three consecutive G-quartets, i.e., the G4, G5, and G6 G-quartets, and, due to its structural stability and simplicity, a G-quadruplex DNA is suitable for detailed characterization of the interactions between heme and the G-quartet using NMR. In the 1H NMR spectrum of (d(TTAGGG))4, the formation of the parallel G-quadruplex DNA is clearly manifested in the appearance of G4  G6 exchangeable imino proton (G4  G6 NH) signals in a downfield-shifted region, 10–12 ppm, characteristic of G-quartets (Fig. 9A) (Wang and Patel 1992). Assignments of G4  G6 NH signals of (d(TTAGGG))4 can be readily made through the observation of nuclear

Fig. 9 600 MHz 1H NMR spectrum of (d(TTAGGG))4 in 90% 1H2O/10% 2H2O (A), and downfield portions, 8.33–11.70 ppm, of the spectra of (d(TTAGGG))4 in 90% 1H2O/10% 2H2O (A’ and A”), a CO adduct of a Proto(Fe2+)-(d(TTAGGG))4 complex in 90% 1H2O/10% 2H2O (B) and in 100% 2H2O (B’), and a CO adduct of a 3,8-DMD(Fe2+)-(d(TTAGGG))4 complex in 90% 1H2O/10% 2H2O (C) and in 100% 2H2O (C’) (Saito et al. 2012a). Sample A” was prepared in 50 mM KCl and 50 mM potassium phosphate buffer, pH 7.00, and the others in 300 mM KCl and 50 mM potassium phosphate buffer, pH 7.00. GnD and GnM, where n = 4, 5, or 6, represent the imino proton signals of the dimer (Fig. 1D) and monomer of (d(TTAGGG))4, respectively. Peaks denoted by * in (B) are due to the (d(TTAGGG))4 dimer remaining in the sample. A portion of the NOESY spectrum of sample B (D). A mixing time of 150 ms was used to record the spectrum. NOE connectivities used for signal assignments are indicated in the spectrum. All the spectra were recorded at 25 °C


Y. Yamamoto and A. Momotake

Fig. 10 Portions of the NOESY spectrum of (d(TTAGGG))4 in 90% 1H2O/10% 2H2O, 300 mM KCl, and 50 mM potassium phosphate buffer, pH 7.00, at 25 °C. A mixing time of 150 ms was used to record the spectrum. NOE connectivities used for signal assignments are indicated in the spectrum (Saito et al. 2012a)

Overhauser effect (NOE) connectivities, i.e., A3 H8–G4 NH, G4 NH–G5 NH, and G5 NH–G6 NH ones (Fig. 10). In the spectrum of a diamagnetic CO adduct of the Proto(Fe2+)-(d(TTAGGG))4 complex, G4–G6 NH signals are upfield-shifted, relative to the corresponding ones of free (d(TTAGGG))4, mainly due to the ring current of the heme porphyrin moiety (Fig. 9B) (Saito et al. 2012a). In addition, the G6 NH signal of the complex was observed as ~1:1 doublet peaks (Figs. 9B and D), due to the heme orientational disorder (Fig. 4). The NH signals of the complex are ranked as G4 NH < G5

Fig. 11 A portion of the NOESY spectrum of the CO adduct of the Proto(Fe2+)-(d(TTAGGG))4 complex in 100% 2H2O, 300 mM KCl, and 50 mM potassium phosphate buffer, pH 7.00, at 25 °C (Saito et al. 2012a). A mixing time of 150 ms was used to record the spectrum. Intermolecular NOE connectivities between the heme vinyl (3α, 3βc, 3βt, 8α, 8βc, and 8βt, where subscripts c and t represent cis and trans protons, respectively) and DNA G6 (H1’ and H8) protons are highlighted by broken line circles

NH < G6 NH, in order of increasing heme porphyrin ring current-induced shift change (ΔδRC), i.e., the values are 0.37, 0.87, and 2.54 (or 2.49) ppm for the G4 NH, G5 NH, and G6 NH signals, respectively. These results clearly indicated that the heme binds selectively to the 3′-terminal G6 G-quartet of the DNA in the complex (Fig. 1E), and that the timescale for the heme binding to the DNA is […]

300 h in rat serum

2

8 h in rat serum

3



2

63 h in human plasma

2

2. 3′-inverted dT 3. 2′-O-methyl with a single phosphorothioate linkage 4. 20 kDa PEG 1. 26-nt DNA – 2. unmodified 1. 32-nt RNA >72 h in human serum 2. 2′-O-methylpurine 3. 3′-inverted dT 4. 40 kDa PEG 1. 26-nt DNA – 2. G-rich DNA 3. PEGylated

2 1

2

(continued)

23

Nucleic Acid Aptamers: From Basic Research to Clinical Applications


Table 1 (continued)
Drug name | Target | Modifications | Circulation half-life | Phase
NOX-A12 | C-X-C motif chemokine 12 (CXCL12) | 1. 26-nt DNA (Spiegelmer); 2. G-rich DNA; 3. PEGylated | >60 h in human serum | 2
NOX-E36 | Chemokine (C–C motif) ligand 2 (CCL2) | 1. 40-nt RNA (Spiegelmer); 2. L-ribonucleic acid; 3. PEGylated | >60 h in human serum | 2
NOX-H94 | Hepcidin peptide hormone | 1. 44-nt RNA (Spiegelmer); 2. L-ribonucleic acid; 3. PEGylated | >60 h in human serum | 2

The target, modifications, half-life, and clinical phase are shown. Aptamers whose names are indicated in bold are further discussed in the text (adapted from [Ni et al. 2021]; for more details see references within). h, hours; nt, nucleotides; PEG, polyethylene glycol; dT, deoxythymidine

Locked nucleic acids (LNAs) are also regarded as XNAs because LNAs possess a methylene bridge between the 2′-oxygen and 4′-carbon of the ribose sugar, which defines them as a distinct sugar moiety (Chaput and Herdewijn 2019) (Fig. 5b). Despite chemical and conformational differences, XNAs maintain the ability to form secondary structures and store information similarly to their DNA and RNA counterparts (Chaput 2021).

A 2′-deoxy-2′-fluoro modification involves the replacement of the 2′-OH group with an F atom. Because 2′-deoxy-2′-fluoro substitutions are favored over 2′-NH2 (Yang et al. 2007) and, to date, NH2 modifications are not applied to aptamers currently studied in clinical trials (see Table 1), we will not discuss NH2 modifications in this chapter. The most prominent 46-nucleotide 2′-fluoro-modified aptamer was identified by Ruckman et al. They enriched a 2′-fluoro-modified aptamer directed toward the isoform 165 of the vascular endothelial growth factor (VEGF165) (Ruckman et al. 1998), with an affinity in the two-digit picomolar range. After several post-selective chemical modifications, such as 2′-OMe substitution of purines, addition of a 5′-40 kDa polyethylene glycol entity, and an inverted thymidine residue at the 3′-terminus, this aptamer was dubbed Macugen®. In 2004, it became the first aptamer to be approved by the FDA, as an antiangiogenic drug for the treatment of the wet form of age-related macular degeneration (AMD). Macugen® is injected intravitreally into the eye of AMD patients. With these modifications, its half-life is 18 h (see Table 1).

A 2′-OMe modification refers to the methylation of the 2′-OH group yielding a methoxy entity (Fig. 5). 2′-OMe modifications are compatible with SELEX approaches (Burmeister et al. 2005; Liu et al. 2017) but are mainly implemented


D.-M. Otte et al.

Fig. 5 Examples of chemical structures of modified nucleotides utilized to improve nuclease stability. (a) 2′-modifications of the ribose. (b) Nonribose backbones (XNAs). All modifications are indicated in blue

after the selection process (Green et al. 1995; Ruckman et al. 1998). The aim of 2′-OMe modifications is to protect aptamers from a nucleolytic environment. For instance, Burmeister et al. identified an aptamer binding to the vascular endothelial growth factor (VEGF) using a 2′-OMe pyrimidine nucleotide-modified RNA library for the SELEX. The affinity of the selected aptamer (dubbed ARC245) is in the low nanomolar range, and it inhibits binding of VEGF to its receptor expressed on HEK293 cells. ARC245 was stable in plasma for 96 h (Burmeister et al. 2005). Due to these features, aptamers currently in clinical trials are often modified with a 2′-OMe group (see Table 1). For instance, the aptamer dubbed Zimura® (ARC1905), which is currently in phase 3 of the GATHER1 clinical trial (ClinicalTrials.gov Identifier: NCT02686658), was raised against complement component 5 (C5), a component of the complement system, which in turn is part of the innate immune system (Manthey et al. 2009). The inhibition of C5 activation prevents the inflammation-mediated tissue injury associated with AMD (Baas et al. 2010). Zimura® is a 38-nt RNA aptamer; in its final version it displays a KD value of 2–5 nM and, besides 2′-methoxypurines, it contains 2′-fluoropyrimidines, a 40 kDa PEG, and a 3′-idT cap (see Table 1), where idT refers to inverted deoxythymidine. A 3′-inverted dT residue is a common modification of an oligonucleotide at the 3′-end, which increases its resistance to serum 3′-exonucleases (Ni et al. 2021) by 19- to 50-fold (summarized in [Kratschmer and Levy 2017]). In all these examples, the oligonucleotides tested were predicted to be unpaired at the 3′ end. This suggests that the 3′-inverted dT mainly affects the stability of oligonucleotides with unpaired ends. Paired 3′-inverted dT seems to


confer a relatively modest enhancement in stability (two- to threefold) (Kratschmer and Levy 2017). Interestingly, approximately half of the aptamers currently studied in clinical trials, including Zimura®, contain 3′-inverted dT (see Table 1).

There has been a growing interest in xenobiotic nucleic acids (XNA) for use in aptamer research, as these polymers are not recognized and degraded by nucleases; thus, XNAs have potential for in vivo applications (Herdewijn and Marliere 2009). Several in vitro SELEX approaches using LNA-, FANA-, TNA-, and HNA-modified aptamers have successfully been conducted. For example, LNAs have attracted increasing attention in the field of nucleic acid drug discovery (LNA, Fig. 5b) (Kuwahara and Obika 2013). LNA modifications influence binding interactions, structure, and stability of oligonucleotides in which they are incorporated. Additionally, the incorporation of LNA nucleotides into DNA or RNA stem structures increases thermal stability and decreases susceptibility to nucleolytic degradation and, thus, enables a longer half-life in body fluids (Campbell and Wengel 2011). In this regard, Wahlestedt et al. demonstrated that three consecutive LNAs at the 3′ and the 5′ terminus of a chimeric LNA/DNA oligonucleotide lead to an increase of its half-life from 1.5 h (without LNAs) to 15 h (with LNAs) in human serum. Interestingly, further addition of LNA did not significantly change its half-life (Wahlestedt et al. 2000). Most reported LNA incorporations into aptamers are made by post-SELEX modification, where LNA nucleotides are usually placed at the terminal ends of nucleic acids or in putative stem regions. In relation to this, a Tenascin-C binding aptamer was modified with LNA nucleotides, through which the aptamer gained serum stability and maintained binding to Tenascin-C (Schmidt et al. 2004). As mentioned above, the incorporation of LNAs at distinct positions leads to a significant increase of the melting temperature of the aptamer, from 37 to 74 °C. However, positioning of the LNA nucleotide in an aptamer must be done carefully as it can negatively affect target binding properties (Schmidt et al. 2004; Forster et al. 2012). This was demonstrated using an aptamer that binds to ricin and which was modified with LNA nucleotides in different positions. LNA substitutions were tolerated very well when introduced into the stem regions, whereas the exchange of nucleotides in loop or bulged regions resulted in a substantial decrease of affinity (Forster et al. 2012). LNA-modified nucleic acids can cause severe hepatotoxicity in mice by a mechanism that is not yet understood (Swayze et al. 2007). To date, this modification is not used in aptamers studied in clinical trials (see Table 1) (Zhou and Rossi 2017).

Alves Ferreira-Bravo et al. described the first FANA aptamer generated through a SELEX process binding to a protein (Alves Ferreira-Bravo et al. 2015). They selected a hybrid DNA-FANA aptamer targeting the human immunodeficiency virus-1 reverse transcriptase (HIV-1 RT). The initial random pool used for the SELEX protocol included a 20-nucleotide fixed DNA sequence at the 5′ end followed by a 40-nucleotide region of random FANA nucleotides, and then 19 nucleotides of FANA fixed sequence. The identified aptamer, termed FA1, binds the HIV-1 reverse transcriptase with high specificity and an affinity in the low nanomolar range (Alves Ferreira-Bravo et al. 2015). A head-to-head comparison with previously published DNA and RNA aptamers binding to HIV-RT revealed a


similar binding affinity, but higher serum stability of the modified aptamer FA1 (Alves Ferreira-Bravo et al. 2015).

Given the exciting capabilities provided by XNAs in the context of protein recognition described above, Rangel et al. selected TNA-containing aptamers binding to ochratoxin A (OTA), a mycotoxin (Rangel et al. 2018). Small molecules are more challenging targets for aptamer selection than proteins because they display significantly less surface area and fewer functional groups (see [Rangel et al. 2018] and references within). Nevertheless, their selected aptamers revealed affinities in the two-digit nanomolar range. Importantly, the aptamer A04T.2 with the shortest sequence (31 nucleotides) was stored in 50% human blood serum for seven days without degradation and without loss of affinity to OTA. In turn, a DNA aptamer that also binds to OTA was completely degraded under the same conditions (Rangel et al. 2018). Furthermore, in addition to the remarkable biostability mentioned above, the TNA aptamer has high specificity and is capable of binding OTA in a large background of competing biomolecules (Rangel et al. 2018). Thus, XNA-containing aptamers display an impressive nuclease resistance potential.

Along those lines, the ribose was also substituted by a 1,5-anhydrohexitol, dubbed HNA (hexitol nucleic acids) (Eremeeva et al. 2019; Pinheiro et al. 2012). In stability terms, Eremeeva et al. demonstrated HNAs’ superiority over 2′-ribose modifications. In their approach, they selected three structurally unique 2′-O-methyl-ribose–1,5-anhydrohexitol nucleic acid (MeORNA–HNA) aptamers toward the rat VEGF164 (rVEGF164) target protein. These ligands bound to rVEGF164 in the low nanomolar range, whereas the 2′-F/2′-OMe-RNA Macugen® mentioned above, which binds human VEGF165, has an affinity in the two-digit picomolar range. However, a nuclease resistance assay (95% human serum, seven days, 37 °C) revealed that one of their MeORNA–HNA aptamers showed just minor degradation (83% of the aptamer remained intact), while 2′-F/2′-OMe-RNA Macugen® was almost completely degraded (7% of full-length aptamer was detected). These results are highly consistent with the well-known biological stability of HNA relative to DNA or RNA, which highlights the importance of HNA biomolecules as convenient skeletons for the development of biologically stable nucleic acid drugs (Pinheiro et al. 2012).
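To put the two serum-stability figures quoted above (83% and 7% of full-length aptamer after seven days) on a common scale, one can convert each into an apparent half-life. The sketch below does this in Python under the assumption of simple first-order (single-exponential) degradation; that kinetic model, the function name `apparent_half_life`, and the use of a single time point are assumptions of this example and are not taken from the cited study.

```python
import math


def apparent_half_life(fraction_intact: float, elapsed: float) -> float:
    """Half-life implied by one time point, assuming first-order decay.

    fraction_intact: fraction of full-length aptamer remaining (0 < f < 1).
    elapsed: incubation time; the result is in the same unit.
    """
    if not 0.0 < fraction_intact < 1.0:
        raise ValueError("fraction_intact must lie strictly between 0 and 1")
    # N(t)/N0 = exp(-k t)  ->  k = -ln(f)/t  ->  t_half = ln(2)/k
    rate_constant = -math.log(fraction_intact) / elapsed
    return math.log(2) / rate_constant


INCUBATION_DAYS = 7.0
hna_half_life = apparent_half_life(0.83, INCUBATION_DAYS)      # MeORNA-HNA aptamer
macugen_half_life = apparent_half_life(0.07, INCUBATION_DAYS)  # 2'-F/2'-OMe-RNA Macugen

print(f"MeORNA-HNA aptamer: ~{hna_half_life:.0f} days")
print(f"Macugen:            ~{macugen_half_life:.1f} days")
```

Under this assumption the HNA-containing aptamer would have an apparent half-life of roughly 26 days in serum against a little under 2 days for the reference aptamer. A single time point cannot confirm first-order behaviour, so these numbers are only an order-of-magnitude comparison.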

Substitution of Phosphodiester Linkage

Stabilization of the nucleotide backbones against nucleases within an aptamer can also be achieved by substituting the phosphate linkages with sulfur-containing phosphorothioate linkages (PTL) (Ni et al. 2021). Indeed, PTL is currently the most used backbone modification in the field of antisense technologies, as it does not substantially change, e.g., the overall RNA structure (Ly et al. 2020). PTL modifications impact tissue distribution and cellular internalization of antisense oligonucleotides, a class of therapeutic RNAs (Eckstein 2014). A recent example of a PTL-modified aptamer was published by Wang et al. They identified a DNA aptamer, dubbed ThioAp52, recognizing the tumor-specific MAGE-A3 111–125 peptide antigen (Wang et al. 2016). Full-length MAGE-A3 is a tumor-specific protein that has been found on many tumor cells, including melanoma, non-small


cell lung cancer, and hematologic malignancies (Wang et al. 2016). The PTL-modified aptamer version bound to cells obtained from seven cancer lineages, i.e., melanoma, breast, colorectal, liver, lung, pancreas, and mouth cancer (Wang et al. 2016). Due to the PTL modification, the stability of the aptamer after 24 h in serum-containing medium was increased twofold compared to the unmodified aptamer (Wang et al. 2016). When employed in vivo, ThioAp52 accumulated in pancreatic tumors of xenograft mice after i.v. injection (Wang et al. 2016). The aptamer was detectable in the tumor for more than five days, indicating the potential clinical utility of the aptamer.

The aptamer ARC1779 blocks binding of the von Willebrand Factor (VWF) A1 domain to the platelet GPIb receptor (Spiel et al. 2009). VWF plays a crucial role in acute myocardial infarction (AMI), as reflected by increased shear-dependent platelet function (Spiel et al. 2009). This highly modified aptamer is composed of 39 nucleotides and, besides 26 2′-OMe-modified nucleotides, it bears a single PTL group. Additionally, ARC1779 has a 3′-inverted dT residue and a 20 kDa PEG moiety at the 5′-terminus (see Table 1). These modifications enabled a mean half-life of approximately two hours, and the mean clearance ranged from ~10% to ~21% of the glomerular filtration rate in healthy volunteers (Gilbert et al. 2007).

Arangundy-Franklin et al. followed a different strategy by establishing a SELEX procedure that allows for the incorporation of alkyl phosphonate nucleotides into nucleic acid libraries (phNA) (Arangundy-Franklin et al. 2019). PhNA combine an uncharged alkyl phosphonodiester backbone chemistry with close steric and electronic analogy to canonical nucleic acids. Furthermore, phNA polymers in the context of antisense technologies have been shown to be nuclease resistant and nontoxic (see [Arangundy-Franklin et al. 2019] and references within). Arangundy-Franklin et al. employed streptavidin as a target for a successful de novo selection of aptamers directly from a random-sequence phNA repertoire. This new approach opens the path for the selection of nuclease-resistant phNA aptamers in nonaqueous solvents (Arangundy-Franklin et al. 2019).

Spiegelmers

An effective approach for the generation of nuclease-resistant aptamers is to employ L-ribose-containing nucleotides as building blocks (Fig. 6). L-ribose is the enantiomer of D-ribose, of which naturally occurring nucleic acids are built. L-ribose nucleic acids are therefore the mirror images of natural nucleic acids, and aptamers containing an L-ribose backbone are termed spiegelmers. These nucleic acids form left-handed helices instead of the typical right-handed helices. Spiegelmers reveal extensive serum stability as they are not recognized by endogenous nucleases (Ni et al. 2021). Until 2021, spiegelmers could not be amplified by using natural polymerases, and thus they were routinely selected as natural nucleic acids binding to the enantiomer of the putative target molecule (i.e., a D-peptide). The D-aptamers obtained, which bind to a D-peptide, were then converted into L-aptamers (spiegelmers), which in turn bind to the naturally occurring enantiomer, e.g., an L-peptide or protein target (Ni et al. 2021). This approach had a major drawback, as the chemical synthesis of large proteins is not feasible based on current


Fig. 6 Chemical structure of D-DNA and L-DNA (Spiegelmer). Spiegelmers consist of nonnatural L-ribose nucleotides. The dashed line represents the mirror plane

technologies. For that reason, only smaller peptides were chosen as targets. Recently, Chen et al. invented a special SELEX procedure using a mirror-image DNA polymerase for directed evolution and selection of nuclease-resistant L-aptamers (Chen et al. 2022). By circumventing the synthesis of D-protein sequences, this approach opens the path for the direct selection of biostable sensors, therapeutics, and basic research tools without any target length limitations. Currently, three spiegelmers are being investigated in clinical trials: NOX-H94, which binds to hepcidin; NOX-E36, a CCL2-binding spiegelmer; and NOX-A12, which binds to CXCL12. For clinical use, all spiegelmers were PEGylated and display slow in vivo clearance. In healthy volunteers the spiegelmers demonstrated good safety profiles. Both NOX-A12 and NOX-H94 have also yielded promising results in phase II studies, summarized in (Ni et al. 2021).

Reduction of Renal Clearance of Aptamers

PEGylation refers to the addition of polyethylene glycol residues (PEG tail) to a variety of substrates, including DNA, proteins, and peptides. In developing therapies based on proteins, peptides, or nucleic acids, PEG is the most used polymer to extend half-life as well as to reduce renal clearance, since it has a unique combination of properties, such as being nontoxic, inert, highly water soluble, inexpensive, and FDA approved. PEG has been intensively used for over four decades in research and for over two decades in clinical trials (Hoang Thi et al. 2020). PEG tails are also commonly conjugated to aptamers which are tested in preclinical in vivo studies. For example, the PEGylated (40 kDa PEG) aptamer ARC19499 inhibits the tissue factor pathway inhibitor (TFPI). TFPI is the main regulator of the FVIIa/tissue factor pathway of coagulation (Gorczyca et al. 2012). In one study, ARC19499 was administered i.v. to cynomolgus monkeys with hemophilia induced by FVIII antibody administration. In most of the animals, two doses of aptamer restored the bleeding time to baseline (100%). Thirty percent of the monkeys received a third dose of ARC19499 to recover the bleeding time to 85%. Furthermore, aptamer stability monitoring for 2 h after treatment revealed no decline in drug concentration in the monkeys, indicating that the half-life must be


considerably longer (Waters et al. 2011). Thus, ARC19499 inhibition of TFPI may be an alternative option to current treatments of bleeding associated with hemophilia. Consequently, ARC19499 is currently studied in a clinical trial (see Table 1).

A PEG tail is also commonly conjugated to aptamers studied in clinical trials (see Table 1). For instance, pegnivacogin, an RNA aptamer that binds to and inhibits the activated coagulation factor IXa, contains, in addition to a 2′-fluoro modification, a 5′ 40-kDa branched PEG tail. An i.v. infusion of pegnivacogin almost completely inhibited factor IX activity in patients with acute coronary syndrome, with a calculated half-life of approximately six days. However, further analysis of pegnivacogin revealed a major drawback of the PEGylation in this study: more than half of the treated patients were found to produce high levels of anti-PEG antibodies, which correlated with the allergic reactions evoked (Ganson et al. 2016). This immunogenicity was also substantiated by reports of patients who developed PEG-binding antibodies after receiving PEGylated enzymes or other PEGylated aptamers (Qi et al. 2016). Consequently, the clinical trial was discontinued (Ganson et al. 2016). In a follow-up study, pegnivacogin was conjugated to the less immunogenic moiety poly[oligo(ethylene glycol) methyl ether methacrylate] (POEGMA) (Qi et al. 2016) in order to circumvent allergic immune responses. The resulting aptamer was dubbed POMEGA-RB005 and displayed, in a study in mice, antithrombotic efficacy similar to that observed with pegnivacogin, but no detectable evidence of an immune response was found (Ozer et al. 2022).

Another option to increase circulation time and to reduce renal excretion of aptamers is the conjugation to a cholesterol moiety (summarized in [Ni et al. 2021]). Cholesterol–aptamer conjugates display an increased half-life, most likely due to association with plasma lipoproteins (Lee et al. 2015). These conjugates neither induce an immune response nor evoke signs of toxicity in cells in vitro and in vivo (Ni et al. 2021). For example, Lee et al. selected a 29-nucleotide, 2′-fluoro-modified RNA aptamer directed toward the NS5B protein of hepatitis C virus (HCV). This aptamer inhibited the replication of HCV in human liver cells (Lee et al. 2013). Subsequently, this aptamer was further modified by addition of 3′-inverted dT and by conjugation to a 5′-cholesterol moiety. In a head-to-head stability comparison in mice, the i.v. administered cholesterol-modified aptamer was compared to the i.v. administered nonmodified aptamer. After collection of venous blood samples, measurement of the aptamer concentration demonstrated a half-life of 10–14 h for the cholesterol-modified aptamer compared to 6 h for the non-cholesterol-conjugated aptamer (Lee et al. 2015). Currently, there are no cholesterol-modified aptamers in clinical trials (Table 1), but they might soon come into use.

Nucleobase Modifications

Aptamers are composed of the (deoxy)ribose-phosphate backbone and the four aromatic nucleobases A, G, C, and T (DNA) or U (RNA). Compared to antibodies, which are built from 20 amino acids with aromatic, aliphatic, basic, acidic, and polar side chains, aptamers thus have a limited chemical diversity. Because of this


limitation, the success rate of SELEX experiments is restricted: on average, 3 out of 10 proteins subjected to enrichment processes yield aptamers. Therefore, strategies by which the chemical diversity of nucleic acids is broadened are believed to improve the success rates of SELEX experiments by enabling interactions with target molecules that a nucleic acid built from canonical nucleotides cannot make. In this way, Gold and coworkers established so-called slow off-rate modified aptamers (SOMAmers) (Fig. 7a) (Gold et al. 2010). These aptamers bear amino acid–like side chains, e.g., 5-tryptaminocarbonyl-dU (resembling tryptophan) or 5-benzylaminocarbonyl-dU (resembling phenylalanine), covalently attached to the C5-position of uridine-triphosphate via an amide bond (Vaught et al. 2010). These modified triphosphates are compatible with enzymatic steps of the SELEX procedure and, thus, can be used as building blocks in nucleic acid libraries to enrich specific SOMAmers binding to target proteins. Indeed, this process has proven more effective than using standard nucleic acid libraries, as target molecules that previously escaped the selection process were successfully employed (Gold et al. 2010). SOMAmers reveal affinities to their target molecules in the low nanomolar range and low off rates, hence the name (Davies et al. 2012). Structural analysis of SOMAmer-target complexes revealed not only that the modifications are in direct contact with the cognate protein, but also novel nucleic acid structural motifs, e.g., benzyl zippers (Davies et al. 2012). These examples show that not only the chemical but also the structural diversity of SOMAmers increases when compared to nucleic acids with canonical nucleobases.

Fig. 7 SOMAmers and clickmers. (a) A series of nucleoside triphosphate analogs modified at the 5-position of uridine triphosphate (dUTP): 5-benzylaminocarbonyl-dU (BndU); 5-naphthylmethylaminocarbonyl-dU (NapdU); and 5-tryptaminocarbonyl-dU (TrpdU). (b) Functionalization of EdU in DNA libraries by the Cu(I)-catalyzed azide–alkyne cycloaddition (CuAAC) in order to modify it with an azide


SOMAmers were identified for thousands of different proteins found in plasma and serum samples (Candia et al. 2017). These SOMAmers constitute the specific binders and reporters of the so-called SOMAscan analysis (Gold et al. 2012; Kraemer et al. 2011), in which the proteome of clinical samples, e.g., plasma, blood, urine, and spinal fluid (Baird et al. 2012; Kukova et al. 2019), is analyzed. Comparing samples from patients with those from normal controls allows insight into proteomic alterations caused by the disease. This approach is broadly applicable and so far has been used to generate proteome analyses of cardiac surgery (Kukova et al. 2019) and Alzheimer patients (Zhao et al. 2015).

A different way to introduce chemical entities into nucleic acids is click chemistry (Fig. 7b) (Pfeiffer et al. 2017). In this approach, C5-ethynyl-deoxyuridine triphosphate is incorporated into DNA by PCR and, after production of the ssDNA, reacted with azides of choice (Fig. 7b) (Pfeiffer et al. 2018). Chemically modified aptamers generated by this process are dubbed clickmers (Pfeiffer et al. 2017). Using this reaction, a broad variety of chemical modifications can be introduced into DNA libraries, including polar, hydrophobic, and aromatic side chains (Plückthun et al. 2020). Additionally, in a recent study, cubane was used to generate cubane-modified clickmers (therefore also termed cubamers) that bind to the lactate dehydrogenase from Plasmodium vivax (PvLDH) (Cheung et al. 2020). Structural studies using X-ray analysis revealed that the cubane moieties of the cubamer interact with a hydrophobic pocket of PvLDH. Besides protein targets, the clickmer technology was also applied to identify binders that recognize small molecules, e.g., Δ9-tetrahydrocannabinol (THC) and carbohydrates (Rosenthal et al. 2019; Gordon et al. 2019). In the former, a design strategy was applied in which chemical residues were introduced into the DNA library that mimic the natural interaction sites of THC within the cannabinoid receptor (Rosenthal et al. 2019; Shim et al. 2011). In this way, a THC-binding clickmer was enriched, underlining the modularity of the process, in which virtually any molecule of choice that might facilitate target binding can be introduced into DNA.

Aptamers Bearing an Expanded Genetic Alphabet

Besides adding chemical moieties to nucleobases, the introduction of novel noncanonical base pairs also provides access to modified aptamers with different properties. This expansion of the genetic alphabet and its application to SELEX experiments has been realized by Benner and coworkers, who introduced an artificial base pair based on hydrogen bonding, termed dZ:dP, into DNA. They termed this process artificial extended genetic information system (AEGIS) (Fig. 8a). AEGIS was used to enrich aptamers with an extended alphabet binding to cells, e.g., MDA-231 or HepG2 cells, or to the protein glypican. Enriched aptamer sequences without dZ and dP showed reduced binding to the targets on the cells, indicating the importance of the artificial base pair not only for the enrichment process but also for the target recognition and/or the conformation of the respective aptamer (reviewed in [Biondi and Benner 2018]).
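One simple way to see what two additional letters contribute is to count the sequences a randomized region can encode. The Python sketch below compares a four-letter alphabet with a six-letter one (the four canonical nucleotides plus dZ and dP); the function name `sequence_space` and the randomized-region length of 20 positions are hypothetical choices for illustration and are not taken from the AEGIS studies.

```python
def sequence_space(alphabet_size: int, random_positions: int) -> int:
    """Number of distinct sequences for a fully randomized region."""
    if alphabet_size < 1 or random_positions < 0:
        raise ValueError("alphabet_size must be >= 1 and random_positions >= 0")
    return alphabet_size ** random_positions


RANDOM_POSITIONS = 20  # hypothetical length of the randomized region

canonical = sequence_space(4, RANDOM_POSITIONS)  # A, C, G, T
expanded = sequence_space(6, RANDOM_POSITIONS)   # A, C, G, T, dZ, dP

print(f"4-letter library: {canonical:.3e} possible sequences")
print(f"6-letter library: {expanded:.3e} possible sequences")
print(f"fold increase:    {expanded / canonical:.0f}")
```

The fold increase grows as 1.5 raised to the number of randomized positions, so the gain in theoretical diversity depends strongly on how many positions are allowed to carry the additional letters.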


Fig. 8 The chemical structure of artificial base pairs used in SELEX. (a) The dZ:dP base pair is built on hydrogen bonds. dP: 2-amino-8-(1′-β-D-2′-deoxyribofuranosyl)-imidazo[1,2-a]-1,3,5-triazin-4(8H)-one; dZ: 6-amino-5-nitro-3-(1′-β-D-2′-deoxyribofuranosyl)-2(1H)-pyridone. (b) Synthetic base pair dDs:dPx relying on hydrophobic interactions. dDs: 7-(2-thienyl)imidazo[4,5-b]pyridine; dPx: 2-nitro-4-propynylpyrrole (Px)

Expanded genetic alphabets with hydrophobic base pairs have been introduced by Hirao and coworkers (Kimoto et al. 2013). This base pair is built from the nucleotides bearing 7-(2-thienyl) imidazo[4,5-b] pyridine (Ds) and 2-nitro-4-propynylpyrrole (Px) (Fig. 8b) (Hirao and Kimoto 2012). Using DNA libraries extended by this base pair, aptamers binding to interferon-γ (IFN-γ) and vascular endothelial cell growth factor-165 (VEGF-165) were identified. These aptamers were found to have higher affinities to the respective target proteins when modified with Ds compared to variants without Ds or previously described aptamers that consist of canonical nucleotides only (Kimoto et al. 2013).

Aptamers as Drug Delivery Vehicles

An inhibitory aptamer can impair the function of the target protein; thus, such aptamers can be used directly as therapeutic antagonists. For instance, as a putative treatment against HIV, Zhou et al. screened 2′-F-pyrimidine-substituted RNA aptamers binding to gp120 and showed that they neutralized infectivity in cultured CEM T cells and peripheral blood mononuclear cells (PBMCs) (Zhou et al. 2009). Additionally, aptamers can be selected to target and treat a variety of human diseases, but they can also be used for targeted cytotoxic drug delivery, e.g., in cancer treatment. These aptamer–drug conjugates (ApDCs) could be a desirable future treatment option, because one of the limitations of conventional chemotherapy drugs is the lack of selectivity, which usually leads to serious side effects. For instance, daunorubicin and doxorubicin (Dox) are representative anthracycline antibiotics used in the clinic to treat various types of cancer (Ni et al. 2021). In vitro these anthracycline drugs cause

23

Nucleic Acid Aptamers: From Basic Research to Clinical Applications

767

DNA damage, e.g., fragmentation and single-strand breaks which ultimately kill cells (Aubel-Sadron and Londos-Gagliardi 1984). However, severe side effects are common due to nonselective intercalation within the double-stranded nucleic acid of all cells in vivo (Ni et al. 2021). Intensive progress has been made today in the design of modified doxorubicin with specific properties. Various aptamer–drug conjugates (ApDCs) targeting biomarkers such as prostate-specific membrane antigen (PSMA) aptamer (A10), antiepithelial cell adhesion molecule (EpCAM) aptamer (EpDT3), anti-protein tyrosine kinase (PTK7) RNA (Sgc8), and anti-mucin (MUC) aptamers (MUC1), summarized in (Zhu and Chen 2018), have been successfully applied to target and kill cancer cells in vitro. However, such an approach is limited. A proper match between the linker chemistry and the biomolecular property of the aptamer is necessary to improve the safety and efficacy of the ApDC system. Most aptamer–drug conjugate systems are reported to use linkers of around 36 carbon atoms to maintain optimal spacing and flexibility between aptamer and the drug moieties, thereby avoiding structural conflicts and unanticipated therapeutic failures (Huang et al. 2009; Catuogno et al. 2015). There have been successful attempts to use DNA or RNA aptamers as tumor targeting agents in vivo principally in mice making use of xenotransplantated animal models (Zhu et al. 2015; Baek et al. 2014). For instance, Beak et al. designed and synthesized drug-encapsulating liposomes (so-called aptamosoms), conjugated with an RNA aptamer specific to the prostate-specific membrane antigen (PSMA). Those aptamosoms are composed of several phospholipids, including 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC), 1,2-dioleoyl-snglycero-3-phosphoethanolamine (DOPE), 1,2-distearoyl-sn-glycero-3-phosphoethanolamine-N-[methoxy(polyethylene glycol)-2000] (DSPE-PEG2000), and cholesterol. 
After analysis of specific binding and subsequent uptake of the aptamosomes into LNCaP prostate epithelial cells that express PSMA, the anticancer drug Dox was encapsulated into the aptamosomes and investigated for drug targeting efficacy. Intravenous injections of Dox-encapsulating PSMA-aptamosomes into LNCaP xenotransplantated mice revealed improved antitumor efficacy over treatment with the pure drug (Baek et al. 2014). This in vivo study along with others justifies the usage of aptamer–drug conjugates as a future cancer treatment with fewer side effects in clinical practice.

Conclusion

Overall, aptamers work exceptionally well regarding detection, costs, and sensitivity under defined conditions. To date, aptamers are close to meeting the requirements of clinical applications. Their value and potential in disease treatment have been demonstrated in many preclinical trials in mice and other laboratory animals. Some limitations of aptamers, such as nuclease degradation, safety issues (e.g., PEG-related side effects), and fast renal excretion, need to be addressed in future research. The fact that aptamers are a relatively recent invention could explain why they have rarely been used as therapeutic agents so far. Monoclonal antibodies, for instance, were developed in 1975, but the first antibody-based drug was approved by the FDA in 1986 (muromonab-CD3). The second drug of that type reached the market in 1994 (Abciximab), and today 111 antibody-based drugs are used in the clinic (as of 2022, The Antibody Society. Therapeutic monoclonal antibodies approved or in regulatory review, www.antibodysociety.org/antibody-therapeutics-product-data). In terms of clinical relevance, aptamers could follow the same path as antibodies: aptamers were first described in 1990, and Macugen® was approved by the FDA in 2004, with more aptamers now in late clinical phases.

With further advances in SELEX technology, aptamer modification methods, and routes of application, it is quite likely that aptamers will, like antibodies, play a crucial role in molecular biology and biomedical fields. The detailed mechanisms by which aptamers bind to their targets must also be elucidated. In this regard, the aptamer field would profit from effective computational methods for predicting aptamers and aptamer–target interactions. Although several attempts have been made in recent years (Lee et al. 2021), such prediction routines still lack accuracy. The future looks promising for aptamers. With more investment in aptamer-specific (pre)clinical trials and improved engineering, e.g., XNA-tolerating high-fidelity polymerases, aptamers may become a powerful molecular tool in bioscience, addressing tenacious problems in medicine and other scientific fields.

References

Abula A et al (2021) Molecular mechanism of RNase R substrate sensitivity for RNA ribose methylation. Nucleic Acids Res 49(8):4738–4749
Alves Ferreira-Bravo I et al (2015) Selection of 2′-deoxy-2′-fluoroarabinonucleotide (FANA) aptamers that bind HIV-1 reverse transcriptase with picomolar affinity. Nucleic Acids Res 43(20):9587–9599
Arangundy-Franklin S et al (2019) A synthetic genetic polymer with an uncharged backbone chemistry based on alkyl phosphonate nucleic acids. Nat Chem 11(6):533–542
Aubel-Sadron G, Londos-Gagliardi D (1984) Daunorubicin and doxorubicin, anthracycline antibiotics, a physicochemical and biological review. Biochimie 66(5):333–352
Baas DC et al (2010) The complement component 5 gene and age-related macular degeneration. Ophthalmology 117(3):500–511
Baek SE et al (2014) RNA aptamer-conjugated liposome as an efficient anticancer drug delivery vehicle targeting cancer cells in vivo. J Control Release 196:234–242
Baird GS et al (2012) Age-dependent changes in the cerebrospinal fluid proteome by slow off-rate modified aptamer array. 180(2):446–456
Biondi E, Benner SA (2018) Artificially expanded genetic information systems for new aptamer technologies. Biomedicines 6(2)
Burmeister PE et al (2005) Direct in vitro selection of a 2′-O-methyl aptamer to VEGF. Chem Biol 12(1):25–33
Campbell MA, Wengel J (2011) Locked vs. unlocked nucleic acids (LNA vs. UNA): contrasting structures work towards common therapeutic goals. Chem Soc Rev 40(12):5680–5689
Candia J et al (2017) Assessment of variability in the SOMAscan assay. 7(1):1–13
Catuogno S et al (2015) Selective delivery of therapeutic single strand antimiRs by aptamer-based conjugates. J Control Release 210:147–159
Chaput JC (2021) Redesigning the genetic polymers of life. Acc Chem Res 54(4):1056–1065


Chaput JC, Herdewijn P (2019) What Is XNA? Angew Chem Int Ed Engl 58(34):11570–11572
Chen J et al (2022) Directed evolution and selection of biostable L-DNA aptamers with a mirror-image DNA polymerase. Nat Biotechnol
Cheung YW et al (2020) Evolution of abiotic cubane chemistries in a nucleic acid aptamer allows selective recognition of a malaria biomarker. Proc Natl Acad Sci U S A 117(29):16790–16798
Civit L et al (2019) Targeting hormone refractory prostate cancer by in vivo selected DNA libraries in an orthotopic xenograft mouse model. Sci Rep 9(1):4976
Darmostuk M et al (2015) Current approaches in SELEX: an update to aptamer selection technology. Biotechnol Adv 33(6 Pt 2):1141–1161
Davies DR et al (2012) Unique motifs and hydrophobic interactions shape the binding of modified DNA ligands to protein targets. 109(49):19971–19976
Eckstein F (2014) Phosphorothioates, essential components of therapeutic oligonucleotides. Nucleic Acid Ther 24(6):374–387
Eremeeva E et al (2019) Highly stable hexitol based XNA aptamers targeting the vascular endothelial growth factor. Nucleic Acids Res 47(10):4927–4939
Fernandez G et al (2018) TLR4-binding DNA aptamers show a protective effect against acute stroke in animal models. Mol Ther 26(8):2047–2059
Forster C et al (2012) Properties of an LNA-modified ricin RNA aptamer. Biochem Biophys Res Commun 419(1):60–65
Ganson NJ et al (2016) Pre-existing anti–polyethylene glycol antibody linked to first-exposure allergic reactions to pegnivacogin, a PEGylated RNA aptamer. 137(5):1610–1613.e1617
Gilbert JC et al (2007) First-in-human evaluation of anti von Willebrand factor therapeutic aptamer ARC1779 in healthy volunteers. Circulation 116(23):2678–2686
Gold L et al (2010) Aptamer-based multiplexed proteomic technology for biomarker discovery. 1–1
Gold L et al (2012) Advances in human proteomics at high scale with the SOMAscan proteomics platform. 29(5):543–549
Gorczyca ME et al (2012) Inhibition of tissue factor pathway inhibitor by the aptamer BAX499 improves clotting of hemophilic blood and plasma. J Thromb Haemost 10(8):1581–1590
Gordon CKL et al (2019) Click-particle display for base-modified aptamer discovery. ACS Chem Biol 14(12):2652–2662
Green LS et al (1995) Nuclease-resistant nucleic acid ligands to vascular permeability factor/vascular endothelial growth factor. Chem Biol 2(10):683–695
Herdewijn P, Marliere P (2009) Toward safe genetically modified organisms through the chemical diversification of nucleic acids. Chem Biodivers 6(6):791–808
Hirao I, Kimoto M (2012) Unnatural base pair systems toward the expansion of the genetic alphabet in the central dogma. Proc Jpn Acad Ser B Phys Biol 88(7):345–367
Hoang Thi TT et al (2020) The importance of poly(ethylene glycol) alternatives for overcoming PEG immunogenicity in drug delivery and bioconjugation. 12(2):298
Hong SL et al (2019) Ebola virus aptamers: from highly efficient selection to application on magnetism-controlled chips. Anal Chem 91(5):3367–3373
Hoshino H et al (2020) DNA polymerase variants with high processivity and accuracy for encoding and decoding locked nucleic acid sequences. J Am Chem Soc 142(51):21530–21537
Huang YF et al (2009) Molecular assembly of an aptamer-drug conjugate for targeted drug delivery to tumor cells. Chembiochem 10(5):862–868
Kimoto M et al (2013) Generation of high-affinity DNA aptamers using an expanded genetic alphabet. 31(5):453–457
Komarova N et al (2018) Selection, characterization, and application of ssDNA aptamer against furaneol. Molecules 23(12)
Kraemer S et al (2011) From SOMAmer-based biomarker discovery to diagnostic and clinical applications: a SOMAmer-based, streamlined multiplex proteomic assay. 6(10):e26332
Kratschmer C, Levy M (2017) Effect of chemical modifications on aptamer stability in serum. Nucleic Acid Ther 27(6):335–344


Kukova LZ et al (2019) Comparison of urine and plasma biomarker concentrations measured by aptamer-based versus immunoassay methods in cardiac surgery patients. J Appl Lab Med 4(3):331–342
Kuwahara M, Obika S (2013) In vitro selection of BNA (LNA) aptamers. Artif DNA PNA XNA 4(2):39–48
Lee CH et al (2013) Inhibition of hepatitis C virus (HCV) replication by specific RNA aptamers against HCV NS5B RNA replicase. J Virol 87(12):7064–7074
Lee CH et al (2015) Pharmacokinetics of a cholesterol-conjugated aptamer against the hepatitis C virus (HCV) NS5B protein. Mol Ther Nucleic Acids 4:e254
Lee G et al (2021) Predicting aptamer sequences that interact with target proteins using an aptamer-protein interaction classifier and a Monte Carlo tree search approach. PLoS One 16(6):e0253760
Li Z et al (2021) Advances in screening and development of therapeutic aptamers against cancer cells. Front Cell Dev Biol 9:662791
Lin Y et al (1994) Modified RNA sequence pools for in vitro selection. Nucleic Acids Res 22(24):5229–5234
Lin PH et al (2011) Studies of the binding mechanism between aptamers and thrombin by circular dichroism, surface plasmon resonance and isothermal titration calorimetry. Colloids Surf B Biointerfaces 88(2):552–558
Liu Z et al (2017) Evolved polymerases facilitate selection of fully 2′-OMe-modified aptamers. Chem Sci 8(12):8179–8182
Longmire M et al (2008) Clearance properties of nano-sized particles and molecules as imaging agents: considerations and caveats. Nanomedicine (Lond) 3(5):703–717
Ly S et al (2020) Single-stranded phosphorothioated regions enhance cellular uptake of cholesterol-conjugated siRNA but not silencing efficacy. Mol Ther Nucleic Acids 21:991–1005
Manthey HD et al (2009) Complement component 5a (C5a). Int J Biochem Cell Biol 41(11):2114–2117
Mi J et al (2010) In vivo selection of tumor-targeting RNA motifs. Nat Chem Biol 6(1):22–24
Ni S et al (2021) Recent progress in aptamer discoveries and modifications for therapeutic applications. ACS Appl Mater Interfaces 13(8):9500–9519
Ohuchi S (2012) Cell-SELEX technology. Biores Open Access 1(6):265–272
Ozer I et al (2022) PEG-like brush polymer conjugate of RNA aptamer that shows reversible anticoagulant activity and minimal immune response. 34(10):2107852
Pfeiffer F et al (2017) Customised nucleic acid libraries for enhanced aptamer selection and performance. Curr Opin Biotechnol 48:111–118
Pfeiffer F et al (2018) Identification and characterization of nucleobase-modified aptamers by click-SELEX. 13(5):1153–1180
Pinheiro VB et al (2012) Synthetic genetic polymers capable of heredity and evolution. Science 336(6079):341–344
Plückthun O et al (2020) Dynamic changes in DNA populations revealed by split–combine selection. 11(35):9577–9583
Qi Y et al (2016) A brush-polymer/exendin-4 conjugate reduces blood glucose levels for up to five days and eliminates poly(ethylene glycol) antigenicity. 1(1):1–12
Rangel AE et al (2018) In vitro selection of an XNA aptamer capable of small-molecule recognition. Nucleic Acids Res 46(16):8057–8068
Rosenthal M et al (2019) A receptor-guided design strategy for ligand identification. 58(31):10752–10755
Ruckman J et al (1998) 2'-Fluoropyrimidine RNA-based aptamers to the 165-amino acid form of vascular endothelial growth factor (VEGF165). Inhibition of receptor binding and VEGF-induced vascular permeability through interactions requiring the exon 7-encoded domain. J Biol Chem 273(32):20556–20567
Schmidt KS et al (2004) Application of locked nucleic acids to improve aptamer in vivo stability and targeting function. Nucleic Acids Res 32(19):5757–5765


Shim JY et al (2011) Identification of essential cannabinoid-binding domains: structural insights into early dynamic events in receptor activation. J Biol Chem 286(38):33422–33435
Sola M et al (2020) Aptamers against live targets: is in vivo SELEX finally coming to the edge? Mol Ther Nucleic Acids 21:192–204
Spiel AO et al (2009) The aptamer ARC1779 is a potent and specific inhibitor of von Willebrand Factor mediated ex vivo platelet function in acute myocardial infarction. Platelets 20(5):334–340
Stoltenburg R et al (2007) SELEX–a (r)evolutionary method to generate high-affinity nucleic acid ligands. Biomol Eng 24(4):381–403
Swayze EE et al (2007) Antisense oligonucleotides containing locked nucleic acid improve potency but cause significant hepatotoxicity in animals. Nucleic Acids Res 35(2):687–700
Taylor AI, Holliger P (2018) Selecting fully-modified XNA aptamers using synthetic genetics. Curr Protoc Chem Biol 10(2):e44
Vaught JD et al (2010) Expanding the chemistry of DNA for in vitro selection. J Am Chem Soc 132(12):4141–4151
Wahlestedt C et al (2000) Potent and nontoxic antisense oligonucleotides containing locked nucleic acids. Proc Natl Acad Sci U S A 97(10):5633–5638
Wang CY et al (2016) An aptamer targeting shared tumor-specific peptide antigen of MAGE-A3 in multiple cancers. Int J Cancer 138(4):918–926
Waters EK et al (2011) Aptamer ARC19499 mediates a procoagulant hemostatic effect by inhibiting tissue factor pathway inhibitor. Blood 117(20):5514–5522
Yang W (2011) Nucleases: diversity of structure, function and mechanism. Q Rev Biophys 44(1):1–93
Yang Z et al (2007) Nucleoside alpha-thiotriphosphates, polymerases and the exonuclease III analysis of oligonucleotides containing phosphorothioate linkages. Nucleic Acids Res 35(9):3118–3127
Zhao X et al (2015) A candidate plasma protein classifier to identify Alzheimer’s disease. J Alzheimers Dis 43(2):549–563
Zhou J, Rossi J (2017) Aptamers as targeted therapeutics: current potential and challenges. Nat Rev Drug Discov 16(6):440
Zhou J et al (2009) Selection, characterization and application of new RNA HIV gp 120 aptamers for facile delivery of Dicer substrate siRNAs into HIV infected cells. Nucleic Acids Res 37(9):3094–3109
Zhu G, Chen X (2018) Aptamer-based targeted therapy. Adv Drug Deliv Rev 134:65–78
Zhu G et al (2015) Aptamer-drug conjugates. Bioconjug Chem 26(11):2186–2197

Part IV Ligand Chemistry of Nucleic Acids

Targeting Quadruplex Nucleic Acids: The Bisquinolinium Saga

24

Daniela Verga, Anton Granzhan, and Marie-Paule Teulade-Fichou

Contents

Introduction
Bisquinolinium Pyridodicarboxamide (PDC) and PhenDC3: Prototypic G4 Ligands
  Genesis and Design of Bisquinolinium Ligands: The Preorganization Concept
  In Vitro Binding: Affinity, Selectivity, and Ligand-Induced Conformation Changes
  Binding to Alternative Quadruplexes (VK2)
  Biological Effects of Bisquinolinium Ligands
Functionalized Bisquinolinium Ligands for Detection and Manipulation of G4 Structures
  Biotinylated PDC and PhenDC3 Derivatives
  Fluorescent Derivatives for In Vitro Detection and Cellular Imaging of G-Quadruplexes
  G4 Cross-Linking and Alkylating Agents
  Immunotagged G4 Ligands
Other Bisquinolinium Derivatives as G4 Ligands
  Dimeric Derivatives
  Variations of the PDC Core
  Variations in Linker Groups and Quinolinium Residues
Conclusion
References

Abstract

G-quadruplexes (G4s) are non-canonical secondary structures that can form in single-stranded DNA and RNA sequences containing multiple guanine tracts. G4s can accommodate and be stabilized by small molecules (G4 ligands) that typically interact by π-stacking with their external quartets. Along these lines, numerous G4 ligands acting as probes and drug prototypes have been reported, but only a few meet the criteria of selectivity and affinity necessary to achieve efficient G4 targeting in cells. The present chapter is focused on bisquinolinium compounds comprising two quinolinium units, typically linked to a (hetero)aromatic dicarboxamide core, which represent the “gold standard” of G4 ligands. The seminal works that led to their design, the development of functional derivatives, and new analogues are described. In addition, a brief overview of their applications to imaging and covalent trapping of G4s in cells and of their therapeutic potential in treating cancer and other diseases is presented.

Keywords

G-quadruplexes · G4 ligands · Bisquinolinium compounds · PDC · PhenDC

D. Verga · A. Granzhan · M.-P. Teulade-Fichou (*)
CMIB, CNRS UMR9187, INSERM U1196, Institut Curie, PSL Research University, Orsay, France
CMIB, CNRS UMR9187, INSERM U1196, Institut Curie, Paris-Saclay University, Orsay, France
e-mail: mp.teulade-fi[email protected]
© Springer Nature Singapore Pte Ltd. 2023
N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_28

Introduction

G-quadruplexes (G4s), which have gained growing importance over the past two decades, are non-canonical, tetrahelical structures resulting from the folding of nucleic acid sequences harboring multiple guanine repeats. The molecular event initiating the formation of a G4 is the self-assembly of four guanines through Hoogsteen-type hydrogen bonding. The ensuing cyclic planar array, called a G-quartet, is a highly hydrophobic surface prone to π-stacking interactions with other G-quartets to constitute the core of a G4 structure. In addition, potassium (K+) cations can intercalate between two G-quartets through octacoordination to the O6 atoms of guanine residues, creating sandwich-like motifs that provide G4s with outstanding thermodynamic stability. These particular structural features, originating in the chemical structure of guanine, make G4s unique structures in the world of nucleic acids, as well as a fascinating example of the way biology implements self-assembly to produce supramolecular structures. The G4 structure itself is discussed extensively in several other contributions to the present Handbook; the reader is therefore invited to refer to those chapters for more details.

The current paradigm defines G4s as key structural elements able to impact fundamental processes linked to the main functions of DNA and RNA. Nonetheless, several essential questions remain to be addressed, relating to: (i) the transient/dynamic nature of G4s and their abundance in cells; (ii) their specific recognition by proteins that shape and process the genome and the transcriptome; and (iii) their potential as therapeutic targets linked to cancer, viral, and neurodegenerative diseases.

The hydrophobic nature and the steric accessibility of the external quartets allow G4s to readily accommodate small molecules that also possess planar aromatic units. This phenomenon has been widely exploited by chemists, leading to a plethora of G4 ligands with broad structural diversity, as summarized in more than 30 reviews on this topic published to date. Although several of the reported chemical series have been instrumental in probing the biology of G4s and exploring their therapeutic potential, many do not exhibit the affinity and selectivity required for interrogating G4 cellular functions with confidence.
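To make the notion of "sequences harboring multiple guanine repeats" concrete, the following minimal Python sketch scans a sequence for four runs of at least three guanines separated by short loops. The pattern used here (G-runs of three or more, loops of 1 to 7 nucleotides) is a commonly used heuristic for putative quadruplex-forming sequences and is an assumption of this example, not a criterion defined in this chapter; the function name is hypothetical. The two test sequences are the telomeric oligonucleotide 22AG and the duplex competitor ds26 quoted in Fig. 1.

```python
import re

# Heuristic pattern (an assumption, not taken from the chapter): four runs of
# at least three guanines separated by loops of 1-7 nucleotides (DNA or RNA).
PUTATIVE_G4 = re.compile(r"G{3,}(?:[ACGTU]{1,7}G{3,}){3}")


def find_putative_g4(sequence):
    """Return (start, end, motif) tuples for non-overlapping pattern matches.

    Positions are 0-based and the end index is exclusive.
    """
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in PUTATIVE_G4.finditer(seq)]


# 22AG, written 5'-A(GGGTTA)3GGG-3' in Fig. 1
telomeric_22ag = "A" + "GGGTTA" * 3 + "GGG"
# ds26 duplex competitor from Fig. 1 (no run of three guanines)
duplex_ds26 = "CAATCGGATCGAATTCGATCCGATTG"

print(find_putative_g4(telomeric_22ag))
print(find_putative_g4(duplex_ds26))
```

A match only flags a sequence as a candidate; whether it actually folds into a G4 depends on conditions such as cation type and concentration and has to be established experimentally.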


Herein are reviewed the inception and the development of G4-interactive compounds grouped under the general term of “bisquinolinium” ligands that belong to the first generation of G4 ligands and that, after two decades, still represent the most potent G4-targeting agents.

Bisquinolinium Pyridodicarboxamide (PDC) and PhenDC3: Prototypic G4 Ligands

Genesis and Design of Bisquinolinium Ligands: The Preorganization Concept

The discovery of bisquinolinium compounds for targeting G4 structures originates in the pioneering study of Riou et al. (2002). Upon screening a library of about a hundred triazine derivatives to identify new G4 ligands as potential telomerase inhibitors, the authors discovered that the derivatives bearing quinolinium units, in particular compound 12459 (Chart 1), were endowed with exceptional properties, namely, high binding affinity for the telomeric G4 structure, high selectivity versus duplex DNA, and a potent inhibitory effect on telomerase activity in vitro. Since triazines suffer from poor metabolic and chemical stability, it was decided to replace the N–C bonds with amide bonds and, for synthetic reasons, the central triazine with a pyridine unit. These modifications afforded the series with a Pyridine-2,6-DiCarboxamide core (termed PDC), of which compounds 360A and 832A are the prototypic members (Chart 1) (Hittinger et al. 2004; Mailliet et al. 2003).

Chart 1 Structures of the first series of bisquinolinium derivatives identified as G4 ligands: (a) triazine derivative (Riou et al. 2002) and (b) Pyridine-2,6-dicarboxamide (PDC) derivatives (counterions: iodide or triflate) (Hittinger et al. 2004; Mailliet et al. 2003). Quinolinium units are shown in blue

Subsequently, extensive structure–activity relationship (SAR) studies were carried out, broadly exploring the chemical space around the structure of PDC 360A (Hittinger et al. 2004; Mailliet et al. 2003). These readily established that the heteroaromatic PDC core was a prerequisite, as its replacement by an ethylene linker completely abolished G4 binding (Chart 2a). Both the presence of the nitrogen-containing heterocycle and the 2,6-substitution pattern also appeared essential, since a significant drop in affinity was observed upon replacement by either a 1,3-phenyl ring or 2,4- or 2,5-substituted pyridine moieties (Chart 2a). Finally, the replacement of only one N-methylquinolinium moiety by N-methylpyridinium, a non-aromatic ammonium group, or a neutral quinoline strongly impacted the affinity of the resulting analogues (Chart 2b), thereby demonstrating that the quinolinium motif was a strong determinant of G4 recognition in terms of both size and charge. Of note, the neutral precursor bearing two non-methylated quinoline residues shows no G4 binding.

Chart 2 Structures of PDC 832A and analogues used for SAR studies, showing the effect of (a) the central moiety and (b) the replacement of one of the two N-methylquinolinium moieties. Bottom line: ligand-induced stabilization (ΔTm, at ligand:DNA = 5:1) of telomeric G4 sequence F21T [5′-FAM-GGG(TTAGGG)3-TAMRA-3′] measured by FRET-melting (Hittinger et al. 2004; Mailliet et al. 2003)

These studies established the PDC derivatives (in particular, 360A and 832A) among the most potent G4 ligands, exhibiting submicromolar affinity. This remarkable property was rationalized in terms of preorganization of the pyridine-2,6-dicarboxamide core, which is known to preferentially adopt a bent syn–syn conformation through internal hydrogen bonding (Chart 3a) (Hamuro et al. 1997). This internal structuration locks the final bisquinolinium derivative in a V-shape (De Cian et al. 2007) (Chart 3b), which makes it more prone to interact with G-quartets than the linear form, namely, via a putative overlap with three guanine residues (Chart 3c). The crucial role of the bent (V-) shape was confirmed by inverting the connectivity of the amide bond on the pyridine core, which prevents the internal structuration. This modification afforded the “inverted” PDC compound (i-PDC) (Chart 3e), which exhibited a very weak affinity for G4 (Fig. 1).

However, V-shaped and linear conformations co-exist in solution, as shown by recent molecular dynamics studies of PDC derivatives as well as of Pyridostatin (PDS, Chart 3e), the latter possessing the same pyridine-2,6-dicarboxamide core (Xie et al. 2018; Rocca et al. 2017). Although the results demonstrated the predominance of the bent conformation(s) in a ~70/30 ratio, the PDC scaffold is nonetheless highly flexible and can adopt a variety of conformations, which may reduce the gain in binding entropy compared with a more rigid preorganized system.

Chart 3 (a) Conformations of the pyridine-2,6-dicarboxamide motif with the associated energies calculated using the CHARMM force field. (Adapted from Hunter and Purvis (1992)). (b) Bent vs. linear conformation of 360A. (c) Manual overlap of V-shaped 360A (purple) and three guanines (blue) constitutive of a G-quartet. (d) Various conformations of the PDC scaffold identified by molecular dynamics simulations. (From Granzhan et al. (2018)). (e) Structures of PDS and i-PDC

Fig. 1 (a) Thermal denaturation of the double-labelled telomeric sequence F21T (cf. Chart 2) in the presence of the various bisquinolinium derivatives, measured by FRET-melting (lithium cacodylate buffer pH 7.2, KCl 10 mM, ligand/DNA ratio = 5:1, [DNA] = 0.25 μM), in the absence and in the presence of 10 and 50 molar equiv. of duplex competitor (ds26, 5′-CAATCGGATCGAATTCGATCCGATTG-3′). (b) Results of the fluorescence indicator displacement assay (G4-FID) performed with the same panel of ligands (except for i-PDC) and the telomeric sequence 22AG [5′-A(GGGTTA)3GGG-3′]; buffer and DNA concentration are identical to FRET-melting

On this ground, Teulade-Fichou et al. undertook a further optimization of this scaffold by substituting the pyridine core with a more rigid 1,10-phenanthroline (phen) core, which is bent per se and exhibits a larger aromatic surface area. The introduction of 2,9-disubstituted 1,10-phenanthroline in place of 2,6-disubstituted pyridine led to the Phenanthroline DiCarboxamide (PhenDC) series, with the two prototypic members PhenDC3 and PhenDC6 (Chart 4) analogous to the PDC 360A and 307A, respectively (De Cian et al. 2007).

Chart 4 Structures of PhenDC and Bipy-DC derivatives developed by Teulade-Fichou et al. (2007)

Simple geometric considerations indicated that PhenDC compounds should exhibit a U-shaped conformation with a size perfectly matching that of a G-quartet in both dimensions (lateral and diagonal), thereby providing a larger π-overlap than their PDC counterparts (De Cian et al. 2007). As for PDC, possible hydrogen bonding and electrostatic repulsion may also help to favor the “closed” conformation, with C=O bonds being “pulled out.” A systematic comparison of the two series confirmed this prediction, and PhenDC compounds were shown to have a significantly increased binding affinity for the telomeric G4, as shown by thermal denaturation and G4-FID assays (Fig. 1). Remarkably, the key importance of the rigidity of the phenanthroline motif was confirmed by the much lower affinity exhibited by the 2,2′-bipyridine (bipy) analogues (Chart 4 and Fig. 1). This trend has subsequently been confirmed with a large panel of G4 structures, where the stabilization effect (ΔTm) of PhenDC3 systematically exceeded that of 360A by 1 to >10 °C, depending on structural features of the G4 targets such as topology and loop length (Saha et al. 2020). This first evaluation shed light on the exceptional properties of the PDC and PhenDC families of compounds. In the following years, only PhenDC3 and 360A have been further exploited for both chemistry and biology purposes, since their corresponding isomers, differing by the substitution position on the quinoline ring (6 versus 3, i.e., PhenDC6 and 832A), show similar binding behavior.
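The ligand-induced stabilization ΔTm quoted for the FRET-melting experiments is the difference between the melting temperatures measured with and without ligand. As a minimal sketch, assuming that the melting temperature is read as the temperature at which the normalized donor emission reaches 0.5 (a common convention for FRET-melting, not a procedure spelled out in this chapter), it could be estimated as shown below. The curves are synthetic two-state sigmoids with made-up midpoints of 55 and 75 °C, and all function names are hypothetical.

```python
import math


def normalize(signal):
    """Rescale a melting curve to the range 0..1."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        raise ValueError("flat curve: no melting transition to analyze")
    return [(s - lo) / (hi - lo) for s in signal]


def half_transition_temperature(temps, signal):
    """Temperature at which the normalized signal first rises through 0.5.

    Linear interpolation is used between the two bracketing data points.
    """
    norm = normalize(signal)
    for i in range(1, len(temps)):
        y0, y1 = norm[i - 1], norm[i]
        if y0 < 0.5 <= y1:
            frac = (0.5 - y0) / (y1 - y0)
            return temps[i - 1] + frac * (temps[i] - temps[i - 1])
    raise ValueError("no upward crossing of 0.5 found")


def synthetic_curve(temps, tm, width=3.0):
    """Made-up two-state melting curve (illustration only, not real data)."""
    return [1.0 / (1.0 + math.exp(-(t - tm) / width)) for t in temps]


temps = [25.0 + 0.5 * i for i in range(141)]  # 25 to 95 degrees C in 0.5 steps
tm_free = half_transition_temperature(temps, synthetic_curve(temps, 55.0))
tm_bound = half_transition_temperature(temps, synthetic_curve(temps, 75.0))
delta_tm = tm_bound - tm_free
print(f"Tm without ligand: {tm_free:.1f}, with ligand: {tm_bound:.1f}, "
      f"delta Tm: {delta_tm:.1f}")
```

With real data, the same comparison would be repeated in the presence of the duplex competitor: a ΔTm that stays essentially unchanged upon adding ds26 indicates that the ligand is not being sequestered by duplex DNA.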

In Vitro Binding: Affinity, Selectivity, and Ligand-Induced Conformation Changes After the evaluation by semi-quantitative methods (FRET-melting and G4-FID), the binding constants of PhenDC3 and 360A were determined using diverse biophysical methods (ITC, SPR, SPRi, ESI-MS, fluorimetric titration), which allowed to compare the binding thermodynamics. These data are summarized in Table 1. Of note, Ka values are relative to the conditions and methods used, which explains the large variation of the values (one to two orders of magnitude) observed from one

24

Targeting Quadruplex Nucleic Acids: The Bisquinolinium Saga

Table 1 Summary of DNA affinity constants (Ka) determined for PhenDC3 and 360A using various methods (a)

Method | Ka G4 (M⁻¹), 360A | Ka G4 (M⁻¹), PhenDC3 | Ka duplex (M⁻¹), 360A | Ka duplex (M⁻¹), PhenDC3 | G4/duplex selectivity, 360A | G4/duplex selectivity, PhenDC3
ITC (b) (Bončina et al. 2015a) | 3.2 × 10⁶ | 5.5 × 10⁶ | 7.2 × 10⁴ | n.d. | >50 | n.d.
SPR (Bonnat et al. 2017) | 4.0 × 10⁷ (c) | 4.2 × 10⁸ – 1.2 × 10⁹ (c) | No binding (d) | No binding (d) | – | –
SPRi (Pillet et al. 2011) | 1 × 10⁶ | 1 × 10⁶ | 2 × 10³ (AT-rich), 2 × 10³ (GC-rich) | 2 × 10³ (AT-rich), 5 × 10⁴ (GC-rich) | 500 | 500 (AT-rich), 20 (GC-rich)
G4-FID (Halder et al. 2011) | 1.5 × 10⁸ | 9.8 × 10⁸ | n.d. | n.d. | – | –
ESI-MS (Marchand et al. 2021) | 1.2 × 10⁶ | 2.9 × 10⁶ | n.d. | n.d. | – | –

n.d. not determined
(a) G4 = human telomeric G4 sequence in ITC, SPR, and SPRi studies, T95-2T [5′-TT(GGGT)4] in ESI-MS, and G4-RNA 5′-[(G3U2)] in G4-FID; duplex is ds26 (Fig. 1) or 8-bp GC-rich or AT-rich hairpins (SPRi). In all cases data were fitted with a 1:1 binding model, thus sensing the site of highest affinity, although 2:1 binding was observed at higher ligand concentrations
(b) The ligand used is 360A-Br, the bromo derivative of 360A
(c) Dejeu et al. (unpublished)
(d) No signal in the range of concentration used

study to another. However, collectively, the data confirm the affinity ranking (PhenDC3 > PDC 360A), with Ka values for PhenDC3 being two- to tenfold higher than those for PDC 360A, thereby validating the design of the PhenDC series. Importantly, these studies demonstrate that G4 vs. duplex selectivity is high for both series (50- to 500-fold, Table 1). This selectivity is crucial for targeting and probing G4 in a biological context and is unfortunately too often neglected or overlooked. Further work confirmed that PhenDC3 targets all G4s with high (nanomolar) affinity, irrespective of topology, loop length, and DNA or RNA nature, thereby establishing PhenDC3 as a “molecular G4 glue” and a universal G4 probe (Bonnat et al. 2017; Halder et al. 2011). Taken together, these data place the two compounds among the top four benchmark G4 ligands, together with PDS and BRACO19, although the latter two show five- to tenfold lower G4-binding constants than PhenDC3 (Bonnat et al. 2017). As mentioned above, this exceptionally strong binding was hypothesized to result from the U-shape and the size of the bisquinolinium phenanthroline-2,9-dicarboxamide scaffold, likely to afford large π-overlap with a G-quartet surface. This hypothesis was fully confirmed in 2014 by a landmark study reporting the high-resolution NMR structure of PhenDC3 bound to the c-myc quadruplex (Pu24T sequence), which forms a well-defined parallel-stranded G4 structure (Chung et al.

D. Verga et al.

2014). The study identified one high-affinity site corresponding to the terminal quartet on the 5′-end, and the analysis showed an extensive π-overlap of the PhenDC3 scaffold with the four guanines. The large number of conformational constraints based on NOE measurements, together with a 13C-labelled PhenDC3 derivative, made it possible to fully solve the structure and assess the position of the quinolinium rings. On this basis, a model was built which showed a perfect aromatic overlap of one quinolinium unit with the guanine residue G13 (Fig. 2), thereby confirming the key role of the crescent-shaped phenanthroline core for the optimal distribution of quinolinium units above the guanines. Remarkably, interaction of ligands with all four guanines of a G-quartet is rare (Neidle 2016). Indeed, as shown in Fig. 3, the aromatic overlap observed for most reported quadruplex-ligand complexes involves only two guanines, with the exception of PhenDC3 and 6OTD (Chung et al. 2013, 2014). This feature is likely to be at the origin of the exquisite binding affinity of PhenDC3 for quadruplexes and rationalizes the reduced activity of bipyridine analogues possessing a flexible core (Fig. 1). Also, this study provided evidence that the size of the quinolinium moieties is a critical factor for π-π interaction with guanines, in line with the negative impact

Fig. 2 (a) 13C-labelled PhenDC3 used in the NMR study. (b) Structure of the PhenDC3–Pu24T complex based on NOE restraints. (c) Top view showing the overlap of PhenDC3 (pink) and guanines of the top G-quartet (blue). (Adapted with permission from Chung et al. (2014). Copyright 2014 Wiley-VCH)

Fig. 3 Top view of four models of G4–ligand complexes generated from NMR or X-ray crystallography studies, showing the overlap between the guanines of the external G-quartet (blue) and the ligand (pink). PDB structures from left to right: 3CE5 (BRACO-19), 2JWQ (MMQ1), 1NZM (RHPS4), 2MB3 (6OTD)

observed when smaller aromatic moieties were grafted, as shown both in the initial SAR study (Chart 2) (Hittinger et al. 2004) and in follow-up studies (section “Variations in Linker Groups and Quinolinium Residues”). Another important feature of the PhenDC3/G-quartet interaction was revealed by molecular dynamics simulations (Chung et al. 2014). The high flexibility of the C–N bonds connecting the two quinolinium moieties to the central aromatic core is strongly reduced in the complex but still enables the molecule to adapt to the movements (“breathing”) of the guanine quartet, maintaining optimal aromatic–aromatic interactions (Fig. 4). This feature, which has also been observed in the case of 6OTD, indicates that the inherent flexibility of unfused polyheteroaryl scaffolds is beneficial and offers a clear advantage over the high rigidity of condensed aromatic scaffolds (e.g., RHPS4, BRACO19, or NDI). In summary, the solved NMR structure provided a better understanding of how the bisquinolinium PhenDC scaffold interacts with G4 and is in line with all trends observed initially (Charts 1, 2, 3, and 4). Interestingly, a recent molecular modelling analysis of a putative interaction of PhenDC3 with duplex DNA showed that, while the phenanthroline moiety overlaps well with the surface area of the Watson–Crick base pairs, the two quinolinium units protrude into the solvent, disfavoring this interaction (Fig. 5) (Neidle 2016). This illustrates the poor capacity of PhenDC3 to intercalate between Watson–Crick base pairs, thus providing a rationale for the strong preference for G4 structures over duplex DNA. Altogether, these data demonstrate that the PhenDC scaffold is already quasi-optimal, offering the perfect trade-off between charge, size, shape, rigidity, and flexibility, thereby suggesting that any further structural modification aiming at an improvement might be challenging and could compromise its exquisite

[Fig. 4 graphic not reproduced: panels “Free PhenDC3” and “Bound PhenDC3” plotting T1 (deg) and T2 (deg) against Time (ns), with Frequency versus Angles (deg) histograms for T1 and T2; structural panel labelled PhenDC3, Quinolinium, and G4, G8, G13, G17.]

Fig. 4 Molecular dynamics simulations showing the reduced movements of the C–N bond in the G4/PhenDC3 complex (left) and synchronized movements of the quinolinium unit (pink) and the G13 residue (blue) that remain essentially parallel (right). (Adapted with permission from Chung et al. (2014). Copyright 2014 Wiley-VCH)

Fig. 5 Molecular model representing putative PhenDC3 intercalation between base pairs of duplex DNA. (Reproduced with permission from Neidle (2016). Copyright 2016 American Chemical Society)

G4-recognition properties. Indeed, most follow-up studies using the phenanthroline motif directly derivatized with amino side chains (cf. section “Variations of the PDC Core”) or bisphenanthroline derivatives provided compounds with significantly weaker potency and/or abolished G4 vs. duplex selectivity (Artese et al. 2013). Subsequently, two studies based on a global thermodynamic analysis using a combination of methods such as isothermal titration calorimetry (ITC), CD and fluorescence spectroscopy, gel electrophoresis, and molecular modelling reported a fine dissection of the energetic contributions to the interaction of PhenDC3 with the telomeric G4 structure (Bončina et al. 2015a, b). As expected, electrostatic forces, hydrogen bonding, and polar interactions contributed significantly to the strength of the binding, but most interestingly the data indicated that the hydrophobic contribution was a strong determinant of the binding event, since the major contribution to the Gibbs free energy change (ΔG°) was the component corresponding to dehydration of the interface between ligand and G-quartet (ΔG°hyd). This result is fully consistent with the large size of the aromatic surfaces involved in the stacking of PhenDC on the terminal G-quartet revealed by the NMR investigation. This also holds true for a bromo derivative of 360A (PDC 360A-Br) used in the same study (Bončina et al. 2015a), which suggests that the PDC series interacts via the same external stacking mode, implying a broad overlap of the ligand and G-quartet hydrophobic surfaces as predicted (Chart 2). Surprisingly, no structural data are available so far to confirm this putative, though likely, interaction of PDC 360A. Nonetheless, a recent NMR investigation has provided information on this matter with a particular quadruplex structure (cf. section “Binding to Alternative Quadruplexes (VK2)”). This study also indicated that structural rearrangement of the G4 structure occurs during ligand binding, corroborating previous observations made by Gabelica et al. (2015).

Ligand-induced conformational rearrangements of telomeric G-quadruplexes: In 2015, Gabelica et al. published a native mass-spectrometry (ESI-MS) study coupled with CD analysis which revealed that PhenDC3 and 360A were able to induce a conformational rearrangement of the telomeric G4 upon binding, accompanied by ejection of one K+ cation that implied a disruption of the external G-quartet (Fig. 6) (Marchand et al. 2015). This effect is specific for the telomeric quadruplex-forming sequence (Lecours et al. 2017), which is characterized by high polymorphism and dynamics involving multiple equilibria between various G4

Fig. 6 (a) Mass spectra (ESI-MS) of 5 μM 24TTG alone (top), or incubated for 3 days with 5 μM of ligands: (middle) 360A, (bottom) PhenDC3. (b) CD spectra in the absence (“Hybrid-1”) and in the presence of ligands PhenDC3, 360A, PDS (“Pyrido”), L2H2-6OTD (“L2H2”) (Chung et al. 2013), and TrisQ (Bertrand et al. 2011). (c) Schematic representation of conformational rearrangement of Hybrid-1 structure (24TTG) induced by PhenDC3. (Adapted with permission from Marchand et al. (2015). Copyright 2015 American Chemical Society)

conformations (Type 1 and 2 hybrids, parallel, antiparallel, and two-quartet Type hybrid) but also prefolded intermediates (G-triplex, G-hairpin) (Aznauryan et al. 2016; Frelih et al. 2020). It is worth mentioning that weak binders with low specificity (Bončina et al. 2015a) or strong binders devoid of flexibility and H-bonding capacity (e.g., TrisQ) do not induce conformational changes of telomeric G4s. Conversely, conformational switching has also been observed for PDS, which harbors the same pyridodicarboxamide core as PDC (cf. Chart 3e), as well as for numerous functionalized bisquinolinium derivatives (Part 2) or PDC analogues (Part 3). Collectively, this suggests that H-bonding involving the dicarboxamide linkers together with π-π stacking of quinoline (or quinolinium) units with G4 residues are strong determinants of the ligand-induced conformational rearrangement of telomeric G4s. ESI-MS monitoring of K+ ejection with a variety of G4s has shown that this phenomenon is not observed with parallel topologies, but can occur in marginal cases (e.g., bimolecular antiparallel G4 ([G4T4G4]2 or G4T4G4)) (Lecours et al. 2017). Two limiting mechanisms can be invoked to rationalize the ligand-induced K+ ejection and conformational rearrangements, namely, (i) induced fit, in which the ligand binds to the predominant, free quadruplex conformation causing subsequent conformational change, and (ii) conformational selection, where the ligand selectively binds to a weakly populated G4 conformation, with eventual conversion to another, ligand-bound G4 conformation. The latter hypothesis is currently supported by several independent observations. In particular, a single-molecule-based study (smFRET) (Aznauryan et al. 2021) suggested that G4 ligands might proceed through binding to prefolded intermediate states such as G-triplexes and G-hairpins, which in fine may lead to conformational selection of otherwise poorly populated G4 conformations. This conclusion is corroborated by an SPR study reporting the strong binding of PhenDC3 and PDS to a constrained G-triplex (Kd = 5 nM) (Bonnat et al. 2019). Finally, a recent NMR analysis evidenced an unprecedented intercalative binding mode of PhenDC3 into the telomeric G4, causing hybrid to antiparallel conversion and most likely resulting from conformational selection (Ghosh et al. 2022). Of note, PhenDC3 and 360A can induce formation of G4 structures even in the absence of K+ cations, whereas PDS does not. This molecular chaperone activity, initially evidenced by gel electrophoresis using tetramolecular and bimolecular quadruplexes (Cian and Mergny 2007), has since been confirmed by CD spectroscopy in other studies (Lecours et al. 2017; Ghosh et al. 2022). Collectively, these studies shed light on the remarkable G4 specificity of bisquinolinium compounds, which have been instrumental in gaining insight into the complex and multiple pathways underlying the conformational exchanges of telomeric quadruplexes. Whether other G4s also exhibit conformational dynamics modulated by ligands remains to be explored (Lecours et al. 2017).
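
The association constants (Ka), dissociation constants (Kd), and G4 vs. duplex selectivities quoted in this section can be related to one another through two textbook relations, Kd = 1/Ka and ΔG° = −RT ln Ka. The short Python sketch below is only an illustration of that arithmetic: the function names and the temperature of 298.15 K are assumptions made here and are not taken from the cited studies, whose experimental conditions differ from one method to another.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1


def kd_from_ka(ka):
    """Dissociation constant (M) from an association constant (M^-1)."""
    return 1.0 / ka


def delta_g_standard(ka, temperature_k=298.15):
    """Standard Gibbs free energy of binding in kJ/mol: dG = -RT ln(Ka).

    The default temperature (298.15 K) is an assumption for illustration.
    """
    return -R * temperature_k * math.log(ka) / 1000.0


def selectivity(ka_g4, ka_duplex):
    """G4 vs. duplex selectivity expressed as a ratio of association constants."""
    return ka_g4 / ka_duplex


if __name__ == "__main__":
    # Kd = 5 nM is the value quoted for binding to the constrained G-triplex.
    ka_triplex = 1.0 / 5e-9
    print("Ka (M^-1):", ka_triplex)
    print("dG (kJ/mol):", round(delta_g_standard(ka_triplex), 1))
    # Ka values of 1e6 (G4) and 2e3 (duplex) M^-1 give a 500-fold selectivity.
    print("Selectivity:", selectivity(1e6, 2e3))
```

Because Ka values depend strongly on the method and conditions used, free energies computed in this way are best compared within a single study rather than across studies.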

Binding to Alternative Quadruplexes (VK2)

In 2020, Plavec et al. reported the NMR structure of the complex formed by PDC 360A with a particular quadruplex structure adopted by the VK2 G-rich sequence (5′-G3AGCGAG3AGCGAG3AGCGA-3′) (Kotar et al. 2020). VK2 originates from the regulatory region of the human PLEKHG3 gene, which is associated with autism, and forms a quadruplex structure based on the assembly of GAGA and GCGC quartets, belonging to a family of four-stranded structures identified recently. The VK2 fold comprises GG, GC, and GA base pairs that are coplanar, thus forming quartets and exhibiting a central cavity, but is not stabilized by K+ ions (Fig. 7). The NMR analysis revealed the formation of a well-defined VK2–360A complex with a 1:1 binding stoichiometry, in which the ligand intercalates in the central cavity between GAGA and GCGC quartets. The numerous NOE signals allowed the authors to assess the position of the ligand, in particular that of the quinolinium moieties. These were found to adopt a V-shaped conformation enabling interaction with 2–3 base residues in each quartet. Upon binding, the ligand strongly stabilizes the structure (ΔTm = 18 °C), while the topology of VK2 remains unchanged. The authors stated that the binding is tightly regulated by the size of 360A, as PhenDC3 appeared too bulky to intercalate into this structure. Of note, this mode of interaction is reminiscent of intercalation into duplex DNA but was shown for the first time with a four-stranded structure, thus enlarging the repertoire of G-rich genomic loci potentially targeted by bisquinolinium ligands (at least, of the PDC family).

Fig. 7 (a) Model of VK2–360A interaction showing the intercalation of the ligand (green) between GAGA and GCGC quartets (guanines, blue; adenines, red; cytosines, yellow). (b, c) Top views showing the V-shape of 360A overlapping the two neighboring quartets: C6G13C6#G13 quartet (b) and G7A12G7#A12 quartet (c). (Copyright 2019 Kotar et al. (2020). Published by Wiley-VCH Verlag GmbH & Co. KGaA)

Biological Effects of Bisquinolinium Ligands

Telomeric Effects

Human telomeric G-repeats (TTAGGG) form “bead-on-a-string” quadruplex structures; this, together with the importance of telomeres in cancer and aging, has made them a paradigm for the design of G4-targeted drugs. First-generation G4 ligands were initially developed as indirect inhibitors of telomerase, since G4s have been proven to inhibit telomerase activity in vitro. In particular, this is well documented for PDC 360A and PhenDC3, which are indeed strong (nanomolar) in vitro inhibitors of telomerase (Cian et al. 2007). Nonetheless, this notion was challenged when using cellular models, and it soon appeared that the action of bisquinolinium compounds (and more broadly, G4 ligands) in cells might be independent of telomerase inhibition (De Cian et al. 2008). Furthermore, this is consistent with recent findings showing that telomerase has a G4-resolvase activity that is not inhibited even by potent G4-stabilizing ligands such as PhenDC3 (Paudel et al. 2020). Conversely, G4 ligands were found to induce profound telomeric dysfunctions through disorganization of the telomeric DNA structure and its capping by proteins. This ligand-induced telomeric instability is recognized to contribute to the anticancer properties of this class of compounds; in this regard, 360A represents the prototype of telomere-targeting agents (together with RHPS4 and BRACO19). Since extensive work has been carried out on this subject, this part will not be detailed; instead, a brief summary of the main trends observed for PDC 360A is given. The telomeric effects of PDC 360A have been thoroughly explored by the groups of Riou and Boussin through seminal works (Granotier et al. 2005; Granotier and Boussin 2011). First, it was shown that 360A is localized at telomeres using a tritium-labelled PDC 360A (Granotier et al. 2005). The latter was incubated in live cells (glioma cell line T98G), and then metaphase chromosome spreads were

Fig. 8 (a) Structure of 3H-360A showing 3H-labelled methyl groups (red). (b) Autoradiography of metaphase chromosome spreads from T98G cells cultured with 3H-360A. Arrows indicate silver grains on the terminal (black) or interstitial (red) chromosomal regions, bar = 10 μm. (c) Densities of the terminal (T) and interstitial (I) regions. Silver grains were counted in untreated control and 3H-360A treated cells; T values are systematically and significantly greater than I values. (Copyright 2005, Oxford University Press (Granotier et al. 2005))

prepared and submitted to autoradiography. As shown in Fig. 8b, the drug was found distributed both at chromosome ends (telomeres) and at interstitial regions, and its presence increased as a function of incubation time, which is consistent with nuclear accumulation of the compound (Fig. 8c). In all cases the labelling density was significantly higher at terminal regions, consistent with preferential binding of 3H-360A to telomeres. Of note, a similar distribution was observed in normal cells. This pioneering work represents the first attempt at G4 ligand localization on metaphase chromosomes and has been followed by a parallel study based on secondary ion mass spectrometry chemical imaging (nanoSIMS, see below) (Verga et al. 2017). In parallel, investigations in cells have shown that PDC 360A induces rapid apoptosis both in telomerase-positive cancer cell lines (glioma: T98G, CB193, U118-MG) and in telomerase-negative ALT cells (osteosarcoma: SAOS-2). This effect is independent of telomere length (there is no telomere shortening) but is rather attributed to rapid degradation of the single-stranded telomeric overhang. 360A-induced apoptosis is associated with capping alteration, which was evidenced by displacement of telomeric binding proteins such as TRF1/TRF2 and hPOT1 (Fig. 9). Moreover, 360A treatment was shown to activate an extensive DNA damage response at telomeres, likely to result from the exposure of telomeric DNA. This was assessed using colocalization experiments between DNA Damage Response (DDR) markers (γH2AX and 53BP1 foci) and PNA telomeric in situ hybridization (telo-FISH). Subsequently, it was shown that deficiency in repair pathways (both ATR and ATM) strongly enhanced sister telomere fusions and telomere aberrations. Most importantly, these studies evidenced that 360A acts selectively in cancer cells, since telomere aberrations are not detected in primary blood lymphocytes upon drug treatment. Since telomere localization in normal cells is also observed by autoradiography, the selective toxicity of 360A is not due to preferential binding to telomeres but rather suggests a particular sensitivity and/or structural status of telomeric domains in cancer cells.

Telomeric effects of PhenDC3 have been less investigated, although its telomeric localization on metaphase chromosomes has been visualized using nanoSIMS. In 2017, Verga et al. reported the synthesis of Br-PhenDC3 (Fig. 10) to study G4 ligand distribution on metaphase chromosome spreads using a chemical imaging technique known as nanoscale secondary ion mass spectrometry (nanoSIMS) (Verga et al. 2017). The latter allows the simultaneous detection of halogens together with other

Fig. 9 (a) Telomere aberrations revealed by CO-FISH in human cancer cells. Telomere lagging strand is labelled in red and leading strand in green. From left to right: representative examples of a control chromosome, a sister telomere fusion, a sister telomere loss involving the lagging strand, and a telomere double involving the lagging strand. (b, c) TRF2 delocalization from the telomere revealed by immunostaining with anti-TRF2 antibody (green) in CEM1301 cancer cells treated for 8 days with 5 μM 360A or 0.05% DMSO. Nuclei and metaphase chromosomes were counterstained with DAPI (blue). Histograms show the mean number of TRF2 foci per nucleus (b) and the mean number of telomeres on metaphase chromosomes with TRF2 foci (c). (Copyright 2011 C. Granotier and F. D. Boussin. Adapted from Granotier and Boussin (2011); originally published under CC-BY 3.0 license. Available from DOI: https://doi.org/10.5772/24130)

Fig. 10 NanoSIMS visualization of Br-PhenDC3 (left) in chromosome spreads. (a) Image obtained by merging significant 81Br (red) and 31P (green) signals in both plain nucleus and chromosome spreads. Scale 4 μm. White arrows indicate telomeric ends. (b) Image of the single chromosome indicated by the yellow arrow in (a): from left to right 31P signal, overlay 81Br and 31P signal and 81Br signal. (Copyright 2017 Verga et al. (2017))

elements present in biological samples (N, P, S, C). This technique is based on an energetic primary ion beam that hits the sample surface, causing emission of secondary ions which are then collected and guided towards a mass spectrometer and a set of detectors recording their intensity in parallel. By scanning the primary beam over the sample surface, a chemical map can be obtained, providing elemental and isotopic information on a biological compartment or structure. The choice to equip position 5 of the phenanthroline core with a bromine atom was driven by the need to avoid perturbing the interaction with G-quartets. Indeed, Br-PhenDC3 retained PhenDC3’s affinity and selectivity for G4s, and its cytotoxicity was not significantly modified. Therefore, it was postulated that the two compounds should display similar behavior in cells and a comparable genomic distribution. After incubation of cells with Br-PhenDC3 at low concentration (0.3 μM) for 48 h, cells were fixed, and metaphase chromosome spreads were prepared on silicon wafers. NanoSIMS images obtained from the analysis of statistically significant 81Br signal from Br-PhenDC3, originating from both entire nuclei and metaphase chromosomes, were merged with the 31P map, which identifies the presence of DNA. Interestingly, the compound was not uniformly distributed within the nucleus, and metaphase spreads highlighted the presence of Br-PhenDC3 in the telomeric regions of almost all chromosomes as well as in interstitial regions, a distribution that differed significantly from one chromosome to another (Fig. 10). Remarkably, the high spatial resolution of nanoSIMS made it possible to compare the precise localization of Br-PhenDC3 on sister chromatids of five different chromosomes, showing that the positions of most 81Br peaks overlap or are very close. Altogether, these results suggested that Br-PhenDC3 binding is not random: it differs from one chromosome to another but is similar on sister chromatids; this behavior is fully consistent with the presence of specific DNA domains bound by the G4 ligand. In line with these observations, one study reported the capacity of PhenDC3 to slow down replication fork progression at telomeres through stabilization of G4 (Drosopoulos et al. 2015).

Genetic Instability and Inhibition of Helicases

Besides telomeres, human minisatellites represent another example of G-rich sequences composed of tandem repeats with a length of up to several kb, which are potentially able to form clusters of G4 structures. These sequences, which are predominantly located at pericentric and subtelomeric regions, are highly dynamic and known to pose challenges to replication. In a seminal work, A. Nicolas et al. explored the influence of G4 structures on the rearrangement (i.e., size variation upon replication) of the human minisatellite G-rich sequence (CEB1). CEB1 sequences (39 repeats) were inserted into a yeast (Saccharomyces cerevisiae) chromosome, providing an easy-to-handle system capitalizing on the advantages of yeast genetics (small size of genome, knock-down of single genes, etc.) (Fig. 11a). The percentage of rearrangement (also called genetic instability) is taken as a “G4 phenotype” and used as a readout: the more G4 forms, the higher the genetic instability. The authors used a combined chemical-genetic approach based on PhenDC3 treatment and deletion of the helicase Pif1, as both conditions are

Fig. 11 (a) Wild-type sequence of CEB1 minisatellite (CEB1-WT); the G4 motif is underlined. This sequence, repeated 39 times and likely to form a G4 cluster (schematically represented), was inserted in the chromosome VIII of S. cerevisiae. (b) Southern blot analysis of CEB1 length in WT cells treated for eight generations with PhenDC3 (10 μM), or in Pif1Δ cells. CEB1-Gmut is the mutated sequence (5′-GCGCTGAGCGGCGAGTGAGAGTGGCCTGCGGAGGTCCCT-3′) unable to form G4 and used as negative control. (Copyright 2010, Oxford University Press (Piazza et al. 2010))

known to promote formation and induce the persistence of quadruplex structures in vivo. Remarkably, wild-type cells treated with PhenDC3 showed strong rearrangements of CEB1, which is reminiscent of the phenotype observed in Pif1-depleted variants (Fig. 11b). In addition, PhenDC3 did not affect the mutated sequence CEB1-Gmut, which is unable to fold into G4. This first study confirmed that PhenDC3 was able to act in cells site-specifically by stabilizing and increasing the lifetime of G4 conformations, thus preventing their processing by the helicase Pif1 (Piazza et al. 2010). This effect was not observed for weaker binders such as NMM, indicating that the high G4-binding activity of PhenDC3 is a prerequisite to reach G4 targets and probe G4 biology. Subsequently, using another minisatellite sequence (CEB25, Fig. 12a), it was shown that the G4-induced instability is strongly dependent on the sequence context. The CEB25 sequence forms a parallel G4 in vitro with a long 9-nt central loop

Fig. 12 (a) G4 structure and sequence of CEB25 variants differing by the central loop length, from 9 nt (CEB25-WT) to 1 nt (CEB25-L111(T)), and PAGE analysis showing the frequency of rearrangement of the WT and 1 nt loop G4 after treatment by PhenDC3 or Pif1 deletion (bottom). (b) Percentage (%) of genetic instability correlated with thermal stability (Tm, °C) of the panel of CEB25 variants differing by central loop length (1–9) and/or central loop sequence (T, C, A) after treatment by PhenDC3 or Pif1 deletion. (Copyright 2015 Piazza et al. (2015))

(Fig. 12a) (Amrane et al. 2012) but does not induce genomic instability in yeast. Conversely, the systematic variation of the central loop length (from 9 to 1 nt) demonstrated that short-loop CEB25 quadruplexes (i.e., 3 nt or less, preferentially containing pyrimidine bases) are more prone to trigger genomic instability (Fig. 12) (Piazza et al. 2015). Since short-loop G4s exhibit a significantly higher thermodynamic stability, this observation establishes a robust correlation between G4 thermal stability and higher rearrangement frequency of the minisatellite sequences in yeast. These results strongly suggest that highly stable G4s drive genomic instability in vivo. The similarity between the response to a drug treatment and a genetic modification is called phenocopy. This phenomenon, observed for the first time with PhenDC3 and Pif1 deletion in budding yeast, is remarkable and lends strong support to a site-specific pharmacological action of G4 drugs. Moreover, this work evidenced the heterogeneous behavior of G4s, demonstrating that only a subset of G4 sequences identified in vitro are able to fold in vivo. This observation, de facto, restricts the G4

consensus sequence to short-loop G4s and is in stark contrast with the large number of G4-forming sequences identified by both bioinformatics and various sequencing methods, which take into consideration G4s with loop lengths of up to 25 nucleotides (Puig Lombardi and Londoño-Vallejo 2019; Chambers et al. 2015). Supporting this notion, a follow-up study by Nicolas et al. showed that thermodynamically stable G4s are depleted in genomes across species, consistent with their high genetic instability (Puig Lombardi et al. 2019). In parallel, a number of studies, carried out by independent groups in various cellular and biochemical contexts, demonstrated the strong potency of PhenDC3 to inhibit helicases in charge of G4 processing such as FANCJ (Bharti et al. 2013), BLM (Drosopoulos et al. 2015), Pif1 (Mendoza et al. 2015), and RHAU (Gueddouda et al. 2017). Although the molecular mechanism underlying the inhibitory effect is not completely deciphered, it likely proceeds via additional stabilization of the G4 substrate; the latter may prevent G4 unfolding and generate potential competitive binding affecting substrate recognition. Along the same lines, PhenDC3 has been shown to inhibit the association of the G4-binding protein nucleolin with the CEB25 G4 with much higher efficacy than a panel of benchmark ligands including PDC 360A, PDS, and RHPS4, which strongly suggests a direct relation between the biological impact of G4 ligands and the strength of interaction with G4 (Saha et al. 2020). More recently, Sabouri et al. reported that G4s occur in the genome of the yeast S. pombe and that G4 stabilization upon PhenDC3 treatment impedes replication and induces single-strand DNA lesions (Obi et al. 2020). The authors clearly stated that the use of PhenDC3 improved the detection of G4 structures through ChIP-seq experiments. In parallel, the involvement of G4 in the initiation of replication in mammalian cells has been evidenced by Méchali et al. (2019) using PhenDC3 as a G4 probe.
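
The dependence of predicted G4 counts on the permitted loop length can be illustrated with a minimal sketch of a sequence-based motif search. It assumes the commonly used regular-expression folding rule (four tracts of at least three guanines separated by loops of bounded length); the function name, the default parameters, and the artificial 9-nt-loop sequence are illustrative assumptions and do not reproduce the pipelines of the cited bioinformatics or sequencing studies.

```python
import re


def find_g4_motifs(seq, min_tract=3, max_loop=7):
    """Return (start, motif) pairs for putative G4 motifs in a DNA sequence.

    A motif is four G-tracts of at least `min_tract` guanines separated by
    three loops of 1 to `max_loop` nucleotides. Only non-overlapping matches
    on the given strand are reported.
    """
    pattern = re.compile(
        r"G{%d,}(?:[ACGT]{1,%d}?G{%d,}){3}" % (min_tract, max_loop, min_tract)
    )
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    telomeric = "TTAGGG" * 4                                # four human telomeric repeats
    t95_2t = "TT" + "GGGT" * 4                              # 5'-TT(GGGT)4, 1-nt loops
    ceb1_gmut = "GCGCTGAGCGGCGAGTGAGAGTGGCCTGCGGAGGTCCCT"   # mutated CEB1 control
    # Hypothetical sequence with a 9-nt central loop (not the real CEB25 sequence).
    long_loop = "GGGTGGG" + "T" * 9 + "GGGTGGG"

    print(find_g4_motifs(telomeric))
    print(find_g4_motifs(t95_2t))
    print(find_g4_motifs(ceb1_gmut))                # no G-tract of three guanines
    print(find_g4_motifs(long_loop, max_loop=7))    # central loop too long
    print(find_g4_motifs(long_loop, max_loop=9))    # found once loops up to 9 nt are allowed
```

Raising `max_loop` (for example to 25) enlarges the set of candidate sequences, which is one reason why sequence-based counts exceed the subset of short-loop G4s found to be active in vivo.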

Miscellaneous DNA- or RNA-Related Effects

Bisquinolinium compounds have been used in transcriptomic analyses to determine how cell expression profiles are affected by G4 stabilization. A first study by Hartig et al. was carried out in HeLa S3 (human cervical cancer) cells treated with PhenDC3 or 360A and was based on the microarray technique (Halder et al. 2012). A second report by Maizels et al. was based on RNA-seq analysis using HT1080 (human fibrosarcoma) cells treated with PhenDC3 (Gray et al. 2019). Although the two studies were conducted under different conditions, similar transcriptome-wide change patterns were observed upon PhenDC3 treatment. Indeed, after 48 h of drug incubation, 57–58% of genes were found upregulated versus 42–43% downregulated, with a total number of affected genes ranging from 2686 (microarray study) to 1745 (RNA-seq study). The differentially expressed genes were related to the occurrence of a G4 motif in their promoter: thus, upregulated genes carry significantly more G4 sequences within their promoters than downregulated or unaltered genes, whereas depletion of G4 motifs characterized the promoters of poorly expressed genes. Firstly, these results indicated G4-mediated effects of PhenDC3 on transcription, consistent with the specific action of this compound.

D. Verga et al.

Secondly, they suggested that G4 ligands facilitate transcription, thereby challenging the notion that G4s are negative regulators of transcription (“roadblocks”). PhenDC3 treatment was also shown to alter splicing at genes bearing G4 motifs at the 5′-end of intron 1 on the non-transcribed strand, which reflects ligand interaction with pre-mRNA and provides support for the role of G4 in the regulation of splicing. Finally, PhenDC3 treatment significantly impaired cell division genes, which may result from replication stress induced by G4 stabilization. Remarkably, among the numerous genes upregulated by PhenDC3, pathways associated with heme and iron homeostasis were strongly affected. Especially striking was the 30-fold upregulation of HMOX1, a gene encoding the heme oxygenase 1 protein, an essential enzyme of heme degradation. Thorough analysis based on comparison with responses of cellular pathways to exogenous hemin treatment and transcriptomic data reported for other G4 ligands showed significant overlap for deregulation of four genes, namely HMOX1 and three other genes (FTH1, GCLM, and NQO1), which all share a common regulator, NRF2, that is stabilized by heme to activate transcription (Fig. 13a). These data led to the conclusion that PhenDC3 should outcompete heme in cells for G4 binding, thus leading to the activation of genes related to heme catabolism (Fig. 13b). This study unraveled a new function of G4 DNA that could sequester heme in cells to limit its high endogenous toxicity. Of note, similar trends were observed with 360A in the microarray-based study, but with a lower number of genes affected as compared to PhenDC3 (650 vs. 2686). On the whole, the two studies indicated that the bisquinolinium compounds appear to

[Fig. 13 panel labels: (A) Venn diagram sets “PAEC + Hemin Up”, “PMVEC + Hemin Up”, and “HT1080 + PDC3 Up”; (B) OFF and ON states, PhenDC3, and the genes HMOX1, FTH1, GCLM, NQO1]

Fig. 13 (a) Venn diagram of common genes significantly upregulated (log2(fold change) > 0.5; FDR < 0.05) in PAEC and PMVEC endothelial cell lines treated with hemin, and HT1080 epithelial cells treated with PhenDC3 (Ghosh et al. 2011). (b) Model for sequestration of heme by G-quadruplexes and displacement by PhenDC3 that releases heme to induce genes involved in heme homeostasis and transport. (Copyright 2011 Ghosh et al. (2011))

24

Targeting Quadruplex Nucleic Acids: The Bisquinolinium Saga

Fig. 14 Schematic representation of the various roles of G4 structures deciphered from transcriptomic studies using bisquinolinium compounds. (PhD thesis Puig Lombardi 2019; Puig Lombardi et al. unpublished)

interact with both DNA and RNA quadruplexes to perturb gene expression and accentuate their physiological roles as schematized in Fig. 14. A third study, using PhenDC3 and PDS and aimed at discovering genes and pathways involving G4s, was reported by Balasubramanian et al. (2019). This genome-wide screening study is based on an original strategy combining shRNA silencing with G4-stabilizing ligands to identify human genes promoting cell death in a G4-dependent manner as well as genetic vulnerabilities to G4 ligands. The study, performed in A375 human melanoma cells, led to the identification of 758 genes whose silencing induced vulnerability to G4 ligands, called G4 sensitizers (204 for PDS, 458 for PhenDC3, and 101 common to both ligands). Further analysis based on bioinformatics data mining uncovered five pathways enriched in G4 sensitizers: cell cycle, ribosome, spliceosome, ubiquitin-mediated proteolysis, and DNA replication. Classification of molecular function by gene ontology showed that all genes were related to DNA and RNA functions, consistent with PDS/PhenDC3 nucleic acid binding. Analysis through protein domains indicated enrichments in helicase C-terminal domains, RNA recognition motifs (RRM, RBD, RNP), and DNA binding domains (zinc fingers, BZip, and HMG boxes). Enrichments in multifunctional ATPase domains and ubiquitin hydrolase were more unexpected, as these were not previously known to be affected by G4. A search of the COSMIC database (genes causally implicated in cancer) identified 50 cancer-associated genes. Among these are the DDR cluster BRCA1-2 and their interacting tumor suppressor partners PALB2 and BAP1, the cluster of chromatin remodelers SMARCA4, SMARCB1, and SMARCE1, as well as TOP1, ATRX, and FUS, which have already been identified as G4-related genes.

A second screen with a custom G4-focused shRNA panel uncovered 290 G4 sensitizer genes, among which 40 were common to both ligands and are mainly associated with DNA/RNA binding processes (chromatin modification, replication, transcription, translation) plus ubiquitin. Druggable genome interaction database and clinically actionable genome analysis of these 40 genes gave 12 druggable genes including BRCA1, CHEK1, CDK12, TOP1, and PDKP1, common to both classifications and opening up new possibilities for cancer therapies based on vulnerabilities to G4 ligands. The same screen was performed in HT1080 cells. Collectively, evaluation of the data from all screens revealed four genes that were repeatedly found in both cell types and with both ligands: BRCA1, TOP1, DDX42, and GAR1. The take-home message of this comprehensive study is that these four genes represent genetic vulnerabilities to G4 ligands and open up possibilities for future therapeutic development.

Functionalized Bisquinolinium Ligands for Detection and Manipulation of G4 Structures

Biotinylated PDC and PhenDC3 Derivatives

A biotin-tagged derivative of PDC 360A (Chart 5) was designed and synthesized by Teulade-Fichou, Mergny et al. in 2011 and used as a target for aptamer generation through in vitro systematic evolution of ligands by exponential enrichment (SELEX) technology (Renaud de la Faverie et al. 2011). PDC-biotin retained a significant G-quadruplex stabilization potential (ΔTm = 30.8 °C vs. 34.1 °C for 360A), and competition experiments performed in the presence of a double-stranded competitor evidenced little to no effect on the stabilization. The same trends were observed using G4-FID, suggesting that biotin functionalization of 360A does not affect its G4 binding properties. However, among 80 sequences identified from SELEX experiments, only five exhibited a G4-prone motif. Unexpectedly, most sequences did not follow the G4 consensus motif and exhibited poor affinity (with micromolar Kd), thereby highlighting the possible biases of this approach. To gain more insight into G4–ligand interactions, Sugiyama et al. directly visualized, at the single-molecule level, the complexes formed between PDC-biotin and G-quadruplexes by using DNA origami and atomic force microscopy (AFM) (Rajendran et al. 2014). The DNA origami frame allows the incorporation of two duplexes containing G-G mismatch tracts and the visualization of their assembly into a tetramolecular G4 structure. In the absence of ligand, the two duplexes do not interact and adopt a parallel configuration. Conversely, in the presence of the ligand (PDC-biotin), the two duplexes associate to form a tetramolecular G4 structure characterized by an X-shape (Fig. 15). The presence of PDC-biotin on the G4 structure, located in the center of the X-shape, can be visualized through the generation of a bulky biotin–streptavidin (STV) complex easily traced out by high-speed AFM (Fig. 15) (Rajendran et al. 2014). All the experiments evidenced the presence of STV in the center of the X-shape structure, clearly indicating that the binding of the ligand to G4 structures promotes their formation. From these results, the authors hypothesized that, in solution, G4 ligands might initially bind intermediate structures (G-triplex, G-hairpin) and thus induce stepwise folding into G4s. This hypothesis was verified in a follow-up study by the same authors and is fully consistent with more recent observations (Aznauryan et al. 2021; Bonnat et al. 2019; Ghosh et al. 2022).

Chart 5 Biotin-tagged 360A (PDC-biotin) and PhenDC3 (PhenDC3-biotin) reported by Mergny et al. (2011) and Teulade-Fichou et al. (unpublished data)

Fig. 15 Schematic representation and AFM images of a DNA origami frame incorporating the duplexes containing six G-G mismatches in the middle (left). G4 folding is induced by the presence of PDC-biotin (center), and the biotin residue recruits a bulky streptavidin (STV) molecule (right). (Adapted with permission from Rajendran et al. (2014). Copyright 2014 The Royal Society of Chemistry)

More recently, Teulade-Fichou et al. synthesized biotinylated PhenDC3 (PhenDC3-biotin, Chart 5) with the aim of performing G-quadruplex pull-down experiments. Although the addition of the biotin noticeably affected the G4 binding of the compound, PhenDC3-biotin retained good affinity and selectivity towards G4. Pull-down experiments performed with 32P-radiolabelled oligonucleotides and STV-coated magnetic beads pre-saturated with PhenDC3-biotin displayed an efficient and selective capture of G4 structures characterized by short loops (30–40% with 22AG, c-myc, and CEB25L121T), but were unable to retrieve long-loop structures (20 μM) induced a decrease in mtDNA copy numbers over time, suggesting mitochondrial replication stalling produced by direct targeting of G4 mtDNA. On this basis, the authors concluded that cytoplasmic localization of PhenDC3 was responsible for the effects produced on mitochondrial functions. More puzzling is the claim that PhenDC3 does not enter the nucleus, which is in stark contradiction with most studies reporting on the localization of PhenDC3 in nuclear DNA (Verga et al. 2017; Lefebvre et al. 2017) and its successful use as a G4-DNA probe for genomic and transcriptomic analysis both in yeast and mammalian cells. It might well be that PhenDC3, like most lipophilic cations (and thus, many DNA-interactive molecules), is partially trapped in mitochondria in a non-specific manner. This may influence its cellular activity to various extents but does not constitute the major cellular effect of PhenDC3. Since several controls are missing in this study, its conclusions should be interpreted with caution.

G4 Cross-Linking and Alkylating Agents

A series of PDC derivatives tethered to photoactivatable units, namely, benzophenone (Bp) and 4-azido-2,3,5,6-tetrafluorobenzoic acid (N3), which can be excited by UVA irradiation to generate highly reactive intermediates able to form a covalent bond with G4s, was reported by Verga et al. (2014). Six different PDC derivatives were obtained by combining the two photoalkylating moieties and three linkers differing by size and nature: two diaminoalkyl chains with four (C4) and eight atoms (C8) and a diaminopoly(ethylene glycol) (PEG) chain (Chart 9a). The G4 binding affinity of the compounds was first evaluated by G4-FID assay using c-myc and the human telomeric sequence 22AG as G4 targets. Globally, all the ligands retained high binding affinity and excellent selectivity versus dsDNA, although the benzophenone in PDC-C4-Bp slightly affected the interaction. Afterwards, the ability of the compounds to form covalent bonds with G4 substrates was confirmed, and high photochemical yields were obtained at low excess of ligand (ligand/G4 = 2.5). All derivatives displayed higher reactivity in K+-rich buffer with a strong dependency on the nature of the linker: the highest alkylation yields were observed for C4 and PEG derivatives (26–36% in K+, 13–29% in Na+ conditions), whereas C8 derivatives showed negligible reactivity in both conditions. The stark difference in reactivity observed for C8 compounds was attributed to the highly lipophilic nature of the spacer, which may position the alkylating moiety away from the hydrated DNA structure. Competition experiments carried out with 22AG and 8 molar equiv. of dsDNA or a scrambled G-rich sequence confirmed that the reactivity was G4 structure-driven and identified PDC-C4-Bp and PDC-C4-N3 as the most selective compounds, suggesting that a short linker restrains molecular motions and efficiently promotes alkylation. Alkylation sites were identified using two different sequencing methodologies: the thermally reversible nature of the benzophenone adducts allowed their analysis by 3′-exonuclease digestion, which stops at alkylated sites. Conversely, the stability of the azide products allowed sequencing by alkaline treatment, which induces cleavage on the 5′ side of alkylated bases. Remarkably, Bp derivatives selectively alkylated thymine nucleobases in the loops of 22AG, whereas N3 compounds preferentially alkylated the guanine nucleobases G10 and G14 located on the 5′ external quartet (Fig. 19a). Similar studies were conducted in the presence of the c-myc sequence and, quite unexpectedly, no reactivity was observed for the Bp derivatives, whereas N3 derivatives generated a mixture of both retarded and accelerated migrating bands, which displayed alkylation of nucleobases located in the external quartets and the loops (Fig. 19b). Photocytotoxicity experiments conducted in MCF7 and A549 cell lines identified PDC-C4-Bp as the most active compound. In conclusion, this work described PDC derivatives able to selectively alkylate G4 structures in vitro after photochemical activation and highlighted the important interplay between the nature of the alkylating moiety, the spacer, and the topology of the G4 structure in determining the alkylation pattern, suggesting the possibility of developing specific G4 ligands preferentially targeting certain G4-forming domains.

Chart 9 (a) Photocrosslinking probes designed and synthesized by Teulade-Fichou et al. (2014). (b) Photocleaving ligands studied by Kerwin et al. (2015)

Fig. 19 Schematic representation of the G4 structures used by Verga et al. (2014). (a) Two examples of G4 structures formed by the human telomeric sequence in K+- and Na+-rich buffers and (b) c-myc oncogene. In each structure, characterized alkylation sites are shown

Following a similar idea, Kerwin et al. (2015) designed and synthesized a family of PDCs with DNA photocleavage properties by introducing a photocleavage agent (benzophenone, naphthalenediimides, and anthraquinones) at position 4 of the central pyridine through CuAAC coupling (Chart 9b). Unfortunately, most of the derivatives showed poor binding properties towards G4 and no compound behaved as an efficient photocleavage agent. Other research groups focused on the development of intrinsically reactive alkylating G4 ligands based on the PDC platform. In 2016, Bombard et al. reported on the study of a new family of trans N-heterocyclic carbene (NHC)–platinum(II) complexes conjugated to PDC (NHC-Pt-PDC, Chart 10a) to selectively target telomeric G4s (Betzer et al. 2016). G4-FID assay highlighted that the presence of the NHC-Pt moiety only moderately affected the binding to 22AG and c-kit2 and instead strongly affected the binding to CEB25 and c-myc sequences; however, the selectivity over dsDNA was retained. These results, together with the well-known telomere aberrations produced in cells by 360A, prompted the authors to investigate the platination reaction of the ligand with telomeric quadruplexes. Remarkably, NHC-Pt-PDC was able to platinate telomeric G4-DNA in vitro without the need to perform a ligand exchange at platinum, suggesting that compound binding to G4 establishes a proximity between the metal and the coordination site, favoring the metalation reaction.
Additionally, the PDC core completely governed the reactivity and G4 selectivity of the conjugate: sequencing experiments identified metalation sites on both the loops and the guanines of the external quartets. Cell experiments with ovarian cancer cell lines (A2780 and A2780cis) showed that the presence of the NHC-Pt moiety improved cellular uptake and increased DNA binding, as determined by ICP-MS experiments. Notably, DNA adducts produced by NHC-Pt-PDC are less toxic than the ones produced by the NHC-Pt moiety alone, leading the authors to infer a more efficient repair pathway for the generated monofunctional products. Cell cycle progression analysis did not show any noticeable cell cycle arrest in S or G2/M phase for either PDC or NHC-Pt-PDC. The telomeric effects produced by the former can be associated with delocalization of TRF2, one of the proteins essential for telomere maintenance; therefore, the authors quantified the amount of TRF2 bound to telomeres before and after treatment with the ligand. Remarkably, PDC did not produce any effect in these conditions, whereas NHC-Pt-PDC was able to induce a 50% decrease in the number of TRF2 foci (Fig. 20). Although the NHC-Pt moiety improved TRF2 delocalization from telomeres, the latter did not reach the critical level required to induce telomere shortening and TERRA transcription upregulation. In conclusion, combining PDC with the NHC-Pt group allowed the authors to obtain a ligand displaying a synergistic mechanism of action: PDC drives the selective targeting of G4 located on the telomeres, and NHC-Pt induces metal coordination and TRF2 delocalization.

Chart 10 (a) Trans N-heterocyclic carbene–platinum(II) complex conjugated to 360A (NHC-Pt-PDC), synthesized and studied by Bombard et al. (2016). (b) Red-light photoactivatable G4 ligands PDC-A and PDC-B developed by Madder et al. (2021a). (c) Chlorambucil-conjugated 360A (PDC-C4-C) studied by Gabelica et al. (2019)

Fig. 20 TRF2 delocalization from the telomeres revealed by immunostaining with anti-TRF2 antibody in A2780 cells, untreated and treated with PDC 360A and NHC-Pt-PDC. (Adapted with permission from Betzer et al. (2016). Copyright 2016 American Chemical Society)

In 2021, Madder et al. (2021a) proposed a bimolecular approach for G-quadruplex alkylation by exploiting an interesting feature of pro-reactive furans, which can be oxidized into a reactive keto-enol with singlet oxygen (generated in situ by photoirradiation of a photosensitizer) and then react with exocyclic amino groups of DNA nucleobases. The authors designed compound PDC-A, bearing a pro-reactive furan moiety linked to the PDC core via a long spacer to maximize G4 binding properties and selectivity (Chart 10b). To validate their approach, they used the c-kit proto-oncogene G4 and induced oxidation of PDC-A with methylene blue, a G4-binding photosensitizer capable of producing ¹O₂ through red-light irradiation. HPLC analysis showed a 55% conversion with the formation of one product corresponding to the monoalkylation adduct generated from PDC-A and c-kit, as confirmed by MALDI-TOF. The systematic substitution of the c-kit loop nucleobases with unreactive thymines did not modify the HPLC profile, suggesting direct alkylation of the guanines located in the tetrads; these results were also supported by molecular docking. Compound PDC-B, bearing a shorter linker, showed much lower alkylation efficiency due to reduced spatial proximity with the reactive sites. The authors concluded that the PDC functionalization did not undermine G4 binding properties and that the use of a G4-binding photosensitizer allowed the local production of ¹O₂, avoiding competitive oxidative damage and inducing efficient alkylation of G4 structures. Lastly, Gabelica et al. exploited a chlorambucil-functionalized PDC (PDC-C4-C, Chart 10c) to validate a new mass spectrometry methodology known as electron photodetachment (EPD), able to identify the binding regions of G4 ligands by top-down sequencing of the targeted oligonucleotide sequences (Paul et al. 2019). PDC-C4-C alkylation of 22AG was studied by PAGE analysis, and the identified alkylation sites were corroborated by the EPD method, which found the ligand able to replace a potassium cation (cf. section “In Vitro Binding: Affinity, Selectivity, and Ligand-Induced Conformation Changes”) and to bind close to the central loop of the 4-repeat human telomeric sequence.

Immunotagged G4 Ligands

Fluorescently labelled G4 ligands have been employed to study compound distribution in cells. However, the structural modification introduced by the fluorophore may affect physicochemical properties and biological activity. Consequently, the CuAAC reaction has been employed for in situ functionalization to attach a fluorophore to a G4 ligand of interest in cells after target recognition. Despite the improvement provided by this approach (cf. section “Fluorescent Derivatives for In Vitro Detection and Cellular Imaging of G-Quadruplexes”), it does not overcome the detection limit associated with the resolution of classical fluorescence microscopy. Therefore, Verga et al. proposed a new visualization strategy based on specific recognition by G4 ligands coupled to antibody-based signal amplification, named G4-ligand Guided Immunofluorescence Staining (G4-GIS) (Masson et al. 2021). To achieve this goal, the authors synthesized four hapten-modified PDC conjugates and four PDC CuAAC precursors used for in situ functionalization (Chart 11). To validate the ability of the compounds to bind G4 structures, FRET melting and G4-FID experiments were conducted in the presence of six G4 DNA-forming sequences (22AG, c-Myc 22G14T-G23T, c-kit2, CEB25wt, CEB25L111T, and 22CTA). The data indicate that PDC CuAAC precursors induce G4 stabilization (29 °C ≤ ΔTm ≤ 31 °C) and TO displacement (0.16 μM ≤ DC50 ≤ 0.41 μM) comparable to 360A, while 5-BrdU-PDC conjugates stabilize (20 °C ≤ ΔTm ≤ 23 °C) and displace TO (0.33 μM ≤ DC50 ≤ 1.34 μM) in a slightly lower range. An interaction study with two G4 RNAs (TERRA and NRAS) showed a very similar trend. Notably, the ligand functionalization did not affect the selectivity towards G4 structures as compared to dsDNA and tRNA. The efficiency of the CuAAC reaction in situ was validated in the presence of three G4 conformations: 22AG as an example of hybrid structure, and c-Myc and TERRA as examples of parallel G4-DNA and RNA structures. Without exception, the G4 structure behaves as a template during CuAAC and brings the reaction partners into spatial proximity, increasing the reaction kinetics and reaching up to 100% conversion after 1 h in the presence of the two parallel G4s. In contrast, 22AG behaves as a less efficient platform for the click reaction, mainly for derivatives PDC-4,2-Alk and PDC-4,3-Alk, suggesting that the presence of the lateral loops may hinder the reactivity of the shorter spacer PDC CuAAC precursors.

Chart 11 Structures of the PDC CuAAC precursors and 5-BrdU PDC conjugates obtained by in situ CuAAC reaction reported by Verga et al. (2021)

Then the authors assessed whether the anti-5-BrdU antibody was able to recognize 5-BrdU-PDC once bound to G4 targets by devising a solid-phase enzyme-linked immunoassay (ELISA). PDC-4,3-BrdU was chosen as a reference compound for this assay and a panel of G4 and control sequences was used: hTelo, c-Myc, TERRA, a hairpin-duplex (hp), two mutated non-G4 forming sequences 22Agmut and TERRAmut, and a C-rich sequence ihTelo. Notably, the method allowed the detection of PDC-4,3-BrdU exclusively when bound to the three G4 structures and the determination of binding constant (Kd) values lying in the low nanomolar range (Fig. 21a). The proof of concept for the G4-GIS method was then established in A549 cells, where the generation of a fluorescent signal was observed for both the conjugates and their in situ functionalized counterparts. These experiments revealed preferential cytoplasmic accumulation of the ligand in defined regions and poor nuclear localization. Additionally, RNase A treatment identified RNA as the main target. Notably, the signal amplification provided by the antibodies increased spatial sensitivity and allowed the detection of perinuclear accumulation of PDC-4,3-Alk with the generation of well-defined fluorescent foci (Fig. 21b). In the last part of the study, the authors performed experiments in the presence of a G4-specific antibody (BG4) and, quite unexpectedly, found that the presence of a G4 ligand bound to a specific G4 conformation may affect BG4 recognition and consequently fluorescent foci generation (Fig. 21c); these results were corroborated by ELISA, which showed significant variation of BG4 affinity to hTelo and c-Myc when bound by PDC-4,3-Alk, PDS, or PhenDC3. Finally, the G4-GIS methodology revealed that colocalization experiments between BG4 and the anti-5-BrdU antibody cannot be performed, owing to the spatial proximity of the substrates of the two primary antibodies and the strong dependency of secondary antibody binding. The synthesis of the new family of immuno-tagged PDC derivatives, pursued by Verga et al., provided the basis for applying the G4-GIS methodology and its amplified fluorescent signal to study the distribution of G4 ligands in cellular subcompartments in more depth and, more generally, to improve the spatial detection of biologically active small molecules in cells.

Fig. 21 (a) (left) Principle of the method and (right) binding curves determined by an adapted ELISA assay, (b) G4 ligand immunofluorescence staining in A549 cells treated with 15 μM PDC-4,3-Alk after copper-based click reaction, and (c) column scatter plots representing the number of BG4 foci before and after treatment with 5 and 15 μM of PDC-4,3-Alk. (Copyright 2021, Oxford University Press (Masson et al. 2021))

Other Bisquinolinium Derivatives as G4 Ligands

Dimeric Derivatives

Several dimeric derivatives of PDC 360A (Chart 12) were independently described by Bugaut et al. (2018) and Zhou et al. (2018). The dimer (360A)2A synthesized by Bugaut et al. contains two 4-amino-PDC units linked by an aliphatic bis(carboxamide) chain. Its interaction with mono- and multimeric G4 structures adopted by the human telomeric sequence was studied by CD spectroscopy in isothermal conditions. The authors demonstrated that, while 360A interacted with the monomeric, antiparallel G4 substrate with a 2:1 (ligand/G4) stoichiometry, suggesting the binding of two ligand species to the external G-tetrads, the dimer (360A)2A interacted with this substrate with a 1:1 stoichiometry that could indicate the formation of a “sandwich”-like complex, with two PDC units of the ligand stacking with the top and bottom G-tetrads. However, in the case of multimeric G4 substrates, the binding stoichiometry of (360A)2A exceeded one ligand per G4 unit, suggesting that new binding sites, presumably corresponding to TTA linkers connecting the G4 units, were available for binding of the ligand. The authors interpreted these data in terms of an original binding model (Fig. 22a) where two PDC moieties of each ligand embraced the external G-tetrads of each G4 unit, while additional ligand species were accommodated in the pockets formed between every two contiguous G4 units. The high affinity of (360A)2A for telomeric G4s resulted in an efficient in vitro displacement of human replication protein A (hRPA), a single-stranded DNA-binding protein involved in telomere replication, from both monomeric and trimeric G4 substrates, significantly exceeding the effect of the prototype ligand 360A (Saintomé et al. 2018).

Chart 12 Dimeric derivatives of PDC 360A described by Bugaut et al. (2018) and Zhou et al. (2018)

Fig. 22 Proposed binding modes of dimeric PDC derivatives to multimeric telomeric G-quadruplexes. (a) Model proposed by Bugaut et al. for binding of (360A)2A. (Adapted with permission from Saintomé et al. (2018). Copyright 2018 The Royal Society of Chemistry). (b) Model proposed by Zhou et al. for binding of 67.3b. (Reproduced with permission from Liao et al. (2018). Copyright 2018 Wiley-VCH)

The effect of (360A)2A on telomere stability was assessed by single telomere length analysis (STELA). While neither 360A nor (360A)2A treatment significantly modified the mean telomere length in A549 cells, both ligands increased the frequency of rare short telomeres (2.2 kb) resulting from telomere deletion events (TDEs) (Hwang et al. 2019). Specifically, the frequency of TDEs in cells treated with (360A)2A at a concentration of 5 μM was increased 3.7–3.9-fold compared to non-treated cells and 2.6–2.8-fold compared to DMSO-treated cells, whereas 360A (used at the same concentration) was less efficient (2.6–3.0-fold and 1.9–2.1-fold increase of TDEs compared to non-treated and DMSO-treated cells, respectively). These results highlight the advantages of the dimeric scaffold in terms of targeting telomeric DNA and are consistent with the stronger effect of (360A)2A on hRPA binding discussed above. Of note, the antiproliferative effects of 360A and (360A)2A in A549 cells were similar, leading to a growth arrest after 11 days of treatment with 5 μM of compounds (Hwang et al. 2019).

PDC dimers linked by ethylene glycol linkers (67.3a–67.3c) were also designed for targeting multimeric G-quadruplexes formed by the telomeric sequence (Liao et al. 2018). Interestingly, compound 67.3a, featuring the shortest linker, stabilized neither the monomeric nor the dimeric antiparallel G-quadruplexes; in contrast, dimers 67.3b and, to a lesser extent, 67.3c stabilized the dimeric antiparallel G-quadruplex in CD-melting experiments (ΔTm = 21.6 and 17.5 °C, respectively), albeit less efficiently than 360A employed at a twofold higher concentration accounting for the number of PDC moieties (ΔTm = 23.8 °C, at a ligand/G4 unit ratio of 3:1). Most importantly, dimer 67.3b (but not 67.3c) showed a marked selectivity for the dimeric G4 substrate, as its thermal stabilization effect on the monomeric telomeric G4 or monomeric parallel quadruplexes (c-myc, c-kit1, c-kit2) was negligible. Similar selectivity of 67.3b for binding to the dimeric G4 was evidenced by the values of binding constants obtained from spectrophotometric titrations, which differed by almost two orders of magnitude (Ka = 4.3 × 10⁷ M⁻¹ for the dimeric vs. 7.0 × 10⁵ M⁻¹ for the monomeric G4). Based on fluorimetric measurements with a dimeric G4 substrate containing fluorescent 2-aminopurine residues at selected positions, the authors proposed a binding model where the PDC units of two 67.3b species were stacked with the opposite G-tetrads of two contiguous G4 units (Fig. 22b), in a fashion quite different from the model proposed for the binding of (360A)2A. This difference may be due to a shorter linker in compound 67.3b, which does not allow the sandwiching of a G4 unit by the ligand. Compound 67.3b inhibited telomerase in the TRAP-LIG assay more efficiently than 360A (IC50 = 5.0 vs. 9.0 μM, respectively). However, the cytotoxicity of 67.3b in both HeLa and MCF-7 cells (GI50 = 9.5 and 23.9 μM, respectively) was somewhat lower compared with 360A (GI50 = 6.4 and 8.8 μM upon 96-h incubation), suggesting that larger molecular size and higher charge may challenge the transport of dimeric bisquinolinium compounds into cells. Altogether, these studies demonstrate that dimerization of the PDC scaffold can lead to ligands with remarkably high affinity and selectivity towards multimeric G-quadruplexes formed by the telomeric sequence; however, these advantages can be offset by poor cellular uptake and poor drug-like properties that may hamper therapeutic applications of these compounds.
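
The selectivity of 67.3b quoted above (binding constants differing by almost two orders of magnitude) can be checked with a short calculation. The following Python sketch is purely illustrative: the two Ka values are those reported in the text, whereas the function names and the temperature of 298.15 K are assumptions made here (the titration temperature is not stated in this section), so the free-energy figure should be read as an order-of-magnitude estimate only.

```python
import math

# Gas constant in kJ mol^-1 K^-1
R_KJ = 8.314462618e-3


def selectivity_ratio(ka_target: float, ka_reference: float) -> float:
    """Ratio of two association constants (dimensionless)."""
    return ka_target / ka_reference


def delta_delta_g(ka_target: float, ka_reference: float,
                  temperature_k: float = 298.15) -> float:
    """Difference in binding free energy, in kJ/mol, from ddG = -RT ln(ratio).

    The default temperature is an assumption, not a value from the source.
    """
    return -R_KJ * temperature_k * math.log(ka_target / ka_reference)


# Binding constants of 67.3b reported in the text (M^-1)
ka_dimeric = 4.3e7    # dimeric telomeric G4
ka_monomeric = 7.0e5  # monomeric telomeric G4

ratio = selectivity_ratio(ka_dimeric, ka_monomeric)
log_units = math.log10(ratio)
ddg = delta_delta_g(ka_dimeric, ka_monomeric)

print(f"Selectivity: {ratio:.0f}-fold ({log_units:.2f} log units)")
print(f"Estimated ddG at 298.15 K (assumed): {ddg:.1f} kJ/mol")
```

The ratio comes out at roughly 60-fold, that is, somewhat under two log units, which agrees with the wording "almost two orders of magnitude".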

Variations of the PDC Core Following the variations of the core motif connecting the two quinolinium (or other quaternized heterocyclic) units that were explored in the early stages of G4 ligand development (cf. section “Genesis and Design of Bisquinolinium Ligands: The Preorganization Concept”) (Riou et al. 2002), a series of 1,8-naphthyridine derivatives including two bisquinolinium (3AQN, 6AQN) and one bis-pyridinium dicarboxamide (3APN, Chart 13) compounds were developed by Pradeepkumar et al. in 2012 (Pradeepkumar et al. 2012). 3AQN can be considered as an intermediate between PDC 360A and PhenDC3, both in terms of chemical structure and G4-binding properties. Indeed, 3AQN is more efficient than 360A for binding to antiparallel telomeric G4-DNA, as demonstrated by both FID assay and CD-melting experiments in Na+-rich buffer (Fig. 23). Similar to what was described for 360A and PhenDC3 (Marchand et al. 2015), interaction of 3AQN and 6AQN with the hybrid form of telomeric G4 in K+-rich buffer leads to a conformation change into an antiparallel form. Interestingly, 3AQN was about twofold more selective than 360A for telomeric G4, as compared to several promoter (parallel) G4-DNA substrates. Conversely, the bis-pyridinium derivative 3APN is significantly less efficient as a ligand for telomeric G-quadruplexes, a fact that highlights once again the advantages of quinolinium units for efficient G4 binding. Interestingly, 3APN showed significant binding and thermal stabilization of several parallel G4-DNA (c-myc, c-kit1,

Chart 13 Structures of naphthyridine derivatives developed by Pradeepkumar et al. (2012)


D. Verga et al.

Fig. 23 (a) FID assay of naphthyridine derivatives and 360A with telomeric G4-DNA. (b) CD melting curves of telomeric DNA in the presence of 3 molar equiv. of ligands. (Adapted with permission from Dhamodharan et al. (2012). Copyright 2011 American Chemical Society)

c-kit2), presumably through a different binding mode, demonstrating that even small changes in ligand structure may impact the binding preferences of G4 ligands. Similarly to 360A, compounds 3AQN and 6AQN induced telomere shortening in the malaria parasite Plasmodium falciparum and showed an anti-parasitic effect at low micromolar concentrations, without causing significant toxicity in human HEK293 cells (Anas et al. 2017). A series of bisquinolinium 4,7-diamino-1,10-phenanthroline-2,9-dicarboxamides 71.7a–k and 71.9a–c, representing core-substituted derivatives of PhenDC3 and differing by the nature of side-chain substituents (Chart 14), was studied by Ulven et al. in 2012 (Larsen et al. 2012). The derivatives 71.7a–e lacking ammonium groups in side chains were less efficient G4 binders than PhenDC3, a fact that the authors rationalized by an increased electron density of the phenanthroline moiety stemming from the electron-donating amino substituents and leading to less efficient π-π stacking of the phenanthroline core with the G-tetrad. Conversely, derivatives 71.7f–k bearing two additional quaternary ammonium groups, as well as compounds 71.9a–c endowed with guanidinium groups in side chains, turned out to be excellent G4 ligands that seemed to outperform PhenDC3 in terms of thermal stabilization of both telomeric (in K+ conditions) and promoter (c-kit2 and c-myc) G4 targets, as evidenced by FRET-melting experiments; unfortunately, a direct comparison with PhenDC3 under the conditions of this study was not performed. Nevertheless, most of these derivatives demonstrated lower G4 vs. duplex specificity as characterized by the values of selectivity index S, with only 71.7k reaching the same level of specificity as PhenDC3 (S ≥ 0.95 with all three G4 substrates). 
The authors attribute the high potency of 71.7k to the conformation of the amino groups in positions 4 and 7 of the phenanthroline core, twisted out of the phenanthroline plane and thereby reducing the electron-donating effect that appears detrimental for G4 binding. Similar to other dicarboxamide ligands, 71.7k was shown to induce the conformational switching of the human telomeric G-quadruplex from hybrid to antiparallel form. Interestingly, despite being a strong G4 ligand, compound 71.7k was significantly less cytotoxic than PhenDC3


Targeting Quadruplex Nucleic Acids: The Bisquinolinium Saga


Chart 14 Bisquinolinium phenanthroline derivatives studied by Ulven et al. (2012)

in HT1080 and HeLa cells, a fact that can be attributed to increased cationic charge leading to poor cellular uptake. The bis-indole scaffold endowed with quinolinium units was extensively explored in the design of G4 ligands by Chorell et al. In their first work, Sabouri, Chorell et al. harnessed the 2,2′-diindolylmethane fragment to design two series of derivatives differing by the attachment point of 3-carboxamido-1-methylquinolinium moieties to the bis-indole core (Chart 15a), as well as one derivative with “inverted” orientation of amide groups (72.17) (Livendahl et al. 2016). Using FID assay with Thioflavin T as indicator dye, the authors found that derivatives 72.6a, 72.6b, and 72.17 bound to parallel G4-DNA (Pu24T derived from the c-myc promoter) with an efficiency similar to that of PhenDC3 (DC50 = 0.2–0.9 μM); conversely, derivatives 72.13a and 72.13b bearing bulky phenyl substituents were much less efficient (DC50 = 7–8 μM). Similar, albeit less efficient, binding was observed to a parallel G4-DNA formed by a fragment of ribosomal DNA (rDNA) from the yeast S. pombe. In contrast, these derivatives showed much weaker binding to telomeric (hybrid form) G4-DNA, revealing an interesting intra-G4 selectivity. Of note, the substitution pattern of the diindolylmethane fragment (72.6a vs. 72.6b) or even the inversion of the carboxamide link (72.17) had minor impact on the G4-binding properties; in fact, 72.17 represents the most active ligand of this series. This contrasts with the pyridine derivatives, where the inversion of the amide bond (i-PDC, Chart 3b) leads to a dramatic loss of G4 affinity (De Cian et al. 2007). Upon further exploration of bis-indole derivatives, Chorell et al. studied two series of bisquinolinium carboxamides based on either rigid (74.11a–j) or flexible


Chart 15 Bisquinolinium derivatives of (a) 2,2′-diindolylmethane and (b) 3,3′-diindolylmethane studied by Sabouri and Chorell (2016, 2020)

(74.12a–c) 1,1′-carbonyldiindole cores (Chart 16) (Prasad et al. 2018). In this work, the authors directly employed a Taq DNA polymerase stop assay to identify the ligands selectively blocking the progression of DNA polymerase on DNA templates containing a G4-forming motif (rDNA from S. pombe) (Fig. 24). Most bis-indoles selectively inhibited the polymerase progression on the G4-forming template; however, ligands 74.12a–c, based on the rigid scaffold, also inhibited polymerase on a non-G4-containing template. From this experiment, six compounds (indicated by red arrows in Fig. 24b) were selected for a detailed biophysical assessment. These derivatives were found to moderately stabilize human telomeric and ribosomal


Chart 16 Rigid (74.11a–j) and flexible (74.12a–d) bisquinolinium derivatives of 1,1′-carbonyldiindole studied by Chorell et al. (2018)

Fig. 24 Screening of G4 ligands by Taq polymerase stop assay used by Chorell et al. (2018). (a) Sequences of ribosomal G4 and a non-G4 DNA templates and primers (red). (b) Effect of ligands on the synthesis of the full-length polymerase product with G4 and non-G4 DNA templates, assessed by PAGE. The compounds selected for detailed studies are shown with red arrows. (Adapted with permission from Prasad et al. (2018). Copyright 2018 Wiley-VCH)

(parallel) G4-DNA in CD-melting experiments (ΔTm = 8–17 °C, with 8 molar equiv. of ligands) and were characterized by apparent Kd values in the range of 0.5–2 μM (from FID titration data), with the flexible derivatives (74.11a, 74.11c, 74.11e) generally showing higher affinity than their rigid counterparts (74.12a, 74.12b, 74.12d). With the support of molecular dynamics simulations, the authors proposed


that flexible bis-indole scaffolds could better adapt to the G4 structure, resulting in more favorable interactions and higher activity in most of the assays. Another series of bis-indole derivatives developed by Sabouri and Chorell was based on the 3,3′-diindolylmethane scaffold (Chart 15b) (Prasad et al. 2020). Through a screening by FRET-melting and Taq polymerase stop assays (both with the Pu24T sequence), the authors identified the derivatives 73.7ba and 73.7ca as the most efficient G4 ligands of this series (ΔTm = 17.5 and 10.4 °C, respectively, at a 5 μM concentration; IC50 = 0.17 and 0.24 μM in the polymerase stop assay). In contrast, the analogues 73.7aa and 73.7da, as well as the derivatives containing a non-methylated quinoline, benzothiazole, piperazine or morpholine side chains (not shown) were inactive. Inspired by these results, the authors synthesized two non-symmetrically substituted derivatives (73.5bac and 73.7bac, Chart 15b). While the former turned out to be inactive due to the presence of a non-quaternized quinoline residue, the quinolinium derivative 73.7bac was almost as active as the symmetric bisquinolinium 73.7ba (ΔTm = 12.4 °C, IC50 = 0.19 μM in identical conditions). The intra-G4 selectivity of 73.7ba and 73.7bac was assessed by FRET melting with a panel of 11 different DNA structures and revealed that these compounds could stabilize most G4-DNA substrates, with a clear preference for the parallel G4 structures c-kit1 and c-kit2 and a good selectivity over double-stranded DNA (Fig. 25). Finally, to investigate the effects of conformational locking on G4 binding of bis-indole derivatives, Chorell et al. developed a series of macrocyclic derivatives incorporating a 3,3′-diindolylmethane fragment and two quinolinium residues, additionally connected at positions 6 or 7 with a bis(carboxamide) linker (Chart 17a) (Das et al. 2020). 
FRET-melting experiments revealed that macrocycles 75.14a1, 75.14a2, and 75.14a3 linked at positions 6 of quinolinium units were efficient G4-DNA stabilizers, with ΔTm values of up to 20 °C (at 5 μM of ligands), which

Fig. 25 Selectivity profiling of 73.7ba and 73.7bac by FRET-melting with a panel of DNA substrates (radial plot of ΔTm values obtained with ligand concentration of 2 μM) (Prasad et al. 2020)


Chart 17 (a) Macrocyclic bisquinolinium bis-indoles; (b) model macrocyclic bisquinolinium (75.17) and the acyclic analogue (75.16) studied by Chorell et al. (2020)

is comparable to the acyclic prototype 73.7ba (Chart 17b); at the same time, the G4 vs. duplex selectivity of macrocycles was significantly better than that of compound 73.7ba, which induced a non-negligible stabilization of the duplex (ΔTm = 8 °C). Remarkably, the G4 stabilization effect of the derivatives 75.14a2–75.14a4 was inversely proportional to the length of the linker connecting the quinolinium residues, and the presence of amino groups in the linker (75.15a) did not bring any clear advantage. In contrast, the macrocycles connected through positions 7 of quinolinium moieties (75.14b1–75.14b4) were poor G4 binders. To verify whether macrocyclization, as a general rule, improves G4-binding properties of ligands, the authors synthesized a simple bisquinolinium macrocycle 75.17 and compared it with the acyclic analogue 75.16 (Chart 17b). Indeed, while G4-binding properties of the macrocycle 75.17 (ΔTm = 7.5 °C with 8 μM of ligand, measured by FRET melting


with Pu24T sequence) were quite modest in comparison with those of benchmark ligands (cf. Fig. 1), they were clearly superior to those of the acyclic ligand 75.16 (ΔTm = 2.5 °C in identical conditions). These observations confirm the value of macrocyclization as an effective strategy to optimize G4 ligands in terms of their affinity to the target and G4 vs. duplex selectivity.

Variations in Linker Groups and Quinolinium Residues

A small series of ligands termed “bisaryldiketenes” (although unrelated to the highly reactive diketene) was studied by Huang et al. (Peng et al. 2010); these compounds contain two quinolinium moieties linked through vinyl groups to the central acyclic (M1) or cyclic (M3–M4) ketone core (Chart 18a). These ligands, in particular M2 and, to a lesser extent, M1, efficiently stabilized several G4-DNA structures as evidenced by FRET-melting experiments (ΔTm = 17–24 °C); interestingly, compound M4 featuring an additional quaternary nitrogen atom in the core part was the

Chart 18 Structures of (a) “bisaryldiketenes” M1–M4 and (b) bisquinolinium divinylbenzenes (BQ) studied by Huang et al. (2010, 2013). (c) Bisquinolinium divinylbenzene studied by Czerwinska and Juskowiak (2012)


least efficient ligand of the series. All compounds demonstrated an excellent G4 vs. duplex selectivity and were able to arrest Taq polymerase on a DNA template harboring a G4-forming sequence from the c-myc promoter, with IC50 values of about 1 μM for M1–M3. The interaction of ligand M2 with several G4-DNA substrates was studied by ITC experiments that revealed a major enthalpic contribution in the case of binding to parallel quadruplexes; in contrast, binding to the telomeric quadruplex was mostly entropy-driven, suggesting a different binding mode. In another study, Huang et al. designed two series of bisquinolinium ligands based on 1,3- (m-BQ) or 1,4-divinylbenzene (p-BQ) scaffolds (Chart 18b); in addition, side chains were introduced through amino substituents in position 4 of quinolinium units (except for compounds m/p-BQ-4O) (Liu et al. 2013). FRET-melting experiments revealed that m-BQ derivatives were more efficient stabilizers of both telomeric (in K+ conditions) and parallel (c-myc, c-kit1) G4 substrates than p-BQ analogues, with ΔTm values of up to 20 °C (at 1 μM of ligand). Intriguingly, the FID assay revealed an opposite trend, that is, p-BQ derivatives were more efficient in displacing the fluorescent probe (with DC50 values down to 0.22 μM) than their meta-substituted counterparts. In both assays, the compounds m/p-BQ-4O devoid of amine substituents were much less effective, demonstrating the “added value” of additional cationic charges in the side chains. Most good ligands of the series were able to induce a conformational change of the human telomeric G-quadruplex from the hybrid form (in K+ conditions) into a parallel fold. In addition, compound p-BQ-OHetP induced a similar change in Na+ conditions, indicative of a particularly strong affinity of the ligand to the parallel G4 fold. 
The authors proposed that this property is due to the linear shape of p-BQ derivatives, leading to steric clashes with lateral or diagonal loops present in hybrid or antiparallel G4 structures. Of note, these conformational changes are opposite to those induced by PDC 360A, PhenDC3, 3AQN, and other bisquinolinium dicarboxamides that favor the antiparallel G4 structure (cf. section “In Vitro Binding: Affinity, Selectivity, and Ligand-Induced Conformation Changes”). Czerwinska and Juskowiak studied the DNA binding of several arylstilbazolium derivatives including a structurally related bisquinolinium divinylbenzene 78.1 (Chart 18c) (Czerwinska and Juskowiak 2012). Competition dialysis experiments demonstrated that (E,E)-78.1 preferentially bound to the antiparallel G4 structure formed by the human telomeric sequence (Kapp = 1.3 × 10⁵ M⁻¹), whereas the isomer (E,Z)-78.1 was selective for the parallel G4-DNA formed by the c-myc sequence (Kapp = 0.9 × 10⁵ M⁻¹). For both isomers, the affinity to other DNA structures (single-stranded, duplex, and triplex) was lower. Interestingly, while irradiation of (E,E)-78.1 with 450-nm light in the absence of DNA led to a fast photoisomerization into the (E,Z)-isomer, the addition of telomeric G4-DNA resulted in a complete suppression of the photoisomerization process (Fig. 26). The authors proposed that, in addition to the binding selectivity shifting the photostationary equilibrium towards the (E,E)-form, interaction with G4-DNA leads to a rapid deactivation of the excited state of the ligand through electron transfer to guanine residues, resulting in inefficient E → Z photoisomerization.


Fig. 26 Photoisomerization of (E,E)-78.1 (10 μM) in the absence and the presence of telomeric quadruplex (20 μM) monitored with HPLC. (Reproduced with permission from Czerwinska and Juskowiak (2012). Copyright 2012 Elsevier B.V.)

Chart 19 Structures of (a) bisquinolinium derivatives of 1,3-ditriazolylbenzene and (b) PDC derivatives with a variable substitution pattern of (iso)quinolinium units studied by Paulo et al. (2019, 2021b); (c) phenanthroline derivatives studied by Danac et al. (2021)

Finally, Paulo et al. studied G4-DNA binding and anti-cancer properties of two ligands containing N-methylquinolinium residues bound to a 1,3-benzene core through flexible (79.1b) or rigid (79.1d) triazole linkers (Chart 19a) (Mendes et al. 2019). These compounds were moderately efficient as G4 stabilizers in the FRET-melting assay (ΔTm = 12–16 °C at 5 μM ligand), with the rigid ligand 79.1d showing higher stabilization and G4 vs. duplex selectivity. Both ligands, in particular 79.1d, were cytotoxic in several cancer cell lines. Interestingly, 79.1d was also moderately active towards cancer stem cell (CSC)-enriched HT29 culture (GI50 = 10.6 μM), demonstrating the promising anti-cancer properties of G4 ligands.


The same team also recently explored the influence of the substitution pattern of quinolinium (or isoquinolinium) residues on G4-binding properties in a small series of PDC isomers 80.2b–80.2d (Chart 19b) (Cadoni et al. 2021b). All three isomers were able to stabilize promoter (k-RAS, c-myc) and telomeric G-quadruplexes in FRET-melting and CD-melting experiments, albeit with a much lower efficiency than PDC 360A as illustrated by the following activity trend: 360A > 80.2d > 80.2c > 80.2b. The loss of G4 stabilizing effect was particularly pronounced in the case of parallel G4-DNA: e.g., ΔTm (c-myc) = 35.7, 11.8, 8.0, and 2.7 °C for 360A, 80.2d, 80.2c, and 80.2b, respectively (CD-melting with one molar equiv. of ligands). The reasons for this behavior could not be understood, since molecular dynamics simulations instead predicted compound 80.2b to be the most efficient G4 binder of the series. In parallel, Danac et al. aimed at a similar investigation of a structure–properties relationship in the series of phenanthroline dicarboxamides (Chart 19c); unfortunately, due to synthetic difficulties, only a few compounds, including bispyridinium 81.6b and a monomethylated quinolinium derivative 81.6d, could be obtained (Craciun et al. 2021). G4-binding properties of these derivatives were only rudimentarily studied; thus, both compounds were shown to stabilize human telomeric G4-DNA in CD-melting experiments (ΔTm = 24 and 15 °C, respectively, in the presence of 450 μM of ligand). However, these conditions represent an unusually large excess of ligand that minimizes the significance of the obtained values. The corresponding non-methylated precursors were even less potent. Considering the fact that the synthesis of cationic dicarboxamides is often troublesome, Granzhan et al. introduced the cationic bis(acylhydrazone) motif as an alternative scaffold for modular G4 ligands. 
In their first work, the authors synthesized 20 cationic bis(acylhydrazones) differing by the nature of the core heterocycle (Ar1) and lateral quinolinium or pyridinium units (Ar2, Chart 20), and studied them as potential ligands of G4-RNA formed in EBNA1 mRNA from Epstein–Barr virus (Reznichenko et al. 2019). Biophysical assessment (by FRET-melting and FID assays) revealed that the heterocyclic core plays a crucial role in G4-binding properties of the ligands, as demonstrated by the following activity trend: PhenDH > NaphDH ≈ PyDH ≫ PymDH. The poor G4-binding properties of pyrimidine derivatives (PymDH) were rationalized on the basis of crystallographic analysis that revealed a linear shape, in contrast to V-shaped PyDH or U-shaped PhenDH derivatives whose structure was maintained through intramolecular hydrogen bonds between the heterocyclic core and coordinated water molecules (Fig. 27). Interestingly, within each sub-family of ligands, the derivatives with lateral pyridinium residues (PyDH1, PymDH1, etc.) systematically demonstrated less efficient binding to G4 structures compared with quinolinium analogues, once again highlighting the importance of quinolinium groups for efficient π-stacking with G-quadruplexes. Conversely, the substitution pattern of quinolinium groups (1,4- vs. 1,6-) or the nature of side chains (R in Chart 20) had minimal impact on G4-binding properties of bis(acylhydrazones); at the same time, these substituents strongly influenced the physico-chemical properties (such as aqueous


Chart 20 Structures of cationic bis(acylhydrazones) developed by Teulade-Fichou et al. (2019)

Fig. 27 Solid-state structures of PyDH1 (2I⁻) · 2 H2O, PymDH1 (2I⁻) · 2 H2O · MeCN, and PhenDH1 (2I⁻) · 2 H2O · MeCN from single-crystal X-ray diffraction analysis. Non-bound water, acetonitrile molecules, and counter-ions are omitted for clarity; green lines and labels indicate intermolecular hydrogen bonds with crystallized water molecules

solubility) and drug-likeness of G4 ligands. Altogether, the most efficient compounds from the series (PyDH2 and PhenDH25) were almost as efficient G4 ligands as PhenDC3, demonstrating that the bis(acylhydrazone) scaffold is a suitable motif for the design of G4 ligands. The biological activity of bis(acylhydrazones) was assessed in a cellular model recapitulating the G4-mediated immune evasion of Epstein–Barr virus. Specifically, the expression of viral EBNA1 protein is believed to be regulated through the interaction of the host protein nucleolin with G4 structures formed in EBNA1 mRNA. Ligands that bind to this G4-RNA outcompete nucleolin and relieve the inhibitory effect of the latter on mRNA translation, leading to increased expression of the highly antigenic EBNA1 protein that reveals infected cells to the


immune system. Two bis(acylhydrazones), PyDH2 and PhenDH2, significantly enhanced EBNA1 expression in transfected H1299 cells to a level comparable with the one induced by PhenDC3 treatment, while being considerably less toxic (PyDH2: GI50 > 100 μM, PhenDH2: GI50 > 50 μM). Thus, these compounds represent promising drug candidates for interfering with the immune evasion of Epstein–Barr virus (Granzhan et al. 2020) and other herpesviruses (Zheng et al. 2022). An interesting feature of acylhydrazone derivatives is the reversibility of their synthesis in the presence of suitable catalysts in aqueous solutions, allowing the formation of dynamic combinatorial libraries (DCLs) of putative ligands. The composition of such libraries is sensitive to the presence of external species (targets) capable of binding the highest-affinity ligands, which can be subsequently isolated and identified. Taking advantage of this property, Granzhan et al. developed a dynamic combinatorial chemistry approach to identify G4 ligands from DCLs of acylhydrazones (Reznichenko et al. 2021). In this method, a set of building blocks is incubated with a biotinylated G4-forming oligonucleotide (or a control sequence) in the presence of a catalyst enabling the formation of multi-component DCLs (Fig. 28a). A subsequent pull-down using STV-coated magnetic beads, followed by stringent washing, ligand release, and HPLC analysis, allows the ligands to be ranked according to their affinity to the target. The observed ranking of pulled-down ligands was in good agreement with their G4 stabilizing effects determined through FRET-melting experiments; e.g., PyDH2 (or A2-L1-A2) and NaphDH2 (or A2-L1-A2) were identified as the best ligands for parallel G4-DNA (Pu24T) from DCL1 (Fig. 28a, b). Unfortunately, this method is not free from bias, chiefly due to the difficulties related to the release of the strongest ligands (with Kd values in the low-nanomolar range) and non-specific binding of ligands to the beads. 
Thus, upon analysis of more complex DCLs, the authors identified non-symmetrical bis(acylhydrazone) A2-L1-A5 as a promising ligand for Pu24T quadruplex (Fig. 29a). However, FRET-melting and native mass spectrometry analysis performed with its synthetically accessible analogues (85.5a and 85.5b) demonstrated that these derivatives, while being good G4 ligands, failed to outperform the parent compound PyDH2 (A2-L1-A2). Interestingly, most acylhydrazone ligands strongly stabilized the telomeric G-quadruplex (25TAG), with ΔTm values of up to 30 °C (at 1 μM of ligands, Fig. 29b). Easy synthetic accessibility, modular design, and high stability make bisquinolinium bis(acylhydrazones) a privileged scaffold in the quest for novel generations of biologically active G4 ligands. Altogether, the results summarized in this Part demonstrate that, despite massive efforts invested by numerous teams into the optimization of G4-binding properties of ligands, the potency and selectivity of the state-of-the-art scaffolds represented by PDC 360A and PhenDC3 are difficult to surpass. Nevertheless, the structural variations explored in these works undoubtedly contribute to a better understanding of structure–properties relationships of G4 ligands and can be considered as important steps towards fine-tuning of intra-G4 selectivity and biological activity of these scaffolds, opening new avenues for G4 ligands as chemical probes and drug candidates.


Fig. 28 Dynamic combinatorial chemistry approach to identification of G4 ligands developed by Granzhan et al. (Reznichenko et al. 2021). (a) Generation of a model dynamic combinatorial library of acylhydrazones (DCL1); (b) relative amounts of DCL1 components isolated via pull-down with biotinylated oligonucleotides and streptavidin-coated magnetic beads; (c) thermal stabilization of Pu24T and a hairpin control (hp2) by selected derivatives

[Fig. 29 graphic, panels A and B; recoverable labels: compounds A2-L1-A2, A2-L1-A5, A2-L4-A2, A5-L1-A5, 85.5a, 85.5b; DNA substrates 25TAG, Pu24T, 22CTA, myc22, hp2; axis ticks 0, 10, 20, 30]

Fig. 29 (a) Structures of advanced acylhydrazone ligands identified from DCL analysis; the non-symmetric bis(acylhydrazone) A2-L1-A5 (inaccessible through preparative synthesis) was emulated by two carboxamide/acylhydrazone hybrids 85.5a and 85.5b. (b) Selectivity profiling of these ligands by FRET-melting experiments. (Adapted with permission from Reznichenko et al. (2021). Copyright 2021 The Royal Society of Chemistry)

Conclusion

An enormous amount of work has been done over the past 20 years on the design and engineering of G4-targeting compounds, and the enthusiasm created by this topic is illustrated by the numerous reviews published to date, which could not be cited herein due to space limitations. Consequently, the binding modes and the rules that govern association of small molecules to G4 structures have been well established, and several easy-to-implement biophysical assays have been developed to evaluate the binding thermodynamics of the reported chemical series. In the future, newcomers to the field are invited to carefully characterize the G4-binding properties of the new compounds, and in particular to carry out systematic and robust comparison with the top five benchmark compounds (PhenDC3, PDS, PDC 360A, BRACO19, RHPS4), which are currently commercially available. Without this rigorous approach, the plethora of G4 ligands will continue to grow without providing real and significant improvement to the field. Targeting G4 in cells poses great challenges due, in part, to their dynamic and transient nature and their formation both genome- and transcriptome-wide. Hence, the capacity of many G4 ligands to reach G4 targets in a cellular context is still an open question. The very broad cellular profiles of the “gold standard” ligands PDS and PhenDC3, as revealed by transcriptomic analysis, illustrate this complexity and the difficulty of translating in vitro G4-binding thermodynamics to in vivo effects. For instance, not all G4 ligands have the same capacity to interfere with the endogenous G4 proteins, which undoubtedly underlie their biological response. In addition, since most G4 ligands are lipophilic cations, they may partly accumulate in mitochondria under the influence of non-specific physico-chemical driving forces, contributing to their cellular phenotypes in a


non-G4-related manner. These issues and challenges explain in part the slow pace of development of therapeutic applications of G4 ligands. It is clear that, in the future, the development of robust chemical tools for high-resolution imaging and high-efficiency capture of G4 structures in live cells will be of utmost importance for the field.

References

Amrane S, Adrian M, Heddi B, Serero A, Nicolas A, Mergny J-L, Phan AT (2012) J Am Chem Soc 134(13):5807–5816
Anas M, Sharma R, Dhamodharan V, Pradeepkumar PI, Manhas A, Srivastava K, Ahmed S, Kumar N (2017) Biochemistry 56(51):6691–6699
Artese A, Parrotta L, Alcaro S, Ortuso F, Costa G, Sissi C (2013) Open J Med Chem 3(2):41–49
Aznauryan M, Søndergaard S, Noer SL, Schiøtt B, Birkedal V (2016) Nucleic Acids Res 44(22):11024–11032
Aznauryan M, Noer SL, Pedersen CW, Mergny J-L, Teulade-Fichou M-P, Birkedal V (2021) Chembiochem 22(10):1811–1817
Bertrand H, Granzhan A, Monchaud D, Saettel N, Guillot R, Clifford S, Guédin A, Mergny J-L, Teulade-Fichou M-P (2011) Chem Eur J 17(16):4529–4539
Betzer J-F, Nuter F, Chtchigrovsky M, Hamon F, Kellermann G, Ali S, Calméjane M-A, Roque S, Poupon J, Cresteil T, Teulade-Fichou M-P, Marinetti A, Bombard S (2016) Bioconjug Chem 27(6):1456–1470
Bharti SK, Sommers JA, George F, Kuper J, Hamon F, Shin-ya K, Teulade-Fichou M-P, Kisker C, Brosh Jr RM (2013) J Biol Chem 288(39):28217–28229
Bončina M, Podlipnik Č, Piantanida I, Eilmes J, Teulade-Fichou M-P, Vesnaver G, Lah J (2015a) Nucleic Acids Res 43(21):10376–10386
Bončina M, Hamon F, Islam B, Teulade-Fichou M-P, Vesnaver G, Haider S, Lah J (2015b) Biophys J 108(12):2903–2911
Bonnat L, Bar L, Génnaro B, Bonnet H, Jarjayes O, Thomas F, Dejeu J, Defrancq E, Lavergne T (2017) Chem Eur J 23(23):5602–5613
Bonnat L, Dautriche M, Saidi T, Revol-Cavalier J, Dejeu J, Defrancq E, Lavergne T (2019) Org Biomol Chem 17(38):8726–8736
Cadoni E, Manicardi A, Fossépré M, Heirwegh K, Surin M, Madder A (2021a) Chem Commun 57(8):1010–1013
Cadoni E, Magalhães PR, Emídio RM, Mendes E, Vítor J, Carvalho J, Cruz C, Victor BL, Paulo A (2021b) Pharmaceuticals 14(7):669
Chambers VS, Marsico G, Boutell JM, Di Antonio M, Smith GP, Balasubramanian S (2015) Nat Biotechnol 33:877
Chung WJ, Heddi B, Tera M, Iida K, Nagasawa K, Phan AT (2013) J Am Chem Soc 135(36):13495–13501
Chung WJ, Heddi B, Hamon F, Teulade-Fichou MP, Phan AT (2014) Angew Chem Int Ed 53(4):999–1002
Cian AD, Mergny J-L (2007) Nucleic Acids Res 35(8):2483–2493
Cian AD, Cristofari G, Reichenbach P, Lemos ED, Monchaud D, Teulade-Fichou M-P, Shin-ya K, Lacroix L, Lingner J, Mergny J-L (2007) Proc Natl Acad Sci U S A 104(44):17347–17352
Craciun A-M, Rotaru A, Cojocaru C, Mangalagiu II, Danac R (2021) Spectrochim Acta A Mol Biomol Spectrosc 249:119318
Czerwinska I, Juskowiak B (2012) Int J Biol Macromol 51(4):576–582
Das RN, Andréasson M, Kumar R, Chorell E (2020) Chem Sci 11(38):10529–10537
De Cian A, DeLemos E, Mergny J-L, Teulade-Fichou M-P, Monchaud D (2007) J Am Chem Soc 129(7):1856–1857


De Cian A, Lacroix L, Douarre C, Temime-Smaali N, Trentesaux C, Riou J-F, Mergny J-L (2008) Biochimie 90(1):131–155
Deiana M, Jamroskovic J, Obi I, Sabouri N (2020) Chem Commun 56(91):14251–14254
Dhamodharan V, Harikrishna S, Jagadeeswaran C, Halder K, Pradeepkumar PI (2012) J Org Chem 77(1):229–242
Drosopoulos WC, Kosiyatrakul ST, Schildkraut CL (2015) J Cell Biol 210(2):191–208
Frelih T, Wang B, Plavec J, Šket P (2020) Nucleic Acids Res 48(4):2189–2197
Ghosh S, Tan F, Yu T, Li Y, Adisa O, Mosunjac M, Ofori-Acquah SF (2011) PLoS One 6(3):e18399
Ghosh A, Trajkovski M, Teulade-Fichou M-P, Gabelica V, Plavec J (2022) Angew Chem Int Ed 61(40):e202207384
Granotier C, Boussin FOD (2011) Vengrova S (ed) DNA Repair and Human Health, vol 22. Intechopen Ltd UK, pp 559–596
Granotier C, Pennarun G, Riou L, Hoffschir F, Gauthier LR, De Cian A, Gomez D, Mandine E, Riou J-F, Mergny J-L, Mailliet P, Dutrillaux B, Boussin FD (2005) Nucleic Acids Res 33(13):4182–4190
Granzhan A, Martins RP, Fåhraeus R, Blondel M, Teulade-Fichou M-P (2020) Neidle S (ed) Annual Reports in Medicinal Chemistry, vol 54. Academic Press, pp 243–286
Gray LT, Puig Lombardi E, Verga D, Nicolas A, Teulade-Fichou M-P, Londoño-Vallejo A, Maizels N (2019) Cell Chem Biol 26(12):1681–1691.e5
Gueddouda NM, Mendoza O, Gomez D, Bourdoncle A, Mergny J-L (2017) Biochim Biophys Acta Gen Subj 1861(5, Part B):1382–1388
Halder K, Largy E, Benzler M, Teulade-Fichou M-P, Hartig JS (2011) Chembiochem 12(11):1663–1668
Halder R, Riou J-F, Teulade-Fichou M-P, Frickey T, Hartig JS (2012) BMC Res Notes 5(1):138
Hamuro Y, Geib SJ, Hamilton AD (1997) J Am Chem Soc 119(44):10587–10593
Hittinger A, Caulfield T, Mailliet P, Bouchard H, Mandine E, Mergny J-L, Guittat L, Riou J-F, Gomez D, Belmokhtar C (2004) Pat WO2004072027 (A2):2004-08-26.2004
Hunter CA, Purvis DH (1992) Angew Chem Int Ed 31(6):792–795
Hwang IP, Mailliet P, Hossard V, Riou J-F, Bugaut A, Roger L (2019) Molecules 24(3):577
Kotar A, Kocman V, Plavec J (2020) Chem Eur J 26(4):814–817
Largy E, Hamon F, Teulade-Fichou M-P (2012) Methods 57(1):129–137
Larsen AF, Nielsen MC, Ulven T (2012) Chem Eur J 18(35):10892–10902
Lecours MJ, Marchand A, Anwar A, Guetta C, Hopkins WS, Gabelica V (2017) Biochim Biophys Acta Gen Subj 1861(5, Part B):1353–1361
Lefebvre J, Guetta C, Poyer F, Mahuteau-Betzer F, Teulade-Fichou MP (2017) Angew Chem Int Ed 56(38):11365–11369
Liao T-C, Ma T-Z, Liang Z, Zhang X-T, Luo C-Y, Liu L, Zhou C-Q (2018) Chem Eur J 24(59):15840–15851
Liao T-C, Ma T-Z, Chen S-B, Cilibrizzi A, Zhang M-J, Li J-H, Zhou C-Q (2020) Int J Biol Macromol 158:1299–1309
Liu Z-Q, Zhuo S-T, Tan J-H, Ou T-M, Li D, Gu L-Q, Huang Z-S (2013) Tetrahedron 69(24):4922–4932
Livendahl M, Jamroskovic J, Ivanova S, Demirel P, Sabouri N, Chorell E (2016) Chem Eur J 22(37):13004–13009
Mailliet P, De Lemos E, Caulfield T, Mandine E, Petitgenet O, Renou E, Belmokhtar C, Mergny JL, Guittat L, Gomez D, Riou JF (2003) 94th AACR Meeting. LB28. Washington, DC, USA
Marchand A, Granzhan A, Iida K, Tsushima Y, Ma Y, Nagasawa K, Teulade-Fichou M-P, Gabelica V (2015) J Am Chem Soc 137(2):750–756
Marchand A, Beauvineau C, Teulade-Fichou M-P, Zenobi R (2021) Chem Eur J 27(3):1113–1121
Masson T, Landras Guetta C, Laigre E, Cucchiarini A, Duchambon P, Teulade-Fichou M-P, Verga D (2021) Nucleic Acids Res 49(22):12644–12660
McBrayer D, Kerwin SM (2015) Molecules 20(9):16446–16465

830

D. Verga et al.

Mendes E, Cadoni E, Carneiro F, Afonso MB, Brito H, Lavrado J, dos Santos DJVA, Vítor JB, Neidle S, Rodrigues CMP, Paulo A (2019) ChemMedChem 14(14):1325–1328 Mendoza O, Gueddouda NM, Boulé J-B, Bourdoncle A, Mergny J-L (2015) Nucleic Acids Res 43(11):e71–e71 Neidle S (2016) J Med Chem 59(13):5987–6011 Obi I, Rentoft M, Singh V, Jamroskovic J, Chand K, Chorell E, Westerlund F, Sabouri N (2020) Nucleic Acids Res 48(19):10998–11015 Paudel BP, Moye AL, Abou Assi H, El-Khoury R, Cohen SB, Holien JK, Birrento ML, Samosorn S, Intharapichai K, Tomlinson CG, Teulade-Fichou M-P, González C, Beck JL, Damha MJ, van Oijen AM, Bryan TM (2020) eLife 9:e56428 Paul D, Marchand A, Verga D, Teulade-Fichou M-P, Bombard S, Rosu F, Gabelica V (2019) Analyst 144(11):3518–3524 Peng D, Tan J-H, Chen S-B, Ou T-M, Gu L-Q, Huang Z-S (2010) Bioorg Med Chem 18(23): 8235–8242 Piazza A, Boulé J-B, Lopes J, Mingo K, Largy E, Teulade-Fichou M-P, Nicolas A (2010) Nucleic Acids Res 38(13):4337–4348 Piazza A, Adrian M, Samazan F, Heddi B, Hamon F, Serero A, Lopes J, Teulade-Fichou M-P, Phan AT, Nicolas A (2015) EMBO J 34(12):1718–1734 Pillet F, Romera C, Trévisiol E, Bellon S, Teulade-Fichou M-P, François J-M, Pratviel G, Leberre VA (2011) Sensors Actuators B Chem 157(1):304–309 Prasad B, Jamroskovic J, Bhowmik S, Kumar R, Romell T, Sabouri N, Chorell E (2018) Chem Eur J 24(31):7926–7938 Prasad B, Das RN, Jamroskovic J, Kumar R, Hedenström M, Sabouri N, Chorell E (2020) Chem Eur J 26(43):9561–9572 Prasad B, Doimo M, Andréasson M, L’Hôte V, Chorell E, Wanrooij S (2022) Chem Sci 13(8): 2347–2354 Prorok P, Artufel M, Aze A, Coulombe P, Peiffer I, Lacroix L, Guédin A, Mergny J-L, Damaschke J, Schepers A, Cayrou C, Teulade-Fichou M-P, Ballester B, Méchali M (2019) Nat Commun 10(1):3274 Puig Lombardi E (2019) Conséquences de la stabilisation des G-quadruplex (G4) dans le génome humain; une approche multi-omique. [Doctoral dissertation, Paris Sciences et Lettres (ComUE)]. 
Theses.fr Puig Lombardi E, Londoño-Vallejo A (2019) Nucleic Acids Res 48(1):1–15 Puig Lombardi E, Holmes A, Verga D, Teulade-Fichou M-P, Nicolas A, Londoño-Vallejo A (2019) Nucleic Acids Res 47(12):6098–6113 Rajendran A, Endo M, Hidaka K, Thao Tran PL, Teulade-Fichou M-P, Mergny J-L, Sugiyama H (2014) RSC Adv 4(12):6346–6355 Renaud de la Faverie A, Hamon F, Di Primo C, Largy E, Dausse E, Delaurière L, Landras-Guetta C, Toulmé J-J, Teulade-Fichou M-P, Mergny J-L (2011) Biochimie 93(8):1357–1367 Reznichenko O, Quillévéré A, Martins RP, Loaëc N, Kang H, Lista MJ, Beauvineau C, GonzálezGarcía J, Guillot R, Voisset C, Daskalogianni C, Fåhraeus R, Teulade-Fichou M-P, Blondel M, Granzhan A (2019) Eur J Med Chem 178:13–29 Reznichenko O, Cucchiarini A, Gabelica V, Granzhan A (2021) Org Biomol Chem 19(2):379–386 Riou JF, Guittat L, Mailliet P, Laoui A, Renou E, Petitgenet O, Mégnin-Chanet F, Hélène C, Mergny JL (2002) Proc Natl Acad Sci U S A 99(5):2672–2677 Rocca R, Talarico C, Moraca F, Costa G, Romeo I, Ortuso F, Alcaro S, Artese A (2017) Chem Biol Drug Des 90(5):919–925 Saha A, Duchambon P, Masson V, Loew D, Bombard S, Teulade-Fichou M-P (2020) Biochemistry 59(12):1261–1272 Saintomé C, Alberti P, Guinot N, Lejault P, Chatain J, Mailliet P, Riou JF, Bugaut A (2018) Chem Commun 54(15):1897–1900 Verga D, Hamon F, Poyer F, Bombard S, Teulade-Fichou M-P (2014) Angew Chem Int Ed 53(4): 994–998

24

Targeting Quadruplex Nucleic Acids: The Bisquinolinium Saga

831

Verga D, Hamon F, Nicoleau C, Guetta C, Wu T-D, Guerquin-Kern J-L, Marco S, Teulade-Fichou M-P, Mol J (2017) Biol Mol Imaging 4(1) Xie X, Reznichenko O, Chaput L, Martin P, Teulade-Fichou M-P, Granzhan A (2018) Chem Eur J 24(48):12638–12651 Yang P, De Cian A, Teulade-Fichou M-P, Mergny J-L, Monchaud D (2009) Angew Chem Int Ed 48(12):2188–2191 Zheng AJ-L, Thermou A, Guixens Gallardo P, Malbert-Colas L, Daskalogianni C, Vaudiau N, Brohagen P, Granzhan A, Blondel M, Teulade-Fichou M-P, Martins RP, Fahraeus R (2022) Life Sci Alliance 5(2):e202101232 Zyner KG, Mulhearn DS, Adhikari S, Martínez Cuesta S, Di Antonio M, Erard N, Hannon GJ, Tannahill D, Balasubramanian S (2019) eLife 8:e46793

Compound Shape and Substituent Effects in DNA Minor Groove Interactions

25

W. David Wilson and Ananya Paul

Contents

Introduction
AT Sequence-Specific MG Compounds That Can Also Bind at GC Sequences by Intercalation
Diversity in the Recognition of AT MG Sequences
Curvature Determination for MG Binders
Out-of-Shape DNA MG Binders: Inclusion of Interfacial Water for Induced Fit Interactions of Heterocyclic Dications with DNA
Development of Heterocyclic Amidine MG Binders with GC Recognition
Pyridine Compound Design
N-Alkyl-Benzimidazole-Thiophene Compound Design
Azabenzimidazole Compound Design
MG Binders with Additional GC BP-Binding Capability: Compounds with the Same GC Recognizing Modules
MG Binders with Additional GC BP-Binding Capability: Compounds with Different GC Recognizing Modules
MG Binders with Additional GC BP-Binding Capability: Compounds That Recognize the GGAA Sequence That Is Conserved in the PU.1 Promoter
Conclusion
References

Abstract

This chapter covers compounds ranging from AT-specific minor groove (MG) binders that can also intercalate to compounds that can bind to complex mixed AT and GC base pair sequences. Depending on the compound structure and DNA sequence, MG compounds can bind as monomers or dimers. AT-specific MG binding compounds that can assume a planar conformation, such as DAPI, can bind to GC sequences by intercalation but more weakly than as MG binders. MG binders generally have a concave shape that closely matches the convex surface at the floor of the MG. Linear compounds, however, can bind strongly in the MG if they can capture a terminal, interfacial water molecule that can complete the curvature of the compound. Such terminal water molecules can rapidly exchange with bulk water and form H-bonds between the MG binder amidine groups and DNA to account for their minor entropy cost and significant binding enthalpy. All initial MG binders were AT-specific, but many applications require broader sequence recognition capability. To accomplish this recognition, H-bond accepting groups were synthetically incorporated into designed MG binding modules to allow them to favorably interact with the G-NH group that projects into the MG. Pyridine was substituted for phenyl, azabenzimidazole for benzimidazole, and N-alkylbenzimidazole for benzimidazole. These modules would only bind a GC base pair in DNA if they were incorporated into compounds with appropriate structure and functional groups to recognize the MG. With molecules that can recognize complex mixed base-pair sequences, an entirely new chapter in the design of functional MG binders has been opened.

Keywords

DNA minor groove binder · Heterocyclic amidines · Sequence specificity · DNA microstructures · Molecular curvature · AT minor groove binders · Mixed base pair sequence · GC base pair specific · Therapeutic agents · African sleeping sickness · Cooperative dimer · Entropy and enthalpy of binding · Sigma-hole · aza-Benzimidazole · Surface plasmon resonance (SPR) · Isothermal titration calorimetry (ITC) · X-ray crystal · NMR · Molecular dynamic simulations · Competition ESI-mass spectroscopy · Interfacial water · Transcription factor · Water interactions in the DNA minor groove

W. D. Wilson (*) · A. Paul
Department of Chemistry and Center for Diagnostics and Therapeutics, Georgia State University, Atlanta, GA, USA

© Springer Nature Singapore Pte Ltd. 2023
N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_29

Introduction

Small molecules that bind strongly to nucleic acids are generally divided into two main classes, intercalators and minor groove (MG) binders, with major groove binders forming a minority class. In early studies to distinguish DNA intercalators from groove binding agents, such as the MG binder netropsin, Waring and coworkers used closed circular supercoiled DNA (Waring 1991). The DNA interactions can be monitored by hydrodynamic methods: for example, intercalators cause unwinding of closed circular DNA supercoils, but netropsin and other MG binders do not. The first identified DNA MG binders were all specific for binding to AT base pair (bp) sequences. They were uniformly concave-shaped compounds that fitted snugly into the narrow MG in A-tract sequences and had groups to H-bond with N3 of A and O2 of T on bp edges at the floor of the groove (Nguyen et al. 2009; Neidle 2001). In addition to forming an H-bond with the cytosine C=O in the MG, the 2-amino group of G projects an -N-H into the groove and presents a steric block to classical AT-specific MG binders (Nanjunda and Wilson 2012; Reddy et al. 2001).


Fig. 1 Structures of classical AT-specific DNA minor groove binders

The classical AT-specific agents were a mix of natural products, such as netropsin, and synthetic compounds, such as pentamidine, DAPI, berenil, furamidine (DB75), a variety of cyanine dyes, and Hoechst 33258 derivatives (Fig. 1), and they are very valuable as biotechnology and therapeutic agents (Barrett et al. 2013; Dickie et al. 2020; Wilson et al. 2008; Wenzler et al. 2013a; Hannah et al. 2005; Antony-Debré et al. 2017; Depauw et al. 2019). Due to the G-NH steric block and electrostatic differences in the MG, they all bind much more weakly to GC than to AT sequences. Interestingly, furamidine has been used in clinical trials against parasites that cause diseases such as African sleeping sickness (Paine et al. 2010) and remains a valuable tool in biotechnology (Sauer et al. 2017; Jenquin et al. 2018; Matthes et al. 2018). Pentamidine is still used against many human parasitic microorganisms, while berenil has been in use as an anti-trypanosome drug for livestock since the 1950s (Elamin et al. 1982; Soeiro et al. 2013; Ming et al. 2009). Other MG binders, such as DAPI, Hoechst 33258, and analogs, are valuable as cellular probes and nuclear stains (Gonen et al. 2022; Crowley et al. 2016).

In spite of the many successes with these early MG binders, the lack of compounds with broad sequence specificity was a limitation. The publication of a crystal structure of netropsin bound in the DNA MG of an AATT sequence by Dickerson and coworkers (PDB ID: 6BNA) provided ideas for the design of GC bp-specific MG binders (Kopka et al. 1985). Dickerson and Lown proposed separately that replacement of the pyrrole -CH that points into the MG by an unprotonated -N in an imidazole group could accept an H-bond from the G-NH in the minor groove to give a GC bp-specific compound (Lown et al. 1986; Lee et al. 1988; Kopka et al. 1997). Many polyamide compounds were synthesized by the Lown and other groups, and GC-specific binding was obtained (Dervan and Edelson 2003; Kawamoto et al. 2018; Crowley et al. 2003; Kiakos et al. 2015). Unfortunately, problems with polyamide synthesis, cell uptake, aggregation, and others have prevented their entry into clinical trials despite extensive study for the last 30+ years (Nozeret et al. 2018; Hargrove et al. 2012). Polyamides have, however, provided useful agents for biotechnology. These studies did clearly show that additional molecular structures, with the potential to bind to GC bps, were required for the continued development of MG binders as therapeutic agents.

AT Sequence-Specific MG Compounds That Can Also Bind at GC Sequences by Intercalation

In addition to their important biological effects, aromatic diamidines are important tools in molecular biology and cytochemistry (Paine et al. 2010; Sauer et al. 2017; Jenquin et al. 2018; Matthes et al. 2018). The fluorescence quantum yield of DAPI, for example, is very strongly enhanced by AT base-pair sequences (Härd et al. 1990), and it is widely used as a specific fluorescence marker for chromosomal DNA. Because MG, major groove, and intercalation binding modes had all been proposed for DAPI, causing some early confusion, we hypothesized that the binding mode could be sequence dependent. As can be seen in Fig. 1, DAPI has the aromatic-amidine torsional freedom, with hydrogen-bonding N-H groups on the inside edge of the molecule, typically seen with groove-binding compounds. DAPI can, thus, form hydrogen bonds to O2 of T and N3 of A in a minor groove complex as typically observed with MG binders. There is even a crystal structure of DAPI bound in the same AATT site as with netropsin (Larsen et al. 1989). These results leave no doubt that DAPI binds in the MG at AT sequences. Our question was, does DAPI bind at GC sequences and, if so, what is the binding mode?

To answer this question, a detailed experimental comparison of DAPI interactions with poly[d(A-T)]2 and poly[d(G-C)]2 was conducted. In NMR experiments, many aromatic proton signals for DAPI and DNA are resolved and can be monitored on complex formation. DAPI aromatic protons shift significantly upfield, as predicted for intercalation, in the presence of poly[d(G-C)]2, and the G and C base protons also shift upfield on complex formation (Fig. 2a). Imino proton signals for the AT and GC polymers can be monitored in H2O and show significantly different effects on complex formation with DAPI. The T-imino proton signal of poly[d(A-T)]2 shifts downfield (Fig. 2b), while the G-imino peak of poly[d(G-C)]2 shifts upfield on complex formation with DAPI (Wilson et al. 1990). These results are again as expected for intercalation at GC and MG binding at AT sequences.

In binding studies by UV spectral methods in 0.1 M NaCl, DAPI has a binding constant, KA, of approximately 10⁵ with the GC polymer, similar to the value for the intercalator ethidium, while it has a value about 100 times higher with the AT polymer, and the binding is highly cooperative (Wilson et al. 1990). The AT binding constant is similar to that of other MG binders of similar size. The higher binding constant at AT is a result of a much slower dissociation rate constant from AT sequences, where three H-bonds are formed, than from GC sites. These binding constants are reasonable since intercalation at GC requires the separation of stacked GC pairs with stacking of DAPI (Pullman and Jortner 1990). At the MG of AT, no large conformational change is required, and both amidines and the indole -NH form H-bonds with the edges of AT bps at the floor of the MG. The positive cooperative binding at extended AT sequences is probably due to small, propagated conformational changes in the groove that make it easier for compounds to bind after the first DAPI molecules are inserted in the MG.

Fig. 2 (a) and (b) Schematic plots of the DAPI aromatic proton signal chemical shift changes at GC sites and at AT sites at a ratio of 0.15 DAPI/base pair

Diversity in the Recognition of AT MG Sequences

In the search for modules that can selectively recognize specific DNA sites, we have investigated binding at different AT sequences that have quite different structures and/or properties.

Heterocyclic Diamidines That Recognize Some AT Sequences as Dimers

In early NMR studies of the binding of the polyamides netropsin and distamycin to a variety of DNA AT sequences, Wemmer and coworkers discovered that some AT sequences with wider MGs than A-tracts, such as -ATATAT-, could bind distamycin as a dimer but netropsin as a monomer (Pelton and Wemmer 1989; Rentzeperis et al. 1995). They noted that distamycin, with three pyrroles, has a better stacking surface than netropsin, with two pyrroles, and that distamycin is a monocation while netropsin is a dication. These features obviously predict that distamycin would form better-stacked dimers than netropsin (Mrksich et al. 1992). An AT sequence with a wide minor groove is -TTAA- (Fig. 3) which, for example, in CGTTAAGC has a predicted minor groove width (MGW) of almost 6 Å, while the MGW is below 4 Å in the CGAATTCG sequence isomer (Fig. 3) (Zhou et al. 2013). We found that a monocationic analog of the Hoechst 33258 bisbenzimidazole, DB183, bound to -TTAA- as a stacked dimer, but a structurally similar dication, DB185, bound as a monomer. In this case, because of the structural similarity, the difference had to be due to the charge of the stacked dimer, which would be dicationic for DB183 but tetracationic for DB185 (Fig. 4). The DB183 complex is a useful example of both DNA and compound molecular reorganization for structural complementarity in binding. The results with DB183 suggest that compounds could be designed to selectively recognize different AT sequences (Tanious et al. 2004). Both of the benzimidazole compounds bind strongly to -AATT- as monomers and are, thus, not as sequence selective as desired (Tanious et al. 2004). These observations led to two major questions: (i) are there compounds that bind strongly to -TTAA- as a dimer but do not bind well to -AATT-, and (ii) can any dications bind well to -TTAA- as a dimer?

Fig. 3 Minor groove width versus target DNA sequences calculated from the online algorithm of Rohs and coworkers (Nucleic Acids Res 46:2636–2647); * indicates the minor groove width of standard B-form DNA. Groove width gives perpendicular separation of helix strands drawn through phosphate groups, diminished by 5.8 Å to account for van der Waals radii of phosphate groups. The groove depth is also based on van der Waals radii

Fig. 4 (a) SPR sensorgrams for the interaction of DB183 with AATT and TTAA hairpin DNA; (b) Models and energetics for the formation of 1:1 and 2:1 complexes of DB183 with TTAA: space-filling models of the average structure of 1:1 and 2:1 complexes as well as the ΔG values for complex formation


All well-known AT-specific MG compounds (Fig. 1) bind better to -AATT- than to -TTAA-. To begin to answer the two questions experimentally, a range of compounds was designed, prepared, and screened for binding to a -TTAA- test site. This screening set of compounds allows exploration of structural, conformational, and substituent space of heterocyclic amidines to find motifs that are more favorable for binding to a -TTAA- binding sequence than to other AT sequences of similar length (Munde et al. 2010). Any discoveries will allow the improved design of agents with specificity for different AT binding sequences. Compounds with different specificity can also be used in combination to bind to appropriately long sections of target promoters and other sequences in chromosomes.

One concept for the design of compounds to recognize wider MG sequences, such as -TTAA-, is to make them too highly curved to bind well to the MG as monomers. Such compounds, however, might be able to assume a stacked conformation that could bind well to a wider groove (Fig. 3). A promising curved compound in which a furan could stack on the imidazole of benzimidazole for dimer formation is DB1003 (Fig. 5). All of the designed compounds were compared for binding to the -TTAA- and -AATT- sites. DB75, furamidine, is a well-studied AT-specific MG binder that has a KD near 100 nM with -AATT- sites. With -TTAA- DNA, however, DB75 has a significantly lower signal level in SPR experiments compared to -AATT- (Fig. 5a). The results for TTAA–DB75 binding indicate weak, nonspecific binding with approximately 100 times weaker binding affinity for the -TTAA- site. DB832 is a more highly curved compound, with a phenyl-to-furan modification, and it binds weakly to both -AATT- and -TTAA- sites compared to DB75 (Nanjunda et al. 2012). DB1003 has a phenyl-to-furan change from DB293, much like the conversion of DB75 to DB832. Surprisingly, DB1003 binds as a cooperative dimer with a high affinity to TTAA (Fig. 5b). DB1003 binds as a monomer with significantly weaker binding affinity to AATT, as expected from its curvature. With the TTAA sequence, DB1003 has a biphasic melting curve at ratios less than 2:1 (compound/hairpin duplex) and a monophasic curve at a 2:1 ratio, in agreement with dimer formation (not shown). The ΔTm was found to be 28.7 °C for the DB1003–TTAA complex, in agreement with strong binding as a dimer as observed in SPR (Fig. 5b). DB1003 has a strong positive CD curve with -TTAA- that indicates a MG binding mode of the dimer, as expected (Munde et al. 2010).

Fig. 5 SPR sensorgrams for binding of (a) DB75, (b) DB1003, and (c) DB293 to the AATT and TTAA binding sites at 25 °C. The RU values from the steady-state region of SPR sensorgrams were converted to r by r = RU/RUmax and are plotted versus the unbound compound concentration for DB75, DB1003, and DB293. The red lines represent a one-site equation fit for DB75, DB1003, and DB293 and the green lines represent a two-site equation fit for DB75, DB1003, and DB293. Experiments were conducted in Tris-HCl buffer (50 mM Tris-HCl, 100 mM NaCl, 1 mM EDTA, 0.05% P20, pH 7.4) at 25 °C

The experiments described above were all conducted with small DNA duplexes, and there is a question of whether these results are relevant to higher molecular weight DNAs. To address this issue, DNase I footprinting experiments with DNAs that contain both -TTAA- and A-tract binding sites were conducted (Munde et al. 2010). Results for DB75 and DB1003 were obtained with a radiolabeled DNA sequence that contains -TTAA-, -TATA-, -AATTA-, and -ATTT- sites of four or five AT base pairs. The results for the compounds are strikingly different and clearly illustrate the difference between classical MG binders and those that bind well to -TTAA-. DB75 gives a strong footprint with -AATTA- and -ATTT- but no significant protection at the -TTAA- or -TATA- sites, which is classical MG binder behavior. DB1003 gives exactly the opposite footprinting pattern: a strong footprint at -TTAA- but no significant footprint at -AATTA- or -ATTT- (Munde et al. 2010). Note that the selectivity for TTAA with DB1003 is observed even in the presence of competing AT binding sites that are generally quite favorable for MG compounds.

In summary, the important features for the DB1003 MG dimer are: (i) a favorable stacking surface provided by the benzimidazole–furan–furan pi system that can interact well with the wider MG in the -TTAA- site; (ii) benzimidazole –NH and amidine groups that provide H-bond donors in a stacked dimer to interact with the edges of TA base pairs at the floor of the groove; and (iii) curvature of the stacked dimer complex that matches the curvature of the -TTAA- MG. Finally, the large curvature of the DB1003 system decreases its binding strength to -AATT- sites.
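The one-site and two-site fits referred to in the Fig. 5 caption are not written out in the text. The Python sketch below shows the binding isotherms commonly used for steady-state SPR analysis with one or two ligands per DNA site; that these are the exact equations behind Fig. 5 is an assumption, and the function names and the constants in the usage note are hypothetical illustrations, not values from the chapter.

```python
def moles_bound_per_site(ru, ru_max):
    """Convert a steady-state SPR response to r = RU / RUmax (as in Fig. 5)."""
    return ru / ru_max


def one_site(k, c_free):
    """One-site isotherm: r = K*C / (1 + K*C); r saturates at 1."""
    return k * c_free / (1.0 + k * c_free)


def two_site(k1, k2, c_free):
    """Two-site (stepwise) isotherm with macroscopic constants K1 and K2.

    r = (K1*C + 2*K1*K2*C^2) / (1 + K1*C + K1*K2*C^2); r saturates at 2,
    i.e. a 2:1 compound/DNA complex such as a stacked dimer.
    """
    kc = k1 * c_free
    kkcc = k1 * k2 * c_free * c_free
    return (kc + 2.0 * kkcc) / (1.0 + kc + kkcc)
```

In this form a compound with K = 10⁷ M⁻¹ half-saturates a single site at 100 nM free compound, and a two-site curve in which K2 is much larger than K1 rises sigmoidally toward r = 2, which is the signature of positive cooperativity in dimer formation.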

Curvature Determination for MG Binders

As can be seen from the discussion above for -TTAA- binding dimers, molecular curvature, along with stacking surface, H-bonding groups, and compound charge, is a key feature for MG recognition. Appropriate curvature of monomer compounds is essential for strong H-bonding and charge interactions in the minor groove. There is no universally accepted method for calculating the curvature of minor groove binders. To address that issue, we have established a graphical approach for the determination of comparative molecular curvature values for heterocyclic diamidines. In this method, a compound of interest is first energy minimized in the SPARTAN software package. The compound is then imported into a graphics package such as PowerPoint and compared to other minor groove binders. In the next step for curvature determination, a reference circle is defined in the graphics package, and the circle is required to pass through both amidine carbons as illustrated in Fig. 6. Next, the reference circle must be adjusted to have a radius that allows it to pass as closely as possible through the center of each molecular unit of the entire molecule in addition to the two amidine carbons (Guo et al. 2021). This approach is illustrated with DB75 and DB1003 (Fig. 6). To calculate the relative curvature, two straight lines are drawn from the amidine carbons to the circle point at the center of the molecule. The angle between these two lines then defines a relative curvature value for each molecule. Analysis of a range of strongly binding MG compounds by this method has defined a calibration curvature value of 140–145° for compounds that bind strongly to the DNA MG. While this is a relative comparison, it is a very useful number to determine before synthesis of a new heterocyclic diamidine. It is also helpful in relative comparisons of compound–MG binding constants. Note that the procedure could be extended to MG binders with other structures for relative comparison of a set of compounds.

Fig. 6 Molecular curvature for selected minor groove binding compounds

Out-of-Shape DNA MG Binders: Inclusion of Interfacial Water for Induced Fit Interactions of Heterocyclic Dications with DNA

Figure 7 shows two isomeric biphenyl-benzimidazole diamidine derivatives, DB911 and DB921, which have complementary and fascinating MG binding results. The central meta-substituted phenyl of DB911 has a classical type curvature for MG binding (Fig. 7), similar to that of DB75 and related MG binding agents. However, the central para-substituted phenyl of DB921 has a much more linear shape (Fig. 8) that lacks the adequate radius of curvature to match the DNA MG shape. Based on the established library of MG binders, DB911 should bind similarly to DB75, but DB921 should bind very weakly to the DNA MG. Biosensor-SPR experiments, however, gave surprising results for the binding kinetics and affinities of these two compounds at an AATT binding site (Fig. 9). The binding affinity of DB921 (KA = 14.2 × 10⁷ M⁻¹) at the -AATT- MG is about 14 times greater than that of DB911. DB911 binds very similarly to DB75 (KA ~ 10⁷ M⁻¹), but DB921 binds more strongly than these classical compounds despite its linear shape (Miao et al. 2005). Isothermal titration calorimetry (ITC) thermodynamic analysis of DB921 and DB911 with an -AATT- binding site DNA sequence showed a more favorable binding enthalpy for DB921. The ΔH for binding of DB921 to the AATT site is −4.5 kcal/mol, while the value for DB911 is only −2.5 kcal/mol. The −TΔS contributions to the binding free energy are similar, −7.0 kcal/mol for DB921 and −7.5 kcal/mol for DB911 (Miao et al. 2005). The overall binding for both compounds is entropy-driven with the AATT site, but the more favorable binding enthalpy indicates stronger H-bond, electrostatic, and van der Waals interactions, leading to a higher binding affinity for DB921 (Miao et al. 2005).

Fig. 7 (a) Chemical structure of DB911 and the X-ray structure of the DB911-d(CGCGAATTCGCG)2 minor groove binding complex (PDB ID: 2NLM). The ball and stick model in purple-orange-blue (C-H-N) color scheme represents DB911. The DNA bases are represented in a cyan-white-red-blue-yellow (C-H-O-N-P) color scheme. The important interactions between different sections of the DB911-DNA complex are illustrated. One strong hydrogen bond (black dashed line) with a DNA base is shown; (b) Chemical structure of DB921 and the X-ray structure of the DB921-d(CGCGAATTCGCG)2 minor groove binding complex (PDB ID: 2B0K). The ball and stick model in green-orange-blue (C-H-N) color scheme represents DB921. The DNA bases are represented in a cyan-white-red-blue-yellow (C-H-O-N-P) color scheme. The important interactions between different sections of the DB921-DNA complex are illustrated. DB921 forms two strong, direct hydrogen bonds with DNA bases (black dashed line). DB921 is also bound to the AATT site with an interfacial water-mediated interaction between the phenyl-amidine of DB921 and DNA. The interfacial water (red, ball and stick, and black dashed lines) serves to complete the curvature of the bound compound

Fig. 8 Chemical structures of DB185, DB921, and DB911, with overlaid molecular curvatures for DB185, DB921, and DB911. Structure optimization of minor groove binding compounds was performed by using DFT/B3LYP theory with the 6-31+G* basis set in Gaussian 09

Fig. 9 Representative SPR sensorgrams for (a) DB911 and (c) DB921 in the presence of AATT hairpin DNA. (b) Steady-state binding plot for AATT with DB911. The data are fitted to a steady-state binding function using a 1:1 model to determine equilibrium binding constants. In (a) and (c), the solid black lines are best-fit values for global kinetic fitting of the results with a single site function

The X-ray crystal structure and molecular dynamics (MD) simulation structures facilitate understanding the binding affinities and thermodynamic differences for these two compounds. The X-ray structure of DB921 bound to an -AATT- site (PDB ID: 2B0K) by the Neidle group shows an interfacial water-mediated, noncovalent interaction between the phenyl-amidine of DB921 and DNA that serves to complete the curvature of the bound compound (Fig. 7b) (Nguyen et al. 2009; Miao et al. 2005).


The X-ray structure reveals that the terminal phenyl-amidine rises off of the groove floor due to the linear structure of DB921. However, the interfacial water molecule “rescued” DB921 as a potent MG complex by forming a bridge between the floor of the groove and the amidine, so that the optimum curvature was obtained. A phenylamidine –NH forms strong H-bonds to the interfacial water O, while an –OH group on the same water forms another strong H-bond to an N3 of dA. The interfacial water molecule in the DB921 binding site can adequately orient to provide favorable curvature to the DNA complex and interactions between the compound and DNA in a dynamic complex (Fig. 7b). The dynamic-flexible H-bonding ability of the bound water helps provide the high binding affinity of DB921 to the -AATT- binding site with a favorable enthalpy and minimum loss of entropy. The interfacial water acts as an H-bond donor and acceptor to connect DB921 and DNA bases noncovalently. The X-ray crystal structure also shows that, at the other end of DB921, the benzimidazoleamidine binds in a relatively classical manner and locks the compound at the -AATTsite of the MG. The amidine and BI-N-H form strong H-bonds between an O2 of T with an average of 2.9 Å bond length. The BI-C-H proton that faces into the groove also stabilizes the DB921-AATT complex near the floor of the MG. Aromatic protons carry a slight positive charge and can form stabilizing interactions with the N3 of A and O2 of T groups (Fig. 7b). The central phenyl makes van der Waals contact with the base edges and walls of the groove. Both amidines of DB921 are also stabilized by an external, extended terminal water network in the groove. The structural result indicates that interfacial water in an optimal site can convert a linear compound to a perfect match for DNA MG interactions (Fig. 7b) (Miao et al. 2005). 
A combination of factors (H-bonds, van der Waals contacts, favorable electrostatic interactions among DB921, water, and DNA, and release of water from the compound and of tightly bound water from the MG) contributes to the affinity. Small molecule complexes with DNA that incorporate an interfacial water molecule are rare, although such water molecules are quite common in DNA-protein complexes (Poon 2012a). The DB921-DNA complex provides a unique and well-defined system for analyzing water-mediated binding in the context of a DNA complex. The system also provides valuable information for incorporating water in the design of new lead scaffolds. It will be interesting to search through such nonstandard MG binders for other water-mediated strong complexes.

Development of Heterocyclic Amidine MG Binders with GC Recognition

Until recently, recognizing mixed base-pair sequences in the minor groove of DNA has largely continued to follow the original ideas with pyrrole and imidazole and related groups with amide linkers in polyamides (Lown et al. 1986; Lee et al. 1988; Kopka et al. 1997; Lown 1992; Dervan and Edelson 2003; Kawamoto et al. 2018; Crowley et al. 2003; Kiakos et al. 2015; Nozeret et al. 2018; Hargrove et al. 2012). Such compounds have an advantage of design simplicity, but have been plagued by solution and synthetic difficulties as described above. Heterocyclic amidine minor

25

Compound Shape and Substituent Effects in DNA Minor Groove Interactions


Fig. 10 Chemical structures of three new molecular modules and compounds that bind mixed DNA sequences

groove binding derivatives, on the other hand, are attractive for the development of GC bp-specific agents since they have good solution properties and cell uptake (e.g., berenil, DAPI, pentamidine, DB75, Fig. 1). Even more important, the AT-specific furan derivative, DB75 in Fig. 1, furamidine, has progressed to Phase III in human clinical trials against parasitic disease (Paine et al. 2010; Wenzler et al. 2013b; Thuita et al. 2012; Yang et al. 2014). The question then was how could these AT-specific agents be modified to include GC bp-specific binding groups? The first goal was to prepare compounds that could recognize a single GC bp in an AT sequence context. Many compounds were prepared and tested and three different modules were discovered that could specifically bind to the minor groove G-NH in a GC bp in an AT sequence. The modules, with examples in Fig. 10, are an N-alkylbenzimidazole with an adjacent thiophene (DB2429 and DB2457), a pyridine (DB2120, DB2447) in a specific synthetic context, and an azabenzimidazole (DB2277) (Paul et al. 2019). Brief chemical structure descriptions of how each GC-specific module was developed from a purely AT-specific binding compound are given below.

Pyridine Compound Design

For the lead pyridine GC binding compound, DB2447, the starting point was DB2119 (Fig. 11), a relatively large and very AT-specific binding agent (Paul et al. 2015a, 2017). DB2119 has an open central architecture where the central phenyl has flanking -CH2-O- linkages. A similar architectural linkage was used in our azabenzimidazole single GC bp binder (Fig. 10). This open type of linkage, as opposed to directly linked aromatic systems, is optimized to bind to the G-NH group that projects into the minor groove with the appropriate curvature to match the groove. Conversion of the central phenyl of DB2119 to a pyridine provided a strong single GC bp-binding module in DB2120 (Figs. 10 and 11) (Paul et al. 2015a, 2017). The change from a strong AT binding unit to GC recognition with a change in only a single molecular position is quite surprising and encouraging (Fig. 12c). It is similar to what was seen with the pyrrole to imidazole switch in netropsin and distamycin. The pyridine group


Fig. 11 Schematic representation of the development of AT-specific DNA binding compounds (DB2119 and DB2559) to mixed DNA sequence-specific compounds (DB2120 and DB2447)

Fig. 12 Representative SPR sensorgrams for (a) DB2120, and (b) DB2447 in the presence of AAAAGTTT and AAAGTTT, respectively. In (a), with AAAAGTTTT, the concentrations of DB2120 from bottom to top are 2, 3, 5, 7, 10, and 15. In (b), with AAAGTTT, the concentrations of DB2447 from bottom to top are 2, 5, 10, 15, and 20 nM. In (a) and (b), the solid black lines are best-fit values for global kinetic fitting of the results with a single site function; (c) comparison of equilibrium binding constants (KA, M⁻¹) of phenyl and pyridine analogs with pure AT base pair and mixed single GC base pair containing DNA sequences. “X” represents no measurable KA under our experimental conditions

accepts an H-bond from the G-NH2 and totally changes the recognition unit in DB2119. Because the terminal amidine-benzimidazole units are very strong and specific AT binding units, however, they adversely affected the GC binding specificity of DB2120. To deal with this issue, it was decided to eliminate the terminal benzimidazole groups of DB2120 to give a smaller derivative where the pyridine, GC bp recognition unit, was a more important part of the molecule (Fig. 11). DB2447 still binds strongly to a single GC base pair, as desired, but with better specificity than DB2120 (Fig. 12b and c). The ΔTm for DB2447 is 14 °C with an AAAGTTT recognition sequence and the KD determined by biosensor-SPR methods is 1.8 nM (Fig. 12b). The ΔTm and KD values with AAATTT are 9 °C and 374 nM, while with AAAGCTTT they are 6 °C and 58 nM, respectively. These results reveal the excellent affinity with good


specificity of DB2447. In an effort to improve the selectivity while maintaining strong binding, a number of derivatives of DB2447 were prepared. Many of these showed strong affinity for AAAGTTT but with poor selectivity. An exception was DB2448 (Fig. 10), with an amidine to imidazole conversion. This compound has slightly reduced ΔTm with AAAGTTT (11 °C) but less than 1 °C ΔTm with both pure AT and two GC bps DNA binding sequences. The five atom cyclic imidazoline derivative has a KD for AAAGTTT of 3.6 nM but a much reduced affinity for AAATTT with no binding detected under our conditions (Fig. 12c). The KD for the two GC bps binding site DNA sequence is 123 nM. Phenyl modification did not improve the binding selectivity and our best overall compound in the pyridine series is now DB2448.
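The text quotes dissociation constants (KD, in nM), whereas Fig. 12c compares association constants (KA, in M⁻¹). For a 1:1 complex the two are reciprocals, and the following minimal Python sketch makes the conversion explicit for the KD values quoted above. The function name and the dictionary layout are illustrative assumptions, not part of the original analysis.

```python
# Convert the dissociation constants (KD, nM) quoted in the text into
# association constants (KA, M^-1), assuming simple 1:1 binding (KA = 1/KD).

def kd_nM_to_ka(kd_nM: float) -> float:
    """Return KA in M^-1 for a 1:1 complex, given KD in nM."""
    if kd_nM <= 0:
        raise ValueError("KD must be positive")
    return 1.0 / (kd_nM * 1e-9)

# KD values (nM) as quoted in the text; None marks "no binding detected".
kd_values_nM = {
    ("DB2447", "AAAGTTT"): 1.8,
    ("DB2447", "AAATTT"): 374.0,
    ("DB2447", "AAAGCTTT"): 58.0,
    ("DB2448", "AAAGTTT"): 3.6,
    ("DB2448", "AAATTT"): None,
    ("DB2448", "two GC bps site"): 123.0,
}

ka_values = {
    key: (kd_nM_to_ka(kd) if kd is not None else None)
    for key, kd in kd_values_nM.items()
}

for (compound, site), ka in ka_values.items():
    label = "no measurable KA" if ka is None else f"KA = {ka:.2e} M^-1"
    print(f"{compound:7s} {site:16s} {label}")
```

On this reading, a KD of 1.8 nM corresponds to a KA in the mid 10⁸ M⁻¹ range, which is the scale on which the figure panels compare compounds.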

N-Alkyl-Benzimidazole-Thiophene Compound Design

DB818 (Fig. 1) is a very strong and specific AT binding compound, one of the strongest binders in its molecular weight range. As shown by X-ray crystal structures of its complex with -AATT- (Mallena et al. 2004; Wilson et al. 2005), it has the appropriate curvature and functional group positioning to match the curvature of an AT sequence MG surface (Fig. 13b). A closely related compound, DB293, with a furan in place of the thiophene is more curved and binds more weakly to AT sequences (Fig. 5c). DB818 is, thus, an ideal platform for modifications to develop GC binding specificity. N-Me alkylation of the benzimidazole group in DB818 gave DB2429 (Fig. 13). Although DB2429 has a very similar curvature to DB818 (Fig. 13b), it has an unprotonated nitrogen that faces into the MG in place of the benzimidazole -NH of DB818. The unprotonated nitrogen in DB2429 is a potential H-bond acceptor for the exocyclic -N-H of G for recognition of GC bps. The unprotonated -N- and adjacent thiophene sulfur form a strong, sigma-hole (σ-hole) type interaction that preorganizes DB2429 for a snug fit to the MG (Guo et al. 2016).

Fig. 13 (a) Chemical structures of DB818 and DB2429; (b) molecular curvatures for DB818 and DB2429. Structure optimization of minor groove binding compounds was performed by using DFT/B3LYP theory with the 6-31+G* basis set in Gaussian 09


Thiophene C–S single bonds present a relatively positive electrostatic potential that can form an interaction with electron-donating atoms such as the unsubstituted NMe-benzimidazole N, a 1,4 N···S interaction. The interaction is based on the presence of low lying thiophene C–S σ* orbitals on S that give rise to the positive electrostatic potential or a σ-hole. Biophysical analysis shows there is essentially complete conversion of the AT specificity of DB818 to single GC bp-binding specificity in DB2429. The dramatic change in specificity from DB818 to DB2429 is another remarkable example of the effects on binding to the DNA MG by very small changes in compound chemistry. Small but creative changes in compound structure and functional groups can clearly provide a large change in compound–DNA interactions. The σ-hole organization of DB2429 structure and polarity are essential for its selective GC recognition since N-Me alkylation of the benzimidazole of DB293 (Fig. 5c) does not give a compound with GC bp-binding specificity. The NH of benzimidazole is a strong AT bp recognizing element while the unsubstituted N of N-MeBI is a strong GC bp interacting element. As our understanding of MG interactions continues to develop, the design of new types of sequence-specific MG binders will become even more attractive and effective. Modifications of DB2429 were made with the goal of improving its GC bp recognition and affinity. Expanding DB2429 with another phenyl to better cover the AT bps that flank the central GC bp gave DB2457 (Fig. 10) (Guo et al. 2018). The selectivity for binding to the single GC bp sequence instead of a pure AT sequence is only about a factor of 10 for DB2429 while it is increased to 50 for DB2457 (Fig. 14b). The phenyl of the alkyl-benzimidazole group apparently helps lock it down on the MG for specific GC binding.
Additional ideas were considered for increasing the selectivity for single GC bp sequences, an essential feature for use in biological systems. An attractive idea is to take advantage of DNA microstructural variations, such as the wider MG in GC bp containing regions, as seen in the difference in MGW for the single GC bp sequence versus pure AT sequences (Fig. 14a) (Zhou et al. 2013; Rohs et al. 2009; Azad et al. 2018). From the first crystal structures of DNA with a MG binder to the extensive structures of heterocyclic diamidines (Nguyen et al. 2009) and recent predictions about the MG from microstructural analysis (Zhou et al. 2013; Azad et al. 2018), the wider DNA MG in GC versus AT sequences has been recognized. To test the idea of using groove width differences to separate AT and GC containing sequences, two changes in structure and functional groups of DB2457 were made with the goal of increasing specificity: (i) larger substituents than a methyl group were added as alkylators to the benzimidazole and (ii) the compound bulk due to the overall twist in the compound-linked aromatic core structure was modified by adding appropriate aromatic substituents (Figs. 14 and 15). Compounds in both groups showed a significant increase in the selectivity of the thiophene derivatives for single GC bp sequences. We have used the DNA shape algorithm (Zhou et al. 2013; Rohs et al. 2009; Azad et al. 2018) to estimate the MGWs for the DNA sequences of interest. For AAATTT, the predicted MGW in the center is 2.9 Å and widens to near 5 Å at the flanking GC


Fig. 14 (a) Minor groove width versus target DNA sequences calculated from the online algorithm of Rohs and coworkers; * indicates the minor groove width of standard B-form DNA. Groove width gives perpendicular separation of helix strands drawn through phosphate groups, diminished by 5.8 Å to account for van der Waals radii of phosphate groups. The groove depth is also based on van der Waals radii; (b) comparison of equilibrium binding constants (KA, M⁻¹) of DB2429, DB2457, and analogs with pure AT base pair and mixed single/two GC base pair(s) containing DNA sequences. “X” represents no measurable KA under our experimental conditions; (c) chemical structures of the σ-hole-containing compounds discussed in this work

bps. At the AAAGTTT binding site, the predicted central MGW is between 3.3 and 3.4 Å, which is significantly wider on a molecular scale than the AAATTT site. For the reference AAAGCTTT site, the predicted MGW is over 4 Å. All sequences widen to near 5 Å at the terminal GC sequence (Fig. 14a). With appropriate compound modifications, these microstructural variations can allow us to selectively target a single GC bp-binding site DNA sequence. In initial modifications at the N-benzimidazole position of DB2457 (Figs. 14 and 15), the length and complexity of the alkyl group were increased, for example, from methyl to isopropyl and neopentyl. Cyclo-substituents from cyclobutyl to cyclohexyl to phenyl were also introduced. In the aromatic core, -Cl and trifluoromethyl substituents were introduced adjacent to an amidine (Fig. 14b and c). Over 20 compounds were prepared and two compounds showed exciting improvements in selectivity. Starting with the N-methyl compound, DB2457, the KD is essentially constant through N-ethyl, isopropyl, and cyclobutyl substitutions, but the selectivity for single GC bp binding, especially relative to AT binding, is markedly enhanced. With the N-aromatic derivatives, binding is strong with good selectivity over AT sites but decreased selectivity for the AAAGCTTT sequence (Guo et al. 2018). For the aromatic substitutions, DB2759 with a -Cl substituent adjacent to the amidine has good binding and excellent selectivity for the AT and GC test sequences (Figs. 14 and 15). The strong binding of these compounds is a consequence of relatively fast association with the MG coupled to a quite slow dissociation to give KD values below 10 nM for the optimum compounds. Increasing the size of the N-alkyl substituents and increasing the aromatic twist with appropriate substituents have resulted in a dramatic


Fig. 15 Representative SPR sensorgrams for (a) DB2714 and (b) DB2753 in the presence of AAATTT, AAAGTTT, and AAAGCTTT hairpin DNAs. With AAATTT and AAAGCTTT sequences, the concentrations of DB2714 and DB2753 are 2–500 nM of each compound from bottom to top. In A, with AAAGTTT sequence, the concentrations of DB2714 from bottom to top are 15, 20, 30, 50, and 100 nM; In B, with AAAGTTT the concentrations of DB2753 from bottom to top are 30, 70, 100, 200, and 500 nM. In A and B, the solid black lines are best-fit values for the global kinetic fitting of the results with a single site function

increase in selectivity for a single GC bp site with little change in KD (Guo et al. 2018).

Azabenzimidazole Compound Design

DB1476 (Fig. 16) is a benzimidazole derivative with phenyl-amidines on the imidazole and six-member ring groups of the benzimidazole. DB1476 is a very strong and specific AT binding compound (Fig. 17) with excellent solution properties. Using the procedure described above, its curvature, 148°, is near the ideal range for optimum affinity for a MG binder. It binds to a -AAATTT- reference site with a 0.3 nM KD and over 100 times more weakly to a single GC bp containing related site: -AAAGTTT- (Paul et al. 2015b; Chai et al. 2014). This is very strong AT binding and excellent selectivity for a compound with a relatively low molecular weight. Because these features suggest an excellent fit to the MG, the DB1476 structure platform along with its favorable solution properties makes a very attractive molecular platform for conversion to a GC binding derivative. DB1476 and derivatives modified with the goal of enhancing GC bp-binding specificity will be used to illustrate additional design concepts and compound modifications for converting


Fig. 16 Chemical structures of single GC base pair recognizing aza-benzimidazole derivatives

from AT to GC recognition. The initial GC design concept with DB1476 is extremely simple: convert the benzimidazole -CH group that faces into the MG to an unprotonated -N-, a benzimidazole to aza-benzimidazole conversion to give DB2285 (Fig. 16). The goal of this modification is to put a G-NH H-bond acceptor in the optimum location in the MG. DB2285 has essentially the same curvature as DB1476. The aza-benzimidazole derivative, DB2285, does bind to the single GC sequence; however, its selectivity is not satisfactory for use in cells (Fig. 17c). It binds to -AAAGTTT- with a KD of 63 nM, while for -AAATTT- the KD increases to 70 nM, much weaker than the DB1476 binding affinity towards -AAATTT-. While the enhancement of GC binding is encouraging, the selectivity must be enhanced. Conversion of the DB1476 -CH to -N- results in an increase in KD for -AAATTT- from 0.4 nM to 70 nM, a 175-fold decrease in affinity. While the reduction in AT binding was significant and desired, the KD for binding to -AAAGTTT- is only slightly better than the AT affinity (Paul et al. 2015b; Chai et al. 2014). Analysis of models suggested that the ortho -CH groups on the phenyl adjacent to the aza-N partially blocked access to the N for H-bonding to a G-NH. To remedy this steric block, various linkers between the phenyl and aza-benzimidazole were evaluated (Paul et al. 2015b; Chai et al. 2014). By far the best results were obtained with a -CH2-O- linker in DB2277 (Fig. 16). DB2277 has an excellent KD for -AAAGTTT- of 0.3 nM while the KD for AAATTT is 24 nM, almost 100-fold selectivity with very strong binding to the target sequence, much stronger than with DB2285 (Fig. 17c). The binding to -AAAGCTTT- by DB2277 is even weaker than with the AT site, indicating excellent selectivity. The results also illustrated the dependence of the GC binding on the compound structure. Isomers and close analogs of DB2277 bind much more weakly to single GC bp containing sequences and with low specificity.
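The fold changes quoted in this comparison are simple ratios of KD values. The short Python sketch below recomputes them from the numbers in the text; the helper name `fold_change` and the variable names are illustrative assumptions.

```python
# Recompute the fold changes discussed in the text as ratios of KD values (nM).

def fold_change(kd_reference_nM: float, kd_other_nM: float) -> float:
    """Ratio KD(other) / KD(reference); values > 1 mean weaker binding to 'other'."""
    if kd_reference_nM <= 0 or kd_other_nM <= 0:
        raise ValueError("KD values must be positive")
    return kd_other_nM / kd_reference_nM

# Loss of AT affinity on converting DB1476 (-CH) to DB2285 (-N-), -AAATTT- site.
at_affinity_loss = fold_change(0.4, 70.0)

# Selectivity of DB2285 for -AAAGTTT- (63 nM) over -AAATTT- (70 nM).
selectivity_db2285 = fold_change(63.0, 70.0)

# Selectivity of DB2277 for -AAAGTTT- (0.3 nM) over -AAATTT- (24 nM).
selectivity_db2277 = fold_change(0.3, 24.0)

print(f"DB1476 -> DB2285, AT affinity loss: {at_affinity_loss:.0f}-fold")
print(f"DB2285 GC/AT selectivity: {selectivity_db2285:.2f}-fold")
print(f"DB2277 GC/AT selectivity: {selectivity_db2277:.0f}-fold")
```

With these inputs the DB2285 ratio is close to 1, which is why its selectivity is described as unsatisfactory, while the DB2277 ratio of about 80 corresponds to the "almost 100-fold" selectivity stated above.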


Fig. 17 Representative SPR sensorgrams for DB2277 in the presence of (a) AAAGTTT and AAATTT DNA binding sites. (b) Comparison of steady-state binding plots for AAATTT and AAAGCTTT with DB2277. The data are fitted to a steady-state binding function using a 1:1 model to determine equilibrium binding constants. In (a), the solid black lines are best-fit values for global kinetic fitting of the results with a single site function; (c) comparison of equilibrium binding constants (KA, M⁻¹) of DB1476, and aza-benzimidazole analogs with pure AT base pair and mixed single/two GC base pair(s) containing DNA sequences

DB2275, an isomer of DB2277 with the -O-CH2- linker moved to the other end of the structure (Fig. 16), has much weaker binding to the G-NH2 in the MG (Fig. 17c). This validates the concept that the aza-N needed additional access, through the -O-CH2- linker, for G-NH2 H-bonding (Paul et al. 2015b; Chai et al. 2014). Because there are no high-resolution structures of heterocyclic cations with a mixed AT and GC sequence, the NMR structure of DB2277 complexed with the single GC bp-binding site sequence -CCAAGATAG- was solved (Fig. 18a) (Harika et al. 2016, 2017). The -AAGATA- sequence was selected because after screening


Fig. 18 (a) 2D NMR-refined structure of the DB2277-ds[(5′-CCAAGATAG-3′)(5′-CTATCTTGG-3′)] complex (PDB ID: 6AST). The DB2277 (green spheres) fits isohelically into the minor groove of the DNA duplex; (b) snapshot of MD simulations of the DB2277-ds[(5′-CCAAGATAG-3′)(5′-CTATCTTGG-3′)] complex. The ball and stick model in green-white-blue-red (C-H-N-O) color scheme represents DB2277. The DNA bases are represented in tan-white-red-blue-orange (C-H-O-N-P) color scheme; (c) the important interactions between different sections of the DB2277–DNA complex are illustrated. DB2277 forms three direct H-bonds (black dashed lines) and one interfacial water-mediated (purple, ball and stick) H-bond with DNA bases

multiple oligomer hairpin sequences, it gave excellent 1D and 2D NMR spectra with DB2277 and clearly showed only a single 1:1 complex. Other sequences, such as with the -AAAGTTT- binding site, had mixed 1:1 complexes, even though the stoichiometry of binding was 1:1. For DB2277 complexed with the -AAAGTTT- sequence as a 1:1 complex, six imino proton NMR peaks are expected for AT bp at the 1:1 binding ratio. The imino proton spectra, however, reveal that all complex signals are doubled. The two sets of signals require two binding modes for DB2277, a major and a minor binding species based on imino peak intensity (Rettig et al. 2012). Interestingly, other biophysical methods such as CD, SPR, mass spectrometry, and isothermal titration calorimetry also indicated a 1:1 complex for -AAAGTTT- with DB2277, but these methods were not able to show any evidence of two binding orientations (Harika et al. 2016). The local symmetry (similar AT bp sequences on both sides of the central G•C bp) of the -AAAGTTT- binding site results in two orientations of DB2277 with similar energy and similar CD spectra. NMR results for the two modes indicate complexes with reverse or opposite orientations in the MG. These mixed orientations, however, revealed a very interesting feature of MG complexes: the half-life for exchange of DB2277 complexes that face in opposite directions in the -AAAGTTT- MG binding site is approximately 0.1 s (from NMR), while the half-life for complete dissociation of DB2277 from the complex is 230 s (from SPR analysis) (Harika et al. 2016). These surprising results


show that DB2277 can rapidly flip orientations in the MG while formally bound. Similar exchange kinetics have been observed for other compounds bound in the MG (Rettig et al. 2012). Clearly, MG complexes can be quite dynamic, depending on the local sequence. One and two dimensional 1H and 31P spectra for both exchangeable and nonexchangeable protons of the AAGATA site were collected in H2O and D2O buffer solutions at a 1:1 ratio of DB2277 to DNA binding sites. NMR constraints were used with restrained molecular dynamics (rMD) methods to determine the solution structure of the DB2277 complex by using standard methods (Crnugelj et al. 2002). After the addition of 0.4 equivalents of DB2277, free DNA signals decrease with the emergence of new peaks for the DB2277–DNA complex. This co-existence of free DNA and DB2277–DNA complex peaks indicates slow exchange between the complex and free DNA on the NMR chemical shift timescale. The absence of any free DNA signals at a 1:1 binding ratio agrees with tight binding of DB2277 with the DNA sequence. NOE distance restraints were generated in the usual manner for the complex, and restrained MD simulations were performed on both free DNA and the DB2277–DNA complex (Harika et al. 2017). The complex of DB2277 with -AAGATA- has several unusual recognition features that were not expected. The results from the rMD calculations provide a model that clearly shows that DB2277 fits tightly into the MG at the GC bp in the single GC DNA sequence. The G-NH group in the MG of the binding site forms a strong H-bond with the aza-N of the aza-benzimidazole group with an H-bond distance of 2.1 Å. Because this interaction was part of the original design, this observation is helpful in future design efforts. The DB2277-GC contact is additionally locked down by an unpredicted H-bond between the benzimidazole -NH and the O2 of C of the GC bp in the MG.
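To put the two half-lives reported for the -AAAGTTT- complex on a common footing (about 0.1 s for orientation exchange and 230 s for complete dissociation), they can be converted to rate constants. The sketch below assumes that both processes follow simple first-order kinetics, so that k = ln 2 / t½; this is an illustrative assumption, and the function and variable names are hypothetical.

```python
import math

def rate_constant_from_half_life(t_half_s: float) -> float:
    """First-order rate constant (s^-1) from a half-life in seconds: k = ln(2) / t_half."""
    if t_half_s <= 0:
        raise ValueError("half-life must be positive")
    return math.log(2) / t_half_s

# Half-lives quoted in the text for DB2277 at the -AAAGTTT- site.
t_half_flip_s = 0.1       # exchange between opposite orientations (NMR)
t_half_dissoc_s = 230.0   # complete dissociation from the DNA (SPR)

k_flip = rate_constant_from_half_life(t_half_flip_s)
k_dissoc = rate_constant_from_half_life(t_half_dissoc_s)

# Ratio of the two rates: roughly how many orientation exchanges occur,
# on average, for each dissociation event under the first-order assumption.
flips_per_dissociation = k_flip / k_dissoc

print(f"k_flip   = {k_flip:.2f} s^-1")
print(f"k_dissoc = {k_dissoc:.2e} s^-1")
print(f"orientation exchanges per dissociation ~ {flips_per_dissociation:.0f}")
```

Under this assumption the orientation exchange is more than three orders of magnitude faster than dissociation, which is the quantitative basis for describing the compound as flipping while formally bound.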
This double interaction is an advantage of using aza-benzimidazole groups for GC bp recognition and provides a very strong stabilizing component to the complex. An important question is: how are the flanking AT sequences recognized? In the rMD model (Fig. 18b and c), the amidine on the phenyl-amidine connected to the azabenzimidazole six-member ring forms a direct H-bond to O2 of a T group (H-bond distance of 2.8 Å) in the MG. The model also shows that the amidine–DNA interaction is stabilized by an ensemble of dynamic MG H-bonds to terminal water molecules (Harika and Wilson 2018). The second amidine, at the other end of DB2277, interacts in a very different manner with the bases at the MG floor. The -NH groups of this amidine are too far from the bases to form direct H-bonds. For this reason, an interfacial water molecule is required to link an -NH from the amidine to an O2 of T at the floor of the MG. Interfacial water molecules of this type are a key component of many protein–DNA complexes but have rarely been seen in small molecule–DNA complexes. Other features of the DB2277 molecule help provide strong binding in the MG complex. The -O-CH2- linkage, for example, provides correct spacing and flexibility for the compound to track along the shape of the MG and correctly index the functional groups of DB2277 with those of DNA. As described above, removal of


this linker yields DB2285 with a lower affinity for single GC bp sites. An important stabilizing feature of the complex is from the phenyl C-H protons that are oriented to the floor of the MG. The phenyl protons carry a small positive charge and can form a stabilizing interaction with N3 of A and O2 of T groups in the MG. All of the DB2277–DNA interactions in addition to the dynamic stabilizing hydration network yield a very favorable complex for specific DNA recognition. Both amidines are also stabilized by a dynamic, external, extended water network in the groove (Fig. 18b and c) that is terminal to the compound structure (Harika and Wilson 2018). Waters of hydration of this type provide an important stabilization feature of the complexes and are seen in essentially all MG complexes that have been determined at sufficient resolution. These waters are clearly a common feature of DNA MG interactions. NMR spectra obtained for the DB2277–AAGATA complex at 25 °C show degeneracy in the chemical shifts (δ) observed for all phenyl proton signals of DB2277. The observed averaging of chemical shifts of phenyl proton signals is normally caused by rapid phenyl rotation (Harika and Wilson 2018; Searle and Embrey 1990). Since the half-life of bound DB2277 is approximately 50 s, the rapid rotation of the phenyl rings must occur while DB2277 is bound within the MG. This rotation of the phenyl group while the compound is bound complements the compound flip observed in symmetric sequences such as AAAGTTT, as described above. Interestingly, in microsecond MD simulations of the DB2277-DNA complex, 180° rotations of the phenyl attached to the flexible -CH2O- occurred several times while no rotation of the other phenyl was observed in this time span.
The two phenyl-amidine groups of DB2277 are both H-bonded to O2 of T groups at the floor of the MG; however, the phenyl-amidine group that is coupled through an interfacial water that is an integral part of the complex is more dynamic. In the MD simulation, the 180° phenyl rotations occur in less than one ns. The phenyl rotations in the bound compound are coupled to motions in DNA. Dynamic motions of the MG as the bound DB2277 phenyl-amidine undergoes large torsional fluctuations cause increases in the MGW of DNA. Transient breathing motions of the complex assist in the dynamic flipping of the phenyl. The flexibility of the -OCH2- phenyl-amidine helps DB2277 to track along the MG curvature by forming indirect and dynamic water-mediated H-bond contacts of the amidine with the bases at the floor of the MG. The flexibility and water-mediated contact allow the phenyl to rotate several times in a microsecond so that rotation in a bound compound was observed for the first time. During these phenyl flips, the aza-benzimidazole H-bonds to N-H of G and O2 of C at the central GC bp, which accounts for the mixed sequence recognition by DB2277. Neither the aza-benzimidazole nor the other phenyl of DB2277 shows the large dynamic motions seen with the -O-CH2- linked phenyl. The impressive improvement in GC bp-binding affinity and selectivity on adding the -O-CH2- to DB2285 is a dramatic illustration of the importance of using both compound synthesis and biophysical analysis in drug design. As yet, we do not understand MG recognition in sufficient detail to do the design completely by computational approaches. The MD results do allow compounds with poor contacts to be identified without synthesis.


MG Binders with Additional GC BP-Binding Capability: Compounds with the Same GC Recognizing Modules

Although specific recognition of a single GC bp by traditional AT-specific compounds was a major advance, it does not provide sufficient diversity to target a broad range of DNA sequences and inhibit DNA binding proteins, such as transcription factors (TFs). For example, the lambda B motif of the murine Igλ2-4 (5′-ATAAAAGGAAGTG-3′) promoter sequence is targeted by the PU.1 ETS TF (Poon 2012b). The critical recognition sequence of the promoter 5′-AATAAAGGAAGTGAAACCAAG-3′ has the AT-rich 5′ sequence described above, which is targeted by many minor groove binders. The conserved -GGAA- central sequence as well as the 3′ region have multiple GC bps sequence combinations that require more complex compounds for recognition: -AAGGAA-, -AAGTGAA-, and -TGAAACCA-. The initial idea for multiple GC bps binding was to link the single GC bp-binding modules described above, with different linkers, to create more complex sequence recognition units (Lombardy et al. 1996; Guo et al. 2017). The critical question is how to combine the GC recognition units, such as the thiophene-N-Me benzimidazole in DB2429, to selectively recognize longer, more complex DNA sequences with additional GC bps. Although progress in understanding the selective targeting of DNA by proteins has been excellent, there has been limited progress in the development of synthetic compounds designed to target complex, mixed bp DNA sequences. The first set of successful multiple GC bp recognizing heterocyclic amidine compounds was obtained by linking N-methyl-benzimidazole-thiophene modules, based on DB2429, with -O-(CH2)n-O- linking units (Fig. 19, DB2528 and analogs). The value of “n” in the linker could be varied depending on the distance between the GC bps to be targeted (Guo et al. 2017). The phenylamidines at both ends of DB2528 and the methylene linker provide AT base pair binding affinity.
The design concept must include enough molecular flexibility to match the DNA minor groove shape and curvature. With an appropriate linker, a full turn of the double helix can be selectively recognized. The new linked compounds are particularly striking because relatively simple changes in the chemistry of AT-specific compounds, such as the addition of specific H-bond accepting groups for binding to GC bps, can convert them into modules that strongly and specifically recognize mixed bp DNA sequences. This is an example of the systematic design and preparation of heterocyclic cations to recognize multiple GC bps in complex sequences of DNA. DB2528 (Fig. 19) is a two-thiophene-BI-based compound with a flexible three-carbon linker that was designed to H-bond and bind to two GC bps with three intervening AT bps. The terminal phenyl-amidine groups of DB2528 can then form H-bonds with flanking AT bps to complete the binding unit. Biophysical results and molecular dynamics investigations of the compound-DNA complexes provide support for this binding mode (Fig. 19). Compounds with n = 1 or 2 do not have the correct shape and curvature to bind strongly to the DNA MG. Fortunately, n = 3 is a magic number, but compounds with n > 3 display unfavorable solution properties


Fig. 19 (a) Chemical structure of two linked GC recognition modules, DB2528; (b) representative SPR sensorgrams for DB2528 in the presence of -GAAAC- binding site DNA. The concentrations of DB2528 from bottom to top are 10, 15, 30, 50, and 100 nM. The solid black lines are best-fit values for the global kinetic fitting of the results with a single site function; (c) ESI-MS negative mode spectra of the competition binding of sequences (C) -GAAC-, -GAAAC-, -GAAAAC-, and -GAAAAAC- (10 μM each); with 40 μM DB2528 in buffer (50 mM ammonium acetate with 10% methanol (v/v), pH 6.8). The top figure shows the ESI-MS spectra of free DNA mixtures, and the bottom figure shows the ESI-MS spectra of DNA mixture with DB2528. The ESI-MS results shown here are deconvoluted spectra, and molecular weights are shown with each peak

and increasing aggregation in solution, especially at n = 5 and above. Compounds with n = 4 do not have the correct shape to complement the MG curvature (Liu et al. 2012; Cory et al. 1992) and also have poor solution properties. A number of analogs of DB2528 with n = 5 were prepared with different substituents in an effort to obtain a longer compound with the correct curvature and improved solution properties. None of these compounds, however, could be studied successfully with our biophysical methods. Thus, in the DB2528 design scheme, only analogs with n = 3 had adequate solution properties for successful recognition of two GC bps. The poor solution properties of the n = 5 compounds cannot simply be due to the linker, however, since compounds with the same n = 5 connecting group but linked amidine-benzimidazole units have good solution properties (Cory et al. 1992). These types of compounds have been studied in detail with a range of linkers and bind strongly to DNA with AT bp specificity. Thiophenes frequently cause solubility difficulties, and the problem must be compounded in n = 5 analogs of


W. D. Wilson and A. Paul

DB2528 with two thiophenes. DB2528 does not bind well to a single GC segment (AAAAGTTTT, ΔTm = 2 °C) but binds strongly to the target sequence with two GC bps, -GAAAC- (ΔTm = 9.1 °C). It binds more weakly to the longer -GAAAAC- and -GAAAAAC- sequences (ΔTm = 6 °C). Importantly, it shows no significant binding to an all-AT sequence (ΔTm = 1 °C) or to sequences with more closely spaced GC bps (-GC-, -GAC-, -GAAC-), all with ΔTm = 1–2 °C. Modification of the amidine or thiophene groups can enhance interactions with the MG in some cases, but these changes did not enhance the binding of DB2528 (Guo et al. 2017). The replacement of N-Me-benzimidazole in DB2528 with benzimidazole gives an MG binder that shows strong binding to AT sequences and weaker binding to the GC sequences, as expected for a benzimidazole group. This result confirms the importance of the central N-Me-benzimidazole-thiophene group for GC bp recognition. Based on the ΔTm results, DB2528 binding to the target sequence -GAAAC- was evaluated by an SPR assay. The compound binds strongly to -GAAAC-, and global kinetic fitting indicated a single strong binding site with a KD of 5 nM. Analysis of the kinetics indicated a rapid on-rate (ka = (2.83 ± 0.5) × 10⁶ M⁻¹ s⁻¹) and a very slow off-rate (kd = (1.7 ± 0.4) × 10⁻² s⁻¹) that account for the very low KD (Fig. 19b). The association and dissociation rates of DB2528 with the -GAAG- sequence are quite fast, with a higher KD = 149 nM, in agreement with the ΔTm values. Binding to the pure AT and single GC bp containing sequences was also evaluated by SPR, and it is encouraging that DB2528 has no detectable binding to either sequence under our conditions. These results indicate excellent selectivity and strong binding for a two-GC sequence that has a specific distance between the two GC bps. The KD was also determined by a fluorescence anisotropy titration of the compound with the -GAAAC- sequence, and the KD value obtained is in agreement with the results from SPR, as expected.
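The relation between the SPR rate constants and the equilibrium constant, KD = kd/ka, can be checked with a few lines of Python. This is a minimal sketch, not the fitting procedure used for Fig. 19b. It assumes the exponents of the quoted rate constants are 10⁶ M⁻¹ s⁻¹ and 10⁻² s⁻¹ (their signs are not legible in the printed values) and that the two uncertainties are independent. The function and variable names are illustrative.

```python
import math

# Rate constants quoted for DB2528 with the -GAAAC- site.
# Assumption: exponents are 10^6 (M^-1 s^-1) and 10^-2 (s^-1).
KA, KA_ERR = 2.83e6, 0.5e6   # association rate constant, M^-1 s^-1
KD_RATE, KD_RATE_ERR = 1.7e-2, 0.4e-2   # dissociation rate constant, s^-1


def equilibrium_kd(ka, kd_rate):
    """Equilibrium dissociation constant (M) from kinetic constants."""
    return kd_rate / ka


def propagated_error(ka, ka_err, kd_rate, kd_rate_err):
    """First-order error propagation for a ratio, assuming independent errors."""
    rel = math.sqrt((ka_err / ka) ** 2 + (kd_rate_err / kd_rate) ** 2)
    return equilibrium_kd(ka, kd_rate) * rel


kd_nM = equilibrium_kd(KA, KD_RATE) * 1e9
sigma_nM = propagated_error(KA, KA_ERR, KD_RATE, KD_RATE_ERR) * 1e9
print(f"KD from kinetics = {kd_nM:.1f} +/- {sigma_nM:.1f} nM")
```

Under these assumptions the ratio gives about 6 nM with an uncertainty near 2 nM, which is consistent with the 5 nM value from the global fit.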
Competition ESI-mass spectrometry (MS) provides a direct method for analysis of relative binding stoichiometry, specificity, and affinity for DNA complexes (Paul et al. 2015a; Laughlin and Wilson 2015). In this method, several closely related DNA sequences are mixed simultaneously with a compound to create a competition that allows the analysis of DNA interaction specificity and relative affinity. Competition ESI-MS results for DB2528 with -AAAATTTT-, -AAAAGTTTT-, and -GAAAC- are shown in Fig. 19c. In the figure, the upper plot has peaks for the three DNA sequences. The peak for -GAAAC- (9773) decreases on the addition of DB2528 with the simultaneous appearance of a new peak at m/z = 10,510 that is characteristic of a 1:1 GAAAC–DB2528 complex. Under these conditions, there is no complex peak with any of the other DNA sequences (Guo et al. 2017). In ESI-MS analysis of DB2528 with sequences that have different numbers of AT bps between the GC bps, only the signal for -GAAAC- (9773) decreases with the simultaneous appearance of a new peak at m/z = 10,510 for a 1:1 GAAAC–DB2528 complex. The ESI-MS results clearly show that DB2528 binds to -GAAAC- very strongly and specifically in a 1:1 complex (Fig. 19c). Although the experimental results show that DB2528 binds strongly and specifically to the -GAAAC- sequence, they do not tell us how the compound, especially the flexible linker, can fit to the MG. To address this question, MD simulations for


Fig. 20 (a) Snapshot of MD simulations of a minor groove view of the DB2528-ds[(5′-CCAAAGAAACTTTGG-3′)(5′-CCAAAGTTTCTTTGG-3′)] complex. The sphere model in green-orange-blue-red-yellow (C-H-N-O-S) color scheme represents DB2528. The DNA bases are represented in a ball and stick and ribbon model with cyan-white-red-blue-orange (C-H-O-N-P) color scheme; (b) the important interactions between different sections of the DB2528–DNA complex are illustrated. DB2528 forms four direct H-bonds (black dashed lines) with DNA bases

the DB2528 complex with the target sequence were conducted with the AMBER software suite and the ff99 force field (Case et al. 2014). Force constants for DB2528 were added to the force field and the structure determined as previously described (Athri and Wilson 2009; Špačková et al. 2003). A view of the MG in the complex (Fig. 20) shows that DB2528 is able to match the curvature of the DNA MG and can cover a full turn of the double helix with excellent contacts and van der Waals interactions with the walls of the MG. Both N-Me-benzimidazole groups accept an H-bond from the -N-H of G groups in the MG, while the terminal amidines form H-bonds with O2 of T groups. Both amidines are strongly hydrated at the ends of DB2528 in the MG, and this certainly contributes to the energetics of binding and the fit to the groove (Fig. 20). The compound curves around the groove, and there is an excellent indexing of both the N-Me-benzimidazole-G-NH and the amidine-AT bp interactions. In summary, the goal of this section of the research was to link previously designed single GC bp-binding modules with a flexible linker for recognition of two GC bps in the PU.1-lambda B binding sequence, -AGAAACT-, which has functional significance for PU.1 inhibition. With optimized design, synthesis, and analysis, a linked heterocyclic diamidine has been engineered that is an excellent match to the DNA MG shape. In addition, registry of the compound with DNA functional groups yields high specificity and affinity. These results show, for the first time, what


can be accomplished with modular compounds of this type in selective targeting of complex DNA. The synthesis of the compound is reasonable in cost and time for typical organic chemistry students. This method of linking two single GC bp-binding modules offers a promising route to a broad array of modular agents for control of gene expression.
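The stoichiometry argument in the competition ESI-MS experiment for DB2528 rests on simple mass arithmetic: a complex peak should sit at the free DNA mass plus an integer number of ligand masses. The sketch below uses only the two deconvoluted masses quoted for the -GAAAC- DNA (9773 and 10,510). The ligand mass of 737 is inferred here from the 1:1 assignment and is not a reported molecular weight. The function names are illustrative.

```python
# Deconvoluted masses quoted in the text for the -GAAAC- DNA (Fig. 19c).
FREE_DNA_MASS = 9773.0    # free DNA peak
COMPLEX_MASS = 10510.0    # new peak that appears on addition of DB2528


def mass_shift(free_mass, complex_mass):
    """Mass added to the DNA on complex formation."""
    return complex_mass - free_mass


def predicted_complex_mass(free_mass, ligand_mass, n_ligands):
    """Expected deconvoluted mass for a DNA carrying n_ligands ligands."""
    return free_mass + n_ligands * ligand_mass


shift = mass_shift(FREE_DNA_MASS, COMPLEX_MASS)

# Assumption: the peak is a 1:1 complex, so the shift equals one ligand mass.
implied_ligand_mass = shift
expected_2_to_1 = predicted_complex_mass(FREE_DNA_MASS, implied_ligand_mass, 2)

print(f"Observed shift: {shift:.0f}")
print(f"A 2:1 complex would be expected near {expected_2_to_1:.0f}")
```

With this bookkeeping, a second bound ligand would give a peak near 11,247, so a single new peak at 10,510 is what a 1:1 complex predicts.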

MG Binders with Additional GC BP-Binding Capability: Compounds with Different GC Recognizing Modules

Another sequence with two GC bps in the PU.1 promoter that is an important primary target is the -AAGTGAA- binding site, which is adjacent to the 3′ end of the conserved -GGAA- core of the promoter, 5′-AATAAAGGAAGTGAAACCAA-3′. A number of connected GC recognition units with equivalent modules, as in DB2528, were evaluated for binding to this sequence. The closely spaced GC bps in -AGTGA- are, however, a difficult recognition component for selective binding of linked heterocyclic amidines. Screening of the initial compounds indicated that none of them gave strong and selective binding. Next, recognition of -AGTGA- by connecting different single GC bp-binding modules was evaluated. It was noted that both DB2429 and DB2277 have equivalent terminal phenyl groups as part of their heterocyclic core (Fig. 10). This suggested a new structural concept that required linking parts of the two single GC bp recognizing modules of DB2429 and DB2277 through a common phenyl group. This linkage places the GC bp recognition units close enough to bind to -GTG- in a new compound, DB2763 (Fig. 21a). This approach was successful, and strong and specific binding to -AAGTGAA- was obtained with DB2763, which is the first compound to couple two hetero-modules that each recognize a single GC bp. Interestingly, linking DB2277 through the reverse orientation of the compound gave DB2779, which is the same size as DB2763 and essentially an isomer of it (Fig. 21a). DB2779, however, binds the target -GTG- binding site much more weakly than DB2763 and with much lower selectivity. This result emphasizes the importance of optimization of compound structure for mixed sequence recognition of DNA. The interaction of DB2763 with the -GTG- or -GAC- target binding sites was evaluated by SPR analysis with the DNA immobilized on the sensor chip surface.
DB2763 binds with a sub-nM KD in our usual binding buffer (100 mM salt), suggesting that it has a surprisingly good fit to the MG, with indexing of functional groups in the DNA sequences that contain the -GTG- and -GAC- binding sites (Fig. 21b). The binding is so strong in the 100 mM salt solution that it is impossible to determine a quantitative binding constant because the compound has hardly dissociated after 30 min. The compound was also tested with -AAGAAGTT- and -AAAGTTT- binding sites to evaluate specificity in binding. It dissociates much faster from these two DNAs and therefore binds more weakly. Since it is not possible to determine accurate binding results at the 100 mM salt


Fig. 21 (a) Chemical structures of hetero-dimers for mixed DNA sequences containing two GC bps; (b) comparison of equilibrium binding constants (KA, M⁻¹) of two hetero-dimers, DB2763 and DB2779, with mixed DNA sequences containing one or two GC base pairs; (c) ESI-MS negative mode spectra of the competition binding of sequences -AAAGTTT-, -GAC-, and -GAAC- (10 μM each) with 40 μM DB2763 in buffer (50 mM ammonium acetate with 10% methanol (v/v), pH 6.8). The top figure shows the ESI-MS spectra of the free DNA mixture, and the bottom figure shows the ESI-MS spectra of the DNA mixture with DB2763. The ESI-MS results shown here are deconvoluted spectra, and molecular weights are shown with each peak

concentration, the salt was increased to 300 mM. Under these conditions, complete dissociation of DB2763 could be obtained with the -GTG- and single GC bp containing DNA sequences, but with binding constants that are still quite strong, with KD between 5 and 10 nM (Fig. 21b). Surprisingly, with the -GAC- sequence, again very little compound dissociation was observed even at 300 mM salt, indicating that the compound continues to bind with a sub-nM KD. To evaluate binding specificity, competition ESI-MS experiments were conducted with these same three DNAs and DB2763. The results clearly showed excellent specificity of DB2763 for -AAGACTT-. At both a 1:1 and a 2:1 ratio of DB2763 to the three DNAs, only binding to the -GAC- sequence was observed (Fig. 21c). In summary, by linking DB2429 and DB2277 modules through a common phenyl group, an entirely new type of compound that recognizes two GC bps has been engineered. This compound has an ideal curvature and group indexing for exceptionally strong binding to the target -GTG- binding site in the PU.1 promoter. It represents a breakthrough in our design of compounds to recognize a broad array of mixed AT and GC sequences and suggests a model for the direct linkage of additional single GC bp-binding modules.
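For comparison with other binding data, the 5 to 10 nM KD range measured for DB2763 at 300 mM salt can be re-expressed as a standard binding free energy through ΔG° = RT ln KD. The sketch below is illustrative only. It assumes a temperature of 25 °C and a 1 M standard state, neither of which is stated in the text, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1


def binding_free_energy(kd_molar, temp_kelvin=298.15):
    """Standard binding free energy (kcal/mol) from KD, 1 M standard state.

    Assumption: 25 degrees C unless another temperature is supplied.
    """
    return R_KCAL * temp_kelvin * math.log(kd_molar)


dg_5nM = binding_free_energy(5e-9)
dg_10nM = binding_free_energy(10e-9)
print(f"KD = 5 nM  -> dG = {dg_5nM:.1f} kcal/mol")
print(f"KD = 10 nM -> dG = {dg_10nM:.1f} kcal/mol")
```

Under these assumptions the quoted KD range corresponds to roughly -11.3 to -10.9 kcal/mol. A twofold change in KD shifts ΔG° by only about 0.4 kcal/mol, which is why the sub-nM binding to -GAC- stands out.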


MG Binders with Additional GC BP-Binding Capability: Compounds That Recognize the GGAA Sequence That Is Conserved in the PU.1 Promoter

Direct targeting of the MG of the central conserved 5′-GGAA-3′ component and closely flanking regions of the ETS transcription factor promoters provides a key site for inhibiting ETS TFs and is an important goal of this research. Because the two GC bps in the -GGAA- binding site are adjacent, it is a difficult sequence to target with linked single GC bp heterocyclic modules. To accomplish this recognition goal, we have developed a new concept to target novel mixed bps DNA sequences. An initial idea for targeting adjacent -GG- bps was DB2830 (Fig. 22), a directly linked thiophene-N-i-Pr-benzimidazole module with a σ-hole motif (thiophene-N-R-benzimidazole). This novel structure has not been previously used to recognize the DNA MG. Although this directly linked compound is capable of binding to DNA sequences containing -GG- bps, its binding is relatively weak, with KD = 553 nM for the test sequence -AAAGGTTT- (Fig. 23a). The curvature evaluation described above suggested that, because of its excessive curvature (curvature angle 110°) relative to the MG (Fig. 23b), DB2830 was unable to fit deeply into the MG at the adjacent -GG- bps sequence (Guo et al. 2021). To reduce the curvature of DB2830 for a better match to the MG and to fit the short distance between adjacent GC bps, DB2831, a unique structure with a fused planar benzodiimidazole-bisthiophene core (curvature angle 136°), was synthesized (Figs. 22 and 23b). The biophysical results and molecular simulation studies reveal that the preorganized thiophene-N-R-benzimidazole σ-hole unit strongly stabilizes the adjacent -GG- bps sequence (KD = 4 nM, -AAAGGTTT-) (Fig. 23a, c). Moreover, DB2831 shows only weak stabilization of pure AT, single GC bp, -GTG- binding site, and even adjacent -GC- bps

Fig. 22 Chemical structures of novel DNA minor groove binders for adjacent GG base pairs


Fig. 23 (a) Overlay of molecular curvatures for DB2830 and DB2831. Structure optimization of minor groove binding compounds was performed by using DFT/B3LYP theory with the 6-31+G* basis set in Gaussian 09; (b) comparison of equilibrium binding constants (KA, M⁻¹) of adjacent GG base pair binders with DNA sequences containing -AAAGTTT-, -AAAGGTTT-, and -AAAGTGTTT- binding sites. “X” represents no measurable KA under our experimental conditions; (c) representative SPR sensorgrams for DB2831 with the -AAAGGTTT-, -AAAGTTT-, and -AAAGTGTTT- binding site DNAs. In (a) the concentrations of DB2831 in these SPR experiments are 5–30 nM from bottom to top. In (b) and (c), the concentrations of DB2831 in these SPR experiments are 5–500 nM from bottom to top. In (a), solid black lines are best-fit values for global kinetic fitting of the results with a single site function

(-AAAGCTTT-) sequences (Fig. 23a). This critical finding illustrates the extraordinary sequence selectivity of DB2831. Furthermore, the binding results revealed that DB2831 has an optimized size, curvature, and terminal flexibility for highly selective recognition of, and strong affinity for, adjacent GG bps in an A-tract sequence. To test the effects of molecular size, curvature, and the phenyl groups of the benzodiimidazole-bisthiophene compound on sequence-specific DNA binding, two truncated compounds with terminal thiophene amidines (DB2834 and DB2835) were synthesized (Fig. 22). The binding KD of DB2835 (AAAGGTTT,


Fig. 24 ESI-MS negative mode spectra of the competition binding of sequences AAATTT, AAAGTTT, AAAGCTTT, AAAGGTTT, and AAAGTGTTT (10 μM each) with 40 μM DB2831 in buffer (50 mM ammonium acetate with 10% methanol (v/v), pH 6.8). The top figure represents the ESI-MS spectra of the free DNA mixture. The bottom figure represents the ESI-MS spectra of the DNA mixture with DB2831. The ESI-MS results shown here are deconvoluted spectra, and molecular weights are shown with each peak. The inset in the red box is the expansion of the bottom figure between 9600 and 10,000 m/z

KD = 286 nM) indicates that the terminal phenyl groups play a significant role in DB2831 affinity. One additional compound was prepared with substitutions designed to increase the twist of the aromatic rings of the core structure. DB2836 has a chloro group adjacent to the amidine on the phenyl rings on both sides. DB2836 shows good binding affinity (AAAGGTTT, KD = 62 nM) (Fig. 23a) and excellent sequence selectivity. DB2836 binds neither pure AT sequences nor single GC bp containing sequences under our experimental conditions. However, DB2838 with N-Me substituents shows almost no binding to the selected sequences because of the extensive aggregation of this compound under the experimental conditions (Fig. 23a) (Guo et al. 2021). Competition ESI-MS is a powerful high-throughput screening technique to determine ligand-DNA sequence selectivity and binding stoichiometry (Paul et al. 2015a; Laughlin and Wilson 2015). As described above, in this assay binding to multiple DNA sequences can be evaluated with libraries of MG binding molecules or DNAs. DB2831 was tested in competition mass spectrometry with five DNA sequences with different binding sites and different molecular weights. As can be seen in Fig. 24a, five free DNA peaks are shown for -AAATTT- (7303), -AAAGTTT- (8539), -AAAGCTTT- (7921), -AAAGGTTT- (m/z 9158), and -AAAGTGTTT- (m/z 9775). On addition of DB2831, the peak intensity for


-AAAGGTTT- (9158) decreases with the simultaneous appearance of a new peak at m/z 9801 that is characteristic of a 1:1 AAAGGTTT–DB2831 complex (Fig. 24b). No other DNA–ligand complex peaks appear. The observed spectra distinctly indicate the high specificity and affinity of DB2831 for the adjacent -GG- bps-binding sequence (Guo et al. 2021).

MD simulation results of DB2831 bound to a B-form ds[(5′-CCAAAGGTTTCC-3′)(5′-GGAAACCTTTGG-3′)] DNA with the target GG sequence site reveal exciting, complex features that are difficult to obtain from the experimental analysis. The 600 ns MD simulation was performed with the Amber 16 software package in the presence of 0.15 M Na+ (a total of 37 Na+ added), and to balance the excess Na+, a total of 17 Cl⁻ were added. The MD analysis revealed that the compound fits well between the walls of the MG and is able to twist to adapt to the groove curvature (Fig. 25b). Due to the central planar core, the fit of the compound to the floor of the MG is more complex. The two unprotonated N of the N-isopropyl-benzodiimidazole group form strong H-bonds (based on distance) with the two central exocyclic NHs of G6 and G7 that project into the MG. In addition, the central -CH of the benzodiimidazole that faces into the groove is close to the O2 of C19 which H-bonds to Gs and forms a stabilizing interaction (Fig. 25c and d). These three interactions between the benzodiimidazole and two GC bps account for the strong and selective preference of DB2831 for the MG of the adjacent -GG- sequence. The MD simulation also suggests that the benzodiimidazole and the two thiophene groups remain in close proximity to the floor of the MG. The two sulfur atoms of the thiophenes provide additional ligand–DNA interaction by being close to the MG, with an average distance of 3.3 ± 0.2 Å from the floor of the MG and the two AT base pairs that are adjacent to the -GG- binding site (Fig. 25c and d).
Phenyl -CH interactions with O2 of T, O2 of C, and N3 of A groups provide additional stability for the ligand–DNA complex. Due to the optimum indexing of the compound with the MG functional groups, there are no 180° rotational motions of the phenyl groups of DB2831 throughout the 600 ns MD simulation. Surprisingly, the MD simulation revealed that, despite the optimum compound curvature, in only approximately 10% of the simulation are both terminal amidines of DB2831 H-bonded directly to the O2 atoms of T, with average H-bond lengths of 2.7–3.0 Å. The combination of the rigid thiophene-benzodiimidazole-thiophene center and the flexible terminal diamidine of DB2831 allows the compound to form a stable complex with one interfacial water-mediated H-bonding interaction (75% occupancy) between a dynamic terminal amidine(-NH) and O2 of T (Fig. 25c). Due to the dynamic nature of the terminal phenyl-amidine groups, approximately 15% of the ligand–DNA complexes also have interfacial water molecules at both amidine groups to link the amidines to O2 of T groups (Fig. 25c and d). The water molecules in the DB2831 binding site can effectively orient to provide favorable curvature to the DNA complex and interactions between the compound and DNA. The H-bonding ability, flexibility, and dynamics of the bound waters help provide the high binding affinity of DB2831 to the -AAAGGTTT- binding site. The interfacial waters act as H-bond donors and acceptors to connect DB2831 and DNA bases noncovalently. Surprisingly, due to the strong interaction with DNA bases, the


Fig. 25 (a) The DNA sequence used for MD analysis; red-underlined bases indicate the binding site of the compounds; Molecular Dynamics (MD) model of DB2831 bound to an AAAGGTTT site; (b) Snapshot of MD simulations of the surface model viewed into the minor groove of the -AAAGGTTT- binding site with bound DB2831. The ball and stick model in green-orange-blue-red (C-H-N-O) color scheme represents DB2831. The DNA bases are represented in tan-white-red-blue-orange (C-H-O-N-P) color scheme. The important interactions between different sections of the DB2831–DNA complex are illustrated in (c) and (d); (c) DB2831 forms three direct hydrogen bonds (black dashed lines) with DNA bases and one C–H interaction (purple dashed lines) with a DNA base. The direct interactions are (i) one of the terminal amidines with O2 of T21 and (ii, iii) the two central imidazole-Ns with the two exocyclic H–N of G6 and G7. The central phenyl–C–H forms a C–H interaction with O2 of C19 (purple dashed lines). The other terminal amidine group forms an interfacial water-mediated (red, ball and stick) H-bond with O2 of T9 (amidine–N–H–O–H–O–T9) to stabilize the compound in the minor groove. (d) DB2831 forms two direct hydrogen bonds (black dashed lines) with DNA bases and one C–H interaction (purple dashed lines) with a DNA base. The direct interactions are the two central imidazole-Ns with the two exocyclic H–N of G6 and G7. The central phenyl–C–H forms a C–H interaction with O2 of C19 (purple dashed lines). The two terminal amidine groups form interfacial water-mediated (red, ball and stick) H-bonds with O2 of T9 and T21 (amidine–N–H–O–H–O–T9 and amidine–N–H–O–H–O–T21) to stabilize the compound in the minor groove

central thiophene-benzodiimidazole-thiophene remains in a very stable position throughout the simulation. The strong binding affinity of the DB2831–AAAGGTTT complex is the result of the H-bond network and the electrostatic and van der Waals interactions between the compound and DNA. Extensive interactions are formed by the conjugated aromatic system of DB2831 with the sugar-phosphate walls of the MG. There is also a terminal amidine-water network linking the compound to the floor of the MG. This critical finding implies that dynamic, terminal interfacial water molecules can incur only a minimal entropic cost on complex formation while


allowing extremely strong compound–DNA interactions (Guo et al. 2021). The DB2831 result indicates that entirely new synthetic modules may be required in many cases to strongly and specifically recognize complex DNA sequences with important biological functions.
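The duplex and ion counts quoted for the 600 ns DB2831 simulation can be cross-checked with simple bookkeeping. The sketch below verifies that the two 12-mer strands are reverse complements and that 37 Na+ and 17 Cl⁻ give a neutral system. It assumes 5′-OH termini (one phosphate fewer than the strand length on each strand) and a doubly protonated diamidine, neither of which is stated explicitly in the text. It is not part of the authors' simulation setup.

```python
# Strand sequences of the MD duplex, written 5'->3' as in the text.
STRAND_1 = "CCAAAGGTTTCC"
STRAND_2 = "GGAAACCTTTGG"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the Watson-Crick reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def duplex_charge(strand_a, strand_b):
    """Backbone charge, assuming no terminal phosphates (n - 1 per strand)."""
    return -((len(strand_a) - 1) + (len(strand_b) - 1))


is_duplex = reverse_complement(STRAND_1) == STRAND_2

dna_charge = duplex_charge(STRAND_1, STRAND_2)
ligand_charge = 2           # assumption: both amidines of DB2831 protonated
n_sodium, n_chloride = 37, 17   # ion counts quoted in the text

net_charge = dna_charge + ligand_charge + n_sodium - n_chloride
print(f"Strands complementary: {is_duplex}")
print(f"DNA charge: {dna_charge}, net system charge: {net_charge}")
```

Under these assumptions the 20 excess Na+ ions exactly offset the -22 charge of the duplex plus the +2 charge of the ligand, so the remaining 17 Na+/Cl⁻ pairs serve as added salt.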

Conclusion

This chapter shows the exceptional versatility and wide applications of noncovalent DNA MG binders. The initial MG binders, discovered and prepared over 50 years ago, were all compounds that could bind only in the MG at AT base pair sequences. These compounds displayed a tremendous variety of structures, from polyamides like netropsin and distamycin to nuclear stains like DAPI and Hoechst 33258 to therapeutics like pentamidine, furamidine, and berenil. The use of these compounds as cellular probes has also had a major impact on cell biology, DNA structure, and therapeutics, but much more is possible. The interactions of these compounds with DNA are as varied as their structures. Compounds ranging from AT-specific MG binders that can also intercalate to compounds that can bind to complex mixed AT and GC base pair sequences are described. An exciting feature of this design scheme is that the heterocyclic cation design platform has consistently shown favorable solution and cell uptake properties. AT-specific MG binding compounds that can assume a planar conformation, such as DAPI, can also bind weakly to GC sequences by intercalation. MG binding compounds generally have a concave shape that closely matches the convex surface at the floor of the MG. Again, however, there is significant variety in their structures. For example, linear compounds can bind strongly in the MG if they can capture a terminal, interfacial water molecule that completes the curvature of the compound. Such active terminal water molecules can rapidly exchange with bulk water and form dynamic H-bonds between the MG binder and DNA, which accounts for their small binding entropy cost and significant binding enthalpy. Compounds at the other extreme, which are too curved to bind well in the MG, can form stacked dimers that adjust their stacking to match the MG curvature and bind strongly and specifically.
This type of shape-selective MG binding has barely been exploited and is a significant growth opportunity for the design of new types of MG binders with new applications. Initial MG binders were AT specific, but applications could be significantly expanded if compounds with broader sequence recognition capability were available. Accomplishing this goal requires design ideas for new types of MG binders, with H-bond accepting groups incorporated into MG binding modules to allow them to interact favorably with the G-NH group that projects into the MG. Single GC bp recognizing modules could be combined in different compounds to bind to longer and more complex DNA sequences. For single GC bp-binding modules, pyridine was substituted for phenyl, azabenzimidazole for benzimidazole, and N-alkylbenzimidazole for benzimidazole. These modules could bind a GC bp in DNA when they were incorporated into compounds with appropriate structure and functional groups to index with acceptor groups on AT and GC bps that project into the MG. As a result of


concentrated synthetic and biophysical efforts, several modules were prepared that could strongly and specifically bind a GC bp in an AT sequence context. This represents the first systematic design of nonpolyamide compounds for sequence-specific recognition. Current efforts focus on determining linking modes for these, and other, modules to allow them to bind to more challenging sequences. Molecules that can recognize complex mixed base-pair sequences open an entirely new, important, and exciting chapter in the design of functional MG binders. Several such compounds are described in this chapter, along with their single GC bp-binding base modules.

References

Antony-Debré I, Paul A, Leite J et al (2017) Pharmacological inhibition of the transcription factor PU.1 in leukemia. J Clin Invest 127:4297–4313
Athri P, Wilson WD (2009) Molecular dynamics of water-mediated interactions of a linear benzimidazole-biphenyl diamidine with the DNA minor groove. J Am Chem Soc 131:7618–7625
Azad RN, Zafiropoulos D, Ober D et al (2018) Experimental maps of DNA structure at nucleotide resolution distinguish intrinsic from protein-induced DNA deformations. Nucleic Acids Res 46:2636–2647
Barrett MP, Gemmell CG, Suckling CJ (2013) Minor groove binders as anti-infective agents. Pharmacol Ther 139:12–23
Case DA, Babin V, Berryman J et al (2014) AMBER 14
Chai Y, Paul A, Rettig M et al (2014) Design and synthesis of heterocyclic cations for specific DNA recognition: from AT-rich to mixed-base-pair DNA sequences. J Org Chem 79:852–866
Cory M, Tidwell RR, Fairley TA (1992) Structure and DNA binding activity of analogues of 1,5-bis(4-amidinophenoxy)pentane (pentamidine). J Med Chem 35:431–438
Crnugelj M, Hud NV, Plavec J (2002) The solution structure of d(G(4)T(4)G(3))(2): a bimolecular G-quadruplex with a novel fold. J Mol Biol 320:911–924
Crowley KS, Phillion DP, Woodard SS et al (2003) Controlling the intracellular localization of fluorescent polyamide analogues in cultured cells. Bioorg Med Chem Lett 13:1565–1570
Crowley LC, Marfell BJ, Waterhouse NJ (2016) Analyzing cell death by nuclear staining with Hoechst 33342. Cold Spring Harb Protoc. https://doi.org/10.1101/pdb.prot087205
Depauw S, Lambert M, Jambon S et al (2019) Heterocyclic diamidine DNA ligands as HOXA9 transcription factor inhibitors: design, molecular evaluation, and cellular consequences in a HOXA9-dependant leukemia cell model. J Med Chem 62:1306–1329
Dervan PB, Edelson BS (2003) Recognition of the DNA minor groove by pyrrole-imidazole polyamides. Curr Opin Struct Biol 13:284–299
Dickie EA, Giordani F, Gould MK et al (2020) New drugs for human African trypanosomiasis: a twenty first century success story. Trop Med Infect Dis. https://doi.org/10.3390/tropicalmed5010029
Elamin EA, Homeida AM, Adam SE et al (1982) The efficacy of berenil (diminazene aceturate) against Trypanosoma evansi infection in mice. J Vet Pharmacol Ther 5:259–265
Gonen R, Platkov M, Gardos Z et al (2022) A DAPI-based modified C-banding technique for a rapid achieving high photographic contrast of centromeres on chromosomes. Cell Biochem Biophys. https://doi.org/10.1007/s12013-022-01065-5
Guo P, Paul A, Kumar A et al (2016) The thiophene “sigma-hole” as a concept for preorganized, specific recognition of GC base pairs in the DNA minor groove. Chem Eur J 22:15404–15412
Guo P, Paul A, Kumar A et al (2017) A modular design for minor groove binding and recognition of mixed base pair sequences of DNA. Chem Commun (Camb) 53:10406–10409


Guo P, Farahat AA, Paul A et al (2018) Compound shape effects in minor groove binding affinity and specificity for mixed sequence DNA. J Am Chem Soc 140:14761–14769
Guo P, Farahat AA, Paul A et al (2021) Engineered modular heterocyclic-diamidines for sequence-specific recognition of mixed AT/GC base pairs at the DNA minor groove. Chem Sci 12:15849–15861
Hannah KC, Gil RR, Armitage BA (2005) 1H NMR and optical spectroscopic investigation of the sequence-dependent dimerization of a symmetrical cyanine dye in the DNA minor groove. Biochemistry 44:15924–15929
Härd T, Fan P, Kearns DR (1990) A fluorescence study of the binding of Hoechst 33258 and DAPI to halogenated DNAs. Photochem Photobiol 51:77–86
Hargrove AE, Raskatov JA, Meier JL et al (2012) Characterization and solubilization of pyrrole-imidazole polyamide aggregates. J Med Chem 55:5425–5432
Harika NK, Wilson WD (2018) Bound compound, interfacial water, and phenyl ring rotation dynamics of a compound in the DNA minor groove. Biochemistry 57:5050–5057
Harika NK, Paul A, Stroeva E et al (2016) Imino proton NMR guides the reprogramming of A•T specific minor groove binders for mixed base pair recognition. Nucleic Acids Res 44:4519–4527
Harika NK, Germann MW, Wilson WD (2017) First structure of a designed minor groove binding heterocyclic cation that specifically recognizes mixed DNA base pair sequences. Chem Eur J 23:17612–17620
Jenquin JR, Coonrod LA, Silverglate QA et al (2018) Furamidine rescues myotonic dystrophy type I associated mis-splicing through multiple mechanisms. ACS Chem Biol 13:2708–2718
Kawamoto Y, Bando T, Sugiyama H (2018) Sequence-specific DNA binding pyrrole-imidazole polyamides and their applications. Bioorg Med Chem 26:1393–1411
Kiakos K, Pett L, Satam V et al (2015) Nuclear localization and gene expression modulation by a fluorescent sequence-selective p-anisyl-benzimidazolecarboxamido imidazole-pyrrole polyamide. Chem Biol 22:862–875
Kopka ML, Yoon C, Goodsell D et al (1985) Binding of an antitumor drug to DNA, Netropsin and C-G-C-G-A-A-T-T-BrC-G-C-G. J Mol Biol 183:553–563
Kopka ML, Goodsell DS, Han GW et al (1997) Defining GC-specificity in the minor groove: side-by-side binding of the di-imidazole lexitropsin to C-A-T-G-G-C-C-A-T-G. Structure 5:1033–1046
Larsen TA, Goodsell DS, Cascio D et al (1989) The structure of DAPI bound to DNA. J Biomol Struct Dyn 7:477–491
Laughlin S, Wilson WD (2015) May the best molecule win: competition ESI mass spectrometry. Int J Mol Sci 16:24506–24531
Lee M, Chang DK, Hartley JA et al (1988) Structural and dynamic aspects of binding of a prototype lexitropsin to the decadeoxyribonucleotide d(CGCAATTGCG)2 deduced from high-resolution 1H NMR studies. Biochemistry 27:445–455
Liu Y, Chai Y, Kumar A et al (2012) Designed compounds for recognition of 10 base pairs of DNA with two AT binding sites. J Am Chem Soc 134:5290–5299
Lombardy RL, Tanious FA, Ramachandran K et al (1996) Synthesis and DNA interactions of benzimidazole dications which have activity against opportunistic infections. J Med Chem 39:1452–1462
Lown JW (1992) Lexitropsins in antiviral drug development. Antivir Res 17:179–196
Lown JW, Krowicki K, Bhat UG et al (1986) Molecular recognition between oligopeptides and nucleic acids: novel imidazole-containing oligopeptides related to netropsin that exhibit altered DNA sequence specificity. Biochemistry 25:7408–7416
Mallena S, Lee MP, Bailly C et al (2004) Thiophene-based diamidine forms a “super” AT binding minor groove agent. J Am Chem Soc 126:13659–13669
Matthes F, Massari S, Bochicchio A (2018) Reducing mutant huntingtin protein expression in living cells by a newly identified RNA CAG binder. ACS Chem Neurosci 9:1399–1408

870

W. D. Wilson and A. Paul

Miao Y, Lee MP, Parkinson GN et al (2005) Out-of-shape DNA minor groove binders: induced fit interactions of heterocyclic dications with the DNA minor groove. Biochemistry 44: 14701–14708 Ming X, Ju W, Wu H et al (2009) Transport of dicationic drugs pentamidine and furamidine by human organic cation transporters. Drug Metab Dispos 37:424–430 Mrksich M, Wade WS, Dwyer TJ et al (1992) Antiparallel side-by-side dimeric motif for sequencespecific recognition in the minor groove of DNA by the designed peptide 1-methylimidazole-2carboxamide netropsin. Proc Natl Acad Sci U S A 89:7586–7590 Munde M, Kumar A, Nhili R et al (2010) DNA minor groove induced dimerization of heterocyclic cations: compound structure, binding affinity, and specificity for a TTAA site. J Mol Biol 402: 847–864 Nanjunda R, Wilson WD (2012) Binding to the DNA minor groove by heterocyclic dications: from AT-specific monomers to GC recognition with dimers. Curr Protoc Nucleic Acid Chem Chapter 8. Unit8.8 Nanjunda R, Musetti C, Kumar A et al (2012) Heterocyclic dications as a new class of telomeric G-quadruplex targeting agents. Curr Pharm Des 18:1934–1947 Neidle S (2001) DNA minor-groove recognition by small molecules. Nat Prod Rep 18:291–309 Nguyen B, Neidle S, Wilson WD (2009) A role for water molecules in DNA-ligand minor groove recognition. Acc Chem Res 42:11–21 Nozeret K, Loll F, Cardoso GM et al (2018) Interaction of fluorescently labeled pyrrole-imidazole polyamide probes with fixed and living murine and human cells. Biochimie 149:122–134 Paine MF, Wang M, Generaux CN et al (2010) Diamidines for human African trypanosomiasis. Curr Opin Investig Drugs 11:876–883 Paul A, Nanjunda R, Kumar A et al (2015a) Mixed up minor groove binders: convincing AT specific compounds to recognize a GC base pair. 
Bioorg Med Chem Lett 25:4927–4932 Paul A, Chai Y, Boykin DW et al (2015b) Understanding mixed sequence DNA recognition by novel designed compounds: the kinetic and thermodynamic behavior of azabenzimidazole diamidines. Biochemistry 54:577–587 Paul A, Kumar A, Nanjunda R et al (2017) Systematic synthetic and biophysical development of mixed sequence DNA binding agents. Org Biomol Chem 15:827–835 Paul A, Guo P, Boykin DW, Wilson WD (2019) A new generation of minor-groove-bindingheterocyclic diamidines that recognize GC base pairs in an AT sequence context. Molecules 24(5):946 Pelton JG, Wemmer DE (1989) Structural characterization of a 2:1 distamycin A.d (CGCAAATTGGC) complex by two-dimensional NMR. Proc Natl Acad Sci U S A 86: 5723–5727 Poon GM (2012a) Sequence discrimination by DNA-binding domain of ETS family transcription factor PU.1 is linked to specific hydration of protein-DNA interface. J Biol Chem 287: 18297–18307 Poon GM (2012b) DNA binding regulates the self-association of the ETS domain of PU.1 in a sequence-dependent manner. Biochemistry 51:4096–4107 Pullman A, Jortner J (eds) Molecular basis of specificity in nucleic acid-drug interactions proceedings of the Twenty-Third Jerusalem symposium on quantum chemistry and biochemistry held in Jerusalem, Israel, May 14–17, 1990 Reddy BS, Sharma SK, Lown JW (2001) Recent developments in sequence selective minor groove DNA effectors. Curr Med Chem 8:475–508 Rentzeperis D, Marky LA, Dwyer TJ et al (1995) Interaction of minor groove ligands to an AAATT/ AATTT site: correlation of thermodynamic characterization and solution structure. Biochemistry 34:2937–2945 Rettig M, Germann MW, Ismail MA et al (2012) Microscopic rearrangement of bound minor groove binders detected by NMR. J Phys Chem B 116:5620–5627 Rohs R, West SM, Sosinsky A et al (2009) The role of DNA shape in protein-DNA recognition. Nature 461:1248–1253

25

Compound Shape and Substituent Effects in DNA Minor Groove Interactions

871

Sauer B, Skinner-Adams TS, Bouchut A et al (2017) Synthesis, biological characterisation and structure activity relationships of aromatic bisamidines active against plasmodium falciparum. Eur J Med Chem 127:22–40 Searle MS, Embrey KJ (1990) Sequence-specific interaction of Hoechst 33258 with the minor groove of an adeninetract DNA duplex studied in solution by 1H NMR spectroscopy. Nucleic Acids Res 18:3753–3762 Soeiro MN, Werbovetz K, Boykin DW et al (2013) Novel amidines and analogues as promising agents against intracellular parasites: a systematic review. Parasitology 140:929–951 Špačková TE, Cheatham F, Ryjáček F et al (2003) Molecular dynamics simulations and thermodynamics analysis of DNA-drug complexes. Minor groove binding between 40 ,6-diamidino-2phenylindole and DNA duplexes in solution. J Am Chem Soc 125:1759–1769 Tanious FA, Hamelberg D, Bailly C et al (2004) DNA sequence dependent monomer-dimer binding modulation of asymmetric benzimidazole derivatives. J Am Chem Soc 126:143–153 Thuita JK, Wang MZ, Kagira JM, et at (2012) Pharmacology of DB844, an orally active aza analogue of pafuramidine, in a monkey model of second stage human African trypanosomiasis PLoS Negl Trop Dis 6:e1734 Waring M (1991) Binding of antibiotics to DNA. Ciba Found Symp 158:128–142 Wenzler T, Yang S, Braissant O et al (2013a) Pharmacokinetics, trypanosoma brucei gambiense efficacy, and time of drug action of DB829, a areclinical candidate for treatment of second-stage human african trypanosomiasis. Antimicrob Agents Chemother 57:5330–5343 Wenzler T, Yang S, Braissant O et al (2013b) Pharmacokinetics, Trypanosoma brucei gambiense efficacy, and time of drug action of DB829, a preclinical candidate for treatment of second-stage human african trypanosomiasis. Antimicrob Agents Chemother 57:5330–5343 Wilson WD, Tanious FA, Barton HJ et al (1990) DNA sequence dependent binding modes of 40 ,6diamidino-2-phenylindole (DAPI). 
Biochemistry 29:8452–84561 Wilson WD, Nguyen B, Tanious FA et al (2005) Dications that target the DNA minor groove: compound design and preparation, DNA interactions, cellular distribution and biological activity. Curr Med Chem Anticancer Agents 5:389–408 Wilson WD, Tanious FA, Mathis A (2008) Antiparasitic compounds that target DNA. Biochimie 90:999–1014 Yang S, Wenzler T, Miller PN et al (2014) Pharmacokinetic comparison to determine the mechanisms underlying the differential efficacies of cationic diamidines against first- and second-stage human African trypanosomiasis. Antimicrob Agents Chemother 58:4064–4074 Zhou T, Yang L, Lu Y et al (2013) DNA shape: a method for the high-throughput prediction of DNA structural features on a genomic scale. Nucleic Acids Res 41:W56–W62

26. Macrocyclic G-Quadruplex Ligands of Telomestatin Analogs

Yue Ma, Keisuke Iida, and Kazuo Nagasawa

Contents
Introduction
G4 and G4 Ligands
TMPyP4: A Macrocyclic G4 Ligand
Telomestatin: A Natural Macrocyclic G4 Ligand
Macrocyclic Polyoxazoles
HXDVs as Macrocyclic Polyoxazoles (Rice Group)
OTDs as Macrocyclic Polyoxazoles (Nagasawa Group)
Conclusion
References

Abstract

The various functions of G-quadruplexes (G4s) are starting to emerge with the development of G4 ligands, which interact selectively and strongly with G4. To achieve selective interaction of small molecules with G4, it is necessary to target the characteristic motif of G4, the G-quartet. In this chapter, we give an overview of the macrocyclic polyoxazole-type compounds inspired by the natural G4 ligand telomestatin, the so-called OTDs. The OTDs possess six or seven oxazoles in the macrocycle, and the structure is similar in size to the G-quartet. By introducing side chains into the OTD core structure, regulation of G4 properties, including stability and topology, was demonstrated. In addition, fluorescent OTDs for the visualization of G4 in vitro as well as in vivo and a biotin-conjugated OTD for the isolation of G4-forming sequences with a pull-down strategy were developed. The application of OTDs as anticancer drugs was demonstrated. Some proteins are known to regulate the biological activity of G4 by binding specifically to these structures, and these interactions have been shown to be regulated by OTDs.

Y. Ma, Tokyo Medical and Dental University, Tokyo, Japan
K. Iida, Chiba University, Chiba, Japan
K. Nagasawa (*), Tokyo University of Agriculture and Technology, Tokyo, Japan

© Springer Nature Singapore Pte Ltd. 2023. N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_30

Introduction

Nucleic acids form characteristic higher-order structures, of which one of the best-known and most common is the double helix, known as B-form DNA. In addition, about ten kinds of “non-B-type” nucleic acid structures (non-B-form DNA) have been identified. Among them, the G-quadruplex (G4) and i-motif have been most extensively studied from a wide range of perspectives, including structural analysis, identification of the sequences forming these structures, physiological roles, associated biological functions (mainly for G-quadruplex), and subcellular localization. Small molecules that selectively interact with G4, so-called G4 ligands, have made a great contribution to such studies, and various types of G4 ligands have been developed. This chapter deals with G4 ligands having macrocyclic core structures. We will focus especially on the natural G4 ligand telomestatin and on its synthetic derivatives, oxazole telomestatin derivatives (OTDs), and related compounds.

G4 and G4 Ligands

G-quadruplex (G4) is a higher-order structure formed in single-stranded guanine-rich regions of DNA. It is constructed by stacking planar G-quartets, each of which consists of four guanines, through π-π interactions in the presence of a monovalent cation such as K+ or Na+. Each G-quartet is stabilized by Hoogsteen-type base pairing involving N7 of guanine, which differs from the Watson-Crick-type base pairing seen in the duplex. The monovalent cations assist the stacking of G-quartets by coordinating with the carbonyl oxygen (O6) of guanine. These interactions lead to the formation of G4, which is composed of loops, grooves, and G-quartets (Fig. 1).

Fig. 1 (A) Watson-Crick base pairs (G-C pair and A-T pair), (B) G-quartet and G-quadruplex (G4) structures, showing the loop and groove; M = cations, n = 1–3

Fig. 2 Structures of TMPyP4 (1) and telomestatin (2) and their modes of binding (G-quartet stacking and groove binding)

These characteristic motifs in G4, i.e., loops, grooves, and G-quartet planes, can be recognized specifically by small molecules. The grooves are considered to be major recognition sites, though ligand binding to grooves is often competitive with respect to binding to duplexes or triplexes, and only a limited number of G4 ligands specifically targeting the grooves have been developed (Hamon et al. 2011; Di Leva et al. 2013; Martino et al. 2007; Ali and Bhattacharya 2014; Cosconati et al. 2009, 2010, 2012; Li et al. 2009). On the other hand, the G-quartet is a G4-specific structure and has a larger planar area than the duplex structure formed by Watson-Crick base pairing. Therefore, compounds targeting the G-quartet have been intensively explored as G4-selective ligands. Most of these compounds have highly planar structures involving aromatic motifs, which can stack efficiently with the G-quartet plane through π-π interactions (Chung et al. 2014; Collie et al. 2012; Kotar et al. 2016; Bazzicalupi et al. 2013; Lin et al. 2018). Indeed, macrocycles containing aromatic structures, represented by TMPyP4 (Tetrakis-(N-methyl-4-pyridyl)porphine; 1) and telomestatin (TMS; 2), are typical G4 ligands that interact selectively with the G-quartet and have been extensively applied in analyses of G4 functions. In the following sections, we will first describe TMPyP4; then we will discuss the natural G4 ligand telomestatin and its synthetic derivatives, oxazole telomestatin derivatives (OTDs) (Fig. 2).

TMPyP4: A Macrocyclic G4 Ligand

Porphyrins are G4 ligands in which the macrocyclic structure stacks with the G-quartet. This interaction mode was confirmed by X-ray and NMR studies of telomeric G4 and the porphyrin derivative TMPyP4, a well-known G4 ligand that interacts strongly with G4, though it also interacts with dsDNA (Anantha et al. 1998; Wheelhouse et al. 1998; Parkinson et al. 2007). Despite its moderate selectivity for G4, TMPyP4 has been widely used to study the pharmacological effects associated with G4 stabilization because of its ready availability and strong G4-binding ability.

Various structural developments of porphyrin-based G4 ligands have been reported. Among them, ligand 3, with manganese as the central metal and long side chains, shows remarkable G4-interacting ability (Dixon et al. 2007). In particular, it showed 1000-fold selectivity for telomeric G4 over dsDNA. This high selectivity is attributed to the presence of manganese. In order to further improve the G4 selectivity of porphyrin-type G4 ligands, the introduction of anionic side chains was investigated. This approach was successful: the ligand NMM (N-methyl mesoporphyrin IX; 4) targets the G-quartet with its porphyrin structure, while its anionic side chains improve selectivity over the duplex (Fig. 3) (Nicoludis et al. 2012).

Fig. 3 Structures of TMPyP4 derivatives 3 (manganese porphyrin) and 4 (NMM)

Hurley and co-workers reported that TMPyP4 (1) stabilizes c-myc G4 in the promoter region, thereby inhibiting transcriptional initiation by RNA polymerase II and repressing the transcription of c-myc (Grand et al. 2002). Although c-myc is upregulated in many cancers, and its overexpression is correlated with a poor prognosis, it does not have a druggable structure, making it difficult to develop as a drug target. However, the discovery that c-myc expression is suppressed by G4 ligands has opened up a new approach for drug development.

Recently, it has been suggested that porphyrin derivatives, which are biosynthesized in the body, may contribute to the treatment of ATR-X syndrome, an intractable disease that causes severe mental retardation (Shioda et al. 2018). ATR-X syndrome is caused by dysfunction of the ATRX protein, which is thought to regulate the expression of multiple genes via chromatin remodeling (Law et al. 2010). Dysfunctional ATRX was reported to recover its biological functions upon binding to G4. Furthermore, ingestion of 5-ALA (Peng et al. 1997) led to increased biosynthesis of a porphyrin derivative, which stabilized G4, resulting in recovery of the biological function of the ATRX protein.

Telomestatin: A Natural Macrocyclic G4 Ligand

Telomestatin (TMS; 2) is a natural G4 ligand isolated from the actinomycete Streptomyces anulatus 3533-SV4 in 2001 by Shin-ya and co-workers (Shin-ya et al. 2001). TMS (2) contains seven oxazoles and one thiazoline moiety in the macrocycle, which is similar in size to the G-quartet plane, and it interacts strongly and specifically with G4 through π-π interactions with the G-quartet (Fig. 4).

In 2002, Shin-ya and Hurley reported that TMS (2) inhibits telomerase activity with an IC50 value of 5 nM, as determined by the TRAP assay (telomeric repeat amplification protocol), by strongly stabilizing the telomeric G4 (Kim et al. 2002). Telomerase, which is related to cell immortalization, is overexpressed in more than 90% of cancer cells. On the other hand, it is expressed in only a limited number of normal cells, such as gametes. Therefore, G4, including telomeric G4, is considered to be a promising drug target for cancer treatment. Many studies have explored the effects of TMS (2) on various cancer cell lines. For example, TMS (2) was effective against the human leukemia cell line U937, and it inhibited U937 tumor growth in a mouse model by 92.5% at a dose of 15 mg/kg (Tauchi et al. 2003, 2006). In this case, TMS (2) activated p38 MAP kinase, resulting in apoptosis of cancer cells via inhibition of their growth. TMS (2) was also effective against brain tumor stem cells (glioma stem cells; GSC) (Miyazaki et al. 2012). In this case, TMS (2) is believed to suppress the expression of c-Myb, which is overexpressed in GSC, and to promote the removal of TRF2 from telomeres, thereby causing replication stress (Tahara et al. 2006).

TMS (2) is produced by actinomycetes, but only in small amounts, and consequently the chemical synthesis of TMS (2) was investigated. Doi and co-workers reported a total synthesis of TMS (2) starting from amino acids, involving 31 steps with 3% overall yield (Doi et al. 2006). However, obtaining a sufficient supply of TMS (2) for further studies of its biological activity, whether by fermentation or chemical synthesis, remains an issue.
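To put the reported 31 steps and 3% overall yield in perspective, the short sketch below computes the average yield per step that such numbers imply. It assumes, purely for illustration, that every step of a linear sequence has the same yield; the function name `average_step_yield` is hypothetical, and the individual step yields of the actual synthesis are not given in this chapter.

```python
# Average per-step yield implied by an overall yield over n linear steps.
# Assumption (not from the source): all steps have the same yield.

def average_step_yield(overall_yield: float, n_steps: int) -> float:
    """Return the geometric-mean yield per step, as a fraction."""
    if not 0.0 < overall_yield <= 1.0:
        raise ValueError("overall_yield must be a fraction in (0, 1]")
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    return overall_yield ** (1.0 / n_steps)


# Values quoted in the text: 31 steps, 3% overall yield.
per_step = average_step_yield(0.03, 31)
print(f"average yield per step: {per_step:.1%}")
```

Under this equal-yield assumption the average comes out at roughly 89% per step, which illustrates why even efficient individual reactions leave little material after a long linear sequence.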

Fig. 4 Manual superposition of telomestatin (2) on G-quartet planar structure

Macrocyclic Polyoxazoles

Various compounds based upon the characteristic structure of TMS (2) have been reported as synthetic G4 ligands. Among them, polyoxazoles with macrocyclic structures were independently developed by the Rice and Nagasawa groups. These synthetic TMS analogs contain four to seven oxazoles with substituents originating from the amino acids used as starting materials. In contrast to TMS (2), these analogs are easily synthesized in high chemical yields. In addition, a variety of functional groups can be introduced in the side chains to modulate the ligand functions. Moreover, SAR studies have been carried out to develop G4-targeted anticancer agents based on these synthetic TMS analogs. The approaches used by the Rice and Nagasawa groups are discussed in the following sections.

HXDVs as Macrocyclic Polyoxazoles (Rice Group)

Rice and co-workers have reported macrocyclic tetraoxazole 5 and hexaoxazoles HXDV (6) and 7 (Fig. 5) as G4 ligands (Minhas et al. 2006). These cyclic polyoxazoles were synthesized by coupling amino acids bearing oxazoles via amide linkages. The ability of compounds 5–7 to stabilize telomeric G4 (telo24) and duplex DNA (dsDNA) was evaluated by DNA melting analysis, monitored by UV spectroscopy (absorbance at 295 nm). The ΔTm values of 6 and 7 for telo24 were found to be 17.5 and 6.5 °C, respectively, while neither of the ligands affected the Tm of dsDNA. Ligand 5 did not affect the Tm of either duplex or G4-forming DNA. Thus, ligands 6 and 7 are selective stabilizers of G4.

They further evaluated the interaction mode of HXDV (6) with telomeric G4 by means of replacement experiments, in which adenine in a loop of G4 was replaced with aminopurine (Barbieri et al. 2007), and the fluorescence change of aminopurine upon interaction with ligand 6 was measured. Specifically, three variants of the telomeric DNA sequence (TTAGGG)4, designated 9AP, 15AP, and 21AP, were prepared, in which the adenine at the 9th, 15th, or 21st position, respectively, was replaced with aminopurine, and each sequence was titrated with HXDV (6) in the presence of K+. This analysis showed no significant fluorescence change with 9AP, but the fluorescence intensity of 15AP and 21AP increased with the addition of HXDV (6). This suggests that HXDV (6) stacks on the G-quartet near the substituted adenines in 15AP and 21AP, i.e., HXDV (6) interacts with telomeric G4 in a “terminal capping” mode.

HXDV (6) exhibits potent growth-inhibitory activity against telomerase-positive cancer cells and induces apoptosis (Tsai et al. 2009). It also inhibits cell cycle progression and causes M-phase arrest by downregulating the expression of Aurora A, a regulator of the M-phase checkpoint. TXTLeu (8), a synthetic analog of HXDV that does not bind to G4, did not exhibit these effects, suggesting that HXDV (6) exhibits these properties by binding to G4.

Fig. 5 Structures of macrocyclic tetraoxazole 5 and hexaoxazoles HXDV (6), 7, and TXTLeu (8). Figure annotation: IC50 = 0.5 µM (anti-proliferative activity against KB-3-1 cells)
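The positions used in the aminopurine replacement experiments follow directly from the repeat structure of (TTAGGG)4. The sketch below lists the adenine positions in that sequence (1-based) and marks a substituted position with a placeholder character. The helper names and the placeholder symbol `X` are hypothetical, chosen only for this illustration.

```python
# Locate the adenines in the telomeric sequence (TTAGGG)4 and mark the
# position replaced by aminopurine (AP). "X" is a hypothetical placeholder.

TELOMERIC_REPEAT = "TTAGGG"
sequence = TELOMERIC_REPEAT * 4  # (TTAGGG)4, 24 nucleotides


def adenine_positions(seq: str) -> list[int]:
    """Return the 1-based positions of every adenine in seq."""
    return [i for i, base in enumerate(seq, start=1) if base == "A"]


def substitute(seq: str, position: int, symbol: str = "X") -> str:
    """Return seq with the adenine at the 1-based position replaced by symbol."""
    if seq[position - 1] != "A":
        raise ValueError(f"position {position} is not an adenine")
    return seq[: position - 1] + symbol + seq[position:]


print(adenine_positions(sequence))
for name, pos in (("9AP", 9), ("15AP", 15), ("21AP", 21)):
    print(name, substitute(sequence, pos))
```

The sequence contains adenines at positions 3, 9, 15, and 21, so the three variants named in the text correspond to the second, third, and fourth repeats.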

OTDs as Macrocyclic Polyoxazoles (Nagasawa Group)

Nagasawa et al. independently developed a series of macrocyclic OTDs (oxazole telomestatin derivatives) as G4 ligands, based on the structure of TMS (2), focusing on heptaoxazole (seven consecutive oxazoles; 7OTD) and hexaoxazole (two sets of three contiguous oxazoles; 6OTD). Because 7OTD has a more planar structure than 6OTD, similar to that of TMS (2) (the dihedral angle is 175.5° for 7OTD and 156.3° for 6OTD), it exhibits higher affinity for G4 than 6OTD through π-π interactions with the G-quartet plane. However, although 6OTD derivatives usually exhibit lower affinity for G4 than 7OTD derivatives, it is possible to increase (or modulate) their G4-stabilizing ability by introducing appropriate functional groups (R) on the side chains (Fig. 6).

Fig. 6 Structures of 7OTD and 6OTD and their dihedral angles (side views: 175.5° for 7OTD, 156.3° for 6OTD)

The syntheses of 6OTD and 7OTD are depicted in Schemes 1 and 2, with L2H2-6OTD (10) (Tera et al. 2008) and L1H1-7OTD (9) (Tera et al. 2009) as examples. The coupling reaction of N-protected lysine 11 with serine methyl ester 12 yielded dipeptide 13, which was converted to oxazole 15 via oxazoline 14 by cyclization followed by oxidation. After hydrolysis of the ester group of 15, the resulting carboxylic acid 16 was coupled with oxazole amine 17, derived from two serine dipeptides, and the product was further converted to trioxazole 18 by cyclization and oxidation. After hydrolysis of 18, the resulting carboxylic acid 19 was coupled with trioxazole amine 20, derived from 18, to give bis-trioxazole 21. Hydrolysis of the ester and deprotection of the amino group in 21, followed by macrocyclization of the resulting amino acid, gave L2H2-6OTD (10) (Scheme 1) (Tera et al. 2008).

Scheme 1 Synthesis of L2H2-6OTD (10)

L1H1-7OTD (9) was synthesized from 6OTD derivative 22. The TBS group of 22 was removed to give alcohol 23, which was converted into enamide 24 by mesylation followed by treatment with DBU. Enamide 24 was converted to heptaoxazole 25 by reaction with NBS followed by Cs2CO3 treatment. Then, the Boc group was removed with TFA to give L1H1-7OTD (9) (Scheme 2) (Tera et al. 2009).

7OTD and 6OTD can be applied as chemical tools to elucidate the functions of G4 by introducing appropriate functional groups to their side chains. In the following sections, we discuss the range of applications of these ligands in chemical-biology studies, as well as the application of OTDs in drug development, especially for cancer treatment.

Scheme 2 Synthesis of L1H1-7OTD (9)

7OTDs as G4 Ligands: Chemical-Biology Studies

Taking advantage of the selective recognition of G4 by macrocyclic heptaoxazole, a fluorescent G4 ligand was synthesized by introducing a fluorophore into the core structure of 7OTD. The BODIPY-labeled macrocyclic heptaoxazole L1BOD-7OTD (26) selectively interacts with oligonucleotides that form G4 and stabilizes their G4 structure (Tera et al. 2010). Genes and artificial nucleotides forming G4 are directly visualized as green fluorescence originating from the fluorescent ligand L1BOD-7OTD (26) (Fig. 7). L1BOD-7OTD (26) can visualize G4 in cell-free and cell-based assay systems.

Another fluorescent G4 ligand, L1Cy5-7OTD (27), was synthesized by introducing a Cy5 fluorophore into L1H1-7OTD (9) (Iida et al. 2013a). L1Cy5-7OTD (27) selectively interacts with G4-forming oligonucleotides, exhibiting fluorescence, and was applied to high-throughput screening of G4-forming oligonucleotides (GFOs) on DNA microarrays. High-throughput DNA microarray screening in mouse CpG islands revealed 1998 novel GFOs among the 88,737 probes on the array (Fig. 8). Gene ontology analysis revealed that these sequences are mainly related to metabolic processes, transcriptional regulation, biosynthetic processes, and developmental processes.

The biotinylated derivative of 7OTD, L1Bio-7OTD (28), was synthesized by introducing a biotin group into L1H1-7OTD (9) and was applied for the isolation of G4s using a pull-down strategy, as depicted in Fig. 9 (Iida et al. 2019). Biotin-conjugated L1Bio-7OTD (28) is added to a mixture of DNA sequences containing G4-forming and non-G4-forming sequences. After incubation, the G4-forming sequences are captured by incubation with streptavidin-coated magnetic beads. After magnetic separation, the beads are heated to denature and release the bound G4 sequences, and the supernatant is analyzed by gel electrophoresis. G4-forming DNAs of telomere, bcl-2, c-kit, c-myc, and k-ras were selectively pulled down from mixtures of G4-forming and non-G4-forming DNA sequences (a stem-loop sequence with a double-stranded region and the cytosine-rich sequence complementary to the G4-forming sequence).
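As a quick check on the microarray screening numbers, the sketch below computes the fraction of probes identified as G4-forming, using the probe counts reported with Fig. 8 (all probes, probes matching the G3+ motif, and probes matching the G2+ motif). The dictionary keys and the function name `hit_rate` are hypothetical labels introduced for this illustration.

```python
# Fraction of array probes identified as G4-forming, using the probe counts
# reported with Fig. 8. Keys and function name are illustrative labels only.

probe_counts = {
    # category: (number of probes, number newly identified as G4-forming)
    "all probes": (88737, 1998),
    "G3+ motif": (427, 296),
    "G2+ motif": (23779, 1702),
}


def hit_rate(total: int, hits: int) -> float:
    """Return hits / total as a fraction."""
    if total <= 0:
        raise ValueError("total must be positive")
    return hits / total


for category, (total, hits) in probe_counts.items():
    print(f"{category}: {hits}/{total} = {hit_rate(total, hits):.1%}")
```

The two motif categories add up to the 1998 identified probes (296 + 1702), and the hit rate is much higher for probes containing runs of three or more guanines than for those containing runs of two.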

Fig. 7 Structures of L1H1-7OTD (9) and L1BOD-7OTD (26). Fluorescence microscopic imaging of nuclei of fixed HeLa I.2.11 cells after treatment with L1BOD-7OTD (26) (0.5 μM) for 18 h. (A) L1BOD-7OTD fluorescence (λex = 502 nm, λem = 512 nm). (B) DAPI fluorescence (λex = 358 nm, λem = 461 nm). (C) Merged image of A and B

Fig. 8 Structure of L1Cy5-7OTD (27) and high-throughput DNA microarray screening of G4-forming DNA sequences in mouse CpG islands with 27

DNA probes                 Number of DNA probes    Newly identified G4
All DNA probes             88737                   1998
G3+N1-G3+N1-G3+N1-G3+      427                     296
G2+N1-G2+N1-G2+N1-G2+      23779                   1702
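The probe categories in Fig. 8 are defined by sequence motifs made of four guanine tracts separated by loops. A minimal sketch of such a motif search with a regular expression is given below. The loop length range of 1 to 7 nucleotides is an assumption (a commonly used convention for putative G4 motifs), because the motif labels are truncated in this text; the function name and the example sequences are likewise chosen only for illustration.

```python
import re

# Build a regular expression for a putative G4 motif: four G-tracts of at
# least min_g guanines, separated by three loops.
# Assumption: loops of 1-7 nucleotides; the source motif labels are truncated.


def g4_motif(min_g: int = 3, loop_min: int = 1, loop_max: int = 7) -> re.Pattern:
    g_tract = f"G{{{min_g},}}"
    loop = f"[ACGT]{{{loop_min},{loop_max}}}"
    return re.compile(f"{g_tract}(?:{loop}{g_tract}){{3}}")


g3_pattern = g4_motif(min_g=3)  # analogous to the G3+ category
g2_pattern = g4_motif(min_g=2)  # analogous to the G2+ category

telomeric = "TTAGGG" * 4            # human telomeric repeat
two_g_tracts = "GGTTGGTGTGGTTGG"    # example with four GG tracts, no GGG
no_g_tracts = "ATATATATATATATAT"    # control without G-tracts

for name, seq in (("telomeric", telomeric),
                  ("two_g_tracts", two_g_tracts),
                  ("no_g_tracts", no_g_tracts)):
    print(name,
          "G3+:", bool(g3_pattern.search(seq)),
          "G2+:", bool(g2_pattern.search(seq)))
```

A motif match only indicates that a sequence could fold into G4. In the work described here, G4 formation on the array was detected experimentally through binding of the fluorescent ligand.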

6OTDs as G4 Ligands As described above, 6OTD has a less planar macrocyclic structure than 7OTD and a lower affinity for G4 than 7OTD, but it does have the following advantages: (a) better chemical properties such as solubility, (b) easier synthesis, and (c) easier derivatization. Indeed, diverse structural modifications have been investigated, and various functional groups, such as alcohol, amine, guanidine, and aromatic groups,

26

Macrocyclic G-Quadruplex Ligands of Telomestatin Analogs

O N H

O O

O

N

N H

O

O

N N

N N

4

N

N

O

O

H N

O

O

L1Bio-7OTD (28)

4

O H HN

Pull-down of G4s (Supernatant) 28 (μM)

S H NH

Input DNA

O

883

0

30

Recovery of G4s (Supernatant) 28 (μM) 100 0

30

100

G4 DNA dsDNA

Fig. 9 Structure of L1Bio-7OTD (28) and selective pull-down of G4-forming DNAs

have been introduced into the side chain. This has led to 6OTDs with G4-stabilizing ability comparable to or stronger than those of TMS (2) and 7OTDs. Interestingly, S2A2-6OTD (29) and L2H2-6OTD (10), which have bis-hydroxymethyl acetate and aminobutyl groups on their side chains, respectively, exhibit different behavior toward telomeric G4 of telo24 (Tera et al. 2008). S2A2-6OTD (29) interacts with telo24, but does not show clear G4 formation in CD analysis of the titration of the ligand into telo24. On the other hand, G4 formation was observed upon titration with L2H2-6OTD (10). In this case, the ligand induces a shift to the antiparallel topology of telo24 even in the absence of cations, and Tm at 292 nm was 38.1  C. The Tm of TMS (2) was 47.8  C under the same conditions. L2H2-6OTD (10) showed potent telomerase inhibitory activity with an IC50 of 15 nM by TRAP assay, while the corresponding value for S2A2-6OTD (29) was 2 μM. L2H2-6OTD (10) is a selective and potent stabilizer of G4-forming sequences, but not non-G4 sequences. These observations indicate that cationic functional groups, such as amine, in the side chain can promote potent and selective interaction with G4-forming sequences. L2G26OTD (30) bearing a guanidine group in its side chain showed a Tm value as large as 53.2  C with telo24. The interaction of the cationic group on the side chain of 6OTD with G4 was studied by NMR in the case of L2H2-6 M(2)OTD (31) and hybrid-type human telomeric DNA (Fig. 10) (Chung et al. 2013). Based on that work, electrostatic interactions between the cationic side chains in the 6OTD and the phosphate backbone of DNA were concluded to play an important role. It was further suggested that planarity is not the most important factor in the design of G4-targeted small molecule ligands and that attention should be paid to the flexibility of the molecule. 
Further, the binding affinity and selectivity of small molecules for specific G4 scaffolds can be enhanced by adjusting the length of substituents on the cationic side chains or by targeting specific loops or grooves of the selected G4 by attaching specific linkers to the side chains. The results of the NMR experiments supported the idea that it is possible to improve the binding affinity and selectivity of 6OTD for a specific G4 by targeting its inherent loops or grooves. In the following sections, we will discuss the application of 6OTDs as chemical probes from the following perspectives: (i) control of G4 topology, (ii) sequence selectivity, and (iii) detection of G4 by fluorescence in vitro as well as in vivo. The anticancer activity of 6OTDs will also be discussed (iv).
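The Tm values quoted above are read from thermal melting curves. As a rough illustration of how a melting temperature can be estimated from such a curve, the following Python sketch normalizes the signal and linearly interpolates the temperature at which half of the transition is complete. The function name and the data points are hypothetical, and the half-transition rule is a common simplification assumed here; it is not necessarily the analysis used in the cited studies.

```python
def melting_temperature(temps_c, signal):
    """Estimate Tm as the temperature where the normalized signal crosses 0.5.

    temps_c: temperatures in degrees Celsius, in increasing order.
    signal:  measured signal at each temperature, assumed to rise from the
             folded baseline to the unfolded baseline (assumption).
    """
    if len(temps_c) != len(signal) or len(temps_c) < 2:
        raise ValueError("need two equally long sequences with >= 2 points")
    low, high = min(signal), max(signal)
    if high == low:
        raise ValueError("flat curve: no transition to locate")
    # Normalize so the folded baseline is 0 and the unfolded baseline is 1.
    frac = [(s - low) / (high - low) for s in signal]
    for i in range(1, len(frac)):
        f0, f1 = frac[i - 1], frac[i]
        if f0 < 0.5 <= f1:
            t0, t1 = temps_c[i - 1], temps_c[i]
            # Linear interpolation between the two bracketing points.
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("signal never crosses the half-transition level")


if __name__ == "__main__":
    # Hypothetical melting curve, for illustration only.
    temps = [20, 30, 40, 50, 60]
    curve = [0.0, 0.1, 0.4, 0.9, 1.0]
    print(f"Estimated Tm: {melting_temperature(temps, curve):.1f} degC")
```

A ligand-induced shift (ΔTm) would then be the difference between the values estimated with and without ligand under identical buffer conditions.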

Fig. 10 Structures of S2A2-6OTD (29), L2H2-6OTD (10), and L2G2-6OTD (30), and structure of the complex of L2H2-6M(2)OTD (31) with hybrid-type telomeric G4 (PDB: 2MB3), based on NMR studies

Fig. 11 Three topologies of G4: antiparallel type, hybrid type, and parallel type

Control of G4 Topologies by 6OTDs

G4 is classified into three topologies based on the orientation of the G-tracts: antiparallel, hybrid, and parallel (Fig. 11) (Ou et al. 2008). Interestingly, telomeric G4 can adopt any of these topologies depending on the ions or additives present, even though the sequence remains the same. G4s are distributed widely throughout the genome and are associated with various biological phenomena. If we can selectively recognize and stabilize a particular G4, it should be possible to elucidate the biological phenomena associated with that G4. Nagasawa and co-workers examined the control of G4 topologies using unsymmetric G4 ligands obtained by changing the position of the aminobutyl side chain of the symmetric L2H2-6OTD (10). Since the aminobutyl side chain of symmetric L2H2-6OTD (10) plays a major role in the recognition of loops with antiparallel topology, placing the side chain at different positions on the 6OTD core structure may allow selective recognition of other topologies. Thus, unsymmetric 4,2-L2H2-6OTD (32) and 5,1-L2H2-6OTD (33) were synthesized (Sakuma et al. 2016). Circular dichroism (CD) titration analysis of these ligands revealed that the 4,2-type compound 32 induces antiparallel topology of telomeric G4, while the 5,1-type compound 33 induces hybrid-type topology.

To understand the reason for this, computational docking studies were carried out with telomeric G4 and the three types of L2H2-6OTD, 10, 32, and 33 (Fig. 12). In the case of antiparallel topology, docking models were obtained with all three ligands. The symmetric compound 10 showed the greatest stabilization ability. In this model, the two aminobutyl side chains interact with phosphate groups of DNA to form hydrogen bonds, whereas in the case of 4,2-L2H2-6OTD (32), only one of the two side chains interacts with a DNA phosphate group, and the other is located in the groove of the G4. On the other hand, the side chain in 5,1-L2H2-6OTD (33) suffers steric hindrance with lateral loops in the antiparallel topology. In the case of the hybrid type, all three compounds were docked on the G-quartet plane, though 5,1-L2H2-6OTD (33) showed the best docking score. Thus, 4,2-L2H2-6OTD (32) preferentially induces the antiparallel form of telomeric G4 owing to electrostatic interaction between the aminobutyl side chain and DNA phosphate, while 5,1-L2H2-6OTD (33) induces the hybrid topology to minimize steric repulsion with the G4 loop through the interaction between the G-quartet and the 5,1-core structure. An alternative approach to the development of topology-selective G4 ligands using 6OTD derivatives was also investigated.

Fig. 12 Structures of 4,2-L2H2-6OTD (32) and 5,1-L2H2-6OTD (33) and docking models with antiparallel and hybrid-type topologies of telomeric G4

Nagasawa and co-workers further developed the structure of 6OTD and found that L2G2-2M2EG-6OTD (34), which has four guanidinylbutyl chains, strongly induces the parallel form of telomeric G4 (Ma et al. 2018). In the parallel topology, all four G-tracts are parallel, which provides four phosphate group-rich grooves of similar spatial size. The four guanidinylbutyl side chains in ligand 34 interact efficiently with the four grooves in the parallel topology. Interestingly, ligand 34 was found to induce a conformational change of telomeric G4 to the parallel topology from both the antiparallel type (formed in the presence of Na+) and the hybrid type (formed in the presence of K+).

Since telomere sequences are the only sequences that can form three different topologies, they have generally been used to investigate the topology-inducing ability of G4 ligands, and there have been fewer reports on G4-forming sequences without topological diversity. Therefore, Nagasawa and co-workers further examined the topological change upon addition of 34 to G4-forming sequences for which topological diversity has not been reported. Specifically, the G4 sequence of bcl-2 (a gene encoding an anti-apoptotic protein), which forms a hybrid-type structure, changed its topology to a parallel type upon the addition of 34. The thrombin aptamer sequence, consisting of two G-quartets, forms an antiparallel G4, and this topology was also changed to a parallel one by the addition of 34.

Nagasawa and co-workers also examined the switching of topologies by applying L2H2-6OTD (10) and 34, which induce antiparallel and parallel telomeric G4, respectively (Fig. 13). Sequential addition of both ligands to telomeric G4 resulted in reversible switching of these topologies. This ligand-induced conformational switching could provide a strategy for regulating the biological functions of G4.

Recently, Nagasawa and co-workers have found that a linear polyoxazole compound L2G2-2M2EG-6LCO (35) induces antiparallel topology of telomeric G4 (Fig. 14) (Sasaki et al. 2020).
The macrocyclic L2H2-6OTD (10) does not induce antiparallel topology in the presence of ions. However, 35 induces antiparallel topology even in the presence of ions, indicating that it has a stronger ability to induce antiparallel (chair-type) topology than the cyclic ligand 10.

Fig. 13 Structure of L2G2-2M2EG-6OTD (34) and its docking model with parallel topology of telomeric G4 and topological switching with 34

Fig. 14 Structure of L2G2-2M2EG-6LCO (35) and docking model with antiparallel (chair-type) topology of telomeric G4

G4-Forming Sequence-Selective 6OTDs

In the previous section, strategies for developing selective G4 ligands for the three topologies using 6OTD were described. To understand the biological functions of G4 in more detail, it is necessary to recognize and stabilize G4s formed in different sequences. The grooves in the G4 structure differ among G4-forming sequences, as well as among the three topologies. Based on these insights, Nagasawa and co-workers designed a new 6OTD derivative as a sequence-selective ligand, using a two-point recognition strategy targeting both the G-quartet and the groove. They successfully developed 6OTDs that strongly interact with telomeric G4 among the three G4-forming sequences of c-kit, K-ras, and telomere. More specifically, since NMR analysis of the complex between L2H2-6M(2)OTD (31) and telomeric G4 revealed a characteristic groove of telomeric G4 close to the C5 position of 6OTD, development of a sequence-selective ligand for telomeric G4 was investigated by introducing a side chain at the C5 position of 6OTD (Ma et al. 2019). Five 6OTD derivatives 36–40 were synthesized bearing a primary or tertiary amine at the C5 position of the 6OTD via linkers of different lengths, and their interactions with the three G4-forming sequences were examined (Fig. 15). Among them, 6OTD 40 bearing tertiary amines exhibits potent and selective stabilizing ability for telomeric G4. Docking studies of 6OTD 40 with telomeric G4 showed that the core structure of 6OTD stacked with the G-quartet of telomeric G4 and the side chain interacted appropriately with the groove of the parallel-type telomeric G4.

Detection of G4 by Fluorescent 6OTDs In Vitro and In Vivo

As described in section “7OTDs as G4 Ligands: Chemical-Biology Studies,” the 7OTD-type fluorophore G4 ligands were suitable for fluorescence imaging of G4s. In this section, we focus on two types of fluorescent G4 ligands: ligands into which a fluorophore is introduced inside the cell, and ligands that fluoresce upon interaction with G4 (turn-on type). We also describe their application to the detection of G4 in living cells and to tracking the dynamics of G4 formation, respectively.

Fig. 15 Structures of 6OTDs 36–40 bearing a side chain at C5 (37 and 39, n = 1; 38 and 40, n = 2) and docking model of 40 with parallel-type telomeric G4

Fig. 16 Structures of L2H2-6OTD-Az (41) and CO-1 (42) and visualization of RNA G4 in living cells with 41 and 42

For the identification and visualization of the cellular targets of 6OTD, a ligand into which a fluorescent group can be introduced inside the cell was developed. Thus, L2H2-6OTD-Az (41) (Abraham Punnoose et al. 2018), having an azide group in the side chain, was designed to introduce the fluorophore by reaction with CO-1 (42) (Alamudi et al. 2016), an alkyne compound containing a fluorescent group, via Huisgen cycloaddition in live cells (Fig. 16) (Yasuda et al. 2020). CO-1 is membrane-permeable and does not react nonspecifically. The strained cyclic alkyne in CO-1 allows the Huisgen cycloaddition to proceed without requiring a highly cytotoxic copper catalyst. When CO-1 (42) was applied to live cells treated with ligand 41, granular fluorescence was observed in the cytoplasm. To identify the target of the CO-1-derived fluorescence, nuclease treatment was performed. DNase or RNase treatment was performed under the same conditions as used for the live cells in the presence of CO-1 (42) and ligand 41. In the case of DNase treatment, there was no change in the fluorescence intensity, but in the case of RNase treatment, CO-1-derived fluorescence was significantly reduced. The addition of the G4 ligand PhenDC3 also decreased the fluorescence intensity in a concentration-dependent manner. Since PhenDC3 interacts with and stabilizes G4 by π-π stacking with the G-quartet plane, like OTDs, it is strongly suggested that the target molecule of 41 is G4 in the living cells, especially G4-forming RNA.

A turn-on type G4 ligand, L2H2-6OTD-Np-H (43), having a vinylnaphthalene group in the 6OTD core, which fluoresces only when it interacts with G4, was also developed (Ma et al. 2021). The vinylnaphthalene moiety is brought into the same plane as the 6OTD by interaction with the G-quartet, allowing it to interact with G4 and change the fluorescence properties of the ligand. The fluorescence properties of 43 in the presence of telomeric G4 (telo24) or double-stranded DNA (dsDNA) were also evaluated. No fluorescence of 43 was observed in the presence of dsDNA, but the addition of telo24 increased the fluorescence intensity in a concentration-dependent manner. The fluorescence quantum yield of 43 in the presence of telomeric G4 was tenfold greater than in the presence of dsDNA. Structural development of 43 was carried out focusing on the electronic state of the naphthalene according to Hammett’s rules, and the ligands 44–47 were synthesized by introducing functional groups on the naphthalene moiety in 43 (Fig. 17).

Fig. 17 Structures of turn-on-type ligands L2H2-6OTD-Np-R 43–47 (43, R = H; 44, R = OSO2Me; 45, R = OMe; 46, R = OH; 47, R = NMe2) and their mode of action

Among the ligands, 6OTD-NMe2 (47) with a dimethylamino group on the naphthalene showed a large Stokes shift (over 200 nm) while maintaining selectivity for the G4-forming sequence. The ratio of the fluorescence quantum yields of 6OTD-NMe2 (47) with dsDNA and telo24 was about 15-fold greater than those of TO (thiazole orange) and ThT (thioflavin T), which are known to be turn-on type ligands for G4. Thus, the 6OTD ligand 47 is a turn-on type ligand with higher G4 selectivity than TO (Lubitz et al. 2010) or ThT (Mohanty et al. 2013). The fluorescence properties of 47 with other G4-forming sequences, mutant sequences, and C-rich sequences were further investigated. Ligand 47 showed excellent turn-on-type fluorescence properties with various G4-forming sequences, including that of the telomeric G4 of telo24, while it did not show any fluorescence with mutant sequences and non-G4-forming sequences.

To investigate the mechanism of the turn-on-type fluorescence properties of the 6OTD 47, the fluorescence spectrum of 47 in highly viscous glycerol was examined. Under these conditions, the fluorescence intensity of 47 increased as the ratio of glycerol increased, i.e., as the viscosity of the ligand environment increased. This suggests that the fluorescence of the ligand is weak in the absence of interacting partners, such as G4-forming nucleic acid sequences, due to free rotation of the olefin moiety. On the other hand, the fluorescence increases under highly viscous conditions or in the presence of G4 because rotation of the olefin moiety of 47 is restricted, favoring the planar structure. Docking studies were consistent with this idea. Since G4 is believed to be in equilibrium with single-stranded DNA in vivo, the fluorescence properties of 47 were investigated under conditions where the G4 structure unwinds from the G4-ligand complex. The fluorescence intensity of 47 in the presence of telo24 was decreased by the addition of the C-rich sequence, which is the complementary sequence of telo24, in a concentration-dependent manner. This result suggests that the turn-on-type ligand 47 shows fluorescence only when it interacts with G4 and thus should be useful for the detection of dynamic G4 formation in vitro as well as in vivo.

Anticancer Activity of 6OTDs

It is known that the stabilization of the telomeric G4 promotes the apoptosis of cancer cells.
Further, the formation of G4 structure in the promoter region of oncogenes suppresses gene expression. Many ligands that selectively stabilize G4 have been reported to inhibit the growth of cancer cells, and some of them showed anticancer activity in mice. The promoter region of c-myc is known to form G4, the so-called c-myc G4. The gene product of c-myc is overexpressed in about 80% of cancers. Because the c-myc G4 is very stable (its Tm value is ca. 80 °C in the presence of 60 mM potassium ion), c-myc G4 is likely to form intracellularly (Siddiqui-Jain et al. 2002). Thus, c-myc G4 is considered a promising molecular target for cancer, and the development of G4 ligands targeting c-myc G4 has been extensively investigated. The c-myc G4 is strongly stabilized by the addition of S2T1-6OTD (48) (Fig. 18), thereby suppressing c-myc gene expression, which results in a decrease of hTERT mRNA and its corresponding protein expression (Shalaby et al. 2010). Indeed, dose- and time-dependent antitumor effects of S2T1-6OTD (48) on human medulloblastoma (MB) and atypical teratoid/rhabdoid childhood brain cancer cell lines were observed. In addition, S2T1-6OTD (48) inhibits not only c-Myc but also telomerase activity, leading to telomere shortening and cellular senescence, which would be consistent with inhibition of proliferation and induction of apoptosis of atypical teratoid/rhabdoid childhood brain cancer cell lines. Based on the results of in vivo experiments in mice, S2T1-6OTD (48) is considered to be a promising candidate as a therapeutic agent for childhood brain tumors.

Fig. 18 Structure of S2T1-6OTD (48) and stabilization of c-myc G4 and telomeric G4

Among the OTD derivatives, Y2H2-6M(4)OTD (49) exhibited significant growth-inhibitory activity against several cancer cell lines in the JFCR (Japanese Foundation for Cancer Research) cancer cell test panel (Fig. 19) (Nakamura et al. 2017). In particular, Y2H2-6M(4)OTD (49) showed remarkable effects against U251 cells, a highly malignant tumor cell line. Thus, the therapeutic effect of compound 49 was evaluated in vivo using U251 tumor model mice, and 49 was found to inhibit tumor growth without causing weight loss. Glioblastoma (GBM) is one of the most common primary brain tumors in human adults and is associated with a very poor prognosis. GBM is resistant to treatment with anticancer drugs due to heterogeneity within the tissue and the presence of glioma stem cells (GSCs). Interestingly, Y2H2-6M(4)OTD (49) inhibited the growth of GSCs more effectively than that of NSGCs, which have lost their stem cell character. In addition, 49 was found to induce DNA damage and cell cycle arrest by stabilizing telomeric G4, inducing apoptosis of GSCs. Furthermore, it effectively inhibited GSC proliferation in model mice having GSCs implanted in the brain.

Fig. 19 Structure of Y2H2-6M(4)OTD (49) and activity toward the JFCR cancer cell panel test, as well as against glioma stem cells. (A) JFCR cancer cell test panel. (B) Tumor volume change induced by treatment with 49. (C) Body weight change in mice treated with 49

Since Y2H2-6M(4)OTD (49) shows potent antitumor effects, the development of a caged-type derivative of Y2H2-6M(4)OTD (49), in which the functional groups related to the anticancer effect are masked by photo-labile protective groups, was investigated with the aim of cancer cell-specific stabilization of G4 (Nakamura et al. 2012). Y2Nv2-6M(4)OTD (50), in which the phenolic hydroxyl group is protected with an Nv (nitroveratryl) group, was synthesized as a caged-type ligand (Fig. 20). Y2Nv-6M(4)OTD (50) showed no stabilizing ability against G4-forming sequences, including telomeric G4, but G4-stabilizing ability appeared time-dependently upon photo-irradiation, concomitantly with deprotection of the Nv group in 50 to generate Y2H2-6M(4)OTD (49). Y2Nv-6M(4)OTD (50) was also examined in a cell-based assay. Y2Nv-6M(4)OTD (50) itself did not inhibit cell proliferation, but it exhibited light-induced cell growth-inhibitory activity comparable to that of Y2H2-6M(4)OTD (49).
Thus, caged-type G4 ligands can selectively inhibit cancer cell growth upon photo-irradiation and appear to be promising candidates as anticancer agents.

Fig. 20 Structure of caged compound Y2Nv-6M(4)OTD (50) and photo-irradiation-induced release of active compound 49. Plot axes: % yield of 49 or 50 (HPLC analysis) and ΔTm (°C) (FRET melting assay) versus irradiation time (min)

6OTD Multimers as G4 Ligands

6OTD Dimer

CD titration and ESI-MS analysis revealed that two molecules of TMS (2) interact with telomeric G4. Molecular dynamics analysis suggested that TMS (2) stabilizes telomeric G4 by stacking at the top and bottom G-quartets (Rezler et al. 2005). Since the 6OTDs are thought to stabilize telomeric G4 similarly to TMS (2), a 6OTD dimer, containing two molecules of 6OTD connected via a linker, was developed in order to stabilize the telomeric G4 more efficiently. Linkers of different lengths were investigated (Iida et al. 2009). Among them, the 6OTD dimer 51 with a six-carbon linker derived from adipic acid stabilized the telomeric G4 quite well. ESI-MS analysis of the complex of telomeric G4 and 6OTD dimer 51 revealed that the 6OTD dimer 51 interacts with telomeric G4 in a 1:1 ratio. Docking studies also suggested that the 6OTD dimer 51 stabilizes the G4 by sandwiching the top and bottom G-quartet surfaces (Fig. 21). Telomeric DNA exists in cells as a long repeating sequence of 130–210 bases, and various models have been proposed for consecutive G4 structures formed on long telomeric DNA. For example, models have been proposed in which individual G4s are independently linked like a string of beads, or in which G4s are stacked on top of each other. Although the complexation of G4 ligands with short telomeric model sequences has been extensively studied, the complexation between long telomeric DNA and G4 ligands has not been well investigated, despite its importance for understanding the behavior of G4 ligands in the cell. Thus, the complexation mode of long telomeric DNA with G4 ligands was examined by comparing the interaction modes of 6OTD monomer 10 and dimer 51 with long-sequence telomeric DNA (Iida et al. 2013b). Förster Resonance Energy Transfer (FRET) melting experiments were performed on telomeric DNAs of different lengths (telo24, 48, 72, 96) using these compounds.
The 6OTD dimer 51 stabilized each telomeric DNA at approximately half the concentration at which the 6OTD monomer 10 did. TRAP assays were also performed with the same series of four telomeric DNAs, and the 6OTD monomer 10 and dimer 51 inhibited telomerase with IC50 values of 15 nM and 7.6 nM, respectively. These results suggest that the dimer 51 stabilizes the longer telomeric G4 more efficiently than the monomer 10. This conclusion was also supported by electrophoresis mobility shift assay (EMSA).

Fig. 21 Structure of 6OTD dimer 51 and proposed mode of interaction with telomeric G4

6OTD Tetramer

A tetramer of 6OTD (52) has been synthesized as a ligand that might selectively interact with G4 in telomeres (Fig. 22) (Abraham Punnoose et al. 2017). G4 ligands that selectively recognize telomeric G4s are candidate anticancer agents, and so selective ligands targeting this sequence are of interest. In the telomere region, the TTAGGG sequence is repeated, and multiple G4s are expected to be formed. Mao and co-workers used optical tweezers to measure the Kd of 6OTD tetramer 52 with telomeric G4s of different lengths (telo24, telo48, telo72, telo96, and telo144). By pulling G4 fixed between two beads with optical tweezers, the tensile strength of G4 can be measured. Unlike the usual evaluation in the liquid phase, this technique can be applied to a single unit of G4 to determine the stability of G4 itself and the effect of ligand addition (Fig. 22). Therefore, we used optical tweezers to study the stabilizing ability of 6OTD monomer 10, dimer 51, and tetramer 52 toward three different lengths of telomeric G4 sequences (4G, one G4; 12G, three G4s; and 24G, six G4s) (Fig. 22). The monomer 10 and dimer 51 showed Kd values of 10–20 nM for all sequences, with no apparent length dependence. On the other hand, Kd values of over 700 nM, 58.8 nM, and 16.2 nM were observed for the tetramer 52 with 4G, 12G, and 24G, respectively. Thus, the tetramer binds much more weakly to the sequence (TTAGGG)4, which forms a single G4, than to sequences forming multiple G4s, and its binding becomes stronger as the length of the sequence increases. This is presumably because the 6OTD cores in the tetramer 52 form aggregates through π-π stacking in the absence of G4, so that interaction with consecutive G4s may be required to break the stacking interaction of 52 itself. Since 6OTD tetramer 52 selectively recognizes sequences with multiple repeats of G4, it could be a candidate for an anticancer drug selectively targeting telomere sequences.

Fig. 22 (A) Structure of 6OTD tetramer 52. (B) Evaluation of the stability of a single unit of G4 using optical tweezers. (C) Possible mode of interaction of tetramer 52 with multimeric G4 in the telomere sequence

6OTD Dendrimer

A radially arranged dendrimer-type G4 ligand composed of 6OTDs has many potential G4-binding sites and may exhibit a different interaction mode with G4 compared with 6OTD monomers or chain-like oligomers. Therefore, dendrimer-type L2H2-6OTD-trimer (53) and L2H2-6OTD-hexamer (54) were synthesized, using a cyanuric acid-derived core with low cytotoxicity and ethylene glycol-derived dendrons with excellent biocompatibility (Pokhrel et al. 2022). The dendrimer core and dendrons have low nonspecific interactions with the negatively charged DNA backbone, which should be favorable for the evaluation of interactions with G4 or for studies of multivalency-related biological activity.
The ability of these two dendrimer ligands 53 and 54 to interact with telomeric G4 was evaluated by the smDA (single-molecule displacement assay) method, in which the binding ability of a ligand to G4 is measured with high sensitivity by displacement with a modified ligand using optical tweezers. The Kd values of L2H2-6OTD-trimer (53) and L2H2-6OTD-hexamer (54) were found to be 13 ± 1 nM and 4 ± 1 nM, respectively. Thus, the multivalent dendrimer ligands show higher binding affinity for G4 than the 6OTD monomer 10 or L2H2-6OTD dimer 51 (Fig. 23). The telomerase-inhibitory activity of linear-type and dendrimer-type ligands was evaluated by means of TRAP assay. The dendrimer-type trimer 53 and hexamer 54 showed the strongest inhibitory activity, while linear dimer 51 showed weaker inhibition, and monomer 10 showed the lowest inhibitory activity (Fig. 23). The high binding affinity of 53 and 54 may be due to the multivalent nature of the dendrimers. The telomerase-inhibitory activities of these four ligands, 10, 51, 53, and 54, were proportional to their binding affinity to G4.
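To make the Kd comparisons above more concrete, the following Python sketch converts a dissociation constant into the fraction of G4 sites occupied at a given free ligand concentration. It assumes simple 1:1 binding at equilibrium with ligand in excess; this single-site model and the function name are assumptions for illustration, and the optical tweezers and smDA analyses in the cited work may rest on different models. The Kd values are those quoted in the text, with the "over 700 nM" value for 4G entered as 700 nM, so the occupancy computed for 4G is an upper bound.

```python
def fraction_bound(ligand_nM: float, kd_nM: float) -> float:
    """Fraction of binding sites occupied under a 1:1 binding model.

    Assumes equilibrium and that the free ligand concentration is close to
    the total ligand concentration (ligand in excess over G4).
    """
    if ligand_nM < 0 or kd_nM <= 0:
        raise ValueError("ligand_nM must be >= 0 and kd_nM must be > 0")
    return ligand_nM / (kd_nM + ligand_nM)


# Kd values (nM) quoted in the text for tetramer 52.
# The 4G value is reported as "over 700 nM"; 700 is used as a lower bound.
KD_TETRAMER_52_NM = {"4G": 700.0, "12G": 58.8, "24G": 16.2}

# Kd values (nM) quoted in the text for the dendrimer-type ligands.
KD_DENDRIMER_NM = {"trimer 53": 13.0, "hexamer 54": 4.0}

if __name__ == "__main__":
    ligand_nM = 20.0  # hypothetical free ligand concentration
    for name, kd in {**KD_TETRAMER_52_NM, **KD_DENDRIMER_NM}.items():
        occupancy = fraction_bound(ligand_nM, kd)
        print(f"{name}: Kd = {kd} nM, fraction bound = {occupancy:.2f}")
```

Under this model, half of the sites are occupied when the free ligand concentration equals Kd, so a lower Kd means that less ligand is needed to reach the same occupancy.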

Fig. 23 Structures, Kd values, and telomerase-inhibitory activity of L2H2-6OTD-trimer (53) and L2H2-6OTD-hexamer (54)

Control of G4-Protein Interaction by OTD

Many of the biological functions of G4s are thought to involve proteins that target the G4 structure, and several G4-binding proteins have been reported. Recently, it has been reported that the interaction between G4 and its binding proteins is facilitated by G4 ligands, which thus modulate the protein function.

G4-3R02 Protein with G4

T-S0530 is a guanine-rich DNA aptamer that binds to α-synuclein oligomer. Ikebukuro and co-workers reported that L1H1-7OTD (9) efficiently interacts with 3R02, a dimeric DNA aptamer of T-S0530, and stabilizes its structure (Tsukakoshi et al. 2016). This enhances the binding of the aptamer to the α-synuclein oligomer.

Rif1 Protein with G4

Masai and co-workers discovered Rif1, a protein that is involved in the regulation of eukaryotic replication. A genome-wide study of binding sites of Rif1 revealed the presence of a consensus sequence (Rif1BS) in which multiple Gs are repeated to form G4. The interaction between Rif1 and Rif1BS was found to be effectively stabilized by the addition of L1BOD-7OTD (26) (Kanoh et al. 2015). This interaction may allow Rif1 to form a local chromatin structure that exerts a long-range inhibitory effect on origin firing.

hnRNPA1 Protein with RNA G4

Telomeric RNAs exhibit various biological functions, including regulation of telomere length and heterochromatin formation. Telomeric RNA forms a G4 structure and acts together with hnRNPA1 and POT1 to protect telomere ends and maintain genomic integrity (Flynn et al. 2011). Xu and co-workers analyzed the recognition mechanism and binding affinity of hnRNPA1 with telomeric RNA G4 using the G4 ligand L1Cy5-7OTD (27) (Liu et al. 2017). In electrophoresis analysis of RNA G4 in the presence of L1Cy5-7OTD (27), the intensity of the band corresponding to the G4-7OTD complex increased in a concentration-dependent manner. When hnRNPA1 was added to the mixture, a band corresponding to the G4-hnRNPA1-OTD complex was also observed, in addition to the band of the G4-hnRNPA1 complex. The same authors were able to visualize the complex of telomeric RNA G4 and hnRNPA1 in cells by using the fluorescent G4 ligand L1Cy5-7OTD (27). Red fluorescence derived from L1Cy5-7OTD (27) coincided with the foci of green fluorescence due to the hnRNPA1 antibody. The intensity of red fluorescence was significantly reduced by RNase treatment, confirming that the red fluorescence was derived from RNA G4 bound to L1Cy5-7OTD (27) in the cells.

BLM Helicase with G4

G4 inhibits DNA replication, but helicases such as Bloom helicase (BLM) can rescue DNA replication by destabilizing G4 (Mendoza et al. 2016). However, the effect of G4 ligands on this process was unclear. Balci and co-workers investigated the destabilization of G4 by BLM in the presence of three G4 ligands (L1H1-7OTD (9), pyridostatin (PDS), and PhenDC3) by using the single-molecule FRET (smFRET) method (Maleki et al. 2019). The results showed that although the ΔTm values of the three ligands were different, their effects on BLM activity were almost the same; the addition of the ligands resulted in a two- to threefold decrease in BLM activity. Thus, the G4 ligands suppressed BLM activity independently of their G4-stabilizing ability, suggesting that the G4 ligands act at the site(s) where BLM recognizes G4.

S1 Nuclease with Telomeric G4

TMS (2) binds to telomeric G4 and enhances the activity of S1 nuclease, but the relationship between G4 stabilization and susceptibility to nuclease or helicase activity is not well understood.
Tera and co-workers investigated the kinetics of S1 nuclease and the cleavage sites of telomeric G4 by S1 nuclease using the 6OTDs 10, 32, and 33, which have different effects on the thermal stability of telomeric G4 (Ishikawa et al. 2021). The analysis showed that the activity of S1 nuclease was promoted by the addition of these ligands. More specifically, the T50 (time for 50% degradation of oligonucleotides) in the presence of 10, 32, and 33 was 2.8, 5.1, and 10.1 min, respectively, indicating that S1 nuclease activity is more effectively increased by ligands with high G4-stabilizing ability. MALDI-MS analysis showed that S1 nuclease cleaved loops 2 and 3 of G4 stabilized by 3,3-L2H2-6OTD (10). Loop 3 was cleaved first, followed by loop 2. The binding mode of 3,3-L2H2-6OTD (10) to telomeric G4 was examined using oligonucleotides in which adenine was replaced with 2-AP (2-aminopurine). It was found that 3,3-L2H2-6OTD (10) accesses site 1 of G4 and flips out the adenine in loop 3. This facilitates access of S1 nuclease to the telomeric G4 cleavage site (Fig. 24).
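The T50 values above can be related to rate constants if the digestion is treated as a first-order process. That kinetic model is an assumption made here for illustration only; the text reports just the T50 values. Under this assumption, k = ln 2 / T50, and the undigested fraction after time t is 0.5 raised to the power t / T50. The function names in this Python sketch are hypothetical.

```python
import math

# T50 values (min) quoted in the text for S1 nuclease digestion of
# telomeric G4 in the presence of ligands 10, 32, and 33.
T50_MIN = {"10": 2.8, "32": 5.1, "33": 10.1}


def rate_constant(t50_min: float) -> float:
    """First-order rate constant (1/min) implied by a T50 (assumed model)."""
    if t50_min <= 0:
        raise ValueError("t50_min must be positive")
    return math.log(2) / t50_min


def fraction_undigested(t_min: float, t50_min: float) -> float:
    """Fraction of oligonucleotide remaining after t_min (assumed first order)."""
    if t_min < 0 or t50_min <= 0:
        raise ValueError("t_min must be >= 0 and t50_min must be > 0")
    return 0.5 ** (t_min / t50_min)


if __name__ == "__main__":
    for ligand, t50 in T50_MIN.items():
        k = rate_constant(t50)
        left = fraction_undigested(10.0, t50)
        print(f"ligand {ligand}: k = {k:.3f} per min, undigested at 10 min = {left:.2f}")
```

A shorter T50 corresponds to a larger rate constant, which matches the ordering reported in the text for ligands 10, 32, and 33.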

Y. Ma et al.

Fig. 24 Cleavage site of telomeric G4 by S1 nuclease in the presence of 3,3-L2H2-6OTD (10). [Figure omitted: plot of undigested HT24 (fraction) versus reaction time (min) for no ligand and ligands 10, 32, and 33, with a G4 scheme labeled with loop 2, loop 3, and the 11-mer, 13-mer, and 19-mer fragments.]

Fig. 25 Structure of i-motif (iM). [Figure omitted: a guanine-rich sequence forming a G-quadruplex (G4) and the cytosine-rich sequence forming an i-motif (iM), with the cytosine-protonated cytosine base pair.]

Micelle-Type Macrocyclic 4OTDs as G4/i-Motif Ligands

The complementary strands of guanine-rich G4-forming sequences are enriched in cytosine residues, and these strands also form a characteristic higher-order quadruplex structure called the i-motif (iM) (Fig. 25). The iM structure is composed of cytosine and protonated cytosine, forming base pairs that are stacked together in a vertical direction to form the quadruplex. Since one cytosine of the base pair must be protonated to form iM, this structure presumably forms under acidic conditions, so its physiological significance has been questioned. Recently, however, iMs have been observed even under neutral conditions, and iM is now recognized as one of the key factors for replication or translation in vivo, like G4. L2H2-4OTD (55), which is a derivative of 7OTD and 6OTD, contains four oxazole rings, with bisoxazoles linked together through amide bonds (Fig. 26) (Sedghi Masoud et al. 2018); L2H2-4OTD (55) does not interact with G4 but selectively interacts with iM. NMR analysis of the complex with telomeric iM suggested that L2H2-4OTD (55) interacts with loops 1 and 3 in the iM. This is the first example of an iM ligand with a macrocyclic structure.

Macrocyclic G-Quadruplex Ligands of Telomestatin Analogs

Fig. 26 Structure of L2H2-4OTD (55) and sites of interactions with telomeric iM, as elucidated by NMR. [Figure omitted: chemical structure of L2H2-4OTD (55), a per-residue chemical shift perturbation plot (ppm), and two views of the iM rotated by 180° with residues and loops 1 and 3 labeled.]

Conclusion

This chapter provides an overview of macrocyclic G4 ligands, focusing on oxazole telomestatin derivatives (OTDs), which are synthetic derivatives of telomestatin, a natural G4 ligand. By targeting both the G-quartet and the grooves in the G4 structure, topology- and sequence-selective OTDs can be synthesized. Oligomerization of OTDs also results in specific selectivity for G4s in the long repeat sequences of telomeres. Fluorescent G4 ligands that can detect G4-forming sequences and G4 dynamics in cells are essential research tools for elucidating G4 functions. G4 ligands are also candidates for the treatment of G4-related diseases such as cancer.

References

Abraham Punnoose J, Ma Y, Li Y, Sakuma M, Mandal S, Nagasawa K, Mao H (2017) Adaptive and specific recognition of telomeric G-quadruplexes via polyvalency induced unstacking of binding units. J Am Chem Soc 139(22):7476–7484
Abraham Punnoose J, Ma Y, Hoque ME, Cui Y, Sasaki S, Guo AH, Nagasawa K, Mao H (2018) Random formation of G-quadruplexes in the full-length human telomere overhangs leads to a kinetic folding pattern with targetable vacant G-tracts. Biochemistry 57(51):6946–6955
Alamudi SH, Satapathy R, Kim J, Su D, Ren H, Das R, Hu L, Alvarado-Martínez E, Lee JY, Hoppmann C, Peña-Cabrera E, Ha HH, Park HS, Wang L, Chang YT (2016) Development of background-free tame fluorescent probes for intracellular live cell imaging. Nat Commun 7:11964
Ali A, Bhattacharya S (2014) DNA binders in clinical trials and chemotherapy. Bioorg Med Chem 22(16):4506–4521
Anantha NV, Azam M, Sheardy RD (1998) Porphyrin binding to quadrupled T4G4. Biochemistry 37(9):2709–2714

Barbieri CM, Srinivasan AR, Rzuczek SG, Rice JE, LaVoie EJ, Pilch DS (2007) Defining the mode, energetics and specificity with which a macrocyclic hexaoxazole binds to human telomeric G-quadruplex DNA. Nucleic Acids Res 35(10):3272–3286
Bazzicalupi C, Ferraroni M, Bilia AR, Scheggi F, Gratteri P (2013) The crystal structure of human telomeric DNA complexed with berberine: an interesting case of stacked ligand to G-tetrad ratio higher than 1:1. Nucleic Acids Res 41(1):632–638
Chung WJ, Heddi B, Tera M, Iida K, Nagasawa K, Phan AT (2013) Solution structure of an intramolecular (3 + 1) human telomeric G-quadruplex bound to a telomestatin derivative. J Am Chem Soc 135(36):13495–13501
Chung WJ, Heddi B, Hamon F, Teulade-Fichou MP, Phan AT (2014) Solution structure of a G-quadruplex bound to the bisquinolinium compound Phen-DC(3). Angew Chem Int Ed Engl 53(4):999–1002
Collie GW, Promontorio R, Hampel SM, Micco M, Neidle S, Parkinson GN (2012) Structural basis for telomeric G-quadruplex targeting by naphthalene diimide ligands. J Am Chem Soc 134(5):2723–2731
Cosconati S, Marinelli L, Trotta R, Virno A, Mayol L, Novellino E, Olson AJ, Randazzo A (2009) Tandem application of virtual screening and NMR experiments in the discovery of brand new DNA quadruplex groove binders. J Am Chem Soc 131(45):16336–16337
Cosconati S, Marinelli L, Trotta R, Virno A, De Tito S, Romagnoli R, Pagano B, Limongelli V, Giancola C, Baraldi PG, Mayol L, Novellino E, Randazzo A (2010) Structural and conformational requisites in DNA quadruplex groove binding: another piece to the puzzle. J Am Chem Soc 132(18):6425–6433
Cosconati S, Rizzo A, Trotta R, Pagano B, Iachettini S, De Tito S, Lauri I, Fotticchia I, Giustiniano M, Marinelli L, Giancola C, Novellino E, Biroccio A, Randazzo A (2012) Shooting for selective druglike G-quadruplex binders: evidence for telomeric DNA damage and tumor cell death. J Med Chem 55(22):9785–9792
Di Leva FS, Zizza P, Cingolani C, D’Angelo C, Pagano B, Amato J, Salvati E, Sissi C, Pinato O, Marinelli L, Cavalli A, Cosconati S, Novellino E, Randazzo A, Biroccio A (2013) Exploring the chemical space of G-quadruplex binders: discovery of a novel chemotype targeting the human telomeric sequence. J Med Chem 56(23):9646–9654
Dixon IM, Lopez F, Tejera AM, Estève JP, Blasco MA, Pratviel G, Meunier B (2007) A G-quadruplex ligand with 10000-fold selectivity over duplex DNA. J Am Chem Soc 129(6):1502–1503
Doi T, Yoshida M, Shin-ya K, Takahashi T (2006) Total synthesis of (R)-telomestatin. Org Lett 8(18):4165–4167
Flynn RL, Centore RC, O’Sullivan RJ, Rai R, Tse A, Songyang Z, Chang S, Karlseder J, Zou L (2011) TERRA and hnRNPA1 orchestrate an RPA-to-POT1 switch on telomeric single-stranded DNA. Nature 471(7339):532–536
Grand CL, Han H, Muñoz RM, Weitman S, Von Hoff DD, Hurley LH, Bearss DJ (2002) The cationic porphyrin TMPyP4 down-regulates c-MYC and human telomerase reverse transcriptase expression and inhibits tumor growth in vivo. Mol Cancer Ther 1(8):565–573
Hamon F, Largy E, Guédin-Beaurepaire A, Rouchon-Dagois M, Sidibe A, Monchaud D, Mergny JL, Riou JF, Nguyen CH, Teulade-Fichou MP (2011) An acyclic oligoheteroaryle that discriminates strongly between diverse G-quadruplex topologies. Angew Chem Int Ed Engl 50(37):8745–8749
Iida K, Tera M, Hirokawa T, Shin-ya K, Nagasawa K (2009) G-quadruplex recognition by macrocyclic hexaoxazole (6OTD) dimer: greater selectivity than monomer. Chem Commun (Camb) 42:6481–6483
Iida K, Nakamura T, Yoshida W, Tera M, Nakabayashi K, Hata K, Ikebukuro K, Nagasawa K (2013a) Fluorescent-ligand-mediated screening of G-quadruplex structures using a DNA microarray. Angew Chem Int Ed Engl 52(46):12052–12055

Iida K, Majima S, Nakamura T, Seimiya H, Nagasawa K (2013b) Evaluation of the interaction between long telomeric DNA and macrocyclic hexaoxazole (6OTD) dimer of a G-quadruplex ligand. Molecules 18(4):4328–4341
Iida K, Tsushima Y, Ma Y, Sedghi Masoud S, Sakuma M, Yokoyama T, Yoshida W, Ikebukuro K, Nagasawa K (2019) Model studies for isolation of G-quadruplex-forming DNA sequences through a pull-down strategy with macrocyclic polyoxazole. Bioorg Med Chem 27(8):1742–1746
Ishikawa R, Yasuda M, Sasaki S, Ma Y, Nagasawa K, Tera M (2021) Stabilization of telomeric G-quadruplex by ligand binding increases susceptibility to S1 nuclease. Chem Commun (Camb) 57(59):7236–7239
Kanoh Y, Matsumoto S, Fukatsu R, Kakusho N, Kono N, Renard-Guillet C, Masuda K, Iida K, Nagasawa K, Shirahige K, Masai H (2015) Rif1 binds to G quadruplexes and suppresses replication over long distances. Nat Struct Mol Biol 22(11):889–897
Kim MY, Vankayalapati H, Shin-Ya K, Wierzba K, Hurley LH (2002) Telomestatin, a potent telomerase inhibitor that interacts quite specifically with the human telomeric intramolecular g-quadruplex. J Am Chem Soc 124(10):2098–2099
Kotar A, Wang B, Shivalingam A, Gonzalez-Garcia J, Vilar R, Plavec J (2016) NMR structure of a triangulenium-based long-lived fluorescence probe bound to a G-quadruplex. Angew Chem Int Ed Engl 55(40):12508–12511
Law MJ, Lower KM, Voon HP, Hughes JR, Garrick D, Viprakasit V, Mitson M, De Gobbi M, Marra M, Morris A, Abbott A, Wilder SP, Taylor S, Santos GM, Cross J, Ayyub H, Jones S, Ragoussis J, Rhodes D, Dunham I, Higgs DR, Gibbons RJ (2010) ATR-X syndrome protein targets tandem repeats and influences allele-specific expression in a size-dependent manner. Cell 143(3):367–378
Li Q, Xiang J, Li X, Chen L, Xu X, Tang Y, Zhou Q, Li L, Zhang H, Sun H, Guan A, Yang Q, Yang S, Xu G (2009) Stabilizing parallel G-quadruplex DNA by a new class of ligands: two non-planar alkaloids through interaction in lateral grooves. Biochimie 91(7):811–819
Lin C, Wu G, Wang K, Onel B, Sakai S, Shao Y, Yang D (2018) Molecular recognition of the hybrid-2 human telomeric G-quadruplex by epiberberine: insights into conversion of telomeric G-quadruplex structures. Angew Chem Int Ed Engl 57(34):10888–10893
Liu X, Ishizuka T, Bao HL, Wada K, Takeda Y, Iida K, Nagasawa K, Yang D, Xu Y (2017) Structure-dependent binding of hnRNPA1 to telomere RNA. J Am Chem Soc 139(22):7533–7539
Lubitz I, Zikich D, Kotlyar A (2010) Specific high-affinity binding of thiazole orange to triplex and G-quadruplex DNA. Biochemistry 49(17):3567–3574
Ma Y, Tsushima Y, Sakuma M, Sasaki S, Iida K, Okabe S, Seimiya H, Hirokawa T, Nagasawa K (2018) Development of G-quadruplex ligands for selective induction of a parallel-type topology. Org Biomol Chem 16(40):7375–7382
Ma Y, Iida K, Sasaki S, Hirokawa T, Heddi B, Phan AT, Nagasawa K (2019) Synthesis and telomeric G-quadruplex-stabilizing ability of macrocyclic hexaoxazoles bearing three side chains. Molecules 24(2)
Ma Y, Wakabayashi Y, Watatani N, Saito R, Hirokawa T, Tera M, Nagasawa K (2021) Vinylnaphthalene-bearing hexaoxazole as a fluorescence turn-on type G-quadruplex ligand. Org Biomol Chem 19(37):8035–8040
Maleki P, Mustafa G, Gyawali P, Budhathoki JB, Ma Y, Nagasawa K, Balci H (2019) Quantifying the impact of small molecule ligands on G-quadruplex stability against Bloom helicase. Nucleic Acids Res 47(20):10744–10753
Martino L, Virno A, Pagano B, Virgilio A, Di Micco S, Galeone A, Giancola C, Bifulco G, Mayol L, Randazzo A (2007) Structural and thermodynamic studies of the interaction of distamycin A with the parallel quadruplex structure [d(TGGGGT)]4. J Am Chem Soc 129(51):16048–16056
Mendoza O, Bourdoncle A, Boulé JB, Brosh Jr RM, Mergny JL (2016) G-quadruplexes and helicases. Nucleic Acids Res 44(5):1989–2006

Minhas GS, Pilch DS, Kerrigan JE, LaVoie EJ, Rice JE (2006) Synthesis and G-quadruplex stabilizing properties of a series of oxazole-containing macrocycles. Bioorg Med Chem Lett 16(15):3891–3895
Miyazaki T, Pan Y, Joshi K, Purohit D, Hu B, Demir H, Mazumder S, Okabe S, Yamori T, Viapiano M, Shin-ya K, Seimiya H, Nakano I (2012) Telomestatin impairs glioma stem cell survival and growth through the disruption of telomeric G-quadruplex and inhibition of the proto-oncogene, c-Myb. Clin Cancer Res 18(5):1268–1280
Mohanty J, Barooah N, Dhamodharan V, Harikrishna S, Pradeepkumar PI, Bhasikuttan AC (2013) Thioflavin T as an efficient inducer and selective fluorescent sensor for the human telomeric G-quadruplex DNA. J Am Chem Soc 135(1):367–376
Nakamura T, Iida K, Tera M, Shin-ya K, Seimiya H, Nagasawa K (2012) A caged ligand for a telomeric G-quadruplex. Chembiochem 13(6):774–777
Nakamura T, Okabe S, Yoshida H, Iida K, Ma Y, Sasaki S, Yamori T, Shin-Ya K, Nakano I, Nagasawa K, Seimiya H (2017) Targeting glioma stem cells in vivo by a G-quadruplex-stabilizing synthetic macrocyclic hexaoxazole. Sci Rep 7(1):3605
Nicoludis JM, Barrett SP, Mergny JL, Yatsunyk LA (2012) Interaction of human telomeric DNA with N-methyl mesoporphyrin IX. Nucleic Acids Res 40(12):5432–5447
Ou TM, Lu YJ, Tan JH, Huang ZS, Wong KY, Gu LQ (2008) G-quadruplexes: targets in anticancer drug design. ChemMedChem 3(5):690–713
Parkinson GN, Ghosh R, Neidle S (2007) Structural basis for binding of porphyrin to human telomeres. Biochemistry 46(9):2390–2397
Peng Q, Warloe T, Berg K, Moan J, Kongshaug M, Giercksky KE, Nesland JM (1997) 5-Aminolevulinic acid-based photodynamic therapy. Clinical research and future challenges. Cancer 79(12):2282–2308
Pokhrel P, Sasaki S, Hu C, Karna D, Pandey S, Ma Y, Nagasawa K, Mao H (2022) Single-molecule displacement assay reveals strong binding of polyvalent dendrimer ligands to telomeric G-quadruplex. Anal Biochem 649:114693
Rezler EM, Seenisamy J, Bashyam S, Kim MY, White E, Wilson WD, Hurley LH (2005) Telomestatin and diseleno sapphyrin bind selectively to two different forms of the human telomeric G-quadruplex structure. J Am Chem Soc 127(26):9439–9447
Sakuma M, Ma Y, Tsushima Y, Iida K, Hirokawa T, Nagasawa K (2016) Design and synthesis of unsymmetric macrocyclic hexaoxazole compounds with an ability to induce distinct G-quadruplex topologies in telomeric DNA. Org Biomol Chem 14(22):5109–5116
Sasaki S, Ma Y, Ishizuka T, Bao HL, Hirokawa T, Xu Y, Tera M, Nagasawa K (2020) Linear consecutive hexaoxazoles as G4 ligands inducing chair-type anti-parallel topology of a telomeric G-quadruplex. RSC Adv 10(71):43319–43323
Sedghi Masoud S, Yamaoki Y, Ma Y, Marchand A, Winnerdy FR, Gabelica V, Phan AT, Katahira M, Nagasawa K (2018) Analysis of interactions between telomeric i-motif DNA and a cyclic tetraoxazole compound. Chembiochem 19(21):2268–2272
Shalaby T, von Bueren AO, Hürlimann ML, Fiaschetti G, Castelletti D, Masayuki T, Nagasawa K, Arcaro A, Jelesarov I, Shin-ya K, Grotzer M (2010) Disabling c-Myc in childhood medulloblastoma and atypical teratoid/rhabdoid tumor cells by the potent G-quadruplex interactive agent S2T1-6OTD. Mol Cancer Ther 9(1):167–179
Shin-ya K, Wierzba K, Matsuo K, Ohtani T, Yamada Y, Furihata K, Hayakawa Y, Seto H (2001) Telomestatin, a novel telomerase inhibitor from Streptomyces anulatus. J Am Chem Soc 123(6):1262–1263
Shioda N, Yabuki Y, Yamaguchi K, Onozato M, Li Y, Kurosawa K, Tanabe H, Okamoto N, Era T, Sugiyama H, Wada T, Fukunaga K (2018) Targeting G-quadruplex DNA as cognitive function therapy for ATR-X syndrome. Nat Med 24(6):802–813
Siddiqui-Jain A, Grand CL, Bearss DJ, Hurley LH (2002) Direct evidence for a G-quadruplex in a promoter region and its targeting with a small molecule to repress c-MYC transcription. Proc Natl Acad Sci U S A 99(18):11593–11598

Tahara H, Shin-Ya K, Seimiya H, Yamada H, Tsuruo T, Ide T (2006) G-Quadruplex stabilization by telomestatin induces TRF2 protein dissociation from telomeres and anaphase bridge formation accompanied by loss of the 3′ telomeric overhang in cancer cells. Oncogene 25(13):1955–1966
Tauchi T, Shin-Ya K, Sashida G, Sumi M, Nakajima A, Shimamoto T, Ohyashiki JH, Ohyashiki K (2003) Activity of a novel G-quadruplex-interactive telomerase inhibitor, telomestatin (SOT-095), against human leukemia cells: involvement of ATM-dependent DNA damage response pathways. Oncogene 22(34):5338–5347
Tauchi T, Shin-ya K, Sashida G, Sumi M, Okabe S, Ohyashiki JH, Ohyashiki K (2006) Telomerase inhibition with a novel G-quadruplex-interactive agent, telomestatin: in vitro and in vivo studies in acute leukemia. Oncogene 25(42):5719–5725
Tera M, Ishizuka H, Takagi M, Suganuma M, Shin-ya K, Nagasawa K (2008) Macrocyclic hexaoxazoles as sequence- and mode-selective G-quadruplex binders. Angew Chem Int Ed Engl 47(30):5557–5560
Tera M, Iida K, Ishizuka H, Takagi M, Suganuma M, Doi T, Shin-ya K, Nagasawa K (2009) Synthesis of a potent G-quadruplex-binding macrocyclic heptaoxazole. Chembiochem 10(3):431–435
Tera M, Iida K, Ikebukuro K, Seimiya H, Shin-Ya K, Nagasawa K (2010) Visualization of G-quadruplexes by using a BODIPY-labeled macrocyclic heptaoxazole. Org Biomol Chem 8(12):2749–2755
Tsai YC, Qi H, Lin CP, Lin RK, Kerrigan JE, Rzuczek SG, LaVoie EJ, Rice JE, Pilch DS, Lyu YL, Liu LF (2009) A G-quadruplex stabilizer induces M-phase cell cycle arrest. J Biol Chem 284(34):22535–22543
Tsukakoshi K, Ikuta Y, Abe K, Yoshida W, Iida K, Ma Y, Nagasawa K, Sode K, Ikebukuro K (2016) Structural regulation by a G-quadruplex ligand increases binding abilities of G-quadruplex-forming aptamers. Chem Commun (Camb) 52(85):12646–12649
Wheelhouse RT, Sun D, Han H, Han FX, Hurley LH (1998) Cationic porphyrins as telomerase inhibitors: the interaction of tetra-(N-methyl-4-pyridyl)porphine with quadruplex DNA. J Am Chem Soc 120(13):3261–3262
Yasuda M, Ma Y, Okabe S, Wakabayashi Y, Su D, Chang YT, Seimiya H, Tera M, Nagasawa K (2020) Target identification of a macrocyclic hexaoxazole G-quadruplex ligand using post-target-binding visualization. Chem Commun (Camb) 56(85):12905–12908

Cyclic Naphthalene Diimide Derivatives as Novel DNA Ligands

27

Shigeori Takenaka

Contents
Introduction
Polymorphism of G-Quadruplex DNA
G4 Binders
Properties of NDI and Its Binding to Double-Stranded DNA
Binding of NDI to G4
Interaction of cNDI with G4
Interaction of cNDI with G4 Under Molecular Crowding Conditions
Conversion of G4 Structure by NDI
Telomerase Inhibitory Ability of cNDI and Inhibition of Cell Growth
Ferrocenyl cNDI
cNDI Dimer
Conclusions
References

Abstract

Naphthalene diimide (NDI), a water-soluble cationic compound, interacts with polyanionic DNA through electrostatic interaction and with the nucleobase planes through stacking interaction. Threading intercalation is a unique mechanism mediating the binding of NDI to double-stranded DNA. The two substituents of NDI are placed in the major and minor grooves and function as clamps, resulting in the formation of a stable complex. The dissociation of NDI is approximately 10 times slower than that of propidium, a classical intercalator. NDI strongly interacts with guanine (G) and binds to G-quadruplex (G4) DNA formed by the stacking of G-quartet planes, which are in turn formed by specific hydrogen bonding of four G bases. The introduction of substituents at three or four positions on the NDI backbone can result in steric hindrance, which prevents the binding of NDI to double-stranded DNA and consequently enables NDI to function as a ligand for G4 when it binds to the G-quartet plane through stacking interaction. The G4-binding properties of NDI can also be improved through the cyclization of NDI substituents. This cyclic chain prevents binding to double-stranded DNA, imparting G4 selectivity. Cyclic NDI (cNDI) that has a cyclohexyl group in the cyclic linker can specifically bind to G4, which forms in telomeric DNA, and inhibits telomerase activity. Therefore, cNDI dimers are expected to identify G4 clusters. Although this is a model system, cNDI dimers exhibit a higher ability to stabilize G4 dimers and inhibit telomerase activity than do cNDI monomers.

S. Takenaka (*) Department of Applied Chemistry, Kyushu Institute of Technology, Kitakyushu, Japan
© Springer Nature Singapore Pte Ltd. 2023. N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_31

Keywords

Naphthalene diimide · Cyclic naphthalene diimide · Cyclic naphthalene diimide dimer · Threading intercalation · Ferrocene · Catenane · G-quadruplex cluster · Electrochemical cancer diagnosis · Cyclic ferrocenylnaphthalene diimide

Introduction

Naphthalene diimides (NDIs), which are electron-deficient compounds with robust structures, are used as building blocks for electronic materials with special properties, such as n-type organic semiconductors and supramolecular assembly systems (Kobaisi et al. 2016). Naphthalene diimide derivative 1 (Fig. 1; the NDI derivatives discussed hereafter are limited to cationic derivatives possessing an NDI core), which has aminopropyl groups in both diamino moieties, functions as a divalent cationic molecule in aqueous solutions and is reported to intercalate between base pairs of double-stranded DNA (Tanious et al. 1991). In this interaction, the two substituents of NDI 1 are located in the major and minor grooves. As one substituent must pass between the base pairs for complex formation, this binding is referred to as threading intercalation. This binding is possible for the following reasons: (1) the two substituents of NDI are located at the two ends of the long axis of the molecule, and the complex must pass through the base pairs to achieve an effective stacking interaction; and (2) the hydrogen bonds between DNA base pairs fluctuate between dissociation and formation, which is called breathing. In contrast, several classical intercalators, including propidium, locate their substituents only in the minor groove of double-stranded DNA during intercalation. Therefore, the ability of NDIs to place their substituents in the major groove has piqued the interest of the scientific community. Guanine (G) has the lowest redox potential among nucleobases, indicating that it is easily oxidized and can donate electrons. Four G bases can associate through hydrogen bonding to form a G-quartet plane. Previous studies have reported that G-rich single-stranded DNA folds to form G4 DNA through the formation of these planar structures. G4 is present in large numbers on genomic DNA and plays a role in gene regulation mechanisms (Burge et al. 2006).

Fig. 1 Chemical structures of naphthalene diimides 1 and 2, nogalamycin, and propidium

G4 is reported to be associated with cancer. The potential of G4 ligands as novel anticancer agents is an area of active research. NDI, an electron-deficient aromatic ring, has attracted attention as a G4 ligand because it easily binds to the G-quartet plane through stacking interaction (Pirota et al. 2019). Several studies have synthesized various NDI derivatives and reported their interactions with G4 (Pirota et al. 2019). This has led to the study of NDI as an anticancer and antimicrobial agent. Studies evaluating the anticancer properties of NDI have focused on stabilizing G4 formed from telomeric DNA, specifically inhibiting the growth of cancer cells through telomerase inhibition, and stabilizing the G4 structure of MYC, which is associated with cancer. This chapter summarizes previous studies on cNDI, which is formed by linking the two amide side chains of NDI. Given the mode of binding of NDI to double-stranded DNA, forming a cyclic amide linker effectively inhibits the intercalation of cNDI into double-stranded DNA and allows it to bind specifically to G4, which may lead to the development of anticancer drugs with limited side effects.

Polymorphism of G-Quadruplex DNA

In the nineteenth century, G-rich nucleic acids were known to form gels in aqueous solutions at millimolar concentrations (Guschlbauer et al. 1990). In 1902, Bang proposed that the structure of gelatinized nucleic acids comprises four G tetramers (Lagnado, 2013). The crystallographic structure of G-quadruplex or tetraplex (G4) DNA was elucidated by Gellert et al. in 1962 (Gellert et al. 1962). Since then, the presence of G4 structures and the fine differences in the G4 structure depending on its sequence and medium conditions have been reported (Pirota et al. 2019). The G4 structure is formed by the hydrogen bonding of multiple G molecules, resulting in the formation of G-quartet planes that subsequently stack on top of each other. Depending on the folding of G bases (in combination with the anti or syn configuration of guanosine), antiparallel (chair or basket), parallel (propeller), or hybrid (hybrid-1 or hybrid-2) structures of G4 are formed (Fig. 2). The G4 structure is stabilized by hydrogen bonds between G molecules (called Hoogsteen hydrogen bonds) and by the stacking interactions between the G-quartet planes formed by these bonds. Additionally, the G-quartet plane has a gap at its center. The G4 structure is further stabilized by K+ or Na+ ions that occupy the gap formed at the center between two G-quartet planes. Irrespective of the mechanism involved in G4 structure formation, G4 structures are highly selective for K+. Tel23 [d-5′-TA(GGGTTA)3G3-3′] and Tel22 [d-5′-A(GGGTTA)3G3-3′] have been widely used for in vitro experiments on G4 DNA (Dai et al. 2007). Tel23 is hypothesized to form a hybrid-1 type structure in the presence of potassium ions. X-ray analysis revealed that Tel22 exhibits a parallel-type structure. A-core is considered to exhibit a hybrid-1 type structure along with an antiparallel structure in dilute solutions (Ambrus et al. 2006). Additionally, Myc-22 (d-5′-TGAG3TG3TAG3TG3TAA-3′), which regulates the expression of the MYC proto-oncogene, is used as a parallel-type DNA (Phan et al. 2005).

Fig. 2 Polymorphism of G-quadruplex (G4) structures formed by stacking of G-quartet planes
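The sequence shorthand used above for Tel23, Tel22, and Myc-22 can be expanded mechanically. The following minimal sketch does so and counts the runs of three or more guanines in each oligonucleotide; the helper names `expand` and `g_tracts` are hypothetical and are introduced only for illustration.

```python
import re


def expand(shorthand: str) -> str:
    """Expand shorthand such as 'TA(GGGTTA)3G3' into the full base sequence."""
    # First expand parenthesized repeats, e.g. (GGGTTA)3.
    seq = re.sub(r"\(([ACGT]+)\)(\d+)",
                 lambda m: m.group(1) * int(m.group(2)), shorthand)
    # Then expand single-base repeats, e.g. G3.
    return re.sub(r"([ACGT])(\d+)",
                  lambda m: m.group(1) * int(m.group(2)), seq)


def g_tracts(seq: str, min_len: int = 3) -> list:
    """Return every run of at least min_len consecutive guanines."""
    return re.findall("G{%d,}" % min_len, seq)


OLIGOS = {
    "Tel23": "TA(GGGTTA)3G3",
    "Tel22": "A(GGGTTA)3G3",
    "Myc-22": "TGAG3TG3TAG3TG3TAA",
}

for name, shorthand in OLIGOS.items():
    seq = expand(shorthand)
    print(name, seq, len(seq), len(g_tracts(seq)))
```

Each of the three oligonucleotides contains four G-tracts, which is the number needed for a single intramolecular G4 built from three stacked G-quartets.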

Bioinformatics prediction of sequence information obtained from the Human Genome Project indicates that more than 700,000 sequences may form the putative G4 sequence (PGS) structure in humans (Tu et al. 2021). The telomeric sequences at the ends of chromosomes are well-known examples of sequences that can form G4 structures (Bryan 2020). Telomeres comprise TTAGGG repeats that form double-stranded DNA of 5000–8000 bp, terminating in a single-stranded DNA (G-tail) of 100–200 bases (Wright et al. 1997). The size of the telomeric DNA in the fetus and at birth is 15,000 and 10,000 bp, respectively (Sanders and Newman 2013). After each cell division, the telomeric DNA is shortened by 20–170 bp until the sequence reaches approximately 5000 bp. At this stage, the cells are unable to divide and subsequently die. Cells in human tissues can divide only about 70 times on average, although the number of divisions varies slightly (germ cells and bone marrow stem cells are exceptions). The inability to divide leads to apoptosis and aging. Four TTAGGG repeats of the G-tail fold as a single unit to form a G4. Telomeric DNA comprises continuous clusters of G4 structures (Monsen et al. 2021). In addition to telomeric DNA sites, G4 structures are reported in the coding regions of Myc-22 associated with cancer (Wang et al. 2020).

Various antibodies that can bind to the G4 structure have been developed. Fluorescence staining of live cells using these antibodies revealed the presence of G4 even in live cells (Biffi et al. 2013). Human telomeric DNA is reported to form the hybrid-type structure in vitro. However, telomeric DNA exhibits the parallel-type structure in vivo owing to the molecular crowding conditions in the living cell (Phan 2010).

Additionally, anticancer drugs, such as cisplatin and daunorubicin, are reported to be selective for guanine–cytosine (GC) sites in DNA (Martinho et al. 2019; Ren and Chaires 1999). Cisplatin exerts anticancer effects by forming covalent bonds with G in double-stranded DNA. Daunorubicin intercalates at GC sites in double-stranded DNA. Several other DNA-binding anticancer drugs are G-selective (Martinho et al. 2019; Ren and Chaires 1999). G4 structures may be associated with the mechanisms of action of G-selective anticancer drugs. Antimalarial drugs, such as quinacrine, are also reported to be G-selective (Ren and Chaires 1999), and their mechanisms may be associated with G4 (Ehsanian et al. 2011).

Telomerase elongates telomeric DNA (Artandi and DePinho 2010). Human telomerase comprises the following two essential core components: human telomerase reverse transcriptase (hTERT), which exhibits catalytic activity, and telomerase RNA (TR), which contains a short template element for adding repetitive sequences of telomeric DNA to chromosome ends. Telomerase activity is absent in healthy cells, except in germ cells and bone marrow stem cells. However, telomerase activity has been detected in more than 80% of cancerous cells. Alternative lengthening of telomeres (ALT), a telomerase-independent mechanism of telomeric DNA elongation that occurs in 20% of cancers (Artandi and DePinho 2010), causes homologous recombination between shortened and unshortened telomeric DNA, resulting in its elongation (Zhang and Zou 2020). Therefore, telomerase inhibitors and molecules that stabilize the G4 structure of telomerase-elongated telomeric DNA are potential anticancer agents and have been reported by several studies (Ma et al. 2020).

The elongation mechanism of ALT is inhibited by G4 structure formation (Zhao and Zhai 2020). Thus, G4 binders have piqued the interest of the scientific community (Ma et al. 2020; Zhao and Zhai 2020).
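The telomere-shortening figures quoted in this section (about 10,000 bp at birth, a limit of about 5000 bp, a loss of 20–170 bp per division, and about 70 divisions on average) can be checked against each other with simple arithmetic. The sketch below assumes, for illustration only, a constant loss per division; the function names are hypothetical.

```python
import math

BIRTH_BP = 10_000   # telomere length at birth quoted in the text
LIMIT_BP = 5_000    # approximate length at which division stops


def divisions_to_limit(start_bp: int, limit_bp: int, loss_bp: float) -> int:
    """Divisions needed to reach the limit, ASSUMING a constant loss per division."""
    return math.ceil((start_bp - limit_bp) / loss_bp)


def implied_mean_loss(start_bp: int, limit_bp: int, divisions: int) -> float:
    """Mean loss per division (bp) implied by a given number of divisions."""
    return (start_bp - limit_bp) / divisions


# Bounds from the quoted per-division loss of 20-170 bp.
print(divisions_to_limit(BIRTH_BP, LIMIT_BP, 170))  # fastest shortening
print(divisions_to_limit(BIRTH_BP, LIMIT_BP, 20))   # slowest shortening
# Mean loss implied by about 70 divisions.
print(round(implied_mean_loss(BIRTH_BP, LIMIT_BP, 70), 1))
```

Under this simplification, about 70 divisions corresponds to a mean loss of roughly 71 bp per division, which lies within the quoted 20–170 bp range.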

G4 Binders Figure 3 shows the known G4 binders (Monchaud and Teulade-Fichou 2008; Andreeva et al. 2021). The center of Fig. 3 shows a parallel G4 binder in which the G-quartet plane can be identified. These G4 binders primarily interact by stacking with the G-quartet plane. Interactions other than this stacking interaction can be classified into five types (excluding those interacting with the metal complexes) (Andreeva et al. 2021; Georgiades et al. 2010). Group I molecules comprise perylene diimides, such as PEPER, porphyrin derivatives, and telomestatin. The extended aromatic ring allows effective stacking with the G-quartet plane. Most of these ligands are cationic as evidenced by their electrostatic interactions with DNA and water solubility. However, anionic porphyrin derivatives have also been reported. The extended aromatic plane of the G-quartet is not perfectly flat and is slightly twisted (with the propeller twist angle seen in double-stranded DNA base pairs). Group II compounds are derivatives, such as 3,6-bis(1-methyl-4-vinylpyridinium) carbazole

Fig. 3 Classification of previously reported G4 ligands by binding mode (groups I–V). The parallel-type G4 model is shown in the center (PDB: 2MB2). These ligands bind primarily by stacking with the G-quartet plane

diiodide, in which the aromatic rings are connected by single bonds so that they can adapt to this twist. This allows them to conform to the G-quartet plane and stack effectively. Ligands that fluoresce when the molecular motion around the single bonds is suppressed have also been reported. Group III molecules comprise aromatic derivatives with several substituent side chains, such as di-substituted, tri-substituted, or tetra-substituted NDIs. Just as double-stranded DNA forms a helix with two grooves, G4 DNA has four grooves. If the substituents protrude into the four grooves when the aromatic ring stacks on the G-quartet, they are expected to anchor and stabilize the complex. The substituents of di-substituted NDI can be located in the two grooves of double-stranded DNA. Therefore, di-substituted NDI can also stabilize double-stranded DNA through threading intercalation. However, tri-substituted and tetra-substituted NDIs are at a disadvantage when intercalating into double-stranded DNA owing to the steric hindrance of the substituents and are therefore G4-selective. Group IV compounds comprise cyclic bis-intercalators (CBIs). CBIs with long linkers form bis-intercalation complexes with double-stranded DNA, whereas shorter linkers inhibit the formation of bis-intercalation complexes. Ultimately, CBIs with a short linker, such as BOQ1, cannot bind to double-stranded DNA and are thought to have improved selectivity for G4 because they can still stack on the G-quartet plane of G4. Group V comprises compounds with a single aromatic subunit surrounded by a cyclic linker. The cyclic linker over the aromatic ring prevents intercalation into double-stranded DNA. Additionally, these derivatives, which have a nonintercalating functional group on the linker side chain and a short linker length, cannot undergo bis-intercalation. NCQ, which is linked to neomycin, and the cNDI derivatives discovered by Takenaka's group are included in this category. In this section, we focus on cNDI derivatives.

Properties of NDI and Its Binding to Double-Stranded DNA

First, we outline the properties of NDI (Kobaisi et al. 2016; Barros et al. 1997). The basic framework is depicted in Fig. 4a. NDI is an electron-deficient, rigid, planar aromatic ring in which four electron-withdrawing imide carbonyl groups are introduced at positions 1, 4, 5, and 8 of hydrophobic naphthalene, which reduces the charge density of naphthalene and forms a strongly π-acidic surface. This characteristic structure promotes π–π stacking of the naphthalene moiety, charge transfer (CT) interactions, van der Waals interactions, hydrogen bonding between the C–H and C=O groups, and metal coordination to the imide group. The symmetry is D2h, with electric transition dipoles along the long and short axes. The electronic transition along the long axis gives band I absorption at 300–400 nm, while that along the short axis gives band II absorption at 200–250 nm (Fig. 4b). This is due to the π–π* transition of S0 → S1. N,N′-functionalized NDI derivatives are readily obtained by high-temperature treatment of 1,4,5,8-naphthalenetetracarboxylic dianhydride with aliphatic or aromatic amines in polar solvents. A simple NDI chromophore in dimethyl dichloroethane exhibits a strong absorption band at wavelengths below 400 nm and a weak emission band with a

Fig. 4 (a) Chemical structure of NDI and its structural changes upon redox reaction, (b) image of NDI absorption spectrum and corresponding transition moments, (c) images of DPV (upper) and CV (lower) of NDI, and (d) image of excitation and emission spectra of NDI

short Stokes shift and low quantum yield. Additionally, NDI derivatives have low LUMO energy levels (−3.7 eV for N,N′-dioctylnaphthalenediimide) and can readily undergo two reversible one-electron reductions to form the stable radical anion NDI•⁻ and dianion NDI²⁻ (E1/2Red1 = −1.10 V and E1/2Red2 = −1.51 V vs. ferrocene/ferrocenium (Fc/Fc+) for N,N′-dioctylnaphthalenediimide) (Fig. 4a, c, and d). The radical anion exhibits a strong absorption band in the visible and near-infrared regions and gives a clear electron paramagnetic resonance signal. N,N′-Functionalization of NDI does not significantly affect the properties of the parent compound. However, modification of the NDI core is known to produce important changes in the optical and electronic properties of these compounds. Owing to their unique optoelectronic properties, NDI derivatives have important applications, such as n-type organic semiconductors in organic field-effect transistors, energy storage devices, organic solar cells, and artificial light systems.

Water-soluble NDI is known to intercalate into and bind to the DNA double helix (Tanious et al. 1991). NDI derivatives 1 and 2, which are shown in Fig. 1, intercalate into double-stranded DNA such that the two imide substituents lie outside the base-pair stack and the stacking between NDI and the adjacent DNA base pair planes is maximized. This forms a complex with the two substituents protruding into the major and minor grooves. To form this complex, a substituent must pass between adjacent base pairs, which is assumed to occur during breathing fluctuations of the base pairs, resulting in the

Fig. 5 (a) Absorption change of NDI 1 in the absence (top) or presence (bottom) of sonicated calf thymus DNA, (b) CD spectra of calf thymus DNA in the absence (top) or presence (bottom) of 1, and top view (c) and side view (d) of the computer model of the complex of DNA with 1

formation of an intercalation complex (Fig. 5c–d). This derivative is called a threading intercalator because its substituents are arranged in a manner suited to threading intercalation. Additionally, the substituents in the formed complex function as anchors that prevent dissociation. Nogalamycin, an antineoplastic drug (Fig. 1), has long been known as a natural threading intercalator. The antineoplastic efficacy of nogalamycin is attributed to its slow dissociation rate from double-stranded DNA (Williams et al. 1990). NDI 2, a threading intercalator (Fig. 1; dissociation rate constant kd = 0.52 s⁻¹, 0.2 M NaCl), dissociates more slowly than classical intercalators, such as propidium (Fig. 1; kd = 6.7 s⁻¹, 0.2 M NaCl) (Tanious et al. 1991). Owing to its low charge density, NDI is expected to have a CT interaction in addition to a stacking interaction with G, which has a low redox potential (and therefore readily releases electrons). Thus, NDI is expected to prefer GC base pairs within a double-stranded DNA sequence. For GC base pairs, kinetic analysis of the interaction of NDI with double-stranded DNA revealed a slow association rate constant (the hydrogen bonds of the GC pair are stronger than those of the adenine-thymine (AT) pair, which slows the breathing rate) and a slow dissociation rate constant. In contrast, for AT base pairs, the interaction exhibits a fast association rate constant (faster breathing rate) and a slow dissociation rate constant (anchoring by the substituents).
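To put the two dissociation rate constants quoted above on a more intuitive scale, a first-order rate constant can be converted to a half-life, t1/2 = ln 2 / kd. The following is a minimal sketch that assumes simple first-order dissociation of the complex; the function and variable names are illustrative and are not taken from the cited work.

```python
import math


def dissociation_half_life(kd_per_s: float) -> float:
    """Half-life (s) of a complex that dissociates with first-order rate constant kd (s^-1)."""
    if kd_per_s <= 0:
        raise ValueError("kd must be positive")
    return math.log(2) / kd_per_s


# Dissociation rate constants quoted in the text (0.2 M NaCl, Tanious et al. 1991)
KD_NDI2 = 0.52       # s^-1, threading intercalator NDI 2
KD_PROPIDIUM = 6.7   # s^-1, classical intercalator propidium

half_life_ndi2 = dissociation_half_life(KD_NDI2)
half_life_propidium = dissociation_half_life(KD_PROPIDIUM)

# For first-order decay, the ratio of half-lives equals the inverse ratio of kd values
half_life_ratio = half_life_ndi2 / half_life_propidium

print(f"NDI 2 half-life:     {half_life_ndi2:.2f} s")
print(f"Propidium half-life: {half_life_propidium:.2f} s")
print(f"NDI 2 stays bound about {half_life_ratio:.0f} times longer")
```

Under this assumption, the roughly tenfold difference in kd translates directly into a roughly tenfold longer lifetime of the threading complex.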

Analysis of NDI 2 revealed similar binding constants for poly[d(A-T)]2 and poly[d(G-C)]2 (KGC = 1.11 × 10⁶ M⁻¹ and KAT = 1.17 × 10⁶ M⁻¹ in 0.02 M MES buffer (pH 6.25) with 40 mM NaCl), indicating no base selectivity (McKnight 2013). This is in contrast to classical intercalators, which exhibit GC selectivity. Although NDI derivatives are not base selective, they are expected to distinguish single-stranded DNA from double-stranded DNA. For example, Fig. 5a shows that the absorption of NDI is markedly reduced when it binds to double-stranded DNA (a large hypochromic effect) and exhibits a slight red shift (bathochromic shift) (Yen et al. 1982). This depends on the exciton interaction between the NDI and the nucleobases and on the sum of the transition moments of the two chromophores. The circular dichroism (CD) spectrum reveals a negative induced CD in the absorption wavelength region of NDI (Fig. 5b) (Yen et al. 1982), which may be because NDI is anchored in a chiral environment by intercalating into the DNA double helix. This intercalation stretches the base pairs by 3.4 Å, similar to the thickness of the aromatic ring, and unwinds the double helix at 306 per 11 base pairs of the B-type double-stranded DNA. Theoretically, the double helix is considered to be unwound by 26° by classical intercalation (Berman and Young 1981), although this varies depending on the size of the substituents in the NDI.
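The near-identical values of KGC and KAT quoted above can be made concrete by converting them into fractional site occupancies. The sketch below assumes the simplest possible model, independent and identical binding sites (a Langmuir isotherm), and a free ligand concentration of 1 µM chosen purely for illustration; neighbor exclusion and cooperativity, which matter for real intercalators, are ignored.

```python
def fraction_bound(k_assoc: float, free_ligand: float) -> float:
    """Fractional occupancy of independent sites: theta = K[L] / (1 + K[L])."""
    if k_assoc < 0 or free_ligand < 0:
        raise ValueError("K and [L] must be non-negative")
    x = k_assoc * free_ligand
    return x / (1.0 + x)


# Binding constants of NDI 2 quoted in the text (M^-1)
K_GC = 1.11e6  # poly[d(G-C)]2
K_AT = 1.17e6  # poly[d(A-T)]2

# Hypothetical free ligand concentration (M); an assumption for illustration only
FREE_LIGAND = 1.0e-6

theta_gc = fraction_bound(K_GC, FREE_LIGAND)
theta_at = fraction_bound(K_AT, FREE_LIGAND)

print(f"Occupancy on GC sites: {theta_gc:.3f}")
print(f"Occupancy on AT sites: {theta_at:.3f}")
print(f"K_GC / K_AT = {K_GC / K_AT:.2f}")
```

With binding constants this close, the two occupancies differ by only a few percent at any ligand concentration, which is what "no base selectivity" means in practice.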

Binding of NDI to G4

NDI, an electron-deficient aromatic ring, is expected to stack effectively with the electron-rich G base. The G-quartet plane comprises four G bases. Based on the size of its aromatic ring, NDI is large enough to cover two of the four G bases in this plane. Di-substituted NDI binds to double-stranded DNA via threading intercalation. Introducing three or four substituents into NDI may inhibit binding to the DNA double helix and improve selectivity for G4 (Collie et al. 2012; Marchetti et al. 2018; Sur et al. 2017). Figure 6 shows the chemical structures of tetra-substituted 3 and 4 and tri-substituted CM03. The crystal structure of the complex of 3 with G4 (Tel22 as a single telomeric G4 unit) has been analyzed (Fig. 6a); 3 is stacked on the center of G4 with its substituents extended into the four grooves (Collie et al. 2012). For tetra-substituted NDIs, both stacking and substituent placement must be optimized. As shown in Fig. 6b, tri-substituted CM03 appears to have a higher binding ability than 3 because its substituents can be placed in the most suitable positions (Marchetti et al. 2018). This may be due to maximal stacking of the NDI plane with the G4 plane combined with optimal placement of the substituents. The tri-substituted form is effective against pancreatic ductal adenocarcinoma (PDAC) (Marchetti et al. 2018). Additionally, various derivatives with extended naphthalene aromatic rings have been developed that may improve the specificity for G4 (Andreeva et al. 2021; Zuffo et al. 2018). The interaction of tetra-substituted NDI 3 with Tel22 has been analyzed using isothermal titration calorimetry (ITC) (Sur et al. 2017). The results revealed that the binding stoichiometry (n), binding constant (K), enthalpy, and entropy were

Fig. 6 Chemical structures of tetra-substituted (3 and 5) and tri-substituted (CM03) naphthalene diimides. (a) Image of the X-ray crystal structure of the complex of Tel22 with 4 (PDB: 3T5E) (Collie et al. 2012), (b) computer model of the complex between parallel-type G4 and CM03 (Marchetti et al. 2018)

1, 2.92 × 10⁶ M⁻¹, 5.16 kcal/mol, and 8.73 kcal/mol, respectively, which is advantageous in terms of entropy. These parameters are thought to result from the stabilization due to stacking and dehydration upon complex formation.
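A binding constant such as the one quoted above can be converted into a standard free energy of binding through ΔG° = −RT ln K, which gives a convenient scale for comparing ITC results. The sketch below is only an illustration: the temperature of 298.15 K is an assumption (the measurement temperature is not stated here), and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def binding_free_energy(k_assoc: float, temperature_k: float = 298.15) -> float:
    """Standard free energy of binding (kcal/mol) from an association constant (M^-1)."""
    if k_assoc <= 0:
        raise ValueError("binding constant must be positive")
    return -R_KCAL * temperature_k * math.log(k_assoc)


# Binding constant of tetra-substituted NDI 3 with Tel22 quoted in the text
K_NDI3_TEL22 = 2.92e6  # M^-1

delta_g_tel22 = binding_free_energy(K_NDI3_TEL22)
print(f"Delta G (assumed 298.15 K) = {delta_g_tel22:.2f} kcal/mol")
```

Because ΔG° depends only logarithmically on K, binding constants that all lie in the 10⁶ M⁻¹ range correspond to free energies within about 1 kcal/mol of each other, so differences between ligands show up mainly in how ΔG° is partitioned between enthalpy and entropy.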

Interaction of cNDI with G4

Cyclic NDI was first synthesized as cyclic bis-NDI, based on the properties of the NDI molecule and its ability to form supramolecules via cyclization (Jazwinski et al. 1987). Various CBIs have been synthesized to evaluate their mode of binding to double-stranded DNA. Iverson et al. synthesized 5, a water-soluble cyclic bis-NDI (shown in Fig. 7) (Chu et al. 2009). The authors examined how intercalators, a group of molecules that insert between adjacent base pairs of double-helical DNA, bind to the DNA double helix after cyclization. As described above for threading intercalators, the hydrogen bonds between base pairs in double-stranded DNA break and dynamically reform (a process called breathing). Intercalation of a CBI into double-stranded DNA is possible when a space opens between two adjacent base pairs; if its linker chain does not cause steric hindrance in the formation of the bis-intercalation complex, the bis-intercalator is expected to form a topologically acceptable catenane-like complex (Fig. 8a). Iverson et al. reported that 5 (Fig. 7) binds to

Fig. 7 Chemical structures of cyclic naphthalene diimides 7–12 and cyclic perylene diimide 13 with cyclic bis-naphthalene diimide, 6, and NCQ

double-stranded DNA and forms a catenane-like complex (Fig. 8a). The binding mode of a CBI may depend on its linker length. However, cyclic bis-NDI derivatives with different linker lengths have not been synthesized. In contrast, various cyclic bisacridines and related derivatives with different linker lengths, such as BOQ1 (Fig. 2), have been synthesized (Monchaud et al. 2010). CBIs with short linkers are known to flip one base out of a base pair and encapsulate it to form an inclusion complex (Fig. 8b), and to form catenane-like complexes with the base pairs neighboring an apurinic site, which arises from damage to double-stranded DNA (Fig. 8c) (David et al. 2003; Vigneron 1999). The binding mode shown in Fig. 8d occurs in single-stranded DNA through base inclusion. However, as the inclusion mode shown in Fig. 8b also occurs in double-stranded DNA, distinguishing single-stranded from double-stranded DNA using a CBI with base inclusion ability has not yet been achieved. When the linker length is shortened such that the inclusion of a base in the ring of the bis-intercalator is no longer possible,

Fig. 8 Various binding modes of cyclic bis-intercalators. (a) Catenane-like bis-intercalating binding, (b) inclusion binding to flip bases, (c) catenane-like bis-intercalation binding to base pairs adjacent to apurinic sites, (d) inclusion binding to base of single-stranded DNA, (e) linker length is too short to allow inclusion binding

binding to double-stranded DNA is expected to be inhibited (Fig. 8e) (Monchaud et al. 2010). In this case, if the CBI structure inhibits binding to the grooves of double-stranded DNA but allows stacking with the G-quartet plane, the bis-intercalator is expected to exhibit specificity for G4; BOQ1 (Fig. 3, IV) is an example (Monchaud et al. 2010). Marchetti et al. synthesized the cNDI derivative 6 containing a benzene moiety with a long linker length (Marchetti et al. 2015). The binding mode of 6 has not been completely elucidated, although binding to the G4 grooves seems to predominate. Takenaka's group synthesized the cNDI molecule 7, with a benzene ring introduced into the linker region, as a readily synthesized G4-specific molecule. The benzene moiety acts as an intercalating site for double-stranded DNA, and a bis-intercalation complex is formed by catenane-like binding as shown in Fig. 8a (Czerwinska et al. 2014). Molecule 8 has a short linker length and exhibits preferential selectivity for G4 over double-stranded DNA (Islam et al. 2015a). G4 selectivity was further improved by changing the 1,4-substituent to a 1,3-substituent (molecule 11) (Takahashi et al. 2021). Cyclic intercalators with enhanced G4 selectivity have also been synthesized by introducing sterically bulky sites into the intercalation site, as in neomycin-capped aromatic structures (NCQ) (Kaiser et al. 2006) (Fig. 7). The selectivity for G4 and

Fig. 9 (a, b) Chemical structures of 10 and 11 and their computer models, (c) computer model of hybrid-type G4 (PDB: 2GKU), and (d) its complex with 11

double-stranded DNA has been comparatively evaluated using a competitive fluorescence resonance energy transfer melting assay. However, the selectivity has not been quantified in detail. Takenaka's group synthesized cNDIs in which the amide substituent side chains of NDI are linked in a cyclic manner (Esaki et al. 2014). For example, 9 (Zou et al. 2020) is linked by an alkyl chain, while 10 (Esaki et al. 2014) is linked through a sterically bulky cyclohexyl group (Fig. 9). The CPK models of molecules 9 and 10 are shown in Fig. 9a and b. The linker side chains cover one side of the NDI plane, so the structure cannot intercalate between the base pairs of double-stranded DNA. However, the other side of the NDI plane remains exposed. Figure 9c shows the structure of Tel23, in which the DNA loops are located around the G-quartet plane. The G-quartet plane surrounded by the loops resembles the cavity of a decayed tooth. As shown in Fig. 9d, cNDI is expected to bind to this cavity. The absorption spectrum of cNDI 10 has a maximum at 383 nm, which is derived from

Fig. 10 ITC measurements of the interaction of Tel23 (a) or HP27 (b) with 11 in 50 mM potassium phosphate buffer (pH 7.4). Axes: corrected heat rate (µcal/s) versus time (s), and normalized fit versus mole ratio

NDI. cNDI 10 exhibits a large hypochromic effect and a slight red shift upon addition of Tel23. This behavior is similar to that observed when NDI binds to double-stranded DNA, indicating that the NDI plane is stacked with the G-quartet plane. The presence of an isosbestic point indicates a single mode of binding. ITC measurements have been performed to analyze the binding of 10 to Tel23 (Fig. 10a). The binding stoichiometry (n), binding constant (K), enthalpy, and entropy were 1, 1.5 × 10⁶ M⁻¹, 12.8 kcal/mol, and 4.4 kcal/mol, respectively. Although the binding constant is comparable to that of the tetra-substituted NDI 3 described earlier, the binding is considered enthalpy-dominated, and the stacking of the G-quartet with NDI is considered to be the main driving force of the binding. The interaction of 10 with the hairpin DNA HP27 (5′-GCG ATT CTC GGC TTT GCC GAG AAT CGC-3′), a model of double-stranded DNA, is shown in Fig. 10b. No heat change was observed during the ITC titration, which indicates that 10 acts as a G4-selective ligand. The binding stoichiometry (n), binding constant (K), enthalpy, and entropy for the interaction of 9 with Tel23 were 2, 3.4 × 10⁶ M⁻¹, 8.5 kcal/mol, and 0.4 kcal/mol, respectively. The linker of 9 is less bulky than that of 10. Therefore, 9 can stack on the G-quartet planes from both above and below, which is consistent with a stoichiometry of 2. Additionally, the minor change in entropy suggests that binding to the G-quartet cavity was not affected by steric hindrance. The binding stoichiometry (n), binding constant (K), enthalpy, and entropy for the interaction of 10 with Myc-22 G4, which is known to form a parallel structure, were 1, 6.2 × 10⁶ M⁻¹, 13.9 kcal/mol, and 4.7 kcal/mol, respectively (Zou et al. 2020). This indicates that the binding affinity of 10 to Myc-22 is four times higher than that to Tel23,

mainly due to differences in entropy. Thus, stacking may be more advantageous for Myc-22. Takenaka's group previously synthesized 6, which has a benzene ring in the linker region (Fig. 7) (Czerwinska et al. 2014; Islam et al. 2015b). This was designed and synthesized to facilitate additional stacking interactions between benzene and the loop nucleobases of G4. The binding constant of 6 was 3.7 × 10⁶ M⁻¹. At a Tm of 10 °C, the binding constant of 6 was 2.6 × 10⁶ M⁻¹ for dodecameric synthetic double-stranded DNA. Unwinding experiments using supercoiled plasmid DNA suggested that 6 intercalates into double-stranded DNA. This indicates that both the NDI and benzene ring portions of 6 stack with the base pairs, and a catenane-like complex was predicted for this binding. To inhibit binding to double-stranded DNA, the linker must be short. Therefore, 9, which has a shorter linker chain (Fig. 7), was synthesized. Consequently, binding of 9 to double-stranded DNA was suppressed, and 9 exhibited an additional stacking interaction between a loop base of G4 and the side-chain pyridine (Takahashi et al. 2021).

Binding capacity can also be evaluated using ΔTm, which is the change in the melting temperature (Tm) of G4 after the addition of a ligand. ΔTm roughly correlates with the binding constant and is suitable for evaluating G4 ligands. However, if the Tm of G4 itself is high, ΔTm cannot be reliably evaluated. PEPER, a perylene diimide whose aromatic ring is an extension of that of NDI, is reported to be a G4 ligand (Fedoroff et al. 1998). Takenaka's group synthesized the cyclic perylene diimide 12 (Fig. 7) (Vasimalla et al. 2017), in which the two substituents on its imide moieties are linked together. Cyclic perylene diimide 12 binds to Tel23 with a binding constant on the order of 10⁶ M⁻¹ and raises its Tm by 21 °C. In general, plasmid DNA unwinding experiments are carried out to verify the intercalation mode of DNA-binding molecules. The unwinding of the double helix by intercalation causes a significant change in the superhelical structure of the plasmid DNA, resulting in a significant change in hydrodynamic volume. This change can be observed by gel electrophoresis. Unwinding experiments with plasmid DNA using 12 showed that no unwinding occurred, indicating that 12 does not intercalate into double-stranded DNA. The binding constants of cNDI and acyclic NDI to G4 were comparable. This indicates that cyclization inhibits binding to double-stranded DNA while maintaining the original G4-binding ability of NDI. Similarly, the binding ability of 12 to G4 is the same as that of acyclic PDI, but binding to the double strand is inhibited by the cyclic form. 12 showed stronger inhibition than the cNDI derivative, with IC50 = 0.24 μM.

Interaction of cNDI with G4 Under Molecular Crowding Conditions

The TA-core of G4 exhibits a hybrid structure under dilute conditions but forms a parallel structure under cell-like molecular crowding conditions. Molecular crowding conditions are induced by the presence of 40% (v/v) PEG200 in the assay buffer. Sugimoto et al. (Yaku et al. 2013) examined the interaction of G4 with

ligands under molecular crowding conditions. For anionic ligands that bind to G4 through π–π interactions, the net number of water molecules released upon complex formation is small, so crowding does not affect ligand binding. However, for cationic ligands, the number of water molecules released upon complex formation is large, which significantly decreases the binding capacity of the ligands. The binding stoichiometry (n), binding constant (K), enthalpy, and entropy for the interaction of 10 (Fig. 7) with Tel23 under molecular crowding conditions were 1, 0.6 × 10⁶ M⁻¹, 43 kcal/mol, and 35 kcal/mol, respectively (Zou et al. 2020). Enthalpy was significantly stabilized, while the entropy value was positive, indicating a major destabilization. The corresponding values for the interaction of 10 with Myc-22 under molecular crowding conditions were 1, 0.7 × 10⁶ M⁻¹, 20 kcal/mol, and 12 kcal/mol, respectively. Under these conditions, both Tel23 and Myc-22 form a parallel structure, and the binding affinities of 10 for them are similar, although the enthalpy and entropy values for Myc-22 are less than half of those for Tel23. Under dilute conditions, the binding affinity of 9, which does not contain a cyclohexyl group, to Tel23 (K = 3.4 × 10⁶ M⁻¹) was three times stronger than that to Myc-22 (K = 1.1 × 10⁶ M⁻¹). Under molecular crowding conditions, the binding constants for the interaction of 9 with Tel23 and Myc-22 were 1.9 × 10⁶ M⁻¹ and 1.6 × 10⁶ M⁻¹, respectively (Zou et al. 2020). The reason the selectivity depends on the linker region of the cNDI is unclear, although steric repulsion with the loops may be involved.
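The comparisons of binding constants made in this section ("four times higher", "three times stronger") are simple ratios, and it can help to tabulate them side by side. The sketch below collects the binding constants quoted in this section for cNDI 9 and 10 and computes K(Tel23)/K(Myc-22) under each condition; the dictionary layout and names are illustrative only.

```python
# Binding constants (M^-1) quoted in this section for cNDI 9 and 10
BINDING_CONSTANTS = {
    ("9", "dilute"): {"Tel23": 3.4e6, "Myc-22": 1.1e6},
    ("9", "crowding"): {"Tel23": 1.9e6, "Myc-22": 1.6e6},
    ("10", "dilute"): {"Tel23": 1.5e6, "Myc-22": 6.2e6},
    ("10", "crowding"): {"Tel23": 0.6e6, "Myc-22": 0.7e6},
}


def selectivity(k_first: float, k_second: float) -> float:
    """Ratio of two association constants; values above 1 favor the first target."""
    if k_first <= 0 or k_second <= 0:
        raise ValueError("binding constants must be positive")
    return k_first / k_second


# Tel23 over Myc-22 preference for each ligand and condition
ratios = {
    key: selectivity(values["Tel23"], values["Myc-22"])
    for key, values in BINDING_CONSTANTS.items()
}

for (ligand, condition), ratio in ratios.items():
    print(f"cNDI {ligand}, {condition}: K(Tel23)/K(Myc-22) = {ratio:.2f}")
```

Laid out this way, the ratios for both ligands move toward 1 under crowding conditions, in line with the statement that Tel23 and Myc-22 both adopt a parallel structure there.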

Conversion of G4 Structure by NDI

Biomacromolecules function in a crowded intracellular environment in which various molecules, such as nucleic acids, proteins, polysaccharides, and metabolites, account for 30–40% of the intracellular volume. Molecular crowding is an important determinant of protein and nucleic acid structure, stability, and function. Crowding agents have therefore been used to simulate the behavior of G-quadruplexes in a limited volume with reduced water activity that mimics cellular conditions. Spatial crowding can affect the stability of the G-quadruplex and its conformation. Various co-solutes, including polyethylene glycol (PEG), polysaccharides, ethanol, glycerol, dimethyl sulfoxide, acetonitrile, and Ficoll, have been used to simulate physiological crowding conditions (Petraccone et al. 2012). As shown in Fig. 2, human telomere G4 DNA is reported to predominantly exhibit a hybrid conformation in the presence of K+ under dilute conditions. Recent studies have demonstrated that it adopts a parallel conformation under molecular crowding conditions induced by PEG, which is typically used to simulate molecular crowding in vitro. However, similar results have not been obtained with Ficoll, which also induces molecular crowding. This conversion

involves the release of approximately 17 water molecules per G4 unit and has been demonstrated to be energetically unfavorable in pure aqueous solution. The parallel form is hydrodynamically larger than the hybrid form. Thus, molecular crowding can be ruled out as the mechanism of the PEG-induced conversion. Instead, the conversion is thought to be due to the binding of PEG to the two structures and the difference in the amount of bound PEG (Hänsel et al. 2011). Several G4 ligands, including NDI 3 (Fig. 6), have been reported to transform the hybrid structure of G4 into a parallel structure, as demonstrated by Neidle et al. in 2010 (Hampel et al. 2010). In 2021, Kang et al. (Hao et al. 2021) demonstrated that the addition of NDI 4 (Fig. 6) similarly converted the hybrid form to a parallel form under Ficoll-70-induced molecular crowding conditions in the presence of K+. The detailed mechanisms through which NDI derivatives induce the parallel form and stabilize the complex are not clear. Takenaka's group reported that cNDI 9 (Fig. 7) stabilized the complex by binding to G4 in its parallel form with a binding constant on the order of 10⁶ M⁻¹, although the binding constant was slightly reduced in the presence of K+ and PEG200. In the absence of K+ and the presence of PEG200, the addition of cNDI 9 induced hybrid and parallel forms of G4 (Zou et al. 2020). Although the detailed mechanism remains to be clarified, telomere G4 preferentially forms parallel structures under molecular crowding conditions in vivo. Highly G4-selective multisubstituted NDI and cNDI may strongly bind and stabilize these structures. These findings may provide useful insights into the development of structure-specific ligands for G4 and into G4 interactions in vivo.

Telomerase Inhibitory Ability of cNDI and Inhibition of Cell Growth

The telomeric repeat amplification protocol (TRAP) assay is used to assess the ability of a molecule to inhibit telomerase (Herbert et al. 2006). In this assay, telomerase acts on a substrate fragment, called the TS primer, and elongates it with TTAGGG repeats. The degree of fragment elongation is evaluated using PCR amplification. As the fragment contains TTAGGG repeats, the PCR primer is designed to randomly form double-stranded DNA. The PCR-amplified products are observed as a ladder in increments of 6 bases by gel electrophoresis. Thus, the number of bands obtained after gel electrophoresis is directly proportional to telomerase activity. If a G4 ligand inhibits telomerase elongation via G4 formation, the bands appear faded. The band intensity is then quantified. The concentration of the G4 ligand at which the band intensity is half of the initial value (i.e., the concentration that inhibits telomerase activity by 50%) is taken as the IC50 value for telomerase inhibition. However, as G4 ligands often inhibit DNA polymerase during PCR, a TRAP-LIG assay has been proposed in which the G4 ligand is removed before PCR (Reed et al. 2008). In the TRAP assay, the inhibition of PCR can be assessed using internal control primers not related to telomerase. Figure 11 shows the examples of 1 and 10 (Esaki et al. 2014). In 1 (Fig. 11a), the internal control band disappeared when the concentration of the

Fig. 11 Telomerase inhibition assay by 1 (a) and 11 (b) in TRAP assay. T: 20 bp ladder, N: heated telomerase, N: absence of telomerase. IC: internal control

Fig. 12 (a) Growth inhibition against cancer cell line Ca9–22 after addition of 10 (〇) or 11 (●) and subsequently incubated for 24 h, and (b) inhibitory effects in the cancer cell line Ca9–22 (●) and normal cell BMC (~) and NHEK (■) after the addition of variable concentration of 11 and subsequently incubated for 24 h

G4 ligand was high, indicating PCR inhibition. However, in 10, the internal control band did not disappear even at high concentrations, although the elongation bands faded as the amount of 10 increased, owing to the inhibition of telomerase activity. The first four bands did not disappear even in the high-concentration region, indicating that the strand elongated to a length at which a G4 structure could form and that the binding of 10 to this G4 then inhibited further elongation. The IC50 value of 10 for telomerase was 0.5 μM. Ligand binding after G4 formation and the resulting inhibition of elongation by polymerase were analyzed using the PCR stop assay (Jamroskovic et al. 2019). The IC50 value for polymerase using this method was 1.1 μM (Fukuda et al. 2021). The number of viable cells after 24 h of incubation with varying concentrations of cNDI in the medium was plotted against the cNDI concentration (Fukuda et al. 2021). The effects of the addition of 9 or 10 to the cancer cell line Ca9–22 are shown in Fig. 12a. The IC50 values of 9 and 10 for the inhibition of cell growth were 3 and < 0.03 μM, respectively, indicating high growth inhibition. This was correlated with the magnitude of binding to Tel23 under molecular crowding conditions. Figure 12b shows

the results of growth inhibition of normal human epidermal keratinocytes (NHEKs) and bone marrow cells (BMCs) (models of healthy cells) and the cancer cell line Ca9–22. The difference in the inhibitory capacity between Ca9–22 and NHEK correlated with the difference in TERT mRNA expression. The growth of BMCs was not inhibited even at high concentrations. The findings indicated that 10 effectively killed cancer cells without affecting healthy cells. This activity was also correlated with the ability to inhibit telomerase activity. The direct correlation of the inhibition of telomerase activity with anticancer activity and the changes occurring in the cell are unclear. Therefore, genome-wide expression analyses must be performed in the future. Whole-transcriptome RNA-seq analysis revealed that CM03 downregulates essential pathways involved in survival, metastasis, and drug resistance in human PDAC (Parkinson et al. 2008). Such tri-substituted NDI is considered to be the most promising anticancer drug that targets G4 owing to its strongly selective anticancer activity in vitro and in vivo. All available biophysical, biological, and structural data on G4-targeting NDI were systematically reviewed by Montesarchio et al. (Platella et al. 2021).
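The IC50 values quoted in this section are defined as the ligand concentration at which the measured signal (band intensity in the TRAP assay, or viable cell number) falls to half of its initial value. One simple way to estimate such a value from a dose-response series is log-linear interpolation between the two concentrations that bracket the 50% level. The sketch below illustrates this; the data points are hypothetical and are not measurements from the cited studies, and a full analysis would normally fit a sigmoidal dose-response model instead.

```python
import math


def ic50_by_interpolation(concentrations, relative_signals):
    """Estimate IC50 by interpolating on a log10 concentration scale.

    concentrations: positive ligand concentrations (any consistent unit).
    relative_signals: signal at each concentration, normalized to 1.0 without ligand.
    Returns None if the series never crosses the 50% level.
    """
    pairs = sorted(zip(concentrations, relative_signals))
    for (c_low, s_low), (c_high, s_high) in zip(pairs, pairs[1:]):
        if s_low >= 0.5 >= s_high and s_low != s_high:
            # Fraction of the way from c_low to c_high at which the signal hits 0.5
            fraction = (s_low - 0.5) / (s_low - s_high)
            log_ic50 = math.log10(c_low) + fraction * (
                math.log10(c_high) - math.log10(c_low)
            )
            return 10 ** log_ic50
    return None


# Hypothetical dose-response series (concentrations in uM), for illustration only
example_concentrations = [0.1, 0.3, 1.0, 3.0]
example_signals = [0.9, 0.7, 0.3, 0.1]

example_ic50 = ic50_by_interpolation(example_concentrations, example_signals)
print(f"Estimated IC50 = {example_ic50:.2f} uM")
```

Interpolating on a logarithmic concentration axis is a design choice that matches how dose-response curves are usually plotted; with widely spaced concentrations, linear interpolation would bias the estimate toward the higher concentration.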

Ferrocenyl cNDI

The binding capacity of NDI to G4 is higher than that to double-stranded DNA. However, NDI also forms a stable complex with double-stranded DNA via threading intercalation. When di-substituted NDI binds to G4, the two substituent moieties protrude into the grooves of G4 and are anchored there. Therefore, the linker length and functional groups of the two NDI substituents can potentially improve G4 selectivity. Takenaka’s group reported an electrochemical detection method for DNA in which NDI distinguishes between single-stranded and double-stranded DNA, utilizing the ability of NDI to stabilize double-stranded DNA through threading intercalation (Takenaka 1999). In particular, a ferrocenyl NDI derivative (FND) with ferrocene moieties at both substituent ends was synthesized. Next, an electrode with an immobilized DNA probe was prepared. A sample DNA solution was placed on this electrode to perform hybridization. Double-stranded DNA formed on the electrode only when the sample contained the target DNA. The addition of FND resulted in the concentration of FND on the double-stranded DNA. When a potential was applied to the electrode, the current increased according to the amount of enrichment (amount of double-stranded DNA formed = amount of target DNA). The best FND that was synthesized was 13 (Fig. 13), which was selected based on its ability to electrochemically distinguish double-stranded DNA from single-stranded DNA and on its redox potential (Sato and Takenaka 2008). Takenaka’s group analyzed the interaction between previously synthesized FNDs and G4 (Sato and Takenaka 2017). FND 14 (Fig. 13) could bind to Tel23 with a binding constant of 8.7 × 10⁷ M⁻¹. Takenaka’s group used FND 14 and DNA-modified electrodes to achieve electrochemical detection of telomerase activity, which correlates well with cancer development. To perform this


Fig. 13 Chemical structures of cyclic and acyclic naphthalene diimides carrying a ferrocene unit, 14–18

technique, the TS primer, which is the substrate for telomerase, is first immobilized on the electrode. The electrode is then exposed to the cellular extract. In the presence of telomerase activity, the TS primer is elongated on the electrode. Under electrochemical measurement conditions (K+ is included as the supporting electrolyte), the elongated telomeric DNA forms a contiguous G4 structure. When the electrode is placed in an electrolyte containing FND 14, two units of FND 14 are concentrated at each G4 unit (from n = 2). Applying a potential to the electrode produces an oxidation current from the ferrocene portion of the FND 14 enriched on the electrode, which allows indirect quantification of telomerase activity (Sato et al. 2005). The low selectivity of FND for G4 over single-stranded DNA results in a large background current, which reduces accuracy. Therefore, a ferrocenyl NDI ligand with high G4 selectivity must be developed to further improve sensitivity. As cNDI exhibits high G4 selectivity, Takenaka’s group attempted to improve the selectivity by cyclizing FND. Owing to its ease of synthesis, 15 (Fig. 13) was synthesized by cyclization with ferrocenedicarboxylic acid. Although 15 showed improved selectivity for G4, two electron-withdrawing carbonyl groups were linked to the ferrocene moiety, which raised the redox potential of ferrocene and increased the background (Islam et al. 2017). Therefore, we synthesized 16 (Fig. 13), in which ferrocene was introduced at a branching site on the side chain of cNDI (Kaneyoshi et al. 2020). Furthermore, 17 (Fig. 13) was synthesized by introducing a ferrocene into the cyclic linker. In 17, the oxidation current was reduced by the intramolecular interaction between NDI and ferrocene and was restored by G4 binding. These cFNDs yielded improved G4 selectivity along with improved selectivity of the electrochemical response (Kaneyoshi et al. 2022).
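The stoichiometry of the assay described above can be put into a rough count. The sketch below assumes that each telomeric repeat added by telomerase supplies one GGG tract, that four consecutive tracts fold into one intramolecular G4 unit, and that two FND 14 molecules bind per G4 unit, as stated in the text. The function names are hypothetical, and the model ignores any contribution of the primer itself, incomplete folding, and nonspecific background binding.

```python
def g4_units_from_repeats(n_repeats, tracts_per_g4=4):
    """G4 units on one elongated primer, assuming one GGG tract per added repeat."""
    return n_repeats // tracts_per_g4


def bound_ligands(n_repeats, ligands_per_g4=2):
    """Ligand molecules concentrated on one elongated primer (two per G4 unit)."""
    return ligands_per_g4 * g4_units_from_repeats(n_repeats)


# Idealized counts for a few elongation lengths
for n in (3, 4, 8, 12):
    print(f"{n} repeats: {g4_units_from_repeats(n)} G4 unit(s), "
          f"{bound_ligands(n)} bound ligand(s)")
```

Under these assumptions the ferrocene oxidation current scales with the number of bound ligands, so it rises stepwise with elongation. A ligand that also binds the unfolded single strand would add a term that does not depend on G4 formation, which is the background problem the cyclized derivatives are meant to reduce.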


cNDI Dimer

Chromosomal ends carry repeating TTAGGG sequences, which are called telomeres. In humans, the ends of each of the 46 chromosomes comprise a 200-base single-stranded DNA (G-tail). Previous studies have examined the higher-order structure of the G-tail. In addition to the overhang region, G4 clusters appear to be present in several other important regions of the human genome, such as the open reading frame that is involved in maintaining the viability of neurons in the central and peripheral nervous systems (Haeusler et al. 2014). Recent studies have suggested that G4 clusters are arranged like beads (Petraccone 2013). Whether the G4 structures shown in Fig. 2 form such beads, and how adjacent beads interact, is unknown. The G4 unit of the human telomere is suggested to adopt a hybrid structure in solution. However, whether the G4 units exist as a series of beads that each adopt a hybrid structure is unknown. Experiments with synthetic oligonucleotides and computational simulation studies have provided structural insights into G4 clusters (Petraccone 2013). As shown in Fig. 14, the interaction between G4 units is weak (beads-on-a-string model) in a hybrid-1-hybrid-2 model (a) but strong in a hybrid-1-parallel model (b) and a parallel-parallel model (c) (Frasson et al. 2022). In the parallel-parallel model, G-tetrad-mediated interactions appear to occur continuously. Because the cell interior is considered to be a molecularly crowded environment, parallel-type G4 is considered to be stabilized under these conditions (Xu et al. 2011). Therefore, a series of parallel-type G4 units may be stabilized by G-tetrad-mediated interactions. Although previous studies have focused on the interactions of G4 ligands with a single G4, these considerations demonstrate the importance of analyzing the binding of G4 ligands to a series of G4 units. Studies on the interaction of G4 clusters with G4 ligands have been increasing and have been reviewed (Zhao et al. 2020; Frasson et al. 2022). Currently, three

Fig. 14 Structures of two consecutive G4 units that are part of a long telomeric sequence, consisting of (a) Hybrid-1 & Hybrid-2, (b) Hybrid-1 & Parallel, or (c) Parallel & Parallel


Fig. 15 (a) Chemical structures of G4 ligands targeting G4 clusters: berberine dimer and 19, 20, and 21. (b) Schematic of the three binding modes (i)–(iii) of these ligands with a G4 cluster having a parallel-parallel structure

binding modes are known (summarized in Fig. 15b). In the first mode (i), the two G4 units do not interact with each other, and the G4 dimer is bridged by a G4 ligand dimer. The linker length is known to be important. In the second mode (ii), the G4 ligand enters the cleft between two interacting G4 units and stabilizes them. Some molecules are known to bind via stacking and are sandwiched between the upper and lower G4 planes, while others form bridging interactions with the two G4 units. In the third mode (iii), a dimer-type G4 ligand does not bind to the two G4 units separately. Instead, one ligand unit is stacked between the two G4 planes, and binding is reinforced by additional interactions, such as G4 groove binding. Several G4 ligands or G4 ligand dimers have been reported to exhibit this type of binding mode. Examples with NDI as the backbone are shown in Fig. 15a. NDI 18, which is a monomer of tri-substituted NDI, is expected to interact selectively with the G-quartet plane of G4 via stacking. The binding ability of NDI 18 to the G4 dimer is stronger than that to the G4 monomer. Thus, NDI 18 may bridge the two G4 units through interactions between its linker amino groups and the phosphate groups of the two G4 units (Pirota et al. 2021). The binding analysis of NDI dimer 19 with the G4 dimer suggests that one NDI unit enters between the two G4 units, while the other NDI unit binds to the groove, that is, via mode (iii) rather than mode (i) (Doria et al. 2019). Several examples of G4 ligand dimers with mode (i) binding are known (Zhao et al. 2020). Berberine dimers (Fig. 15) bind via mode (i). This mode of binding increased


the binding ability of the dimer by 508-fold when compared with that for the G4 monomer. In non-natural systems in which the TAA spacer is elongated, the binding ability for the telomeric G4 dimer decreases, indicating that the berberine dimer binds cooperatively to the G4 dimer and stabilizes it. Binding in mode (i) might not be expected for NDI dimers because NDI exhibits a strong self-stacking ability. However, the binding of the NDI dimer 20, which was developed by Takenaka’s group, to the G4 dimer appeared to be in mode (i) (Takeuchi et al. 2019). NDI dimer 20 binds cooperatively to the G4 dimer and significantly raises the Tm of the complex, although intramolecular stacking occurs in aqueous solution. Additionally, the telomerase inhibitory potential of 20 was stronger than that of the cNDI monomer (IC50 = 0.01 μM). Several studies have identified 700,000 potential G4-forming sequences in the human genome. Yoshida et al. (Yoshida et al. 2018) revealed that 9651 of these 700,000 sequences exist as G4 clusters. Of these, 3.76 are gene regulatory regions. Among these, 95 are original genes. The study also reported several G4 clusters in the genome in addition to the telomere regions. A single G4 ligand is not sufficient to regulate these clusters. This suggests the importance of developing G4 cluster regulators.

Conclusions

Cationic derivatives possessing an NDI core, which bind to double-stranded DNA via threading intercalation, have smaller dissociation rate constants than classical DNA intercalators. Such NDI derivatives have an electron-deficient aromatic ring and bind strongly to G4 DNA through stacking interactions with the G-quartet plane, which is formed by the electron-rich aromatic rings of guanine. Previous studies have suggested that the G4 structure is involved in gene regulation. Although the mechanisms of this regulatory system have not been elucidated, ligands that stabilize the G4 structure are suggested to be potential anticancer agents. This review showed that cNDI functions as a G4-selective ligand and that it may be an effective anticancer agent. In cNDI, a cyclic linker sterically protects one side of the NDI ring, which inhibits binding to double-stranded DNA and makes the molecule a G4-specific ligand. Additionally, cNDI with a cyclohexyl group suppresses cancer cell proliferation by inhibiting telomerase activity through the stabilization of the G4 structure. G4 is polymorphic and can adopt several conformations. The structure of the cNDI ligand may enable the realization of specific conformations. The structural selectivity of cNDI is important for mediating its therapeutic effects. The current coronavirus disease pandemic is a major global health problem. Several G4-forming sequences in the RNA genomes of viruses, such as coronaviruses, are potentially involved in gene regulation (Zhao et al. 2020). Therefore, the study of G4 RNA ligands will be useful in the future.


References

Andreeva V, Tikhomirov AS, Shchekotikhin AE (2021) Ligands of G-quadruplex nucleic acids. Russ Chem Rev 90:1–38
Ambrus A, Chen D, Dai J, Bialis T, Jones RA, Yang D (2006) Human telomeric sequence forms a hybrid-type intramolecular G-quadruplex structure with mixed parallel/antiparallel strands in potassium solution. Nucleic Acids Res 34:2723–2735
Artandi SE, DePinho RA (2010) Telomeres and telomerase in cancer. Carcinogenesis 31:9–18
Barros TC, Brochsztain S, Toscano VG, Filho PB, Politi MJ (1997) Photophysical characterization of a 1,4,5,8-naphthalenediimide derivative. J Photochem Photobiol A Chem 111:97–104
Berman HM, Young PR (1981) The interaction of intercalating drugs with nucleic acids. Ann Rev Biophys Bioeng 10:87–114
Biffi G, Tannahill D, McCafferty J, Balasubramanian S (2013) Quantitative visualization of DNA G-quadruplex structures in human cells. Nat Chem 5:182–186
Bryan TM (2020) G-Quadruplexes at telomeres: friend or foe? Molecules 25:3686
Burge S, Parkinson GN, Hazel P, Todd AK, Neidle S (2006) Quadruplex DNA: sequence topology and structure. Nucleic Acids Res 34:5402–5415
Chu Y, Hoffman DW, Iverson BL (2009) A pseudocatenane structure formed between DNA and a cyclic bisintercalator. J Am Chem Soc 131:3499–3508
Collie GW, Promontorio R, Hampel SM, Micco M, Neidle S, Parkinson GN (2012) Structural basis for telomeric G-quadruplex targeting by naphthalene diimide ligands. J Am Chem Soc 134:2723–2731
Czerwinska I, Sato S, Juskowiak B, Takenaka S (2014) Interactions of cyclic and non-cyclic naphthalene diimide derivatives with different nucleic acids. Bioorg Med Chem 22:2593–2601
Dai J, Punchihewa C, Ambrus A, Chen D, Jones RA, Yang D (2007) Structure of the intramolecular human telomeric G-quadruplex in potassium solution: a novel adenine triple formation. Nucleic Acids Res 35:2440–2450
David A, Bleimling N, Beuck C, Lehn JM, Weinhold E, Teulade-Fichou MP (2003) DNA mismatch-specific base flipping by a bisacridine macrocycle. Chembiochem 4:1326–1331
Doria F, Salvati E, Pompili L, Pirota V, D’Angelo C, Manoli F, Nadai M, Richter SN, Biroccio A, Manet I, Freccero M (2019) Dyads of G-Quadruplex ligands triggering DNA damage response and tumour cell growth inhibition at subnanomolar concentration. Chem Eur J 25:11085–11097
Esaki Y, Islam MM, Fujii S, Sato S, Takenaka S (2014) Design of tetraplex specific ligands: cyclic naphthalene diimide. Chem Commun (Camb) 50:5967–5969
Ehsanian R, Van Waes C, Feller SM (2011) Beyond DNA binding – a review of the potential mechanisms mediating quinacrine’s therapeutic activities in parasitic infections, inflammation, and cancers. Cell Commun Signal 9:13
Fedoroff OY, Salazar M, Han H, Chemeris VV, Kerwin SM, Hurley LH (1998) NMR-based model of a telomerase-inhibiting compound bound to G-quadruplex DNA. Biochemistry 37:12367–12374
Frasson I, Pirota V, Richter SN, Doria F (2022) Multimeric G-quadruplexes: a review on their biological roles and targeting. Int J Biol Macromol 204:89–102
Fukuda H, Sato S, Zou T, Higashi S, Takahashi O, Habu M, Sasaguri M, Tominaga K, Takenaka S, Takeuchi H (2021) Substituent effects of cyclic naphthalene diimide on G-quadruplex binding and the inhibition of cancer cell growth. Bioorg Med Chem Lett 50:128323
Gellert M, Lipsett MN, Davies DR (1962) Helix formation by guanylic acid. Proc Natl Acad Sci U S A 48:2013–2018
Georgiades SN, Karim NHA, Suntharalingam K, Vilar R (2010) Interaction of metal complexes with G-quadruplex DNA. Angew Chem Int Ed Engl 49:4020–4034
Guschlbauer W, Chantot JF, Thiele D (1990) Four-stranded nucleic acid structures 25 years later: from guanosine gels to telomere DNA. J Biomol Struct Dyn 8:491–511


Haeusler AR, Donnelly CJ, Periz G, Simko EAJ, Shaw PG, Kim MS, Maragakis NJ, Troncoso JC, Pandey A, Sattler R, Rothstein JD, Wang J (2014) C9orf72 nucleotide repeat structures initiate molecular cascades of disease. Nature 507:195–200
Hampel SM, Sidibe A, Gunaratnam M, Riou JF, Neidle S (2010) Tetrasubstituted naphthalene diimide ligands with selectivity for telomeric G-quadruplexes and cancer cells. Bioorg Med Chem Lett 20:6459–6463
Hänsel R, Löhr F, Foldynová-Trantírková S, Bamberg E, Trantírek L, Dötsch V (2011) The parallel G-quadruplex structure of vertebrate telomeric repeat sequences is not the preferred folding topology under physiological conditions. Nucleic Acids Res 39:5768–5775
Hao X, Wang C, Wang Y, Li C, Hou J, Zhang F, Kang C, Gao L (2021) Topological conversion of human telomeric G-quadruplexes from hybrid to parallel form induced by naphthalene diimide ligands. Int J Biol Macromol 167:1048–1058
Herbert BS, Hochreiter AE, Wright WE, Shay JW (2006) Nonradioactive detection of telomerase activity using the telomeric repeat amplification protocol. Nat Protoc 1:1583–1590
Islam MM, Fujii S, Sato S, Okauchi T, Takenaka S (2015a) A selective G-quadruplex DNA-stabilizing ligand based on a cyclic naphthalene diimide derivative. Molecules 20:10963–10979
Islam MM, Fujii S, Sato S, Okauchi T, Takenaka S (2015b) Thermodynamics and kinetic studies in the binding interaction of cyclic naphthalene diimide derivatives with double stranded DNAs. Bioorg Med Chem 23:4769–4776
Islam MM, Sato S, Shinozaki S, Takenaka S (2017) Cyclic ferrocenylnaphthalene diimide derivative as a new class of G-quadruplex DNA binding ligand. Bioorg Med Chem Lett 27:329–335
Jamroskovic J, Obi I, Movahedi A, Chand K, Chorell E, Sabouri N (2019) Identification of putative G-quadruplex DNA structures in S. pombe genome by quantitative PCR stop assay. DNA Repair 82:102678
Jazwinski J, Blacker AJ, Lehn JM, Cesario M, Guilhem J, Pascard C (1987) Cyclo-bisintercalands: synthesis and structure of an intercalative inclusion complex, and anion binding properties. Tetrahedron Lett 28:6057–6060
Kaiser M, Cian AD, Sainlos M, Renner C, Mergny J-L, Teulade-Fichou M-P (2006) Neomycin-capped aromatic platforms: Quadruplex DNA recognition and telomerase inhibition. Org Biomol Chem 4:1049–1057
Kaneyoshi S, Eguchi N, Fujimoto K, Fujii S, Sato S, Takenaka S (2022) Cyclic ferrocenylnaphthalene diimides as a probe for electrochemical telomerase assay. J Inorg Biochem 230:111746
Kaneyoshi S, Zou T, Ozaki S, Takeuchi R, Udou A, Nakahara T, Fujimoto K, Fujii S, Sato S, Takenaka S (2020) Cyclic naphthalene diimide with a ferrocene moiety as a redox-active tetraplex-DNA ligand. Chem Eur J 26:139–142
Kobaisi MA, Bhosale SV, Latham K, Raynor AM, Bhosale SV (2016) Functional naphthalene diimides: synthesis, properties, and applications. Chem Rev 116:11685–11796
Lagnado J (2013) The story of quadruplex DNA-it started with a bang! Biochemist 35:44–46
Ma Y, Iida K, Nagasawa K (2020) Topologies of G-quadruplex: biological functions and regulation by ligands. Biochem Biophys Res Commun 531:3–17
Marchetti C, Zyner KG, Ohnmacht SA, Robson M, Haider SM, Morton JP, Marsico G, Vo T, Laughlin-Toth S, Ahmed AA, Vita GD, Pazitna I, Gunaratnam M, Besser RJ, Andrade ACG, Diocou S, Pike JA, Tannahill D, Pedley RB, Evans TRJ, Wilson WD, Balasubramanian S, Neidle S (2018) Targeting multiple effector pathways in pancreatic ductal adenocarcinoma with a G-quadruplex-binding small molecule. J Med Chem 61:2500–2517
Marchetti C, Minarini A, Tumiatti V, Moraca F, Parrotta L, Alcaro S, Rigo R, Sissi C, Gunaratnam M, Ohnmacht SA, Neidle S, Milelli A (2015) Macrocyclic naphthalene diimides as G-quadruplex binders. Bioorg Med Chem 23:3819–3830
Martinho N, Santos TCB, Florindo HF, Silva LC (2019) Cisplatin-membrane interactions and their influence on platinum complexes activity and toxicity. Front Physiol 9:1898


McKnight RE (2013) Insights into the relative DNA binding affinity and preferred binding mode of homologous compounds using isothermal titration calorimetry (ITC). In: Elkordy AA (ed) Applications of calorimetry in a wide context – differential scanning calorimetry, isothermal titration calorimetry and microcalorimetry. IntechOpen, UK, pp 129–152
Monchaud D, Granzhan A, Saettel N, Guédin A, Mergny JL, Teulade-Fichou MP (2010) “One ring to bind them all”-part I: the efficiency of the macrocyclic scaffold for G-quadruplex DNA recognition. J Nucleic Acids 2010:525862
Monchaud D, Teulade-Fichou MP (2008) A hitchhiker’s guide to G-quadruplex ligands. Org Biomol Chem 6:627–636
Monsen RC, Chakravarthy S, Dean WL, Chaires JB, Trent JO (2021) The solution structures of higher-order human telomere G-quadruplex multimers. Nucleic Acids Res 49:1749–1768
Parkinson GN, Cuenca F, Neidle S (2008) Topology conservation and loop flexibility in quadruplex–drug recognition: crystal structures of inter- and intramolecular telomeric DNA quadruplex–drug complexes. J Mol Biol 381:1145–1156
Petraccone L, Pagano B, Giancola C (2012) Studying the effect of crowding and dehydration on DNA G-quadruplexes. Methods 57:76–83
Petraccone L (2013) High-order quadruplex structures. Top Curr Chem 330:23–46
Phan AT, Kuryavyi V, Gaw HY, Patel DJ (2005) Small-molecule interaction with a five-guanine-tract G-quadruplex structure from the human MYC promoter. Nat Chem Biol 1:167–173
Phan AT (2010) Human telomeric G-quadruplex: structures of DNA and RNA sequences. FEBS J 277:1107–1117
Pirota V, Nadai M, Doria F, Richter SN (2019) Naphthalene diimides as multimodal G-quadruplex-selective ligands. Molecules 24:426
Pirota V, Platella C, Musumeci D, Benassi A, Amato J, Pagano B, Colombo G, Freccero M, Doria F, Montesarchio D (2021) On the binding of naphthalene diimides to a human telomeric G-quadruplex multimer model. Int J Biol Macromol 166:1320–1334
Platella C, Napolitano E, Riccardi C, Musumeci D, Montesarchio D (2021) Disentangling the structure–activity relationships of naphthalene diimides as anticancer G-quadruplex-targeting drugs. J Med Chem 64:3578–3603
Reed J, Gunaratnam M, Beltran M, Reszka AP, Vilar R, Neidle S (2008) TRAP–LIG, a modified telomere repeat amplification protocol assay to quantitate telomerase inhibition by small molecules. Anal Biochem 380:99–105
Ren J, Chaires JB (1999) Sequence and structural selectivity of nucleic acid binding ligands. Biochemistry 38:16067–16075
Sanders JL, Newman AB (2013) Telomere length in epidemiology: a biomarker of aging, age-related disease, both, or neither? Epidemiol Rev 35:112–131
Sato S, Kondo H, Nojima T, Takenaka S (2005) Electrochemical telomerase assay with ferrocenylnaphthalene diimide as a tetraplex DNA-specific binder. Anal Chem 77:7304–7309
Sato S, Takenaka S (2017) Ferrocenyl naphthalene diimides as tetraplex DNA binders. J Inorg Biochem 167:21–26
Sato S, Takenaka S (2008) Linker effect of ferrocenylnaphthalene diimide ligands in the interaction with double stranded DNA. J Organomet Chem 693:1177–1185
Sur S, Tiwari V, Sinha D, Kamran MZ, Dubey KD, Kumar GS, Tandon V (2017) Naphthalenediimide-linked bisbenzimidazole derivatives as telomeric G-quadruplex-stabilizing ligands with improved anticancer activity. ACS Omega 2:966–980
Takahashi S, Kotar A, Tateishi-Karimata H, Bhowmik S, Wang ZF, Chang T-C, Sato S, Takenaka S, Plavec J, Sugimoto N (2021) Chemical modulation of DNA replication along G-quadruplex based on topology-dependent ligand binding. J Am Chem Soc 143:16458–16469
Takenaka S (1999) Threading intercalators as new DNA structure probe. Bull Chem Soc Jpn 72:327–337
Takeuchi R, Zou T, Wakahara D, Nakano Y, Sato S, Takenaka S (2019) Cyclic naphthalene diimide dimer with a strengthened ability to stabilize dimeric G-quadruplex. Chem Eur J 25:8691–8695


Tanious FA, Yen SF, Wilson WD (1991) Kinetic and equilibrium analysis of threading intercalation mode: DNA sequence and ion effect. Biochemistry 30:1813–1819
Tu J, Duan M, Liu W, Lu N, Zhou Y, Sun X, Lu Z (2021) Direct genome-wide identification of G-quadruplex structures by whole-genome resequencing. Nat Commun 12:6014
Vasimalla S, Sato S, Takenaka F, Kurose Y, Takenaka S (2017) Cyclic perylene diimide: selective ligand for tetraplex DNA binding over double stranded DNA. Bioorg Med Chem 25:6404–6411
Vigneron JP (1999) Supramolecular bioorganic chemistry: nucleic acids recognition and synthetic vectors for gene transfer. Molecules 4:180–203
Wang W, Hu S, Gu Y, Yan Y, Stovall DB, Li D, Sui G (2020) Human MYC G-quadruplex: from discovery to a cancer therapeutic target. Biochim Biophys Acta Rev Cancer 1874:188410
Williams LD, Egli M, Qi G, Bash P, van der Marel GA, van Boom JH, Rich A, Frederick CA (1990) Structure of nogalamycin bound to a DNA hexamer. Proc Natl Acad Sci U S A 87:2225–2229
Wright WE, Tesmer VM, Huffman KE, Levene SD, Shay JW (1997) Normal human chromosomes have long G-rich telomeric overhangs at one end. Genes Dev 11:2801–2809
Xu L, Feng S, Zhou X (2011) Human telomeric G-quadruplexes undergo dynamic conversion in a molecular crowding environment. Chem Commun (Camb) 47:3517–3519
Yaku H, Murashima T, Tateishi-Karimata H, Nakano S, Miyoshi D, Sugimoto N (2013) Study on effects of molecular crowding on G-quadruplex-ligand binding and ligand-mediated telomerase inhibition. Methods 64:19–27
Yen SF, Gabbay EJ, Wilson WD (1982) Interaction of aromatic imides with deoxyribonucleic acid. Spectrophotometric and viscometric studies. Biochemistry 21:2070–2076
Yoshida W, Saikyo H, Nakabayashi K, Yoshioka H, Bay DH, Iida K, Kawai T, Hata K, Ikebukuro K, Nagasawa K, Karube I (2018) Identification of G-quadruplex clusters by high-throughput sequencing of whole-genome amplified products with a G-quadruplex ligand. Sci Rep 8:3116
Zhang J-M, Zou L (2020) Alternative lengthening of telomeres: from molecular mechanisms to therapeutic outlooks. Cell Biosci 10:30
Zhao C, Qin G, Niu J, Wang Z, Wang C, Ren J, Qu X (2020) Targeting RNA G-quadruplex in SARS-CoV-2: a promising therapeutic target for COVID-19? Angew Chem Int Ed Engl 60:432–438
Zhao J, Zhai Q (2020) Recent advances in the development of ligands specifically targeting telomeric multimeric G-quadruplexes. Bioorg Chem 103:104229
Zou T, Sato S, Yasukawa R, Takeuchi R, Ozaki S, Fujii S, Takenaka S (2020) The interaction of cyclic naphthalene diimide with G-quadruplex under molecular crowding condition. Molecules 25:668
Zuffo M, Guédin A, Leriche E-D, Doria F, Pirota V, Gabelica V, Mergny J-L, Freccero M (2018) More is not always better: finding the right trade-off between affinity and selectivity of a G-quadruplex ligand. Nucleic Acids Res 46:e115

Imaging Study of Small Molecules to G-Quadruplexes in Cells

28

Ting-Yuan Tseng and Ta-Chau Chang

Contents
Introduction
Development of G4 Fluorescent Probes to Study G4s In Vivo
BMVC
o-BMVC
BMVC-nC-P and BMVC-8C3O-P
o-BMVC-nC-P
o-2B-P
Fluorescence Images for Identifying the Existence of Endogenous G4s In Vivo
Visualization of Telomeric G4s in Metaphase Chromosomes by BMVC
Detection of G4s in Live Cancer Cells by o-BMVC
Detection of G4 Foci by BG4 Antibody in Fixed Cells
o-BMVC Foci are G4 Foci in Fixed Cells
Telomeric G4s Detected in Fixed Cells by Antisense DNA
Detection of Mitochondrial G4s in Live Cancer Cells by o-BMVC-12C-P
Binding of Small Molecules to G4s in Fixed Cells
Imaging Study of G4 Ligands Binding to Exogenous G4s in Live Cells
Cellular Response to Exogenous G4s in Live Cells
G4 Dynamics of Exogenous G-Rich Oligonucleotides in Live Cells
o-BMVC Foci as a Biosensor for Clinical Cancer Diagnosis
DNA Damage May Facilitate G4 Formation
o-BMVC Test for Clinical Cancer Diagnosis
Other Fluorescent Probes for the Imaging Study of G4s in Cells
Carbazole Derivatives and BMVC Analogues
NBTE for FLIM Image
DAOTA-M2 for FLIM Image
Conjugates of G4 Fluorescent Probes
Conclusion
References

T.-Y. Tseng · T.-C. Chang (*)
Institute of Atomic and Molecular Sciences, Academia Sinica, Taipei, Taiwan
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2023
N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_32


Abstract

Many guanine-rich oligonucleotides are found in the human genome, which may fold into G-quadruplex (G4) structures under physiological conditions. Accumulating evidence suggests that G4 structures are associated with genome instability, genetic diseases, and cancer progression. The development of small molecules (G4 ligands) that aim to bind G4s has received increasing attention not only for studying the biological roles of G4s but also for using G4 ligands in cancer therapy and diagnosis. However, the existence of G4 structures in vivo has been debated for a long time. Imaging studies with distinct fluorescent probes have been proposed to visualize cellular responses and monitor probe–target interactions in cells at the molecular level. As the pioneering group in developing G4 fluorescent probes, here, we focus on the origins, progress, and prospects of the systematic development of carbazole-based fluorescent probes for imaging studies that verify the existence of G4 structures and identify G4 ligands that bind to G4s in cells. Time-gated fluorescence lifetime imaging microscopy with our G4 fluorescent probes verifies that G4s can be detected in the nuclei, mitochondria, and lysosomes of cancer cells. In addition, the analyzed binary images reveal more G4 signals in cancer cells than in normal cells. We further highlight the quantitative measurement of G4 signals that can be used as markers in clinical cancer diagnosis and the study of cellular response to exogenous G4s and their G4 dynamics in live cells.

Keywords

In-cell imaging study · G-quadruplexes · Fluorescent probes · Carbazole derivatives · Fluorescence lifetime imaging microscopy · o-BMVC foci · G4 dynamics in live cells · Cancer diagnosis

Introduction

Guanine-rich oligonucleotides (GROs) can fold into four-stranded structures called G-quadruplexes (G4s) through stacking of planar G-quartets, in which four guanines are held together by Hoogsteen hydrogen bonding, under physiological conditions. The first report of G-quartet formation, based on a diffraction pattern of the aggregation of four guanines, was documented by Gellert et al. (Gellert et al. 1962). Two decades later, various G-quadruplex (G4) structures formed from telomeric G-rich sequences of ciliated protozoa and an immunoglobulin switch region were reported (Lipps et al. 1982; Sen and Gilbert 1988), indicating the existence of G4 structures in vitro. Furthermore, Moyzis et al. found a highly conserved repetitive DNA sequence, (TTAGGG)n, in the telomeres of human chromosomes (Moyzis et al. 1988). Wang and Patel determined the nuclear magnetic resonance (NMR) structure of a human telomere [AG3(T2AG3)3] (HT22) in a Na+ solution as an antiparallel G4 form (Wang and Patel 1993). In addition, Parkinson et al. obtained the crystal structure of HT22

28

Imaging Study of Small Molecules to G-Quadruplexes in Cells

Fig. 1 G-quartet and different types of G4 structures

through X-ray diffraction in the presence of K+ as a parallel G4 form (Parkinson et al. 2002). Moreover, slight variations in human telomeric sequences can form different G4 structures (Heddi and Phan 2011). Figure 1 shows some of the G4 structures found in vitro. Although these G4 structures can be easily formed in vitro, it was unclear whether such G4 structures can also be detected in vivo. Schaffitzel et al. were the first to detect the telomeric G4 structures in vivo by using a specific antibody in ciliate Stylonychia (Schaffitzel et al. 2001). However, the existence of G4 structures in cells has been debated for a long time (Bryan 2020; Lipps and Rhodes 2009). The G-rich sequences in telomeres play an essential role in motivating the progress of the G4 study since the G4 formation in the telomeres could have biological relevance to cancer. This is because telomeric length gradually shortens with each cell division until replicative senescence. Notably, cancer cells require telomere maintenance for unlimited proliferation. Telomerase is a ribonucleoprotein that is involved in telomere replication. The telomerase activity is detected in most cancer cells but is hardly detected in most somatic cells. Since the G4s formed in the telomeres are not a substrate of telomerase, small molecules that can induce G4 formation and/or stabilize the G4 structure have the potential to inhibit the telomerase activity (Mergny and Helene 1998; Sun et al. 1997). Thus, telomeric G4s could be a therapeutic target for developing G4 ligands as potential anticancer agents (Hurley 2002; Neidle and Parkinson 2002). The first G4 ligand that could stabilize telomeric G4s and inhibit the telomerase activity was 2,6-diamidoanthraquinone (Sun et al. 1997). Since then, hundreds of G4 ligands have been evaluated in vitro, but only a few, such as TMPyP4, BRACO-19, telomestatin, and RHPS4, have been examined in tumor xenograft models. 
However, many G4 ligands preferentially bind to G4s over duplexes in vitro. Given that the large amounts of duplex and random coil DNA overwhelm the possible formation of very small amounts of G4s in cells, it

T.-Y. Tseng and T.-C. Chang

is challenging to provide evidence that these G4 ligands indeed bind to the telomeric G4s in cells. In addition to the telomeric G4s, Siddiqui-Jain et al. reported the first G4s in the promoter region of the c-MYC oncogene and demonstrated that the G4 ligand TMPyP4 could stabilize the G4s to repress c-MYC transcription (Siddiqui-Jain et al. 2002). Since then, many GROs have been identified in other promoters, such as VEGF, c-KIT, BCL2, and KRAS, which could form G4s to regulate their gene expression. Besides the G4 DNA, Kim et al. reported a very stable tetrameric G4 formation derived from the 5S RNA of Escherichia coli (Kim et al. 1991). Kumari et al. discovered RNA G4s in the 5′ untranslated region (UTR) of the NRAS oncogene, which can inhibit gene expression at the translational level (Kumari et al. 2007). Azzalin et al. demonstrated that human telomeres are transcribed into telomeric repeat-containing RNA (TERRA) (Azzalin et al. 2007). Interestingly, the TERRA sequence folds into a parallel G4 structure, which differs from the structural diversity of the telomeric DNA G4s (Phan 2010). Recently, a high-throughput sequencing-based method has revealed over 700,000 putative G4-forming sequences (PQSs) in the human genome (Chambers et al. 2015). Among them, over 10,000 G4 structures have been identified in the human chromatin of cancer cells by G4 ChIP-seq using an antibody specific to the G4 structures (BG4). Notably, the distribution of the G4s is not random, implying a functional role of G4s in the genome. A high density of G4s was found in telomeres, promoters, and the 5′-UTR of the transcribed genes, RNA splicing, and particularly in cancer-related genes (Hansel-Hertsch et al. 2016, 2017). Furthermore, DNA helicase proteins can unfold the G4s for DNA replication (Brosh 2013). Since there are abundant PQSs in the human genome, unwinding G4s by RecQ helicases may be important for chromosome stability. 
The existence of such proteins that can bind and unfold the G4 structures supports the existence of G4s in cells (Varshney et al. 2020). Considering the significance of G4s in cancer, it would be valuable to visualize and identify such G4 structures in vivo and to monitor the G4 dynamics and kinetics in telomere, gene regulation, and functional genomics in cells. The fascinating development of quantum mechanics, particularly the determination of Planck's constant from two independent experiments (blackbody radiation and the photoelectric effect), inspired us to propose an imaging approach, in addition to the biological study, for verifying the existence of G4 structures in vivo. Thus, a fluorescent probe is a key element for the imaging study of G4s. Guo et al. reported that ethidium bromide dye could be utilized to distinguish the different G4 structures formed between dT4G4 and d(T4G4)4 (Guo et al. 1992). Chen et al. found an additional absorbance peak of the 3,3′-diethyloxadicarbocyanine (DODC) dye at 534 nm upon interaction with a dimeric hairpin G4 structure (Chen et al. 1996). We further used vibronic spectra to examine the binding modes of the DODC dye with the dimeric G4 structures (Cheng et al. 1998). Arthanari et al. detected the increase in the N-methyl mesoporphyrin fluorescence upon interaction with G4s and proposed that dye molecules could be used in fluorescence microscopy to detect G4s in chromosomes (Arthanari et al. 1998). To our knowledge, there were no reports on
using dye molecules to verify the existence of human telomeric G4 structures in cells at that time. This led us to develop new G4 fluorescent ligands with dual characteristics of the G4 stabilizer and fluorescent probe. Fluorescence imaging is a promising tool for visualizing cellular responses and monitoring probe/target interaction in cells at the molecular level. Finding a novel G4 fluorescent probe that has distinct fluorescent properties and a high binding preference for telomeric G4s is a prerequisite for the imaging study of G4s in vivo. Accordingly, we designed and synthesized a number of carbazole-based molecules to study telomeric G4s (Chang et al. 2003, 2005). Among them, a 3,6-bis(1-methyl-4-vinylpyridinium) carbazole diiodide (BMVC) molecule is not only a novel G4 fluorescent probe but also a potential anticancer agent, while a 3,6-bis(1-methyl-2-vinylpyridinium) carbazole diiodide (o-BMVC) molecule is an even better G4 fluorescent probe than BMVC but not an anticancer agent. BMVC was the first fluorescent probe applied to verify the existence of telomeric G4 structures in metaphase chromosomes (Chang et al. 2004). Fluorescence lifetime imaging microscopy of BMVC and o-BMVC was performed to visualize the telomeric G4s in the metaphase chromosomes (Chang et al. 2006) and live cells (Tseng et al. 2013a), respectively. Two specific monoclonal antibodies, BG4 (Biffi et al. 2013) and 1H6 (Henderson et al. 2014), were employed to provide convincing evidence using immunofluorescence microscopy to support the presence of G4 foci in the metaphase chromosomes and fixed cells. Given that the antibody is permeable to fixed cells and that a fluorescent probe is simpler and more cost-effective, the detection of G4 foci in cells using specific antibodies has promoted the development of fluorescent probes for the imaging study of G4s. During the last decade, a number of excellent reviews have discussed the use of G4 ligands for therapeutic studies (Kosiol et al. 
2021; Neidle 2017), G4 fluorescent probes for imaging study (Chilka et al. 2019; Ma et al. 2015), and specific G4 ligands (Cadoni et al. 2021; Muller et al. 2021). Here, we have only considered two studies for each category because of the restriction of references. The most popular G4 ligands and probes have been thoroughly described and will not be discussed herein. As the pioneering group in developing G4 fluorescent probes for the cell-based imaging study, we will describe the challenges involved in the development of BMVC and its derivatives. In this review, we first describe systematic development of BMVC-based G4-binding ligands and then focus on the potential use of fluorescent probes in the imaging study of G4s in cells, which includes verifying the existence of telomeric G4s in the metaphase chromosomes (Chang et al. 2004), visualizing G4s in live cells (Tseng et al. 2013a), particularly identifying the existence of telomeric G4s (Tseng et al. 2020), detecting mitochondrial G4s in live cells (Huang et al. 2015), screening cancer cells for cancer diagnosis (Tseng et al. 2018a), examining the binding of the G4 ligands to the G4s in cells (Tseng et al. 2018b), illustrating cellular responses to different exogenous G4s (Tseng et al. 2013b), monitoring the G4 dynamics of exogenous G-rich sequences in live cells (Tseng et al. 2022), and providing an objective method for the clinical diagnosis of fine-needle aspiration (FNA) of thyroid nodules (Tseng et al. 2021). We further

briefly discuss other carbazole derivatives and BMVC analogues for developing G4 fluorescent ligands.

Development of G4 Fluorescent Probes to Study G4s In Vivo

Accumulating evidence suggests that G4s play an important role in genome instability, genetic diseases, and cancer progression (Kosiol et al. 2021; Neidle 2017). Therefore, developing specific binding molecules has attracted significant attention due to the biological significance of G4s for the basic study and design of anticancer agents for medical uses. We began designing and synthesizing G4 fluorescent ligands in 2000. Given that the major binding sites of G4s are planar G-quartets, grooves, and loops, most G4 ligands contain an aromatic chromophore that can stack at the end of G-quartets by π-π interaction, extended side chains that can extend into the grooves or loops by possible hydrogen bonding, and a cationic charge at the end of the side chain that can interact with the negative charges of phosphate backbones by electrostatic interaction. Accordingly, we have designed and synthesized a number of carbazole derivatives with a pyridinium cation at the end of the vinyl side chain to study G4s in cells (Chang et al. 2005, 2013). Considering the chromatin integrity, the most likely sequence to form G4s is the single-stranded (ss) G-rich sequence at the end of telomeres in cells. Particularly, it is relatively easy to identify through the imaging study of metaphase chromosomes. Thus, telomeric G4 structures have been utilized to screen the G4 ligands of our synthesized molecules by examining the change in the melting temperature of the telomeric G4s for the use of G4 stabilizers and the distinct difference in wavelength and intensity from emission spectra for the use of G4 fluorescent probes in K+ solution. In addition, cell viability assays have been used to examine whether the G4 stabilizers can inhibit the growth of cancer cells, and wide-field fluorescence images have been applied to visualize the cellular localization of the G4 fluorescent probes in live cells. 
Figure 2 shows the chemical structures of our designed and synthesized BMVC and the BMVC derivatives described in this review. These molecules have been characterized by the duality of the G4 stabilizer and fluorescent probe. In addition, the wide-field fluorescence images indicated that BMVC, BMVC-4C-P, BMVC-5C-P, BMVC-8C-P, and BMVC-8C3O-P are mainly detected in the nuclei of the cancer cells, while others are rarely detected in the nuclei of the cancer cells. Moreover, the colony formation showed that most of them can inhibit the growth of cancer cells; however, o-BMVC, o-BMVC-4C-P, and o-BMVC-6C-P show no such effect. Further study of a tumor xenograft revealed that BMVC and BMVC-12C-P could inhibit tumor growth in nude mice. Thus, BMVC and BMVC-12C-P are also potential anticancer agents. Furthermore, a hybrid BMVC-porphyrin photosensitizer, o-2B-P, is discussed for developing hybrid compounds for imaging studies (Kang et al. 2008). Understanding principles governing selective and sensitive targets of cancer cells is critical for developing chemicals for cancer diagnoses and therapeutics.

Fig. 2 Chemical structures of BMVC and its derivatives reviewed in this article

BMVC

BMVC is the first fluorescent probe designed to study G4s in vivo (Chang et al. 2004). Since BMVC is also a potential anticancer agent, it has been the most extensively studied. Here, we provide a brief overview of it. BMVC shows very weak fluorescence in an aqueous solution due to the free torsional motion of the vinyl bridge. After BMVC binds to the duplex and G4 DNA, its fluorescence intensity significantly increases by ~100-fold. This is because the torsional motion of the vinyl bridge is impeded by groove binding with duplex DNA (Dumat et al. 2011) and π-π stacking with the external G-quartets of G4s (Liu et al. 2019; Yang et al. 2007). Previously, molecular modeling showed that the external stacking of BMVC to two ends of the G-quartet of the telomeric G4s is the most stable binding mode for a 2:1 binding model (Yang et al. 2007). Recently, an NMR study of BMVC binding to MYC G4s with 1:1 and 2:1 ratios showed that BMVC first binds to the 5′ end of the MYC G4s and then to the 3′ end at higher ligand ratios (Liu et al. 2019). The important finding is that BMVC undergoes a conformational adjustment to a contracted form to perfectly match the external G-quartet for optimizing stacking interaction. It appears that the vinyl side chains play a critical role in the BMVC fluorescence, which is sensitive to the local environment. Bright fluorescence was observed in the nuclei of live cancer cells but not in the nuclei of live normal cells (Fig. 3a). Owing to the significant fluorescence enhancement of BMVC in cancer cells compared to normal cells, BMVC can be applied as a vital fluorescent marker for use in clinical cancer diagnosis of live cells (Kang et al. 2007). Confocal microscopy showed that BMVC could be detected in both nuclei and mitochondria in live cancer cells, while it is retained in the lysosomes and excluded from the nuclei and mitochondria of live normal cells (Fig. 3b). This is

Fig. 3 Dual characteristics of BMVC. (a) Wide-field fluorescence images of CL1-0 cancer cells (left) and MRC-5 normal cells (right). (b) Confocal fluorescence images of CL1-0 (left) and MRC-5 cells (right) show that BMVC (green) co-stains with MitoTracker red (red) and Hoechst 33342 (blue) in CL1-0 cells and co-stains with LysoTracker red (red) and Hoechst 33342 (blue) in MRC-5 cells. (c) BMVC and Texas Red-labeled dextran were microinjected into the cytoplasm of BJ primary human normal foreskin fibroblasts and incubated for 5 min. The incubation medium was changed, and live cells were visualized using confocal microscopy. BMVC (green) was concentrated in the nucleus, while Texas Red-labeled dextran (red) was restricted to the cytoplasm. (d) CD signal at 295 nm for the measurement of melting temperature of HT24 and its complexes of carbazole and BMVC as a function of temperature. (e) Effects of BMVC on telomere length. H1299 cells were treated with BMVC, and total genomic DNA was prepared and analyzed for telomere length by Southern blotting using telomeric DNA as the probe. (f) BMVC suppresses tumor formation. Nude mice (n = 6) were each injected with 2 × 10⁶ H1299 cells. BMVC at 1 mg/kg was injected every 3 d. The tumor size was then measured. Notably, the body weight was also measured and showed no appreciable difference

because lysosomal retention of BMVC prevents BMVC from gaining access to the nuclei in normal cells, as revealed by the microinjection of BMVC into the cytoplasm of normal cells to bypass the transport of BMVC to lysosomes via endocytosis. The quick detection of BMVC in the nuclei of the injected cells indicated that once outside the lysosome, BMVC can freely access the nuclei of normal cells (Fig. 3c) (Kang et al. 2013). Further structure-function analysis of the BMVC derivatives suggested that the hydrogen-bonding capacity plays a role in regulating the lysosomal retention in normal cells (Kang et al. 2013). In addition, BMVC could increase the melting temperature of the telomeric G4s by ~10 °C (Fig. 3d), inhibit the telomerase activity, and induce accelerated telomere shortening that leads to accelerated senescence in H1299 cancer cells (Fig. 3e). Moreover, BMVC could reduce the tumorigenicity of the cancer cells xenografted into mice (Fig. 3f) (Huang et al. 2008). Further study showed that BMVC repressed the gene expression of WNT1 in a G4-dependent manner to inhibit WNT1-mediated migration and invasion (Wang et al. 2014). It appears that BMVC can bind to the telomeric G4s and other promoter G4s to inhibit cancer proliferation. Neidle considered that the binding of multiple G4s could be advantageous for therapeutic application (Neidle 2017). To our knowledge, BMVC is the first G4 theranostic agent designed for cancer cells.
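
The ligand-induced shift in melting temperature mentioned above can be read off a thermal melting curve such as the CD signal at 295 nm. The following is a minimal sketch of one common way to do this, not the procedure used in the cited studies: it assumes that the signal decreases monotonically on melting and takes the melting temperature as the point where the normalized signal crosses one half. The function names, the synthetic curves, and all numerical values are hypothetical.

```python
import math


def melting_temperature(temps_c, signal):
    """Estimate Tm as the temperature where the normalized signal crosses 0.5.

    Assumption: the strongest signal corresponds to the folded state and the
    weakest to the unfolded state (no baseline correction is attempted).
    """
    if len(temps_c) != len(signal) or len(temps_c) < 2:
        raise ValueError("need two equally sized sequences with at least two points")
    folded, unfolded = max(signal), min(signal)
    if folded == unfolded:
        raise ValueError("signal is flat; no transition to locate")
    # Fraction folded: 1 at the strongest signal, 0 at the weakest.
    frac = [(s - unfolded) / (folded - unfolded) for s in signal]
    # Walk adjacent points and interpolate linearly where frac crosses 0.5.
    for i in range(len(frac) - 1):
        f0, f1 = frac[i], frac[i + 1]
        if (f0 - 0.5) * (f1 - 0.5) <= 0 and f0 != f1:
            t0, t1 = temps_c[i], temps_c[i + 1]
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("no midpoint crossing found")


def two_state_curve(temps_c, tm_c, width_c=4.0):
    """Synthetic two-state melting curve, used only to exercise the function."""
    return [1.0 / (1.0 + math.exp((t - tm_c) / width_c)) for t in temps_c]


# Hypothetical data: a G4 alone and the same G4 with a stabilizing ligand.
temps = list(range(20, 100, 5))            # 20, 25, ..., 95 degrees C
g4_alone = two_state_curve(temps, 62.5)
g4_ligand = two_state_curve(temps, 72.5)

tm_alone = melting_temperature(temps, g4_alone)
tm_ligand = melting_temperature(temps, g4_ligand)
delta_tm = tm_ligand - tm_alone            # stabilization by the ligand
print(f"Tm alone {tm_alone:.1f} C, with ligand {tm_ligand:.1f} C, shift {delta_tm:.1f} C")
```

A midpoint crossing is the simplest estimator; real melting data usually call for sloping baselines or a van't Hoff fit, which this sketch deliberately leaves out.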

o-BMVC

The o-BMVC molecule, an isomer of BMVC, is an even better G4 fluorescent probe than BMVC. The binding affinity of o-BMVC to the telomeric G4 DNA is approximately two orders of magnitude higher than that to duplex DNA and higher still relative to ss DNA (Tseng et al. 2013a). This is because the para-pyridinium cation in BMVC interacts with the groove of duplex DNA more easily and directly than the ortho-pyridinium cation in o-BMVC. Although o-BMVC increased the melting temperature of the telomeric G4s in a K+ solution by ~10 °C, it showed no effect on inhibiting the growth of cancer cells under the same condition as BMVC. Confocal microscopy showed that the fluorescence of o-BMVC is rarely detected in the nuclei but mainly in the mitochondria of cancer cells. The different cellular localizations of o-BMVC and BMVC have been attributed to their different lipophilicities (Kang et al. 2013). Nevertheless, the results clearly indicate that the difference between the para-form and the ortho-form in the side chains of the BMVC isomers alone plays a critical role in their cellular localizations in live cancer cells. Such a finding is important for drug development, particularly for carbazole derivatives, which are promising candidates for anticancer chemotherapeutics (Issa et al. 2019).

BMVC-nC-P and BMVC-8C3O-P

To further examine the effect of lipophilicity in cellular localization, the substitution of an alkyl linker terminated with an N-methyl-piperidinium cation at the N-9 position of BMVC (BMVC-nC-P) was conducted, where n = 4, 5, 8, 9, 12 (Kang et al. 2013). In addition, it is important to identify any change in the G4 stability and fluorescent property of BMVC upon such a modification for its potential use in hybrid compounds. The results indicated an effect similar to that of BMVC: an increase in the melting temperature of the telomeric G4s and enhanced fluorescence intensity. Importantly, confocal microscopy clearly indicated that BMVC-9C-P and BMVC-12C-P are hardly detected in the nuclei but primarily observed in the mitochondria, whereas BMVC-4C-P, BMVC-5C-P, and BMVC-8C-P localize preferentially in the nuclei rather than in the mitochondria of live CL1-0 cells (Fig. 4a). Murphy suggested that cationic, lipophilic molecules tend to target and accumulate in the mitochondria due to the negative mitochondrial membrane potential (Murphy 2008). The lipophilicity of each BMVC derivative was calculated from the logarithm of the octanol–water partition coefficient (log P). Consistent with the prediction, the BMVC-nC-P molecules with low lipophilicity (log P < ~2.15) localize mainly in the nucleus, whereas those with higher lipophilicity (log P > ~2.0) localize primarily in the mitochondria (Fig. 4b). In addition, the BMVC-8C3O-P molecule has lower lipophilicity and localizes mainly in the nucleus of cancer cells. Of importance is that BMVC-8C3O-P can induce a conformational change of telomeric G4s from nonparallel to parallel topologies in vitro (Wang and Chang 2012). Moreover, BMVC-8C3O-P can refold the disrupted nonparallel G4 structures of exogenous HT23 into a parallel G4 structure in the cytoplasm of the CL1-0 cells (Tseng et al. 2013b).
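
As a reminder of the quantity behind this comparison, log P is the base-10 logarithm of the ratio of a compound's equilibrium concentration in octanol to that in water. The sketch below only illustrates this definition. The values in the chapter were calculated rather than derived this way, and the function name and concentrations here are hypothetical.

```python
import math


def log_p(conc_octanol, conc_water):
    """Return log10 of the octanol/water concentration ratio.

    Both concentrations must be in the same units and strictly positive.
    """
    if conc_octanol <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_octanol / conc_water)


# Hypothetical compound that is 100 times more concentrated in octanol.
example_log_p = log_p(100.0, 1.0)
print(f"log P = {example_log_p:.2f}")
```

Each unit of log P corresponds to a tenfold change in the partition ratio, so derivatives that differ by a few tenths of a unit already differ noticeably in how strongly they favor the lipid-like phase.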

Fig. 4 Imaging study and potential use of BMVC-nC-P and BMVC-8C3O-P. (a) Confocal fluorescence images of CL1-0 cells incubated with BMVC-nC-P (green) and overlaid with Hoechst 33342 (blue) and MitoTracker red (red). (b) Lipophilicity of BMVC-nC-P and BMVC-8C3O-P correlates with mitochondrial localization in CL1-0 cells. (c) Cell viability assays of various cancer cell lines (solid line) and normal cell lines (dashed line) after incubation with BMVC-12C-P from 0 μM to 20 μM for 72 h measured by MTT assay. (d) Delayed anti-proliferation activity of BMVC-12C-P. CL1-0 cancer and MRC-5 normal cell lines were treated with 0.5 μM BMVC-12C-P. The number of cells was counted during the passages, and the population doubling was determined

In addition, the current study suggests that BMVC-8C3O-P is a potential anticancer agent. Furthermore, 0.5 μM BMVC-12C-P could induce mitochondrial dysfunction, resulting in cancer cell death with no risk of harming normal cells (Fig. 4c) (Huang et al. 2015). Long-term treatment using 0.5 μM BMVC-12C-P halted the proliferation of CL1-0 after approximately three population doublings, whereas no appreciable effect was observed on normal MRC-5 fibroblasts (Fig. 4d), indicating that BMVC-12C-P can selectively inhibit the proliferation of the CL1-0 cancer cells without damaging MRC-5 normal cells. The tumor xenograft study further showed that BMVC-12C-P could inhibit the progression of tumor growth in nude mice. Although both BMVC and BMVC-12C-P are potential anticancer agents, the former targets nuclear DNA, and the latter interacts with mitochondrial DNA. The imaging study of small molecules is critical in developing anticancer chemicals.
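
Population doublings of the kind reported for the long-term treatment are conventionally obtained from the cell counts at each passage as log2 of the harvested over the seeded cell number, summed across passages. The following is a minimal sketch of that bookkeeping. The function names and the cell counts are hypothetical and are not taken from the cited study.

```python
import math


def population_doublings(seeded, harvested):
    """Doublings during one passage: log2(harvested / seeded)."""
    if seeded <= 0 or harvested <= 0:
        raise ValueError("cell counts must be positive")
    return math.log2(harvested / seeded)


def cumulative_doublings(passages):
    """Running total of doublings over (seeded, harvested) pairs."""
    total = 0.0
    curve = []
    for seeded, harvested in passages:
        total += population_doublings(seeded, harvested)
        curve.append(total)
    return curve


# Hypothetical treated culture whose growth stalls after three doublings.
treated = [(1e5, 4e5), (1e5, 2e5), (1e5, 1e5)]
treated_curve = cumulative_doublings(treated)
print(treated_curve)
```

A curve that flattens, as in this made-up example, is what a halt in proliferation looks like in this representation, whereas an unaffected culture keeps climbing at a roughly constant slope.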

o-BMVC-nC-P

The o-BMVC-nC-P molecules, where n = 4, 6, 8, 9, 12, can also increase the melting temperature of G4s. The fluorescence intensity of these o-BMVC-nC-P molecules increases 50–100 times upon binding to telomeric G4s but 10–20 times upon interaction with calf thymus DNA (Huang et al. 2015). Interestingly, colony formation of HeLa cancer cells showed that o-BMVC-12C-P could inhibit tumor growth, while o-BMVC-6C-P has no such effect (Fig. 5a). Flow cytometry enabled

Fig. 5 Imaging study and potential use of o-BMVC-nC-P. (a) Colony-forming ability of HeLa cells after o-BMVC-nC-P treatment. Cells (1 × 10³) were seeded into each 6 cm culture plate for 24 h, after which different concentrations of each o-BMVC derivative were added to the corresponding well for an additional 5 d. Colonies were stained by crystal violet and quantified. (b) The mean fluorescence intensity of 1 μM o-BMVC derivatives incubated with HeLa cells for 24 h measured by flow cytometry. (c) To detect the fluorescence intensity of o-BMVC derivatives, confocal images of HeLa cells incubated with 1 μM o-BMVC derivatives for 24 h were taken under the same condition (left). To visualize the intracellular localizations of o-BMVC derivatives, confocal images of HeLa cells incubated with o-BMVC derivatives for 24 h and then co-stained with 20 nM Mitotracker Red for 20 min were taken under optimized conditions (right) for o-BMVC-12C-P and o-BMVC-6C-P. Red: Mitotracker Red; Green: o-BMVC derivatives; Yellow: merged. Scale bar is 25 μm. (d) Confocal images of HeLa cells without (left) and with (right) pretreatment with 1 μM CsA for 24 h. After the medium was discarded, the cells were then incubated with 5 μM o-BMVC-12C-P for 4 h. Scale bar is 10 μm. (e) Fluorescence spectra of pre-treated o-BMVC-12C-P and o-BMVC-6C-P in the isolated mitochondria of HeLa cells. (f) Cell viability analyses of A375 wild-type and A375-ρ0 mtDNA-deficient cells. Both cells were treated with 5 and 10 μM of o-BMVC-12C-P and o-BMVC-6C-P for 72 h and then measured by MTT assay. A375-ρ0 cells were generated using the ethidium bromide (EtBr) method to achieve mtDNA depletion

the quantitative measurement of fluorescence intensity in live HeLa cells, which clearly showed a considerable difference in the fluorescence intensity between o-BMVC-12C-P and o-BMVC-6C-P (Fig. 5b). This finding was further confirmed by the confocal images obtained under the same experimental conditions (Fig. 5c, left). To visualize their intracellular localization, confocal images obtained under optimized conditions revealed the accumulation of these o-BMVC derivatives mainly in the mitochondria of the HeLa cells (Fig. 5c, right). The similar absorption and fluorescence spectra of the two derivatives could not explain the ~tenfold difference in the fluorescence intensity of the HeLa cells. This difference in live HeLa cells is likely due to the interaction with mitochondrial DNA (mtDNA). Given that mtDNA is located in the mitochondrial matrix and cyclosporine A (CsA) (Gizatullina et al. 2011) is a permeability transition pore inhibitor, the pretreatment of CsA could block the
penetration of o-BMVC-12C-P into the matrix, as observed by far weaker fluorescence (Fig. 5d). This finding is consistent with the hypothesis that mtDNA is the major target for o-BMVC-12C-P. In addition, we isolated the mitochondria from the HeLa cells and found that the fluorescence intensity of the pretreated o-BMVC-12C-P is much stronger than the intensity of the pretreated o-BMVC-6C-P. Treating the isolated mitochondria with DNase produced an appreciable decrease in the fluorescence intensity of o-BMVC-12C-P, supporting that mtDNA is the major target for o-BMVC-12C-P (Fig. 5e). Moreover, the significant decrease in the cytotoxic effect of o-BMVC-12C-P in the mtDNA-deficient cells provided convincing evidence to support our hypothesis that mtDNA is the major target of o-BMVC-12C-P and not o-BMVC-6C-P (Fig. 5f). This finding highlights the importance of fluorescent probes in monitoring ligand–target interaction in live cells.

o-2B-P

We synthesized a hybrid BMVC–porphyrin photosensitizer, o-2B-P, with two BMVC molecules covalently linked to a central 5,10-bis-(4-hydroxyphenyl)-15,20-bis-(4-methoxyphenyl) porphyrin molecule at the ortho positions for use in photodynamic therapy (PDT) (Kang et al. 2008). Porphyrin derivatives have been widely used as photosensitizers for PDT. However, one of the drawbacks is the lack of specific light wavelengths optimal for tissue penetration and chromophore excitation. BMVC has a large cross-section for two-photon absorption at approximately 820 nm (Chang et al. 2006), a wavelength appropriate for tissue penetration to excite the photosensitizer. In addition, the transparency window of porphyrin derivatives in the range of 450–500 nm enables the selective excitation of BMVC. Moreover, the fluorescence of BMVC detected in cancer cells was much stronger than that in normal cells. Thus, o-2B-P is a novel photosensitizer with both target and irradiation wavelength selectivities for the PDT of cancer cells. Although BMVC and TMPyP4 are G4 ligands, the function of the o-2B-P conjugate is unlikely to involve G4s. Here, we highlight the significance of the imaging study. A series of real-time images of MCF-7 breast cancer cells and D-551 normal skin cells incubated with 1 μM o-2B-P for 6 h were recorded after 480 ± 20 nm irradiation at 1, 20, 40, 60, 90, 120, and 150 s (Fig. 6). The results indicated that the red fluorescence from o-2B-P was initially detected in the cytoplasm. After photoexcitation, the red fluorescence rapidly decreased in the cytoplasm, while the green–yellow fluorescence rapidly increased in the cancer cell nuclei. Notably, the green–yellow color was detected in the nuclei of the MCF-7 cancer cells after irradiation for [...] BMVC4 ~ o-BMVC > PDS in vitro. Furthermore, the analyzed binary FLIM images were used to measure the number of o-BMVC foci in the fixed cells without and with the pretreatment of TMPyP4 and PDS for comparison (Fig. 11c). 
The results showed an increase in the number of o-BMVC foci in the PDS pretreated cells (Tseng et al. 2018a) but a decrease in the TMPyP4 pretreated cells (Tseng et al. 2018b). Consistent with the gel results obtained in vitro, quantitative analysis of the o-BMVC foci in the TMPyP4, BRACO-19, and BMVC4 pretreated HeLa cells showed an appreciable decrease in the number of o-BMVC foci in the nucleus (Fig. 11d), which verifies that these G4 ligands indeed bind to the G4s and block the subsequent binding of o-BMVC to the G4s. However, the increase in the number of o-BMVC foci induced by PDS could be described by different binding modes of PDS and o-BMVC to the G4 structures. This study is important to establish a platform for identifying the G4 ligands that directly bind to the G4s in fixed cells and to support the existence of endogenous G4s in fixed cells.
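
Counting foci in a binary lifetime image amounts to two steps: thresholding the per-pixel decay time and counting the connected groups of pixels that pass. The following is a minimal sketch of that idea under stated assumptions, not the authors' analysis pipeline. The function names, the 4-neighbor connectivity, the `min_pixels` filter, and the toy lifetime map are all hypothetical. The 2.4 ns cutoff echoes the o-BMVC decay-time figure quoted later in this chapter, and applying it as "greater than or equal to" is an assumption.

```python
from collections import deque


def binarize(lifetime_ns, cutoff_ns):
    """Mark pixels whose decay time is at or above the cutoff (assumed rule)."""
    return [[1 if v >= cutoff_ns else 0 for v in row] for row in lifetime_ns]


def count_foci(binary, min_pixels=1):
    """Count 4-connected groups of 1-pixels containing at least min_pixels."""
    rows = len(binary)
    cols = len(binary[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    foci = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill over one connected group.
            size = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size >= min_pixels:
                foci += 1
    return foci


# Hypothetical 4 x 5 map of per-pixel decay times in nanoseconds.
lifetime_map = [
    [1.2, 2.6, 2.7, 1.1, 1.0],
    [1.3, 2.5, 1.4, 1.2, 2.8],
    [1.1, 1.2, 1.3, 1.0, 2.9],
    [2.6, 1.1, 1.2, 1.3, 1.1],
]
mask = binarize(lifetime_map, cutoff_ns=2.4)
n_foci = count_foci(mask)                    # every connected group
n_large_foci = count_foci(mask, min_pixels=2)  # ignore single-pixel groups
print(n_foci, n_large_foci)
```

Running the same count on pretreated and untreated cells and comparing the totals is the kind of comparison described above; a size filter such as `min_pixels` is one simple way to keep isolated noisy pixels from being counted as foci.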

Imaging Study of G4 Ligands Binding to Exogenous G4s in Live Cells

Cellular Response to Exogenous G4s in Live Cells

Previously, a number of studies showed that several GROs, such as d[(G2T)4TG(TG2)4] (AS1411), d[G3C]4 (T40214), d[T2AG3]4 (HT24), and d[TG4AG3TG4AG3TG4AAG2] (PU27), could form G4s and inhibit cancer cell growth. Particularly, AS1411 is the first anticancer aptamer to reach phase II clinical trials (Bates et al. 1999). However, it is not clear whether these G4s can retain their G4 structures in live cells after cellular uptake. BMVC was introduced to monitor the cellular uptake of naked G4s and map their intracellular localizations in live cells by using confocal microscopy (Fig. 12a) (Tseng et al. 2013b). The GROs that form parallel G4s, such as AS1411, T40214, and PU22, were detected mainly in the lysosome of the CL1-0 cancer cells, while the GROs that form nonparallel G4s, such as human telomeres and thrombin-binding aptamer, were rarely detected in the lysosome but distributed in the cytoplasm. Moreover, the FRET studies of Cy5-labeled GROs showed that the parallel G4s could be retained in the CL1-0 live cells, while the nonparallel G4s are likely distorted in the live CL1-0 cells after cellular uptake (Fig. 12b) (Tseng et al. 2013b). In addition, these exogenous G4s were rarely detected in the nuclei after the 48-h treatment. Similar results found in two other cancer cell lines, HeLa and MCF-7, suggested that the cellular localization of the GROs studied herein is not cell type-dependent. It is not clear why lysosomes show different responses to parallel and nonparallel G4s. Notably, free BMVC can easily enter the cancer cell nuclei. Surprisingly, no appreciable BMVC fluorescence was detected in the nucleus for the mixtures of 15 μM HT24 and 5 μM BMVC, even though the nonparallel G4s of exogenous HT24 were distorted in the live cells.

Fig. 12 Cellular response to exogenous G4s in live cells by confocal images of BMVC. (a) 5 μM BMVC and its complexes with 15 μM of HT23 and PU22 incubated with CL1-0 cancer cells for 2 h (left) and then stained by LysoTracker red (middle) together with their merges (right), (b) FRET of 5 μM BMVC and its complexes with 15 μM of Cy5-HT23 and Cy5-PU22 incubated with CL1-0 cancer cells for 2 h

Nevertheless, this finding is important for visualizing the cellular uptake of the exogenous G4s and provides new information on the biological function of lysosomes in live cells. In addition, a preliminary study of exogenous RNA G4s in the live CL1-0 cells showed similar results: the parallel G4s of the exogenous HT23 RNA are clearly detected in the lysosome. This differs from the nonparallel G4 structures of the HT23 DNA, which are rarely detected in the lysosome but mainly in the mitochondria of live cells. Such findings support that the G-rich RNA sequences can form the G4 structures in cells. Considering the importance of telomeric RNA expression in mammalian cells, the imaging study of RNA G4s in live cells deserves more attention for a better understanding of telomere biology.

G4 Dynamics of Exogenous G-Rich Oligonucleotides in Live Cells

In addition to the imaging study of BMVC binding to a number of exogenous G4s in live cells, NMR spectroscopy was used to study the ligand binding of microinjected G4s inside living oocytes (Salgado et al. 2015). The next challenge is the study of G4 structures and dynamics in live cells, although the details of G4 structures and dynamics have been extensively studied in vitro. Considering the large amounts of duplex DNA overwhelming the small amounts of G4 DNA, we studied the G4 dynamics from exogenous GROs in live cells. The FLIM images of the exogenous G4s confirmed that the parallel G4 structures characterized by the o-BMVC decay times (≥ 2.4 ns) were detected in the lysosomes of live cells; however, the nonparallel G4 structures were hardly detected in the lysosomes of the CL1-0 live cells (Fig. 13a), which provides a distinct tool to distinguish the parallel from nonparallel G4 structures in live cells. Similar results were also observed for the incubation of their ssGROs. For example, the exogenous ssCMA could form parallel G4 structures in vitro and was easily detected in the lysosomes of the CL1-0 live cells (Fig. 13b). Further measurement of the photon counts from the o-BMVC signals (decay time ≥ 2.4 ns) in the FLIM images was useful to study the G4 dynamics of ssCMA in live cells. The o-BMVC signals were detected in the lysosomes within the instrumental response time, suggesting that the G4 formation from exogenous ssCMA occurs within 20 min. Notably, the time constant of G4 formation from ssCMA is ~80 s after the addition of 100 mM K+ in tris-buffer. In addition, the measurement of the photon counts from the o-BMVC signals (decay time ≥ 2.4 ns) provided, for the first time, a method to estimate the unfolding rate of CMA G4s by the addition of anti-CMA to the live cells (Fig. 13c and d). Hence, the imaging study of the o-BMVC signals holds great promise for studying the G4 dynamics in live cells. In the study of G4 formation from ssHT23, the analyzed binary image enabled the measurement of the number of o-BMVC foci (Fig. 13e) in the live cells. Quantitative measurements of the o-BMVC foci indicated that the number of o-BMVC foci detected for the incubation of the HT23 G4s is higher than that detected for the incubation of ssHT23 and for the absence of exogenous GRO in the live cells

Imaging Study of Small Molecules to G-Quadruplexes in Cells

Fig. 13 Imaging study of G4 dynamics of exogenous GROs in live cells. (a) FLIM images of 5 μM o-BMVC and 15 μM CMA G4s (left) and HT23 G4s (right) after the incubation with live CL1-0 cells for 2 h. (b) FLIM images of 5 μM o-BMVC and 15 μM ssCMA (left) after the incubation with live CL1-0 cells for 2 h together with its histogram of the decay times of the o-BMVC fluorescence (middle), where n is the number of cells, as well as its time-gated FLIM image separated by 2.4 ns (right) with white (decay time ≥ 2.4 ns) and red (decay time < 2.4 ns). (c) The histograms of the average photon count vs. the decay times of the o-BMVC signals in each cell after the addition of anti-CMA at different times. (d) The plots of the average photon count of o-BMVC signals per cell and the CD signals of CMA at 265 nm in solution after the addition of anti-CMA at different times, respectively. (e) The analyzed binary images of free o-BMVC and its complexes with ssHT23 were separated into two colors: red (decay time ≥ 2.4 ns) and green (decay time < 2.4 ns). The red spots were determined to be the o-BMVC foci. (f) Quantitative analysis of the number of o-BMVC foci in the absence of GRO and the presence of ssHT23 and HT23 G4s in their analyzed binary images. (g) The average number of o-BMVC foci increases after incubation of ssHT23 and even more after incubation of HT23 G4s

(Fig. 13f). Figure 13g summarizes the average number of o-BMVC foci detected in the analyzed binary images of the exogenous HT23 in the live cells. The analyzed binary images showed a small increase in the number of o-BMVC foci in the cytoplasm after the 2 h incubation of HT23 with the live CL1-0 cells. Such an increase was likely due to the parallel G4 formation of HT23. Previously, polyethylene glycol used as a cosolvent was shown to convert nonparallel G4s of HT23 to parallel G4s in K+ solution (Heddi and Phan 2011). In addition, the use of o-BMVC8C3O-P could also convert telomeric G4s from nonparallel to parallel topologies (Wang and Chang 2012). However, only a small portion of the exogenous ssHT23 could form the G4 structures compared with the abundance of parallel G4 formation from the exogenous ssCMA in the live cells. Although at present we cannot rule out the formation of nonparallel G4 structures in the cytoplasm of the CL1-0 live cells, the results suggest that nonparallel G4 structures are unlikely to be formed in the cytoplasm of live cells.


Lysosomes play an essential role in the endocytosis of exogenous macromolecules. They have long been considered the digestion machines for cellular clearance, degrading waste into small components for either recycling or export to the cytoplasm (Ballabio and Bonifacino 2020; Lawrence and Zoncu 2019). Defects in the degradation result in lysosomal dysfunction and lysosomal storage disorders (LSD) that could cause several neurodegenerative and metabolic diseases as well as cancer. Currently, it is not clear why the parallel G4 structures are retained and the nonparallel G4 structures are not detected in the lysosomes of cancer cells. Very recently, Ballabio and Bonifacino pointed out that the lysosomal membrane comprises hundreds of integral and peripheral proteins, many of which are of unknown function (Ballabio and Bonifacino 2020). Nevertheless, the accumulation of parallel G4s in the lysosomes deserves further study to examine the lysosomal functions that may uncover the correlation between LSD and cancer. In addition, recent studies of mitochondria–lysosome interactions showed that they mutually regulate intracellular metabolism (Audano et al. 2018). Defective mitochondria–lysosome interactions may lead to neurodegenerative diseases. Chen et al. introduced a fluorescent probe to study the localization and dynamic tracking of the mitochondria–lysosome interactions in live cells (Chen et al. 2020). It would be interesting to verify whether the detection of the parallel G4 structures in the lysosomes and the nonparallel G4 structures in mitochondria is associated with the mitochondria–lysosome interactions in live cells.

In addition, several studies reported that G4s also exist in viral genomes and can regulate viral biological processes (Metifiot et al. 2014; Zhang et al. 2020). In particular, several PQFs are found in SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2), the cause of the coronavirus disease 2019 (COVID-19) pandemic (Zhang et al. 2020). It would be interesting to examine the role of these putative G4s in the COVID-19 pandemic. One of the important roles of lysosomes is to degrade intracellular and exogenous macromolecules, such as viral or bacterial DNA, into building blocks for further utilization. Thus, studying the G4s in the lysosomes may be useful for developing new antiviral drugs against COVID-19.

o-BMVC Foci as a Biosensor for Clinical Cancer Diagnosis

Cancer is a universal disease and remains a major cause of death. A recent report on cancer statistics showed nearly 20 million new cancer cases and almost 10 million deaths from cancer worldwide in 2020 (Ferlay et al. 2021). Notably, the cancer burden can be reduced through early detection because many cancers have a high chance of being cured if diagnosed early and treated appropriately. Despite substantial progress in understanding the fundamental mechanisms of carcinogenesis and developing new therapeutic methods for cancer treatment, it is a great challenge to determine a common target that can be identified in cancer diagnosis, especially at an early stage, with high accuracy and at low cost. Although the causes of cancers can be very diverse, the loss of genomic integrity is a common hallmark of


cancer (De and Michor 2011; Hansel-Hertsch et al. 2016). Recently, a number of imaging studies showed that more G4 foci are detected in cancer cells than in normal cells (Biffi et al. 2014; Liu et al. 2020; Tseng et al. 2018a), implying that the number of G4 foci may serve as a common signal for the diagnosis of cancer cells.

DNA Damage May Facilitate G4 Formation

At present, we do not know exactly why there are more G4 foci in cancer cells than in normal cells. In particular, the telomere length of normal cells is generally longer than that of cancer cells. However, the analyzed binary FLIM images showed more o-BMVC foci at the ends of the metaphase chromosomes of the cancer cells than at those of normal cells, indicating that the number of G4 foci is not directly proportional to the length of the G-rich sequences. Since DNA damage can lead to genomic instability, open the chromatin, and increase the risk of exposing unprotected G4 sites, DNA damage may play a critical role in the predominance of the G4 foci in cancer cells, owing to either a higher probability of G4 formation or easier probe binding. Previously, we found G4 formation from exogenous ssGROs delivered by lipofectamine into the nucleus (Huang et al. 2015), indicating the prevalence of G4 formation of the unprotected GROs in the live cells. In addition, treatment with several chemicals that can cause DNA damage increased the G4 signals in the cells (Biffi et al. 2014; Ferlay et al. 2021; Henderson et al. 2014; Tseng et al. 2018b). Furthermore, pretreatment with UV irradiation showed a marked increase in the number of o-BMVC foci as a function of irradiation, supporting the idea that DNA damage could facilitate G4 formation (Tseng et al. 2018a). It should be noted that the o-BMVC foci induced by UV irradiation showed a gradual decrease after the UV light was terminated. Such a decrease in the number of o-BMVC foci is likely a consequence of DNA repair.

o-BMVC Test for Clinical Cancer Diagnosis

To test the o-BMVC foci in tissue biopsy for clinical application, we examined head and neck cancer (HNC) samples obtained during surgery from the National Taiwan University Hospital and normal oral samples collected from healthy volunteers (Tseng et al. 2018a). Quantitative measurements of the number of o-BMVC foci in the analyzed binary images for each sample were conducted. Consistent with the finding in the study of cell lines, the o-BMVC foci were hardly detectable in the normal oral epithelial cells. The average number of o-BMVC foci was measured to be 28.3 from 50 HNC patients and 2.2 from 20 healthy volunteers. According to the receiver operating characteristic (ROC) curve analysis, the area under the ROC curve was 0.992, indicating that this method provides very high accuracy for detecting HNC cells from patients. The digital number of o-BMVC foci measured in the nuclei of samples is critical and useful to the o-BMVC test.
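The ROC analysis mentioned above can be illustrated with a minimal sketch. It assumes that the foci count per sample is used directly as the diagnostic score, and it computes the area under the ROC curve through the equivalent rank (Mann-Whitney) formulation. The function name `roc_auc` and the foci counts below are hypothetical and purely illustrative; they are not the study data and are not expected to reproduce the reported value of 0.992.

```python
from itertools import product


def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    The value equals the probability that a randomly chosen positive
    sample scores higher than a randomly chosen negative sample,
    with ties counted as one half.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both groups must contain at least one sample")
    wins = 0.0
    for pos, neg in product(positive_scores, negative_scores):
        if pos > neg:
            wins += 1.0
        elif pos == neg:
            wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical foci counts per sample (illustrative only, not study data).
patient_foci = [31, 18, 42, 9, 27, 35]
healthy_foci = [1, 0, 3, 2, 9, 4]

auc = roc_auc(patient_foci, healthy_foci)
print(f"AUC = {auc:.3f}")
```

An AUC close to 1 means that almost every patient sample has more foci than almost every healthy sample, which is the sense in which a single digital foci count can act as a classifier.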


To further validate the o-BMVC test for clinical cancer diagnosis, 327 FNA samples of thyroid nodules obtained from the National Taiwan University Hospital were analyzed (Tseng et al. 2021). Cytologic examination of FNA is a well-established method that provides a cost-effective means for diagnosing accessible lesions with almost no side effects for the patients. Generally, FNA cytology has a high overall diagnostic accuracy with minimal complications and can be used as the gold standard for diagnostic comparison. To compare the o-BMVC test with the FNA cytology, the major portions of aspirates were first used for cytological examination, and the residual aspirates were then used for the o-BMVC test. Figure 14a shows a typical benign thyroid nodule, and Fig. 14b shows a typical malignant thyroid nodule. We demonstrated that the o-BMVC test could simplify the diagnostic categories of the FNA cytology from six (nondiagnostic, benign, atypia, follicular neoplasm, suspicious, and malignant) to three categories (nondiagnostic, benign, and malignant) of the thyroid nodules (Fig. 14c). The 95.6% consistency rate between the o-BMVC test and the FNA cytology for the cases determined by both methods suggested that the o-BMVC test is as valuable a method as FNA cytology for diagnosing thyroid nodules. However, these two methods have several different characteristics. FNA cytology is a subjective method based on cellular morphology, while the o-BMVC test is an objective method based on the digital number of o-BMVC foci. In particular, the o-BMVC test can discriminate 75.0% of cytologically nondiagnostic cases (Fig. 14d) and 92.3% of cytologically indeterminate cases into benign or malignant cases (Fig. 14e). The o-BMVC test could markedly decrease the number of nondiagnostic cytology results, discriminate indeterminate cytology into benign and malignant cases, and reduce the risk of missing cancers, which could appreciably benefit clinical decision-making and reduce unnecessary costs and surgeries. This

Fig. 14 The use of the o-BMVC test for clinical cancer diagnosis of thyroid nodules. Binary FLIM images of o-BMVC testing for (a) benign and (b) malignant samples. (c) o-BMVC test results for thyroid nodules correlated with the cytological results. o-BMVC testing results fall into three categories: nondiagnostic (ND) for unsatisfactory samples, negative (N) for benign samples, and positive (P) for malignant samples, and there are no cytologically indeterminate (ID) cases, including atypia (A), follicular neoplasm (FN), and suspicious (S) categories, in the diagnosis of thyroid nodules. The two red rectangles highlight determined cases reported by both methods with 95.6% (174/182) consistency. (d) o-BMVC testing discriminates 75.0% of cytologically nondiagnostic cases into benign or malignant cases. (e) o-BMVC testing discriminates 92.3% of cytologically indeterminate cases into benign or malignant cases


pioneering study strongly suggests that the o-BMVC test holds great promise as an adjuvant method for diagnosing thyroid nodules and possibly other cancers. Given that the average number of o-BMVC foci is

7AThp (0.25 and 0.19) Dup10 (0.19 and 0.21). This order differs from the order of ion release (ΔnNa+) per phosphate for the unfolding of the free DNA molecules: ST-DNA (0.12) ~ Dup27 (0.15) ~ Dup10 (0.14) > 7AThp-GCAA (0.07) ~ 7AThp (0.05), which reflects the overall charge density parameters of the DNA molecules. The overall results imply that the

Interaction of Poly(Ethylene Glycol)-b-Poly-L-Lysine Copolymers with. . .

Table 3 Interaction of polycations with DNA as a function of salt concentration

DNA          [Na+] (mM)   TM (°C)   TM PEG-PLL10 (°C)   TM PEG-PLL100 (°C)   KPol PEG-PLL10 (Z+/− = 2)   KPol PEG-PLL100 (Z+/− = 1)
ST-DNA       3.2          59.6      95.4                98.8                 1.1 × 10^4                  2.1 × 10^4
ST-DNA       16           70.0      95.3                98.9                 6.9 × 10^3                  1.4 × 10^4
ST-DNA       31           76.2      95.4                98.6                 4.8 × 10^3                  9.7 × 10^3
ST-DNA       59           81.1      95.3                98.8                 3.3 × 10^3                  7.2 × 10^3
ST-DNA       116          86.3      96.2                98.6                 2.2 × 10^3                  4.7 × 10^3
Duplex27     3.2          NA        79.5                81.6                 NA                          NA
Duplex27     16           43.8      79.2                81.6                 1.5 × 10^4                  3.7 × 10^4
Duplex27     31           47.8      79.5                81.4                 1.2 × 10^4                  3.1 × 10^4
Duplex27     59           52.2      79.2                81.4                 1.0 × 10^4                  2.5 × 10^4
Duplex27     116          57.8      79.2                81.1                 7.3 × 10^3                  1.9 × 10^4
Dup10        3.2          19.3      60.8                71.0                 1.8 × 10^4                  4.4 × 10^4
Dup10        16           29.3      64.7                70.9                 1.6 × 10^4                  3.6 × 10^4
Dup10        31           34.1      64.5                70.8                 1.3 × 10^4                  3.1 × 10^4
Dup10        59           38.4      64.6                70.6                 1.1 × 10^4                  2.7 × 10^4
Dup10        116          42.0      62.7                65.7                 8.8 × 10^3                  2.0 × 10^4
7AThp        3.2          24.7      51.1                60.8                 7.2 × 10^3                  1.9 × 10^4
7AThp        16           26.7      51.6                60.0                 6.3 × 10^3                  1.7 × 10^4
7AThp        31           31.6      51.2                59.9                 5.0 × 10^3                  1.4 × 10^4
7AThp        59           35.2      51.1                60.0                 3.9 × 10^3                  1.2 × 10^4
7AThp        116          39.1      51.0                60.0                 2.8 × 10^3                  9.7 × 10^3
7AThp-GCAA   3.2          32.8      59.5                60.8                 9.0 × 10^3                  1.8 × 10^4
7AThp-GCAA   16           38.0      57.3                61.2                 5.0 × 10^3                  1.2 × 10^4
7AThp-GCAA   31           43.3      57.2                60.6                 4.2 × 10^3                  9.9 × 10^3
7AThp-GCAA   59           46.9      57.2                60.9                 3.0 × 10^3                  7.8 × 10^3
7AThp-GCAA   116          48.5      58.2                57.0                 2.8 × 10^3                  4.6 × 10^3

All parameters measured in 10 mM sodium phosphate buffer at pH 7 at the indicated NaCl concentration. Experimental errors are shown in parentheses: TM (±0.5 °C) and KPol (±40%)

difference in the release of counterions upon polycation binding to DNA may be due to the charge density parameters of the reactants and of the complex. Furthermore, these values of counterion release may be considered as the effective number of binding sites of the DNA molecules and can be used as the "n" values for the determination of KPol using the first method outlined above. However, this procedure yielded higher KPol values by a factor of two.

Thermodynamic Profiles for the Interaction of Polycations with DNA

Standard thermodynamic binding profiles at 25 °C in low salt concentration are listed in

H.-T. Lee et al.

Fig. 4 Dependence of binding affinity on salt concentration for PEG-PLL10 (top) and PEG-PLL100 (bottom) polycations with DNA molecules, as indicated by the colors. Axes: ln KPol versus ln [Na+]; legend: Dup10, Dup27, ST-DNA, 7AThp, 7AThp-GCAA

Table 4 Standard thermodynamic profiles for the interaction of PEG-PLL with DNA

DNA          Polymer      KPol × 10^4   ΔGb (kcal/mol)   ΔHb (kcal/mol)   TΔSb (kcal/mol)   ∂ln KPol/∂ln [Na+] (mol Na+/mol)
Dup10        PEG-PLL10    1.6           5.9              0.1              5.8               0.19
Dup10        PEG-PLL100   3.6           8.1              0.4              7.7               0.21
Dup27        PEG-PLL10    1.5           6.7              0                6.7               0.35
Dup27        PEG-PLL100   3.7           7.4              0.1              7.3               0.33
ST-DNA       PEG-PLL10    2.1           5.9              0.2              5.7               0.78
ST-DNA       PEG-PLL100   5.5           6.4              0.3              6.7               0.75
7AThp        PEG-PLL10    0.6           6.2              0.2              6.0               0.25
7AThp        PEG-PLL100   1.7           7.2              0.2              7.0               0.19
7AThp-GCAA   PEG-PLL10    0.5           5.7              0.2              5.5               0.35
7AThp-GCAA   PEG-PLL100   1.2           6.2              0.3              5.9               0.36

All parameters measured in a 10 mM sodium phosphate buffer at pH 7.0. Experimental errors are shown in parentheses: ΔGb (±30%), ΔHb (±42%), TΔSb (±51%), and ∂ln Kb/∂ln [Na+] (±0.1)
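The salt dependence ∂ln KPol/∂ln [Na+] can be estimated as the slope of a straight-line fit of ln KPol against ln [Na+]. The sketch below is a minimal illustration under that assumption; the helper name `loglog_slope` is hypothetical, and an ordinary least-squares fit is assumed rather than taken from the authors' procedure. The input values are the Table 3 entries for Dup10 with PEG-PLL10. With these rounded entries the fitted slope comes out at roughly −0.2, which is comparable in magnitude to the 0.19 listed for this pair in Table 4.

```python
import math


def loglog_slope(salt_molar, binding_constants):
    """Least-squares slope of ln K versus ln [Na+]."""
    xs = [math.log(c) for c in salt_molar]
    ys = [math.log(k) for k in binding_constants]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Table 3 entries for Dup10 with PEG-PLL10.
na_mM = [3.2, 16, 31, 59, 116]               # total [Na+] in mM
k_pol = [1.8e4, 1.6e4, 1.3e4, 1.1e4, 8.8e3]  # KPol at each salt concentration

# Converting mM to M only shifts ln [Na+] by a constant, so the slope is unchanged.
slope = loglog_slope([c / 1000 for c in na_mM], k_pol)
print(f"d ln KPol / d ln [Na+] = {slope:.2f}")
```

A negative slope corresponds to weaker binding at higher salt, that is, to a net release of counterions on complex formation.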

Table 4. The KPol values of column 3 were used to obtain the binding free energies, ΔGb, shown in column 4 of Table 4. Dissection of these free energy terms into enthalpy and entropy contributions was done by measuring the binding enthalpies, ΔHb, using isothermal titration calorimetry. The bottom plots of Fig. 5 show the calorimetric titrations of ST-DNA with each polycation; their interaction with DNA is accompanied by small endothermic heats. After correcting each injection for the

Fig. 5 Typical ITC titrations of Dup10 (a), Dup27 (b), and ST-DNA (c) with each polycation, PEG-PLL10 (top curves) and PEG-PLL100 (bottom curves). Axes: power (mcal/sec) versus time (min)

titrant dilution heat, the average heat of injections obtained from the first four peaks of each titration yielded endothermic heats of 6.6 μcal (PEG-PLL100) and 11.4 μcal (PEG-PLL10) (the dilution heats are also endothermic but smaller), which after normalization by the concentration of the limiting reagent yielded ΔHb values of 0.31 ± 0.02 kcal/mol (PEG-PLL100) and 0.21 ± 0.01 kcal/mol (PEG-PLL10), respectively, for the formation of each complex. These enthalpy values are in good agreement with the DSC unfolding data, in which similar enthalpies are obtained for the unfolding of the free and bound DNA molecules. The results are consistent with binding events that take place at the surface of macromolecules, in which the endothermic energy of potential dehydration effects overrides the specific interactions between ligand and macromolecules. Most likely, the interactions in the systems investigated are electrostatic, which are also accompanied by negligible heat effects. The ITC titrations for the interaction of each copolymer with the oligomer duplexes are shown in the top plots of Fig. 5. These interactions are accompanied


also by small exothermic heats. After correcting for dilution heats, the average heat of four injections yielded exothermic heats of 1.7 μcal (PEG-PLL100-Dup27), 3.83 μcal (PEG-PLL10-Dup27), 8.0 μcal (PEG-PLL100-Dup10), and 7.4 μcal (PEG-PLL10-Dup10), which after normalization by the concentration of the limiting reagent yielded ΔHb values ranging from 15 ± 8 cal/mol (PEG-PLL10-Dup27) to 367 ± 20 cal/mol (PEG-PLL100-Dup10) for the formation of each complex. These enthalpy values are negligible and are consistent with complexation by electrostatic interactions. The ITC titrations for the interaction of each polycation with the DNA hairpins are not shown. However, the results are as follows: the 7AThp curves with PEG-PLL10 showed small endothermic heats, while the curves with PEG-PLL100 showed smaller exothermic heats. The ITC curves of 7AThp-GCAA with each polycation showed initial peaks that are exothermic and endothermic, which were integrated over the same time period as the other titrations. The net heats for these interactions are small exotherms. After correcting each injection for dilution heats, the average heat of injections obtained from the first four peaks of each titration yielded exothermic heats of 4.4 μcal (PEG-PLL100 + 7AThp), 2.7 μcal (PEG-PLL10 + 7AThp), 8.0 μcal (PEG-PLL100 + 7AThp-GCAA), and 16.5 μcal (PEG-PLL10 + 7AThp-GCAA), which after normalization by the concentration of the limiting reagent yielded ΔHb values of 151 ± 34 cal/mol (PEG-PLL100 + 7AThp), 181 ± 67 cal/mol (PEG-PLL10 + 7AThp), 294 ± 37 cal/mol (PEG-PLL100 + 7AThp-GCAA), and 178 ± 11 cal/mol (PEG-PLL10 + 7AThp-GCAA) for the formation of each complex. The overall enthalpy values are negligible, indicating that the interaction of each polycation with short DNA hairpins is also through electrostatic interactions.
Inspection of the data of Table 4 indicates that the favorable interaction (negative ΔGb) of a polycation with each DNA molecule results from a favorable entropy contribution, i.e., complex formation is entropy driven. The entropy contributions include the unfavorable entropy of complex formation (ordering imposed by the association of two molecules), release of counterions, and the exclusion of water molecules due to the physical placement of a ligand on the surface of DNA and vice versa (Marky et al. 1983a, b; Marky and Kupke 1989; Bronich et al. 2001). Furthermore, standard thermodynamic profiles for the interaction of PEG-PLL10 (top) and PEG-PLL100 (bottom) with DNA molecules as a function of the DNA length are shown in Fig. 6. Very similar thermodynamic profiles are obtained with PEG-PLL10; the absolute magnitudes of the ΔGb terms are similar to those of the TΔSb terms, i.e., the favorable free energy terms are due to favorable entropy terms. However, the effect of PEG-PLL100 is to decrease slightly the magnitude of the thermodynamic profiles, with similar absolute magnitudes of the ΔGb and TΔSb terms, as the length of DNA increases. Standard thermodynamic profiles for the interaction of polycations (PEG-PLL10, PEG-PLL50, PEG-PLL100, and PLL100) with ST-DNA as a function of the length of the PLL block in 10 mM sodium phosphate buffer, with total Na+ concentrations of 16 mM and 116 mM, are shown in Table 5 and plotted in Fig. 7 (low salt only).
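The statement that binding is entropy driven follows from the Gibbs relation ΔGb = ΔHb − TΔSb: when ΔHb is close to zero, a negative ΔGb requires a positive TΔSb of nearly the same magnitude. The short sketch below illustrates this bookkeeping for the ST-DNA/PEG-PLL100 entry of Table 4. The helper names are hypothetical, and the signs are assumptions taken from the text (ΔGb negative for favorable binding, ΔHb positive for the endothermic ST-DNA titrations), since the table entries are read here as magnitudes.

```python
def entropy_term(delta_g, delta_h):
    """Return T*dS (kcal/mol) from the Gibbs relation dG = dH - T*dS."""
    return delta_h - delta_g


def is_entropy_driven(delta_g, delta_h):
    """True when binding is favorable and T*dS outweighs the enthalpy term."""
    t_delta_s = entropy_term(delta_g, delta_h)
    return delta_g < 0 and t_delta_s > abs(delta_h)


# ST-DNA with PEG-PLL100 (Table 4); signs assumed as described in the lead-in.
delta_g_b = -6.4  # kcal/mol, favorable binding
delta_h_b = 0.3   # kcal/mol, small endothermic heat

t_delta_s_b = entropy_term(delta_g_b, delta_h_b)
print(round(t_delta_s_b, 1))                    # 6.7, the TΔSb entry in Table 4
print(is_entropy_driven(delta_g_b, delta_h_b))  # True
```

With an enthalpy term this small, essentially the whole favorable free energy is carried by the entropy term, which is the pattern the table shows for every DNA molecule.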

Fig. 6 Building blocks of the thermodynamic profiles for the interaction of PEG-PLL10 (top) and PEG-PLL100 (bottom) with DNA molecules as a function of the length of DNA: ΔG (green), ΔH (red), and TΔS (blue). Axes: ΔG (kcal/mol) and TΔS (kcal/mol) for Dup10, Dup27, and ST-DNA

Table 5 Interaction of polycations with ST-DNA: effect of the PLL block size

[Na+] (mM)   Polymer      Z+/−   PLL weight ratio (%)   KPol × 10^4   ΔGb (kcal/mol)   ΔHb (kcal/mol)   TΔSb (kcal/mol)
16           PEG-PLL10    2      24.2                   1.7           5.8              0.2              5.6
16           PEG-PLL50    1      61.5                   3.8           6.2              0.1              6.1
16           PEG-PLL100   1      76.2                   4.4           6.3              0.3              6.6
16           PLL100       1      100                    4.1           6.3              0.3              6.6
116          PEG-PLL10    2      24.2                   0.4           4.9              0.2              4.7
116          PEG-PLL50    1      61.5                   1.0           5.4              0.1              5.3
116          PEG-PLL100   1      76.2                   1.2           5.6              0.3              5.9
116          PLL100       1      100                    1.1           5.5              0.3              5.8

All parameters measured in a 10 mM sodium phosphate buffer at pH 7.0 and at total sodium concentrations of 16 mM and 116 mM. Experimental errors are shown in parentheses: ΔGb (±30%), ΔHb (±42%), and TΔSb (±51%)


Fig. 7 Building blocks of the thermodynamic profiles for the interaction of polycations with ST-DNA as a function of the PLL block size of the polycation: ΔG (green), ΔH (red), and TΔS (blue). Axes: ΔG (kcal/mol) and TΔS (kcal/mol) versus weight ratio of PLL (%)

The thermodynamic profiles at each salt concentration are very similar, and the absolute magnitudes of the ΔGb terms are similar to those of the TΔSb terms; that is, the favorable free energy terms follow the favorable entropy terms. However, the effect of increasing the salt concentration is to decrease the magnitude of the thermodynamic profiles by an average of 0.8 kcal/mol. This suggests that the PEG block of the polycations is not participating in the interaction with DNA molecules, which is consistent with the electrostatic nature of the interaction of these polycations.
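The average decrease of 0.8 kcal/mol can be checked directly against the ΔGb magnitudes of Table 5. The following sketch simply averages the differences between the 16 mM and 116 mM entries for the four polycations; the helper name `mean_shift` is hypothetical, and the values are read as magnitudes, as listed in the table.

```python
from statistics import mean


def mean_shift(low_salt, high_salt):
    """Per-polymer and mean decrease in |dG_b| on going from low to high salt."""
    shifts = [round(lo - hi, 1) for lo, hi in zip(low_salt, high_salt)]
    return shifts, round(mean(shifts), 1)


# |dG_b| in kcal/mol from Table 5, in the order
# PEG-PLL10, PEG-PLL50, PEG-PLL100, PLL100.
dg_16_mM = [5.8, 6.2, 6.3, 6.3]
dg_116_mM = [4.9, 5.4, 5.6, 5.5]

shifts, average = mean_shift(dg_16_mM, dg_116_mM)
print(shifts)   # [0.9, 0.8, 0.7, 0.8]
print(average)  # 0.8
```

The shift is nearly the same for all four polycations, whatever the length of the PLL block.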

Conclusion

We have investigated the interaction of five DNA molecules with the cationic block copolymer poly(ethylene glycol)-b-poly-L-lysine (PEG-b-PLL). Two PEG-b-PLL copolymers were investigated thoroughly in this report, each composed of a similar PEG segment with a 10- or 100-residue L-lysine chain. We obtained Kb values of ~1 × 10^4 for the interaction of these polycations with each DNA molecule, which decrease somewhat with increasing salt concentration, i.e., the binding of the polycations is accompanied by a net release of counterions. Polycation binding to DNA makes the unfolding of the DNA more cooperative, with a larger number of base pairs melting. Standard thermodynamic binding profiles show that the favorable interaction of these polycations with DNA results primarily from a favorable entropy contribution, i.e., polycation binding is entropy driven. Favorable entropy contributions include removal of counterions and water molecules from both participating


species. The overall results indicate that the polycation binds to the surface of DNA, with the positively charged lysine groups forming ion pairs with the negatively charged phosphate groups of DNA, and that the PEG block is not participating; polycation-DNA complexes are therefore produced spontaneously through the formation of salt contacts. The approach of using polycations for the cellular delivery of oligonucleotides has great potential because polycations interact with DNA oligonucleotides with a reasonable Kb of 10^4 (ΔGb = −6.0 kcal/mol), allowing the removal of DNA with the increase of the ionic environment of the cell.

Acknowledgments

This work was supported by grant MCB-1912587 (to L.A.M.) from the US National Science Foundation (NSF).

References

Bresloff JL, Crothers DM (1975) DNA-ethidium reaction kinetics: demonstration of direct ligand transfer between DNA binding sites. J Mol Biol 95:103–123
Bronich T, Kabanov AV, Marky LA (2001) A thermodynamic characterization of the interaction of a cationic copolymer with DNA. J Phys Chem B 105(25):6042–6050
Bronich TK, Kankia BI, Kabanov AV, Marky LA (2000) A thermodynamic investigation of the interaction of polycations with DNA. Am Chem Soc, Polym Prepr, Div Polym Chem 41(2):1611–1612
Cantor CR, Warshaw MM, Shapiro H (1970) Oligonucleotide interactions. 3. Circular dichroism studies of the conformation of deoxyoligonucleotides. Biopolymers 9(9):1059–1077
Crothers DM (1971) Statistical thermodynamics of nucleic acid melting transitions with coupled binding equilibriums. Biopolymers 10(11):2147–2160
Deshpande MC, Garnett MC, Vamvakaki M, Bailey L, Armes SP, Stolnik S (2002) Influence of polymer architecture on the structure of complexes formed by PEG-tertiary amine methacrylate copolymers and phosphorothioate oligonucleotide. J Control Release 81(1–2):185–199
Egholm M, Buchardt O, Christensen L, Behrens C, Freier SM, Driver DA, Berg RH, Kim SK, Norden B, Nielsen PE (1993) PNA hybridizes to complementary oligonucleotides obeying the Watson–Crick hydrogen-bonding rules. Nature 365(6446):566–568
Katayose S, Kataoka K (1997) Water-soluble polyion complex associates of DNA and poly(ethylene glycol)-poly(L-lysine) block copolymer. Bioconjug Chem 8(5):702–707
Kibler-Herzog L, Zon G, Uznanski B, Whittier G, Wilson WD (1991) Duplex stabilities of phosphorothioate, methylphosphonate, and RNA analogs of two DNA 14-mers. Nucleic Acids Res 19(11):2979–2986
Kidane A, Lantz GC, Jo S, Park K (1999) Surface modification with PEO-containing triblock copolymer for improved biocompatibility: in vitro and ex vivo studies. J Biomater Sci Polym Ed 10(10):1089–1105
Lochmann D, Jauk E, Zimmer A (2004) Drug delivery of oligonucleotides by peptides. Eur J Pharm Biopharm 58(2):237–251
Lungwitz U, Breunig M, Blunk T, Göpferich A (2005) Polyethylenimine-based non-viral gene delivery systems. Eur J Pharm Biopharm 60(2):247–266
Lutz GJ, Sirsi SR, Williams JH (2008) PEG-PEI copolymers for oligonucleotide delivery to cells and tissues. Methods Mol Biol 433:141–158
Mahato RI, Cheng K, Guntaka RV (2005) Modulation of gene expression by antisense and antigene oligodeoxynucleotides and small interfering RNA. Expert Opin Drug Deliv 2(1):3–28
Manning GS (1978) The molecular theory of polyelectrolyte solutions with applications to the electrostatic properties of polynucleotides. Q Rev Biophys 11(2):179–246
Marky LA, Alessi K, Rentzeperis D (1996) Calorimetric studies of drug-DNA interactions. In: Hurley LH, Chaires JB (eds) Advances in DNA sequence-specific agents, vol 2. Elsevier, Amsterdam, pp 3–28
Marky LA, Breslauer KJ (1987) Calculating thermodynamic data for transitions of any molecularity from equilibrium melting curves. Biopolymers 26:1601–1620
Marky LA, Kupke DW (1989) Probing the hydration of the minor groove of A•T synthetic DNA polymers by volume and heat changes. Biochemistry 28(26):9982–9988
Marky LA, Snyder JG, Breslauer KJ (1983a) Calorimetric and spectroscopic investigation of drug–DNA interactions: II. Dipyrandium binding to poly d(AT). Nucleic Acids Res 11(16):5701–5715
Marky LA, Snyder JG, Remeta DP, Breslauer KJ (1983b) Thermodynamics of drug-DNA interactions. J Biomol Struct Dyn 1(2):487–507
Morgan AR, Lee JS, Pulleyblank DE, Murray NL, Evans DH (1979) Review: ethidium fluorescence assays. Part 1. Physicochemical studies. Nucleic Acids Res 7(3):547–569
Neu M, Fischer D, Kissel T (2005) Recent advances in rational gene transfer vector design based on poly(ethylene imine) and its derivatives. J Gene Med 7(8):992–1009
Nishiyama N, Kataoka K (2006) Current state, achievements, and future prospects of polymeric micelles as nanocarriers for drug and gene delivery. Pharmacol Ther 112(3):630–648
Patel DJ, Kozlowski SA, Marky LA, Rice JA, Broka C, Itakura K, Breslauer KJ (1982) Structure and energetics of a hexanucleotide duplex with stacked trinucleotide ends formed by the sequence d(GAATTCGCG). Biochemistry 21(3):451–455
Petersen H, Fechner PM, Martin AL, Kunath K, Stolnik S, Roberts CJ, Fischer D, Davies MC, Kissel T (2002) Polyethylenimine-graft-poly(ethylene glycol) copolymers: influence of copolymer block structure on DNA complexation and biological activities as gene delivery system. Bioconjug Chem 13(4):845–854
Record Jr MT, Anderson CF, Lohman TM (1978) Thermodynamic analysis of ion effects on the binding and conformational equilibria of proteins and nucleic acids: the roles of ion association or release, screening, and ion effects on water activity. Q Rev Biophys 11(2):103–178
Rentzeperis D, Alessi K, Marky LA (1993) Thermodynamics of DNA hairpins: contribution of loop size to hairpin stability and ethidium binding. Nucleic Acids Res 21(11):2683–2689
Rentzeperis D, Marky LA, Dwyer TJ, Geierstanger BH, Pelton JG, Wemmer DE (1995) Interaction of minor groove ligands to an AAATT/AATTT site: correlation of thermodynamic characterization and solution structure. Biochemistry 34(9):2937–2945
Tanious FA, Laine W, Peixoto P, Bailly C, Goodwin KD, Lewis MA, Long EC, Georgiadis MM, Tidwell RR, Wilson WD (2007) Unusually strong binding to the DNA minor groove by a highly twisted benzimidazole diphenylether: induced fit and bound water. Biochemistry 46(23):6944–6956
Thaler DS, Liu S, Tombline G (1996) Extending the chemistry that supports genetic information transfer in vivo: phosphorothioate DNA, phosphorothioate RNA, 2'-O-methyl RNA, and methylphosphonate DNA. Proc Natl Acad Sci U S A 93(3):1352–1356
Vinogradov SV, Bronich TK, Kabanov AV (1998) Self-assembly of polyamine-poly(ethylene glycol) copolymers with phosphorothioate oligonucleotides. Bioconjug Chem 9(6):805–812
Wiseman T, Williston S, Brandts JF, Lin L-N (1989) Rapid measurement of binding constants and heats of binding using a new titration calorimeter. Anal Biochem 179:131–137
Zhang L, Peritz A, Meggers E (2005) A simple glycol nucleic acid. J Am Chem Soc 127(12):4174–4175

Chemical Tools to Target Noncoding RNAs

31

Maurinne Bonnet and Maria Duca

Contents
Introduction
RNA As a Therapeutic Target
Targeting Bacterial RNAs
Targeting Viral RNAs
Targeting Eukaryotic RNAs
RNA Nucleotides Repeats
MicroRNAs
Targeting of Long Noncoding RNAs
Current Trends for the Development of Innovative Chemical Tools for RNA Targeting
RIBOTAC Strategy
Targeting Pre-mRNA Splicing
Conclusion
References

Abstract

Targeting RNA using synthetic small molecules is a major challenge of current medicinal chemistry. RNA targets are very promising for future drug development since they play essential roles in a number of biological processes, and a number of new small-molecule binders have been reported in the literature, with some excellent examples already approved by the FDA. Despite the great advances made in the development of new methodologies for the identification of specific RNA ligands, difficulties remain, since the rational design of specific ligands is still hard to attain. In this chapter, the milestones of the field of targeting RNA using small molecules will be described together with the methodologies employed for their discovery. The different kinds of RNA targets that have been exploited so far will also be described with examples of the scaffolds that led to the best results in terms of biological activity. The current trends of the field, including the recently approved drugs against spinal muscular atrophy or the original RIBOTAC approach, will be detailed to open the perspective toward the discovery of new drugs bearing an original and extremely promising mechanism of action.

Keywords

RNA · RNA binder · Ligand · Interaction · Small molecule

M. Bonnet · M. Duca (*) Université Côte d’Azur, CNRS, Institute of Chemistry of Nice (ICN), Nice, France
© Springer Nature Singapore Pte Ltd. 2023 N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_36

Introduction

Biologically relevant RNAs are among the most promising targets for current medicinal chemistry and for the discovery of new drugs against unmet medical needs (Warner et al. 2018). While the targeting of nucleic acids for therapy is a well-known approach and various drugs targeting DNA are already on the market, ribonucleic acid (RNA) is still an underexploited target considering the great variety and therapeutic relevance of most RNAs, especially noncoding RNAs (Falese et al. 2021). One of the approaches to target RNAs is the antisense strategy, which is based on the use of oligonucleotides that specifically recognize an RNA strand by base complementarity and block its function (Crooke et al. 2021). This approach has been largely explored, with some successful drugs on the market such as Fomivirsen, approved in 1998 against cytomegalovirus (CMV) infections, or the more recent Volanesorsen, approved in 2019 against type I hyperlipoproteinemia. Targeting biologically relevant RNAs with antisense oligonucleotides (ASOs) is a very specific and efficient approach, but the clinical use of ASOs remains limited because of poor pharmacokinetic and delivery properties (Crooke et al. 2021). For this reason, a complementary approach for RNA targeting based on the use of small-molecule ligands has been developed (Meyer et al. 2020). Indeed, biologically relevant RNAs often adopt a very particular secondary and tertiary structure in which single-stranded regions are associated with double-stranded ones, thus inducing the formation of a 3D structure that offers the possibility of specific interactions with peptides, which constitute the natural RNA ligands, as well as with small molecules (Zafferani and Hargrove 2021). Structured RNAs thus adopt hairpin or stem-loop structures containing terminal or internal loops, bulges, and other specific motifs that have already been exploited for targeting with specific binders.
In this context, it is worth mentioning that a number of currently marketed antibiotics are RNA binders selective for prokaryotic ribosomal RNA, thus impairing protein synthesis in bacteria (Wilson 2014). Among them, aminoglycosides, tetracyclines, macrolides, and oxazolidinones are the most represented. Aminoglycosides were the first RNA ligands identified as antibiotics in the early 1940s, when the golden era of antibiotics began (Fig. 1). Streptomycin was the first antibiotic of this class to enter the market and was employed against tuberculosis. Aminoglycosides were followed by the discovery in the same years of tetracyclines and amphenicols, by macrolides, such as erythromycin, and lincosamides in the 1950s, and finally by oxazolidinones, the first synthetic compounds bearing antimicrobial activity, with linezolid as the main example.

Fig. 1 Chronological timeline illustrating the main landmarks in the field of RNA targeting by small molecules. Compounds indicated in blue are marketed drugs, while the ones in red are promising examples for future clinical applications

Since then, many studies have described the identification of new RNA ligands against a variety of targets and the strategies that led to their discovery (Meyer et al. 2020). Among the most important examples, ribocil is a synthetic compound specific for bacterial riboflavin riboswitches that was discovered by phenotypic screening (Howe et al. 2015). The more recent approval of Risdiplam as a modifier of alternative splicing of SMN2 pre-mRNA in spinal muscular atrophy (SMA) illustrates the potential of the small-molecule approach and opens the way to the discovery and marketing of a whole new class of drugs for the treatment of incurable diseases (Ratni et al. 2021). Despite these important examples, which will be detailed in the following sections, RNAs remain underexploited as biological targets of small-molecule drugs. Among the obstacles to the discovery of this kind of therapeutic tool, three represent major hurdles. First, while structured RNAs adopt a structure much more similar to that of a protein than to that of double-stranded DNA, these targets are built from a succession of only four bases linked to a negatively charged backbone, which limits the number of possible interactions. Second, rational methodologies for the design of specific ligands remain limited in the literature, and most of the known RNA ligands have been identified by screening large libraries (Childs-Disney et al. 2022). Third, structured RNAs exist as a mixture of different isomers and conformations and are present in solution as a dynamic ensemble (Ganser et al. 2019). The distribution of these conformations is affected by the environment (salt concentration or protein partners) and strongly influences the biological functions of these RNAs.
Some of these conformations are more stable than others, but the existence of this dynamic ensemble can limit the structure-based design of specific ligands and, more importantly, makes the interactions actually formed more difficult to understand. Despite these limitations, as mentioned above, the discovery of small molecules targeting RNA is an ever-growing field of research that holds promise for the discovery of new efficient therapies for a large number of pathologies in the near future (Childs-Disney et al. 2022). In parallel with the development of new drugs, new methodologies to overcome the lack of rational design possibilities have been developed, ranging from high-throughput screening to up-to-date in silico and informatic tools. One of the most recent landmarks in the field of RNA binding by small molecules is the development of RIBOTACs (Haniff et al. 2020). These tools are built from a specific RNA binder covalently linked to an RNase recruiter in order to induce not only the inhibition of the RNA target’s function but also its degradation, in line with what is done with PROTACs against proteins. In this chapter, we will thus describe the milestones that illustrate the relevance of RNA targeting, focusing on the small-molecule approach for the targeting of therapeutically relevant RNAs, together with the most recent strategies that have been developed to compensate for the lack of rational methodologies for the discovery of such chemical tools.

RNA As a Therapeutic Target

RNA is a particularly intriguing target since, despite its pivotal biological role and the large number of possible clinical applications, it remains underexploited in medicinal chemistry. As mentioned above, this is changing thanks to new small-molecule RNA binders entering the market and to recent developments in the delivery of oligonucleotide-based therapies. Indeed, after having been considered a simple intermediate in the gene expression process for many decades, RNA is now known to play many essential roles in a wide range of biological processes, from transcription and translation to the regulation of gene expression. Furthermore, the involvement of RNA in a large panel of diseases has highlighted its major potential as a drug target. Like DNA, RNA has a primary structure constituted of a chain of nucleotides (Fig. 2).

Fig. 2 Primary and secondary structure of RNA


Each nucleotide is composed of a nucleobase (purine or pyrimidine) in the 1′-position of a ribose linked to the flanking nucleotides by phosphodiester bonds. The nucleobases adenine, guanine, uracil, and cytosine can interact with each other via the formation of Watson-Crick-Franklin hydrogen bonds, thus leading to the characteristic A-form RNA duplex, with the highly negatively charged phosphate backbone in contact with the aqueous medium surrounding the nucleic acid. These local intramolecular interactions constitute the secondary structure of RNA. When secondary structure elements interact with each other through space, the tertiary structure is formed and leads to RNA folding, giving rise to typical three-dimensional structures. The latter combine single-stranded regions with double-stranded ones and offer the possibility of binding small molecules. Indeed, the formation of stem regions associated with bulges and internal/terminal loops creates binding sites at the junctions of these elements. It is important to note that biologically and therapeutically relevant RNAs are present not only in eukaryotes but also in bacteria and viruses, thus offering a panel of targets for a wide range of pathologies in which these RNAs are deregulated or in which their function can be modulated by drugs. In the following sections, the RNAs that have been exploited for small-molecule targeting will be described together with their sequences, structures, and functions. The examples described in the following sections allow for a better understanding of the field and illustrate future perspectives for RNA targeting by small molecules. They are not meant to be exhaustive of what is reported in the literature, but to illustrate the strategies that have been developed in recent years for the discovery of RNA ligands and the perspectives they open.
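The Watson-Crick-Franklin pairing rule described above, which also underlies the recognition of an RNA strand by an antisense oligonucleotide, can be illustrated with a minimal Python sketch. The function names and the example sequence are hypothetical and chosen only for illustration; G-U wobble pairs and all structural effects (loops, bulges, stacking) are deliberately ignored, so this is a toy model of sequence complementarity rather than a structure-prediction method.

```python
# Canonical Watson-Crick-Franklin partners in RNA.
# Assumption: G-U wobble pairs are ignored in this simplified model.
PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the strand (written 5'->3') that pairs antiparallel with `rna`."""
    return "".join(PAIRS[base] for base in reversed(rna.upper()))


def can_form_duplex(strand_a: str, strand_b: str) -> bool:
    """True if two 5'->3' strands are fully complementary in antiparallel orientation."""
    return strand_b.upper() == reverse_complement(strand_a)


if __name__ == "__main__":
    stem_5p = "GGCAGA"  # hypothetical 5' arm of a hairpin stem
    stem_3p = reverse_complement(stem_5p)
    print(stem_3p)                              # UCUGCC
    print(can_form_duplex(stem_5p, stem_3p))    # True
```

In this picture, a stem is simply a pair of segments for which `can_form_duplex` holds, while loops and bulges correspond to the stretches in between that have no partner.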

Targeting Bacterial RNAs

The bacterial ribosome has been largely exploited as a therapeutic target since the discovery of the first natural products acting as inhibitors of protein synthesis in bacteria thanks to their ability to bind ribosomal RNA (Wilson 2014). The prokaryotic ribosome is composed of three ribosomal RNAs (rRNAs), called 16S, 23S, and 5S, and of ribosomal proteins. The RNA that composes the ribosome, however, dominates the main functional sites. During protein synthesis, the small (30S) and large (50S) subunits form a 70S ribosome, and the accurate positioning of the mRNA start codon (usually AUG), together with the initiator tRNA (typically fMet-tRNA), at the ribosomal P-site constitutes the initiation step. During the elongation phase, an aminoacylated tRNA (aa-tRNA) is delivered to the A-site of the ribosome. Peptide-bond formation occurs between the amino acids attached to the tRNAs in the A- and P-sites, resulting in transfer of the amino acid (or the polypeptide chain in later elongation cycles) from the P-site tRNA to the aa-tRNA in the A-site. In order to continue protein synthesis using a further aa-tRNA, the tRNAs are moved from the A- and P-sites to the P- and E-sites in a process known as translocation. The nascent protein then enters the exit tunnel of the 50S subunit before emerging into the cytoplasm, where protein folding occurs. The elongation cycle continues until a stop codon is encountered, thus inducing the hydrolysis of the peptidyl-tRNA bond and releasing the polypeptide chain from the ribosome. The complex is then disassembled, thus allowing the components to be recycled for the next round of translation initiation.

During the golden era of antibiotics, many classes of natural compounds were identified as binders of prokaryotic ribosomal RNA at different sites, thus inducing the inhibition or the impairment of protein synthesis. Aminoglycosides were the first antibiotics discovered as inhibitors of protein synthesis, with Streptomycin being the first, identified in 1944 from Streptomyces griseus (compound 1, Fig. 3) and clinically employed for the treatment of tuberculosis. Since then, a number of different aminoglycosides have been discovered. Some of these compounds are part of the Streptomyces group (“mycin” aminoglycosides, such as neomycin, kanamycin, and tobramycin, compounds 2, 3, and 4, respectively, Fig. 3) or of the Micromonospora group (“micin” aminoglycosides, such as gentamicin and sisomicin, compounds 5 and 6, respectively, Fig. 3). Other aminoglycosides have been developed through chemical modification of existing aminoglycoside scaffolds, such as amikacin (compound 7, Fig. 3) and plazomicin (compound 8, Fig. 3). The latter was engineered to overcome aminoglycoside-modifying enzymes (AMEs), the most common aminoglycoside resistance mechanism, and is the most recently approved aminoglycoside.

Fig. 3 Chemical structure of aminoglycosides with examples of the “mycin” group and the “micin” group

The discovery of aminoglycosides was followed by that of tetracyclines, amphenicols, macrolides, and lincosamides, as well as other inhibitors, all these compounds being RNA binders able to target a particular site on the ribosome. Tetracycline molecules are constituted by a linear fused tetracyclic scaffold onto which a variety of functional groups are attached (Nguyen et al. 2014). It is well established that tetracyclines inhibit bacterial protein synthesis by binding near the ribosomal A-site and preventing the association of aminoacyl-tRNA with the bacterial ribosome. Binding of tetracyclines to rRNA is mainly due to the formation of electrostatic and hydrogen bonding interactions with the phosphate oxygens of the backbone rather than with specific rRNA bases, and it therefore lacks sequence specificity. Doxycycline, oxytetracycline, and minocycline are representative examples of this class (compounds 9, 10, and 11, respectively, Fig. 4). Macrolides are a family of naturally occurring 12- to 18-membered macrocyclic lactones and typically contain one or more deoxy-monosaccharides (Kanoh and Rubin 2010). Representative examples are erythromycin and azithromycin (compounds 12 and 13, Fig. 5). Macrolide antibiotics interact with the large subunit 23S rRNA in the upper portion of the peptide exit channel close to the peptidyl transferase center (P-site), thus inhibiting the release of the nascent protein from the ribosome. Furthermore, it has been demonstrated that macrolides inhibit the assembly of ribosomal subunits and proteins, thus leading to nucleolytic degradation.

Fig. 4 Chemical structure of three tetracyclines

Fig. 5 Chemical structure of two macrolide antibiotics: erythromycin (12) and azithromycin (13)


Lincosamides, such as lincomycin and clindamycin (compounds 14 and 15, Fig. 6), share a similar mechanism of action with macrolides, inhibiting the peptidyl-transferase reaction on the 50S ribosomal subunit (Spizek and Rezanka 2017). The most recent class of this kind of antibiotics is that of the synthetic oxazolidinones. These compounds, such as linezolid (compound 16 in Fig. 6), inhibit protein synthesis by binding to the P site of the ribosomal 50S subunit. The molecular basis of the interactions formed by these compounds with the RNA target has been well explored thanks to a large panel of biochemical and structural biology studies (Falese et al. 2021). While the majority of these compounds target the elongation cycle, each class of ligands is selective for a particular site on the ribosomal RNA structure. Regardless of their position on the ribosome and the resulting consequences, all these antibiotics tend to bind to internal loops and bulged regions where the RNA double helix is distorted. Aminoglycosides not only form strong electrostatic interactions with the negatively charged RNA backbone, but also establish specific hydrogen bonds and hydrophobic ring stacking, as observed in crystal structures. Tetracycline interactions with ribosomal RNA are dominated by hydrophobic interactions formed by the tetracyclic system, while the hydrophilic side of tetracycline interacts with the phosphate–sugar backbone. Macrolides bind in the exit tunnel through hydrogen bonds and van der Waals interactions. Hydrogen bonds and van der Waals interactions also control lincosamide binding to the ribosome, while hydrogen bonds and stacking interactions are the most common in the interaction of oxazolidinones with the ribosome (Wilson et al. 2008). Magnesium ions are involved in the binding of all these antibiotics and determine their positioning as well as their effect on the RNA structure (Yamagami et al. 2021).
Even though these compounds have been largely studied as RNA ligands, they still represent an important source of inspiration for the design of new ligands against different targets, especially in the case of aminoglycosides. Furthermore, these antibiotics still form the main basis of current knowledge about RNA targeting and have served as invaluable therapeutics and chemical probes for translation processes.

Fig. 6 Chemical structure of lincosamides clindamycin (14) and lincomycin (15) and of the oxazolidinone linezolid (16)

Besides bacterial ribosomal RNA, which, as described above, is one of the most studied and exploited RNA targets so far, bacterial riboswitches represent an interesting and relevant target for the development of antibiotics. Bacterial riboswitches are noncoding RNA structural elements that direct gene expression in numerous metabolic pathways. A large number of riboswitches have been discovered over the years, and each one can control a number of genes, thus constituting an important regulation mechanism in prokaryotes. Their involvement in bacterial proliferation led to efforts to develop small molecules that mimic natural riboswitch ligands to inhibit metabolic pathways and bacterial growth. Various approaches have been developed for the discovery of riboswitch inhibitors, including not only high-throughput and fragment-based screening but also structure-guided ligand design (Vicens et al. 2018). However, most of the identified compounds were active in vitro but showed no activity in whole-cell bacterial growth assays. A major example of these efforts is the discovery of ribocil, a small synthetic compound targeting the flavin mononucleotide (FMN) riboswitch (Howe et al. 2015). The latter is one of the most studied targets among riboswitches since it regulates the concentration of riboflavin (vitamin B2) and controls the expression of genes coding for proteins involved in the biosynthesis and transport of this essential vitamin. Riboflavin is the starting material for the production of flavin adenine dinucleotide, which is extremely important for prokaryotic metabolism. A high-throughput screening for ligands of the FMN riboswitch revealed that ribocil (compound 17, Fig. 7) was an efficient inhibitor causing a dose-dependent reduction of riboflavin levels in E. coli at low concentrations (IC50 = 0.3 μM). The S enantiomer of ribocil was responsible for this activity and for the inhibition of bacterial growth.
Various analogs have also been developed, and a detailed study of ribocil D showed that this compound, similarly to its analogs, forms π-stacking interactions with the target together with two pivotal hydrogen bonds between the carbonyl and the 2′-OH of adenosine 48 and the exocyclic NH2 of adenosine 99 (Howe et al. 2016).

Fig. 7 Chemical structures of the riboswitch ligands flavin mononucleotide (FMN), ribocil A (17), and ribocil D (18)

The examples described in this section show that various kinds of compounds can be employed for RNA targeting in bacteria with great clinical success, and that specific interactions can be identified both for highly hydrophilic and positively charged compounds, such as aminoglycosides, and for more hydrophobic molecules, such as linezolid or ribocil. The understanding of the detailed mechanism of action of this kind of antibiotics also opened the way to their application to other RNA targets and to the design of original compounds devoted to the binding of viral noncoding RNAs.
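To give a feel for what the IC50 of 0.3 μM quoted above for ribocil implies, the following Python sketch evaluates a simple Hill-type dose-response curve. The model and the Hill coefficient of 1 are assumptions made here for illustration only; they are not taken from the cited studies, and the function name is hypothetical.

```python
def fraction_inhibited(conc_um: float, ic50_um: float, hill: float = 1.0) -> float:
    """Fractional inhibition at `conc_um` under an assumed Hill dose-response model.

    By construction, the result is 0.5 when the concentration equals the IC50.
    """
    if conc_um < 0 or ic50_um <= 0:
        raise ValueError("concentration must be >= 0 and IC50 must be > 0")
    ratio = (conc_um / ic50_um) ** hill
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    RIBOCIL_IC50_UM = 0.3  # value quoted in the text for E. coli
    for conc in (0.03, 0.3, 3.0):
        inhibition = fraction_inhibited(conc, RIBOCIL_IC50_UM)
        print(f"{conc:>5} uM -> {inhibition:.0%} inhibition")
```

Under these assumptions, a tenfold change in concentration around the IC50 moves the predicted effect from roughly one tenth to roughly nine tenths of the maximal inhibition, which is why IC50 values are a convenient single-number summary of potency.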

Targeting Viral RNAs

A number of pathogenic viruses, such as the human immunodeficiency virus (HIV), hepatitis C virus (HCV), Dengue virus, or even SARS-CoV-2, bear an RNA genome that can be targeted by small molecules (Hermann 2016). Indeed, noncoding RNAs (ncRNAs) can be found in viral genomes and transcripts, and due to their three-dimensional folded structures, they represent potential targets for the development of antiviral drugs. The most advanced studies in the targeting of viral RNA structures by small-molecule ligands are described in the context of HIV (Blond et al. 2014) and HCV (Dibrov et al. 2014), but a few examples of RNA ligands targeting other RNA viruses have also been reported (Bottini et al. 2015; Park et al. 2011). As a widely studied example, the replication of HIV relies on specific RNA structures and RNA/protein interactions that, being crucial for viral proliferation, represent potential therapeutic targets. One of the most studied sequences is the HIV-1 transactivation response (TAR) element, which is necessary for HIV replication (Fig. 8a). TAR RNA is a stem-loop RNA that forms spontaneously within the first 59 nucleotides of each viral transcript and plays an essential role during transcription thanks to its interaction with the viral protein Tat along with positive transcription elongation factor b (P-TEFb), composed of cyclin T1 and cyclin-dependent kinase 9 (CDK9). Tat interacts with TAR through an arginine-rich region that recognizes the bulged region of TAR. It has thus been suggested that the inhibition of the Tat/TAR interaction would lead to the inhibition of viral replication.

Fig. 8 (a) Primary sequence and secondary structure of the 59-mer HIV-1 TAR RNA in complex with Cyclin T1 and CDK9. The gray circle represents the binding site of Tat peptide; (b) chemical structure of ligands 19–22 targeting TAR

Numerous RNA ligands that can prevent this association have been identified, and TAR RNA serves as both an exciting target for the search for anti-HIV drugs and an invaluable model system for studying ligand binding to RNA. Even though the TAR RNA target has received much attention in the literature, only a small number of compounds have been tested in intracellular assays. This is likely due to a general lack of selectivity. In fact, the majority of the RNA-binding molecules identified up to this point have positively charged chemical structures, which restricts their selectivity. The compounds that demonstrated a selective effect in infected cells, and so supported the viability of the small-molecule approach, are discussed in the following paragraphs. The design of novel RNA ligands has long been inspired by aminoglycoside antibiotics. In particular, the aminoglycoside neomycin binds to a variety of structured RNAs and has been used as a starting point for the design and synthesis of new RNA ligands against TAR. According to ESI-MS results and ribonuclease footprint assays, neomycin binds to three distinct allosteric locations on TAR RNA that are close to the Tat binding site. Based on this structural information, Arya and colleagues designed dimeric neomycin derivatives that may bind TAR RNA at two sites distinct from the Tat binding site (Kumar et al. 2012). When these dimers bind, they fix TAR in a specific conformation (increasing the melting temperature of TAR RNA by up to 10 °C; see compound 19, Fig. 8b), blocking Tat interaction.
Compound 19 is a potent binder (KD in the nM range) as well as an inhibitor of HIV proliferation in infected MT-2 cells, with 70% inhibition at a low micromolar dose (9 μM), while toxicity is seen at concentrations twice that of the active concentration. This work shows that rational multimer ligand design can result in effective RNA-targeting compounds. On the basis of quantitative predictions of the binding energies for aminoglycoside derivatives that bind to various RNA conformations, molecular docking has also been employed to find TAR small-molecule binders (Stelzer et al. 2011). Six highly specific binders of the Tat/TAR complex were found, and the most effective one, netilmicin, has a KD of 1.35 μM and is selective for TAR in the presence of a significant excess of tRNA (compound 20 in Fig. 8b). In TZM-bl cell lines infected with the HIV-1 NL4-3 strain, it is able to suppress Tat-mediated activation by 81% and HIV replication with an IC50 of 23.1 μM. This inhibition was verified in the HUT-78 T-cell line, with toxicity manifesting at a netilmicin concentration of 100 μM. Despite these encouraging findings, the majority of these aminoglycoside binders continue to exhibit poor selectivity and/or undesirable physicochemical features that make them unsuitable for therapeutic use. To overcome these limitations, new conjugates have been described that combine amino acids, which are the natural RNA binders because they are components of proteins, and artificial nucleobases, which are known to form specific hydrogen bonds with DNA and RNA base pairs in order to form base triplets (Joly et al. 2014). A variety of TAR RNA ligands with low micromolar affinity were obtained using structure-based ligand design. The compound with the highest activity was a conjugate of a phenyl-thiazole moiety and histidine (compound 21 in Fig. 8b), which has a KD of 17.5 μM for TAR, is selective for TAR versus tRNA or DNA, and, more importantly, inhibits viral proliferation in MAGIC-5B cells infected with the HIV-1 NL4-3 strain with a submicromolar IC50 of 0.41 μM, without toxicity at the highest tested concentration (100 μM). Schneekloth and colleagues also reported a new strategy based on small molecule microarrays (SMM) to find new chemotypes that might bind and stabilize TAR. A novel chemotype, specifically thienopyridine 22 (Fig. 8b), was discovered by the authors after screening a library of 20,000 compounds (Sztuba-Solinska et al. 2014). Thienopyridine 22 best stabilizes TAR, with a KD of 2.4 μM, due to the presence of hydrophobic and aromatic substituents on a heterocyclic core. In T-lymphoblast-based anti-HIV assays (CEM-SS), compound 22 reduced HIV-induced cytopathicity with an EC50 of 28 μM without causing toxicity even at concentrations as high as 1 mM. The search for TAR ligands able to inhibit the Tat/TAR interaction continues, since TAR remains a very promising target even if antiviral drugs acting through TAR binding have not been discovered so far. Recent works reported new ligands and new methodologies to discover this kind of compound, but no cellular assays proved the actual antiviral activity of the newly identified RNA binders (Martin et al. 2020; Paul et al. 2020). Similarly, other extensive screenings have been used to find RNA ligands that can block Rev/RRE, another critical interaction in the HIV replication cycle. Rev protein and RRE RNA work together to generate a host export complex that contains mRNAs that will be translated into viral proteins like Gag, Tat, Nef, and Rev itself (Fig.
9a), making it a suitable target for RNA ligands (Cao et al. 2009).

Fig. 9 (a) Primary sequence and secondary structure of HIV-1 RRE RNA. Yellow circle represents the binding site of Rev peptide; (b) chemical structure of ligands 23–25 targeting RRE

As an example of an effective search for RRE binders, Garvey and colleagues presented a high-throughput scintillation proximity assay using biotinylated Rev protein and tritiated RRE RNA for the screening of 500,000 compounds (Chapman et al. 2002). The screen yielded numerous hits, and several compounds were identified as inhibitors of the Rev/RRE interaction with EC50 values in the low micromolar range. Unfortunately, intracellular studies showed that the majority of these substances were harmful to the host cells. Numerous compounds able to prevent the formation of the Rev-RRE complex were later identified by screening 1120 FDA-approved small-molecule drugs in an assay based on the displacement of Rev from RRE. Clomiphene and cyproheptadine (23 and 24, Fig. 9b) were the two most effective drugs, with KD values in the low μM range, high selectivity in the presence of tRNA and DNA, and IC50 values of 3.7 and 4.2 μM, respectively, for the in vitro suppression of the Rev/RRE interaction. Notably, they are able to decrease HIV-1 gene expression and viral transcription in intracellular experiments with EC50 values of 4.3 and 4.2 μM, respectively. Benfluron (compound 25 in Fig. 9b) was identified by the same authors through a broader screening of 10,000 different drug-like compounds as an effective inhibitor of the Rev/RRE complex both in vitro and in cells (Prado et al. 2018). This compound prevents HIV-1 replication with a very good EC50 of 830 nM. The Gag/SL3 interaction is another potential target for anti-HIV-1 drug discovery. HIV-1 Gag proteins are essential for virus assembly, release, and maturation as well as for the development of a productive infection (Waheed and Freed 2012). The majority of their interactions with RNA occur when the viral RNA genome is encapsidated during the assembly process, between the Gag nucleocapsid (NC) site and a stable helix known as SL3 of the highly conserved 5′-UTR (Fig. 10a). A high-throughput screening revealed that compound NSC260594 (26 in Fig. 10b) was able to inhibit the SL3/Gag interaction (Bell et al. 2013).
This compound can bind to the SL3 terminal loop, where Gag is known to interact. By stabilizing SL3, it prevents Gag from interacting with SL3, which reduces the conformational

Fig. 10 (a) Primary sequence and secondary structure of the HIV-1 5′-UTR, which consists of a series of stem-loops (SL1-SL3). Green circle represents the binding site of Gag protein; (b) chemical structure of ligand 26 that inhibits SL3/Gag interaction

1030

M. Bonnet and M. Duca

flexibility of the RNA’s broader backbone structure. When 293T cells were treated with 50 μM of compound 26 (IC50 = 4.5 μM), this ultimately resulted in the specific inhibition of RNA incorporation into virions without impacting any other step of viral replication. These examples demonstrate that it is possible to identify chemical compounds that can effectively block different intracellular RNA/protein interactions.

An important step in HIV replication is the dimerization of the viral genomic RNA. The strongly conserved RNA stem-loop structure known as the dimerization initiation site (DIS) is found in the 5′-noncoding region of the genome. A six-nucleotide self-complementary region found in the nine-nucleotide apical DIS loop aids in genome dimerization by generating a loop-loop RNA kissing complex. Much structural work has been devoted to DIS because of its functional significance for viral replication, which makes it an appealing therapeutic target. Except for natural or modified aminoglycosides, very few small compounds have been identified as effective DIS binders with the capacity to stabilize the kissing complex. For instance, neomycin, paromomycin, and lividomycin bridge the two HIV-1 RNA molecules in infected cells as well as in vitro to stabilize the kissing-loop complex (Ennifar et al. 2006). However, even though these substances stop the kissing-loop complex from dissociating in infected cells, they have no antiviral action. This is likely due to their low penetration into eukaryotic cells or to the fact that stabilizing DIS is insufficient to stop HIV-1 replication.

Hepatitis C virus (HCV) is another RNA virus that has been extensively studied as a potential target for RNA binders (Hermann 2016). An internal ribosome entry site (IRES) in the 5′ untranslated region (UTR), upstream of the open reading frame, controls the translation of the positive-sense RNA genome of the pathogenic flavivirus HCV.
The highly organized HCV IRES is an ncRNA element that directly recruits host cell ribosomes at the viral start codon without the need for the majority of translation initiation factors. The eukaryotic initiation factor 3 (eIF3), a sizable multiprotein complex necessary to avoid premature interaction of the 40S and 60S ribosomal subunits, is first recruited by the IRES element. Due to its distinct role and high conservation in clinical virus isolates, the IRES RNA is a promising target for HCV translation inhibitors. Three folding domains (II–IV) are joined by flexible linkers and contribute to IRES structure (Fig. 11a). Small compounds have been found to target two domains: domain II, which facilitates the release of eIF2-GDP, a crucial initiation factor, from the 40S subunit to encourage 80S ribosomal assembly, and domain III, which is crucial for the interaction with the 40S ribosomal subunit. The assays used to find IRES inhibitors are primarily based on replicons, providing information on the toxicity of the compounds as well as their capacity to inhibit IRES-mediated translation in cells. Among the small molecules discovered thus far, phenazine-like compounds such as 27 (Fig. 11b) demonstrated a specific and effective inhibition of IRES activity at 79 nM in HCF cells (pooled Huh-7 cells stably transfected with a molecular construct CAT-IRES-Fluci under a CMV promoter) (Wang et al. 2000). Biaryl guanidines were discovered to be selective inhibitors after further screening of over 180,000 compounds against structured RNA targets (Jefferson et al. 2004).


Fig. 11 (a) Primary sequence and secondary structure of HCV IRES RNA. Purple circle represents the major interaction site with small molecules; (b) chemical structure of ligands 27–31 targeting IRES

Compound 28 was refined as a 2 μM inhibitor of HCV-IRES translation as a result of an SAR investigation (Fig. 11b). Griffey and colleagues discovered imidazole derivatives as potential IRES binders and HCV replication inhibitors using MS-based high-throughput screening targeting a 29-mer of subdomain IIa (Seth et al. 2005). New analogs with low toxicity in an HCV replicon assay, such as 29 (Fig. 11b), were discovered through the analysis of the structure-activity relationships. Hermann and colleagues selected compound 30 (Fig. 11b) from this group of imidazole derivatives for additional research on the molecular mechanism of action (Parsons et al. 2009). A combination of fluorescence-based investigations on wild-type and mutant IIa RNAs showed that 30 acts as an HCV-specific translation inhibitor by conformational induction, causing a widening of the RNA interhelical angle at IRES subdomain IIa. The in vitro nM affinities and μM intracellular activities showed good correlation. The 2.2 Å resolution 3D structure of IIa in complex with 30 also revealed that the RNA undergoes a substantial structural rearrangement induced by the ligand, forming a deep pocket similar to the substrate binding sites in riboswitches (Dibrov et al. 2012). The same researchers also investigated a number of 3,5-diaminopiperidine heterocycles as IIa ligands and identified nM binders like 31 (Fig. 11b) that had selective activity in a cell-based HCV replicon assay (Carnevali et al. 2010). These modular ligands influence HCV translation by arresting subdomain IIa in a 90° bent state and preventing translation initiation, in contrast to benzimidazole inhibitors, which cause a wider interhelical angle in subdomain IIa leading to inhibition of IRES-driven translation.

Research on HIV and HCV validated the RNA-targeting strategy and allowed for the evaluation of its viability and limitations in the context of antiviral therapy. As a


result, a few more RNA viruses have been studied recently. For instance, the 5′ and 3′ ends of each RNA segment in influenza A viruses are highly conserved and fold to form a partial duplex, which is known as the promoter and is recognized by the RNA-dependent RNA polymerase (RdRp). According to an NMR-based fragment screening, 6,7-dimethoxy-2-(1-piperazinyl)-4-quinazolinamine (DPQ) binds to this RNA structure with a KD of 50.5 μM and suppresses viral replication in infected MDCK cells with an EC50 ranging from 71 to 275 μM, depending on the viral strain (Lee et al. 2014). Cellular activity in the micromolar range was improved through additional screenings and SAR studies. Together, these studies demonstrate how difficult it can be to identify novel antiviral approaches based on the use of RNA ligands, despite the fact that many RNA targets can be effectively targeted and bound by small-molecule ligands. The recent advances in the understanding of RNA targeting by small molecules, which are discussed in the following sections, will undoubtedly add to the arsenal of binders already in use and lead to new discoveries in the search for novel antiviral drugs for emerging and/or incurable viral infections, especially the recent SARS-CoV-2 infections, as described in the section “Current Trends for the Development of Innovative Chemical Tools for RNA Targeting.”
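
The affinity and activity values quoted above can be put on a common footing with a back-of-the-envelope calculation. The following Python sketch assumes an idealized 1:1, non-cooperative binding model with the ligand in large excess over the RNA; this model is an assumption made here for illustration and is not taken from the cited studies. The function name `fraction_bound` is hypothetical.

```python
def fraction_bound(ligand_uM: float, kd_uM: float) -> float:
    """Fractional occupancy of a single binding site: [L] / ([L] + KD).

    Assumes 1:1 binding and free ligand in excess (illustrative model only).
    """
    if ligand_uM < 0 or kd_uM <= 0:
        raise ValueError("ligand must be >= 0 and KD must be > 0")
    return ligand_uM / (ligand_uM + kd_uM)


# DPQ binding to the influenza A promoter: KD = 50.5 uM (Lee et al. 2014),
# evaluated at the two ends of the reported EC50 range (71 and 275 uM).
for conc in (71.0, 275.0):
    occupancy = fraction_bound(conc, 50.5)
    print(f"{conc:6.1f} uM -> {occupancy:.2f} of sites occupied")
```

Under these assumptions the occupancy is about 0.58 at 71 μM and about 0.84 at 275 μM, so the reported cellular EC50 values lie in the same concentration range as the in vitro KD. Cellular activity also depends on uptake, off-target binding, and the viral strain, none of which this simple isotherm captures.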

Targeting Eukaryotic RNAs

Besides bacterial and viral RNAs, various eukaryotic RNAs have also been identified as involved in diseases and as potential targets for innovative therapies. Major examples are expanded RNA repeats and oncogenic noncoding RNAs such as microRNAs (miRNAs or miRs), which will be detailed in this section. New potential targets, such as long noncoding RNAs (lncRNAs), have also appeared in the literature and will be detailed in section “Current Trends for the Development of Innovative Chemical Tools for RNA Targeting.”

RNA Nucleotides Repeats

Numerous diseases have been linked to expanded RNA repeats, which have since been explored as possible targets for small-molecule RNA ligands with the hope of developing new treatments. So far, over 25 human genes have been found to have tandem repeat expansions. These disease-causing repeats can be found in coding regions, as in Huntington’s disease (HD), in introns, as in spinocerebellar ataxia type 10 (SCA10) and myotonic dystrophy type 2 (DM2), and in 5′ and 3′ untranslated regions (UTRs), as in fragile X-associated tremor ataxia syndrome (FXTAS) and myotonic dystrophy type 1 (DM1). Even though the detailed molecular causes of these illnesses are still not fully understood, it is now known that expanded repeats in RNA transcripts can cause cellular toxicity and neurodegeneration by perturbing the splicing machinery and resulting in the synthesis of abnormal proteins. In fact, splicing proteins are sequestered by RNA nucleotide repeats, and the resulting


splicing errors disrupt a variety of biological processes. Small compounds that target these RNAs and prevent them from affecting the splicing factors would limit the associated dysfunctions and could be a promising therapeutic strategy. The majority of small compounds that have been identified to target these particular RNA structures have been designed to target DM1 and DM2. The expansion in DM1 is an rCUG triplet repeat (r(CUG)exp) situated in the 3′ untranslated region of the dystrophia myotonica protein kinase (DMPK) gene, while the expansion in DM2 is an rCCUG quadruplet repeat (r(CCUG)exp) in intron 1 of the zinc finger 9 protein (ZNF9) gene (Fig. 12a). Both RNAs fold into hairpin structures in which two 5′GC/3′CG nucleotide pairs separate periodically repeated internal loops: the DM1 repeat creates 1 × 1 nucleotide UU loops, and the DM2 repeat creates 2 × 2 nucleotide 5′CU/3′UC loops. Even though these mutations are found in genes that are not functionally related to one another, DM1 and DM2 have comparable disease symptoms and share a common biochemical mechanism: sequestration of the crucial splicing regulator Muscleblind-like 1 protein (MBNL1) by expanded RNA repeats. A better understanding of the molecular origins of DMs made it possible to develop strategies focused on targeting the RNA repeats to prevent the formation of the MBNL1 complex. In early reports concerning small compounds as specific nucleotide repeat binders, modularly constructed ligands were characterized as being especially well suited for this purpose. As an illustration, the aminoglycoside derivative 6′-N-5-hexynoate kanamycin A

Fig. 12 (a) r(CUG)exp and r(CCUG)exp found in DM1 and DM2, respectively; (b) chemical structure of 6′-N-5-hexynoate kanamycin A and of ligands 32–37 selective for RNA nucleotide repeats


(compound 32, Fig. 12b), which was discovered to be a tight binder of pyrimidine-rich internal loops similar to those generated in DM1- and DM2-related repeats, is a good example (Childs-Disney et al. 2007). This compound was assembled onto a peptoid backbone and shows an in vitro IC50 for the inhibition of the MBNL1/rCUG repeat interaction in the low nanomolar range. To boost the selectivity for rCUG repeats over other types of repeats, the spacing of the ligand modules was precisely tuned (Lee et al. 2009). However, limited cell permeability is the main drawback of these ligands and severely restricts their intracellular and in vivo application. The addition of a D-Arg9 (DR9) transporter conjugated to these compounds, as in 33, significantly increased cellular penetration and resulted in a second generation of modularly constructed compounds that were successful in cell culture and animal models of DM1 (Childs-Disney et al. 2012b). Compound 33 was able to correct the DM1-related abnormalities in cells at micromolar doses, with an in vitro IC50 for the disruption of the r(CUG)10-MBNL1 complex of 240 nM. In a mouse model of DM1, splicing was partially rescued as well (Childs-Disney et al. 2012b). These kanamycin A derivatives were further modified in order to favor DM2 repeats over DM1 repeats by changing the number of modules (Childs-Disney et al. 2014). To build on these discoveries and to overcome the limited cell permeability of these multivalent compounds, an in situ click chemistry approach based on the Huisgen 1,3-dipolar cycloaddition reaction was developed; it takes advantage of the better cell permeability of small modules and of the target’s capacity to catalyze the synthesis of the multivalent inhibitor inside cells (Rzuczek et al. 2014).
Treatment of cells with equimolar azido- and alkyne-containing kanamycin moieties resulted in the formation of dimeric and trimeric reaction products, which had the desired intracellular effects specifically in DM2-affected cells.

To overcome the lack of selectivity due to the highly positive charge of aminoglycosides, peptides as well as aromatic and heteroaromatic moieties were employed as inhibitors of the interaction between rCUG and MBNL1. In this context, Hoechst 33258 and pentamidine are intriguing ligands that can be used as queries in a chemical similarity-searching strategy. This kind of methodology was used to find small compounds with similar shapes and/or positions of functional groups, and then to investigate the capacity of these analogs to bind rCUG repeats and suppress DM1-related defects (Parkesh et al. 2012). For instance, this research allowed for the discovery of Hoechst derivative 34 (Fig. 12b). With a KD of 70 nM and the capacity to prevent the formation of the MBNL1/rCUG complex in the low micromolar range, this substance was the most effective rCUG repeat binder. However, intracellular activity was shown to be in the mM range, most likely due to problems with selectivity or membrane permeability. Then, as was done for kanamycin, other Hoechst derivatives were used to create modularly built molecules. As a result, compound 35 was identified and was able to improve the defects brought on by trinucleotide repeats, including errors in alternative splicing and translation, and to disrupt nuclear foci formation in cells (Childs-Disney et al. 2012a). Additional research on this kind of compound showed that potency and bioactivity are strongly influenced by the scaffold’s composition. The synthesis of modular ligands has also been based on the use of


polyamines, α-peptides, β-peptides, and peptide-tertiary amides (PTAs). The latter represent the best scaffold in terms of intracellular activity, cellular permeability, stability against proteases, and toxicity. Adding an electrophilic nucleic acid-reactive module, such as chlorambucil, significantly increased the potency of 35 by creating a covalent bond with the target (Guan and Disney 2013). The mutant DMPK allele harboring r(CUG)exp was specifically targeted, and this adduct formation made it possible to identify the cellular RNA targets. Including a cross-linker thus made the RNA binder more effective once the covalent bond had formed. Target profiling in cells was made possible by the further inclusion of a cleavage domain, such as bleomycin, which is known to trigger RNA cleavage in vitro. An on-site probe synthesis strategy was also devised, in which small compounds bound to adjacent sites in r(CUG)exp react with each other. This proximity-based click reaction provided picomolar inhibitors only in DM1-affected cells (Rzuczek et al. 2017).

Compound 36 was also discovered by Zimmerman and colleagues to bind CUG repeats and to be a selective inhibitor of the CUG-MBNL1 interaction in vitro (Jahromi et al. 2013). This compound is composed of a well-known acridine DNA intercalator and of a triaminotriazine unit that recognizes U-U mismatches through Janus-wedge hydrogen bonding. It was rationally designed based on the X-ray structure of a short CUG sequence. Stacking of the triaminotriazine and acridine units reduces nonspecific, intercalative binding to duplex RNA, while the two faces of the triazine heterocycle can simultaneously form a full set of hydrogen bonds with the imperfectly paired uracil bases. Unfortunately, due to its limited water solubility and inability to pass through the cell membrane, no activity was seen in cells.
Conjugation of 36 to a cationic polyamine gave in vitro KD values of 5.2 nM and IC50 values of 15 μM, and intracellular activities in the high micromolar range were also observed (Jahromi et al. 2013). A structure-based method was also employed to find effective r(CUG)exp binders. This led to the identification of a new family of groove-binding ligands with two triaminotriazine units, including ligand 37 (Wong et al. 2014). Similar to compound 36, this ligand had in vitro IC50 values in the low micromolar range and KD values in the low nanomolar range; however, far greater concentrations (over 100 μM) were required to detect an intracellular effect.

In addition to the DM1 and DM2 repeats, the rCGG repeats linked to fragile X-associated tremor ataxia syndrome (FXTAS) have been investigated as potential small-molecule targets (Fig. 13a). FXTAS is caused by an RNA gain of function, since the sequestration of splicing proteins by expanded repeats induces the incorrect splicing of a variety of pre-mRNAs, leading to the production of faulty proteins. A high-throughput screening assay was performed to find inhibitors of the interaction between r(CGG)12 and the DGCR8Δ protein involved in the control of pre-mRNA splicing (Disney et al. 2012). In particular, compound 38 (Fig. 13b) was identified by this assay: it has an in vitro IC50 of 12 μM and improves the splicing defects in cells treated with a dose as low as 20 μM. However, significantly greater doses (over 100 μM) are required to detect significant intracellular effects. The synthesis of dimers like 39


Fig. 13 (a) r(CGG)exp found in FXTAS; (b) chemical structure of ligands 38 and 39 selective for RNA rCGG repeats; (c) r(AUUCU)exp found in SCA10; and (d) chemical structure of ligands 40 and 41 selective for RNA rAUUCU repeats

(Fig. 13b) using the same multivalent approach as that mentioned above for DM1 and DM2 resulted in a tenfold increase in in vitro and intracellular activity (Yang et al. 2016b). It is noteworthy that the comparison of these repeat-specific ligands with repeat-specific oligonucleotides showed that these particular RNAs are much better suited for targeting with small molecules than with oligonucleotides, because the latter affect not only the repeat sequence and its interactions but also the translation of downstream open reading frames (ORFs).

In order to target the expanded r(AUUCU) repeat that causes SCA10, an incurable neuromuscular illness, Disney and colleagues have recently concentrated on the search for compounds able to bind preferentially to AU base pairs. Indeed, stretches of AU base pairs, in particular repeated 5′AU/3′UA pair steps, can be found in these repeats (Fig. 13c). After screening a set of 104 compounds selected using chemical similarity searching for small molecules that should bind RNA internal loops, the authors identified benzamidine compound 40 (Fig. 13d) as a selective binder of AU pairs and of r(AUUCU)exp repeats, with a 5- to 15-fold selectivity over other repeats (Yang et al. 2016a). This compound has a KD of 300 nM and a binding stoichiometry of 11:1, which suggests that a modular approach like the one previously described for targeting r(CUG)exp may also be appropriate in this situation. Thus, a peptoid scaffold was used to construct a modular derivative of 40, resulting


in 41 (Fig. 13d). The latter is one of the most effective RNA ligands targeting nucleotide repeats and the first example of a bioactive molecule targeting r(AUUCU)exp. It greatly improves SCA10-related abnormalities when patient-derived fibroblasts are treated with 50 nM of this compound.

In conclusion, promising compounds able to target RNA repeat expansions efficiently and selectively were discovered by high-throughput screenings (HTS), chemical similarity searches, and rational design. Major intracellular effects are challenging to achieve, in part because the compounds with the highest biological activity in vitro have problems with solubility or cellular penetration, but also because binding to these repeats may not be enough to make a biologically meaningful impact. It is also possible that a lack of specificity is to blame for the sometimes high concentrations required to obtain the desired biological effect. Nevertheless, the small-molecule approach for targeting RNA repeat expansions remains very promising for the discovery of new drugs for these incurable diseases.
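
The repeat architectures discussed in this section can be made concrete with a small sketch. The Python snippet below is an illustration added here, not a folding algorithm from the cited work: it models the stem as two identical repeat tracts aligned antiparallel in perfect register, ignores the hairpin loop, G-U wobble pairs, and all thermodynamics, marks Watson-Crick pairs, and reports the size of the unpaired internal loops between them. The function names `stem_pattern` and `loop_sizes` are hypothetical.

```python
# Watson-Crick pairs only; G-U wobble pairs are deliberately ignored (assumption).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def stem_pattern(unit: str, n_repeats: int) -> str:
    """Pair a repeat tract with an identical antiparallel tract, in register.

    Returns a string with '|' for a Watson-Crick pair and '.' for a mismatch.
    """
    strand = unit * n_repeats
    partner = strand[::-1]  # opposite strand read 3'->5'
    return "".join(
        "|" if (a, b) in WATSON_CRICK else "." for a, b in zip(strand, partner)
    )


def loop_sizes(pattern: str) -> list[int]:
    """Lengths of consecutive mismatch runs, i.e. n x n internal loops."""
    return [len(run) for run in pattern.split("|") if run]


for unit in ("CUG", "CCUG", "CGG", "AUUCU"):
    pattern = stem_pattern(unit, 3)
    print(f"r({unit})3  {pattern}  loops: {loop_sizes(pattern)}")
```

Within this idealized model, r(CUG) tracts give single-nucleotide (1 × 1) U-U mismatches separated by two GC pairs, and r(CCUG) tracts give 2 × 2 loops, matching the description of the DM1 and DM2 repeats above. Real transcripts can adopt other registers and structures, so the output should be read as a mnemonic for the motif rather than a structure prediction.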

MicroRNAs

MicroRNAs are a family of short noncoding RNAs that are involved in the control of gene expression and play a pivotal role among physiologically relevant RNAs (Ambros 2008). These small RNAs, which have a length of 22–24 nucleotides, are found in all eukaryotic cells. The transcription of miRNA genes results in primary miRNAs (pri-miRNAs), which are several kilobases long and are the precursors for miRNA production (Kim et al. 2009). Pri-miRNAs are shortened to pre-miRNAs of about 70 nucleotides after processing by the ribonuclease Drosha. Following their transfer to the cytoplasm, these precursors are subsequently cleaved to a miRNA duplex by a second ribonuclease known as Dicer. After unwinding, the obtained miRNA participates in the formation of a multiprotein complex known as miRISC, which is able to recognize different mRNAs and inhibit their translation into proteins. The regulation of protein synthesis and cellular homeostasis depends on this physiological system, but the latter can be altered since miRNAs can be either overexpressed or underexpressed relative to healthy cells. Numerous illnesses, including cancer, neurological disorders, and cardiovascular diseases, have been related to these deregulation processes. Thus, restoring the physiological expression of dysregulated miRNAs by upregulating those that are underexpressed or downregulating those that are overexpressed could result in interesting therapeutic approaches. In order to target deregulated miRNAs upstream of the mature miRNA itself, various small molecule-based strategies have been developed. These strategies include blocking the transcription of deregulated miRNAs from the miRNA gene or the processing of pri- and pre-miRNAs by the Drosha and Dicer ribonucleases. More recently, compounds that interact with RNA/protein complexes along miRNA pathways (such as Lin28 interactions) have been identified.
The majority of these studies have focused on either inhibiting oncogenic miRNAs or stimulating the synthesis of tumor suppressor miRNAs. Reviews have recently reported the most important examples of small molecules targeting miRNAs, and


here we will illustrate the most important and recent examples (Di Giorgio et al. 2016; Warner et al. 2018; Donlic and Hargrove 2018). High-throughput screening remains one of the privileged approaches to identify small molecules that interfere with the synthesis of disease-relevant miRNAs. In this context, Deiters and colleagues described a screening against miR-122, a tumor suppressor in hepatocellular cancer, and against the broadly distributed oncogenic miR-21 (Gumireddy et al. 2008). This led to the discovery of compounds 42–44 (Fig. 14), which have an EC50 in the low μM range and can suppress the synthesis of miR-21 and miR-122 in cells. Another example is AC1MMYR2 (compound 45 in Fig. 14), identified by in silico screening, which selectively inhibits the production of oncogenic miR-21 in cancer cells, resulting in an anticancer effect in vivo in both glioblastoma and breast cancer orthotopic models (Shi et al. 2013). New RNA ligands were also designed in a more rational way to prevent the Dicer enzyme from processing oncogenic pre-miRNAs, leading to interesting biological activity in cells (Vo et al. 2016, 2018). The targeted miRNAs, miR-372 and miR-373, play a role in the emergence of several malignancies, including gastric cancer. The designed compounds were based on the conjugation of the aminoglycoside neomycin, known to interact with a variety of RNAs with high affinity, with artificial

Fig. 14 Chemical structure of ligands 42–52 that represent promising inhibitors of oncogenic miRNA biogenesis (indicated for each ligand) upon selective binding to miRNA precursors (pre-miRNAs and pri-miRNAs)


nucleobases that can interact with RNA base pairs selectively, such as nucleobase S in compound 46 (Fig. 14). With IC50 values in the low micromolar range, the combination of these RNA binding motifs enabled the inhibition of miR-372 processing in vitro and in cells. It was also noted that these compounds specifically inhibited the proliferation of gastric cancer cells overexpressing the targeted miRNAs. Through the addition of a third RNA binding domain, represented by basic amino acids, and with the appropriate spatial distribution, optimized compounds were synthesized and studied for their binding and biological activity (Maucort et al. 2021). Functionalized polyamines, such as compound 47 (Fig. 14), were also discovered as promising structures for the intracellular inhibition of oncogenic miRNAs after a screening for effective inhibitors of the same oncogenic miRNA (Staedel et al. 2018).

Disney and colleagues have reported the identification of highly effective compounds that specifically inhibit oncogenic miRNA production using a lead identification method they named Inforna (Angelbello et al. 2018; Disney and Angelbello 2016). They first created a library-versus-library screening strategy, in which a substantial collection of small-molecule RNA binders immobilized on microarrays is screened against a sizable collection of RNA motifs, such as bulges or internal loops. The Structure-Activity Relationships Through Sequencing (StARTS) approach, a statistical technique that scores binding interactions and predicts the affinity and selectivity of members of an RNA library, was then integrated with this two-dimensional combinatorial screening (2-DCS) (Velagapudi et al. 2010). The resulting lead identification method, Inforna, made it possible to find numerous compounds that have highly specific actions in cells and animal models against a particular miRNA (Velagapudi et al. 2014). One of the first examples is Targaprimir-96 (compound 48, Fig. 14), which binds to the target miR-96 hairpin precursor with a low nanomolar KD and has an intracellular IC50 of roughly 50 nM. Tumor growth was also inhibited in mice given 10 mg/kg of this compound by intraperitoneal injection. This inhibition was linked to the silencing of miR-96 and the recovery of the expression of its protein target (FOXO1) (Velagapudi et al. 2016). Similarly, Targapremir-210 (compound 49, Fig. 14) was identified as a binder that can block the synthesis of miR-210, a key regulator of the hypoxic response that influences the expression of hypoxia inducible factors (HIFs) in solid tumor masses (Costales et al. 2017). Glycerol phosphate dehydrogenase 1-like (GPD1L) enzyme levels are suppressed by miR-210. In triple-negative breast cancer cells and in a mouse xenograft model, Targapremir-210 binds to the pre-miR-210 Dicer processing site and modifies the miR-210 hypoxic circuit in a highly specific way (Costales et al. 2017). Inforna was recently applied to a library of natural products and extracts and allowed for the identification of nocathiacin I (NOC-I, compound 50, Fig. 14) as an inhibitor of the oncogenic noncoding miR-18 (Ye et al. 2022). Notably, NOC-I is able to bind to the RNA motif 5′GAU/3′C_A that is included in the pre-miR-18 structure, in particular in its Dicer-processing site. This leads to the inhibition of the biogenesis of this miRNA in a prostate cancer cell line and to the inhibition of the miR-18a-STK4 circuit that allows prostate cancer cells to evade apoptosis. Importantly, this biological activity is selective for this pathway without affecting other


miRNAs. In a similar way, Inforna allowed for the identification of a small molecule (compound 51, Fig. 14) able to selectively bind the pre-miR-200c structure and reverse a proapoptotic effect in a pancreatic β-cell model for the treatment of type 2 diabetes (Haniff et al. 2022). The miR-200 family consists of five miRNAs, and the obtained results demonstrated that it is possible to selectively inhibit one miRNA belonging to a family of miRNAs sharing sequence homology using small-molecule compounds. Using a different and unprecedented approach, Disney and coworkers also screened a DNA-encoded library of more than 70,000 ligands against a library of more than 4,000 RNA structures. This allowed for the identification of compound 52 (Fig. 14), which has nanomolar affinity for oncogenic primary miR-27 (pri-miR-27), inhibits the biogenesis of this miRNA, and rescues a migratory phenotype in triple-negative breast cancer (TNBC) cells (Benhamou et al. 2022).

In conclusion, targeting miRNAs using small synthetic compounds is a viable strategy to affect the proliferation of cancer cells and holds the potential for the discovery of new drugs in the near future. Challenges still have to be faced to reach this goal, but the results obtained so far illustrate that binding selectively to a functional site in one of the miRNA precursors can efficiently and specifically inhibit the production of oncogenic miRNAs and the proliferation of cancer cells.
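
The statistical scoring idea behind library-versus-library selections such as 2-DCS combined with StARTS can be illustrated with a generic enrichment calculation. The Python sketch below is a minimal example, assuming a pooled two-proportion z-score that compares how often an RNA motif appears among the sequences selected by a ligand with how often it appears in the starting library. It is not the published StARTS implementation; the function name `enrichment_z` and all counts are hypothetical.

```python
import math


def enrichment_z(k_selected: int, n_selected: int,
                 k_library: int, n_library: int) -> float:
    """Pooled two-proportion z-score for motif enrichment.

    k_* = number of sequences containing the motif, n_* = total sequences.
    Positive values mean the motif is over-represented in the selection.
    """
    p_sel = k_selected / n_selected
    p_lib = k_library / n_library
    p_pool = (k_selected + k_library) / (n_selected + n_library)
    variance = p_pool * (1.0 - p_pool) * (1.0 / n_selected + 1.0 / n_library)
    if variance == 0.0:
        raise ValueError("motif present in all or none of the sequences")
    return (p_sel - p_lib) / math.sqrt(variance)


# Hypothetical counts: a motif seen in 40 of 100 selected RNAs
# versus 400 of 4000 RNAs in the starting library.
print(f"z = {enrichment_z(40, 100, 400, 4000):.1f}")
```

A large positive score flags a motif that the ligand selects far more often than chance would predict, which is the kind of signal a ranking of motif-ligand pairs can be built on. Converting such scores into predicted affinities, as described for StARTS, requires calibration against measured binding data and is outside the scope of this sketch.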

Targeting of Long Noncoding RNAs

Long noncoding RNAs (lncRNAs) constitute a class of regulatory RNAs that have developmental- and tissue-specific expression and regulate many levels of cellular processes, including the expression of oncogenes. LncRNAs thus play many functional roles. For example, they can act as molecular guides that localize ribonucleoprotein complexes to specific chromatin targets, inducing changes in gene expression and assembling protein complexes that impact transcriptional activation or repression. Furthermore, lncRNAs are involved in epigenetic regulation, intracellular communication, and cell proliferation. Among biologically relevant lncRNAs, metastasis-associated lung adenocarcinoma transcript 1 (MALAT1) has been studied as a target for small-molecule ligands since it is involved in several physiological processes, such as alternative splicing or epigenetic modulation of gene expression, but also in metastasis and tumor proliferation in different types of cancers (Gutschner et al. 2013). The 3′-terminal stability element for nuclear expression (ENE) adopts a triple-helical configuration that promotes the nuclear accumulation of MALAT1 and is essential to its function (Fig. 15a). In a first attempt to identify efficient binders, Hargrove and coworkers studied the triplex-binding ligand DPFp8 (compound 53, Fig. 15b), which allowed for differential recognition of the MALAT1 triple helix and stem-loop structures: it has high affinity for the triplex structure but does not bind the stem-loop structure (Donlic et al. 2018). New analogs of this compound were synthesized and reported in order


Fig. 15 (a) Primary sequence and secondary structure of MALAT-1 ENE RNA; (b) chemical structure of ligands 53–55 targeting MALAT-1 triple helix

to study the recognition properties, demonstrating that small molecules could interfere with MALAT1 structure; however, no intracellular studies were performed, and the actual selectivity profile remains unknown (Donlic et al. 2020). Le Grice and coworkers also identified two small molecules from a microarray-based high-throughput screen that were found to affect MALAT1 triple helix stability (compounds 54 and 55, Fig. 15b) (Abulwerdi et al. 2019). These ligands led to a decrease in MALAT1 levels and branching in breast cancer organoid models, supporting the hypothesis that small molecules could modulate lncRNA functions and that this could be a promising strategy to regulate oncogenic ncRNA levels and related cancer processes. Although few examples have been reported so far, selective ligands of lncRNAs, and of MALAT1 in particular, have the potential not only to inhibit the proliferation of cancer cells but also to act as essential chemical tools for a better understanding of lncRNAs, whose roles and functions are still not completely elucidated. The results obtained to date suggest that many new examples of lncRNA binders will be reported in the future.

Current Trends for the Development of Innovative Chemical Tools for RNA Targeting

As illustrated by the examples described above, a number of bacterial, viral, and eukaryotic RNAs are therapeutically relevant as drug targets (Childs-Disney et al. 2022). High-throughput and in silico screenings, as well as rational and structure-based design, have been employed to identify a great variety of chemical structures able to target these RNAs, and the potential chemical space is being explored to identify ligands specific enough to become marketed drugs. The results gathered from these experimental screenings and biological studies allowed for the construction of databases such as R-BIND that will likely accelerate the drug discovery process in the field of RNA targeting (Morgan et al. 2019). Currently, new approaches are under development to further improve the efficacy of RNA targeting, and they hold promise for the discovery of efficient drugs in the near future. To illustrate these trends, we chose the example of the RIBOTAC approach, where a specific RNA ligand is conjugated to a compound able to recruit an RNase enzyme and induce target degradation, similarly to what is done with PROTACs (Haniff et al. 2020). Furthermore, more details will be given about the most recent RNA ligands that reached the market or clinical trials, risdiplam and branaplam, to illustrate the success of the RNA-targeting approach in the clinic (Zhang et al. 2022).

RIBOTAC Strategy

The concept of targeted degradation was first applied to proteins, leading to proteolysis-targeting chimeras (PROTACs). PROTACs are chimeric molecules composed of a protein-binding compound and an E3 ubiquitin ligase-recruiting molecule. Upon binding to the targeted protein, this kind of conjugate induces its selective degradation by the proteasome. Following a similar principle, Disney and coworkers developed RIBOTACs for targeted degradation of RNA (Costales et al. 2018). In this case, the designed conjugates are composed of an RNA-binding compound and an RNase-recruiting molecule that selectively mediates RNA decay by recruiting the ubiquitously expressed cellular endoribonuclease RNase L. In the first two examples reported in the literature, Targaprimir-96 and Targapremir-210 (TGP-210) were coupled to a short 2′-5′-polyA oligonucleotide, inducing selective cleavage of the RNA target in cells (compounds 56 and 57, Fig. 16) (Costales et al. 2018, 2019). Conjugate 56 was able to activate endogenous RNase L, which induced the selective cleavage of pri-miR-96 in cancer cells in a catalytic and substoichiometric fashion (Costales et al. 2018). Nanomolar concentrations of the compound thus induced silencing of miR-96, derepression of the proapoptotic FOXO1 transcription factor, and apoptosis in breast cancer cells, without effect in healthy breast cells. Conjugate 57 showed enhanced selectivity compared with TGP-210 and nanomolar affinity for pre-miR-210 (Costales et al. 2019). Notably, it cleaved pre-miR-210 substoichiometrically and induced apoptosis in breast cancer cells. The results obtained with the RIBOTAC strategy thus demonstrated that small molecules can cleave RNA via nuclease recruitment and thereby exert relevant biological activity.
The recent pandemic caused by SARS-CoV-2, an RNA virus, highlighted that targeting RNA could also be an excellent strategy to face this kind of therapeutic challenge if a compound able to inhibit viral proliferation upon binding to genomic


Fig. 16 Chemical structure of RIBOTAC compounds 56 and 57 inducing selective cleavage of pri-miR-96 and pre-miR-210, respectively

RNA could be available. The RIBOTAC strategy was thus also applied to the targeting of the SARS-CoV-2 frameshifting element RNA, which bears a 1×1 nucleotide UU internal loop that could be targeted by small molecules (Fig. 17a) (Haniff et al. 2020). First, the Inforna methodology was applied to a collection of RNA ligands to identify specific binders of this internal loop. Compound Covidcil-19 (compound 58, Fig. 17b) was identified among other hits as the best ligand, bearing nanomolar affinity and able to stabilize the RNA target and impair frameshifting in cells. After confirming the intracellular target using the chemical cross-linking and isolation by pull-down (Chem-CLIP) approach, compound 58 was conjugated to a chemical compound that recruits cellular ribonuclease RNase L, leading to compound 59 (Fig. 17b). In vitro and intracellular assays confirmed that 59 induced cleavage and degradation of the entire SARS-CoV construct. While the antiviral activity of this compound has not yet been evaluated, the strategy appears to be a very promising and original way to fight the infection. Altogether, these results show that the RIBOTAC-led optimization strategy is able to improve the biological activity of the studied compounds by at least tenfold.

Targeting Pre-mRNA Splicing

RNA ligands have recently been applied to the manipulation of the RNA splicing machinery. This new and successful drug modality is currently under study for its application to various pre-mRNAs, and the successful treatment of spinal muscular atrophy (SMA) will be described below. SMA is the leading genetic cause of infant mortality and is caused by deletion or mutation of the survival of motor neuron


Fig. 17 (a) Representation of the U-U internal loop of the SARS-CoV-2 frameshifting element. The blue square represents the small-molecule targeting site; (b) structures of compound Covidcil-19 58 and RIBOTAC 59 targeting SARS-CoV-2 RNA

1 (SMN1) gene, which encodes the SMN protein. This induces the degeneration of spinal motoneurons, leading to muscle weakness and atrophy. In the genome, there are two nearly identical copies of the SMN gene, called SMN1 and SMN2. A large screening for compounds acting on SMA allowed for the identification of the SMN-C class of compounds, such as SMN-C3 (compound 60, Fig. 18) (Naryshkin et al. 2014). The study of the mechanism of action of these molecules revealed that they regulate SMN2 exon 7 splicing to produce a higher amount of SMN protein and reverse the effects of the pathology. These compounds increased the expression of SMN protein in many different cell types, were effective across different SMA severities with nanomolar activities, and were also orally available. They were effective in mouse models, increasing life span and preventing motor dysfunction and neuromuscular deficits. This class of compounds acts by selectively interacting with a tertiary RNA structure in the mRNA-protein complex, in particular by binding a region of exon 7 and an RNA helix formed by the 5′ single-stranded region in intron 7 and the 5′ terminus of U1 snRNA, thus increasing the interaction with U1 snRNP. Detailed mechanistic studies on this class of RNA binders revealed that they directly bind to the AGGAAG motif on exon 7 of the SMN2 pre-mRNA and promote a conformational change both in cells and in vivo. Subsequent structure-activity relationship studies aimed at improving efficacy, safety, and pharmacokinetics led to many new analogs of these SMN-C derivatives and ultimately to the discovery of risdiplam (RG7916, compound 61, Fig. 18), which was approved by the FDA in 2020 (Zhang et al. 2022).


Fig. 18 Chemical structure of ligands 60–62 targeting pre-mRNA splicing

Many new studies have been performed since then, and HTS approaches led to the discovery of another efficient compound known as branaplam (NVS-SM1, compound 62, Fig. 18) that has advanced into human clinical trials (Cheung et al. 2018). These successful therapeutic applications of RNA binders further confirm that nucleic acid-targeted molecules may have a promising future in the modulation of disease processes involving RNA deregulation, such as, in this particular case, pre-mRNA splicing.

Conclusion

Targeting RNA with small-molecule binders is an emerging field of medicinal chemistry with major implications in chemical biology, since the tools identified during the search for new drugs also help answer questions about the molecular mechanisms underlying the functions and roles of the targeted RNAs. The overview given in this chapter of the strategies and chemical compounds that can be employed to bind therapeutically relevant RNAs selectively and efficiently should help the reader understand not only the challenges that the field still needs to face but also the potential that this strategy holds for future discoveries. From a chemical point of view, it is now clear that many kinds of compounds can selectively bind RNA. While positively charged compounds, such as aminoglycosides or polyamines, were initially privileged in the search for RNA binders, aromatic and heteroaromatic compounds have largely been validated as promising scaffolds, with some successes in the clinic as shown in the last section. From a biological point of view, many different RNA targets have been identified, and new relevant ones are discovered continuously. The discovery of risdiplam is clear proof of this observation: the class of compounds to which it belongs was discovered through phenotypic high-throughput screening, and the actual mechanism of action and target were identified only afterward, thus opening the way for the discovery of many analogs and new perspectives. High-throughput screenings, while remaining a major approach for the discovery of bioactive compounds, have been complemented with new advanced approaches based on the design of selective conjugates, structure-based approaches, in silico screenings, and the combination of screening and bioinformatics, as is the case for Inforna. Altogether, these methodologies offer a large panel of


choices to approach the field of targeting RNA by small molecules. Furthermore, the association of the abovementioned strategies with new advanced chemical biology techniques for the intracellular identification of the biological targets, as well as for their degradation, opens new avenues for the discovery of original and efficient bioactive compounds and, hopefully, new drugs.

References

Abulwerdi FA, Xu W, Ageeli AA, Yonkunas MJ, Arun G, Nam H et al (2019) Selective small-molecule targeting of a triple helix encoded by the long noncoding RNA, MALAT1. ACS Chem Biol 14(2):223–235. https://doi.org/10.1021/acschembio.8b00807
Ambros V (2008) The evolution of our thinking about microRNAs. Nat Med 14(10):1036–1040. https://doi.org/10.1038/nm1008-1036
Angelbello AJ, Chen JL, Childs-Disney JL, Zhang P, Wang ZF, Disney MD (2018) Using genome sequence to enable the design of medicines and chemical probes. Chem Rev 118(4):1599–1663. https://doi.org/10.1021/acs.chemrev.7b00504
Bell NM, L’Hernault A, Murat P, Richards JE, Lever AM, Balasubramanian S (2013) Targeting RNA-protein interactions within the human immunodeficiency virus type 1 lifecycle. Biochemistry 52(51):9269–9274. https://doi.org/10.1021/bi401270d
Benhamou RI, Suresh BM, Tong Y, Cochrane WG, Cavett V, Vezina-Dawod S et al (2022) DNA-encoded library versus RNA-encoded library selection enables design of an oncogenic noncoding RNA inhibitor. Proc Natl Acad Sci U S A 119(6). https://doi.org/10.1073/pnas.2114971119
Blond A, Ennifar E, Tisne C, Micouin L (2014) The design of RNA binders: targeting the HIV replication cycle as a case study. ChemMedChem 9(9):1982–1996. https://doi.org/10.1002/cmdc.201402259
Bottini A, De SK, Wu B, Tang C, Varani G, Pellecchia M (2015) Targeting Influenza A virus RNA promoter. Chem Biol Drug Des 86(4):663–673. https://doi.org/10.1111/cbdd.12534
Cao Y, Liu X, De Clercq E (2009) Cessation of HIV-1 transcription by inhibiting regulatory protein Rev-mediated RNA transport. Curr HIV Res 7(1):101–108. https://doi.org/10.2174/157016209787048564
Carnevali M, Parsons J, Wyles DL, Hermann T (2010) A modular approach to synthetic RNA binders of the hepatitis C virus internal ribosome entry site. Chembiochem 11(10):1364–1367. https://doi.org/10.1002/cbic.201000177
Chapman RL, Stanley TB, Hazen R, Garvey EP (2002) Small molecule modulators of HIV Rev/Rev response element interaction identified by random screening. Antivir Res 54(3):149–162. https://doi.org/10.1016/s0166-3542(01)00222-4
Cheung AK, Hurley B, Kerrigan R, Shu L, Chin DN, Shen Y et al (2018) Discovery of small molecule splicing modulators of survival motor neuron-2 (SMN2) for the treatment of spinal muscular atrophy (SMA). J Med Chem 61(24):11021–11036. https://doi.org/10.1021/acs.jmedchem.8b01291
Childs-Disney JL, Wu M, Pushechnikov A, Aminova O, Disney MD (2007) A small molecule microarray platform to select RNA internal loop-ligand interactions. ACS Chem Biol 2(11):745–754. https://doi.org/10.1021/cb700174r
Childs-Disney JL, Hoskins J, Rzuczek SG, Thornton CA, Disney MD (2012a) Rationally designed small molecules targeting the RNA that causes myotonic dystrophy type 1 are potently bioactive. ACS Chem Biol 7(5):856–862. https://doi.org/10.1021/cb200408a
Childs-Disney JL, Parkesh R, Nakamori M, Thornton CA, Disney MD (2012b) Rational design of bioactive, modularly assembled aminoglycosides targeting the RNA that causes myotonic dystrophy type 1. ACS Chem Biol 7(12):1984–1993. https://doi.org/10.1021/cb3001606


Childs-Disney JL, Yildirim I, Park H, Lohman JR, Guan L, Tran T et al (2014) Structure of the myotonic dystrophy type 2 RNA and designed small molecules that reduce toxicity. ACS Chem Biol 9(2):538–550. https://doi.org/10.1021/cb4007387
Childs-Disney JL, Yang X, Gibaut QMR, Tong Y, Batey RT, Disney MD (2022) Targeting RNA structures with small molecules. Nat Rev Drug Discov. https://doi.org/10.1038/s41573-022-00521-4
Costales MG, Haga CL, Velagapudi SP, Childs-Disney JL, Phinney DG, Disney MD (2017) Small molecule inhibition of microRNA-210 reprograms an oncogenic hypoxic circuit. J Am Chem Soc 139(9):3446–3455. https://doi.org/10.1021/jacs.6b11273
Costales MG, Matsumoto Y, Velagapudi SP, Disney MD (2018) Small molecule targeted recruitment of a nuclease to RNA. J Am Chem Soc 140(22):6741–6744. https://doi.org/10.1021/jacs.8b01233
Costales MG, Suresh B, Vishnu K, Disney MD (2019) Targeted degradation of a hypoxia-associated non-coding RNA enhances the selectivity of a small molecule interacting with RNA. Cell Chem Biol 26(8):1180–1186.e5. https://doi.org/10.1016/j.chembiol.2019.04.008
Crooke ST, Baker BF, Crooke RM, Liang XH (2021) Antisense technology: an overview and prospectus. Nat Rev Drug Discov 20(6):427–453. https://doi.org/10.1038/s41573-021-00162-z
Di Giorgio A, Tran TP, Duca M (2016) Small-molecule approaches toward the targeting of oncogenic miRNAs: roadmap for the discovery of RNA modulators. Future Med Chem 8(7):803–816. https://doi.org/10.4155/fmc-2016-0018
Dibrov SM, Ding K, Brunn ND, Parker MA, Bergdahl BM, Wyles DL et al (2012) Structure of a hepatitis C virus RNA domain in complex with a translation inhibitor reveals a binding mode reminiscent of riboswitches. Proc Natl Acad Sci U S A 109(14):5223–5228. https://doi.org/10.1073/pnas.1118699109
Dibrov SM, Parsons J, Carnevali M, Zhou S, Rynearson KD, Ding K et al (2014) Hepatitis C virus translation inhibitors targeting the internal ribosomal entry site. J Med Chem 57(5):1694–1707. https://doi.org/10.1021/jm401312n
Disney MD, Angelbello AJ (2016) Rational design of small molecules targeting oncogenic noncoding RNAs from sequence. Acc Chem Res 49(12):2698–2704. https://doi.org/10.1021/acs.accounts.6b00326
Disney MD, Liu B, Yang WY, Sellier C, Tran T, Charlet-Berguerand N et al (2012) A small molecule that targets r(CGG)(exp) and improves defects in fragile X-associated tremor ataxia syndrome. ACS Chem Biol 7(10):1711–1718. https://doi.org/10.1021/cb300135h
Donlic A, Hargrove AE (2018) Targeting RNA in mammalian systems with small molecules. Wiley Interdiscip Rev RNA 9(4):e1477. https://doi.org/10.1002/wrna.1477
Donlic A, Morgan BS, Xu JL, Liu A, Roble Jr C, Hargrove AE (2018) Discovery of small molecule ligands for MALAT1 by tuning an RNA-binding scaffold. Angew Chem Int Ed Engl 57(40):13242–13247. https://doi.org/10.1002/anie.201808823
Donlic A, Zafferani M, Padroni G, Puri M, Hargrove AE (2020) Regulation of MALAT1 triple helix stability and in vitro degradation by diphenylfurans. Nucleic Acids Res 48(14):7653–7664. https://doi.org/10.1093/nar/gkaa585
Ennifar E, Paillart JC, Bodlenner A, Walter P, Weibel JM, Aubertin AM et al (2006) Targeting the dimerization initiation site of HIV-1 RNA with aminoglycosides: from crystal to cell. Nucleic Acids Res 34(8):2328–2339. https://doi.org/10.1093/nar/gkl317
Falese JP, Donlic A, Hargrove AE (2021) Targeting RNA with small molecules: from fundamental principles towards the clinic. Chem Soc Rev 50(4):2224–2243. https://doi.org/10.1039/d0cs01261k
Ganser LR, Kelly ML, Herschlag D, Al-Hashimi HM (2019) The roles of structural dynamics in the cellular functions of RNAs. Nat Rev Mol Cell Biol 20(8):474–489. https://doi.org/10.1038/s41580-019-0136-0
Guan L, Disney MD (2013) Covalent small-molecule-RNA complex formation enables cellular profiling of small-molecule-RNA interactions. Angew Chem Int Ed Engl 52(38):10010–10013. https://doi.org/10.1002/anie.201301639


Gumireddy K, Young DD, Xiong X, Hogenesch JB, Huang Q, Deiters A (2008) Small-molecule inhibitors of microRNA miR-21 function. Angew Chem Int Ed Engl 47(39):7482–7484. https://doi.org/10.1002/anie.200801555
Gutschner T, Hammerle M, Diederichs S (2013) MALAT1 – a paradigm for long noncoding RNA function in cancer. J Mol Med (Berl) 91(7):791–801. https://doi.org/10.1007/s00109-013-1028-y
Haniff HS, Tong Y, Liu X, Chen JL, Suresh BM, Andrews RJ et al (2020) Targeting the SARS-CoV-2 RNA genome with small molecule binders and ribonuclease targeting chimera (RIBOTAC) degraders. ACS Cent Sci 6(10):1713–1721. https://doi.org/10.1021/acscentsci.0c00984
Haniff HS, Liu X, Tong Y, Meyer SM, Knerr L, Lemurell M et al (2022) A structure-specific small molecule inhibits a miRNA-200 family member precursor and reverses a type 2 diabetes phenotype. Cell Chem Biol 29(2):300–311.e10. https://doi.org/10.1016/j.chembiol.2021.07.006
Hermann T (2016) Small molecules targeting viral RNA. Wiley Interdiscip Rev RNA 7(6):726–743. https://doi.org/10.1002/wrna.1373
Howe JA, Wang H, Fischmann TO, Balibar CJ, Xiao L, Galgoci AM et al (2015) Selective small-molecule inhibition of an RNA structural element. Nature 526(7575):672–677. https://doi.org/10.1038/nature15542
Howe JA, Xiao L, Fischmann TO, Wang H, Tang H, Villafania A et al (2016) Atomic resolution mechanistic studies of ribocil: a highly selective unnatural ligand mimic of the E. coli FMN riboswitch. RNA Biol 13(10):946–954. https://doi.org/10.1080/15476286.2016.1216304
Jahromi AH, Nguyen L, Fu Y, Miller KA, Baranger AM, Zimmerman SC (2013) A novel CUG(exp).MBNL1 inhibitor with therapeutic potential for myotonic dystrophy type 1. ACS Chem Biol 8(5):1037–1043. https://doi.org/10.1021/cb400046u
Jefferson EA, Seth PP, Robinson DE, Winter DK, Miyaji A, Osgood SA et al (2004) Biaryl guanidine inhibitors of in vitro HCV-IRES activity. Bioorg Med Chem Lett 14(20):5139–5143. https://doi.org/10.1016/j.bmcl.2004.07.066
Joly JP, Mata G, Eldin P, Briant L, Fontaine-Vive F, Duca M et al (2014) Artificial nucleobase-amino acid conjugates: a new class of TAR RNA binding agents. Chemistry 20(7):2071–2079. https://doi.org/10.1002/chem.201303664
Kanoh S, Rubin BK (2010) Mechanisms of action and clinical application of macrolides as immunomodulatory medications. Clin Microbiol Rev 23(3):590–615. https://doi.org/10.1128/CMR.00078-09
Kim VN, Han J, Siomi MC (2009) Biogenesis of small RNAs in animals. Nat Rev Mol Cell Biol 10(2):126–139. https://doi.org/10.1038/nrm2632
Kumar S, Kellish P, Robinson Jr WE, Wang D, Appella DH, Arya DP (2012) Click dimers to target HIV TAR RNA conformation. Biochemistry 51(11):2331–2347. https://doi.org/10.1021/bi201657k
Lee MM, Childs-Disney JL, Pushechnikov A, French JM, Sobczak K, Thornton CA et al (2009) Controlling the specificity of modularly assembled small molecules for RNA via ligand module spacing: targeting the RNAs that cause myotonic muscular dystrophy. J Am Chem Soc 131(47):17464–17472. https://doi.org/10.1021/ja906877y
Lee MK, Bottini A, Kim M, Bardaro Jr MF, Zhang Z, Pellecchia M et al (2014) A novel small-molecule binds to the influenza A virus RNA promoter and inhibits viral replication. Chem Commun (Camb) 50(3):368–370. https://doi.org/10.1039/c3cc46973e
Martin C, De Piccoli S, Gaysinski M, Becquart C, Azoulay S, Di Giorgio A et al (2020) Unveiling RNA-binding properties of verapamil and preparation of new derivatives as inhibitors of HIV-1 tat-TAR interaction. ChemPlusChem 85(1):207–216
Maucort C, Vo DD, Aouad S, Charrat C, Azoulay S, Di Giorgio A et al (2021) Design and implementation of synthetic RNA binders for the inhibition of miR-21 biogenesis. ACS Med Chem Lett 12(6):899–906. https://doi.org/10.1021/acsmedchemlett.0c00682
Meyer SM, Williams CC, Akahori Y, Tanaka T, Aikawa H, Tong Y et al (2020) Small molecule recognition of disease-relevant RNA structures. Chem Soc Rev 49(19):7167–7199. https://doi.org/10.1039/d0cs00560f


Morgan BS, Sanaba BG, Donlic A, Karloff DB, Forte JE, Zhang Y et al (2019) R-BIND: an interactive database for exploring and developing RNA-targeted chemical probes. ACS Chem Biol 14(12):2691–2700. https://doi.org/10.1021/acschembio.9b00631
Naryshkin NA, Weetall M, Dakka A, Narasimhan J, Zhao X, Feng Z et al (2014) Motor neuron disease. SMN2 splicing modifiers improve motor function and longevity in mice with spinal muscular atrophy. Science 345(6197):688–693. https://doi.org/10.1126/science.1250127
Nguyen F, Starosta AL, Arenz S, Sohmen D, Donhofer A, Wilson DN (2014) Tetracycline antibiotics and resistance mechanisms. Biol Chem 395(5):559–575. https://doi.org/10.1515/hsz-2013-0292
Park SJ, Kim YG, Park HJ (2011) Identification of RNA pseudoknot-binding ligand that inhibits the −1 ribosomal frameshifting of SARS-coronavirus by structure-based virtual screening. J Am Chem Soc 133(26):10094–10100. https://doi.org/10.1021/ja1098325
Parkesh R, Childs-Disney JL, Nakamori M, Kumar A, Wang E, Wang T et al (2012) Design of a bioactive small molecule that targets the myotonic dystrophy type 1 RNA via an RNA motif-ligand database and chemical similarity searching. J Am Chem Soc 134(10):4731–4742. https://doi.org/10.1021/ja210088v
Parsons J, Castaldi MP, Dutta S, Dibrov SM, Wyles DL, Hermann T (2009) Conformational inhibition of the hepatitis C virus internal ribosome entry site RNA. Nat Chem Biol 5(11):823–825. https://doi.org/10.1038/nchembio.217
Paul R, Dutta D, Paul R, Dash J (2020) Target-directed azide-alkyne cycloaddition for assembling HIV-1 TAR RNA binding ligands. Angew Chem Int Ed Engl 59(30):12407–12411. https://doi.org/10.1002/anie.202003461
Prado S, Beltran M, Moreno A, Bedoya LM, Alcami J, Gallego J (2018) A small-molecule inhibitor of HIV-1 Rev function detected by a diversity screen based on RRE-Rev interference. Biochem Pharmacol 156:68–77. https://doi.org/10.1016/j.bcp.2018.07.040
Ratni H, Scalco RS, Stephan AH (2021) Risdiplam, the first approved small molecule splicing modifier drug as a blueprint for future transformative medicines. ACS Med Chem Lett 12(6):874–877. https://doi.org/10.1021/acsmedchemlett.0c00659
Rzuczek SG, Park H, Disney MD (2014) A toxic RNA catalyzes the in cellulo synthesis of its own inhibitor. Angew Chem Int Ed Engl 53(41):10956–10959. https://doi.org/10.1002/anie.201406465
Rzuczek SG, Colgan LA, Nakai Y, Cameron MD, Furling D, Yasuda R et al (2017) Precise small-molecule recognition of a toxic CUG RNA repeat expansion. Nat Chem Biol 13(2):188–193. https://doi.org/10.1038/nchembio.2251
Seth PP, Miyaji A, Jefferson EA, Sannes-Lowery KA, Osgood SA, Propp SS et al (2005) SAR by MS: discovery of a new class of RNA-binding small molecules for the hepatitis C virus: internal ribosome entry site IIA subdomain. J Med Chem 48(23):7099–7102. https://doi.org/10.1021/jm050815o
Shi Z, Zhang J, Qian X, Han L, Zhang K, Chen L et al (2013) AC1MMYR2, an inhibitor of dicer-mediated biogenesis of Oncomir miR-21, reverses epithelial-mesenchymal transition and suppresses tumor growth and progression. Cancer Res 73(17):5519–5531. https://doi.org/10.1158/0008-5472.CAN-13-0280
Spizek J, Rezanka T (2017) Lincosamides: chemical structure, biosynthesis, mechanism of action, resistance, and applications. Biochem Pharmacol 133:20–28. https://doi.org/10.1016/j.bcp.2016.12.001
Staedel C, Tran TPA, Giraud J, Darfeuille F, Di Giorgio A, Tourasse NJ et al (2018) Modulation of oncogenic miRNA biogenesis using functionalized polyamines. Sci Rep 8(1):1667. https://doi.org/10.1038/s41598-018-20053-5
Stelzer AC, Frank AT, Kratz JD, Swanson MD, Gonzalez-Hernandez MJ, Lee J et al (2011) Discovery of selective bioactive small molecules by targeting an RNA dynamic ensemble. Nat Chem Biol 7(8):553–559. https://doi.org/10.1038/nchembio.596
Sztuba-Solinska J, Shenoy SR, Gareiss P, Krumpe LR, Le Grice SF, O’Keefe BR et al (2014) Identification of biologically active, HIV TAR RNA-binding small molecules using small molecule microarrays. J Am Chem Soc 136(23):8402–8410. https://doi.org/10.1021/ja502754f


Velagapudi SP, Seedhouse SJ, Disney MD (2010) Structure-activity relationships through sequencing (StARTS) defines optimal and suboptimal RNA motif targets for small molecules. Angew Chem Int Ed Engl 49(22):3816–3818. https://doi.org/10.1002/anie.200907257
Velagapudi SP, Gallo SM, Disney MD (2014) Sequence-based design of bioactive small molecules that target precursor microRNAs. Nat Chem Biol 10(4):291–297. https://doi.org/10.1038/nchembio.1452
Velagapudi SP, Cameron MD, Haga CL, Rosenberg LH, Lafitte M, Duckett DR et al (2016) Design of a small molecule against an oncogenic noncoding RNA. Proc Natl Acad Sci U S A 113(21):5898–5903. https://doi.org/10.1073/pnas.1523975113
Vicens Q, Mondragon E, Reyes FE, Coish P, Aristoff P, Berman J et al (2018) Structure-activity relationship of Flavin analogues that target the Flavin mononucleotide riboswitch. ACS Chem Biol 13(10):2908–2919. https://doi.org/10.1021/acschembio.8b00533
Vo DD, Tran TP, Staedel C, Benhida R, Darfeuille F, Di Giorgio A et al (2016) Oncogenic MicroRNAs biogenesis as a drug target: structure-activity relationship studies on new aminoglycoside conjugates. Chemistry 22(15):5350–5362. https://doi.org/10.1002/chem.201505094
Vo DD, Becquart C, Tran TPA, Di Giorgio A, Darfeuille F, Staedel C et al (2018) Building of neomycin-nucleobase-amino acid conjugates for the inhibition of oncogenic miRNAs biogenesis. Org Biomol Chem 16(34):6262–6274. https://doi.org/10.1039/c8ob01858h
Waheed AA, Freed EO (2012) HIV type 1 gag as a target for antiviral therapy. AIDS Res Hum Retrovir 28(1):54–75. https://doi.org/10.1089/AID.2011.0230
Wang W, Preville P, Morin N, Mounir S, Cai W, Siddiqui MA (2000) Hepatitis C viral IRES inhibition by phenazine and phenazine-like molecules. Bioorg Med Chem Lett 10(11):1151–1154. https://doi.org/10.1016/s0960-894x(00)00217-1
Warner KD, Hajdin CE, Weeks KM (2018) Principles for targeting RNA with drug-like small molecules. Nat Rev Drug Discov 17(8):547–558. https://doi.org/10.1038/nrd.2018.93
Wilson DN (2014) Ribosome-targeting antibiotics and mechanisms of bacterial resistance. Nat Rev Microbiol 12(1):35–48. https://doi.org/10.1038/nrmicro3155
Wilson DN, Schluenzen F, Harms JM, Starosta AL, Connell SR, Fucini P (2008) The oxazolidinone antibiotics perturb the ribosomal peptidyl-transferase center and effect tRNA positioning. Proc Natl Acad Sci U S A 105(36):13339–13344. https://doi.org/10.1073/pnas.0804276105
Wong CH, Nguyen L, Peh J, Luu LM, Sanchez JS, Richardson SL et al (2014) Targeting toxic RNAs that cause myotonic dystrophy type 1 (DM1) with a bisamidinium inhibitor. J Am Chem Soc 136(17):6355–6361. https://doi.org/10.1021/ja5012146
Yamagami R, Sieg JP, Bevilacqua PC (2021) Functional roles of chelated magnesium ions in RNA folding and function. Biochemistry 60(31):2374–2386. https://doi.org/10.1021/acs.biochem.1c00012
Yang WY, Gao R, Southern M, Sarkar PS, Disney MD (2016a) Design of a bioactive small molecule that targets r(AUUCU) repeats in spinocerebellar ataxia 10. Nat Commun 7:11647. https://doi.org/10.1038/ncomms11647
Yang WY, He F, Strack RL, Oh SY, Frazer M, Jaffrey SR et al (2016b) Small molecule recognition and tools to study modulation of r(CGG)(exp) in fragile X-associated tremor ataxia syndrome. ACS Chem Biol 11(9):2456–2465. https://doi.org/10.1021/acschembio.6b00147
Ye F, Haniff HS, Suresh BM, Yang D, Zhang P, Crynen G et al (2022) Rational approach to identify RNA targets of natural products enables identification of nocathiacin as an inhibitor of an oncogenic RNA. ACS Chem Biol 17(2):474–482. https://doi.org/10.1021/acschembio.1c00952
Zafferani M, Hargrove AE (2021) Small molecule targeting of biologically relevant RNA tertiary and quaternary structures. Cell Chem Biol 28(5):594–609. https://doi.org/10.1016/j.chembiol.2021.03.003
Zhang L, Abendroth F, Vazquez O (2022) A chemical biology perspective to therapeutic regulation of RNA splicing in spinal muscular atrophy (SMA). ACS Chem Biol 17(6):1293–1307. https://doi.org/10.1021/acschembio.2c00161

Targeting DNA Junctions with Small Molecules for Therapeutic Applications in Oncology

32

Joanna Zell and David Monchaud

Contents
Introduction
Structural Studies
Biological and Pathological Functions
DNA Junction-Targeting Anticancer Agents
Targeting TWJs
Targeting FWJs
Conclusion
References

Abstract

DNA junctions exist at the branch point of three or four DNA duplexes (dsDNA or B DNA) in hairpin and cruciform structures. These structures occur when repeated dsDNA sequences open up to expose single-stranded DNA (ssDNA), which then folds upon itself to form an intramolecular hairpin. Junctions are thus formed during DNA transactions, i.e., when the dsDNA is being replicated, transcribed, or repaired. Three-way junctions (TWJs) and four-way junctions (FWJs) can encapsulate small molecules, termed ligands, which stabilize the non-B DNA structural motif. In vitro assays employ this stabilization effect to identify junction-binding small molecules. TWJ-binding molecules have C3 symmetry, are approximately 10 Å in diameter, and contain aromatic and positively charged chemical groups; FWJ-binding ligands are often larger with similar chemical motifs and C2 symmetry. We describe here the discovery of junction-binding molecules, culminating in those which show exceptional in vitro binding and promising in cellulo properties. Ligands able to stabilize DNA junctions in cells hinder DNA transactions and thus induce a DNA damage response (DDR), leading to cytotoxicity. This approach is cancer-selective as cancer cells are particularly sensitive to DNA damage due to their impaired DDR mechanisms. Recently, these ligands were incorporated in synthetic lethality strategies, demonstrating the enormous progress that the field of junction targeting has made in only 30 years, which should inspire chemical biologists in the pursuit of more specific ligands and techniques to characterize their molecular mechanism.

J. Zell · D. Monchaud (*), Institut de Chimie Moléculaire de l’Université de Bourgogne, ICMUB CNRS, Dijon, France
© Springer Nature Singapore Pte Ltd. 2023. N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_37

Keywords

DNA junction · DNA repeats · Ligand · DNA damage · DNA damage response

Introduction

In every cell nucleus, DNA is compacted into nucleosomes for long-term storage in the form of a double-stranded right-handed B-helix (or duplex-DNA, dsDNA) (Bonev and Cavalli 2016). In this state, dsDNA is wrapped around histones, providing a high degree of compaction and stability. However, for the DNA code to be read, during DNA transactions (i.e., DNA replication and DNA-to-RNA transcription), the dsDNA must be opened up. This provides the opportunity for the locally single-stranded DNA (ssDNA) to fold upon itself and adopt various 3D structures, due to the strong propensity of nucleobases (A for adenine, C for cytosine, G for guanine, and T for thymine) to associate and self-associate. The duplex melting that is mediated by DNA transactions thus promotes the formation of alternative DNA structures (Zell et al. 2021a). These alternative folds represent roadblocks to polymerase motion along the genomic DNA, which are recognized as DNA damage and trigger the activation of the DNA damage response (DDR) machinery (Jackson and Bartek 2009). Genomic errors and instability are thus more frequent at DNA transactions due to non-B structures (Wang and Vasquez 2006). Given that repeated sequences constitute over half of the human genome (Treangen and Salzberg 2012), it is unsurprising that DNA repeats have been studied for their ability to fold into alternative DNA structures and were found responsible for a large part of them. It has also been shown that their structure depends on the sequences involved (Khristich and Mirkin 2020): repeated sequences can be characterized as direct repeats (..ATAG..ATAG..), mirror repeats (..GCAT..TACG..), inverted repeats (..CGTT..AACG..) (where the second half is reverse complementary to the first half), short tandem repeats (GAC)n, etc. Each of these repeats leads to the formation of a particular non-B structure when dsDNA is unwound into ssDNA.
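The three repeat classes defined above can be told apart with a few string operations. The following Python sketch is purely illustrative and is not taken from the cited studies: the function names `reverse_complement` and `classify_repeat` are hypothetical, and the two repeat halves are assumed to be supplied already extracted, read 5′ to 3′ on the same strand.

```python
# Illustrative sketch only: classify a pair of repeat halves read 5'->3' on
# the same strand. Function names are hypothetical, not from the chapter.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def classify_repeat(first: str, second: str) -> str:
    """Label the relationship between two repeat halves.

    Some sequences satisfy more than one definition (e.g., a half that reads
    the same in both directions); the order of the tests below then decides
    which label is returned.
    """
    first, second = first.upper(), second.upper()
    if second == first:
        return "direct"      # same sequence, same orientation
    if second == reverse_complement(first):
        return "inverted"    # second half is the reverse complement
    if second == first[::-1]:
        return "mirror"      # second half is the first read backwards
    return "unrelated"


if __name__ == "__main__":
    for pair in [("ATAG", "ATAG"), ("GCAT", "TACG"), ("CGTT", "AACG")]:
        print(pair, classify_repeat(*pair))
```

Applied to the three example pairs quoted in the text, the sketch returns direct, mirror, and inverted, respectively. Locating repeats within a genome is a separate problem that this sketch does not address.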
The most studied example is undoubtedly G-quadruplexes (G4s) that fold from direct repeats rich in guanine (G) and in which Gs are gathered in G-runs (e.g., G3..G3..G3.. G3) (Hänsel-Hertsch et al. 2017). Besides G4s, C-quadruplexes (or i-motifs, iMs, which fold from C-rich direct repeats), three-way DNA junctions (TWJs, which fold from direct repeats), and four-way DNA junction structures (FWJs, which fold from inverted repeats) are being actively studied (Fig. 1). Helical supercoiling, or superhelical stress, when DNA is twisted out of a more energetically stable structure, occurs when the DNA is being processed by enzymes


Fig. 1 Example of naturally occurring repeated DNA sequences and their folding into alternative DNA structures

such as polymerases advancing along their genomic substrate. This supercoiling is normally dealt with by enzymes such as helicases and topoisomerases (TOPs) which relax the twisted DNA. Supercoiling is known to facilitate non-B structure formation, such as TWJ and FWJ structures (Vologodskii et al. 1979). One early study showed that negatively supercoiled plasmids form a cruciform structure by showing that the hairpin (HP) loops thus formed were cleaved by single-strand-specific nucleases, while more topologically relaxed plasmids did not fold into HP structures and were thus resistant to nuclease activity (Panayotatos and Wells 1981). The amount of research on these repeated sequences, and on the helical structures that fold from them, is now substantial. Structural studies are numerous and biostatistical data mining has greatly improved our understanding of their biological roles. Given that several of the accompanying chapters are devoted to G4s, in this chapter we focus on what has been established about the structure and likely occurrence of both TWJs and FWJs; we next provide a glimpse into their biological roles and discuss the small molecules that bind to DNA junctions (TWJ and FWJ ligands), with a focus on those designed and used for therapeutic applications in the field of oncology.

Structural Studies

Both TWJs and FWJs form when a ssDNA folds into a HP structure (one HP for TWJ, two for FWJ), which can contain several nucleotides between the repeats that form the HP loop, from none (like the HPs of the FWJ seen in Fig. 2a, lower panel) to


dozens (like the 2 HPs of the FWJ seen in Fig. 1c, with two 3-nt loops (GGA and TCC loops)). Junction structures can be fully hybridized, like the HPs of the FWJ seen in Fig. 1c, 2a and 2b, or with mismatches like the HP of the TWJ seen in Fig. 1b containing two TG mismatches that form a bulge. Similarly, junctions can be perfectly or imperfectly complementary at the junction point (also known as branch point) due to mismatched nucleotides or misalignment. As schematically represented in Fig. 2b, the fully base-paired TWJ is termed 3H (for three helices), and the fully base-paired FWJ is termed 4H. Naming of imperfect junctions with bulges at the branch point is also codified, such as 3HS2 for a TWJ with two unpaired nucleotides at the branch point (Fig. 2b) or 2HS2HS1 for a TWJ with two unpaired nucleotides on one side of the branch point, and one unpaired nucleotide on another side (Lilley 2000). These two features (bulges and loops) explain the great structural diversity of DNA junctions. For pure truncated oligonucleotides in solution, the helical arms can be splayed out from the junction point, in a flat open shape, with C3- or C4-symmetry for TWJs and FWJs, respectively, since negatively charged duplex arms repel each other at low salt concentrations. However, helix arms of FWJs can also adopt a pairwise coaxially stacked structure (Fig. 2c) in the presence of divalent metal ions (such as the biologically relevant Mg2+), while retaining complete base pairing. Synthetic TWJs with a bulge of at least two single-stranded nucleotides at their branch point are flexible enough to undergo coaxial stacking as well (Wu et al. 2004). FWJs can undergo branch migration if the sequence is homologous in helix arms on opposite sides of the junction (Fig. 2d). Over the past years, a series of in vitro experiments performed with synthetic oligonucleotides has outlined the fundamental principles of DNA junction structure and stability relationships.
Invaluable insights into their structures have been obtained notably through crystallographic studies (e.g., the FWJ that folds from the inverted repeat sequence C2G2TAC2G2, Fig. 3a) (Eichman et al. 2000) or nuclear

Fig. 2 FWJs without or with unpaired nucleotides in the loops (a). Nomenclature for TWJ and FWJ with perfect or imperfect branch points (b). Schematic representation of the coaxial stacking (c) and branch migration of a FWJ (d). Created with BioRender.com


Fig. 3 Structures and sequences of a FWJ (a) or TWJs (either a trimolecular (b) or intramolecular TWJ (c)) elucidated by crystallography (a) or NMR studies (b, c) (PDB ID: 1DCW (a), 1EKW (b) and 1EZN (c)); arrows indicate unpaired nucleobases.

magnetic resonance (NMR) investigation (e.g., the two 3HS2 structures of an intermolecular TWJ with an unmatched TC (strand 1, GCTGC2AC2G; strand 2, CG2TGCGTC2; strand 3, G2ACGTCGCAGC; italic for unpaired nucleobases, indicated by arrows in Fig. 3b)) (Thiviyanathan et al. 1999) and of an intramolecular TWJ with an unmatched TT at the branch point (CGTGCAC3GCT2GCG2CGACT2GTCGTTGTGCACG; italic for unpaired nucleobases, indicated by arrows in Fig. 3c) (van Buuren et al. 2000).

Biological and Pathological Functions

As indicated above, repeated sequences are widespread in the human genome, with a median occurrence of ca. 18/100 kb for direct repeats and ca. 206/100 kb for inverted repeats (versus ca. 8/100 kb for G4s) (Georgakopoulos-Soares et al. 2018). Their distribution is non-random in the human genome, as they are significantly more present in intergenic regulatory regions, centromeric regions, and replication origins, indicating that these structures have endogenous, non-pathological functions. However, most of the biological information about DNA junctions has been gained in the field of neuropathologies (Khristich and Mirkin 2020). Indeed, repeated sequences are responsible for a large family of hereditary diseases known as REDs (for repeat expansion diseases). In these pathologies, repeat sequences are expanded (the number of repeats in the genome increases with replication), leading to aggravated disease symptoms in later generations with higher copy number. Formation of DNA structures (schematically represented in Fig. 1) indeed provides a window of opportunity to expand the repeat sequence: for example, when the NOP56 gene is synthesized (black strand, Fig. 1b), it will be extended by 3 repeat units in the next cycle of replication due to HP formation. This has been particularly studied for well-known diseases such as the spinocerebellar ataxia 36 (SCA36, or Costa da Morte ataxia), in which the number of GGCCTG repeats in the NOP56 gene on chromosome 20 (Fig. 1) ranges from 6–14 (healthy genome) to 200–650 repeats (Kobayashi et al. 2011); Pick’s and Charcot’s diseases (or frontotemporal dementia (FTD) and amyotrophic lateral sclerosis (ALS), frequently gathered under a single name, FTD/ALS, due to considerable symptom overlap), caused by the expansion of GGGGCC repeats in the C9ORF72 locus of chromosome 9 from 2 to 1000 units (Renton et al. 2011); and Huntington’s


disease, caused by (CAG)n repeats in the HTT gene on chromosome 4 when the number of repeats is over 40 (Walker 2007). In oncology, TWJs and FWJs are just beginning to be considered as important players. Repeated sequences are significantly more present at fragile sites (sites of DNA breakage leading to mutation) (Kaushal and Freudenreich 2019) in oncogenes, thus linking these structures to cancer induction. Also, cancer mutation sites are twice as likely to be located close to repeated sequences (Georgakopoulos-Soares et al. 2018). As indicated above (and further detailed hereafter), DNA junctions are mainly studied for their role as roadblocks to polymerase motion, which creates DNA damage and triggers DNA damage response (DDR) signalling and repair and can lead to mutation. As a representative example, it has been demonstrated that replication fork stall sites are statistically more likely to be found close to repeated sequences. This is why an important chemical biology effort is currently being invested into the design, synthesis, and use of DNA junction-specific ligands, in order to foster non-B DNA structure-mediated DNA damage (del Mundo et al. 2019). This novel strategy relies on the observation that cancer cells commonly display flawed DDR machinery, making them more sensitive to DNA-damaging agents than healthy cells. To date, most efforts have been focused on the targeting of TWJs for anticancer applications. FWJ targeting has emerged more recently in this therapeutic area, while it has been more thoroughly studied for antibacterial approaches. These strategies, although elegant, are only briefly described in the following section, as this chapter focuses on the use of DNA junction-targeting agents in oncology.

DNA Junction-Targeting Anticancer Agents

The field of DNA junction-targeting small molecules was pioneered by N. Kallenbach, through a series of in vitro experiments aiming at both understanding the stability and dynamics of DNA junctions and studying the interactions of these structures with small molecules (Lu et al. 1992). These investigations were limited to organic dyes (e.g., ethidium bromide, Stains-all) but they provided a proof-of-concept that small organic molecules could have high affinity for higher-order DNA structures, inspiring further attempts to discover DNA junction-targeting agents.

Targeting TWJs

The first application was analytical: a series of TWJ-forming DNA aptamers, identified by in vitro selection from a library of random sequence oligonucleotides, was used to bind to drugs (e.g., cocaine) (Stojanovic et al. 2000) (Fig. 4a) and steroids (Stojanović et al. 2003). In the early experimental setup, the bimolecular TWJ-forming sensor was equipped with a fluorophore on one strand and its fluorescence resonance energy transfer (FRET)-quencher on the other strand. The FRET-labelled oligonucleotide folds into a TWJ upon interaction with its target (cocaine/


Fig. 4 (a) Schematic representation of a bimolecular TWJ-forming aptamer used for fluorescence-based detection of cocaine. (b) Crystal structure of Hannon’s metallosupramolecular helicate [Fe2L3]4+ within a TWJ

steroid), thus bringing together FRET partners and quenching fluorescence emission, so that ligand binding can be monitored by fluorescence variations. The TWJ-forming DNA aptamer was validated as a cocaine sensor with a detection limit of 10 μM in vitro. This FRET strategy, which is still commonly used to assess ligand binding affinities, was next complemented by an alternative technique, relying on the use of an organic dye, the cyanine diethylthiotricarbocyanine iodide (DT2C2), which is displaced from the TWJ cavity by cocaine. This fluorescent intercalator displacement (FID) assay was not more sensitive but demonstrated the implementation of a visible colorimetric cocaine-sensing assay (Stojanovic and Landry 2002). This technique was recently revisited to make it practically relevant, replacing DT2C2 with either the naphthyridine ATMND (Roncancio et al. 2014) or the merocyanine FPhOBtz (Van Riesen et al. 2021), which permitted the detection of cocaine in bodily fluids (with a detection limit of ca. 10 μM in saliva for the former, 20 μM in serum for the latter). A major advancement in the field of DNA junction ligands was the publication of a crystal structure of a small molecule bound within the hexanucleotide sequence CGTACG, which self-organized into a trimolecular TWJ (Oleksi et al. 2006). This small molecule, a metallosupramolecular helicate [Fe2L3]4+ (Fig. 4b) synthesized by Hannon et al., is formed of three long aromatic organic ancillary ligands L that coil around two iron(II) ions. This cylindrical helicate, like other known major groove binders of dsDNA, is of similar size to zinc finger motifs, crucial in many DNA-binding proteins, with a ~1 nm diameter (and a length of ~2 nm, which corresponds to a volume of ~1.5 × 10⁻²⁴ L), which is in line with the volume of the TWJ cavity (considered to be in the yoctoliter scale, 10⁻²⁴ L) (Hansen et al. 2009).
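The volume quoted for the helicate can be checked with a back-of-the-envelope calculation. The sketch below assumes the helicate is an ideal cylinder with the stated diameter (~1 nm) and length (~2 nm); this is a simplification made here for illustration, and the function name `cylinder_volume_litres` is hypothetical.

```python
import math


def cylinder_volume_litres(diameter_nm: float, length_nm: float) -> float:
    """Volume of an ideal cylinder, given in nanometres, returned in litres."""
    radius_m = diameter_nm * 1e-9 / 2          # nm -> m, then halve the diameter
    length_m = length_nm * 1e-9                # nm -> m
    volume_m3 = math.pi * radius_m ** 2 * length_m
    return volume_m3 * 1e3                     # 1 m^3 = 1000 L


if __name__ == "__main__":
    volume = cylinder_volume_litres(diameter_nm=1.0, length_nm=2.0)
    print(f"{volume:.2e} L")                   # about 1.57e-24 L
    print(f"{volume / 1e-24:.2f} yL")          # 1 yoctoliter = 1e-24 L
```

Under this idealization the result is about 1.6 × 10⁻²⁴ L, i.e., on the yoctoliter scale, consistent with the order of magnitude given in the text.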
The group had set out to characterize the helicate’s major groove-binding properties, and the structure of the TWJ/ligand complex was serendipitously resolved. This provided a new outlook on DNA-binding modes, as it detailed the


Fig. 5 (a) Structure of TACN-Q and of a series of triptycene derivatives studied as TWJ ligands. (b) Schematic representation of the isothermal, competitive oligonucleotide inhibition assay developed to assess the affinity of triptycenes for TWJ through fluorescence quenching

positions of electrostatic and π-stacking interactions between the ligand and the TWJ cavity walls. Helicates can exist as two enantiomers, M and P, depending on the way in which the ancillary ligands L twist around the central axis. After separation of the M- and P-enantiomers, the TWJ stabilization effect of the helicate was further characterized by NMR (Cerasino et al. 2007) and polyacrylamide gel electrophoresis (PAGE) (Malina et al. 2007), the latter studies showing that the M-enantiomer stabilized the TWJ better than the P-enantiomer, and that both enantiomers could stabilize perfect TWJs irrespective of which nucleotides are at the junction point, just as well as imperfect TWJs with two unpaired nucleotides (AA or TT) at the junction point. The antiproliferative properties of the helicate [Fe2L3]4+ were next evaluated in both cancer and non-cancer human cells. The collected results showed cell growth inhibition, cell cycle blockage and apoptosis, but no induction of DNA damage (Hotze et al. 2008). The authors concluded that the cellular effects of this new class of junction-targeting agents could be distinct from those of other known DNA-targeting molecules. However, it was not shown whether this activity was through TWJ- or duplex major groove-binding, or both. The helicate’s interactions with other DNA motifs were also later demonstrated (Brabec et al. 2013). The analogous and more stable [Ru2L3]4+ helicate was used to block the common polymerase chain reaction (PCR) in vitro, indicating that helicates can block DNA transactions on duplex DNA (Ducani et al. 2010). By synthetic addition of six arginine moieties at both ends of the three organic ancillary ligands L in [Fe2L’3]4+, the helicate’s TWJ stabilization in vitro increased, and its cytotoxicity increased, with the IC50 falling from 14 to 6 μM in cisplatin-sensitive A2780 ovarian cancer cells (Cardo et al. 2011). Purely organic compounds were also developed and used to target TWJs. The triazacyclononane-quinoline TACN-Q (Fig. 5a) (Vuong et al. 2012) is a polyazamacrocycle-based molecule of ca. 10 Å in diameter with C3-symmetry,


thus suited to fit within the TWJ cavity. TACN-Q binds specifically to TWJs, although with a rather weak affinity, but with an exquisite selectivity as it shows no interactions with duplex or G-quadruplex (G4) DNA structures. Its TWJ-binding properties were quantified by two in vitro assays: UV-melting assay, in which TACN-Q stabilized unlabelled trimolecular TWJ with a ΔTm value of 5.6 °C (which reflects the apparent affinity); and FRET-melting assay, in which it stabilized a fluorescently labelled trimolecular TWJ with ΔT1/2 of 4.9 °C. This latter assay is interesting as it can be performed in a competitive manner, in the presence of an excess of unlabelled dsDNA (ds26, up to 20 molar equivalents with reference to the labelled TWJ); these conditions did not hinder the TWJ stabilization, indicating a strong preference for the TWJ central cavity over dsDNA. Interestingly, when the TACN macrocycle chelates metallic ions (Cu2+, Fe2+ or Zn2+), the TWJ-stabilizing properties of the resulting metallo-organic complexes were slightly reduced as compared to TACN-Q, highlighting that the structural flexibility (adaptability) of TACN-Q, enabling it to encapsulate other labile metal ions in the buffered solution such as Li+, is important for TWJ binding. The specificity of TACN-Q was further demonstrated by FID studies showing that it efficiently displaced the cavity-binding dye DT2C2 (vide supra) but not thiazole orange (TO) bound to dsDNA. TACN-Q was found to be active against B16-F10 melanoma cells, with an IC50 of 10.5 μM (Novotna et al. 2015). Owing to their C3-symmetry and nonplanar π-surface, a series of triptycene (Trip) derivatives (Fig. 5a) was also assessed for TWJ affinity through a panel of biophysical measurements including UV and circular dichroism (CD) melting, isothermal FRET-based assay, and PAGE (Barros and Chenoweth 2014).
UV-melting experiments gave ΔTm values of 28.5, 26.3, and 18.5 °C for Trip 1–3, respectively, indicating that a longer arm length and a greater positive charge are advantageous for TWJ affinity. An isothermal assay was developed, based on a bimolecular system involving a TWJ-forming oligonucleotide strand and a short strand complementary to the 5′-end of the TWJ (Fig. 5b). Kd values calculated herein for Trip 1–3 confirmed the trend found by UV melting, with Kd = 0.22, 0.40, and 5.5 μM for Trip 1–3, respectively. In human ovarian carcinoma A2780 cells, Trip 1 showed toxicity similar to cisplatin and higher toxicity in cisplatin-resistant A2780cis cells. Trip 2 showed even higher toxicity in both cell lines, and Trip 3 showed very little toxicity. This could be explained by the finding that Trip 2 showed higher cellular uptake than the other derivatives. In a closely following study, amino acid-functionalized Trip 5 and 6 were shown to stabilize the TWJ-forming oligonucleotide containing (CAG)n trinucleotide repeats, whereas Trip 4 was not. Trip 5 and 6 efficiently displaced the complementary duplex strand in an experimental system analogous to the previous one, indicating a high TWJ affinity, with Kd of 0.05 and 2.36 μM, for Trip 5 and 6, respectively (Barros and Chenoweth 2015). Inspired by the results obtained with the supramolecular helicates described above, helical metallopeptides with pseudo-C3 symmetry were also studied as TWJ ligands, as they present the advantage of yielding enantiomerically pure products from chiral amino acid starting materials. 2,2′-Bipyridine (bipy) was


chosen as a highly efficient divalent metallic ion (M2+)-chelator to be incorporated into an amino acid chain. Vázquez et al. synthesized an amino acid-functionalized bipy analogue for solid phase peptide synthesis, to afford a helicate-forming peptide chain containing six bipy monomers interspaced by two [(D/L)-Pro]-Gly linkages (Fig. 6a) (Gamba et al. 2014). Two [D-Pro]-Gly or two [L-Pro]-Gly links are crucial for inducing the β-turn, allowing the peptide to fold into a cylindrical dimetallic enantiomer, rather than extended aggregates. The LL-helicate and DD-helicate, i.e., the M- and P-enantiomers, were structurally and topologically characterized, showing symmetric CD signatures. Then, these metallopeptides were synthesized with a fluorescence tag in order to characterize their binding to a TWJ-forming oligonucleotide by fluorescence anisotropy: the LL-helicate showed much greater binding constants than the DD-helicate, with Kd of 0.25 and 37.6 μM, respectively. Fluorescent helical metallopeptides were also used as molecular probes in Vero cells, and optical imaging investigations indicated that these ligands accumulate in endosomes; however, in this study traffic to the nucleus, where the DNA could be targeted, was not mentioned. The same group then replaced the hydrophobic Pro-Gly regions by positively charged residues to improve solubility and DNA binding and which had mixed chirality to permit the β-turn: L-Arg–L-Pro–D-Arg and D-Arg–D-Pro–L-Arg (Fig. 6a, b) (Gómez-González et al. 2021). The LLD- and DDL-enantiomers were complexed with Fe2+ ions, or with less labile Co3+ ions following oxidation from Co2+. This time, the TWJ-forming oligonucleotide was labelled with a fluorophore, prior to monitoring Fe(II)2-LLD-helicate binding by fluorescence anisotropy, leading to the determination of a Kd of 0.45 μM, while Co(II)2-LLD-helicate displayed a Kd of 7.9 μM. 
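To give a feel for what the dissociation constants quoted above mean in practice, the sketch below converts a Kd into a fraction bound. It assumes a simple 1:1 binding equilibrium in which the free concentration of the titrated partner is known (or approximately equal to its total concentration); the cited studies may have fitted their anisotropy data with a different model, and the function name `fraction_bound` is hypothetical.

```python
def fraction_bound(free_conc_um: float, kd_um: float) -> float:
    """Fraction of the labelled species bound at equilibrium (1:1 model).

    theta = [X] / ([X] + Kd), with [X] the free concentration of the
    binding partner. Both arguments are in micromolar.
    """
    if free_conc_um < 0 or kd_um <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return free_conc_um / (free_conc_um + kd_um)


if __name__ == "__main__":
    # Kd values quoted in the text for the LL- and DD-helicates (in uM),
    # evaluated at an arbitrary, illustrative free concentration of 1 uM.
    for name, kd in [("LL-helicate", 0.25), ("DD-helicate", 37.6)]:
        print(f"{name}: {fraction_bound(1.0, kd):.1%} bound at 1 uM")
```

Under this assumed model, a Kd of 0.25 μM corresponds to 80% occupancy at 1 μM of free partner, whereas a Kd of 37.6 μM corresponds to under 3%, which illustrates the size of the gap between the two enantiomers.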
The reduced affinity of the Co3+ helicate for TWJ compared to the Fe2+ helicate was confirmed by PAGE experiments, where the Co3+ helicate formed aggregates with dsDNA. HeLa cells treated with rhodamine-labelled Rho-Fe(II)2-LLD-helicate showed endosomal staining as previously described; however, when cells were pre-treated with the permeabilizing saponin digitonin, the helicate’s fluorescence was observed in both the cytoplasm and the nucleus (Fig. 6c). The nuclear

Fig. 6 (a) Helical metallopeptides in which two bipy units chelate divalent metallic ions (Fe2+ or Co2+) with either [(D/L)-Pro]-Gly linkages (top structure) or L-Arg–L-Pro–D-Arg (lower structure). (b) Docked structure of the Fe(II)2LDD peptide in interaction with a TWJ. (c) Fluorescence microscopy images of HeLa cells expressing GFP-PCNA protein (green) and treated with Fe(II)2TAMRA-LLD (red); arrows indicate colocalized foci (scale bar: 5 μm)


staining was mainly observed in the nucleoli and partially colocalized with a GFP-fused proliferating cell nuclear antigen (PCNA) (Moldovan et al. 2007): PCNA being a marker of DNA replication foci, these images provided a convincing demonstration of the accumulation of ligand-bound TWJs at replication fork sites (Gómez-González et al. 2021). Despite their appealing synthetic and in vitro properties, metallo-organic helicates are poorly cell permeable, which compromises their therapeutic potential. The same conclusion was drawn for a family of supramolecular ruthenium complexes that showed interesting TWJ-binding properties in vitro (Duskova et al. 2019) but could not be further studied in cellulo owing to permeability issues. This led the same research group to focus on small organic compounds, on the basis of their improved cell permeability. Among them, azacryptands were found to be invaluable molecular tools to study the biological consequences of TWJ stabilization in cells. Azacryptands were initially developed to bind ions in their “crypt-like” cavities formed of four nitrogen atoms at each end of the molecule (Fig. 7a) (Dietrich et al. 1989). Given their global charges (4 to 5 positive charges at physiological pH) and their large aromatic surfaces organized with a C3 symmetry, azacryptands indeed display ideal properties for TWJ binding. The prototype DNA junction-binding azacryptand was the acridine-containing TrisA (Fig. 7a). This compound was reported to stabilize an imperfect HP structure formed in trinucleotide repeat sequences (CNG)7 (N = C, A, T) (Amrane et al. 2008). This sequence was labelled with FRET partners and used in FRET-melting experiments, which showed that TrisA was able to thermally stabilize the HP with a ΔT1/2 of 17.5 °C. However, the presence of an excess of unlabelled dsDNA triggered a loss of stabilization (ΔT1/2 = 12.8 °C) indicating a rather weak TWJ specificity. Owing to

Fig. 7 (a) Structure of the azacryptands TrisA, TrisPOB, TrisNP, and the clickable analogue TrisNP-α. (b) Molecular dynamics (MD) simulation of TrisNP in interaction with a TWJ. (c) Immunodetection of DNA damage in MCF7 cells non-treated or treated with TrisPOB (9 μM) and TrisNP (22 μM) prior to γH2AX immunolabelling (double-strand break markers) and DAPI nuclear staining (scale bar: 5 μm)


these promising properties, the TWJ-binding properties of a series of azacryptands were thoroughly investigated via a panel of biophysical techniques, notably by competitive FRET-melting assays similar to those performed with TACN-Q, in order to identify more TWJ-specific molecules by using both G4 and dsDNA competitors (Novotna et al. 2015). The first series of results confirmed the promising TWJ-interacting properties of azacryptands, the best candidates being TrisPOB, TrisNP, 3,3′-TrisBP, and 4,4′-TrisBP (Fig. 7a), with ΔT1/2 of 16.8, 17.7, 14.6, and 11.9 °C, respectively (versus 3.8 °C for TACN-Q) (Duskova et al. 2020). Despite azacryptands showing a significant G4 interaction, they showed preference for TWJ even in the presence of a large excess (50 mol. equiv.) of G4-DNA (TWJ stabilization is maintained at >60%) and of dsDNA (TWJ stabilization is maintained at >80%). These results were confirmed by a battery of in vitro techniques including the TWJ screen (Guyon et al. 2018), the PAGE experimental protocol developed by Hannon and Brabec (Malina et al. 2007), FID experiments (performed with fluorescent dye TO-PRO-3), CD and mass spectrometry investigations (to gain insights into the binding mode and stoichiometry), and microdialysis equilibrium (to quantify the differential affinity for a panel of secondary DNA structures). All these techniques concurred in demonstrating the very good TWJ-interacting properties of azacryptands, their exquisite selectivity over dsDNA and their preferential association with TWJ in mixtures of TWJs and G4s. A more accurate representation of the TWJ/azacryptand complexes was also provided by in silico investigations (Zell et al. 2021b): molecular dynamics (MD) simulations showed that TrisNP binds snugly in the TWJ cavity (Fig. 7b), thanks to well-defined π-stacking interactions between its aromatic arms and the nucleobases that form the cavity walls.
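The "stabilization maintained" percentages reported for competitive FRET-melting can be read as the ratio of the ΔT1/2 measured with competitor to that measured without. The sketch below assumes this simple ratio definition, which may differ in detail from the normalization used in the cited work; the function name `stabilization_retained` is hypothetical. The worked example uses the TrisA values quoted earlier in this section (17.5 °C alone, 12.8 °C with excess dsDNA).

```python
def stabilization_retained(dt_half_alone: float, dt_half_competitor: float) -> float:
    """Fraction of the ligand-induced stabilization that survives competition.

    Both arguments are Delta-T1/2 values in degrees Celsius: the first is
    measured without competitor, the second in the presence of an excess of
    unlabelled competitor DNA.
    """
    if dt_half_alone <= 0:
        raise ValueError("the reference stabilization must be positive")
    return dt_half_competitor / dt_half_alone


if __name__ == "__main__":
    # TrisA values quoted in the text.
    retained = stabilization_retained(17.5, 12.8)
    print(f"TrisA retains {retained:.0%} of its stabilization")
```

With this definition TrisA retains roughly 73% of its stabilization in the presence of dsDNA, which can be set against the >80% quoted for the later azacryptands under dsDNA competition.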
These new TWJ ligands were next used to investigate the cellular consequences of TWJ stabilization. The best candidates displayed stronger antiproliferative properties than TACN-Q, with IC50 of 3.4 and 3.5 μM against B16-F10 melanoma cells for 2,7-TrisNP and 3,3′-TrisBP, respectively (versus 10.5 μM for TACN-Q), and IC50 values as low as 1.3, 0.93, and 5.8 μM in MCF7 breast cancer cells for 3,3′-TrisBP, TrisPOB, and TrisNP, respectively. More importantly, their good bioavailability and cell permeability properties opened unprecedented prospects for chemical biology investigations. Indeed, at this point, the theory that TWJ stabilisation could cause DNA damage remained speculative. Azacryptands were thus studied in cells for their ability to induce DNA damage (Duskova et al. 2020): this was quantified by immunodetection of the H2AX histone phosphorylated on serine 139 (termed γH2AX) (Bonner et al. 2008), which is an established marker of double-strand breaks (DSBs), the most lethal type of DNA damage. Azacryptands were found to trigger massive DNA damage after very short incubations (4 h), with more than 60% of treated MCF7 cells displaying high levels of DNA damage (at least 5 γH2AX foci per cell, Fig. 7c). Next, to help understand their cellular behavior, TrisPOB and TrisNP were submitted to the US National Cancer Institute’s (NCI) NCI-60 Human Tumor Cell Lines Screen, in which their cytotoxicity fingerprints collected in 60 cancer cell lines were tabulated and compared with fingerprints of other known molecules, in order to suggest mechanistic correlations (Zell et al. 2021b). Azacryptands had fingerprints similar to that of known DNA-binding and


G4-binding molecules, as expected, but also similar to that of topoisomerase (TOP)-targeting molecules, which was less expected. This prompted further investigations that eventually unveiled a strong synergistic relationship between TrisNP and the TOP2 catalytic inhibitor BNS-22 (Kawatani et al. 2011), which increased both the antiproliferative properties of the azacryptands (the preincubation of cells with a non-toxic concentration of BNS-22 reduces their respective IC50 values by >2.6-fold) and their DNA-damaging properties. Interestingly, the azacryptands’ synergy with BNS-22 is opposite to what was described with G4 ligands (an antagonism was found upon coincubation of BNS-22 and pyridostatin, PDS) (Bossaert et al. 2021), highlighting that, even if azacryptands were found to interact with both TWJs and G4s in vitro, they act through a cellular mechanism that is distinct from that of classical G4 ligands, therefore lending credence to a specific interaction with TWJs in cells. To delve into the cellular mechanism of azacryptands’ binding and cytotoxicity, a clickable analogue of TrisNP named TrisNP-α (Fig. 7a) (Zell et al. 2021b) was designed and synthesized. This derivative, which contains an alkyne appendage, is an ideal tool for chemical biology investigations, as it can be tagged once in its cellular binding sites, through bioorthogonal chemistry (Fig. 8a) (Cañeque et al. 2018). This strategy was pioneered with an alkynylated G4 ligand, named PDS-α (Rodriguez et al. 2012), which was conjugated to a fluorophore via copper-catalyzed alkyne-azide cycloaddition (CuAAC, also known as click chemistry) after cell incubation, once the ligand is in interaction with cellular G4s.
The significance of this approach is two-fold: (i) the alkyne modification of the ligand is minimal, meaning that alkynylated derivatives act similarly to the parent compounds in terms of target binding; and (ii) the alkyne appendage does not affect hydrogen bonding of the ligand, and thus does not hinder biological interactions and affords a biodistribution similar to that of the parent compound. In situ click chemistry applied with PDS-α had allowed G4s to be localized in cells; similarly, this approach was used to trace TrisNP-α localization in cells by fluorescence microscopy (Fig. 8a). Once clicked to a fluorophore, TrisNP-α showed strong staining in the nucleoli and the nucleoplasm, reminiscent of what was reported by Vázquez et al. (vide supra) (Gómez-González et al. 2021). Since the nucleoli contain a wide range of DNA, RNA and proteins, cells were pre-treated with RNase to remove RNA, and high-resolution optical imaging then provided a clear and accurate snapshot of DNA binding by TrisNP-α, which accumulates in peri-nucleolar regions (Fig. 8b, yellow arrows). These regions contain nucleolus-associated DNA domains (NADs), known to be rich in repetitive DNA sequences and thus highly prone to fold into TWJs and interact with TrisNP-α. Importantly, nucleoplasmic (chromosomal) TrisNP-α foci did not colocalize with foci of the established G4-specific antibody BG4 (Fig. 8b), further indicating that TrisNP-α does not interact with G4s in cells. Changes in the TrisNP-α labelling pattern upon incubation with various cellular effectors could also be used to gain interesting insights into TWJ biology. For instance, the intensity of nuclear TrisNP-α staining was quantified following inhibition of replication and transcription to study any dependence of TWJ formation on DNA transactions: no difference was observed after transcription inhibition using BMH-21 (Peltonen et al. 2014) and DRB (Chodosh et al. 1989) (which inhibit RNA

J. Zell and D. Monchaud

Fig. 8 (a) Schematic representation of the bioorthogonal chemistry approach allowing for labelling TrisNP-α once in its genomic binding sites. (b) Optical imaging of MCF7 cells labelled with TrisNP-α, clicked with AF594-azide (red, left panel) or with AF488-azide (green, right panels) and co-stained with G4-specific BG4 antibody (red, right panels) and DAPI (dsDNA marker, blue); yellow arrows indicate perinucleolar foci, white arrows indicate nucleoplasmic foci (scale bars: white = 5 μm; yellow = 1 μm). (c) NHEJ- and HR-mediated repair of DSBs occurring at a stalled replication fork, mediated by DNA-PK, ATM and RAD51, along with the structure of related inhibitors NU7441, KU55933 and B02

polymerases I and II, respectively), while replication inhibition by aphidicolin reduced TrisNP-α staining, although not in a statistically significant manner. This was again opposed to what was reported for PDS, where both replication and transcription inhibition reduced PDS cellular responses, further confirming that TrisNP does not act like a G4-ligand in cells (Bossaert et al. 2021; Rodriguez et al. 2012). The TOP2 inhibitor BNS-22, which drastically accentuated the cytotoxicity of TrisNP, also induced a significant increase in TrisNP-α staining intensity. This indicated that catalytic TOP2 inhibition promotes the formation of TWJs, which fully corresponds to the known role of TOP2 to relax DNA topology in order to decrease twists and helical stress (known to promote DNA junction folding). The genetic structure of adeno-associated virus (AAV) contains two TWJs; cells were transduced with AAV, leading to a significant increase in TrisNP-α staining, further indicating that TWJ specificity can be attained in cells. Altogether, this set of chemical biology investigations was an invaluable step to understanding how, where, and when TWJs fold in cells and the cellular consequences of TWJ stabilization by small molecules. Taking a chemotherapeutic perspective, these azacryptands were further exploited as anticancer agents, based on their ability to create DNA damage in cancer cells (Zell et al. 2021a). Compounds that inflict serious damage to DNA or hinder DDR signalling and repair are the most used agents in chemotherapy (a) because cancer cells commonly have impaired DDR defence mechanisms, and (b) due to cancer’s fast-replicating nature – an unbridled metabolism means an elevated level of DNA transactions, offering many opportunities for DNA structures to form and damage to occur (Jackson and Bartek 2009). 
As indicated above, TWJ ligands display promising antiproliferative properties, which can be exploited in chemically induced synthetic lethality strategies, in which a combination of chemotherapeutics induces synergic cytotoxicity. The most promising combinations often involve DNA-damaging agents and DDR inhibitors (Pilié et al. 2019). Most DNA-damaging agents covalently bind to DNA; inducing DNA damage with non-covalently binding ligands which specifically target non-B DNA structures could hold great potential for chemotherapy, as alternative structures are potentially more specific for DNA transactions and more prevalent in cancer cells, as explained above in the section Biological and Pathological Functions. However, our current understanding of the presence of alternative DNA structures in vivo is still hazy, particularly that of TWJs. Many G4 ligands have shown promise in chemotherapy models, even reaching clinical trials, and an emerging body of work is now showing that combinations of G4 ligands with DDR inhibitors can be highly effective. For instance, a PDS derivative (PDSI) showed cytotoxic synergy with NU7441 (Leahy et al. 2004), which inhibits the catalytic subunit of the DNA-dependent protein kinase (DNA-PKcs) required for DSB repair by the non-homologous end joining (NHEJ) mechanism (McLuckie et al. 2013). This promising strategy was also applied to TWJ-targeting by azacryptands (Duskova et al. 2020; Zell et al. 2021b): MCF7 cells were co-treated with TrisNP or TrisPOB and a series of DDR inhibitors including NU7441 (inhibitor of DNA-PK, therefore noted DNA-PKi), KU55933 (which inhibits ATM kinase, crucial to DSB repair by homologous recombination (HR), noted ATMi) (Hickson et al. 2004), and B02 (which inhibits the recombinase

RAD51, also central in HR, noted RAD51i) (Fig. 8c) (Huang et al. 2011). Both NHEJ and HR inhibitors are of biological and clinical interest because NHEJ mediates most DSB repair, except breaks that occur at DNA replication forks, where HR is dominant. The antiproliferative properties of these drug cocktails were quantified by Chou’s combination indexes (CI, where CI < 1 indicates synergy of the two drugs, CI > 1 indicates antagonism and CI = 1 indicates simple additivity) (Chou 2010). TrisNP and TrisPOB proved to be exceptionally synergic with DDR inhibitors, with low CI values of 0.94, 0.82, and 0.70 for TrisPOB and of 0.71, 0.64, and 0.49 for TrisNP, with NU7441, KU55933, and B02, respectively. These results thus confirmed that azacryptands hold important clinical potential in chemically induced synthetic lethality strategies.
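The combination index used above can be made concrete with a short sketch. In the Chou-Talalay method, for two drugs applied together at doses D1 and D2 to reach a given effect, CI = D1/Dx1 + D2/Dx2, where Dx1 and Dx2 are the doses of each drug alone that produce the same effect. The Python below is only an illustration under these definitions, not the analysis pipeline used in the cited studies: the function names are hypothetical, and only the six CI values stored in `REPORTED_CI` are taken from the text.

```python
"""Illustrative sketch of the Chou-Talalay combination index (CI).

Function names are hypothetical; only the values in REPORTED_CI are
quoted from the chapter text (MCF7 cells, azacryptand + DDR inhibitor).
"""
import math


def dose_for_effect(dm: float, m: float, fa: float) -> float:
    """Median-effect equation: single-drug dose giving fraction affected fa.

    dm is the median-effect dose (e.g. an IC50) and m is the slope of the
    median-effect plot.
    """
    if not 0.0 < fa < 1.0:
        raise ValueError("fa must lie strictly between 0 and 1")
    return dm * (fa / (1.0 - fa)) ** (1.0 / m)


def combination_index(d1: float, d2: float, dx1: float, dx2: float) -> float:
    """CI = D1/Dx1 + D2/Dx2 for two drugs combined at doses d1 and d2."""
    return d1 / dx1 + d2 / dx2


def classify(ci: float, tol: float = 1e-9) -> str:
    """Map a CI value onto synergy (<1), additivity (=1) or antagonism (>1)."""
    if math.isclose(ci, 1.0, abs_tol=tol):
        return "additivity"
    return "synergy" if ci < 1.0 else "antagonism"


# CI values reported in the text, keyed by (ligand, DDR inhibitor).
REPORTED_CI = {
    ("TrisPOB", "NU7441"): 0.94,
    ("TrisPOB", "KU55933"): 0.82,
    ("TrisPOB", "B02"): 0.70,
    ("TrisNP", "NU7441"): 0.71,
    ("TrisNP", "KU55933"): 0.64,
    ("TrisNP", "B02"): 0.49,
}

if __name__ == "__main__":
    for (ligand, inhibitor), ci in REPORTED_CI.items():
        print(f"{ligand} + {inhibitor}: CI = {ci:.2f} -> {classify(ci)}")
```

Under this classification all six reported combinations fall on the synergy side of CI = 1, with TrisNP/B02 the furthest from additivity and TrisPOB/NU7441 the closest to it.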

Targeting FWJs

In contrast to what is depicted above, targeting FWJs for anticancer purposes is only an emerging strategy. The most famous FWJ is certainly the Holliday junction (or HJ) (Liu and West 2004), which is a central intermediate of HR. While both HR and NHEJ repair DSBs in eukaryotes, HR is dominant in prokaryotes, chiefly bacteria, where it both ensures DNA transfer and provides DNA variation, a key mechanism for bacterial adaptation. This explains why a vast body of work is available to decipher and understand FWJ biology in lower organisms. Beyond mechanistic insights into how HR governs bacterial genetics, research has been conducted with the aim of blocking HJ resolution (a crucial step in HR) with small molecules to induce genetic instability, through the accumulation of unrepaired DSBs. This approach could be used as a novel way to fight against bacterial infection. FWJ ligands were designed in the wake of the crystal structure of a complex of a HJ-forming sequence, loxP, bound to bacterial Cre recombinase, which maintains the HJ open, catalyzing one round of cleavage and religation of two duplexes to form a HJ and a second round to resolve the HJ and release the newly combined sequences (Gopaul et al. 1998). Based on this structure, it was established that the FWJ cavity (binding site) is cuboid-like, with an overall C2 symmetry, and bigger than that of a TWJ, with an estimated diameter of 2.5 nm. This explains why sterically demanding molecules have been tested for their interactions with FWJs. Among them, a panel of C2-symmetric cyclic hexapeptides (Kepple et al. 2005; Ghosh et al. 2005) containing the aromatic residues tryptophan and phenylalanine (for hydrophobic interaction with cavity walls) and positively charged amine residues (for water solubility and DNA binding) was synthesized and tested against Cre recombinase. 
Incubation of these FWJ-binding cyclic hexapeptides with HJ-forming DNA and Cre led to the accumulation of unresolved HJ intermediates, readily observable by PAGE, as the ligands bound to and stabilized the central FWJ cavity. Compounds containing the biggest aromatic surfaces, phenylalanine and isoquinoline structures, were the most efficient FWJ binders. In a following study, linear hexapeptides Lys-Trp-Trp-Cys-Arg-Trp (KWWCRW) and Trp-Arg-Trp-Tyr-Cys-Arg (WRWYCR, Fig. 9a), which dimerize via a disulfide

Fig. 9 (a) Chemical structure of the hexapeptide WRWYCR and the dimeric acridine DACA. (b) X-ray crystal structure of DACA in interaction with a FWJ (PDB ID: 2GWA). (c) Series of images showing the ability of WRWYCR to trigger DNA damage in PC3 cells. (d) Chemical structure of VE-822 and optical images demonstrating its ability to create DNA damage (via γH2AX immunodetection) and the accumulation of FWJ at DNA damage sites (via the immunodetection of Holliday Junction Recognition Protein, HJURP)

bond through cysteine, were found to similarly bind to and inhibit the HJ-unwinding activity of E. coli’s RecG helicase in solution with RuvABC resolvase and HJ-forming DNA. A fluorescence-based assay using the probe 2-aminopurine indicated that hexapeptides interacted within the FWJ cavity, with a very high affinity (Kd = 14 nM) and with selectivity for FWJs over TWJs (Kepple et al. 2008). Subsequent studies confirmed a binding mode similar to those suggested for TWJ ligands, involving π-stacking interactions of the peptides’ aromatic groups with nucleotide bases, and hydrogen bonding of the peptides’ polar

groups with ribose and the phosphate backbone. This binding mode was spectacularly confirmed in a series of novel structural studies (Fig. 9b): the resolution of the X-ray structure of a complex involving a dimeric acridine (DACA, Fig. 9a) in interaction with a FWJ confirmed that the central cavity of a FWJ is ideally organized to bind sterically demanding ligands (Brogden et al. 2007). A great deal of information was obtained about the biological activity of FWJ-targeting hexapeptides: it was indeed demonstrated that they inhibited bacterial growth (filamentation and segregation abnormalities), blocked DNA and RNA synthesis and bacterial cell division, induced DNA damage and triggered accumulation of DNA breaks, and displayed synergic cytotoxicity with other DNA-damaging agents (mitomycin C and UV irradiation) (Gunderson and Segall 2006). This synergy observed between peptide treatments and DNA damage induction strongly indicates that DNA damage creates more FWJ targets to which peptides can bind. These studies were thus logically extended to eukaryotic cells and particularly to a series of human tumor cells including HeLa (cervical cancer) and PC3 cells (prostate cancer) (Dey et al. 2013). The toxicity of WRWYCR was moderate against cancer cell lines (IC50 values above 100 μM) but its ability to trigger DNA damage was evidenced by the immunodetection of (i) γH2AX (as above), likely due to the accumulation of unresolved DNA repair intermediates leading to DNA breaks, and (ii) p53-binding protein 1 (53BP1) (Panier and Boulton 2014), which orchestrates the cellular response to DSBs (Fig. 9c, lower panel). Their toxicity was also potentiated by TOP2 poisons doxorubicin (Dox) and etoposide, which are known to induce DNA damage by blocking TOP2 as a covalently bound DNA-TOP2 complex. These results indicate that HJ-targeting agents have potential in chemotherapeutics, either as standalone agents or in synthetic lethality cocktails. 
Recently, the screening of a small chemical library led to the discovery of a new FWJ-binding agent, VE-822 (Fig. 9d) (Yin et al. 2021). This compound, initially reported as an ATR inhibitor (Fokas et al. 2012), was identified through an in vitro assay that monitors ligand-mediated FWJ assembly by PAGE, initially developed by Searcey et al. (Howell et al. 2011) for a handful of acridines and applied here to 160 candidates. The interaction of VE-822 with FWJ was thoroughly studied in vitro: it promotes the assembly of a FWJ from its four constitutive strands, monitored either by PAGE (EC50 = 7.6 μM) or isothermal FRET assay (EC50 = 5.4 μM). VE-822 also binds to different FWJs with a high affinity assessed by bio-layer interferometry measurements (with Kd between 8.6 and 23.7 μM, including notably the recently reported FWJ structure that folds from telomeric DNA) (Haider et al. 2018) and a good selectivity over dsDNA and ssDNA (Kd > 50 μM). This compound was also studied for its ability to inhibit the FWJ-resolving activity of the Bloom (BLM) helicase, central to the HR repair pathway, by both PAGE and cell-based assays. Binding was also tested in osteosarcoma (U2OS) cells upon transfection of the fluorescently labelled FWJ strands and incubation with the ligand. In cancer cells (U2OS), VE-822 efficiently triggers DNA damage, as demonstrated by the immunodetection of γH2AX (Fig. 9d) and of phosphorylated p53 (pp53), which is a marker of DDR checkpoint activation. The DNA damage induction by VE-822 was shown to be mitigated by the overexpression of BLM, thus providing a

connection between DNA damage and FWJ in cells. This activity is also strongly attenuated by a pre-treatment with the DNA-PKi NU7026 (Willmore et al. 2004), and far less with the ATMi KU55933 (vide supra), suggesting that DNA-PKcs is important for sensing VE-822-induced DNA damage, and downstream DDR signalling. Again, the overexpression of BLM in combination with cell pre-treatment with these inhibitors further decreases ligand-mediated DNA damage, giving another clue about the involvement of FWJ targeting in the creation of DNA damage. The most compelling demonstration of this involvement was provided by the co-immunodetection of γH2AX and the Holliday junction recognition protein (HJURP) (Kato et al. 2007). The number of common foci increases with VE-822 treatment concentration, thus firmly linking ligand-stabilized FWJs with DNA damage. This damage is extensive (Fig. 9d), partly localized at telomeres, and mitigated, as above, by overexpression of BLM. From a therapeutic perspective, VE-822 displays a higher toxicity against osteosarcoma (U2OS) and glioblastoma (U251) cells (IC50 values of ca. 6 μM) than against healthy cells (e.g., lung fibroblast (WI38) and hepatocyte (HL7702) cells, with IC50 > 20 μM). This antiproliferative activity can be potentiated by a sublethal dose of the TOP2 poison Dox (1 μM), with a decrease of the cellular viability by a factor of ca. 2, which correlated with an increase in DNA damage induction (γH2AX) and apoptosis signalling. Again, the toxicity of the VE-822/Dox cocktail can be mitigated by either pre-incubation with NU7026 or overexpression of BLM. Altogether, this series of results provides convincing arguments for the relevance of FWJs as promising targets for anticancer therapy, a strategy whose efficiency could be improved by implementing precisely designed chemically induced synthetic lethality approaches.
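To give a feel for what the dissociation constants quoted for VE-822 imply, the following sketch evaluates the standard 1:1 binding isotherm, θ = [L] / (Kd + [L]). It is a rough illustration under stated assumptions rather than a model from the cited work: a single-site equilibrium, a free ligand concentration approximately equal to the total ligand concentration, and no competition between targets. The function name and the 10 μM ligand concentration are arbitrary choices; the Kd values are those reported in the text, and since the dsDNA/ssDNA value is only a lower bound, the occupancy computed from it is an upper bound.

```python
"""Illustrative 1:1 binding isotherm for the VE-822 Kd values quoted in the text.

Assumptions (not from the source): single-site equilibrium, free ligand
approximately equal to total ligand, no competition between targets.
"""


def fraction_bound(ligand_um: float, kd_um: float) -> float:
    """Fraction of target occupied at a free ligand concentration (both in uM)."""
    if ligand_um < 0.0 or kd_um <= 0.0:
        raise ValueError("ligand must be >= 0 and Kd must be > 0")
    return ligand_um / (kd_um + ligand_um)


# Kd values in uM as reported in the text; the last one is a lower bound.
KD_UM = {
    "FWJ (tightest reported)": 8.6,
    "FWJ (weakest reported)": 23.7,
    "dsDNA/ssDNA (lower bound on Kd)": 50.0,
}

if __name__ == "__main__":
    ligand_um = 10.0  # arbitrary illustrative concentration, not from the source
    for target, kd in KD_UM.items():
        theta = fraction_bound(ligand_um, kd)
        print(f"{target}: fraction bound at {ligand_um} uM = {theta:.2f}")
```

By construction, half of the target is occupied when the ligand concentration equals Kd, so at any given concentration the FWJ occupancy exceeds the upper bound obtained for duplex or single-stranded DNA.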

Conclusion

The malleability of DNA structure is coming to light, and with this, an understanding that, beyond the genetic code (primary structure) and epigenetic tags (secondary structure), the tertiary structure of our genetic material is functional in biological processes. Herein we have shown that when dsDNA is opened up, ssDNA folds into hairpins and cruciform structures at repeated sequences. These motifs can be recognized by regulatory proteins for endogenous functions or act as roadblocks to DNA transactions, leading to genetic instability, DSBs and mutations. The molecular mechanisms by which DNA structures initiate certain diseases, primarily neurodegenerative and oncogenic diseases, are gradually coming to light. DNA junctions are ideal druggable targets, with a central cavity prone to accommodate sterically defined ligands; this stabilization causes DNA damage that has applications in anticancer and antibacterial strategies. In vitro and in silico models have allowed scientists to characterize small molecules that bind specifically to the cavity of three- and four-way junctions. Aside from detailed NMR and crystal structures of cavity-bound ligands, techniques such as FRET-melting and PAGE have enabled the identification of a small repertoire of junction-binding

molecules and some reasoning on structure-activity relationships. A handful of molecules that bind strongly within DNA junctions in vitro and that are sufficiently cell-permeable show highly promising properties in cancer cell or bacterial models. However, only indirect evidence has so far shown that these ligands bind to TWJs and FWJs in cells, such as the colocalization of DNA damage foci with replication forks, or the increase of TWJ ligand binding after incubation with TWJ-containing AAV. Chemical biology is our best chance to bridge the gaps between in vitro, in cellulo, and in vivo discoveries. The real stability or transiency of hairpin structures in vivo is still unknown; however, this parameter likely varies for the vast number of TWJ- and FWJ-forming sequences found in our genome. With such variability, scientists have not yet described a small molecule able to specifically target a junction-forming sequence in cells, as has been proposed for G4 ligands. The most likely application is in inducing non-covalent DNA damage in rapidly dividing cancer cells. Another future aspiration is the discovery of small molecules that destabilize hairpins to reduce the genetic expansion that leads to neurodegenerative diseases. The road is long and winding; however, in vitro screening techniques have greatly improved, and the scientific community is awakening to the possible future applications of junction-binding small molecules.

References

Amrane S, De Cian A, Rosu F, Kaiser M, De Pauw E, Teulade-Fichou M-P, Mergny J-L (2008) Identification of trinucleotide repeat ligands with a FRET melting assay. Chembiochem 9:1229–1234
Barros SA, Chenoweth DM (2014) Recognition of nucleic acid junctions using Triptycene-based molecules. Angew Chem Int Ed 53:13746–13750
Barros SA, Chenoweth DM (2015) Triptycene-based small molecules modulate (CAG)(CTG) repeat junctions. Chem Sci 6:4752–4755
Bonev B, Cavalli G (2016) Organization and function of the 3D genome. Nat Rev Genet 17:661–678
Bonner WM, Redon CE, Dickey JS, Nakamura AJ, Sedelnikova OA, Solier S, Pommier Y (2008) γH2AX and cancer. Nat Rev Cancer 8:957
Bossaert M, Pipier A, Riou J-F, Noirot C, Nguyên L-T, Serre R-F, Bouchez O, Defrancq E, Calsou P, Britton S, Gomez D (2021) Transcription-associated topoisomerase 2α (TOP2A) activity is a major effector of cytotoxicity induced by G-quadruplex ligands. eLife 10:e65184
Brabec V, Howson SE, Kaner RA, Lord RM, Malina J, Phillips RM, Abdallah QMA, McGowan PC, Rodger A, Scott P (2013) Metallohelices with activity against cisplatin-resistant cancer cells; does the mechanism involve DNA binding? Chem Sci 4:4407–4416
Brogden AL, Hopcroft NH, Searcey M, Cardin CJ (2007) Ligand bridging of the DNA Holliday junction: molecular recognition of a stacked-X four-way junction by a small molecule. Angew Chem Int Ed 46:3850–3854
Cañeque T, Müller S, Rodriguez R (2018) Visualizing biologically active small molecules in cells using click chemistry. Nat Rev Chem 2:202–215
Cardo L, Sadovnikova V, Phongtongpasuk S, Hodges NJ, Hannon MJ (2011) Arginine conjugates of metallo-supramolecular cylinders prescribe helicity and enhance DNA junction binding and cellular activity. Chem Commun 47:6575–6577

Cerasino L, Hannon MJ, Sletten E (2007) DNA three-way junction with a dinuclear iron(II) supramolecular helicate at the center: a NMR structural study. Inorg Chem 46:6245–6251
Chodosh LA, Fire A, Samuels M, Sharp PA (1989) 5,6-Dichloro-1-β-D-ribofuranosylbenzimidazole inhibits transcription elongation by RNA polymerase II in vitro. J Biol Chem 264:2250–2257
Chou T-C (2010) Drug combination studies and their synergy quantification using the Chou-Talalay method. Cancer Res 70:440–446
del Mundo IM, Vasquez KM, Wang G (2019) Modulation of DNA structure formation using small molecules. Biochim Biophys Acta - Mol Cell Res:118539
Dey M, Patra S, Su LY, Segall AM (2013) Tumor cell death mediated by peptides that recognize branched intermediates of DNA replication and repair. PLoS One 8:e78751
Dietrich B, Lehn J-M, Guilhem J, Pascard C (1989) Anion receptor molecules: synthesis of an octaaza-cryptand and structure of its fluoride cryptate. Tetrahedron Lett 30:4125–4128
Ducani C, Leczkowska A, Hodges NJ, Hannon MJ (2010) Noncovalent DNA-binding Metallosupramolecular cylinders prevent DNA transactions in vitro. Angew Chem Int Ed 49:8942–8945
Duskova K, Lamarche J, Amor S, Caron C, Queyriaux N, Gaschard M, Penouilh M-J, de Robillard G, Delmas D, Devillers CH, Granzhan A, Teulade-Fichou M-P, Chavarot-Kerlidou M, Therrien B, Britton S, Monchaud D (2019) Identification of three-way DNA junction ligands through screening of chemical libraries and validation by complementary in vitro assays. J Med Chem 62:4456–4466
Duskova K, Lejault P, Benchimol É, Guillot R, Britton S, Granzhan A, Monchaud D (2020) DNA junction ligands trigger DNA damage and are synthetic lethal with DNA repair inhibitors in cancer cells. J Am Chem Soc 142:424–435
Eichman BF, Vargason JM, Mooers BH, Ho PS (2000) The Holliday junction in an inverted repeat DNA sequence: sequence effects on the structure of four-way junctions. Proc Natl Acad Sci U S A 97:3971–3976
Fokas E, Prevo R, Pollard JR, Reaper PM, Charlton PA, Cornelissen B, Vallis KA, Hammond EM, Olcina MM, Gillies McKenna W, Muschel RJ, Brunner TB (2012) Targeting ATR in vivo using the novel inhibitor VE-822 results in selective sensitization of pancreatic tumors to radiation. Cell Death Dis 3:e441
Gamba I, Rama G, Ortega-Carrasco E, Maréchal J-D, Martínez-Costas J, Vázquez ME, López MV (2014) Programmed stereoselective assembly of DNA-binding helical metallopeptides. Chem Commun 50:11097–11100
Georgakopoulos-Soares I, Morganella S, Jain N, Hemberg M, Nik-Zainal S (2018) Noncanonical secondary structures arising from non-B DNA motifs are determinants of mutagenesis. Genome Res 28:1264–1271
Ghosh K, Lau CK, Guo F, Segall AM, Van Duyne GD (2005) Peptide trapping of the Holliday junction intermediate in Cre-loxP site-specific recombination. J Biol Chem 280:8290–8299
Gómez-González J, Pérez Y, Sciortino G, Roldan-Martín L, Martínez-Costas J, Maréchal J-D, Alfonso I, Vázquez López M, Vázquez ME (2021) Dynamic Stereoselection of peptide Helicates and their selective Labeling of DNA replication foci in cells. Angew Chem Int Ed 60:8859–8866
Gopaul DN, Guo F, Van Duyne GD (1998) Structure of the Holliday junction intermediate in Cre–loxP site-specific recombination. EMBO J 17:4175–4187
Gunderson CW, Segall AM (2006) DNA repair, a novel antibacterial target: Holliday junction-trapping peptides induce DNA damage and chromosome segregation defects. Mol Microbiol 59:1129–1148
Guyon L, Pirrotta M, Duskova K, Granzhan A, Teulade-Fichou M-P, Monchaud D (2018) TWJ-screen: an isothermal screening assay to assess ligand/DNA junction interactions in vitro. Nucleic Acids Res 46:e16
Haider S, Li P, Khiali S, Munnur D, Ramanathan A, Parkinson GN (2018) Holliday junctions formed from human Telomeric DNA. J Am Chem Soc 140:15366–15374

Hänsel-Hertsch R, Di Antonio M, Balasubramanian S (2017) DNA G-quadruplexes in the human genome: detection, functions and therapeutic potential. Nat Rev Mol Cell Biol 18:279–284
Hansen MH, Blakskjær P, Petersen LK, Hansen TH, Højfeldt JW, Gothelf KV, Hansen NJV (2009) A yoctoliter-scale DNA reactor for small-molecule evolution. J Am Chem Soc 131:1322–1327
Hickson I, Zhao Y, Richardson CJ, Green SJ, Martin NM, Orr AI, Reaper PM, Jackson SP, Curtin NJ, Smith GC (2004) Identification and characterization of a novel and specific inhibitor of the ataxia-telangiectasia mutated kinase ATM. Cancer Res 64:9152–9159
Hotze AC, Hodges NJ, Hayden RE, Sanchez-Cano C, Paines C, Male N, Tse M-K, Bunce CM, Chipman JK, Hannon MJ (2008) Supramolecular iron cylinder with unprecedented DNA binding is a potent cytostatic and apoptotic agent without exhibiting genotoxicity. Chem Biol 15:1258–1267
Howell LA, Waller ZAE, Bowater R, O’Connell M, Searcey M (2011) A small molecule that induces assembly of a four way DNA junction at low temperature. Chem Commun 47:8262–8264
Huang F, Motlekar NA, Burgwin CM, Napper AD, Diamond SL, Mazin AV (2011) Identification of specific inhibitors of human RAD51 recombinase using high-throughput screening. ACS Chem Biol 6:628–635
Jackson SP, Bartek J (2009) The DNA-damage response in human biology and disease. Nature 461:1071–1078
Kato T, Sato N, Hayama S, Yamabuki T, Ito T, Miyamoto M, Kondo S, Nakamura Y, Daigo Y (2007) Activation of Holliday Junction–Recognizing Protein Involved in the Chromosomal Stability and Immortality of Cancer Cells. Cancer Res 67:8544–8553
Kaushal S, Freudenreich CH (2019) The role of fork stalling and DNA structures in causing chromosome fragility. Genes Chromosom Cancer 58:270–283
Kawatani M, Takayama H, Muroi M, Kimura S, Maekawa T, Osada H (2011) Identification of a small-molecule inhibitor of DNA topoisomerase II by proteomic profiling. Chem Biol 18:743–751
Kepple KV, Boldt JL, Segall AM (2005) Holliday junction-binding peptides inhibit distinct junction-processing enzymes. Proc Natl Acad Sci U S A 102:6867–6872
Kepple KV, Patel N, Salamon P, Segall AM (2008) Interactions between branched DNAs and peptide inhibitors of DNA repair. Nucleic Acids Res 36:5319
Khristich AN, Mirkin SM (2020) On the wrong DNA track: molecular mechanisms of repeat-mediated genome instability. J Biol Chem 295:4134–4170
Kobayashi H, Abe K, Matsuura T, Ikeda Y, Hitomi T, Akechi Y, Habu T, Liu W, Okuda H, Koizumi A (2011) Expansion of intronic GGCCTG hexanucleotide repeat in NOP56 causes SCA36, a type of spinocerebellar ataxia accompanied by motor neuron involvement. Am J Hum Genet 89:121–130
Leahy JJ, Golding BT, Griffin RJ, Hardcastle IR, Richardson C, Rigoreau L, Smith GC (2004) Identification of a highly potent and selective DNA-dependent protein kinase (DNA-PK) inhibitor (NU7441) by screening of chromenone libraries. Bioorg Med Chem Lett 14:6083–6087
Lilley DM (2000) Structures of helical junctions in nucleic acids. Q Rev Biophys 33:109–159
Liu Y, West SC (2004) Happy Hollidays: 40th anniversary of the Holliday junction. Nat Rev Mol Cell Biol 5:937–944
Lu M, Guo Q, Kallenbach NR (1992) Interaction of drugs with branched DNA structures. Crit Rev Biochem Mol Biol 27:157–190
Malina J, Hannon MJ, Brabec V (2007) Recognition of DNA three-way junctions by Metallosupramolecular cylinders: gel electrophoresis studies. Chem Eur J 13:3871–3877
McLuckie KIE, Di Antonio M, Zecchini H, Xian J, Caldas C, Krippendorff BF, Tannahill D, Lowe C, Balasubramanian S (2013) G-Quadruplex DNA as a molecular target for induced synthetic lethality in cancer cells. J Am Chem Soc 135:9640–9643
Moldovan G-L, Pfander B, Jentsch S (2007) PCNA, the maestro of the replication fork. Cell 129:665–679

Novotna J, Laguerre A, Granzhan A, Pirrotta M, Teulade-Fichou M-P, Monchaud D (2015) Cationic azacryptands as selective three-way DNA junction binding agents. Org Biomol Chem 13:215–222
Oleksi A, Blanco AG, Boer R, Usón I, Aymamí J, Rodger A, Hannon MJ, Coll M (2006) Molecular recognition of a three-way DNA junction by a Metallosupramolecular Helicate. Angew Chem Int Ed 45:1227–1231
Panayotatos N, Wells RD (1981) Cruciform structures in supercoiled DNA. Nature 289:466–470
Panier S, Boulton SJ (2014) Double-strand break repair: 53BP1 comes into focus. Nat Rev Mol Cell Biol 15:7–18
Peltonen K, Colis L, Liu H, Trivedi R, Moubarek MS, Moore HM, Bai B, Rudek MA, Bieberich CJ, Laiho M (2014) A targeting modality for destruction of RNA polymerase I that possesses anticancer activity. Cancer Cell 25:77–90
Pilié PG, Tang C, Mills GB, Yap TA (2019) State-of-the-art strategies for targeting the DNA damage response in cancer. Nat Rev Clin Oncol 16:81–104
Renton AE, Majounie E, Waite A, Simón-Sánchez J, Rollinson S, Gibbs JR, Schymick JC, Laaksovirta H, Van Swieten JC, Myllykangas L (2011) A hexanucleotide repeat expansion in C9ORF72 is the cause of chromosome 9p21-linked ALS-FTD. Neuron 72:257–268
Rodriguez R, Miller KM, Forment JV, Bradshaw CR, Nikan M, Britton S, Oelschlaegel T, Xhemalce B, Balasubramanian S, Jackson SP (2012) Small-molecule-induced DNA damage identifies alternative DNA structures in human genes. Nat Chem Biol 8:301–310
Roncancio D, Yu H, Xu X, Wu S, Liu R, Debord J, Lou X, Xiao Y (2014) A label-free Aptamer-fluorophore assembly for rapid and specific detection of cocaine in biofluids. Anal Chem 86:11100–11106
Stojanovic MN, Landry DW (2002) Aptamer-based colorimetric probe for cocaine. J Am Chem Soc 124:9678–9679
Stojanovic MN, de Prada P, Landry DW (2000) Fluorescent sensors based on aptamer self-assembly. J Am Chem Soc 122:11547–11548
Stojanović MN, Green EG, Semova S, Nikić DB, Landry DW (2003) Cross-reactive arrays based on three-way junctions. J Am Chem Soc 125:6085–6089
Thiviyanathan V, Luxon BA, Leontis NB, Illangasekare N, Donne DG, Gorenstein DG (1999) Hybrid-hybrid matrix structural refinement of a DNA three-way junction from 3D NOESY-NOESY. J Biomol NMR 14:209–221
Treangen TJ, Salzberg SL (2012) Repetitive DNA and next-generation sequencing: computational challenges and solutions. Nat Rev Genet 13:36–46
van Buuren BN, Overmars FJ, Ippel JH, Altona C, Wijmenga SS (2000) Solution structure of a DNA three-way junction containing two unpaired thymidine bases. Identification of sequence features that decide conformer selection. J Mol Biol 304:371–383
Van Riesen AJ, Le J, Slavkovic S, Churcher ZR, Shoara AA, Johnson PE, Manderville RA (2021) Visible Fluorescent Light-up Probe for DNA Three-Way Junctions Provides Host–Guest Biosensing Applications. ACS Appl Bio Mater 4:6732–6741
Vologodskii A, Lukashin A, Anshelevich V, Frank-Kamenetskii M (1979) Fluctuations in superhelical DNA. Nucleic Acids Res 6:967–982
Vuong S, Stefan L, Lejault P, Rousselin Y, Denat F, Monchaud D (2012) Identifying three-way DNA junction-specific small-molecules. Biochimie 94:442–450
Walker FO (2007) Huntington’s disease. Lancet 369:218–228
Wang G, Vasquez KM (2006) Non-B DNA structure-induced genetic instability. Mutat Res 598:103–119
Willmore E, de Caux S, Sunter NJ, Tilby MJ, Jackson GH, Austin CA, Durkacz BW (2004) A novel DNA-dependent protein kinase inhibitor, NU7026, potentiates the cytotoxicity of topoisomerase II poisons used in the treatment of leukemia. Blood 103:4659–4665



Part V Nucleic Acids and Gene Expression

DNA Damage and Repair in G-Quadruplexes Impact Gene Expression

33

Aaron M. Fleming and Cynthia J. Burrows

Contents
Introduction
Reactive Oxygen Species and Endogenous DNA Damage
Oxidation of Guanine in Duplex Versus Quadruplex DNA
Initiation of Base Excision Repair After Oxidative Stress
Cell-Based Assays of Gene Expression
AP Endonuclease-1 Binding to G-Quadruplexes
Conclusion and Outlook
References

Abstract

Guanine-rich sequences in DNA are sensitive to oxidation to 8-oxo-7,8-dihydroguanine (OG) during cellular oxidative stress. When this base modification occurs in a sequence capable of refolding from a canonical duplex DNA structure to a G-quadruplex (G4) in a gene-regulatory region of the genome, modulation of gene expression can occur. The cellular reader of OG in humans is OGG1, a DNA glycosylase of the base excision repair pathway; its action is followed by that of apurinic/apyrimidinic endonuclease-1 (APE1), which cleaves the 5′ phosphodiester bond adjacent to the abasic site. However, APE1’s cleavage activity is severely attenuated in the G4 context, and DNA binding of the protein instead leads to the recruitment of activating transcription factors when the G4 is located in the nontemplate strand of a promoter close to the transcription start site. In this chapter, we present structural and mechanistic studies pertinent to the APE1-mediated modulation of gene expression under oxidative conditions and briefly cover related mechanisms stemming only from the activity of OGG1; the latter examples do not necessarily involve G-quadruplex formation. Finally, we present a future outlook with unanswered questions.

A. M. Fleming · C. J. Burrows (*)
Department of Chemistry, University of Utah, Salt Lake City, UT, USA

© Springer Nature Singapore Pte Ltd. 2023
N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_38


Keywords

Guanine oxidation · Oxidative stress · 8-oxo-7,8-dihydroguanine · G-quadruplex · Base excision repair · AP-endonuclease · Gene expression · Epigenetics

Introduction

Suppose you wanted to design a response element in genomic DNA that would sense oxidative stress and switch into a mode that would recruit the proteins necessary to turn on or off an associated gene that responds to oxidative stress. How would you do that? First, you might realize that reactive oxygen species (ROS) that are formed during oxidative stress can react with DNA bases, and some bases, namely, guanine (G), are more sensitive than others (Neeley and Essigmann 2006; Fleming and Burrows 2017b). Therefore, a G-rich sequence in DNA would be a starting point for a sensor. Second, certain G-rich sequences are capable of folding into an alternative quadruplex structure (Lane et al. 2008; Chen et al. 2022). It could be ideal to couple these two processes – G oxidation as a trigger and duplex/quadruplex refolding as a signal – for the initiation of a genomic response to the redox status of the cell. A suite of proteins in the base excision repair (BER) pathway would be well equipped to recognize the oxidized DNA site, assist with DNA refolding and recruitment of transcription factors, and erase or repair the oxidized lesion when its signaling is no longer needed (David et al. 2007; Fleming et al. 2021). Altogether then, one could imagine that the process of oxidative DNA damage, when it occurs in a regulatory location such as a promoter/enhancer region that commonly contains G-rich sequences, could lead to changes in gene expression if BER proteins communicate with transcription factors during the process of repair. Indeed, such systems exist and have been the subject of intense study over the past few years (Perillo et al. 2008; Pastukh et al. 2015; Pan et al. 2016; Fleming et al. 2017a; Cogoi et al. 2018).
Several features are key to nature’s design of a genomic sensor of oxidative stress: (1) the cellular ROS must oxidize DNA in a focused fashion rather than indiscriminately across the genome; (2) the repair of the oxidized lesions should be paused so that there is time to recruit transcription factors for gene induction; and (3) the BER proteins involved in recognition, signaling, and repair of oxidized bases must be compatible with oxidative stress conditions. All of these features are married in a system that involves the oxidation of guanine in G-quadruplex-forming sequences via one-electron chemistry, which ultimately leads to refolding of duplex DNA containing an apurinic (AP) site into a G-quadruplex (G4), assisted by AP-endonuclease-1 (APE1), a known recruiter of activating transcription factors. Each of these features will be outlined in more detail in the following sections.


Reactive Oxygen Species and Endogenous DNA Damage

Reactive oxygen species that are well known to oxidize guanine in DNA include hydroxyl radical (HO•), singlet oxygen (¹O₂), and ozone (O₃) (Cadet et al. 2008; Chabot et al. 2022). However, none of these is present in cells in appreciable concentrations. More common are the “unreactive” oxygen species superoxide (O₂•⁻) and hydrogen peroxide (H₂O₂); by themselves, these are poorly reactive with DNA but are converted to ROS by enzyme or transition metal catalysts. In addition, the inflammatory response generates nitric oxide, which reacts with superoxide to produce peroxynitrite (Dedon and Tannenbaum 2004). However, these more abundant species, hydrogen peroxide and peroxynitrite, are not the end of the story because of the presence of the oft-ignored species bicarbonate (Fig. 1). All aerobic organisms have bicarbonate, HCO₃⁻, as a component of their buffered medium in the concentration range of 10–40 mM, and dissolved CO₂ accompanies it at a much lower concentration. It was found some years ago that peroxynitrite reacts rapidly with CO₂ to generate peroxynitrosocarbonate, which then undergoes O–O homolytic bond scission to produce NO₂ and CO₃•⁻ (Dedon and Tannenbaum 2004; Neeley and Essigmann 2006). The latter, carbonate radical

Fig. 1 Formation pathways for ROS


anion, is thought to account for a significant amount of the DNA base oxidation observed from peroxynitrite. Hydrogen peroxide and CO₂ are very slow to react with each other; however, in the presence of iron or copper complexes, metal-bound carbonate undergoes a rapid reaction with peroxide to generate carbonate radical anion (Patra et al. 2020). In fact, when even small amounts (>500 μM) of HCO₃⁻ are present at pH 7, the iron Fenton reaction produces CO₃•⁻ as the major product, not hydroxyl radical (HO•) as commonly believed (Illes et al. 2019). Therefore, we conclude that carbonate radical anion, not hydroxyl radical, is the predominant ROS that reacts with DNA during endogenous oxidative stress (Fleming and Burrows 2020b).

What is the outcome of DNA oxidation by carbonate radical anion? This species is far less reactive than hydroxyl radical (or an Fe(IV) ferryl species), which can react at near-diffusion-controlled rates with many organic compounds including the ribose units and all bases of DNA or RNA with little selectivity (Margolin et al. 2008). In contrast, CO₃•⁻ principally reacts to remove an electron from the DNA base stack to yield CO₃²⁻ and a base radical cation (Margolin et al. 2008). This electron hole in the base stack can migrate over hundreds or possibly thousands of well-stacked base pairs in DNA, pausing only when the electron hole finds a particularly favorable site (Tse et al. 2019). Such a site would be a guanine base, and more ideally, a G with another G stacked on its 3′ side (i.e., GG), which makes the 5′ G an especially electron-rich site. Better yet is a GGG or GGGG sequence, in which the Gs that lie 5′ to another G are the sites observed to be most readily oxidized by one-electron oxidants (Fig. 2) (Sugiyama and Saito 1996). Furthermore, the outcome of one-electron oxidation of

Fig. 2 Oxidation of dsDNA to yield an electron hole that migrates to a low-energy G run to be terminated in formation of OG


G is the formation of 8-oxo-7,8-dihydroguanine (OG), after loss of another electron and two protons from the G radical cation (Neeley and Essigmann 2006; Cadet et al. 2008; Fleming and Burrows 2020b). OG is the most commonly observed oxidation product in DNA and RNA.

In summary, endogenous oxidative stress produces carbonate radical anion as a key ROS that focuses its reactivity on poly-G sites in duplex DNA. Such sites are frequently found in gene promoters and in particular in G-quadruplex-forming sequences (Huppert and Balasubramanian 2005). Gene promoters are regulatory sites, and G-quadruplexes have been described as transcription factor “hubs” (Spiegel et al. 2021); with the high sensitivity of GGG tracks to oxidation, the initial criteria have been met for creating a sensor of oxidative stress in the genome (Sugiyama and Saito 1996; Fleming and Burrows 2020a). Furthermore, it should be noted that the folding of G-quadruplexes in mammalian cells has been monitored, and on the order of 10,000 G4s are thought to fold at some point during the cell cycle (Hansel-Hertsch et al. 2016); these sequences are especially known to exist in promoter elements and other regulatory regions of the genome. Consequently, many elements of our oxidative stress response unit are in place in the right parts of the genome.
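The idea of a G-quadruplex-forming sequence can be made concrete with a short search routine. The Python sketch below is an illustration only: it assumes the widely used motif of four runs of at least three Gs separated by loops of 1–7 nucleotides, it scans only the strand supplied (a C-rich match on the complement is not considered), and the example sequence is made up. The names `PQS_PATTERN` and `find_pqs` are hypothetical and are not taken from the studies cited above.

```python
import re

# Assumed motif (not taken from this chapter): four G runs of >= 3 Gs,
# separated by three loops of 1-7 nucleotides each.
PQS_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def find_pqs(sequence):
    """Return (start, matched_sequence) for non-overlapping candidate PQSs.

    Only the strand passed in is scanned; positions are 0-based and read 5'->3'.
    """
    seq = sequence.upper()
    return [(m.start(), m.group(0)) for m in PQS_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical sequence chosen only to exercise the pattern.
    print(find_pqs("TTGGGAGGGTGGGAGGGTT"))
```

A match of this kind says only that the sequence could fold; whether it does fold in a given cellular context is a separate experimental question.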

Oxidation of Guanine in Duplex Versus Quadruplex DNA

G-quadruplexes are formed by the folding of four closely spaced G-rich sequences; one G from each of the G tracks hydrogen bonds to form a tetrad, three of which stack in DNA to form the G4 (Lane et al. 2008). Potassium ions are present intracellularly at a concentration of 140 mM, much higher than any other metal cation, and they are a perfect fit for coordination in the center of two adjacent tetrads, alleviating some of the negative charge buildup from so many guanine carbonyls oriented toward the center of the stack. Different topologies of G4s are formed according to the conformations of the glycosidic bonds and the connectivity of loops, either across the same face to give antiparallel strands or along a groove to provide parallel strands (Chen et al. 2022). The latter is most common when the loop lengths are short (e.g., 1–2 nt) as is commonly seen in promoter G4s. In contrast, the human telomere sequence with 3-nt loops folds to an antiparallel G4 when Na⁺ is the coordinating ion and to hybrid G4s with K⁺ that contain a mixture of loop topologies (Fig. 3) (Patel et al. 2007). The topology of G4 folding can be monitored by circular dichroism (CD) spectroscopy and by NMR or X-ray crystallographic determination of structure in vitro (Cheong et al. 2015; Del Villar-Guerra et al. 2018; Bielskutė et al. 2021; Chen et al. 2022). In addition, antibodies such as BG4 that are selective for G4s have been employed to monitor G4 folding during cell cycle progression (Hansel-Hertsch et al. 2016).

In single-stranded DNA in which guanine bases have extensive solvent exposure, there is usually little sequence selectivity observed for the oxidation of bases by freely diffusible ROS (Fleming and Burrows 2013). When DNA is structured in a


Fig. 3 A generalized potential G-quadruplex sequence that can form G-tetrads that adopt different G4 topologies

Fig. 4 Sites of G4 oxidation are dependent on the nature of the ROS

duplex or quadruplex, then the site of oxidation depends on the mechanism of oxidation (Fleming and Burrows 2013). For example, singlet oxygen and ozone undergo cycloaddition reactions across the face of guanine’s five-membered ring, and so the reactivity at a given G site reports on solvent exposure. Double-stranded DNA that is perfectly complementary is rather unreactive with ¹O₂ and O₃, whereas Gs residing in hairpin loops or bulges are more prone to oxidation by these ROS. The folded nature of a G-quadruplex can be probed by reaction with ¹O₂ wherein oxidation of G will be observed at the exposed faces of a G-tetrad, but not at internal layers of the G4 (Fig. 4). This reaction was used to help confirm the stereochemistry of the four-electron oxidation product of G, a spiroiminodihydantoin nucleoside Sp whose stereocenter in the base was set by the initial attack of the oxidant and/or


water at either the re or si face of guanine, depending on which face was exposed in that particular G4 topology (Fleming et al. 2013).

In contrast to four-electron oxidants such as singlet oxygen, oxidation by one-electron oxidants, most commonly carbonate radical anion, typically leads to oxidation at Gs that reside 5′ to another G (Margolin et al. 2006; Fleming and Burrows 2013). Because of long-range charge transport in duplex DNA, the initial electron hole can permeate to sites that are less solvent exposed, although ultimately the G•⁺ is quenched by a nucleophile such as H₂O to yield OG (Neeley and Essigmann 2006; Cadet et al. 2008; Fleming and Burrows 2017b). The 5′ G preference in one-electron oxidation mechanisms is due to the chirality of the helical base stack in which a G•⁺ benefits from electron density donation from a G stacked on its 3′ side (Sugiyama and Saito 1996). Therefore, oxidation sites via the one-electron abstraction mechanism do not follow solvent exposure but rather guanine richness in an intact duplex. In a folded quadruplex, the same holds true with a minor twist; all of the 5′ Gs in a GGG track are reactive in duplex DNA, but the central G is unreactive in quadruplex DNA, either because a central G•⁺ is destabilized by being sandwiched between K⁺ ions, or because solvent accessibility in the center of the G4 is poor (Fleming and Burrows 2013; Fleming et al. 2015). In any case, the reaction of a prefolded G4 with CO₃•⁻ leads to an oxidation pattern of 5′-G >> G > G-3′. Whether and when potential G-quadruplex sequences (PQSs) exist already folded in the genome is an intriguing question that has been addressed by several laboratories (Hansel-Hertsch et al. 2016; Lago et al. 2021); the G4 of the cMYC promoter is thought to be long-lived, while others are transiently folded during transcription or replication. In the work described in this chapter, we assume that duplex DNA is the resting state of the genome.
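The 5′-G rule for one-electron oxidation in an intact duplex can be written as a one-line test on the sequence: a G is flagged when the next base on its 3′ side is also a G. The sketch below illustrates that rule only. It assumes a single strand written 5′→3′, it does not model charge transport or relative yields, and it does not cover the quadruplex case in which the central G of a track is unreactive. The function name `five_prime_g_sites` and the example sequence are hypothetical.

```python
def five_prime_g_sites(strand):
    """Return 0-based positions of Gs that have another G on their 3' side.

    The strand is read 5'->3'. This encodes only the qualitative duplex rule
    (a G stacked 5' to another G is a favored one-electron oxidation site).
    """
    s = strand.upper()
    return [i for i in range(len(s) - 1) if s[i] == "G" and s[i + 1] == "G"]


if __name__ == "__main__":
    # Hypothetical sequence: the first two Gs of the GGG run are flagged.
    print(five_prime_g_sites("ATGGGCA"))
```

For a GGG run this flags the 5′ and central Gs and leaves the 3′ G unflagged, which matches the duplex behavior described above.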
Pertinent to the question of solvent exposure is the question of chromatin structure and how it impacts the formation of oxidized lesions in DNA. Generally speaking, much more OG is found in transcriptionally active sites of the genome where the DNA is more exposed compared to compact heterochromatin (Wu et al. 2018), but one must also consider the fact that transcriptionally active regions undergo more DNA repair to remove oxidized nucleotides. In addition, these regions are generally histone-depleted, and this is particularly true for gene promoters that frequently harbor potential G4-forming sequences. Consequently, the question of how packaging of DNA into nucleosomes impacts DNA damage and repair may be diminished to a certain extent when focusing on G-rich sequences in regulatory regions of the genome. Several laboratories have devised ways to sequence mammalian genomes for the presence of OG with varying success (Ding et al. 2017; Wu et al. 2018; Amente et al. 2019; Fang and Zou 2020; Poetsch 2020; An et al. 2021). Because OG is generally thought to be the result of cellular chemistry rather than the product of an enzyme writer with sequence or context specificity, and because the overall frequency of OG is so low (~1 in 10⁶ in unstressed cells) (Gedik and Collins 2005), separating the signal from the noise is very difficult for OG sequencing. Nevertheless, pull-down enrichment of OG-containing strands and sequencing methods that compare wild-type to repair knockout genomes help home in on important regions where OG is


found (Ding et al. 2017). Besides telomeres, studies have shown that gene-regulatory regions such as promoters appear to have more OG than intergenic regions, suggesting that OG could serve as an epigenetic-like modification in DNA if the recruitment of repair enzymes leads also to the recruitment of activating transcription factors (Ding et al. 2017). We also found that among OG-containing sequences pulled down, potential quadruplex-forming sequences were statistically overrepresented ~10-fold (Ding et al. 2017).

The structural consequences of oxidation of G to OG are slight; the 8-oxo group is well accommodated in the OG:C base pair within the B-form double helix with only a small destabilization due to electrostatic repulsion by the proximity of the C4′ oxygen on the ribose with the 8-oxo oxygen on the guanine base (Lipscomb et al. 1995). This is reflected in a lowering of the thermal stability, as measured by Tm, of about 1–2 °C for a 17–20-mer of duplex DNA (Fleming and Burrows 2017a). On the other hand, OG is not well accommodated in a core position of a G-quadruplex because of steric repulsion between the new proton on N7 of G and the exocyclic NH₂ group of an adjacent G in the same tetrad (Fleming et al. 2015). When OG is forced to be 1 of the 12 guanines of the core of a G4, the Tm is lowered by about 15 °C. However, if the oxidized G can be accommodated in a loop of a G4, either by shuffling Gs within the track when there are four or more in the same track, or by looping out the damaged track and folding in a fifth track (a.k.a. the “spare tire”), then a thermally stable G4 can once again be formed (Fig. 5). Such a sequence, when presented with its perfect complementary sequence, might nevertheless still prefer to exist in a duplex rather than a quadruplex form. It should be noted that NMR structures have been solved of OG-containing G-quadruplexes that have no extra unpaired Gs available (Cheong et al. 2015; Bielskutė et al. 2021); whether or not these represent sufficiently stable structures to be biologically relevant is not known.

In summary, oxidation of G-rich sequences by the most common ROS in the cell, carbonate radical anion, leads to preferential oxidation of G tracks such as those found in potential G-quadruplex-forming sequences. OG is the common outcome, and it does not substantially perturb the duplex; it would, however, be highly destabilizing to a quadruplex tetrad if refolding occurs, or if the oxidation happens after a PQS has already folded to a G4.
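The two rescue routes described above (shuffling Gs within a track of four or more, or swapping in a fifth track) depend only on how many G tracks a sequence has and how long they are. The Python sketch below tallies both. It assumes a G track is a run of at least three Gs, ignores loop lengths entirely, and uses a made-up sequence. `g_tracks` and `rescue_options` are hypothetical names for this illustration, and a positive result indicates only that a rescue route is conceivable, not that the refolded G4 is stable.

```python
import re


def g_tracks(sequence, min_len=3):
    """Return (start, length) for each run of at least min_len consecutive Gs."""
    pattern = re.compile("G{%d,}" % min_len)
    return [(m.start(), m.end() - m.start())
            for m in pattern.finditer(sequence.upper())]


def rescue_options(sequence):
    """Summarize the features that could let a G4 tolerate one damaged G.

    spare_track: True when five or more G tracks are present, so one damaged
        track could in principle be looped out (the "spare tire").
    extended_tracks: start positions of tracks with four or more Gs, where
        the register could in principle shift to place the lesion in a loop.
    """
    tracks = g_tracks(sequence)
    return {
        "n_tracks": len(tracks),
        "spare_track": len(tracks) >= 5,
        "extended_tracks": [start for start, length in tracks if length >= 4],
    }


if __name__ == "__main__":
    # Hypothetical five-track sequence whose first track has four Gs.
    print(rescue_options("GGGGTGGGAGGGTGGGTTGGG"))
```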

Fig. 5 Spare tire domains in G4 sequences allow folding to occur when a G is oxidatively damaged


Initiation of Base Excision Repair After Oxidative Stress

Oxidation to form OG in duplex DNA is sufficiently common in the cell to warrant a specialized repair mechanism as well as a backup repair mechanism (David et al. 2007). The backup system is needed because OG is well hidden in the duplex and does not sufficiently impede the progress of DNA polymerase; however, copying of OG in a template strand is error prone, with substantial amounts of dAMP inserted opposite OG when its glycosidic bond rotates to a syn conformation (Shibutani et al. 1991). The result is an OG(syn):A(anti) base pair that would lead to G→T transversion mutations after another round of replication. Accordingly, there are base excision repair (BER) glycosylases that scan duplex DNA looking for either the OG:C base pair, in which case OGG1 removes the OG, or the OG:A base pair that is a substrate for MUTYH, a glycosylase that removes the undamaged but mutagenic A, giving the DNA polymerase another chance to insert dCMP correctly opposite OG (David et al. 2007). The BER glycosylase enzymes do a remarkable job of finding small changes in nucleobase structure, including C deamination to U, methylation of a nitrogen of A, or oxidation of G to OG and beyond to the hydantoins Sp and Gh (David et al. 2007; Fleming and Burrows 2017b). With the exception of the hydantoin lesions, most BER substrates do not greatly perturb the structure of duplex DNA (Fleming and Burrows 2017a), and thus BER is the major pathway dealing with oxidative stress and other types of endogenous chemistry in which only a few atoms are changed (Fig. 6) (McKibbin et al. 2013; Shafirovich and Geacintov 2021). In contrast, environmental mutagens that make bulky, duplex-distorting adducts are repaired by the nucleotide excision repair (NER) pathway that removes a damaged oligonucleotide segment from DNA (Kolbanovskiy et al. 2020).
For BER, the process typically involves only a single nucleotide, and it proceeds by first cleaving the glycosidic bond to remove the offending base (David et al. 2007; Wallace 2014; Fleming and Burrows 2017b). Following that, the abasic (AP) site may be removed either by the glycosylase if it has lyase activity or by AP endonuclease-1 (APE1). For OGG1, it is thought that the in vivo lyase activity is too slow to be relevant, and that the OGG1-associated

Fig. 6 Generalized BER pathway


protein APE1 takes over to cleave the AP site. Repair is completed by removing the remaining ribose fragment to create a single-nucleotide gap, insertion of the correct dNTP opposite the undamaged strand by polymerase β, and ligation of the nick to form the repaired duplex. BER glycosylases share features in common with transcription factors: (1) they are expressed at similar levels, although these vary by cell type; (2) they have similar binding constants to DNA, typically in the 100–200 nM range; and (3) both can be involved in regulation of gene expression, particularly in response to base modifications (Fleming and Burrows 2021). APE1, a phosphodiesterase, is a critical component of the BER pathway but has the unusual feature of being expressed at very high levels in most cell types (Mol et al. 2000). As discussed below, this may reflect the fact that APE1 plays many different cellular roles. The sections below give more details about investigations that point to APE1 as the critical mediator of gene activation in response to oxidative stress in G4-forming sequences. However, another pathway has been explored by Boldogh and coworkers that relies only on OGG1 and not APE1 for initiation of gene expression, and it does not necessarily involve G-quadruplex formation. This alternative pathway has been reviewed (Ba and Boldogh 2018).
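To give a feel for what a binding constant in the 100–200 nM range implies, the sketch below evaluates the simplest possible binding isotherm. It is an assumption-laden illustration, not a description of how the cited constants were measured: it assumes 1:1 binding at equilibrium with the protein in large excess over the DNA site, so that the free protein concentration equals the total. The function name `fraction_bound` is hypothetical.

```python
def fraction_bound(protein_nM, kd_nM):
    """Fraction of DNA sites occupied for simple 1:1 binding.

    Assumes equilibrium and protein in large excess over DNA, so that
    fraction = [P] / ([P] + KD). Concentrations are in nM.
    """
    if protein_nM < 0 or kd_nM <= 0:
        raise ValueError("protein_nM must be >= 0 and kd_nM must be > 0")
    return protein_nM / (protein_nM + kd_nM)


if __name__ == "__main__":
    # Illustrative values spanning the 100-200 nM range quoted in the text.
    for kd in (100.0, 150.0, 200.0):
        print(kd, fraction_bound(150.0, kd))
```

Under these assumptions a site is half occupied when the protein concentration equals KD, which is why modest changes in protein level or affinity around this range can shift occupancy substantially.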

Cell-Based Assays of Gene Expression

Expression of specific genes can be monitored either by quantitative sequencing of the mRNA transcribed or by quantification of reporter gene products that are fluorescent or luminescent (Pan et al. 2016; Fleming et al. 2017a). In our studies, we opted for the latter through the use of the psiCheck2 plasmid that encodes two luciferase genes, Renilla and Firefly, that lead to separate gene products that can readily be quantified in a luminescence study (Fleming et al. 2017a). We manipulated the SV40 promoter of the Renilla luciferase gene in order to replace the TATA box with a potential G-quadruplex-forming sequence, using well-characterized G4s such as VEGF (Fig. 7) (Agrawal et al. 2013). The VEGF promoter had been found to

Fig. 7 Approach to synthesize plasmids with OG at a desired location


Fig. 8 Gene induction is impacted by a potential G4, its oxidation, and folding to a G4 bound by APE1

be subject to oxidation, and the gene was known to be upregulated during oxidative stress (Pastukh et al. 2015). Accordingly, it was a good starting point for our studies. In a series of studies, the TATA box sequence of the SV40 promoter (or other elements) was replaced with a synthetic oligonucleotide duplex containing a G4 sequence that could fold to a parallel G4 (Fleming et al. 2017a). The G-rich sequence was located on the nontemplate (coding) strand of DNA, about 10 nt ahead of the transcription start site. We labeled this new construct as “wild-type” because it contained no modified bases. In subsequent studies, a single G at various locations in the G4 was replaced with either OG or an AP analog (tetrahydrofuran, F). The plasmids were transfected into various cell lines, and then the gene products (i.e., Renilla vs. Firefly luciferases) were analyzed at 12, 24, or 48 h post transfection (Fleming et al. 2017a, 2019a). The results of these studies are summarized in Fig. 8. The following observations were made in these experiments (Fleming et al. 2017a, 2019a):
• Replacement of the TATA box with a G4 sequence increased gene expression by about 50%.
• The presence of OG in the G-rich sequence led to an additional ~3× increase in gene expression. The exact location of the OG within the G4-forming sequence was not important, but the G-rich sequence had to be capable of folding to a G4.
  – Sequences with a fifth track showed greater gene induction than four-track G4s.
  – In mouse embryonic fibroblasts lacking OGG1, no increase in gene expression was observed, implicating BER as a key initiator of gene induction.


• Replacement of the OG with F at various locations of the G4 sequence led to a ~5× increase in gene expression compared to wild type, and this was independent of OGG1 but dependent on the presence of APE1.
• Replacement of the OG with F that additionally had a phosphorothioate at the site of potential APE1 cleavage (i.e., the 5′ phosphodiester bond of F) showed up to 14× enhancement of gene expression over wild-type G.

These observations lead to the conclusion that OG and its downstream repair intermediate AP are epigenetic marks in DNA when they occur in a potential G-quadruplex-forming sequence of a promoter. Importantly, gene induction was observed when the G4 sequence and OG or AP were located on the nontemplate strand. In the opposite construct in which the G4 and OG were located on the template strand just ahead of the transcription start site (TSS), gene expression was downregulated, and instead of showing dependence on APE1, the downregulation depended on the CSB protein, Cockayne Syndrome B, which is known to be involved in the recruitment of transcription-coupled repair (Fleming et al. 2017b).

Altogether, the cell-based assays of gene expression with reporter plasmids lead to the following proposed mechanism (Fig. 9) (Fleming et al. 2017a). First, reactive oxygen species such as carbonate radical anion focus DNA damage on G-rich sequences as found in gene promoters. Formation of OG recruits OGG1 to initiate repair; this happens in duplex DNA because OGG1 operates only on OG:C duplexes, not on quadruplexes or single-stranded substrates. After formation of the AP site, the duplex is considerably destabilized and may refold to a quadruplex, particularly if a G4-binding protein helps chaperone refolding. Importantly, studies in our lab and others have shown that APE1 binds to G-quadruplexes nearly as well as to duplex DNA AP sites, although enzymatic cleavage at the AP site is highly attenuated when in a nonduplex motif.
With APE1 stalled on the G4, activating transcription factors such as HIF-1α or AP-1 can be recruited to help initiate gene expression.
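The reporter readout behind the fold changes quoted above reduces to simple arithmetic on two luminescence signals. The sketch below shows one common way such data are normalized, under stated assumptions: Firefly is treated as the unmodified internal standard, Renilla as the reporter driven by the modified promoter, and each construct is expressed relative to the unmodified ("WT") construct. The function name `relative_expression` and all numbers are hypothetical, chosen only to make the arithmetic easy to follow. They are not data from the cited studies.

```python
def relative_expression(readings, reference="WT"):
    """Normalize dual-luciferase readings and express them relative to a reference.

    readings: dict mapping construct name -> (renilla_signal, firefly_signal).
    Each construct's Renilla/Firefly ratio is divided by the ratio of the
    reference construct, giving a fold change (reference = 1.0).
    """
    ratios = {}
    for name, (renilla, firefly) in readings.items():
        if firefly <= 0:
            raise ValueError("firefly signal must be positive for %s" % name)
        ratios[name] = renilla / firefly
    ref_ratio = ratios[reference]
    if ref_ratio <= 0:
        raise ValueError("reference ratio must be positive")
    return {name: ratio / ref_ratio for name, ratio in ratios.items()}


if __name__ == "__main__":
    # Hypothetical luminescence counts (arbitrary units), not measured data.
    example = {"WT": (1000.0, 500.0), "OG": (3000.0, 500.0)}
    print(relative_expression(example))
```

Dividing by the internal standard first corrects for well-to-well differences in transfection efficiency and cell number before constructs are compared.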

Fig. 9 Proposed pathway in which promoter OG can stimulate gene expression via an APE1-bound G4 fold


As already noted, there are parallel mechanisms proposed in the literature (Fleming et al. 2017a, 2019a; Cogoi et al. 2018). For example, Xodo and coworkers found that the promoter G4 sequence of the KRAS gene was susceptible to oxidation in cancer cells where metabolism is much higher and more ROS are generated (Cogoi et al. 2018). They showed that the binding of MAZ and hnRNP A1, both transcription-activating nuclear factors, as well as PARP1, is stronger to the KRAS G4 when it contains OG. It is interesting to note that the position of the G4 in the KRAS promoter is quite different from the G4s in VEGF and NEIL3 that we have studied in reporter plasmids; the KRAS G4 is ~200 nt upstream of the TSS and located on the template strand, whereas the other two G4s are on the nontemplate strand within ~50 nt of the TSS, i.e., a typical TATA box location. In a limited study of location dependence, we found that the impact of a G4 generally diminished with distance from the TSS; however, there were specific promoter/enhancer elements that were outliers. One such position is the SPHII element of the SV40 promoter that is located ~200 nt upstream of the TSS (Fleming et al. 2019a). Here, a G4 was activating on the template strand and deactivating on the nontemplate strand, which is generally in accord with the findings of the Xodo laboratory (Cogoi et al. 2018). Overall, there appear to be several sets of activating transcription factors that can interact with either OG or AP in duplex and quadruplex promoter DNA to lead to upregulation of the corresponding gene. Our work has focused on APE1’s gene-regulatory activities because of its key role in the proximal G4s and its rich history of being a redox-regulatory factor. In fact, a recent study showed colocation of G4s and APE1 on a genome-wide scale during oxidative stress (Roychoudhury et al. 2020).
Teasing out the activity of APE1 as a repair enzyme versus a transcription factor recruiter has led to insights discussed in the next section.

AP Endonuclease-1 Binding to G-Quadruplexes

APE1 is a 318-amino acid protein that is named for its activity in catalyzing phosphodiester bond cleavage at the 5′ side of an AP site. It has also been named “Ref-1” (redox-effector factor 1) because of its propensity to bind activating transcription factors such as HIF-1α and AP-1 (Fos/Jun) via disulfide bond formation to Cys65 of APE1 (Xanthoudakis and Curran 1992). Several X-ray crystal structures of the protein show binding of the catalytic domain (aa’s 65–318) to an AP site in which the duplex backbone is bent by about 35° (Mol et al. 2000; Freudenthal et al. 2015). Several contacts are made to the backbone, but the protein does not bind sequence specifically nor does it contact the abasic ribose. Curiously, the N-terminal domain is either absent or disordered in the crystal structures (Mol et al. 2000; He et al. 2014; Freudenthal et al. 2015), and it is this domain that is responsible for the Ref-1 activity of the protein (McNeill et al. 2020). We and others also found that it is necessary for G4 binding (Burra et al. 2019; Fleming et al. 2021). Studies with the cMyc promoter G-quadruplex were the first to show that APE1 could bind to a G4, and that G4 binding led to poor enzymatic activity (Broxson et al. 2014). We studied APE1’s ability to cleave DNA at an AP site using F-containing

1090

A. M. Fleming and C. J. Burrows

duplexes and quadruplexes in which the tetrahydrofuran analog F is a chemically stable version of AP that lacks the 10 hydroxyl group (Fleming and Burrows 2020a). Nevertheless, such substrates bind well to APE1, and in the duplex context F is a good substrate for APE1 when Mg2+ is present. In an unstructured, single-stranded context, F-containing DNA is very slowly cleaved. In the context of a G4, F is destabilizing to the folded structure unless it is present in a loop, and then it too is a poor substrate for the cleavage activity of APE1 (Broxson et al. 2014; Burra et al. 2019; Fleming et al. 2021). However, this is not because it does not bind to APE1; indeed, APE1 binds G4s robustly as long as the N-terminal domain of the protein and Mg2+ are present (Fleming et al. 2021). Therefore, the binding of APE1 to F in a quadruplex versus a duplex context must be sufficiently different so that the quadruplex fold is not in an active site conformation conducive to strand scission. Given the unusual binding behavior of APE1, it is interesting to ask how APE1 switches between its role as an intermediary in base excision repair and an activator of gene expression. First, we wondered if APE1 could modulate the transition between duplex and quadruplex structures, effectively acting as a chaperone for refolding duplex into quadruplex (Fleming et al. 2021). This could happen at the stage of AP formation in DNA because abasic sites are destabilizing to the duplex but can easily be accommodated in a loop of a G4. In addition, if binding of APE1 is nearly as strong to the G4 as to the duplex, the equilibrium could shift toward G4 folding. We found this to be the case through a fluorescence assay in which G4 folding of a fluorescently labeled oligonucleotide released a complementary oligomer that contained a proximally labeled fluorescence quencher. 
Refolding to a G4 occurred upon addition of ~150 nM APE1, although specific conditions were required, namely, the addition of 20% PEG200 in order to simulate the solvent conditions in a molecularly crowded cell (Miyoshi et al. 2006).

Second, the oxidation state of proteins processing oxidative DNA damage is also of interest because oxidative stress that induces base oxidation in DNA likely also leads to amino acid oxidation in proteins (Pan et al. 2016; Wang et al. 2021; Howpay Manage et al. 2022). This raises the question of whether DNA and protein oxidation work synergistically to respond to oxidative stress in the cell. Accordingly, we subjected APE1 to ROS such as CO3•– and found that endonuclease activity was impaired on both duplex and quadruplex substrates, but the effects on binding constants (KD) were mixed: binding to duplex substrates was considerably weaker for oxidized APE1, but KD values dropped somewhat for quadruplexes, indicating tighter binding (Howpay Manage et al. 2022). These two observations fit perfectly with a switching mechanism; that is, as oxidative stress increases, leading to both G oxidation in DNA and (presumably) cysteine oxidation in APE1, the affinity of APE1 for duplex DNA decreases while its affinity for folded G4s increases. The abasic site in the duplex is the best substrate for phosphodiester cleavage, while an AP in a G4 is very unreactive. Thus, oxidation of DNA and oxidation of protein collaborate to switch from the standard DNA repair mechanism to G4 binding and gene regulation (Fleming et al. 2017a).

The seven Cys residues of APE1 are not well positioned for the formation of intramolecular disulfide bonds, but interestingly, most of the Cys’s are located close to active site residues (Luo et al. 2012). Consequently, it is not surprising that oxidation of these sites weakens duplex DNA binding and catalysis. A mass spectrometric study was conducted to determine the outcome of Cys oxidation in APE1, and five of them, especially C65, C296, and C210, were converted to sulfenic acid residues, RSOH, upon exposure to CO3•– while C65 and C93 also led to disulfide bond formation to a limited extent (Howpay Manage et al. 2022). The Cys residues were also individually mutated to Ser and Asp in order to mimic cysteine sulfenic acid groups, and all of them (especially C65S and C310S) showed diminution of enzymatic activity but enhanced binding to well-folded G4s (Fig. 10).

Fig. 10 Impact of oxidation on APE1 cleavage and G4 binding

Conclusion and Outlook

A picture is emerging whereby oxidative damage in DNA, in the form of OG, serves as a signal to regulate genes in response to oxidative stress. In this sense, we and others designate OG as an epigenetic-like mark (Pan et al. 2016), one that is not itself heritable to the next generation of cells, but instead the oxidation-sensitive site, a potential G-quadruplex, for example, is inherited. Oxidative stress is a transient condition of most cells, and therefore a dynamic on-off switch is needed for regulation of genes that respond to oxidation. Base excision repair enzymes, OGG1 and APE1, are the readers of OG and AP, respectively, and depending upon location and context, they are able to recruit activating transcription factors (Antoniali et al. 2014; Pan et al. 2016; Fleming and Burrows 2021). For example, the G4 context appears to depend on the activity of APE1 while non-G4 contexts may instead involve OGG1 and a different suite of transcription factors (Pan et al. 2016; Fleming et al. 2017a; Cogoi et al. 2018). It is also intriguing to consider that there might be enzymatic writers of OG, such as LSD1, a flavin-dependent chromatin remodeler that generates H2O2 in the vicinity of a gene promoter (Perillo et al. 2008). This mechanism was inferred in one of the earliest observations of OGG1 being recruited to a promoter site for gene activation (Luo et al. 2012).

Many molecular details of the process of BER-assisted gene activation are still missing. Specifically:

• How does APE1 bind to G4s and recruit transcription factors? Lacking structural work on the disordered N-terminal domain of the protein, we have little information about the important G4 binding residues.
• Liquid-liquid phase separation is observed with APE1 (Tosolini et al. 2020). Is this a component of the gene expression regulation or the BER activity or both?
• More information is needed about where OG is formed and by what ROS. Are one-electron oxidants such as carbonate radical anion the major species oxidizing guanine in the cell?
• What triggers refolding of duplexes to quadruplexes in DNA? Does DNA damage and repair start the process, or are other mechanical processes responsible for separating the strands to initiate G4 folding?
• What is the role of R-loops (Sollier and Cimprich 2015)? These structures are common when a template strand of DNA is transcribed and the nontemplate strand contains a G-quadruplex-forming sequence. Is there a role for miRNAs or other RNAs in this overall pathway?

We hope that these questions and others will be addressed in future experiments to provide a clearer picture of the choreography initiated by oxidation of guanine.

Acknowledgments

Research on this topic in our laboratory was supported by grants from the US National Institutes of Health, including R01 CA090689, R01 GM129267, and presently R35 GM145237.

References

Agrawal P, Hatzakis E, Guo K, Carver M, Yang D (2013) Solution structure of the major G-quadruplex formed in the human VEGF promoter in K+: insights into loop interactions of the parallel G-quadruplexes. Nucleic Acids Res 41(22):10584–10592
Amente S, Di Palo G, Scala G, Castrignanò T, Gorini F, Cocozza S, Moresano A, Pucci P, Ma B, Stepanov I, Lania L, Pelicci PG, Dellino GI, Majello B (2019) Genome-wide mapping of 8-oxo-7,8-dihydro-2′-deoxyguanosine reveals accumulation of oxidatively-generated damage at DNA replication origins within transcribed long genes of mammalian cells. Nucleic Acids Res 47(1):221–236
An J, Yin M, Yin J, Wu S, Selby CP, Yang Y, Sancar A, Xu GL, Qian M, Hu J (2021) Genome-wide analysis of 8-oxo-7,8-dihydro-2′-deoxyguanosine at single-nucleotide resolution unveils reduced occurrence of oxidative damage at G-quadruplex sites. Nucleic Acids Res 49(21):12252–12267
Antoniali G, Lirussi L, D’Ambrosio C, Dal Piaz F, Vascotto C, Casarano E, Marasco D, Scaloni A, Fogolari F, Tell G (2014) SIRT1 gene expression upon genotoxic damage is regulated by APE1 through nCaRE-promoter elements. Mol Biol Cell 25(4):532–547
Ba X, Boldogh I (2018) 8-Oxoguanine DNA glycosylase 1: beyond repair of the oxidatively modified base lesions. Redox Biol 14:669–678

Bielskutė S, Plavec J, Podbevšek P (2021) Oxidative lesions modulate G-quadruplex stability and structure in the human BCL2 promoter. Nucleic Acids Res 49(4):2346–2356
Broxson C, Hayner JN, Beckett J, Bloom LB, Tornaletti S (2014) Human AP endonuclease inefficiently removes abasic sites within G4 structures compared to duplex DNA. Nucleic Acids Res 42(12):7708–7719
Burra S, Marasco D, Malfatti MC, Antoniali G, Virgilio A, Esposito V, Demple B, Galeone A, Tell G (2019) Human AP-endonuclease (Ape1) activity on telomeric G4 structures is modulated by acetylatable lysine residues in the N-terminal sequence. DNA Repair 73:129–143
Cadet J, Douki T, Ravanat J-L (2008) Oxidatively generated damage to the guanine moiety of DNA: mechanistic aspects and formation in cells. Acc Chem Res 41(8):1075–1083
Chabot MB, Fleming AM, Burrows CJ (2022) Identification of the major product of guanine oxidation in DNA by ozone. Chem Res Toxicol 35(10):1809–1813
Chen L, Dickerhoff J, Sakai S, Yang D (2022) DNA G-quadruplex in human telomeres and oncogene promoters: structures, functions, and small molecule targeting. Acc Chem Res 55(18):2628–2646
Cheong VV, Heddi B, Lech CJ, Phan AT (2015) Xanthine and 8-oxoguanine in G-quadruplexes: formation of a GGXO tetrad. Nucleic Acids Res 43(21):10506–10514
Cogoi S, Ferino A, Miglietta G, Pedersen EB, Xodo LE (2018) The regulatory G4 motif of the Kirsten ras (KRAS) gene is sensitive to guanine oxidation: implications on transcription. Nucleic Acids Res 46(2):661–676
David SS, O’Shea VL, Kundu S (2007) Base-excision repair of oxidative DNA damage. Nature 447:941–950
Dedon PC, Tannenbaum SR (2004) Reactive nitrogen species in the chemical biology of inflammation. Arch Biochem Biophys 423:12–22
Del Villar-Guerra R, Trent JO, Chaires JB (2018) G-Quadruplex secondary structure obtained from circular dichroism spectroscopy. Angew Chem Int Ed Engl 57(24):7171–7175
Ding Y, Fleming AM, Burrows CJ (2017) Sequencing the mouse genome for the oxidatively modified base 8-oxo-7,8-dihydroguanine by OG-Seq. J Am Chem Soc 139:2569–2572
Fang Y, Zou P (2020) Genome-wide mapping of oxidative DNA damage via engineering of 8-oxoguanine DNA glycosylase. Biochemistry 59:85–89
Fleming AM, Burrows CJ (2013) G-Quadruplex folds of the human telomere sequence alter the site reactivity and reaction pathway of guanine oxidation compared to duplex DNA. Chem Res Toxicol 26(4):593–607
Fleming AM, Burrows CJ (2017a) 8-Oxo-7,8-dihydro-2′-deoxyguanosine and abasic site tandem lesions are oxidation prone yielding hydantoin products that strongly destabilize duplex DNA. Org Biomol Chem 15(39):8341–8353
Fleming AM, Burrows CJ (2017b) Formation and processing of DNA damage substrates for the hNEIL enzymes. Free Radic Biol Med 107:35–52
Fleming AM, Burrows CJ (2020a) Interplay of guanine oxidation and G-quadruplex folding in gene promoters. J Am Chem Soc 142(3):1115–1136
Fleming AM, Burrows CJ (2020b) On the irrelevancy of hydroxyl radical to DNA damage from oxidative stress and implications for epigenetics. Chem Soc Rev 49(18):6524–6528
Fleming AM, Burrows CJ (2021) Oxidative stress-mediated epigenetic regulation by G-quadruplexes. NAR Cancer 3(3):zcab038
Fleming AM, Orendt AM, He Y, Zhu J, Dukor RK, Burrows CJ (2013) Reconciliation of chemical, enzymatic, spectroscopic and computational data to assign the absolute configuration of the DNA base lesion spiroiminodihydantoin. J Am Chem Soc 135(48):18191–18204
Fleming AM, Zhou J, Wallace SS, Burrows CJ (2015) A role for the fifth G-track in G-quadruplex forming oncogene promoter sequences during oxidative stress: do these “spare tires” have an evolved function? ACS Cent Sci 1:226–233
Fleming AM, Ding Y, Burrows CJ (2017a) Oxidative DNA damage is epigenetic by regulating gene transcription via base excision repair. Proc Natl Acad Sci U S A 114:2604–2609

Fleming AM, Zhu J, Ding Y, Burrows CJ (2017b) 8-Oxo-7,8-dihydroguanine in the context of a promoter G-quadruplex is an on-off switch for transcription. ACS Chem Biol 12:2417–2426
Fleming AM, Zhu J, Ding Y, Burrows CJ (2019a) Location dependence of the transcriptional response of a potential G-quadruplex in gene promoters under oxidative stress. Nucleic Acids Res 47(10):5049–5060
Fleming AM, Zhu J, Howpay Manage SA, Burrows CJ (2019b) Human NEIL3 gene expression is regulated by epigenetic-like oxidative DNA modification. J Am Chem Soc 141:11036–11046
Fleming AM, Howpay Manage SA, Burrows CJ (2021) Binding of AP endonuclease-1 to G-quadruplex DNA depends on the N-terminal domain, Mg2+, and ionic strength. ACS Bio Med Chem Au 1(1):44–56
Freudenthal BD, Beard WA, Cuneo MJ, Dyrkheeva NS, Wilson SH (2015) Capturing snapshots of APE1 processing DNA damage. Nat Struct Mol Biol 22(11):924–931
Gedik CM, Collins A (2005) Establishing the background level of base oxidation in human lymphocyte DNA: results of an interlaboratory validation study. FASEB J 19:82–84
Hansel-Hertsch R, Beraldi D, Lensing SV, Marsico G, Zyner K, Parry A, Di Antonio M, Pike J, Kimura H, Narita M, Tannahill D, Balasubramanian S (2016) G-quadruplex structures mark human regulatory chromatin. Nat Genet 48(10):1267–1272
He H, Chen Q, Georgiadis MM (2014) High-resolution crystal structures reveal plasticity in the metal binding site of apurinic/apyrimidinic endonuclease I. Biochemistry 53(41):6520–6529
Howpay Manage SA, Fleming AM, Chen HN, Burrows CJ (2022) Cysteine oxidation to sulfenic acid in APE1 aids G-quadruplex binding while compromising DNA repair. ACS Chem Biol 17(9):2583–2594
Huppert JL, Balasubramanian S (2005) Prevalence of quadruplexes in the human genome. Nucleic Acids Res 33(9):2908–2916
Illes E, Mizrahi A, Marks V, Meyerstein D (2019) Carbonate-radical-anions, and not hydroxyl radicals, are the products of the Fenton reaction in neutral solutions containing bicarbonate. Free Radic Biol Med 131:1–6
Kolbanovskiy M, Aharonoff A, Sales AH, Geacintov NE, Shafirovich V (2020) Remarkable enhancement of nucleotide excision repair of a bulky guanine lesion in a covalently closed circular DNA plasmid relative to the same linearized plasmid. Biochemistry 59(31):2842–2848
Lago S, Nadai M, Cernilogar FM, Kazerani M, Domíniguez Moreno H, Schotta G, Richter SN (2021) Promoter G-quadruplexes and transcription factors cooperate to shape the cell type-specific transcriptome. Nat Commun 12(1):3885
Lane AN, Chaires JB, Gray RD, Trent JO (2008) Stability and kinetics of G-quadruplex structures. Nucleic Acids Res 36(17):5482–5515
Lipscomb LA, Peek ME, Morningstar ML, Verghis SM, Miller EM, Rich A, Essigmann JM, Williams LD (1995) X-ray structure of a DNA decamer containing 7,8-dihydro-8-oxoguanine. Proc Natl Acad Sci U S A 92:719–723
Luo M, Zhang J, He H, Su D, Chen Q, Gross ML, Kelley MR, Georgiadis MM (2012) Characterization of the redox activity and disulfide bond formation in apurinic/apyrimidinic endonuclease. Biochemistry 51(2):695–705
Margolin Y, Cloutier JF, Shafirovich V, Geacintov NE, Dedon PC (2006) Paradoxical hotspots for guanine oxidation by a chemical mediator of inflammation. Nat Chem Biol 2(7):365–366
Margolin Y, Shafirovich V, Geacintov NE, DeMott MS, Dedon PC (2008) DNA sequence context as a determinant of the quantity and chemistry of guanine oxidation produced by hydroxyl radicals and one-electron oxidants. J Biol Chem 283:35569–35578
McKibbin PL, Fleming AM, Towheed MA, Van Houten B, Burrows CJ, David SS (2013) Repair of hydantoin lesions and their amine adducts in DNA by base and nucleotide excision repair. J Am Chem Soc 135(37):13851–13861
McNeill DR, Whitaker AM, Stark WJ, Illuzzi JL, McKinnon PJ, Freudenthal BD, Wilson 3rd DM (2020) Functions of the major abasic endonuclease (APE1) in cell viability and genotoxin resistance. Mutagenesis 35(1):27–38

Miyoshi D, Karimata H, Sugimoto N (2006) Hydration regulates thermodynamics of G-quadruplex formation under molecular crowding conditions. J Am Chem Soc 128(24):7957–7963
Mol CD, Izumi T, Mitra S, Tainer JA (2000) DNA-bound structures and mutants reveal abasic DNA binding by APE1 and DNA repair coordination. Nature 403(6768):451–456
Neeley WL, Essigmann JM (2006) Mechanisms of formation, genotoxicity, and mutation of guanine oxidation products. Chem Res Toxicol 19(4):491–505
Pan L, Zhu B, Hao W, Zeng X, Vlahopoulos SA, Hazra TK, Hegde ML, Radak Z, Bacsi A, Brasier AR, Ba X, Boldogh I (2016) Oxidized guanine base lesions function in 8-oxoguanine DNA glycosylase1-mediated epigenetic regulation of nuclear factor kappaB-driven gene expression. J Biol Chem 291(49):25553–25566
Pastukh V, Roberts JT, Clark DW, Bardwell GC, Patel M, Al-Mehdi AB, Borchert GM, Gillespie MN (2015) An oxidative DNA “damage” and repair mechanism localized in the VEGF promoter is important for hypoxia-induced VEGF mRNA expression. Am J Physiol Lung Cell Mol Physiol 309(11):L1367–L1375
Patel DJ, Phan AT, Kuryavyi V (2007) Human telomere, oncogenic promoter and 5′-UTR G-quadruplexes: diverse higher order DNA and RNA targets for cancer therapeutics. Nucleic Acids Res 35(22):7429–7455
Patra SG, Mizrahi A, Meyerstein D (2020) The role of carbonate in catalytic oxidations. Acc Chem Res 53(10):2189–2200
Perillo B, Ombra MN, Bertoni A, Cuozzo C, Sacchetti S, Sasso A, Chiariotti L, Malorni A, Abbondanza C, Avvedimento EV (2008) DNA oxidation as triggered by H3K9me2 demethylation drives estrogen-induced gene expression. Science 319(5860):202–206
Poetsch AR (2020) The genomics of oxidative DNA damage, repair, and resulting mutagenesis. Comput Struct Biotechnol J 18:207–219
Roychoudhury S, Pramanik S, Harris HL, Tarpley M, Sarkar A, Spagnol G, Sorgen PL, Chowdhury D, Band V, Klinkebiel D, Bhakat KK (2020) Endogenous oxidized DNA bases and APE1 regulate the formation of G-quadruplex structures in the genome. Proc Natl Acad Sci U S A 117(21):11409–11420
Shafirovich V, Geacintov NE (2021) Excision of oxidatively generated guanine lesions by competitive DNA repair pathways. Int J Mol Sci 22(5):2698
Shibutani S, Takeshita M, Grollman AP (1991) Insertion of specific bases during DNA synthesis past the oxidation-damaged base 8-oxodG. Nature 349:431–434
Sollier J, Cimprich KA (2015) Breaking bad: R-loops and genome integrity. Trends Cell Biol 25(9):514–522
Spiegel J, Cuesta SM, Adhikari S, Hänsel-Hertsch R, Tannahill D, Balasubramanian S (2021) G-quadruplexes are transcription factor binding hubs in human chromatin. Genome Biol 22(1):117
Sugiyama H, Saito I (1996) Theoretical studies of GG-specific photocleavage of DNA via electron transfer: significant lowering of ionization potential and localization of HOMO of stacked GG bases in B-form DNA. J Am Chem Soc 118:7063–7068
Tosolini D, Antoniali G, Dalla E, Tell G (2020) Role of phase partitioning in coordinating DNA damage response: focus on the apurinic apyrimidinic endonuclease 1 interactome. Biomol Concepts 11(1):209–220
Tse ECM, Zwang TJ, Bedoya S, Barton JK (2019) Effective distance for DNA-mediated charge transport between repair proteins. ACS Cent Sci 5(1):65–72
Wallace SS (2014) Base excision repair: a critical player in many games. DNA Repair 19:14–26
Wang K, Maayah M, Sweasy JB, Alnajjar KS (2021) The role of cysteines in the structure and function of OGG1. J Biol Chem 296:100093
Wu J, McKeague M, Sturla SJ (2018) Nucleotide-resolution genome-wide mapping of oxidative DNA damage by click-code-seq. J Am Chem Soc 140(31):9783–9787
Xanthoudakis S, Curran T (1992) Identification and characterization of Ref-1, a nuclear protein that facilitates AP-1 DNA-binding activity. EMBO J 11(2):653–665

DNA Structural Elements as Potential Targets for Regulation of Gene Expression

34

Manlio Palumbo and Claudia Sissi

Contents
Introduction
Does Gene Expression Depend on DNA Structure?
Functions of Non-canonical Structures at Gene Promoters
Targeting G-quadruplexes for Medical Purposes
The Lesson from the Studied Ligands
Alternative G4 Arrangements as More Selective Targets
Physiological Relevance of Alternative G4 Repeats
Epigenetics
Conclusions and Perspectives
References

Abstract

Regulation of gene expression is one of the fundamental functions of a living organism. It is not sufficient to make a desired cell component through specific mechanism(s), since production must be controlled quantitatively, locally, and temporally, and coordinated to ensure harmonious progression of the cell cycle. Hence the need for a finely tuned, effective, and dependable mechanism of regulation. Among possible strategies, a type of control mediated by DNA structural rearrangements has lately received great attention. This attention has focused mainly on the G-quadruplex (G4) because of the apparent simplicity of the construct and the in-depth structural knowledge gained when the biological functions of DNA, besides carrying genomic information, were not yet known. An unrestrained transcription rate is related to genetic instability and increased DNA repair. These conditions correlate with cancer progression and neurological disorders, which prompted the idea of using selective G4 ligands to interfere with the expression of disease-associated genes. Unfortunately (or fortunately), specific G4 recognition proved to be overly complicated in vivo by the plasticity of the nucleic acid, which is able to assume several conformations of comparable energy that are therefore biologically significant. As a result, we are still waiting for a G4-directed drug to reach the market. To direct attention toward more focused strategies, particular arrangements in which more than one G4 moiety contributes to complex affinity and selectivity are being considered. Finally, the recently established participation of G4s in epigenetic processes opens new, hopefully successful, drug design perspectives.

Keywords

Non-canonical DNA structures · G-quadruplex · Transcription factors · Genome architecture · DNA binders

M. Palumbo · C. Sissi (*)
Department of Pharmaceutical and Pharmacological Sciences, University of Padova, Padova, Italy
e-mail: [email protected]; [email protected]

© Springer Nature Singapore Pte Ltd. 2023
N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_39

Introduction

Cell sustainment, growth, replication, and death are physiological processes that need to be finely tuned to proceed in a concerted way. This requires multiple mechanisms of control that work at distinct levels. Overall, most of this job is performed by proteins, the activity of which can be controlled by activation/inhibition pathways as well as by modulation of their intracellular concentration. This last mechanism is based on the induction, suppression, increase, or reduction of protein production. All these steps refer to a major common pathway that is based on a point-by-point tuning of gene expression. Again, this control depends on proteins like transcription factors, enhancers, polymerases, and others. Nevertheless, since the discovery of the double helix structure by Watson and Crick, we have slowly learned that DNA can no longer be considered an inert macromolecule that simply recruits the transcriptional machinery and merely functions as a template on the basis of its primary sequence. The dynamic structural behavior of DNA is not a mere curiosity; it is one key that Nature smartly developed to control the multiple and articulated events that support cell life and death.

As far as the regulation of transcription is concerned, the finely integrated intervention of numerous factors is commonly exploited. In general, the recruitment of the preinitiation complex at the promoter is fundamental to properly localize RNA polymerase and start mRNA synthesis. However, the binding of transcription factors (TFs) at the proximal promoter or at distal enhancer sites is needed to tune the efficiency of the process and, ultimately, to control mRNA production in terms of amount, time, and site (Patange et al. 2021; Rodriguez and Larson 2020). TF recruitment corresponds to the formation of a protein-DNA complex. The choice made by Nature to work with such a system reflects the possibility it provides to act at no fewer than two distinct levels: the protein and the nucleic acid. Since any single chemical modification at any site of these two macromolecules may drive a different cellular response, this apparently simple solution opens a massive number of combinations that, in the case of transcription, allows Nature to finely modulate and control this fundamental biological process. Each combination represents a unique structural entity that forms only when a specific physiological response is required. Notably, some of the species involved are generated by reactions, such as methylation or acetylation, that have high energetic barriers and are therefore performed by enzymes. The remarkable energetic cost of such modifications is justified by the high level of control of the process that they provide. Conversely, conformational rearrangements, which can be efficiently promoted even by slight changes in the chemical composition of the nuclear environment, can produce fast responses to cell needs and can be efficiently exploited to monitor even single steps of complex reactions. It follows that the ability to affect protein-DNA binding at a targeted site (corresponding to one single conformation) is expected to allow a focused modulation of the cell expression profile. This would open a wide range of applications in medicinal chemistry.

However, the selective targeting of a protein-DNA interface is one of the greatest challenges in drug design. To overcome this issue, it is necessary to define the molecular determinants of DNA recognition by TFs (or by any other factor actively involved in controlling transcription). The TF-DNA interaction is mostly based on the recognition of a consensus sequence. However, the new perspective is to address the multiple roles that the different DNA structures (of one single sequence!) can play in this context.

Here we present a brief overview of the major polymorphic species of genomic DNA that can significantly impact the regulation of transcription, with a major focus on G-quadruplexes (G4s) and on their potential as therapeutic targets. The relevance of these structural nucleic acid motifs is strictly connected to the correlation between folding and function. G4s can be differentiated by their location along the genome, probability of occurrence, and conformational features, creating a variety of opportunities for specific and selective intervention. Moreover, their folding process covers a wide range of kinetic rates, thus introducing a temporal control of the function (Grün and Schwalbe 2022). While G4 occurrence in vivo is no longer a matter of debate, their fine mechanism(s) of interference with transcription need to be further clarified, which may eventually lead to the discovery of more sophisticated ways of modulating gene expression. The reported studies encompass a large variety of experimental approaches, from in silico methods, synthetic and medicinal chemistry, molecular biology, and genetics to pharmacology and medicine, each bringing valuable complementary contributions. Still, when we move toward the proper exploitation of their function for medical application, a general question summarizes our perspectives: Where is research on G4 going in the field of gene expression? We can identify three major areas of interest that will guide us through this issue:

1) Improvement of the knowledge on molecular mechanisms for G4-mediated gene expression both in physiological and pathological conditions
2) Development of specific low molecular weight G4 ligands
3) Investigations on integrated epigenetic events involving G4

Does Gene Expression Depend on DNA Structure?

The functional recruitment of TFs occurs at the promoters of genes located in open chromatin; notably, this is the nuclear region where conformational rearrangements of the canonical double helix are most prone to occur, leading to the formation of so-called “non-canonical” DNA arrangements. The preferred final non-canonical DNA fold depends largely on the sequence and, no less importantly, on the topological state of the substrate. Indeed, it is worth recalling that, once activated, transcription intrinsically generates positive and negative supercoiling in front of and behind the transcription fork, respectively. This condition favors the opening of a denaturation bubble to reduce the structural tension deriving from the degree of supercoiling (Buglione et al. 2021). The number of structures that can be generated at these sites and under these conditions is high, and their structural features are varied (Fig. 1). Some of them, although distinct from the B-helix, rely on canonical Watson-Crick base pairing. This is the case for hairpin formation, which, in general, depends on the presence of a palindromic sequence. It follows that, within a denaturation bubble, two hairpins with the same base-pairing pattern, and thus with the same thermodynamic profile, may form on the two complementary strands, leading to the formation of a cruciform structure. Common sites prone to generate cruciforms are inverted repeats, where they form with a stability that increases with the number and length of the repeats, since it is directly related to the final length of the stem (Bowater et al. 2022). These structural elements are functionally involved in recombination, translocation, and splicing, although they are frequently associated with sites of genomic instability. Cruciforms accumulate at chromosomal breakpoint junctions, promoters, and replication initiation sites.

Fig. 1 Summary of non-canonical structures of nucleic acids prevalently based on Watson-Crick (WC) or alternative (Quadruplex and Triplex) base pairing. Created in BioRender.com

34

DNA Structural Elements as Potential Targets for Regulation of Gene Expression

Still, in vivo data confirmed their occurrence at transcriptionally active sites, in line with the relevance of DNA unwinding as a driving force (Khristich and Mirkin 2020). A single-stranded sequence can also pair with one strand of the denaturation bubble. This condition often occurs during transcription thanks to the recruitment of the nascent mRNA. This mechanism underlies the structural arrangements called R-loops, peculiar structures that correspond to DNA-RNA hybrids (Petermann et al. 2022). R-loops are also associated with genomic instability and, consistently, multiple control elements work synergistically to reduce their accumulation. This regulation is crucial during transcription, when the newly synthesized RNA might pair with the template behind the advancing fork. At this stage, the most relevant causes of R-loop accumulation are RNA polymerase hyperactivation (which increases the number of nascent mRNAs) or pausing (which increases the residence time of the mRNA close to the complementary DNA). Among the multiple proteins that cooperate to avoid this critical condition, it is worth highlighting the fundamental contribution of topoisomerases, the enzymes devoted to removing DNA supercoiling during transcription and replication. In particular, Topoisomerase I (TOP1) is essential to remove the negative supercoiling generated by the progression of the transcriptional machinery. As a result, silencing of TOP1 causes an accumulation of R-loops at actively transcribed genes (Manzo et al. 2018). It is worth mentioning that the pairing of a single-stranded sequence within the denaturation bubble differs from the potential pairing of a single strand directly with canonical B-DNA. In the latter, a single-stranded DNA or RNA sequence enters through the major groove and binds to the complementary strand through Hoogsteen hydrogen bonds. This binding pattern does not compete with the WC pairing of the targeted duplex. As a result, the final structure is a triple helix. Notably, because both classes of hydrogen-bond pairing are simultaneously required, triplexes occur only at homopurine-homopyrimidine sites (Bacolla et al. 2015). This dsDNA binding mode is efficiently exploited by lncRNAs, which can thereby amplify their regulatory role in transcription (Choudhury et al. 2021). It follows that designed sequences able to selectively target sites where a triplex might form could produce therapeutically useful outcomes. This calls for the development of nucleic acid-based analogues endowed with high transfection efficiency, to be considered for the targeted delivery of site-specific modulators of transcription. As far as the targeting of non-canonical DNA structures to regulate transcription is concerned, more intriguing is the exploitation of other arrangements based on the formation of tetrahelices, mainly the G-quadruplex (G4) and the I-motif (iM). Structurally, these two arrangements are largely divergent. First of all, the hydrogen-bond pairing that supports these tetrahelices is different, and this leads to distinct overall shapes and thermodynamics. G4 structures are based on the Hoogsteen pairing of four guanines into a “G-tetrad.” To gain stability, multiple G-tetrads stack on one another. This arrangement creates a central hole toward which four carbonyl oxygens (one for each guanine involved in tetrad formation) point in each plane. To neutralize the resulting central negative charge, cations of proper size and charge density are bound. The final G4
stability depends on the sequence, which determines the number of tetrads and the possible formation of capping elements by the loops, and on the nature of the cation, K+ being the most efficient. Conversely, iMs are based on the pairing of two cytosines. These bonds require protonation of one cytosine (as in triplexes) to form efficiently. Cytosine hemiprotonation occurs at its pKa, which means at about pH 4.5. Originally, this condition hindered the consideration of iMs as intracellular functional elements. However, the recent selection of antibodies targeting iMs, as well as the development of in-cell NMR analyses of nucleic acids, provided direct evidence confirming iM occurrence in cells (Dzatko et al. 2018; Zeraati et al. 2018). Consistently, their stability was found to depend directly on the CO2 concentration used during cell growth: this is a consequence of the concomitant lowering of the intracellular pH, but it also mimics the condition occurring at solid tumor growth sites. It is worth recalling that the tetrahelices presented above can be generated by the folding back of a single-stranded G- or C-rich DNA fragment (for G4 or iM, respectively) or through the pairing of multiple strands. Along the chromosomes, it is reasonable to consider the intramolecular arrangement as the most relevant. Moreover, for both G4 and iM, the presence of four runs of consecutive guanines or cytosines, respectively, is required. These two requirements bring us back to the denaturation bubble, where the opening of the helix generates two strands of complementary base composition. Thus, in principle, G4 and iM might form independently on the two complementary strands. To date, the positive contribution of unwinding as a driving force for G4 formation has been extensively reported (Buglione et al. 2021). What is still a matter of debate is whether the G4 and iM can form simultaneously on the two strands of a single site.
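The statement that cytosine hemiprotonation occurs at the pKa can be illustrated with the Henderson-Hasselbalch relation. The sketch below treats cytosine as an isolated monoprotic site with the pKa of about 4.5 quoted in the text; the function name and the pH values are illustrative assumptions, and the calculation deliberately ignores any shift of the apparent pKa caused by the folded structure or the sequence context.

```python
# Illustrative sketch: an isolated protonation site with the pKa quoted in the text.
def fraction_protonated(pH: float, pKa: float = 4.5) -> float:
    """Fraction of sites carrying a proton at a given pH
    (Henderson-Hasselbalch relation for a single protonation site)."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))


if __name__ == "__main__":
    # Hypothetical pH values chosen only to show the trend.
    for pH in (4.5, 6.0, 7.2):
        print(f"pH {pH}: protonated fraction = {fraction_protonated(pH):.4f}")
```

At pH equal to the pKa exactly half of the cytosines are protonated, which is the hemiprotonated state required for C·C+ pairing; under this simple model the protonated fraction falls steeply as the pH rises toward neutrality.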
Indeed, several reports indicate that these quadruplexes are mutually exclusive due to steric hindrance (Cui et al. 2016; King et al. 2020). This negative interference can be viewed as an example of competition between G4 and iM, which drives the preferential formation of only the most stable form. It is worth underlining that competition can also occur between different non-canonical arrangements potentially forming on the same strand of the double helix. Such a condition can be observed when more than four consecutive guanine runs occur at a single site. An example of multiple-G4 interplay is found in the promoter of tyrosine hydroxylase, a protein that catalyzes the biosynthesis of dopamine. This genomic domain contains seven G-runs that can generate two partially overlapping, but mutually exclusive, G4 structures. Notably, protein expression is controlled differently by them, with one G4 enhancing and the other repressing transcription (Beals et al. 2022). Another frequently observed condition is the structural balance between hairpin and quadruplex species (either G4 or iM) of a single G- or C-rich sequence. At these sites, the equilibria can be shifted toward the duplex or the quadruplex by modulating the experimental conditions (e.g., increasing K+ or lowering the pH shifts them toward the tetrahelical arrangements). The use of small molecules to selectively control these equilibria has been considered too. This approach was
applied at the promoter of BCL2, where it made it possible to prove that the hairpin and the competing iM produce opposite effects on gene transcription (Kendrick et al. 2014). Notably, the same can occur even when the two competing structures form at sequences located close to each other, although out of phase. An example has been described at the TMPRSS2 gene, where a G-rich domain is inserted at the 3′ end of a longer mirror sequence. As a result, G4 formation competes with a hairpin, the two exerting opposite effects on the control of transcription (Sugimoto et al. 2022). These data suggest an intriguing impact of neighboring sequences on the regulation of gene expression, strictly connected to the modulation of DNA folding. The relevance of the flanking domains in driving unique structural arrangements is actually widely documented. Several reports highlight how elongation at the 3′ and 5′ ends of a minimal G4-forming sequence can severely affect the stability of the folded form. Still, when the tetrahelix forms within the denaturation bubble, we cannot forget that it is surrounded by two double-helical flanking domains. The structural behavior of this combination of secondary structures has been explored by fluorescence resonance energy transfer, both in ensemble and at the single-molecule level. Comparison of data acquired on models with or without the flanking domains highlighted how the addition of flanking ends biases both the final equilibrium state and the folding kinetics (Vesco et al. 2021). It is worth pointing out that these effects are not conserved across all G4-forming sites, likely as a result of the different energetic content of the species involved. This evidence matches the possible modifications of the non-canonical folding landscape according to the sequence composition, length, and topology of the flanking domains.
This issue becomes critical when we aim to correlate a single DNA secondary structure with a specific physiological role. To derive proper correlations, the three-dimensional structure of the nucleic acid must be known, but it must also correspond to the one mainly relevant for the physiological process under investigation. However, whenever we remove a fragment of DNA from its natural context, we must be aware that we can alter the final equilibrium state as well as the folding kinetics.

Functions of Non-canonical Structures at Gene Promoters

The great attention currently devoted to the structural landscapes of nucleic acids is strictly related to the concept that different structures might represent different signals for the cell. Focusing on transcription regulation, this concept has mainly been addressed by dissecting how DNA folding may influence the recruitment of TFs. This is not a trivial issue. Indeed, the recruitment of a transcription factor at a precise gene promoter site (i.e., at an enhancer or at a repressor) can exert an inductive or a repressive effect on gene transcription (Fig. 2a). In a simplified picture, the occurrence of a non-canonical DNA structure at any of these positions can lead to different outputs according to the location where it forms and its impact on protein recruitment at that site.

Fig. 2 Schematic representation of the varied consequences on transcription of protein recruitment (Panel a) or G4 formation (Panel b) as a combination of their functions (activator or inhibitor) and DNA localization (at enhancer or repressor). Created in BioRender.com

As briefly introduced above, among the different non-canonical arrangements, G4s are the most extensively studied in terms of regulation of gene expression. This is due to several reasons. Since the discovery of G4s as physiologically relevant secondary structures, bioinformatics tools have been designed and used to map the guanine distribution within the genome. Searches were mainly focused on the general formula G≥3 N1–7 G≥3 N1–7 G≥3 N1–7 G≥3 (where G is guanine and N any base) as the validated consensus sequence for G4 formation. Such genome scanning protocols highlighted a relevant enrichment of these sequence motifs near the Transcription Start Sites (TSS). This feature is phylogenetically shared between human and related organisms (e.g., chimpanzee, dog, mouse, and rat), thus supporting a specific function for these genomic arrangements (Lipps and Rhodes 2009). Remarkably, as in eukaryotes, G4 motifs appear to be evolutionarily conserved among microorganisms too, as shown by sequence analysis of bacterial genomes, which identified putative G4-forming sequences especially in promoter regions, with the gastrointestinal pathogen Helicobacter pylori counting up to 8000 G4 motifs. The location and abundance of these non-canonical DNA structures point to a regulatory role that appears to particularly involve secondary metabolite synthesis and signal transduction pathways (Shankar et al. 2022). Additionally, besides the variable distribution of G4s along the genome, these structures are intrinsically polymorphic, each of them being potentially able to rearrange among multiple conformational states separated by low energetic barriers (Gray et al. 2009). Since any single structure-position-time combination can represent a specific signaling code, it is not surprising that Nature exploited them as flexible, tunable, low-cost modulators of gene processing (Fig. 2b).
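The consensus formula used for genome scanning translates naturally into a regular expression. The sketch below is only an illustration of that idea, not the protocol used in the cited studies: it reads the formula as four runs of at least three guanines separated by loops of one to seven bases (the usual reading of the consensus), searches the C-rich equivalent to flag motifs on the complementary strand, and uses hypothetical names and a hypothetical telomere-like test sequence.

```python
import re

# Four runs of >=3 G separated by loops of 1-7 bases (any base, G included).
G_MOTIF = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")
# The same pattern written for C marks a putative G4 on the complementary strand.
C_MOTIF = re.compile(r"C{3,}(?:[ACGT]{1,7}C{3,}){3}")


def find_putative_g4(seq: str):
    """Return (strand, start, end, motif) for non-overlapping consensus matches.
    Coordinates are 0-based, end-exclusive, on the given strand."""
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", G_MOTIF), ("-", C_MOTIF)):
        for match in pattern.finditer(seq):
            hits.append((strand, match.start(), match.end(), match.group()))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Hypothetical test sequence with four GGG runs and TTA loops.
    print(find_putative_g4("ATGGGTTAGGGTTAGGGTTAGGGCA"))
```

A match only identifies a sequence that is compatible with G4 formation. It reports non-overlapping matches, so alternative registers within a longer G-rich stretch are not enumerated, and it says nothing about whether the structure is actually folded in a given cell.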
The first (and to date the most explored) working model proposed that, upon G4 formation, transcription factors would no longer efficiently recognize their consensus sequences at the promoter and thus drive transcription (Fig. 3a). As a result, suppression of the expression of the encoded protein is expected and, on this basis, several strategies have been explored to induce or stabilize G4s at the promoters of

Fig. 3 Different possible outputs on transcription efficiency associated with G4 formation at gene proximal promoters. Created in BioRender.com

those genes that need to be silenced. However, the experimental output turned out not to be so easily predictable, with several G4-containing genes found to be either silenced or overexpressed upon treatment. This is the case of the promoter (and 3′-UTR) of the serine/threonine kinase AKT1, where mutation experiments showed activation in the absence of the G4-forming sequences (Zhang et al. 2021; Zorzan et al. 2018). To rationalize this contrasting evidence, it was proposed that only the stabilization of G4 structures at TF consensus sequences closer to the TSS and located on the leading strand can reduce transcription, because of the enhancer function of these domains (Fleming et al. 2019). Indeed, protein recruitment at proximal or distal promoter domains may act in opposite directions on gene expression. As anticipated, this condition produces a complex picture. In fact, even if we consider a very simplified model with just one enhancer and one repressor site, each recognized by one activator or one repressor, it is possible to observe two opposite effects (increase or reduction of transcription efficiency) according to the combination of genomic site and protein modulation. Moreover, if we keep in mind that multiple proteins work simultaneously to control gene expression, it is not easy to predict which is the best site to target. The lack of effective and selective modulators working according to single subsets of these paradigms does not help us reach a unified picture of the balance behind the G4-transcription connection and, even today, forces us to seek better knowledge of these molecular mechanisms of regulation. However, the integration of the original bioinformatics evidence with the now widely accessible data on G4 distribution in living cells is providing novel insights that help to better describe the G4-transcription connection.
Indeed, G4s are directly detectable in cells thanks to the use of carefully selected antibodies and, more recently, to the development of highly selective fluorescent probes (Biffi et al. 2014; Di Antonio et al. 2020; Masson et al. 2021). In this field, interesting results were obtained using a common starting
scaffold based on pyridostatin, a small molecule with a good ability to discriminate between G4 and dsDNA, which has been progressively conjugated through divergent chemical approaches (Fig. 4). The final optical response resulting from the recognition of G4 by these tools is based on different probes and molecular mechanisms. In the molecule described in Fig. 4a, optimal recognition required proper tuning of the length of the linker used for the conjugation. As far as the system represented in Fig. 4b is concerned, the optical response is obtained by immuno-tagging a 5-bromo-2′-deoxyuridine (5-BrdU) connected to the G4-binding module. The novelty of this project rests on the fact that the functionalization of the G4-binding unit with the optical amplifier can be performed before or after DNA recognition. Still, the images the two procedures provide overlap. This is important to rule out the detection of G4s merely induced by the presence of the probes. As a result, these tools not only confirm the existence of G4s in living cells but, by providing images at progressively increasing sensitivity, they are expected to bring their physiological functions into better focus. Nowadays, detailed analyses of G4 structures and functions can rely on merging in-cell imaging with the genome-wide mapping of G4s (Hänsel-Hertsch et al. 2018). In this field too, several concerns were raised about

Fig. 4 Examples of functionalized bis-quinoline dicarboxamide derivatives (a and b) exploited as G4 fluorescent probes. The G4-binding scaffold is shown in blue; the reporter systems are shown in purple and orange (the red fluorophore Silicon-Rhodamine in a and 5-BrdU in b)

the possible interference of the cell treatment with the final G4 landscape. Now, novel protocols such as “cut&tag,” which do not require fixing the cells, have been exploited, thus reducing the possible introduction of bias related to non-physiological treatment (Lyu et al. 2022). Moreover, sequencing methods have been implemented to provide information even at the single-cell level (Hui et al. 2021). The newest data confirmed that G4s represent a very dynamic population. As expected, they are found in regions of open chromatin. What is more interesting is that the simple identification, at the genomic level, of a primary DNA sequence compatible with G4 formation is not sufficient to guarantee the presence of the tetrahelical arrangement at that site. Indeed, G4 distribution varies widely according to the cell type and the cell cycle stage (Zyner et al. 2022). Concerning transcription, this opens a new critical issue, since it suggests that TFs are actually exposed to different G4 populations during the life of the cell. Thus, the recent experimental pairing of sequencing strategies designed to simultaneously map G4 formation and TF binding in cells appears highly relevant. The outcome was unexpected. Indeed, it was found that G4s can work as efficient spots that promote the effective recruitment of multiple TFs (Fig. 3b) (Lago et al. 2021; Robinson et al. 2021; Spiegel et al. 2021). Overall, G4s emerge as structural elements that actively contribute to the control of transcription through several interconnected mechanisms. Still, a final remark must be made. It is attractive to consider that they do not form only at histone-depleted regions, which suggests that they might actively regulate histone deposition. They have also been considered as architectural elements involved in defining the higher-order 3D chromatin structure as well as long-range interactions.
Accordingly, G4s can be considered epigenetic modifications that drive an active, transcription-based control of the cell.

Targeting G-quadruplexes for Medical Purposes

The issues mentioned above explain well why, to date, we fail to reliably predict in which direction a single G4 can move the expression of a single protein. This question does not belong to the DNA side only. Indeed, after almost 30 years of investigation, we still suffer from a dramatic paucity of G4 binders under clinical evaluation. It is well known that small organic molecules can efficiently interact with G4s. Whenever the complex forms at sites where the G4 plays a regulatory function, the direct consequence is a modulation of the physiological process involved. As a result, small molecules can be used to tune G4 interference for medical purposes, creating a new opportunity to develop innovative therapeutic agents. What are the structural requirements for an efficient G4 binder? Recalling the structural features of G4 summarized above, on a general basis, a good ligand should be a positively charged molecule with an extended aromatic surface to efficiently
interact with the planar G4 portion, accompanied by adequate steric fit and flexibility. In more detail, electrostatic interactions largely contribute to binding strength. In fact, for these peculiar tetraplex arrangements the charge density is higher than in other canonical and non-canonical structures. This basic contribution is, however, not directional, as expected from the spherical distribution of charge density. The directional forces needed to achieve specific binding comprise hydrogen bonding and hydrophobic interactions. This is, however, not sufficient for tight binding, which also requires optimal shape adaptation between G4 and ligand to arrive at the final stage where the G4 species form(s) at the right spot(s) to produce their biological effect(s) in a specific and selective way. The number of molecules endowed with the above requirements is exceptionally large, with the advantage of leaving enormous space to the chemist’s imagination and the disadvantage that the search becomes a never-ending story with insufficient or too restricted focus. In fact, more than a thousand G4-binding compounds are reported in the literature, many of which are effective analytical tools but, as mentioned above, none has so far attained the status of a drug. Also, the ligand-G4 interaction is not easy to investigate quantitatively. Let us set aside for the moment issues such as binder solubility and self-aggregation, which are surely relevant. For noncovalent G4 binders, the interaction process can be represented by the following set of equations:

$$
\begin{aligned}
\mathrm{G4} + \mathrm{T} &\rightleftharpoons \mathrm{G4{\cdot}T} \\
\mathrm{G4} + \mathrm{OT} &\rightleftharpoons \mathrm{G4{\cdot}OT} \\
\mathrm{G4} + \mathrm{L} &\rightleftharpoons \mathrm{G4{\cdot}L} \\
\mathrm{G4{\cdot}L} + \mathrm{T} &\rightleftharpoons \mathrm{G4{\cdot}L{\cdot}T} \\
\mathrm{G4{\cdot}L} + \mathrm{OT} &\rightleftharpoons \mathrm{G4{\cdot}L{\cdot}OT} \\
\mathrm{L} + \mathrm{T} &\rightleftharpoons \mathrm{L{\cdot}T} \\
\mathrm{L} + \mathrm{OT} &\rightleftharpoons \mathrm{L{\cdot}OT}
\end{aligned}
$$

where G4 is the nucleic acid, T is the selected physiological target of G4, OT is an alternative off-target for the G4, and L is the ligand species. This apparently simple series of equations refers to a single conformation of each species and does not explicitly consider the highly polymorphic character of the nucleic acid. Thus, the above equilibria should be written as the sum of the contributions deriving from each conformational state of the nucleic acid and, possibly, of the target macromolecule(s).

$$
\mathrm{G4} + \mathrm{L} \rightleftharpoons \mathrm{G4{\cdot}L} \rightleftharpoons \mathrm{G4^{*}{\cdot}L}
$$

where G4 is the nucleic acid in the “free” conformation and G4* corresponds to G4 in its “bound” state. However, further conformational contributions to the binding free energy must be considered when the G4-L-T complex forms, possibly involving both the nucleic acid and the binding target.

$$
\mathrm{G4^{*}{\cdot}L} + \mathrm{T} \rightleftharpoons \mathrm{G4^{**}{\cdot}L{\cdot}T}
$$

where G4** is the conformation of G4 in the ternary complex. It follows that G4 ≠ G4* ≠ G4** both structurally and energetically. Finally, we must mention the importance of discriminating between ligands that recognize a preformed G4 and those that can efficiently promote the ds-to-G4 transition. This difference can be seen as a mere effect of different ranges of affinity for the two nucleic acid arrangements. However, this is just a part of the picture. Indeed, the preferential topological arrangements that the “free” G4 can assume in the “bound” state can differ according to the chemical structure of the ligand and can translate into different biological responses. It clearly appears that the final G4 conformational balance is hard to predict and, consequently, so is its biological outcome. In fact, the ligand may function as a switch or as a blocker. In the first case, it reverses the direction of gene expression, possibly by shifting the original G4 conformation (i.e., it turns it on if it is off and vice versa); in the second, it freezes the system in the on or off position by stabilizing the original G4 form. Again, this ignores the possibility of partial agonism, which would give an intermediate response, further complicating the outcome scenario.
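As a rough numerical companion to the binding equilibria discussed in this section, the sketch below works through two textbook relations: the exact solution of the mass-action equation for a single 1:1 equilibrium G4 + L ⇌ G4·L, and the population-weighted association constant observed when the ligand can bind several conformers that interconvert rapidly in the free state. Function names, concentrations, populations, and constants are hypothetical; targets, off-targets, ternary complexes, self-aggregation, and ligand-induced conformational changes are all left out.

```python
import math


def fraction_g4_bound(g4_total: float, ligand_total: float, kd: float) -> float:
    """Fraction of G4 present as G4·L for 1:1 binding (G4 + L <=> G4·L).
    All three arguments must share the same concentration unit."""
    b = g4_total + ligand_total + kd
    # Physically meaningful root of [C]^2 - b[C] + g4_total*ligand_total = 0.
    complex_conc = (b - math.sqrt(b * b - 4.0 * g4_total * ligand_total)) / 2.0
    return complex_conc / g4_total


def apparent_ka(populations, ka_values) -> float:
    """Observed association constant for a ligand binding several
    pre-equilibrated free conformers: the population-weighted mean of
    the individual association constants."""
    total = sum(populations)
    return sum(p / total * ka for p, ka in zip(populations, ka_values))


if __name__ == "__main__":
    # Hypothetical numbers: 1 uM G4, 1 uM ligand, Kd = 1 uM.
    print(f"fraction bound: {fraction_g4_bound(1.0, 1.0, 1.0):.3f}")
    # Hypothetical ensemble: 80% of a weakly bound fold, 20% of a tightly bound one.
    print(f"apparent Ka: {apparent_ka([0.8, 0.2], [1e5, 1e7]):.3g} M^-1")
```

Even in this simplified picture the measured affinity depends on the conformer populations, so a minor but tightly bound fold can dominate the observed constant. This is one way to read the remark that the equilibria should be written as a sum over conformational states.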

The Lesson from the Studied Ligands

To date, a vast number of compounds have been prepared and tested as G4 binders but, as anticipated, a general model for their contribution to transcription is not yet available. With the aim of identifying selective and efficient potential new drugs, most drug-design protocols look for molecules able to recognize one single G4 structure with high affinity. Likely, neither goal is correct when pursued individually. In fact, we need combined optimal steric and electronic alignments to make sound comparisons, the final choice possibly deriving from an overall compromise between the two parameters. The modeling of a new binder is generally performed on a solved G4 structure. Alternatively, hit identification (and optimization) can proceed by library screening on G4 models in solution. In both instances the nucleic acid is out of its natural environment. This comprises not only ions, proteins, and ligands in general but also the sequence context, which, as already discussed, can largely affect the G4 topology. Moreover, if we aim to obtain a specific biological response by targeting one G4, binding affinity cannot be considered a solid parameter to rank the components of a compound library. This concept has been elegantly proved using a small number of G4 ligands on an in vitro DNA replication model (Takahashi et al. 2021). The collected data showed that, to properly correlate the rate of replication with the stability of the G4-ligand complex, it is mandatory to consider the topology of the bound nucleic acid. Indeed, the structural features influence, to different extents, not only protein recruitment but also the kinetics associated with G4 unfolding. Such a model fits the regulation of transcription as well and, although hardly
predictable, it must be explicitly considered. As a result, to improve performance in the design of new G4-directed chemical entities, ligands might be profitably grouped according to interaction topology. Further, as far as conformation is concerned, there are different conditions that, besides helping us cluster the G4 ligands studied to date, might lead to applications that go beyond the therapeutic ones:
– G4 shape complementary to the ligand, with no substantial conformational changes upon binding. This represents the ideal condition if we want to derive information on the functional role of one G4 in living cells.
– A rigid G4 that keeps its original conformation and possibly obliges the ligand to change. As far as the nucleic acid is concerned, this condition overlaps with the one reported above. However, the ligand rearrangement might be associated with a strong optical response (e.g., fluorescence), thus making these compounds ideal probes to follow G4s with limited perturbation of their folding landscape.
– A ligand more rigid than the G4, which forces the nucleic acid to undergo a conformational change. Due to the extensive aromatic surface that characterizes most G4 binders, this group is quite extensively populated. Among them, Rhodamine 6G converts most antiparallel G4s into their parallel forms (Trizna et al. 2021). In this class we can also include those derivatives that, upon G4 interaction, destabilize the tetrahelices. This is the case of two positional cationic porphyrin isomers, TMPyP3 and TMPyP4 (Joshi et al. 2022). Their tight interaction with the parallel G4 formed in the promoter region of the human multidrug resistance protein 1 (MRP1) transporter gene produces a remarkable drop in its stability, with consequent effects on the regulation of gene expression.

In this connection, a complementary tool is represented by the study of conformationally constrained G4s. They can be obtained by covalently combining the natural sequences with synthetic units. This was exploited by using chemically modified nucleotides to reduce the polymorphism of the target sites (Doboszewski et al. 2013). Notably, some of them are even compatible with the functional enzymatic processing of the DNA (De et al. 2015). Alternatively, the modification might derive from the conjugation of a potential ligand to a nucleobase. In one example, a deoxyuridine nucleotide was covalently bound to a pyrene moiety. The resulting conjugates were inserted at various positions of the c-kit2 sequence, a G-rich domain known to form a parallel G4 at the proximal promoter of the KIT oncogene (Peterková et al. 2021). NMR analyses showed that incorporation of the conjugate at the terminal positions stabilizes the parallel G4 because of aromatic stacking of the pyrene moiety onto the terminal G-quartets. Conversely, substitution of nucleotides in the penta-loop of the G4 destabilizes the original G4 arrangement. As a final critical issue, it must be mentioned that the derivatives analyzed so far suffer from poor selectivity. This parameter can be examined at various levels of stringency:
1. G4 vs dsDNA (or other non-canonical structures). This first level is quite well accomplished by most of the reported small molecules, but it is not sufficient to address a G4 ligand as a drug. Indeed, as proved for several G4 binders, the number of physiological pathways that are affected upon cell treatment is too large to grant patient safety (Hou et al. 2022). 2. G4 topology. This parameter is quite frequently evaluated during screening selection by ranking the interaction of tested ligands to G4 targets known to fold according to different topologies. However, the topological rearrangement of one G4 upon ligand binding might largely impact on fulfilling the expectation of this second level. Still, for those systems that preserve the original G4 topology, some positive results can be derived. This is the case of some new bisindolylmaleimide-derivatives that preferentially stabilize parallel G4s (Kumar et al. 2022). 3. One single G4. To move toward a smaller subset of recognized targets, the exploitation of ligands that “read” the loops or the flanking residues can be proficiently exploited. One recent example concerns a progressive decoration of the non-selective DNA binder, thiazole orange (Long et al. 2022). In particular, small aromatic domains were introduced to obtain cooperative multi-site interactions with flanking residues and loops of the G4-motif. In this case, the required high binding free energy arises from the sum of several small contributions. According to this drug-design strategy, two derivatives able to efficiently discriminate between G4 at telomere or c-MYC promoter were identified. Noteworthy, the high fluorescence emission of the bound ligands foresees their application for live cell imaging. With the aim to fulfill this stringent selectivity level, valuable “shortcuts” can be exploited when focusing on the targeting of G4s comprising incredibly unique structural domains. 
In most cases these unusual structures do not fulfill the general sequence requirements of a “normal G4” thus forcing to re-evaluate their potential distribution along the genome (Mukundan and Phan 2013). Two main examples deserve consideration. The first one refers to G4 structures that contains long loops. Usually, bioinformatic genome scans constrain the loops length to seven nucleotides. However, it was confirmed that whenever a hairpin can form within the long loop, the overall stability of the resulting hybrid structure is granted (Ngoc Nguyen et al. 2020; Ravichandran et al. 2021). This peculiar arrangement contains at the end two different DNA secondary structures that can be targeted at the same time. Such a sophisticate approach has been validated by using the hairpin-G4 hybrid structure found in the MYCN gene as target (Yang et al. 2021). The screening protocol allowed to address the unique hairpin-G4 interface as binding site for ligands with a low affinity but high selectivity. An additional group of “non-canonical G4” corresponds to those G-rich sites that do not contain the same number of guanines at all runs (i.e., one G2 and three G3

M. Palumbo and C. Sissi

tracts) (Heddi et al. 2016). As a result, the folded G4 contains an incomplete G-tetrad and can be labeled as a guanine-vacancy-bearing G4 (vG4). Potential vG4-forming motifs are widely distributed along the genome, again clustering in particular at gene promoters. Because one guanine is missing, guanine analogues that can complete the tetrad have been explored as a targeting strategy, with promising results. In this case, the issue to solve was the need to increase the efficiency of nucleobase delivery to the vG4. Thus, a guanine was functionalized through a flexible linker with a G4-binding counterpart (i.e., a G4-binding peptide or a porphyrin) (Chen et al. 2021; He et al. 2020). As a result, the two binding domains were observed to bind cooperatively only at vG4s, thus providing constructs with high activity and selectivity. The use of these conjugates in cellular models confirmed that vG4s can also actively participate in the regulation of transcription, and it points to vG4 binders as an emerging class of drugs that specifically affect gene expression. Finally, an approach still poorly exploited for potential medical application is the use of G4-binding peptides. The design of this kind of ligand is based on the proper placement of aromatic residues along a flexible backbone. This is the case for the peptide QQWQQQQWQQ, in which multiple glutamines work as spacers to allow stacking of the tryptophans (Kundu et al. 2022). This model peptide destabilizes dsDNA while interacting efficiently with the G4-forming sequence located in the nuclease-hypersensitive element (NHE) III1 region of the c-Myc promoter, with considerable selectivity over the i-motif (iM). Possibly, starting from short domains of known G4-binding proteins, it will soon be possible to derive libraries of G4-directed peptides from which selective ligands with targeted antiproliferative activity can be identified. 
The data discussed here are experimental evidence that, as the amount of available data increases, we will become more efficient at defining the proper “higher order” correlation between interaction and effect. Likewise, it is extremely useful to consider a wide variety of pharmacophore scaffolds that might generate unique outputs. The summary of the major pharmacophores considered so far for G4 targeting shows how actively the scientific community is pursuing this goal (Fig. 5). Finally, artificial intelligence protocols could provide useful hints when dealing with massive amounts of heterogeneous data.
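
The sequence constraints discussed in this section can be illustrated with a short scan. The following is a minimal sketch, not the procedure used in the cited studies: it assumes the commonly used regular-expression convention of four runs of at least three guanines separated by loops of one to `max_loop` nucleotides, with `max_loop = 7` reflecting the loop constraint mentioned above. The function names (`g4_pattern`, `find_putative_g4`) and the example sequences are hypothetical.

```python
import re


def g4_pattern(max_loop=7, min_run=3):
    """Compile a pattern for four G-runs joined by three loops.

    Assumption: a putative G4 is four runs of >= min_run guanines
    separated by loops of 1..max_loop nucleotides (any base).
    """
    run = f"G{{{min_run},}}"              # e.g. G{3,}
    loop = f"[ACGT]{{1,{max_loop}}}"      # e.g. [ACGT]{1,7}
    return re.compile(f"{run}(?:{loop}{run}){{3}}")


def find_putative_g4(seq, max_loop=7):
    """Return (start, matched sequence) for non-overlapping hits on one strand."""
    pattern = g4_pattern(max_loop)
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())]


# A short-loop motif is found with the default seven-nucleotide limit.
short_loops = "TTGGGAGGGTGGGAGGGTT"

# A motif with a 12-nucleotide central loop is missed unless the limit is relaxed.
long_loop = "GGGTGGG" + "ACGTACGTACGT" + "GGGTGGG"
```

Calling `find_putative_g4(long_loop)` returns no hit, whereas `find_putative_g4(long_loop, max_loop=12)` recovers the whole sequence, which is the reason long-loop G4s escape standard scans. The sketch inspects only the given strand and says nothing about whether a hairpin can form within the long loop or whether the motif folds in practice.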

Alternative G4 Arrangements as More Selective Targets

As discussed above, the simultaneous binding to two or more non-canonical elements may provide the conditions for unique, possibly positively cooperative, binding interactions. In this connection, the potential physiological roles of higher order G4 arrangements, which might occur when two or more G4 units are close in space, represent an attractive new challenge. These G4 repeats can represent functional units completely different from a single isolated G4. Originally,

DNA Structural Elements as Potential Targets for Regulation of Gene Expression

Fig. 5 Summary of the major scaffolds considered as a pharmacophore core to build up G4 ligands

Fig. 6 Examples of multiple G4 modules arranged according to a “bead on a string” (a) or an “interacting beads” model (b), with the maximized number of G4s (c) or with the presence of a frustrating spacer (d). Created in BioRender.com

this issue clearly emerged from the telomeric sequence. Indeed, the unique feature of the telomere ends is the presence of a long 3′ protruding single-stranded chain composed of (TTAGGG) tandem repeats. Although most available structural studies focused on sequences including only four telomeric repeats, telomere composition clearly indicates that, in the cell, the long 3′ protruding ends can easily generate a chain of multiple consecutive G4s. For this system, two main models have been proposed (Fig. 6). The first, called “beads on a string,” corresponds to the formation of individual and independent G4 units, in contrast to the second, referred to as “interacting beads,” in which a direct interaction between consecutive G4s is envisaged (Monsen et al. 2021). In addition, experimental data do not always support the formation of the maximal number of consecutive G4s in a long sequence. Indeed, independent G4s might form at positions separated by fewer than four telomeric repeats, a stretch unable to fully accommodate an additional G4. In this case the two contiguous G4s will be separated by loops longer than the expected 3 nts. This model is referred to as “frustrated” (Carrino et al. 2021). The difference between these conditions is only apparently trivial. First, the overall three-dimensional arrangements produced by these folding models are different. As a result, one can expect that the proteins involved in the maintenance of the telomere ends might discriminate among them. A second issue is the possible structural rearrangement that an isolated G4 unit undergoes when it is recruited into a G4 repeat module, a condition extensively studied on the highly polymorphic long telomeric repeats (Monsen et al. 2021). 
Third, the formation of a G4-G4 surface is expected to provide a favorable energetic contribution to the G4-G4 interaction although, in most instances, it has been described as “transient.” Still, this site can be considered an attractive alternative target from a medicinal chemistry point of view. Indeed, the G4-G4 surface is associated with an exceedingly small subset of higher order G4s, making it a very poorly represented (thus highly selective!) site. Moreover, its breakup to

Fig. 7 Examples of imidazole (Panel a) or pyridostatin (Panel b) analogues designed to target G4 repeats

accommodate a ligand would be energetically more favorable than intercalation between consecutive G-tetrads in the inner core of a G4. Still, it allows the binder to explore a wide aromatic surface on both its top and bottom sides. Model compounds already tested include, for example, dimeric aryl-substituted imidazoles (Fig. 7a) (Hu et al. 2020). In this case, the challenge is to find the best balance that preserves selectivity while increasing affinity. An alternative strategy to selectively target G4 repeats relies on the simultaneous recognition of the two outermost tetrads. This provides the rationale for the use of compounds that comprise two aromatic entities properly spaced by a linker to match the distance between the two binding sites (Fig. 7b). The resulting system appears as a modular drug in which each single element can be individually optimized: the added value is the potentially rapid setup of different ligands, hopefully selective for a unique G4 repeat (Hu et al. 2020; Platella et al. 2020).
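
The difference between the maximal and the “frustrated” arrangements described in this section for telomeric repeats is essentially a counting problem. The sketch below is an idealized illustration under stated assumptions, not a model taken from the cited studies: each G4 consumes exactly four consecutive TTAGGG repeats, unused repeats stay unfolded, two directly adjacent G4s are joined by a 3-nt linker, and an arrangement is called frustrated when it hosts fewer G4s than the maximum yet leaves no stretch of four free repeats. All function names are hypothetical and no energetics are modeled.

```python
from itertools import combinations

REPEATS_PER_G4 = 4   # four TTAGGG repeats per G4 unit, as stated in the text
REPEAT_LEN = 6       # length of one TTAGGG repeat in nucleotides
MIN_LINKER = 3       # assumed linker (nt) between two directly adjacent G4s


def max_g4(n_repeats):
    """Largest number of G4 units that n_repeats can host."""
    return n_repeats // REPEATS_PER_G4


def arrangements(n_repeats, k):
    """Yield the gaps (in unused repeats) before, between and after k G4s."""
    free = n_repeats - REPEATS_PER_G4 * k
    if free < 0:
        return
    # Stars and bars: choose which of the free + k slots hold a G4.
    for bars in combinations(range(free + k), k):
        gaps, prev = [], -1
        for b in bars:
            gaps.append(b - prev - 1)
            prev = b
        gaps.append(free + k - prev - 1)
        yield tuple(gaps)


def is_saturated(gaps):
    """True when no gap is long enough to fold an additional G4."""
    return all(g < REPEATS_PER_G4 for g in gaps)


def frustrated_arrangements(n_repeats):
    """Saturated arrangements that hold fewer G4s than the maximum."""
    return [gaps
            for k in range(1, max_g4(n_repeats))
            for gaps in arrangements(n_repeats, k)
            if is_saturated(gaps)]


def linker_nt(gap):
    """Nucleotides separating two G4s with `gap` unused repeats between them."""
    return MIN_LINKER + REPEAT_LEN * gap
```

For ten repeats, `max_g4(10)` gives two G4s, while `frustrated_arrangements(10)` returns the single case of one central G4 flanked by three unused repeats on each side. Under the same assumptions, `linker_nt(1)` gives a 9-nt spacer, which illustrates why frustrated arrangements have spacers longer than 3 nts.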

Physiological Relevance of Alternative G4 Repeats

This short analysis reveals how hard it is to successfully address the folding of multiple G4 sequences, a condition that hampers the efficient rational design of potential new drugs. To date, multiple studies using a wide panel of experimental techniques have sought to confirm the most relevant arrangement of G4 repeats in solution. The data acquired were frequently contradictory, largely because of the sequence selection. Indeed, as mentioned above, it is remarkable how the insertion or removal of even a single nucleotide at the 3′- or 5′-position may substantially affect the energy landscapes of the folded forms, thus resulting in highly variable distributions of the species under conditions that mimic the physiological environment. Likewise, whenever modest variations of the experimental conditions are applied, additional rearrangements can occur. Last, it must be

pointed out that, inside the nucleus, the overall picture is even fuzzier because of the dynamic processes occurring on the nucleic acids, i.e., conversion from open to closed chromatin, variation in the degree of supercoiling of the double helix, variable relative concentrations of multiple DNA-binding proteins, and further effects due to crowding conditions, with dramatic changes in the activity coefficients of all species. At the early stage of the investigation of G4s at gene promoters, the occurrence of interconnected multimeric G4s was not considered. A change in this perspective was driven by studies on the c-MYB promoter, where three (GGA)4 repeats are located 17 nts downstream of the TSS on the antisense strand (Palumbo et al. 2008). The folding of this sequence is quite unique. It corresponds to a dimeric G4-G4 where each module comprises a G-tetrad and a GA heptad. The larger aromatic surface of the latter grants an optimized stacking interaction between the two G4 units (Matsugami et al. 2003). The interest in this system was not merely linked to its unprecedented architecture but was amplified by the proof that its occurrence impairs the progression of RNA polymerase and, consequently, reduces gene expression (Broxson et al. 2011). On this basis, the occurrence of such higher order organization of G4s at gene promoters can no longer be ruled out. Still, the studies concerning the formation of G4 repeats in gene promoters are limited to a very narrow subset of genes. These cover the promoters of h-TERT, ILPR, KIT, MYC, and k-RAS (Monsen et al. 2020, 2022; Rigo and Sissi 2017; Salsbury and Lemkul 2019; Schonhoft et al. 2009). The reported data suggest that these long G4 repeats fold according to different topologies, with an overall variable number of G-tetrads, and result in different higher order architectures. Regrettably, to date, no prediction about the final arrangement of a novel G4 repeat can be safely made. 
This is strictly connected to the paucity of studied G4 repeats, which, in turn, derives from the difficulties in resolving the complex equilibria they undergo in solution. To approach this issue experimentally, integrated structural biology studies are under development. Unfortunately, they are quite expensive in terms of the amount of work, the effort needed to rationalize apparently divergent outputs from distinct experimental protocols and, finally, the time needed to complete the proper assignment. Likewise, the present situation negatively impacts the development of efficient protocols able to predict the genomic distribution of G4 repeats with sufficient accuracy. Thus, fast screening procedures, ideally scalable to high-throughput screening (HTS), are needed to identify them. But what about their actual roles within cells? Are they dependent on the folding topology? As anticipated for isolated G4s, the conventional approach was to rationalize the effect of G4 repeat formation on gene expression by considering that their formation close to the TSS might affect the activity or recruitment of transcription factors or the progression of RNA polymerase. This prompted attempts to answer the question from a different perspective. As anticipated, the validation of a new nucleic acid folding motif as a physiologically relevant modulator of gene expression should require the identification of proper binding partners, i.e., a transcription factor or a

selective helicase to release them (Sengupta et al. 2021). Moreover, should the polymorphism of G4 repeats be confirmed on a large scale, we should rule out the existence of a single protein able to recognize all these multiple G4s, while the identification of multiple proteins able to preferentially interfere with a subset of these higher order structures should be expected. Interestingly, recent evidence supports this working model. Using a G4 repeat sequence as a probe for nuclear proteins, Vimentin was identified (Ceschi et al. 2022). The relevance of this finding rests on the peculiarity of the interaction. Indeed, Vimentin was shown to bind any G4 repeat with nanomolar affinity, while no single G4 was recognized. Moreover, this interaction involves only the tetrameric form of the protein, which is the soluble one, and it blocks filament formation by Vimentin. The selectivity of Vimentin for adjacent G4 repeats is unprecedented, and its binding mode to these alternative G4 arrangements might be regarded as a strategy to substantially enhance selectivity with no need to work on the primary sequence of the DNA. This appears to be the case. Indeed, the details of the effect that this protein-DNA complex exerts on gene expression have not been fully addressed yet but, based on GO enrichment analysis of genes having putative G4 repeats within their promoters, an overlap emerged with the reported functions of the protein during epithelial-to-mesenchymal transition (EMT) and neurological development. Notably, another intermediate filament protein, Lamin B, has been reported to bind DNA and to actively participate in nuclear genome reorganization (Pascual-Reguant et al. 2018). Moreover, in cells the protein Yin Yang 1 (YY1) was confirmed to colocalize with G4-forming sites, and this protein-G4 interaction is necessary to support the long-range DNA looping function of YY1 (Li et al. 2021). 
All these data open a novel scenario for G4s as epigenetic markers. Indeed, as described above, G4s can work “per se” at the local level (i.e., a small portion of the promoter that interferes with histone deposition). The newly accumulating evidence indicates that they can also be involved in controlling the 3D genome architecture through interaction with proteins, ultimately driving long-range interactions between distant genomic sites. An additional example in agreement with this model is provided by studies on High Mobility Group (HMG) proteins, a wide family of proteins extensively involved in genome architecture. Among them, it was experimentally validated that members of the High Mobility Group Box (HMGB) family, the nuclear HMGB proteins, have broad regulatory potential on cellular metabolism (Voong et al. 2021). Among their multiple physiological partners, nuclear HMGB proteins interact with high affinity with non-canonical DNA structures, including G4-forming domains, thus contributing to the organization of the higher order chromatin structure. HMGB1 and HMGB2 identify the boundaries of topologically associating domains (TADs). In addition, HMGB1 interacts with and destabilizes the G4 at the KRAS promoter, where it induces a reduction of protein expression (Amato et al. 2018). Chromosome architecture and protein recruitment/displacement are strictly intertwined to control transcription. It will be interesting to see in the future whether we will succeed in deriving a general model to describe the multiple equilibria involved as

well as the impact of G4 formation. What appears attractive is that these protein-DNA complexes might represent fruitful targets for therapeutic intervention. Indeed, several pathologies, including metastatic cancers, depend on the overexpression or hyperactivation of these structural proteins. However, most of them are considered largely undruggable. The possibility of working at the G4-protein interface opens new perspectives for modulating their activity, thus leading to the development of a targeted therapeutic approach.

Epigenetics

We have seen that, besides playing important roles in modulating transcription and replication, G4s are emerging as epigenetic markers. By rearranging the DNA folding, they could enable or hinder complex formation with other factors, ultimately affecting chromatin reshaping and nucleosome positioning. The G4 recognition elements remain the same but are used in a different context. However, if we look at epigenetics from the opposite direction, another insufficiently examined issue concerns the structural, dynamic, and functional consequences of DNA modifications on G4 formation, together with their biological and pharmacological implications. The best-known base modification relevant for G4 regulation is guanosine oxidation (Fleming and Burrows 2021). This is an extremely important reaction, since G4s represent a hub for oxidation through the combination of two main chemical pathways. On the one hand, the guanines at the 5′ outermost tetrad of a G4 and those within the loops are highly prone to oxidation due to solvent accessibility. On the other hand, guanosine radical cations generated along the double helix are efficiently transported to guanine-rich sites, where they accumulate. Cells efficiently handle oxidized guanines (GO) or the consequently generated apurinic site (AP) through the base excision repair pathway and the apurinic/apyrimidinic endonuclease 1 (APE1), respectively. However, both GO and AP impact G4 stability. Thus, based on the functional mechanisms already discussed above, it is not surprising that these pathways lead to either up- or downregulation of gene expression. Notably, the frequent presence of more than four contiguous G-runs at gene promoters has been proposed to counterbalance GO by supporting a sliding in G4 formation. 
Overall, it appears that oxidative stress-mediated lesions in G-rich sequences may be considered a very potent hotspot, with amplified effects on transcription resulting from a highly articulated balance of complementary molecular mechanisms. Looking at other nucleobases, another common epigenetic mark is methylation. The C5-methylation of cytosine occurs frequently at CpG sites. Such a modification might increase the stability of the corresponding iM, as observed for example at the VEGF promoter (Kimura et al. 2022). More recently, N6-methylation of adenosine along eukaryotic genomic DNA has been considered as well. In human mature mRNA, this modification colocalizes with G4-forming domains, thus possibly playing a role in mRNA processing. However, the same reaction can occur

also at DNA G4-forming sites where, as shown for a G4 at the c-KIT promoter (kit1), it destabilizes the tetrahelical arrangement (Laddachote et al. 2020). These results support the relevant contribution of epigenetic modifications to G4 formation (and to the associated biological functions) even when guanines are not directly involved.

Conclusions and Perspectives

In a complex container like a human cell, specific recognition elements allow molecules to form stable complexes with other molecules, generating new species, which in turn recognize other species, and so forth. Hence, recognition is at the basis of all biologically relevant processes, from the simplest to the most complicated event. When dealing with nucleic acids and the need to recruit a protein, several basic recognition elements derive from their chemical composition. These elements are the charged phosphate groups, the aromatic A, T, C, and G bases, and the hydrogen-bond acceptor (O, N) and donor (OH, NH) groups. This is at the basis of the formation of inter- and intramolecular secondary and tertiary structures, both canonical and non-canonical. The highly polymorphic character of nucleic acids allows the same sequence to adopt various structures, each different from the others, with highly selective recognition ability. The latter is then fully exploited when dealing with different base sequences linked to a common structural core, according to the length and composition of the intervening loops. It is immediately clear that an enormous variety of combinations is available to reach optimal specificity for the intended target. Focusing on G4 assembly, one of the arising problems is the modest energy difference between the various conformers, which makes them easily interchangeable. Still, the activation energy barrier between them might be sufficiently high to make interconversion difficult, with the need to recruit a protein to resolve the G4s or, more simply, to bind them. A subsequent point deals with the effects of the sequences immediately preceding or following the non-canonical arrangement, which may remarkably affect the in vivo conformation of the nucleic acid. A related issue is the need for precise information on which folded species really corresponds to the active conformation within the cell. 
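
The contrast drawn above between a modest energy difference and a high activation barrier can be made concrete with a short numerical sketch. It is only an illustration under stated assumptions: populations follow a Boltzmann distribution at equilibrium, interconversion rates follow the Eyring equation with a transmission coefficient of 1, and the free-energy values used are arbitrary examples rather than measured data for any G4. The function names are hypothetical.

```python
import math

R = 8.314462618e-3    # gas constant, kJ mol^-1 K^-1
K_B = 1.380649e-23    # Boltzmann constant, J K^-1
H = 6.62607015e-34    # Planck constant, J s


def boltzmann_populations(free_energies_kj, temp_k=310.15):
    """Equilibrium fractions of conformers from relative free energies (kJ/mol)."""
    weights = [math.exp(-g / (R * temp_k)) for g in free_energies_kj]
    total = sum(weights)
    return [w / total for w in weights]


def eyring_half_life(barrier_kj, temp_k=310.15):
    """Half-life (s) of a first-order interconversion with the given barrier."""
    rate = (K_B * temp_k / H) * math.exp(-barrier_kj / (R * temp_k))
    return math.log(2) / rate


# Illustrative values only: two conformers 2 kJ/mol apart are both well
# populated, yet a 100 kJ/mol barrier makes their exchange slow.
populations = boltzmann_populations([0.0, 2.0])
slow_exchange = eyring_half_life(100.0)
fast_exchange = eyring_half_life(50.0)
```

With these example numbers, both conformers remain substantially populated at 37 °C, while the half-life of interconversion moves from well below a second to hours as the barrier rises from 50 to 100 kJ/mol. This is the sense in which conformers can be thermodynamically interchangeable but kinetically trapped.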
The use of “strong” binders, such as antibodies against G4s, to reveal their presence might in fact force the putative duplex-G4 conversion. Such a condition might lead to G4 overestimation with a loss of detail, making the results difficult to interpret. This is even more relevant in the presence of mutually exclusive multiple putative G4s (or assemblies of multiple non-canonical structures), which are intrinsically difficult to identify and quantify properly in cells. Finally, we must remember that, in most instances, we have to deal with a population of conformations that is not at equilibrium. This renders kinetic studies necessary (although extremely complex) to fully characterize the pathways leading to the biological response. Remember that a cell is not a closed system at equilibrium, but it is open and far from equilibrium

given the continuous chemical processes that occur therein and the exchange of energy and matter with the environment. Concerning G4 interactions with proteins, they can be subdivided into two main classes of process: (1) G4 winding/unwinding reactions conducted by helicases, and (2) recruitment of the proteins that assemble and stabilize the complexes needed for gene expression regulation (i.e., transcription factors), chromatin remodeling (i.e., methyltransferases), or DNA repair (Shu et al. 2022). Given the high number of factors required to bind G4 structures to form the functionally active promoter complex, the first consideration relates to the protein/G4 ratio in the complex and the binding mode. Reasonably, the G4 moiety can directly recruit one or two factors at a time, given the relative dimensions of the interacting species. Then, complete assembly of the activation complex should derive from the induced-fit conformational changes occurring upon (cooperative) incorporation of additional protein components until the complex is fully assembled and functionally active. Evidence that many factors exhibit high G4 affinity might be indicative of multiple G4-binding sites. However, this is unlikely to be the case given the uniqueness of the transcript being formed at each replication origin. Considering that G4 binding to transcription factors represents the first event occurring in promoter formation, the multiplicity of G4-binding proteins concurring to assemble the promoter may be indicative of several different promoters containing (in part) the same transcription factors organized in diverse ways. This represents a smart approach to produce several promoter combinations, each able to activate transcription of a specific gene using common modules in a combinatorial way. 
Amazingly, the same simple chemical recognition combination model can be used to discriminate a nucleotide, a nucleic acid structure as such, or the same structure when it is part of a complex machinery like a transcription complex. Given the multiple roles played by G4 binders, much remains to be clarified about the connections between loss of (onco)gene expression control, onset of genomic instability, and altered DNA damage and repair mechanisms in the progression of severe pathologies like cancer and neurological disorders. In this connection, we need to successfully rationalize the effects of the large family of G4 ligands on the tetraplex structure starting from common protocols, since non-negligible sources of inconsistency arise from the different experimental conditions used in different studies. To date, thousands of investigations have been performed all over the world, yet we have no drug at hand. We need to be able to predict the precise location of relevant interactions and demonstrate the real ability of the ligand to reach the intended cellular target, i.e., to selectively bind the G4 involved in the process of interest. Otherwise, we end up with a plethora of potential chemotherapeutic agents still devoid of the necessary selectivity. The fact that Mother Nature uses a complex of several proteins to obtain a selective promoter for a given gene is not very encouraging in this sense. Other approaches can be attempted based upon targeting a disease-related protein (enzyme), but a discussion on this matter goes beyond the scope of the present report. Finally, we need to remember that all these multifaceted issues apply not only to the DNA-G4 ensemble but should be extended to RNA and DNA-RNA

G-quadruplexes, also present in several transcripts, as well as to all the other possible secondary structures and their relative combinations (Millevoi et al. 2012). A final statistical consideration: recent research evaluated the impact of single-nucleotide variations (SNVs), the most frequent genetic modifications present in the human genome, when they occur within a G-quadruplex motif (G4V) (Gong et al. 2021). About five million G4Vs were identified, 3.5 million of which lie within genes, remarkably overlapping with transcription factor binding sequences. The data indicate that G4Vs can significantly affect gene expression, with biological and pathological implications to be considered in personalized medicine and health risk assessment.

Acknowledgments

This work was supported by AIRC (grant IG 2021 - ID26474, PI CS), CERIC ERIC (grants No. 20207052 and 20202174) and European Union-Next GenerationEU (PNRR M4C2-Investimento 1.4- CN00000041).

References

Amato J, Madanayake TW, Iaccarino N, Novellino E, Randazzo A, Hurley LH, Pagano B (2018) HMGB1 binds to the KRAS promoter G-quadruplex: a new player in oncogene transcriptional regulation? Chem Commun (Camb) 54:9442–9445. https://doi.org/10.1039/c8cc03614d
Bacolla A, Wang G, Vasquez KM (2015) New perspectives on DNA and RNA triplexes as effectors of biological activity. PLoS Genet 11:e1005696. https://doi.org/10.1371/journal.pgen.1005696
Beals N, Farhath MM, Kharel P, Croos B, Mahendran T, Johnson J, Basu S (2022) Rationally designed DNA therapeutics can modulate human TH expression by controlling specific GQ formation in its promoter. Mol Ther 30:831–844. https://doi.org/10.1016/j.ymthe.2021.05.013
Biffi G, Di Antonio M, Tannahill D, Balasubramanian S (2014) Visualization and selective chemical targeting of RNA G-quadruplex structures in the cytoplasm of human cells. Nat Chem 6:75–80. https://doi.org/10.1038/nchem.1805
Bowater RP, Bohálová N, Brázda V (2022) Interaction of proteins with inverted repeats and cruciform structures in nucleic acids. Int J Mol Sci 23:6171. https://doi.org/10.3390/ijms23116171
Broxson C, Beckett J, Tornaletti S (2011) Transcription arrest by a G Quadruplex forming-trinucleotide repeat sequence from the human c-myb gene. Biochemistry 50:4162–4172. https://doi.org/10.1021/bi2002136
Buglione E, Salerno D, Marrano CA, Cassina V, Vesco G, Nardo L, Dacasto M, Rigo R, Sissi C, Mantegazza F (2021) Nanomechanics of G-quadruplexes within the promoter of the KIT oncogene. Nucleic Acids Res 49:4564–4573. https://doi.org/10.1093/nar/gkab079
Carrino S, Hennecker CD, Murrieta AC, Mittermaier A (2021) Frustrated folding of guanine quadruplexes in telomeric DNA. Nucleic Acids Res 49:3063–3076. https://doi.org/10.1093/nar/gkab140
Ceschi S, Berselli M, Cozzaglio M, Giantin M, Toppo S, Spolaore B, Sissi C (2022) Vimentin binds to G-quadruplex repeats found at telomeres and gene promoters. Nucleic Acids Res 50:1370–1381. https://doi.org/10.1093/nar/gkab1274
Chen J, He Y, Liang H, Cai T, Chen Q, Zheng K (2021) Regulation of PDGFR-β gene expression by targeting the G-vacancy bearing G-quadruplex in promoter. Nucleic Acids Res 49:12634–12643. https://doi.org/10.1093/nar/gkab1154
Choudhury SR, Dutta S, Bhaduri U, Rao MRS (2021) LncRNA Hmrhl regulates expression of cancer related genes in chronic myelogenous leukemia through chromatin association. NAR Cancer 3:zcab042. https://doi.org/10.1093/narcan/zcab042

Cui Y, Kong D, Ghimire C, Xu C, Mao H (2016) Mutually exclusive formation of G-Quadruplex and i-motif is a general phenomenon governed by steric hindrance in duplex DNA. Biochemistry 55:2291–2299. https://doi.org/10.1021/acs.biochem.6b00016
De S, Groaz E, Margamuljana L, Abramov M, Marlière P, Herdewijn P (2015) Sulfonate derived phosphoramidates as active intermediates in the enzymatic primer-extension of DNA. Org Biomol Chem 13:3950–3962. https://doi.org/10.1039/C5OB00157A
Di Antonio M, Ponjavic A, Radzevičius A, Ranasinghe RT, Catalano M, Zhang X, Shen J, Needham L-M, Lee SF, Klenerman D, Balasubramanian S (2020) Single-molecule visualization of DNA G-quadruplex formation in live cells. Nat Chem 12:832–837. https://doi.org/10.1038/s41557-020-0506-4
Doboszewski B, Groaz E, Herdewijn P (2013) Synthesis of Phosphonoglycine backbone units for the development of phosphono peptide nucleic acids: phosphono peptide nucleic acids. Eur J Org Chem 2013:4804–4815. https://doi.org/10.1002/ejoc.201300523
Dzatko S, Krafcikova M, Hänsel-Hertsch R, Fessl T, Fiala R, Loja T, Krafcik D, Mergny J-L, Foldynova-Trantirkova S, Trantirek L (2018) Evaluation of the stability of DNA i-motifs in the nuclei of living mammalian cells. Angew Chem Int Ed 57:2165–2169. https://doi.org/10.1002/anie.201712284
Fleming AM, Burrows CJ (2021) Oxidative stress-mediated epigenetic regulation by G-quadruplexes. NAR Cancer 3:zcab038. https://doi.org/10.1093/narcan/zcab038
Fleming AM, Zhu J, Ding Y, Burrows CJ (2019) Location dependence of the transcriptional response of a potential G-quadruplex in gene promoters under oxidative stress. Nucleic Acids Res 47:5049–5060. https://doi.org/10.1093/nar/gkz207
Gong J, Wen C, Tang M, Duan R, Chen J, Zhang J, Zheng K, He Y, Hao Y, Yu Q, Ren S, Tan Z (2021) G-quadruplex structural variations in human genome associated with single-nucleotide variations and their impact on gene activity. Proc Natl Acad Sci U S A 118:e2013230118. https://doi.org/10.1073/pnas.2013230118
Gray RD, Li J, Chaires JB (2009) Energetics and kinetics of a conformational switch in G-quadruplex DNA. J Phys Chem B 113:2676–2683. https://doi.org/10.1021/jp809578f
Grün JT, Schwalbe H (2022) Folding dynamics of polymorphic G-quadruplex structures. Biopolymers 113. https://doi.org/10.1002/bip.23477
Hänsel-Hertsch R, Spiegel J, Marsico G, Tannahill D, Balasubramanian S (2018) Genome-wide mapping of endogenous G-quadruplex DNA structures by chromatin immunoprecipitation and high-throughput sequencing. Nat Protoc 13:551–564. https://doi.org/10.1038/nprot.2017.150
He Y, Zheng K, Wen C, Li X, Gong J, Hao Y, Zhao Y, Tan Z (2020) Selective targeting of guanine-vacancy-bearing G-Quadruplexes by G-quartet complementation and stabilization with a guanine–peptide conjugate. J Am Chem Soc 142:11394–11403. https://doi.org/10.1021/jacs.0c00774
Heddi B, Martín-Pintado N, Serimbetov Z, Kari TMA, Phan AT (2016) G-quadruplexes with (4n – 1) guanines in the G-tetrad core: formation of a G-triad·water complex and implication for small-molecule binding. Nucleic Acids Res 44:910–916. https://doi.org/10.1093/nar/gkv1357
Hou Y, Gan T, Fang T, Zhao Y, Luo Q, Liu X, Qi L, Zhang Y, Jia F, Han J, Li S, Wang S, Wang F (2022) G-quadruplex inducer/stabilizer pyridostatin targets SUB1 to promote cytotoxicity of a transplatinum complex. Nucleic Acids Res 50:3070–3082. https://doi.org/10.1093/nar/gkac151
Hu M-H, Lin X-T, Liu B, Tan J-H (2020) Dimeric aryl-substituted imidazoles may inhibit ALT cancer by targeting the multimeric G-quadruplex in telomere. Eur J Med Chem 186:111891. https://doi.org/10.1016/j.ejmech.2019.111891
Hui WWI, Simeone A, Zyner KG, Tannahill D, Balasubramanian S (2021) Single-cell mapping of DNA G-quadruplex structures in human cancer cells. Sci Rep 11:23641. https://doi.org/10.1038/s41598-021-02943-3
Joshi S, Singh A, Kukreti S (2022) Porphyrin induced structural destabilization of a parallel DNA G-quadruplex in human MRP1 gene promoter. J Mol Recognit 35. https://doi.org/10.1002/jmr.2950

34

DNA Structural Elements as Potential Targets for Regulation of Gene Expression

1123

Kendrick S, Kang H-J, Alam MP, Madathil MM, Agrawal P, Gokhale V, Yang D, Hecht SM, Hurley LH (2014) The dynamic character of the BCL2 promoter i-motif provides a mechanism for modulation of gene expression by compounds that bind selectively to the alternative DNA hairpin structure. J Am Chem Soc 136:4161–4171. https://doi.org/10.1021/ja410934b Khristich AN, Mirkin SM (2020) On the wrong DNA track: molecular mechanisms of repeatmediated genome instability. J Biol Chem 295:4134–4170. https://doi.org/10.1074/jbc. REV119.007678 Kimura K, Oshikawa D, Ikebukuro K, Yoshida W (2022) Stabilization of VEGF i-motif structure by CpG methylation. Biochem Biophys Res Commun 594:88–92. https://doi.org/10.1016/j. bbrc.2022.01.054 King JJ, Irving KL, Evans CW, Chikhale RV, Becker R, Morris CJ, Peña Martinez CD, Schofield P, Christ D, Hurley LH, Waller ZAE, Iyer KS, Smith NM (2020) DNA G-Quadruplex and i-motif structure formation is interdependent in human cells. J Am Chem Soc 142:20600–20604. https://doi.org/10.1021/jacs.0c11708 Kumar S, Reddy Sannapureddi RK, Todankar CS, Ramanathan R, Biswas A, Sathyamoorthy B, Pradeepkumar PI (2022) Bisindolylmaleimide ligands stabilize c-MYC G-Quadruplex DNA structure and downregulate gene expression. Biochemistry 61:1064–1076. https://doi.org/10. 1021/acs.biochem.2c00116 Kundu N, Sharma T, Kaur S, Singh M, Kumar V, Sharma U, Jain A, Shankaraswamy J, Miyoshi D, Saxena S (2022) Significant structural change in human c-Myc promoter G-quadruplex upon peptide binding in potassium. RSC Adv 12:7594–7604. https://doi.org/10.1039/D2RA00535B Laddachote S, Nagata M, Yoshida W (2020) Destabilisation of the c-kit1 G-quadruplex structure by N6-methyladenosine modification. Biochem Biophys Res Commun 524:472–476. https://doi. org/10.1016/j.bbrc.2020.01.116 Lago S, Nadai M, Cernilogar FM, Kazerani M, Domíniguez Moreno H, Schotta G, Richter SN (2021) Promoter G-quadruplexes and transcription factors cooperate to shape the cell typespecific transcriptome. 
Nat Commun 12:3885. https://doi.org/10.1038/s41467-021-24198-2 Li L, Williams P, Ren W, Wang MY, Gao Z, Miao W, Huang M, Song J, Wang Y (2021) YY1 interacts with guanine quadruplexes to regulate DNA looping and gene expression. Nat Chem Biol 17:161–168. https://doi.org/10.1038/s41589-020-00695-1 Lipps HJ, Rhodes D (2009) G-quadruplex structures: in vivo evidence and function. Trends Cell Biol 19:414–422. https://doi.org/10.1016/j.tcb.2009.05.002 Long W, Zheng B-X, Li Y, Huang X-H, Lin D-M, Chen C-C, Hou J-Q, Ou T-M, Wong W-L, Zhang K, Lu Y-J (2022) Rational design of small-molecules to recognize G-quadruplexes of c-MYC promoter and telomere and the evaluation of their in vivo antitumor activity against breast cancer. Nucleic Acids Res 50:1829–1848. https://doi.org/10.1093/nar/gkac090 Lyu J, Shao R, Kwong Yung PY, Elsässer SJ (2022) Genome-wide mapping of G-quadruplex structures with CUT&Tag. Nucleic Acids Res 50:e13. https://doi.org/10.1093/nar/gkab1073 Manzo SG, Hartono SR, Sanz LA, Marinello J, De Biasi S, Cossarizza A, Capranico G, Chedin F (2018) DNA topoisomerase I differentially modulates R-loops across the human genome. Genome Biol 19:100. https://doi.org/10.1186/s13059-018-1478-1 Masson T, Landras Guetta C, Laigre E, Cucchiarini A, Duchambon P, Teulade-Fichou M-P, Verga D (2021) BrdU immuno-tagged G-quadruplex ligands: a new ligand-guided immunofluorescence approach for tracking G-quadruplexes in cells. Nucleic Acids Res 49:12644–12660. https://doi. org/10.1093/nar/gkab1166 Matsugami A, Okuizumi T, Uesugi S, Katahira M (2003) Intramolecular higher order packing of parallel quadruplexes comprising a G:G:G:G tetrad and a G(:A):G(:A):G(:A):G heptad of GGA triplet repeat DNA. J Biol Chem 278:28147–28153. https://doi.org/10.1074/jbc.M303694200 Millevoi S, Moine H, Vagner S (2012) G-quadruplexes in RNA biology. Wiley Interdiscip Rev RNA 3:495–507. 
https://doi.org/10.1002/wrna.1113 Monsen RC, DeLeeuw L, Dean WL, Gray RD, Sabo TM, Chakravarthy S, Chaires JB, Trent JO (2020) The hTERT core promoter forms three parallel G-quadruplexes. Nucleic Acids Res 48: 5720–5734. https://doi.org/10.1093/nar/gkaa107

1124

M. Palumbo and C. Sissi

Monsen RC, Chakravarthy S, Dean WL, Chaires JB, Trent JO (2021) The solution structures of higher-order human telomere G-quadruplex multimers. Nucleic Acids Res 49:1749–1768. https://doi.org/10.1093/nar/gkaa1285 Monsen RC, DeLeeuw LW, Dean WL, Gray RD, Chakravarthy S, Hopkins JB, Chaires JB, Trent JO (2022) Long promoter sequences form higher-order G-quadruplexes: an integrative structural biology study of c-Myc, k-Ras and c-Kit promoter sequences. Nucleic Acids Res 50: 4127–4147. https://doi.org/10.1093/nar/gkac182 Mukundan VT, Phan AT (2013) Bulges in G-Quadruplexes: broadening the definition of GQuadruplex-forming sequences. J Am Chem Soc 135:5017–5028. https://doi.org/10.1021/ ja310251r Ngoc Nguyen TQ, Lim KW, Phan AT (2020) Duplex formation in a G-quadruplex bulge. Nucleic Acids Res 48:10567–10575. https://doi.org/10.1093/nar/gkaa738 Palumbo SL, Memmott RM, Uribe DJ, Krotova-Khan Y, Hurley LH, Ebbinghaus SW (2008) A novel G-quadruplex-forming GGA repeat region in the c-myb promoter is a critical regulator of promoter activity. Nucleic Acids Res 36:1755–1769. https://doi.org/10.1093/nar/gkm1069 Pascual-Reguant L, Blanco E, Galan S, Le Dily F, Cuartero Y, Serra-Bardenys G, Di Carlo V, Iturbide A, Cebrià-Costa JP, Nonell L, de Herreros AG, Di Croce L, Marti-Renom MA, Peiró S (2018) Lamin B1 mapping reveals the existence of dynamic and functional euchromatin lamin B1 domains. Nat Commun 9:3420. https://doi.org/10.1038/s41467-018-05912-z Patange S, Ball DA, Karpova TS, Larson DR (2021) Towards a ‘spot on’ understanding of transcription in the nucleus. J Mol Biol 433:167016. https://doi.org/10.1016/j.jmb.2021.167016 Peterková K, Durník I, Marek R, Plavec J, Podbevšek P (2021) c-kit2 G-quadruplex stabilized via a covalent probe: exploring G-quartet asymmetry. Nucleic Acids Res 49:8947–8960. https://doi. org/10.1093/nar/gkab659 Petermann E, Lan L, Zou L (2022) Sources, resolution and physiological relevance of R-loops and RNA–DNA hybrids. Nat Rev Mol Cell Biol. 
https://doi.org/10.1038/s41580-022-00474-x Platella C, Pirota V, Musumeci D, Rizzi F, Iachettini S, Zizza P, Biroccio A, Freccero M, Montesarchio D, Doria F (2020) Trifunctionalized naphthalene Diimides and dimeric analogues as G-Quadruplex-targeting anticancer agents selected by affinity chromatography. Int J Mol Sci 21:1964. https://doi.org/10.3390/ijms21061964 Ravichandran S, Razzaq M, Parveen N, Ghosh A, Kim KK (2021) The effect of hairpin loop on the structure and gene expression activity of the long-loop G-quadruplex. Nucleic Acids Res 49: 10689–10706. https://doi.org/10.1093/nar/gkab739 Rigo R, Sissi C (2017) Characterization of G4-G4 crosstalk in the c-KIT promoter region. Biochemistry 56:4309–4312. https://doi.org/10.1021/acs.biochem.7b00660 Robinson J, Raguseo F, Nuccio SP, Liano D, Di Antonio M (2021) DNA G-quadruplex structures: more than simple roadblocks to transcription? Nucleic Acids Res 49:8419–8431. https://doi.org/ 10.1093/nar/gkab609 Rodriguez J, Larson DR (2020) Transcription in living cells: molecular mechanisms of bursting. Annu Rev Biochem 89:189–212. https://doi.org/10.1146/annurev-biochem-011520-105250 Salsbury AM, Lemkul JA (2019) Molecular dynamics simulations of the c-kit1 promoter G-Quadruplex: importance of electronic polarization on stability and cooperative ion binding. J Phys Chem B 123:148–159. https://doi.org/10.1021/acs.jpcb.8b11026 Schonhoft JD, Bajracharya R, Dhakal S, Yu Z, Mao H, Basu S (2009) Direct experimental evidence for quadruplex–quadruplex interaction within the human ILPR. Nucleic Acids Res 37: 3310–3320. https://doi.org/10.1093/nar/gkp181 Sengupta A, Roy SS, Chowdhury S (2021) Non-duplex G-Quadruplex DNA structure: a developing story from predicted sequences to DNA structure-dependent Epigenetics and beyond. Acc Chem Res 54:46–56. 
https://doi.org/10.1021/acs.accounts.0c00431 Shankar U, Mishra SK, Jain N, Tawani A, Yadav P, Kumar A (2022) Ni+2 permease system of helicobacter pylori contains highly conserved G-quadruplex motifs. Infect Genet Evol 101: 105298. https://doi.org/10.1016/j.meegid.2022.105298

34

DNA Structural Elements as Potential Targets for Regulation of Gene Expression

1125

Shu H, Zhang R, Xiao K, Yang J, Sun X (2022) G-Quadruplex-binding proteins: promising targets for drug design. Biomol Ther 12:648. https://doi.org/10.3390/biom12050648 Spiegel J, Cuesta SM, Adhikari S, Hänsel-Hertsch R, Tannahill D, Balasubramanian S (2021) G-quadruplexes are transcription factor binding hubs in human chromatin. Genome Biol 22:117. https://doi.org/10.1186/s13059-021-02324-z Sugimoto W, Kinoshita N, Nakata M, Ohyama T, Tateishi-Karimata H, Nishikata T, Sugimoto N, Miyoshi D, Kawauchi K (2022) Intramolecular G-quadruplex-hairpin loop structure competition of a GC-rich exon region in the TMPRSS2 gene. Chem Commun 58:48–51. https://doi.org/ 10.1039/D1CC05523B Takahashi S, Kotar A, Tateishi-Karimata H, Bhowmik S, Wang Z-F, Chang T-C, Sato S, Takenaka S, Plavec J, Sugimoto N (2021) Chemical modulation of DNA replication along G-Quadruplex based on topology-dependent ligand binding. J Am Chem Soc 143: 16458–16469. https://doi.org/10.1021/jacs.1c05468 Trizna L, Janovec L, Halaganová A, Víglaský V (2021) Rhodamine 6G-ligand influencing G-Quadruplex stability and topology. Int J Mol Sci 22:7639. https://doi.org/10.3390/ ijms22147639 Vesco G, Lamperti M, Salerno D, Marrano CA, Cassina V, Rigo R, Buglione E, Bondani M, Nicoletto G, Mantegazza F, Sissi C, Nardo L (2021) Double-stranded flanking ends affect the folding kinetics and conformational equilibrium of G-quadruplexes forming sequences within the promoter of KIT oncogene. Nucleic Acids Res 49:9724–9737. https://doi.org/10.1093/nar/ gkab674 Voong CK, Goodrich JA, Kugel JF (2021) Interactions of HMGB proteins with the genome and the impact on disease. Biomol Ther 11:1451. https://doi.org/10.3390/biom11101451 Yang M, Carter S, Parmar S, Bume DD, Calabrese DR, Liang X, Yazdani K, Xu M, Liu Z, Thiele CJ, Schneekloth JS (2021) Targeting a noncanonical, hairpin-containing G-quadruplex structure from the MYCN gene. Nucleic Acids Res 49:7856–7869. 
https://doi.org/10.1093/nar/gkab594 Zeraati M, Langley DB, Schofield P, Moye AL, Rouet R, Hughes WE, Bryan TM, Dinger ME, Christ D (2018) I-motif DNA structures are formed in the nuclei of human cells. Nat Chem 10: 631–637. https://doi.org/10.1038/s41557-018-0046-3 Zhang L, Yan T, Wang W, Wu Q, Li G, Li D, Stovall DB, Wang Y, Li Y, Sui G (2021) AKT1 is positively regulated by G-quadruplexes in its promoter and 30 -UTR. Biochem Biophys Res Commun 561:93–100. https://doi.org/10.1016/j.bbrc.2021.05.029 Zorzan E, Elgendy R, Giantin M, Dacasto M, Sissi C (2018) Whole-transcriptome profiling of canine and human in vitro models exposed to a G-Quadruplex binding small molecule. Sci Rep 8:17107. https://doi.org/10.1038/s41598-018-35516-y Zyner KG, Simeone A, Flynn SM, Doyle C, Marsico G, Adhikari S, Portella G, Tannahill D, Balasubramanian S (2022) G-quadruplex DNA structures in human stem cells and differentiation. Nat Commun 13:142. https://doi.org/10.1038/s41467-021-27719-1

Effects of Molecular Crowding on Structures and Functions of Nucleic Acids

35

Tamaki Endoh, Hisae Tateishi-Karimata, and Naoki Sugimoto

Contents

Introduction
Physicochemical and Molecular Factors Influencing Nucleic Acid Structures and Their Stabilities
Structural Factors
Environmental Factors
Model Experimental Systems In Vitro to Investigate the Effects of Molecular Crowding on Biomolecules
Characteristics of Co-solutes to Mimic the Intracellular Molecular Environment
Change in Solution Properties by the Addition of Co-solutes
Effects of Molecular Crowding Environments on Nucleic Acids
Effects of Molecular Crowding on Canonical Duplexes
Effects of Molecular Crowding on Noncanonical DNA Structures and Stabilities
Effects of Molecular Crowding on RNA Structure and Functions
Biological Reactions Influenced by Nucleic Acid Structures and Their Stabilities
Effects of Nucleic Acid Structures on DNA Replication
Effects of Nucleic Acid Structures on RNA Transcription
Effects of Nucleic Acid Structures on Protein Translation
Effects of Nucleic Acid Structures on Concurrent Reactions
Conclusion
References

T. Endoh · H. Tateishi-Karimata
Frontier Institute for Biomolecular Engineering Research (FIBER), Konan University, Kobe, Japan

N. Sugimoto (*)
Frontier Institute for Biomolecular Engineering Research (FIBER), Konan University, Kobe, Japan
Graduate School of Frontiers of Innovative Research in Science and Technology (FIRST), Konan University, Kobe, Japan

© Springer Nature Singapore Pte Ltd. 2023
N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_40


Abstract

More than half a century has passed since Francis Crick proposed the concept of the “Central Dogma” and the sequence hypothesis, which are fundamental bases of the life system. The concept of the “gene” goes back to Gregor Johann Mendel, who postulated a factor that transmits information across biological generations and defines the phenotypes of organisms. The physical identity of the “gene” is DNA, and the word “gene” is often used to refer to the DNA region that encodes a protein together with the region involved in the modulation of protein expression. Modulation of gene expression can occur at any step of the central dogma, that is, at the transcriptional, posttranscriptional, translational, or posttranslational level. Because proteins have long been regarded as the molecules responsible for this modulation, research has generally focused on how proteins recognize nucleic acids (DNA and RNA) in order to unveil the detailed molecular mechanisms of the modulation process. Like proteins, DNA and RNA are macromolecules consisting of linearly polymerized monomer units, namely nucleotides bearing adenine (A), guanine (G), cytosine (C), and thymine (T) [or uracil (U) in RNA], and they form secondary and tertiary structures based on the nucleotide sequence. Although nucleic acids have been thought to be simply recognized by proteins, it is becoming clear that they contribute actively to the modulation of gene expression through their tertiary structures and stabilities. The structures and stabilities of nucleic acids are influenced by the surrounding molecular environment. In particular, the molecular environment in cells is characterized by high concentrations of biomolecules, which create a crowding environment. Because the modulation of gene expression based on the structures and stabilities of nucleic acids has been optimized to function in such a crowded molecular environment, it is important to understand the behavior of nucleic acids under crowding. In this chapter, the effects of molecular crowding environments on the structures, stabilities, and functions of nucleic acids are discussed, with emphasis on the physicochemical properties of crowded environments that differ from those of a dilute aqueous solution.

Introduction

The molecular environment inside a cell contains extremely high concentrations of biomolecules of various sizes. Macromolecules mainly include nucleic acids, proteins, and polysaccharides. Small molecules include nucleotides, amino acids, sugars, fatty acids, and other small organic molecules, which serve as precursors for the biosynthesis of macromolecules and for the production of chemical energy. Intracellular biomolecules occupy 20–40% of the cellular volume, and their mass concentration is suggested to reach several hundred milligrams per milliliter of cellular volume (Fig. 1) (Nakano et al. 2014b). Taking E. coli as an example, about 30% of the total weight is from organic compounds, approximately 70% is from water, and approximately 0.3% is from inorganic compounds. Of these organic compounds, approximately 55% are proteins (approximately 17% of the total cell weight). DNA and RNA account for about 1% and 6% of the weight of E. coli, respectively. Lipids, carbohydrates, and low-molecular-weight compounds such as metabolites account for several percent. It is also known that the interior of E. coli is in a glass-like state, and that the production of metabolites ensures the fluidity of biomolecules (Campos et al. 2014). The most abundant metabolites are amino acids, among which glutamate predominates. In addition, proline, sucrose, betaine, glycine betaine, trimethylamine N-oxide (TMAO), and urea are present in high concentrations as osmoregulatory molecules. Adenosine triphosphate (ATP) is reported to function as a hydrotrope that keeps biomolecules, which are in an oversaturated state, dissolved in the cell (Patel et al. 2017).

Fig. 1 An intracellular crowding environment created by biomolecules. (Reprinted with permission from Nakano et al. (2014b). Copyright 2014 American Chemical Society)

In eukaryotic cells, in addition to the highly concentrated molecular environment, there are many types of intracellular organelles enclosed by lipid bilayer membranes, such as mitochondria, lysosomes, and the endoplasmic reticulum. There are also membrane-less organelles, or condensates, which are formed by liquid-liquid phase separation of biomolecules, as well as macromolecular fibrils and filaments. These organelles and assembled macromolecules form a heterogeneous environment (Fig. 1). Additionally, molecular density is not constant but changes dynamically depending on cellular activities such as proliferation and differentiation. This type of intracellular environment is called multimolecular crowding, or simply molecular crowding, and is a characteristic and unique property of living organisms. Thus, chemical investigations of the intracellular molecular environment have been revealing revolutionary functions of cells and biomolecules one after another.
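The E. coli composition figures quoted earlier in this section can be cross-checked with simple arithmetic. The short Python sketch below is only an illustration: the variable and function names are hypothetical, and the input fractions are the approximate values given in the text, not independent data.

```python
# Illustrative cross-check of the approximate E. coli composition quoted in the text.
# All names are hypothetical; the fractions are the rounded values from the chapter.

ORGANIC_FRACTION_OF_CELL = 0.30      # about 30% of total weight is organic compounds
PROTEIN_FRACTION_OF_ORGANICS = 0.55  # about 55% of the organic compounds are proteins


def fraction_of_total(fraction_of_organics: float,
                      organic_fraction: float = ORGANIC_FRACTION_OF_CELL) -> float:
    """Convert a share of the organic matter into a share of total cell weight."""
    return fraction_of_organics * organic_fraction


protein_of_total = fraction_of_total(PROTEIN_FRACTION_OF_ORGANICS)
# 0.55 x 0.30 = 0.165, consistent with the "approximately 17%" stated in the text.
print(f"Protein share of total cell weight: {protein_of_total:.1%}")
```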
In the molecular crowding environment, steric hindrance between molecules makes some of the space around coexisting molecules unavailable to a given molecule; this unavailable space is known as the excluded volume. The excluded volume can be understood by depicting the molecules as spheres with a specific hydrodynamic radius (Fig. 2a, b) (Minton 2001). The centers of two spherical molecules cannot come closer to each other than the sum of their radii. Therefore, in a space crowded with molecules of similar size, the space available to each molecule is dramatically reduced by the excluded volume effect. In contrast, if the molecule of interest is much smaller than the coexisting molecules, the observed excluded volume effect becomes smaller, and the volume unavailable to that molecule approaches the volume actually occupied by the coexisting molecules.

Fig. 2 Spatial restriction of molecules and their movement in a crowding environment. (a, b) Excluded volume effects depending on the relationship of molecular sizes. Space available to a molecule (blue) is dramatically reduced in a solution where molecules with similar size are crowded. (c) Molecular sieving effect in diffusion. (d) Reduced water activity caused by hydration to co-solute molecules

In addition to the excluded volume effect, the crowded environment produces a molecular sieving effect. When large solute molecules are present at high concentration, the viscosity of the solution increases, which slows molecular diffusion (Fig. 2c) (Nakano et al. 2014b). Interaction networks of biomolecular solutes, such as cytoskeletons and filaments of polymerized proteins, form reticulated structures that act as one of the heterogeneous sources of crowding. This network of obstacles reduces the diffusion rate of large molecules. However, the actual diffusion rate of biomolecules in the cell appears to depend not only on molecular sieving but also on specific interactions with cellular components (Dix and Verkman 2008).

The solute molecules in a crowding environment affect the physical properties of the aqueous solution, which is the solvent of the life system. Biomolecules are surrounded by significant layers of hydration water, which is osmotically inactive. Thus, biomolecules at high concentration alter the solvent properties of bulk water relative to those of a dilute solution. In particular, small hydrophilic molecules, which are present at high molar concentrations in cells, interact tightly with water and reduce its mobility (Fig. 2d). Molecules such as polyols, amino acids, methylamines, and urea act as osmolytes and reduce the water activity of the solution (Yancey 2015). Osmolytes not only contribute to resistance against osmotic stress by adjusting the osmotic pressure but also shift the equilibria of chemical reactions and biomolecular interactions toward the state accompanied by dehydration.

Additionally, a molecular crowding environment with highly concentrated solute molecules tends to reduce the dielectric constant of the solution, because the solute molecules are usually less polar than bulk water, whose dielectric constant is approximately 80 (Predeus et al. 2012). For example, although various values have been reported for the dielectric constant of proteins, and differences between internal and surface dielectric constants have been discussed, the values are 40 at most (Li et al. 2013). The reduced dielectric constant under molecular crowding strengthens both electrostatic attractions and repulsions. Biomolecules have evolved to perform optimal functions in such a unique environment, i.e., in a molecular crowding environment, in order to maintain life systems.
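The hard-sphere picture of the excluded volume introduced earlier in this section can be made concrete with a few lines of Python. The sketch below assumes ideal, non-interacting spheres, so that the center of a probe of radius r_p cannot enter a sphere of radius r_c + r_p around each crowder of radius r_c. The function names and the example radii are hypothetical choices for illustration and are not taken from the chapter.

```python
import math


def sphere_volume(radius: float) -> float:
    """Volume of a sphere of the given radius (any consistent length unit)."""
    return 4.0 / 3.0 * math.pi * radius ** 3


def excluded_volume(crowder_radius: float, probe_radius: float) -> float:
    """Volume around ONE crowder that the probe's center cannot enter.

    Hard-sphere assumption: centers cannot approach closer than the sum of radii.
    """
    return sphere_volume(crowder_radius + probe_radius)


def excluded_to_occupied_ratio(crowder_radius: float, probe_radius: float) -> float:
    """Excluded volume divided by the volume the crowder physically occupies."""
    return excluded_volume(crowder_radius, probe_radius) / sphere_volume(crowder_radius)


# Hypothetical example: one crowder of radius 2 nm and probes of decreasing size.
for probe_radius_nm in (2.0, 0.2, 0.02):
    ratio = excluded_to_occupied_ratio(2.0, probe_radius_nm)
    print(f"probe radius {probe_radius_nm:5.2f} nm -> excluded/occupied = {ratio:.3f}")
```

Under these assumptions the ratio equals (1 + r_p/r_c)^3: it is 8 for a probe of the same size as the crowder and approaches 1 as the probe becomes much smaller, which is the trend described in the text for Fig. 2a, b.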
In particular, nucleic acids are fundamental molecules not only for maintaining genetic information but also for modulating gene expression together with associated proteins. The formation of a canonical duplex through sequence complementarity is fundamental to the replication of a genome. On the other hand, it has been suggested that the formation of non-double-helical nucleic acid structures and their stabilities are involved in modulating gene expression (Sugimoto 2014). Whether DNA or RNA, single-stranded nucleic acids are flexible and highly polymorphic, because nucleic acids are linearly polymerized hydrophilic molecules with a uniformly negatively charged backbone. Although genomic DNA is a duplex, it is not static: its two strands repeatedly dissociate and reassociate during biological reactions such as replication and transcription, because DNA and RNA polymerases need single-stranded template DNA to perform their functions. During such reactions, DNA regions with particular sequence properties have a chance to temporarily form structures other than the canonical duplex (Endoh and Sugimoto 2017). In addition, the polymorphic features of nucleic acids and their structural stabilities are highly sensitive to the surrounding environment. Physicochemical factors, which differ between a dilute aqueous solution in vitro and the crowded environment in cells, have a relatively large impact on the structures, stabilities, and functions of nucleic acids.


Physicochemical and Molecular Factors Influencing Nucleic Acid Structures and Their Stabilities

To understand how the molecular crowding environment affects the structures and stabilities of nucleic acids, it is necessary to identify the factors that determine them. As described above, nucleic acids are uniformly and negatively charged due to the presence of a phosphate group in the backbone. Electrostatic repulsion between the phosphate groups and the conformational flexibility of the backbone are disadvantageous for the formation of a compact and ordered structure. Nucleic acids form their structures by offsetting these unfavorable energetic costs with favorable energetic contributions from intramolecular interactions, such as hydrogen bonding and stacking interactions, and from interactions with external factors such as cations and coexisting solute molecules. Even the interaction of water molecules influences the energetics of structure stabilization. The stability of nucleic acid structures is determined by both structural and environmental factors, as shown in Fig. 3 (Sugimoto et al. 2021) (see also Chap. 2, “Stability Prediction of Canonical and Noncanonical Structures of Nucleic Acids,” section “Basics of Stability Prediction of Canonical Structures of Nucleic Acids”).
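To give a feel for the electrostatic repulsion between backbone phosphates mentioned above, and for how a lower dielectric constant strengthens it, the following Python sketch evaluates Coulomb's law for two unit charges in a uniform medium. This is a deliberately crude illustration: it ignores counterion screening and treats the solvent as a continuum, the 7 Å separation is an assumed round number for neighboring phosphates rather than a value from the chapter, and all names are hypothetical.

```python
import math

# Physical constants (SI, CODATA 2018 values)
ELEMENTARY_CHARGE_C = 1.602176634e-19
VACUUM_PERMITTIVITY_F_PER_M = 8.8541878128e-12
AVOGADRO_PER_MOL = 6.02214076e23


def coulomb_energy_kj_per_mol(charge1: int, charge2: int,
                              distance_angstrom: float,
                              relative_permittivity: float) -> float:
    """Coulomb energy of two point charges in a uniform dielectric, in kJ/mol.

    Positive values mean repulsion. No ionic screening is included.
    """
    distance_m = distance_angstrom * 1e-10
    energy_joule = (charge1 * charge2 * ELEMENTARY_CHARGE_C ** 2
                    / (4.0 * math.pi * VACUUM_PERMITTIVITY_F_PER_M
                       * relative_permittivity * distance_m))
    return energy_joule * AVOGADRO_PER_MOL / 1000.0


# Assumed separation of two neighboring phosphate charges (illustrative only).
PHOSPHATE_SEPARATION_ANGSTROM = 7.0

for epsilon_r in (80.0, 60.0, 40.0):  # bulk water down to a protein-like upper limit
    energy = coulomb_energy_kj_per_mol(-1, -1, PHOSPHATE_SEPARATION_ANGSTROM, epsilon_r)
    print(f"relative permittivity {epsilon_r:4.0f}: repulsion = {energy:.2f} kJ/mol")
```

Because the energy scales as the inverse of the relative permittivity, halving the dielectric constant from 80 to 40 doubles both repulsive and attractive Coulomb terms in this simple model.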

Fig. 3 Structural and environmental factors mainly determine nucleic acid structures and stabilities. (Reprinted and adapted with permission from Sugimoto et al. (2021). Copyright The Chemical Society of Japan)

Structural Factors

Hydrogen Bonding

In nucleic acids, hydrogen bonds are formed between a hydrogen atom, which carries a partial positive charge because it is covalently linked to a strongly electronegative atom, and atoms with lone pairs, such as oxygen and nitrogen. All nucleobases contain several atoms that can act as acceptors, such as oxygen and nitrogen, or donors, such as hydrogen, in hydrogen bond formation (Fig. 4a). The base pairs formed through hydrogen bonds between two nucleobases in a canonical duplex are known as Watson-Crick base pairs, that is, base pairs between adenine (A) and thymine (T) or uracil (U), and between guanine (G) and cytosine (C). On the other hand, when the arrangement of the ribose is not considered, a variety of patterns can form two or more hydrogen bonds between two nucleobases (Fig. 4b). Some of the hydrogen bonding patterns in Fig. 4b are observed in interactions that stabilize tertiary structures of nucleic acids. Nucleobases in single-stranded regions such as loops and bulges can simultaneously form hydrogen bonds with atoms of two or more nucleotides, including the hydrogen atom of the ribose hydroxy group and the oxygen atoms of the phosphate. The formation of these hydrogen bonds, which depends strongly on the positions and orientations of the acceptor and donor atoms, is particularly important for stabilizing tertiary structures.

Fig. 4 Potential hydrogen bonding patterns between nucleobases. (a) Donor (blue) and acceptor (red) sites for formation of hydrogen bonding in nucleobases. (b) Examples of base pair patterns that can form at least two hydrogen bonds


Stacking Interaction

π-π stacking interactions occur between neighboring nucleobases. Stacking is known to be more extensive when two strands form a duplex than when they are present as single strands. The driving force of the stacking interaction is mainly the dispersion force between aromatic rings containing π orbitals. The energetic contributions of stacking interactions depend largely on the types of neighboring nucleobases and their orientation in both single and double strands. These contributions have been investigated by measuring the effect of dangling-end nucleobases on duplex stabilities (Sugimoto et al. 1987). The degree of stacking interaction also depends on the salt concentration of the solution. Because cation binding weakens the electrostatic repulsion between neighboring phosphates on the strand, the observed contribution of stacking to duplex stabilization increases with increasing salt concentration (Yakovchuk et al. 2006).

Conformational Entropy

The formation of hydrogen bonds and stacking interactions leads to favorable enthalpy changes. On the other hand, duplex formation from the single-stranded state is accompanied by unfavorable conformational entropy changes due to a decrease in the degrees of freedom of the nucleotide torsion angles. Therefore, in terms of conformational entropy alone, structure formation is thermodynamically unfavorable.
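The balance between the favorable enthalpy and the unfavorable conformational entropy described above is captured by the standard relation ΔG° = ΔH° − TΔS°. The Python sketch below evaluates it for one set of purely hypothetical parameters (ΔH° = −60 kcal/mol, ΔS° = −165 cal/(mol·K)), chosen only to illustrate the sign conventions; they are not values from this chapter, and the function names are likewise assumptions.

```python
def gibbs_free_energy_kcal(delta_h_kcal_per_mol: float,
                           delta_s_cal_per_mol_k: float,
                           temperature_celsius: float) -> float:
    """Return dG = dH - T*dS in kcal/mol, with dS given in cal/(mol K)."""
    temperature_k = temperature_celsius + 273.15
    return delta_h_kcal_per_mol - temperature_k * delta_s_cal_per_mol_k / 1000.0


def zero_free_energy_temperature_celsius(delta_h_kcal_per_mol: float,
                                         delta_s_cal_per_mol_k: float) -> float:
    """Temperature at which dG = 0, assuming dH and dS do not vary with T.

    Note: for a bimolecular duplex this is not the melting temperature,
    which also depends on the strand concentration.
    """
    return delta_h_kcal_per_mol * 1000.0 / delta_s_cal_per_mol_k - 273.15


# Hypothetical structure-formation parameters (illustration only).
DELTA_H = -60.0    # kcal/mol, favorable: hydrogen bonding and stacking
DELTA_S = -165.0   # cal/(mol K), unfavorable: loss of conformational freedom

for temp_c in (25.0, 37.0, 60.0):
    dg = gibbs_free_energy_kcal(DELTA_H, DELTA_S, temp_c)
    print(f"{temp_c:5.1f} C: dG = {dg:6.2f} kcal/mol")

print(f"dG = 0 at about {zero_free_energy_temperature_celsius(DELTA_H, DELTA_S):.1f} C")
```

With these inputs the entropic penalty, −TΔS°, grows with temperature and progressively cancels the enthalpic gain, so the structure is stable at low temperature and becomes unstable once the two terms are equal.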

Environmental Factors

Hydration and Dehydration

The phosphate on the backbone is a polar group and thus interacts with the water molecules of the solvent. The hydroxyl groups on deoxyribose and ribose also have a high potential to interact with water molecules. In addition, nucleobases have several sites for hydrogen bonding with water molecules (Fig. 4). Hydration of nucleic acids results in a relatively stable network of water molecules, which is called the primary hydration shell. It has been demonstrated that 18–23 water molecules are associated with each nucleotide unit in the canonical B-form DNA duplex; the number of hydration water molecules is relatively large in the minor groove (Chalikian et al. 1999). Because the number of hydration water molecules depends on the structural state of the nucleic acid, the formation and transition of nucleic acid structures are usually accompanied by the uptake or release of water molecules. Thus, the interaction of water molecules with nucleic acids has a relatively large effect on their structures and stabilities.

Cation Binding

Because nucleic acids are polyanions, they are surrounded by a condensed layer of counterions called the ion atmosphere. In particular, the interaction of metal cations with the phosphate backbone suppresses electrostatic repulsion, which stabilizes both duplex and tertiary structures. Cations that interact tightly with nucleic acids can be observed by experimental techniques for structure determination such as X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy. In double-stranded helices, the minor groove is one of the major sites for cation binding. Monovalent cations generally interact with nucleic acids in a nonspecific polyelectrolyte manner, whereas divalent cations interact in a site-binding manner. In cellular crowding environments, there are not only metal cations but also charged organic molecules, which behave differently from metal cations and affect the structure and stability of nucleic acids. For example, the choline (2-hydroxy-N,N,N-trimethylethanaminium) ion, one of the molecular ions abundant in cells, preferentially interacts with A-T Watson-Crick base pairs in a duplex; hence, it stabilizes duplexes that are rich in A-T (Tateishi-Karimata and Sugimoto 2012).

Specific Interaction of Biomolecular Ligands

By forming higher-order structures, nucleic acids can provide a surface for interacting with other biomolecules and a specific cavity for accommodating small chemical compounds. Nucleic acids that potentially interact with other biomolecules, such as proteins, peptides, and small chemicals, are known as aptamers (Ellington and Szostak 1990; Tuerk and Gold 1990). Specific interactions between nucleic acids and other molecules usually occur through hydrogen bonding, electrostatic interaction, stacking interaction, and hydrophobic interaction (Cai et al. 2018). These interactions can cause entropically favorable dehydration of the cavity. The favorable energy obtained by such an interaction contributes to stabilization of the complex. In some cases, owing to the energetic contribution of the interaction, the nucleic acid changes its structure upon binding; this interaction mode is called induced fit.

Model Experimental Systems In Vitro to Investigate the Effects of Molecular Crowding on Biomolecules

The structure and stability of biomolecules in living cells have attracted much attention in various fields such as medicinal, pharmacological, and materials sciences. In the cell, there are various molecules that change their concentration and localization depending on the cell state, such as the phase of the cell cycle. As described in Fig. 1, an intracellular environment in which diverse biomolecules densely coexist is characterized as molecular crowding (Verkman 2002). However, most studies on the structure and stability of biomolecules have been carried out using a dilute aqueous solution, which contains low concentrations (less than about 1 g/L) of total proteins, nucleic acids, polysaccharides, and buffer salts. Biomolecules have evolved by optimizing their structure and function in the crowding environment. In other words, to understand the properties of biomolecules in cells, it is essential to understand the molecular crowding effects

1136

T. Endoh et al.

Fig. 5 Molecular crowding environments of (a) intracellular organelles, (b) the nucleus, and (c) the cytoplasm. Molecular environments of (d) standard aqueous solution in vitro and (e) molecular crowding constructed by co-solutes in vitro

on biomolecules. Thus, experimental systems that can mimic the crowding environment have been prepared and utilized in vitro (Fig. 5) (Sugimoto 2014).

Characteristics of Co-solutes to Mimic the Intracellular Molecular Environment

To mimic the crowding environment in natural cells, various compounds have been used as co-solute molecules. To induce the crowding environments, the co-solute molecules should meet the following criteria: (1) they should be inert (i.e., the co-solutes should not interact directly with biomolecules); (2) they should be soluble in water at least up to several weight percent; (3) in the case of large-sized co-solutes, a range of polymer sizes should be available; and (4) in the case of small-sized co-solutes, co-solutes with a range of chemical properties should be available. The commonly used large-sized co-solutes are poly(ethylene glycol) (PEG), dextran, and Ficoll (Fig. 6a). PEG is often used because it is inert and available in various molecular weights. Proteins such as albumin, hemoglobin, and lysozyme have also been utilized as crowding co-solutes. In solutions containing


Fig. 6 Structures of typical co-solutes with (a) large and (b) small size (molecular weight) inducing the molecular crowding

such high-molecular-weight crowding molecules, the volume that can be occupied by the biomolecules to be analyzed is greatly reduced by the excluded volume of the crowding molecules, and molecular motion and structure are restricted. In addition, the thermodynamic activity (effective concentration) of biomolecules increases because the volume in which they can exist is limited. These characteristics of crowding molecules have a significant effect on the behavior and function of biomolecules (details are given in sections “Effects of Molecular Crowding Environments on Nucleic Acids” and “Biological Reactions Influenced by Nucleic Acid Structures and Their Stabilities”). Small-sized co-solutes are also used to mimic the intracellular environment. Commonly used small-sized co-solutes are ethylene glycol (EG), alcohols, glycols, amino acids, and betaine (Fig. 6b). These co-solutes have been used to investigate the structure and stability of biomolecules by changing the physical properties of the solution. In the living cell, there are also co-solutes with functional groups that can directly interact with nucleic acids, such as charged molecules, sugars, and osmoregulatory molecules (osmolytes). These co-solutes also affect the structure and stability of biomolecules. Lipids and surfactants are also added to aqueous solutions to encapsulate biomolecules, mimicking droplets formed by induced phase separation, and to investigate the effect of confinement on biomolecules; for example, a reverse micelle large enough to contain only one molecule of the target nucleic acid has


stabilized the nucleic acid structure inside (Pramanik et al. 2012). In other words, it is becoming clear that the spatial properties surrounding biomolecules have potential to impact biological phenomena.

Change in Solution Properties by the Addition of Co-solutes

To determine how molecular crowding affects biomolecules, solution properties were investigated in solutions containing defined concentrations of co-solutes with different properties, including EG, glycerol, PEG200, PEG8000, and dextran (Table 1) (Teng et al. 2020). For example, in the presence of both large- and small-sized co-solutes, the diffusion rate of proteins decreases due to an increase in solution viscosity; this is an important factor affecting the association rate of biomolecules. Although the viscosity of the solution tends to increase as the molecular weight of the co-solute increases, it is also affected by the chemical structure of the co-solute. In addition, water activity is an important factor for evaluating the hydration of biomolecules and is calculated from osmolality (Nakano et al. 2014b). The chemical structures of co-solutes, especially the positions of hydroxyl groups and amines, are important in determining the hydration of biomolecules. The dielectric constant is also a factor affecting the strength of electrostatic interaction and repulsion. Dielectric constants of crowding conditions with 10 weight percent (wt%) of co-solutes are obtained from the maximum emission wavelength (λmax) of the fluorescence of 8-anilino-1-naphthalene sulfonic acid (Stryer 1965). With EG and glycerol, the dielectric constant remains relatively high, similar to that in the absence of co-solutes. PEG200, PEG8000, and dextran reduce the dielectric constant significantly even at low concentration. Small-sized co-solutes (EG and glycerol), medium-sized co-solutes (PEG200 and PEG8000), and a large-sized co-solute (dextran) are utilized to compare the excluded volume effect on biomolecules in molecular crowding conditions. In general, the excluded volume of a co-solute increases as its molecular weight increases.
For example, the excluded volume calculated for PEGs with different molecular weights indicated that the larger the molecular weight of the co-solute, the larger the excluded volume in solution (Fig. 7); the excluded volume of PEG1000 is about one-fourth of that of PEG8000 (Akabayov et al. 2013).

Table 1 The properties of surrounding conditions in the presence of 10 wt% of co-solutes

Co-solute    Viscosity (mPa·s)    −ln aw (×10²)    Dielectric constant (εr)    Molecular weight (g mol⁻¹)
None         0.72                 0.36             81.3                        –
EG           1.90                 3.36             77.9                        62
Glycerol     1.64                 2.97             77.9                        92
PEG 200      1.54                 1.68             67.7                        200a
PEG 8000     3.68                 0.55             61.6                        8000a
Dextran      4.60                 0.43             70.4                        70,000a

The solution contains 40 mM Tris-HCl (pH 7.6), 8 mM MgCl2, and 60 mM KCl in the absence and presence of 10 wt% co-solutes

Fig. 7 Comparison of excluded volume in solutions containing PEGs with different molecular weights (MWs) and concentrations
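The solution properties listed in Table 1 can be converted into quantities that are easier to compare across co-solutes. The following is a minimal Python sketch, not part of the original analysis. It assumes that the water activity column reports −ln aw × 10² (the sign is inferred, because ln aw must be negative for aw < 1), that diffusion scales inversely with bulk viscosity (Stokes-Einstein), and that Coulombic interaction energy scales inversely with the dielectric constant. The dictionary keys and function names are illustrative.

```python
import math

# Values transcribed from Table 1 (10 wt% co-solute). The column
# "neg_ln_aw_x100" assumes the table reports -ln(a_w) x 10^2.
TABLE1 = {
    "None":     {"viscosity": 0.72, "neg_ln_aw_x100": 0.36, "eps_r": 81.3},
    "EG":       {"viscosity": 1.90, "neg_ln_aw_x100": 3.36, "eps_r": 77.9},
    "Glycerol": {"viscosity": 1.64, "neg_ln_aw_x100": 2.97, "eps_r": 77.9},
    "PEG 200":  {"viscosity": 1.54, "neg_ln_aw_x100": 1.68, "eps_r": 67.7},
    "PEG 8000": {"viscosity": 3.68, "neg_ln_aw_x100": 0.55, "eps_r": 61.6},
    "Dextran":  {"viscosity": 4.60, "neg_ln_aw_x100": 0.43, "eps_r": 70.4},
}


def water_activity(neg_ln_aw_x100):
    """Convert the tabulated -ln(a_w) x 10^2 value into a_w."""
    return math.exp(-neg_ln_aw_x100 / 100.0)


def summarize(table, reference="None"):
    """Return a_w and two ratios relative to the co-solute-free solution."""
    ref = table[reference]
    summary = {}
    for name, row in table.items():
        summary[name] = {
            "a_w": water_activity(row["neg_ln_aw_x100"]),
            # Stokes-Einstein: D is proportional to 1/viscosity.
            "D_rel": ref["viscosity"] / row["viscosity"],
            # Coulomb's law: interaction energy is proportional to 1/eps_r.
            "coulomb_rel": ref["eps_r"] / row["eps_r"],
        }
    return summary


if __name__ == "__main__":
    for name, values in summarize(TABLE1).items():
        print(name, {k: round(v, 3) for k, v in values.items()})
```

The bulk viscosity is only a rough guide here: for polymeric co-solutes, the viscosity experienced by a small solute can differ considerably from the macroscopic value, so the diffusion ratio should be read as an upper estimate of the slowdown.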

Effects of Molecular Crowding Environments on Nucleic Acids

The unique physical and chemical properties of molecular crowding environments have been demonstrated to affect the structures and stabilities of nucleic acids, so that their functions differ from those observed in dilute aqueous solutions in vitro.

Effects of Molecular Crowding on Canonical Duplexes

Various studies have demonstrated the effects of molecular crowding environments on the thermodynamics and kinetics of canonical DNA and RNA duplexes (Nakano et al. 2014b; Sugimoto 2014; Sugimoto et al. 2021). In general, the effects vary depending on the size of the nucleic acid, that is, its length in nucleotide units, as well as on the size of the co-solute molecule used to induce the crowding environment.

Structure of Large Genomic DNAs Under a Crowding Environment

Large genomic nucleic acid molecules, such as calf thymus DNA, bacterial DNAs, and bacteriophage DNAs, undergo compaction mediated by cation binding. In particular, the interaction of multivalent cations efficiently neutralizes and condenses the DNAs. The reduced dielectric constant in the presence of relatively small crowding co-solutes enhances compaction of the DNA by strengthening ion-ion correlation through increased electrostatic interaction. Crowding co-solutes with relatively large molecular size also enhance DNA compaction due to the excluded volume effect provided by the co-solutes. However, because the degree of excluded volume depends not only on the molecular size but also on the shape of


crowding co-solutes, different dynamics of large DNA molecules were observed in different crowding co-solutes.

Stability of Polymer Nucleotide Duplexes Under a Crowding Environment

The stability of nucleic acids can be easily evaluated by using the thermal melting temperature, Tm, at which half of the structure is dissociated, assuming a two-state melting transition (Puglisi and Tinoco 1989). Effects of the molecular crowding environments on the stability of canonical duplexes have been evaluated from Tm values using pairs of polymerized mononucleotides that form base pairs. Although the degree of change in the Tm value differs depending on the type of crowding co-solute, as in the case of the compaction of genomic DNA, large co-solute molecules tend to increase the Tm value due to the excluded volume effect. For example, at 10 wt%, PEG with an average molecular weight of 20,000 (PEG20000) and PEG with an average molecular weight of 4000 (PEG4000) increased the Tm value of an RNA duplex consisting of polyinosine and polycytidine by 1.8 °C and 2.0 °C, respectively, whereas dextran molecules with average molecular weights of 70,000 and 10,000 increased it by 0.7 °C (Woolley and Wills 1985). The difference in the Tm increment, in which large-sized PEG increased the Tm value more than dextran, is considered to be due to their different shapes. Dextran is a linear and flexible polymer, which does not effectively produce excluded volume, while large-sized PEG self-associates at high concentration and behaves as an apparently spherical molecule that produces effective excluded volume. As described above, small hydrophilic co-solutes reduce water activity in solution. Nucleic acids are surrounded by a network of hydrating water molecules; the degree of hydration differs between the state in which the two strands form a duplex and the state in which they are separated into two single strands. In general, the duplex state has more hydrating water molecules than the single-stranded state.
Thus, in a solution with reduced water activity, the equilibrium shifts toward the state of two single strands, which destabilizes the duplex. Reduced stability of the DNA duplex in the presence of osmolyte co-solutes, such as glycerol and monosaccharides, was also demonstrated by using calf thymus DNA. The hydration of canonical duplexes has been further analyzed using short oligonucleotide duplexes with various sequence compositions and relatively small PEGs, as described in the following subsection.

Stability of Short Oligonucleotide Duplexes in Crowding Environments with Reduced Water Activity

Evaluation of the stabilities of short oligonucleotide duplexes with defined sequence and length under molecular crowding environments enables quantitative analyses of thermodynamic parameters for duplex formation (see also Chap. 2, “Stability Prediction of Canonical and Noncanonical Structures of Nucleic Acids,” section “Basics of Stability Prediction of Canonical Structures of Nucleic Acids”). In general, short oligonucleotide duplexes show simple two-state transition during


Fig. 8 Examples of thermal melting profiles of spectroscopic signal. (a) Typical melting curves of self-complementary duplexes, in which sequence symmetry factor α = 1. Thermodynamic parameters, slope and intercept of baselines are indicated as examples. The melting profiles are theoretical ones at Ct = 2 μM. (b) Linear plots of Tm⁻¹ versus ln(Ct) to obtain the thermodynamic parameters

thermal melting of spectroscopic signals, such as the absorbance of ultraviolet (UV) light and the ellipticity of circular dichroism (CD) (Fig. 8a). Whether the duplex consists of self-complementary or non-self-complementary sequences, the enthalpy change (ΔH) and entropy change (ΔS) accompanying duplex formation can be obtained from the following Eq. 1 by considering the theoretical equilibrium between a duplex and two single strands. Eq. 1 considers the baselines at low and high temperatures and the theoretical folding ratio of the duplex, which depends on the thermodynamic parameters.

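$$
SS = (cT + d) + \{(aT + b) - (cT + d)\}\,
\frac{-1 + \sqrt{1 + 8\,\dfrac{C_t}{\alpha}\exp\!\left(-\dfrac{\Delta H}{RT} + \dfrac{\Delta S}{R}\right)}}
     {4\,\dfrac{C_t}{\alpha}\exp\!\left(-\dfrac{\Delta H}{RT} + \dfrac{\Delta S}{R}\right)}
\tag{1}
$$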

where SS is the spectroscopic signal at the measured temperature; a and b are the slope and intercept of the baseline at higher temperature, respectively; c and d are the slope and intercept of the baseline at lower temperature, respectively; R is the gas constant; T is the temperature in Kelvin; Ct is the total concentration of oligonucleotide strands; and α reflects the sequence symmetry of self-complementary (α = 1) or non-self-complementary (α = 4) strands. More accurately, the ΔH and ΔS values can be obtained from the dependence of the Tm value of the duplex on the total concentration of oligonucleotide strands (Fig. 8b), as given in the following Eq. 2.

$$
T_m^{-1} = \frac{R}{\Delta H}\,\ln\frac{C_t}{\alpha} + \frac{\Delta S}{\Delta H}
\tag{2}
$$
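The two-state relations in Eqs. 1 and 2 can be explored numerically. The following is a minimal Python sketch under stated assumptions: the signal is written as the low-temperature baseline (cT + d) plus the single-strand fraction times the difference between the baselines, the gas constant is taken in kcal mol⁻¹ K⁻¹, and the ΔH and ΔS values in the demonstration are hypothetical numbers chosen only for illustration. All function names are illustrative.

```python
import math

R = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def single_strand_fraction(T, dH, dS, Ct, alpha):
    """Fraction of strands in the single-stranded state at temperature T (K).

    With x = (Ct/alpha) * exp(-dH/(R*T) + dS/R), the factor in Eq. 1 is
    (-1 + sqrt(1 + 8x)) / (4x). The form used below, 2 / (1 + sqrt(1 + 8x)),
    is algebraically identical but stays accurate when x is very small.
    """
    x = (Ct / alpha) * math.exp(-dH / (R * T) + dS / R)
    return 2.0 / (1.0 + math.sqrt(1.0 + 8.0 * x))


def signal(T, dH, dS, Ct, alpha, a, b, c, d):
    """Eq. 1: low-temperature baseline plus the single-strand contribution."""
    low, high = c * T + d, a * T + b
    return low + (high - low) * single_strand_fraction(T, dH, dS, Ct, alpha)


def melting_temperature(dH, dS, Ct, alpha):
    """Eq. 2 solved for Tm (K)."""
    return dH / (dS + R * math.log(Ct / alpha))


def fit_dH_dS(cts, tms, alpha):
    """Least-squares line through (ln(Ct/alpha), 1/Tm).

    According to Eq. 2 the slope is R/dH and the intercept is dS/dH.
    """
    xs = [math.log(ct / alpha) for ct in cts]
    ys = [1.0 / tm for tm in tms]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    dH = R / slope
    return dH, intercept * dH


if __name__ == "__main__":
    # Hypothetical parameters for a self-complementary duplex (alpha = 1).
    dH, dS = -60.0, -0.165  # kcal/mol and kcal/(mol K), illustrative only
    tm = melting_temperature(dH, dS, 2e-6, 1)
    print("Tm (deg C):", round(tm - 273.15, 1))
    print("single-strand fraction at Tm:", single_strand_fraction(tm, dH, dS, 2e-6, 1))
```

At the computed Tm the single-strand fraction equals one half, which is the definition of Tm used in the text. `fit_dH_dS` mirrors the linear plot of Fig. 8b: supplying Tm values measured at several strand concentrations returns the ΔH and ΔS estimates.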

Once the ΔH and ΔS values are obtained, the free energy change (ΔG) associated with duplex formation can be calculated from ΔG = ΔH − TΔS. In general, the thermodynamic parameters of oligonucleotides shorter than a 15-mer can be accurately calculated, because short duplexes tend to show a clear two-state melting transition. In addition, short oligonucleotide duplexes are advantageous for evaluating the effects of the crowding environments, because PEG and alcohols, which are often used as co-solutes and co-solvents for inducing the crowding environments and conditions with reduced dielectric constant, do not precipitate short duplexes, whereas long polynucleotides precipitate. The aqueous solution for thermodynamic analyses of crowding effects on nucleic acid structure and stability usually contains cations in addition to co-solutes. The formation of a duplex can be accompanied by the uptake or release of water, cations, and co-solutes, as represented by:

$$
2\mathrm{A} \rightleftarrows \mathrm{A_2} + \Delta n_w\,\mathrm{H_2O} + \Delta n_{cs}\,\mathrm{CS} + \Delta n_{M^+}\,\mathrm{M^+}
$$

where A and A2 indicate the single-stranded and the double-stranded states of oligonucleotides, respectively; CS represents a co-solute; M+ represents a cation; and Δnw, Δncs, and ΔnM+ represent the numbers of water molecules, co-solutes, and cations released upon duplex formation, respectively. On this basis, the relation between the true equilibrium constant (K0) and the observed equilibrium constant (Kobs) can be represented as:

$$
K_0 = K_{obs}\; a_w^{\Delta n_w}\; a_{cs}^{\Delta n_{cs}}\; a_{M^+}^{\Delta n_{M^+}}
$$

where aw, acs, and aM+ are the activities of water, co-solute, and cation, respectively. By considering that K0 is constant, this correlation indicates that the Kobs value


changes with the activities of water, co-solutes, and cations, depending on the number of molecules released upon duplex formation. In particular, if ln Kobs changes linearly with the logarithm of the activity of a specific factor (water, co-solute, or cation), the contributions of the other two factors are negligible, and the number of molecules of that factor involved in the formation of the duplex can be estimated from the slope. The Δnw values of various duplex types including DNA/DNA, RNA/RNA, and RNA/DNA have been reported (Nakano et al. 2014b). In all the duplex types, Δnw values mainly depend on the composition of nearest-neighbor base pairs, because the nearest-neighbor model for predicting duplex stability was valid even when the duplex was in a crowding condition of reduced water activity (Ghosh et al. 2020).
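The estimation of Δnw described above amounts to a linear fit. Taking logarithms of the relation between K0 and Kobs, and assuming that the co-solute and cation terms are constant or negligible, gives ln Kobs = ln K0 − Δnw ln aw, so the slope of ln Kobs against ln aw equals −Δnw. The following Python sketch illustrates this with synthetic data; the numerical values of Δnw and ln K0 are hypothetical and the function names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def water_release(ln_aw, ln_kobs):
    """Estimate (delta_n_w, ln K0) from ln Kobs = ln K0 - delta_n_w * ln a_w.

    A negative delta_n_w means that water is taken up on duplex formation.
    """
    slope, intercept = fit_line(ln_aw, ln_kobs)
    return -slope, intercept


if __name__ == "__main__":
    # Synthetic example: a duplex that takes up 30 water molecules.
    ln_aw = [-0.0036, -0.0168, -0.0297, -0.0336]
    true_dn_w, true_ln_k0 = -30.0, 20.0  # hypothetical values
    ln_kobs = [true_ln_k0 - true_dn_w * x for x in ln_aw]
    print(water_release(ln_aw, ln_kobs))
```

In this synthetic case ln Kobs decreases as ln aw becomes more negative, which corresponds to the destabilization of a duplex that binds more water than its single strands.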

Effects of Molecular Crowding on Noncanonical DNA Structures and Stabilities

Nucleic acid structures are flexible and polymorphic. The canonical duplex with Watson-Crick base pairs is only one of the possible structural forms, and nucleic acids form various non-double-helical structures depending on internal and external factors. Even duplexes with Watson-Crick base pairs show several types of helicity, including left-handed helicity. Because the topological features and hydration states differ for each structure, the structural dynamics and stabilities respond sensitively to changes in the molecular environment.

Formation of Left-Handed Duplex Under a Crowding Environment

Z-DNA is a double-stranded helix with left-handed helicity and a distinct zigzag pattern in its phosphate backbone that is radically different from A- and B-DNAs (Fig. 9). Several proteins are known to interact with and induce Z-DNA, suggesting biological roles of the noncanonical helix (Kim et al. 2010). An alternating purine-pyrimidine sequence such as d(GCGCGC) has the potential to form a Z-DNA structure in the presence of high salt concentration. In biological systems, negative supercoiling stress and specific binding proteins can enhance Z-DNA formation (Bae et al. 2011; Singleton et al. 1982). Additionally, it is known that Z-DNA is less hydrated than the usual right-handed duplex. Thus, Z-DNA consisting of poly(dG-m5dC) is stabilized in the crowding environment induced by osmolytes (Preisler et al. 1995). Although the biological significance of Z-DNA is not fully understood, it is assumed to affect gene expression, given that some proteins specifically recognize Z-DNA. For example, Z-DNA formed in a promoter region of the c-myc gene has been reported to be dependent on the transcriptional activity of the c-myc gene. The presence of Z-DNA changed before and after cellular differentiation (Wölfl et al. 1995). The equilibrium between Z-DNA and right-handed B-DNA may fluctuate depending on changes in the intracellular crowding environment during differentiation.


Fig. 9 Factors stabilizing the Z-DNA form. Alternating GC repeat forms Z-DNA (PDBID: 4OCB) in the presence of high ionic concentration and negative supercoiling. Transition from B-DNA to Z-DNA occurs under osmotic stress due to the less hydrated state of Z-DNA. m5dC indicates methylated cytosine. Structure of B-DNA is a typical image formed by d(CGCGAATTCGCG) (PDBID: 1BNA)

Stabilization of Branched Junction Under a Crowding Environment

When sequences with complementarity are located at the 5′ and 3′ positions of a single-stranded nucleic acid, it forms a hairpin structure. When such a sequence is in a duplex, with palindromic regions located on the 5′ and 3′ sides, negative supercoiling loosens the twist of the helix and induces the formation of a cruciform structure consisting of a four-way junction (Fig. 10). A similar branched four-way junction, which is known as a Holliday junction, is also formed during the recombination of genes. The branched junction is used for constructing nanoscale architectures of nucleic acids such as DNA Origami (Rothemund 2006). In the case of RNA, because it is produced as a single-stranded polymer, various types of junction structures are formed when hairpin-forming motifs are connected by relatively short loop sequences. Among the branched nucleic acid structures, a three-way junction is one of the simplest structures and has been analyzed to investigate the mechanism for stabilizing the junction structures (Fig. 10). One of the structural factors stabilizing the three-way junction is the coaxial stacking of two stems. Coaxial stacking also contributes to the stabilization of pseudoknot structures, which are abundant elements of RNA tertiary structures. To form the coaxial stacking, the presence of unpaired nucleobases at the branched junction is beneficial for arranging the third stem outside the two stems that are coaxially stacked (Stühmeier et al. 1997). Because the junction point has a unique electrostatic environment due to


Fig. 10 Formation of tertiary structure containing a branched junction. Secondary structures and tertiary structures of four-way junction (PDBID: 1M6G) and three-way junction (PDBID: 1EZN) are indicated

bulged nucleotides for arranging the stem orientations, the formation of a stable three-way junction with coaxial stacking is favored by the presence of a divalent cation and by high monovalent salt concentration (Leontis et al. 1991). The rearrangement of the stems to form a three-way junction also has the potential to change its hydration state. The evaluation of the stabilities of three-way junctions in the presence of crowding co-solutes, which reduce the water activity of the solution, demonstrated destabilization of the overall three-way junction structure under the crowding environment (Muhuri et al. 2009). The overall destabilization reflects the effect of the crowding environment on both the branched junction unit and the three stem regions connected to the junction. From analyses of the number of water molecules taken up during the formation of each stem region and the junction unit, it was suggested that the local junction unit releases water molecules upon structural formation. Thus, the junction unit itself is stabilized under the crowding environment. The stabilization of the local junction unit was further suggested by the conformational transition from a dimeric DNA structure mainly consisting of duplex to a monomeric three-way junction in response to the crowding co-solutes (Fig. 11) (Muhuri et al. 2009). Because the formation of a three-way junction is involved in the contraction and expansion of tandem repeat sequences during replication and DNA repair, the crowding environments may cause variation in the number of repeats by inducing genomic instability through destabilization of the duplex region and stabilization of the junction region.

Stabilization of Multistranded Helix Under a Crowding Environment

Multistranded structures are formed when three or more strands assemble through planar nucleobase organization and stacking. Among the multistranded nucleic acid structures, triplexes and tetraplexes have been demonstrated to have biological significance, and thus their physicochemical analyses in molecular crowding environments have been carried out.


Fig. 11 Transition of DNA structure from duplex-type to three-way junction in the presence of crowding co-solute through stabilization of the junction region due to dehydration. (Reprinted with permission from Muhuri et al. (2009). Copyright 2009 American Chemical Society)

Triplex Structures Under a Crowding Environment

A triplex is formed by the stacking of base triads, each of which consists of the usual Watson-Crick base pair and an additional Hoogsteen base pair between the purine nucleobase in the Watson-Crick base pair and a purine or pyrimidine nucleobase in the third strand (Fig. 12). The third strand binds to the duplex from its major groove side. In general, to keep the configuration of glycosidic bond orientation and stabilize the triplex through the stacking of the third strand, the purine nucleobases in the Watson-Crick base pairs should be aligned in one strand of the duplex, and the third strand should consist of polypurine or polypyrimidine; exceptional triplexes partially contain base triads of different compositions (Ji et al. 1996). When the third strand consists of polypurine, it forms Hoogsteen base pairs in antiparallel orientation with the polypurine strand in the duplex. On the other hand, when the third strand consists of polypyrimidine, it forms Hoogsteen base pairs in parallel orientation with the polypurine strand in the duplex (Fig. 12a). One of the unique properties of the triplex with a polypyrimidine third strand is that cytosine needs to be protonated to form the C-G*C+ triad, in which C-G indicates the Watson-Crick base pair and G*C+ indicates the Hoogsteen base pair between guanine and protonated cytosine (Fig. 12a). The pKa (the negative logarithm of the acid dissociation constant) for protonation of the cytosine nucleobase is 4.4 (Verdolino et al. 2008), suggesting that the triplex containing C-G*C+ is stably formed in an acidic environment, although the actual pKa of each cytosine in the triplex is higher than that of monomeric cytosine in solution, depending on local environmental factors. Under a molecular crowding


Fig. 12 Stabilization of triplex structure in the presence of choline ion. (a) Compositions of nucleobase triads in triplexes with parallel and antiparallel third strand. (b, c) Examples of melting profiles of a model parallel triplex in the presence of (b) sodium ion or (c) choline ion at different pH conditions

environment, the triplexes increase their Tm values depending on the molecular weight of the co-solute; PEG with a larger average molecular weight gives a larger increase in the Tm value at the same weight percentage. The dependency on the molecular size of the co-solute suggests that the excluded volume effect is a main factor in the stabilization of the triplex. However, PEG with an average molecular weight of 200 (PEG200), whose size is considerably small compared to the triplex, also increases the Tm value (Miyoshi et al. 2009), suggesting that dehydration (release of water molecules) from the major groove of the duplex upon base pairing of the third strand makes an additional contribution to stabilizing the triplex under the crowding environment. The pKa of cytosine in a nucleic acid structure has been shown to increase under molecular crowding environments in the i-motif structure, which is one of the tetraplexes formed by cytosine-rich sequences (Rajendran et al. 2010). Thus, the enhanced protonation of cytosine at neutral pH would also be one of the factors stabilizing the triplex. Cation binding also affects triplex stability. In particular, choline ion, which is an abundant ion in vivo, selectively binds a groove newly produced by hybridization of the third strand and stabilizes the triplex (Fig. 12b, c) (Tateishi-Karimata et al. 2015). It is considered that the formation of triplexes and their stabilities are sensitive to alterations of the intracellular environment such as pH and ionic strength and contribute to the modulation of gene expression processes such as transcription associated with triplex-binding proteins (Buske et al. 2011).
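The link between the pKa of cytosine and the pH dependence of the C-G*C+ triad can be made concrete with the Henderson-Hasselbalch relation. The following Python sketch computes the fraction of protonated cytosine at a given pH. The pKa of 4.4 is the value quoted above for the monomeric nucleobase; the second value of 6.5 is a purely hypothetical shifted pKa used to illustrate the trend, not a measured value for any triplex.

```python
def protonated_fraction(pH, pKa):
    """Henderson-Hasselbalch: fraction of the base present in protonated form."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))


if __name__ == "__main__":
    # 4.4 is the monomer value from the text; 6.5 is a hypothetical
    # value standing in for a pKa raised by the local environment.
    for pKa in (4.4, 6.5):
        fractions = [round(protonated_fraction(pH, pKa), 4) for pH in (5.0, 6.0, 7.0)]
        print("pKa", pKa, "fractions at pH 5, 6, 7:", fractions)
```

With the monomer pKa, only a very small fraction of cytosine is protonated at pH 7, whereas an upward pKa shift of about two units raises that fraction by roughly two orders of magnitude. This is the reason an environment-induced increase in pKa can stabilize the triplex at neutral pH.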


G-Quadruplex Structures Under a Crowding Environment

The presence of a quadruplex structure in DNA was reported using a guanine-rich strand in 1962 (Gellert et al. 1962). The structural analyses of polyguanylic acid gel demonstrated that four guanines can be circularly organized to form a planar unit, called a G-quartet, through the hydrogen bonds of Hoogsteen base pairs. When four strands of short oligonucleotides consisting of a guanine tract assemble to form G-quartets, stacking between the quartets stabilizes the structure and forms an intermolecular tetraplex structure called a G-quadruplex (G4) (Fig. 13). Not only short guanine tracts but also relatively long oligonucleotides, which have multiple guanine tracts in the same strand, can form dimeric and monomeric (intramolecular) G4s. In the case of intramolecular G4, it is generally considered that canonical G4s with contiguous backbone pillars are formed if the sequence contains four guanine tracts, each of which has at least three guanines, connected by relatively short sequences. The short sequences connecting the guanine tracts form loops in various orientations, which are categorized as propeller, diagonal, and lateral.
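The sequence requirement for a canonical intramolecular G4 described above (four tracts of at least three guanines joined by short loops) can be expressed as a simple pattern search. The following Python sketch is a heuristic illustration only. The loop length limit of 1 to 7 nucleotides is an assumption borrowed from commonly used G4 search heuristics, since the text specifies only "relatively short" loops, and the function name is illustrative. The example uses the human telomeric repeat GGGTTA.

```python
import re

# Four G-tracts (>= 3 G each) separated by three loops. The loop length
# range of 1-7 nt is an assumed heuristic, not a value given in the text.
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def find_g4_motifs(sequence):
    """Return (start, end, motif) for non-overlapping putative G4 motifs."""
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]


if __name__ == "__main__":
    telomeric = "TTA" + "GGGTTA" * 4  # four human telomeric repeats
    for start, end, motif in find_g4_motifs(telomeric):
        print(start, end, motif)
```

A match indicates only that the sequence satisfies the canonical motif. Whether a G4 actually forms, and with which topology, depends on the solution conditions discussed in this section, and noncanonical G4s with bulges or long loops are not detected by this pattern.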

Fig. 13 Structural characteristic and polymorphic property of intramolecular G-quadruplex formed by a sequence consisting of four guanine tracts. Antiparallel, hybrid, and parallel topologies can be formed by almost the same sequence derived from a human telomere


Depending on the orientations of the loops, the contiguous backbone pillars can align in parallel or antiparallel orientation with neighboring pillars, which provides hyperpolymorphic topologies in the intramolecular G4s (Fig. 13) (Burge et al. 2006). In addition, the hyperpolymorphic topologies are further expanded by peculiar loop arrangements, such as V-loops, D-loops, and bulges, especially in the case of DNA G4s (Jana et al. 2021). In contrast to DNA G4s, RNA G4s tend to form an all-parallel topology with propeller loops, because the glycosidic bond of a ribonucleotide prefers the anti conformation, which all guanines in the parallel G4 topology adopt. The polymorphism of G4s and their stabilities are sensitive to solution conditions including the molecular crowding environment and the presence of cations. Experimentally, the thermodynamic stability of G4 can be monitored by UV absorbance at 295 nm; dissociation of G4 decreases the absorbance coefficient (Fig. 14a). The thermodynamic parameters ΔH and ΔS for an intramolecular G4 can be calculated from the temperature dependence of the spectroscopic signal by using the following Eq. 3, which considers the baselines at low and high temperatures and the theoretical folding ratio of the intramolecular G4, which depends on the thermodynamic parameters.

$$
SS = (cT + d) + \{(aT + b) - (cT + d)\}\,
\frac{1}{1 + \exp\!\left(-\dfrac{\Delta H}{RT} + \dfrac{\Delta S}{R}\right)}
\tag{3}
$$

where SS is the spectroscopic signal at the measured temperature; a and b are the slope and intercept of the baseline at higher temperature, respectively; c and d are the slope and intercept of the baseline at lower temperature, respectively; R is the gas constant; and T is the temperature in Kelvin. This equation can be applied for calculating thermodynamic parameters for any intramolecular structure showing a two-state melting transition in its spectroscopic signal. In general, a crowding environment with osmolyte co-solutes stabilizes both DNA and RNA G4s through a favorable enthalpic contribution. The linear correlation between the observed equilibrium constant and water activity demonstrated that dehydration during the formation of G4 is the main cause of the stabilization (Miyoshi et al. 2006). Although dehydration during structure formation is a general property of the core unit of G4, which consists of stacked G-quartets, the organization of the loop structure is accompanied by hydration. Because the number of water molecules released upon G4 formation tends to be lower for G4s consisting of fewer G-quartets, G4 structures consisting of two G-quartets are not stabilized as much in the presence of osmolyte co-solutes as those formed with more G-quartets. In addition, the number of water molecules released depends not only on the number of G-quartets but also on the sequence composition of the loop nucleotides. Thus, loop orientation and G4 topology are potentially altered in response to the crowding co-solutes. In particular, although the G4 structures formed by the repeated GGGTTA sequence in the human telomeric region are polymorphic in dilute aqueous solution depending on the type of cation, the sequence monomorphically forms a parallel-type G4 topology under crowding environments containing PEG200 and osmolyte co-solutes such as acetonitrile (Heddi and Phan 2011). It is considered that the formation of propeller loops causes more dehydration and is preferred under the


T. Endoh et al.

Fig. 14 Analyses of G4 stability from UV absorbance at 295 nm. (a) Typical thermal melting profile of G4. Thermodynamic parameters and the slopes and intercepts of the baselines are indicated as examples. (b) G4 stabilities depending on potassium concentration in the absence (blue) or presence (red) of 20 wt% PEG8000. (Adapted with permission from Trajkovski et al. (2018). Copyright 2018 Oxford University Press)

conditions of low water activity (Fig. 13). In the case of crowding environments containing PEG, preferential and direct binding of PEG to G-quartets is also suggested to induce a conformational transition to the parallel-stranded DNA G-quadruplex (Buscaglia et al. 2013). Comparison of the crowding environments containing the same weight percentage of EG and PEG8000 demonstrated more stabilization of DNA G4 with PEG8000 although it does not reduce water activity as much as EG. The structure might shift to G4 with a smaller size compared to single-

35

Effects of Molecular Crowding on Structures and Functions of Nucleic Acids


stranded state due to the excluded volume effect, but this needs to be verified because the stability analysis of G4 under high-pressure conditions demonstrated that the volume of the whole system, including hydrated water, increases with the formation of G4 (Takahashi and Sugimoto 2013). In contrast, PEG8000 reduces the dielectric constant of the solution more than EG and potentially enhances the interaction between cations and the G-quadruplex. The observed number of potassium ions involved in G4 formation is increased under the crowding environment with PEG8000, suggesting that the stability of G4 becomes more sensitive to changes in potassium concentration (Fig. 14b) (Trajkovski et al. 2018). In addition, the decrease in dielectric constant caused by the addition of small organic solvents leads to increased stability of G-quadruplexes through favored electrostatic interactions.

The biological functions of G4s were first suggested by a guanine-rich repeat at the ends of chromosomes, a region called the telomere. Although genomic DNA is a duplex, a eukaryotic chromosome has a long single-stranded repeat sequence at the 3′ end. As described above, in the case of a human chromosome, the repeat sequence is GGGTTA, which forms an intramolecular G4 structure. Incomplete replication of DNA at the 3′ end shortens the telomeres with cellular proliferation, which induces genome instability and apoptotic cell death, whereas telomerase specifically extends the telomeric repeat sequence. It is known that 80–85% of human cancer cells have active telomerase, which maintains the telomere length and the malignant property. The stabilization of G4 in the telomere region has been suggested to inhibit the extension of the telomeric repeat by telomerase and has thus become attractive as a target for cancer therapy. In addition to the chromosome ends, guanine-rich sequences with G4-forming potential are widely distributed in the human genome.
Several bioinformatic studies have demonstrated an abundance of sequences with G4-forming potential in genes associated with human diseases, suggesting that G4s are relevant to human health. The formation of G4s on DNA and RNA inside cells has been demonstrated using chemical fluorescent probes and G4-specific antibodies (Biffi et al. 2013; Suseela et al. 2018). Biochemical and molecular biological research has also demonstrated that G4s formed on nucleic acids have the potential to perturb processes in the central dogma such as replication, transcription, and translation, depending on their stabilities; the effects of G4 structures on biological reactions are described later in section “Biological Reactions Influenced by Nucleic Acid Structures and Their Stabilities.”
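
As a rough numerical illustration of the two-state melting analysis behind Eq. 3, the following Python sketch evaluates a melting curve for an intramolecular structure. It assumes the common form in which the signal is the population-weighted sum of two linear baselines, with ΔH and ΔS defined for structure formation; the function names and every numerical value are illustrative assumptions, not parameters from the studies cited above.

```python
import math

R = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def folded_fraction(temp_k, dH, dS):
    """Folded fraction for a two-state intramolecular transition.

    dH (kcal/mol) and dS (kcal/mol/K) are defined for structure formation.
    """
    # ln K = -dG/RT = -dH/(RT) + dS/R for the folding equilibrium
    ln_k = -dH / (R * temp_k) + dS / R
    return 1.0 / (1.0 + math.exp(-ln_k))


def signal(temp_k, dH, dS, a, b, c, d):
    """Signal as the population-weighted sum of the two baselines.

    a, b: slope and intercept of the high-temperature (unfolded) baseline
    c, d: slope and intercept of the low-temperature (folded) baseline
    """
    f = folded_fraction(temp_k, dH, dS)
    return f * (c * temp_k + d) + (1.0 - f) * (a * temp_k + b)


def melting_temperature(dH, dS):
    """Temperature (K) at which half of the molecules are folded (dG = 0)."""
    return dH / dS


# Hypothetical parameters chosen only for illustration
dH, dS = -50.0, -0.150
tm = melting_temperature(dH, dS)
for t in (tm - 20.0, tm, tm + 20.0):
    print(f"T = {t:6.1f} K  folded fraction = {folded_fraction(t, dH, dS):.3f}")
```

In practice the six parameters would be obtained by nonlinear least-squares fitting of such a function to the measured melting profile; the sketch only shows the forward model.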

i-Motif Structures Under a Crowding Environment

An i-motif is another tetraplex structure, formed by cytosine-rich sequences. The basic unit of the i-motif is a hemi-protonated C-C+ base pair formed between two parallel strands. Two sets of the two parallel strands forming the C-C+ base pairs are arranged in antiparallel orientation, and the C-C+ base pairs contributed by the four strands stack consecutively and alternately (Fig. 15). Similar to a G4 structure, an i-motif can be formed both intermolecularly and intramolecularly. Because it requires cytosine protonation, the i-motif has been considered a unique structure that forms only at acidic pH in vitro and is unlikely to be present under physiological


Fig. 15 Formation of an i-motif structure in a cytosine-rich DNA sequence. The i-motif structure requires stacked C-C+ base pairs, in which one of the cytosines in the base pair is protonated. The pKa value of cytosine increases under the crowding environment, resulting in stabilization of the i-motif structure

conditions. RNA i-motif structures have been shown to be less stable than those formed by DNA, and they are difficult to form even under acidic conditions. However, recent studies have suggested the presence of i-motif structures in cells (Zeraati et al. 2018). As described above, the pKa of cytosine protonation depends on local environmental factors. A molecular crowding environment induced by PEG with an average molecular weight of 200 or 8000 can induce the formation of i-motifs at neutral pH through an apparent increase in the cytosine pKa (Rajendran et al. 2010). At the same weight percentage, the stabilization effect was larger for PEG8000 than for PEG200, so the excluded volume is likely a dominant factor in stabilizing the i-motif structure. In a DNA duplex, a sequence with the potential to form an intramolecular i-motif always exists as the complementary counterpart of a sequence with G4-forming potential. Because both G4 and i-motif structures are stabilized under molecular crowding environments while the duplex is destabilized, such a duplex region has a high potential to dissociate and form G4 and i-motif structures facing each other. In cells, negative supercoiling stress imposed on DNA during biological reactions such as


replication and transcription potentially enhances the simultaneous formation of G4 and i-motif (Selvam et al. 2017; Sun and Hurley 2009). In addition, spatially restricted movement and compartmentalization of DNA in a local area also have the potential to enhance formation of these structures, because a DNA duplex dissociates and simultaneously forms G4 and i-motif in reverse micelles, which mimic a confined environment under crowding conditions (Pramanik et al. 2012).
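
The link between an apparent pKa shift and i-motif formation at neutral pH can be made concrete with the Henderson-Hasselbalch relation. The sketch below is a minimal illustration only: the two pKa values are hypothetical placeholders (one near that of free cytosine and one upshifted apparent value), not measurements from the studies cited above.

```python
def protonated_fraction(ph, pka):
    """Fraction of a titratable site that is protonated (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Hypothetical pKa values for illustration only
PKA_DILUTE = 4.5    # assumed, close to free cytosine N3
PKA_CROWDED = 6.5   # assumed apparent pKa under crowding

for label, pka in (("dilute", PKA_DILUTE), ("crowded", PKA_CROWDED)):
    frac = protonated_fraction(7.0, pka)
    print(f"{label:8s} pKa = {pka:.1f}  protonated fraction at pH 7.0 = {frac:.4f}")
```

In this simplified picture, each C-C+ pair needs one protonated and one neutral cytosine, so an upshifted apparent pKa moves neutral pH closer to the regime in which both species are available.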

Effects of Molecular Crowding on RNA Structure and Functions

In cells, RNA is transcribed by RNA polymerase from template DNA as a single-stranded polynucleotide. It forms various tertiary structures based on intramolecular interactions such as hydrogen bonding and nucleobase stacking. As typified by RNA catalysts, named ribozymes, the formation of a tertiary structure is indispensable for RNAs to exhibit their functions (Kruger et al. 1982). Transfer RNA (tRNA) is one of the most abundant noncoding RNAs in cells and a fundamental molecule for protein synthesis, acting as an adaptor between template mRNA and amino acids. Although tRNA is not a catalyst, it forms a uniquely compact L-shaped structure based on its cloverleaf secondary structure. To form compact and complex tertiary structures, RNAs require interactions with cations, especially divalent magnesium ions, to offset the unfavorable electrostatic repulsion between backbone phosphates and to stabilize the structure. In some RNA structures, magnesium ions occupy specific binding sites that can be observed experimentally by NMR and X-ray crystallography. Molecular crowding environments, which affect electrostatic interaction and repulsion, critically influence the structure and stability of RNAs and thereby RNA functions. The excluded volume provided by co-solute molecules also affects RNA functions because the hydrodynamic volume of an RNA tertiary structure varies with its polymorphic folding state.

Tertiary Structure Folding Under the Molecular Crowding Environments

Tertiary structures of RNAs are built up from secondary structural units consisting of intramolecular Watson-Crick base pairs. A secondary structural region forms an A-type RNA duplex, which has more hydrated water molecules than its DNA counterpart forming a B-type duplex (Nakano et al. 2014b). Thus, the region is destabilized under crowding environments induced by osmolyte co-solutes. In contrast, the formation of tertiary structure motifs in RNAs, such as the A-minor, ribose zipper, kissing loop, T-loop, tetraloop-receptor, and pseudoknot motifs, brings secondary structure elements, or a secondary structure and a single-stranded region, at separate locations into proximity (Fig. 16). In these motifs, the bulges, loops, and grooves that provide a surface or recognition unit for tertiary interaction are hydrated beforehand, and thus the tertiary interaction causes dehydration. Enhancement of the tertiary interactions and stabilization of compact RNA structures have been demonstrated by using TMAO as an osmolyte


Fig. 16 Tertiary interaction motifs often observed in RNA structures

co-solute (Lambert et al. 2010). In addition, crowding co-solutes with large molecular sizes enhance the tertiary interactions of RNAs and make their structures compact through the excluded volume effect. For example, compaction of transfer RNA was observed in the presence of PEG200 and PEG8000, which resulted in cooperative melting of the tertiary and secondary structures, whereas the tertiary structure melted before the secondary structure units in dilute aqueous solution containing physiological concentrations of cations (Leamy et al. 2017). This effect was caused by stabilization of the tertiary structure and destabilization of the secondary structure under the crowding environment. PEG200, which behaves like an osmolyte, induces a compact tRNA structure through stabilization of tertiary interactions, whose formation is accompanied by dehydration, and PEG8000 stabilizes the compact structure through its excluded volume effect.

Activities of RNA Catalyst Under the Molecular Crowding Environments

The formation of a tertiary structure is directly linked to RNA function. In the case of a hammerhead ribozyme, which is one of the smallest RNA catalysts that utilize metal ions to cleave a target RNA, the formation of a three-way junction is fundamental to its catalytic activity, whether the cleavage occurs within the same strand or in an intermolecular substrate strand. In dilute aqueous solution, efficient catalytic activity is observed at relatively high concentrations of magnesium ions, whose binding is site-specific in some cases. In contrast, under the crowding environment, the catalytic rates were enhanced 2.0- to 6.6-fold at a concentration of 10 mM Mg2+, and much greater enhancements were observed at lower Mg2+ concentrations. In particular, crowding environments induced by large PEG such as PEG8000 enhanced the reaction rate more than those induced by smaller PEG (Nakano et al. 2009). The facilitation of catalytic activity in a multiple-turnover reaction was expected to be caused both by destabilization of the duplex region, which avoids ribozyme misfolding and accelerates dissociation of the cleaved substrate after the reaction, and by stabilization of the active form of the ribozyme structure (Fig. 17) (Nakano et al. 2009). Facilitation of the ribozyme activity by the addition of alcohol instead of PEG was also demonstrated later, although small alcohols do not behave as macromolecular crowding co-solutes (Nakano et al. 2014a). The addition of alcohol reduced the cation concentration required for ribozyme activity,


Fig. 17 Reaction mechanism of substrate strand cleavage mediated by a hammerhead ribozyme. Blue arrow indicates the cleavage site of substrate strand. Red arrows indicate the steps facilitated under the crowding environment. (Reprinted with permission from Nakano et al. (2009). Copyright 2009 American Chemical Society)

resulting in reduced dependency of the activity on the cation concentration. There was a linear correlation between the degree of activity enhancement and the dielectric constant of the solution containing co-solute molecules. Therefore, the enhanced electrostatic interaction between RNA and cations under the reduced dielectric constant of the crowding environment was suggested to be important for RNA function.

In addition to the importance of a reduced dielectric constant under a crowding environment with small co-solutes, the importance of the excluded volume effect was demonstrated for larger ribozymes such as the group I intron ribozyme. The bacterial group I intron ribozyme consists of five hairpin-type secondary structure units, of which four form two sets of hairpin pillars with co-axial stacking and the remaining one is extended. The ribozyme is in equilibrium between an unfolded and a maturely folded structure, and the equilibrium is shifted toward the more compact mature form in the presence of PEG1000. As a result of this shift, the ribozyme activity of intermolecular RNA cleavage was facilitated (Desai et al. 2014). The experimentally determined number of water molecules released during structure formation was significantly larger than the theoretically estimated maximum number of water molecules accessible to the total surface area buried during the folding process. This suggests that not only dehydration but also the excluded volume provided by the co-solute plays an important role in inducing the compact RNA structure. The importance of the excluded volume was analyzed based on the shift toward lower magnesium concentrations required to fold the RNA structure as the concentration of PEG1000 increased.
This observation suggests that, as in the case of the hammerhead ribozyme, the reduction of the dielectric constant in the presence of PEG1000 might additionally contribute by enhancing the electrostatic interaction between magnesium and RNA that induces tertiary structure formation.
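
The number of water molecules released on structure formation, referred to in this section, is commonly estimated from the dependence of the observed equilibrium constant on water activity. The sketch below assumes the simple linkage relation ln K_obs = ln K_0 − Δn_w · ln a_w, so that Δn_w is minus the slope of ln K_obs against ln a_w; the data points are synthetic and the helper names are hypothetical.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def waters_released(water_activities, k_obs):
    """Estimate the number of water molecules released on structure formation."""
    xs = [math.log(aw) for aw in water_activities]
    ys = [math.log(k) for k in k_obs]
    slope, _ = linear_fit(xs, ys)
    return -slope  # positive when folding releases water


# Synthetic data generated with a hypothetical release of 40 water molecules
true_dn, k0 = 40.0, 5.0
aw = [1.00, 0.99, 0.98, 0.97, 0.96]
k_obs = [k0 * a ** (-true_dn) for a in aw]
print(f"estimated water release: {waters_released(aw, k_obs):.1f}")
```

With real data, a value of Δn_w exceeding what the buried surface area could plausibly hold would point to contributions beyond dehydration, which is the argument made above for the excluded volume effect.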


Affinities of RNA Aptamers Under the Molecular Crowding Environments

Specific molecular recognition is one of the functions of RNAs. Nucleic acids that can perform specific molecular recognition and interaction are known as aptamers. Both DNA and RNA aptamers have been selected against various targets such as proteins, peptides, and small chemicals from artificially designed sequence libraries using the selection technology called systematic evolution of ligands by exponential enrichment (SELEX), which was established in 1990 (Ellington and Szostak 1990; Tuerk and Gold 1990). Artificially selected aptamers have been used in various biotechnological applications such as diagnostics, therapeutics, and bioimaging. Therapeutic and bioimaging applications in particular aim to use aptamers in vivo and in cells. Because aptamers usually form unique secondary and tertiary structures to provide an interaction surface for large targets such as proteins or to form a cavity or pocket for small targets, the crowding environments in vivo and in cells would affect the functions of the selected aptamers through alteration of their structures and stabilities. In contrast to artificially selected aptamers, natural aptamer-like RNAs were discovered more than a decade after the development of SELEX technology and were demonstrated to modulate gene expression through specific interaction between the RNA and an intracellular metabolite (Winkler et al. 2002). These functional RNAs, named riboswitches, contain an aptamer domain and undergo an RNA conformational change in response to interaction with a specific metabolite, which results in modulation of gene expression at the level of transcription, posttranscriptional RNA editing, or translation. The aptamer domains of riboswitches are generally longer than artificially selected aptamers; riboswitches form complex tertiary structures even if the target is a small metabolite (Fig. 18).
As described above, under crowding conditions with reduced water activity compared to dilute solution, reactions accompanied by dehydration are accelerated. Equilibria in biomolecular conformations and interactions also shift toward the state that releases water molecules. In the case of the aptamer domain of a riboswitch, the intermolecular interaction between a pre-folded aptamer domain and a small metabolite molecule may not cause the release or uptake of a large number of water molecules because only a small surface area is involved in the direct interaction; however, water molecules contribute substantially if the interaction induces a conformational transition of the whole RNA structure. Thus, when the aptamer domain of a riboswitch does not pre-fold into its tertiary structure, the interaction affinity between the aptamer domain and the metabolite molecule is potentially enhanced under the crowding environment with reduced water activity. For example, the aptamer domain derived from the adenosine deaminase (add) A-riboswitch (addA riboswitch), which interacts with adenine as its target metabolite, has a secondary structure consisting of a three-way junction. In the complex with adenine, the two hairpin-type structures extending from the branching site stand in parallel and interact at their tip loop regions (Serganov et al. 2004). The binding pocket for adenine is located at the center of the junction. Because the tertiary structure forming


Fig. 18 Examples of tertiary structures of aptamer domains derived from natural riboswitches and their secondary structures. Star symbols show the position of the target molecules, in which the chemical structures are indicated at lower right side, in the complex state

the loop-loop interaction is stabilized in the presence of magnesium ions, the aptamer domain requires a higher magnesium concentration to exhibit higher binding affinity for adenine in dilute solution. Under the molecular crowding environment induced by PEG200, the aptamer domain of the addA riboswitch shows higher binding affinity than in dilute aqueous solution, especially at low magnesium concentrations such as those below 0.5 mM (Kumar et al. 2012). Although PEG200, unlike magnesium ions, did not induce the formation of the tertiary structure consisting of the loop-loop interaction, it increased the binding affinity for the target metabolite. The increased affinity is caused by the promotion of the dehydration that accompanies the formation of the loop-loop interaction (Fig. 19). A similar enhancement of binding affinity under the crowding environment was observed with an aptamer domain derived from the flavin mononucleotide (FMN) riboswitch, whose secondary structure consists of a five-way junction. In the complex with the target metabolite, FMN, the aptamer domain forms several tertiary interactions including loop-loop interactions (Serganov et al. 2009). In this case, even when the solution did not contain magnesium ions, the binding affinity of the aptamer domain for FMN in the presence of 15 wt% PEG200 was comparable to that in aqueous solution containing 2 mM MgCl2 (Rode et al. 2018). Despite


Fig. 19 Dynamic behavior of aptamer domains derived from riboswitches during interaction of target metabolite under a crowding environment with reduced water activity. Dehydration with tertiary structure formation of the aptamer domain enhances the affinity of the metabolite even in a condition of low or no magnesium ions

similar binding affinities under these two conditions, the interaction between the aptamer domain and FMN under the crowding environment with 15 wt% PEG200 showed larger enthalpy and entropy changes than that in the solution containing MgCl2, suggesting conformational dynamics of the aptamer domain consistent with induced-fit behavior of the interaction (Fig. 19). In contrast to the aptamer domain of the natural riboswitch, the interaction affinity of an artificially selected RNA aptamer for FMN, which consists of a simple secondary structure containing an internal loop region, depended entirely on the magnesium concentration, and binding was not observed in the absence of magnesium and the presence of 30 wt% PEG200. Given that the relatively complex tertiary structures of the aptamer domains of riboswitches are well conserved, these structures are expected to be advantageous for functioning in cellular crowding environments. In particular, based on the functional mechanism of riboswitches, the dynamic behavior of RNA conformation upon metabolite interaction, which is typified by induced-fit binding, is considered to play a fundamental role in modulating gene expression.
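
The observation that two conditions can give similar affinities despite different enthalpy and entropy changes follows from ΔG = ΔH − TΔS. The following sketch illustrates this compensation with two hypothetical parameter sets at an assumed temperature of 25 °C; none of the numbers are taken from the cited measurements.

```python
import math

R = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def binding_free_energy(dH, dS, temp_k):
    """dG = dH - T*dS in kcal/mol (dS in kcal/mol/K)."""
    return dH - temp_k * dS


def dissociation_constant(dG, temp_k):
    """Kd in mol/L (1 M standard state) from the binding free energy."""
    return math.exp(dG / (R * temp_k))


T = 298.15  # assumed temperature (25 degrees C)

# Hypothetical condition 1: small enthalpy and entropy changes
dH1, dS1 = -15.0, -0.0200
# Hypothetical condition 2: larger changes that largely cancel each other
dH2 = -35.0
dS2 = dS1 + (dH2 - dH1) / T  # entropy chosen so that dG matches condition 1

for dH, dS in ((dH1, dS1), (dH2, dS2)):
    dG = binding_free_energy(dH, dS, T)
    kd = dissociation_constant(dG, T)
    print(f"dH = {dH:6.1f}  T*dS = {T * dS:6.2f}  dG = {dG:6.2f}  Kd = {kd:.2e} M")
```

The second parameter set mimics a binding event coupled to a larger conformational rearrangement: the more favorable enthalpy is offset by a larger entropic penalty, leaving the affinity unchanged.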

Biological Reactions Influenced by Nucleic Acid Structures and Their Stabilities

Whether prokaryotic or eukaryotic, living systems maintain homeostasis by dynamically modulating gene expression at almost every reaction process of the central dogma. Under the molecular crowding environment inside cells, nucleic acids exhibit dynamic properties in their structures and stabilities depending on the physicochemical factors described above. Based on the dynamic features of nucleic acids and cellular environmental factors, it is envisioned that


Fig. 20 Concept of the dimensional code of the central dogma, which dynamically modulates gene expression depending on non-double helical nucleic acid structures whose formation changes dynamically in response to the intracellular environment. (Reprinted and adapted with permission from Sugimoto et al. (2021). Copyright The Chemical Society of Japan)

nucleic acids play important roles in modulating gene expression based on their structures and stabilities; some examples have already been described above. This kind of modulation can be hypothesized as a dimensional code in the central dogma that is different from the regulatory sequence code, which depends on the primary sequence of the genetic information (Fig. 20) (Sugimoto et al. 2021).

Effects of Nucleic Acid Structures on DNA Replication

In order to maintain the genetic information of an organism and transmit it to the next generation, the sequence of genomic DNA must be accurately replicated and provided to each daughter cell when cells proliferate. DNA polymerase, the enzyme responsible for replication, and its supporting proteins, such as helicases and single-stranded DNA binding proteins, unwind the DNA duplex, and replication proceeds using each of the strands as a template. When the DNA strands exist in a single-stranded state during a certain time window, sequence regions with unique composition potentially form non-double helical structures depending on environmental factors, resulting in replication errors including mutation and recombination. For example, the formation of a hairpin structure on DNA is involved in contraction


and expansion of tandem repeat sequences through replication- and repair-mediated formation of junction structures as described above (McMurray 2010).

A replication reaction starts from a DNA region called a replication origin, or simply an origin, where the protein factors responsible for the initiation of replication assemble. Usually, origins are distributed throughout the whole genome. For example, the human genome contains 30,000 to 50,000 origins, which enables the whole genome to be replicated in approximately 5–6 h although the reaction rate of DNA polymerase is only several tens of nucleotides per second. Replication does not start from all origins at once; the origins are activated in a defined order. The timing of replication initiation is influenced by the accessibility of protein factors for DNA replication and plays important roles in the correct assembly of chromatin, which is involved in the organization of spatial distribution in the nucleus, centromere function, chromosome cohesion, and genome stability (Gilbert 2002; Lanctot et al. 2007). The formation of non-double helical structures on DNA potentially controls the timing of replication initiation. For example, the formation of a G4 structure in the origin region has been suggested to enhance replication initiation through several mechanisms, such as nucleosome exclusion to unwrap the DNA region, acceleration of DNA melting to promote the formation of a replication fork, and direct recruitment of protein factors for replication initiation (Bryan 2019). During the elongation phase of replication, DNA polymerase and associated proteins form a replication fork, which progressively separates the parental DNA duplex into two single strands, both of which are used as templates for duplicating genetic information.
Because DNA polymerase synthesizes new DNA unidirectionally from 5′ to 3′, one of the two strands, known as the leading strand, can be synthesized continuously as the replication fork progresses, while the other strand, known as the lagging strand, is synthesized through the formation of fragmented duplexes called Okazaki fragments, which are later connected by ligation. Before initiation of Okazaki fragment synthesis, the corresponding region of the lagging strand, which is 100–200 nucleotides long in eukaryotes, is in a single-stranded state. Although a single-stranded DNA binding protein called replication protein A interacts with the region to support the effective and accurate synthesis of the Okazaki fragments, the single-stranded regions potentially form secondary and tertiary structure elements, which affect the progression of DNA synthesis. For example, triplex formation is suggested to induce genome instability during replication, which leads to human disorders including cancer (Bacolla et al. 2015). G4 and i-motif structures disturb replication and cause a temporary halt of DNA polymerase. Investigations of replication reactions using the quantitative study of topology-dependent replication (QSTR), which analyzes a phase diagram of the kinetic rate constant for DNA polymerase to overcome the structured region versus the thermodynamic stability of the structure formed on the template strand, have suggested that the rate constants depend not only on the stabilities but also on the topological features of the DNA structure (Fig. 21) (Takahashi and Sugimoto 2021). QSTR analysis suggested that DNA polymerase overcomes regions forming different structures by different mechanisms. In addition, the mechanisms differed between a dilute aqueous solution and a crowding environment. Furthermore, the phase diagram of


Fig. 21 Quantitative study of topology-dependent replication (QSTR) to evaluate the effects of non-double helical nucleic acid structures on replication reaction. The topology dependency of replication inhibition is shown as a difference in the slope of the QSTR plots. The slope values and correlations are altered depending on not only the topology of the structure formed on template strand but also the molecular environment in the system. (Reprinted with permission from Takahashi and Sugimoto (2021). Copyright 2021 American Chemical Society)

QSTR analyses could be applied for designing chemical ligands, which efficiently suppress the replication reaction (Takahashi et al. 2021).
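
The origin numbers quoted earlier in this section can be checked with simple arithmetic. The sketch below assumes a genome of about 3.1 × 10^9 base pairs, bidirectional forks from each origin, and a fork rate of 50 nucleotides per second as one value within "several tens"; these three figures are assumptions for illustration and are not given in the text.

```python
GENOME_BP = 3.1e9          # assumed haploid human genome size (base pairs)
FORK_RATE_NT_PER_S = 50.0  # assumed; the text says "several tens" per second
FORKS_PER_ORIGIN = 2       # assumed bidirectional replication


def replication_hours(genome_bp, n_forks, rate_nt_per_s):
    """Hours needed if n_forks forks copy the genome simultaneously."""
    return genome_bp / (n_forks * rate_nt_per_s) / 3600.0


single_fork = replication_hours(GENOME_BP, 1, FORK_RATE_NT_PER_S)
print(f"one fork: {single_fork:,.0f} h")

for n_origins in (30_000, 50_000):
    hours = replication_hours(
        GENOME_BP, n_origins * FORKS_PER_ORIGIN, FORK_RATE_NT_PER_S
    )
    print(f"{n_origins} origins firing together: {hours:.2f} h")
```

Under these assumptions a single fork would need roughly two years, whereas simultaneous firing of all origins would finish in well under an hour. The reported 5–6 h lies between the two, which is consistent with origins being activated in sequence rather than all at once.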

Effects of Nucleic Acid Structures on RNA Transcription

RNA transcription is the first step in the expression of proteins encoded in the genome. Transcription is regulated at various steps of the process, including initiation, elongation, and termination. The initiation step is particularly important for determining the expression levels of proteins. In both prokaryotes and eukaryotes, protein factors called transcription factors recognize a specific region of the target gene called the promoter and recruit RNA polymerase, an enzyme responsible for initiating RNA synthesis. Various studies have reported non-double helical and unusual helical structures on template DNA that affect transcriptional activities through direct interaction with protein factors or through alteration of the structural state of chromatin. For example, sequences with the potential to form left-handed Z-DNA are enriched near the transcriptional start sites of human genes, especially in actively transcribed regions. Z-DNA enhances transcription levels by stabilizing or inducing an open chromatin structure (Shin et al. 2016). G4s are suggested to influence transcription activity through direct interaction with protein factors in addition to alteration of the chromatin structure. Various proteins that interact with G4 structures have been analyzed and registered in the G-quadruplex structure (G4) Interacting Proteins DataBase (G4IPDB) (Mishra et al. 2016). They have the potential to both stabilize and destabilize G4 structures, which impacts the


transcription activities (Spiegel et al. 2020). Because the G4-mediated alteration of transcriptional levels is involved in the determination of cellular phenotypes, including the progression of diseases, the structures and stabilities of G4s on template strands are emerging as epigenetic regulators of the transcriptional reaction. The formation of non-double helical structures, including Z-DNA, triplexes, and quadruplexes, in a region that is decoded by RNA polymerase generally disturbs transcriptional elongation. In the case of G4s formed on a non-template strand, G4 formation has the potential to promote transcription through the successive formation of an R-loop, in which nascent RNA invades the upstream double-stranded DNA and forms a DNA/RNA hybrid (Lee et al. 2020). However, it has also been demonstrated that the formation of DNA/RNA hybrid G4 structures through an R-loop on the template strand suppresses the levels of RNA transcripts (Zheng et al. 2013). Alteration of RNA polymerase activity during elongation not only changes RNA transcript levels but also causes transcriptional mutations (Fig. 22)

Fig. 22 A schematic illustration of aberrant transcription. Compared to the normal run-off transcription, a longer transcript is produced when RNA polymerase causes slip during transcription. The amount of transcript is reduced when RNA polymerase causes a temporary pause. An immature transcript is produced when RNA polymerase causes arrest and transcription is terminated. (Reprinted from Tateishi-Karimata et al. (2014)).


(Tateishi-Karimata et al. 2014). In the case of G4 structures on the template strand of DNA, moderately stable ones induce slippage of RNA polymerase, which produces RNA transcripts longer than the template strand. In contrast, sufficiently stable ones induce arrest of RNA polymerase, which results in premature and aberrant termination of the transcriptional reaction. Transcriptional arrest was induced when the stability of the G4 structure (−ΔG°37) in the presence of 20 wt% PEG200 was more than 8.2 kcal mol⁻¹, and the amount of arrested RNA transcripts increased with the thermodynamic stability (Tateishi-Karimata et al. 2014). Transcriptional mutations are associated with epigenetic phenotypes such as hypolipoproteinemia and hemophilia, and with β-amyloid accumulation in the human central nervous system. G4-mediated transcriptional mutation is also suggested to be involved in the progression of cancer cells through the destabilization of G4s in response to an intracellular environment with reduced potassium concentration, which is caused by reduced expression of a potassium channel protein. Thus, investigating the relationship between the production of mutated RNA transcripts and the formation of non-double helical structures under an intracellular crowding environment is becoming an important subject of research.
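
To give a sense of scale for the 8.2 kcal mol⁻¹ arrest threshold, the stability can be converted to a folding equilibrium constant with K = exp(−ΔG°/RT). The sketch below assumes the threshold corresponds to a folding free energy of −8.2 kcal mol⁻¹ at 37 °C and treats G4 formation as a unimolecular two-state equilibrium; the function names and the two comparison values are hypothetical.

```python
import math

R = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def folding_equilibrium_constant(dG_kcal_per_mol, temp_c=37.0):
    """K = exp(-dG/RT) for a unimolecular folding equilibrium."""
    temp_k = temp_c + 273.15
    return math.exp(-dG_kcal_per_mol / (R * temp_k))


def folded_fraction(dG_kcal_per_mol, temp_c=37.0):
    """Fraction of molecules folded at equilibrium, K / (1 + K)."""
    k = folding_equilibrium_constant(dG_kcal_per_mol, temp_c)
    return k / (1.0 + k)


# -2.0 and -5.0 are arbitrary comparison values; -8.2 is the assumed threshold
for dG in (-2.0, -5.0, -8.2):
    k = folding_equilibrium_constant(dG)
    print(f"dG = {dG:5.1f} kcal/mol  K = {k:.2e}  folded = {folded_fraction(dG):.6f}")
```

At the assumed threshold the equilibrium constant is of the order of 10^5 to 10^6, so essentially every template molecule carries a folded G4 at equilibrium.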

Effects of Nucleic Acid Structures on Protein Translation

Translation is the process by which proteins, the main functional molecules in cells, are produced by polymerizing amino acids in a defined order based on the genetic information encoded in the primary sequence of nucleic acids. Messenger RNA (mRNA) produced by transcription from genomic DNA functions as a template for translation. The ribosome, a macromolecular complex of ribosomal proteins and ribosomal RNAs, catalyzes the polymerization of amino acids according to the mRNA by using aminoacyl-transfer RNAs as substrates. Together with associated protein factors known as initiation factors, the ribosome first binds the 7-methyl guanylate (m7G) cap structure at the 5′ end of eukaryotic mRNA or a ribosomal binding site (RBS), which is also known as the Shine-Dalgarno sequence, in prokaryotic mRNA. Then, the ribosome moves along the mRNA and initiates translation from an initiation codon, which generally encodes methionine. Because RNA transcripts are intrinsically single stranded, RNAs form secondary and tertiary structures through intramolecular interactions as well as interactions with co-solute molecules such as cations, as described above. Stable structures formed on mRNA have the potential to affect the translation reaction in various ways (Fig. 23) (Endoh and Sugimoto 2017). In the 5′ untranslated region (UTR) of eukaryotic mRNAs, it has been suggested that hairpin structures whose stabilities, as predicted by the classical nearest-neighbor model for a dilute aqueous solution, are in the range of −30 to −50 kcal mol−1 reduce the protein expression level depending on their stabilities (Kozak 1986). The range of −30 to −50 kcal mol−1 required to reduce the translational efficiency appears to correspond to very stable structures. However, double helical structures are destabilized in the crowding environment, as described above. Thus, despite the fact that eIF4A, one of the eukaryotic initiation factors, unwinds the mRNA structures by its helicase activity, ribosome progression is potentially affected by RNA structures that have substantially lower thermal stabilities. From the viewpoint of the unwinding process of mRNA, the structural properties of the structures formed on mRNA, including topology and dynamics, impact ribosome progression in addition to thermal stability. Non-double helical structures tend to resist unwinding and thus strongly suppress ribosome movement by acting as a roadblock. In addition, non-double helical structures tend to be stabilized under crowding environments. For example, it has been reported that triplexes and quadruplexes formed on the 5′ UTR of mRNA efficiently reduce protein expression levels (Bugaut and Balasubramanian 2012).

Fig. 23 Examples of reactions in translation that are potentially influenced by stable RNA structures

When stable non-double helical structures are formed on an open reading frame, a region that encodes the order of amino acids to be polymerized, they affect not only the protein expression levels but also the protein functionalization processes by inducing arrhythmic translation, such as a temporary halt or a reduced progression rate. For example, the H-type RNA pseudoknot, one of the simplest pseudoknots, in which a loop region of a hairpin RNA is base paired with a region close to the hairpin, is known to halt translation elongation. In some viral mRNAs, pseudoknots are involved in a ribosomal frameshift during translation that produces two types of protein products with different C-termini from one mRNA. Physicochemical properties of the pseudoknot structures, such as torsional resistance and mechanical strength during unfolding, and possibly the formation of alternate, incompletely folded RNA structures, are suggested to be important in determining the efficiency of ribosomal frameshifts (Endoh and Sugimoto 2017). G4 structures formed on an open reading frame of mRNA also suppress translation elongation. The suppression efficiency depends on the positional relationship between the 5′ end of the guanine tracts involved in the G4 structure and the codon at which the ribosome stalls. A change in this relationship at the single-nucleotide level changed the efficiency of translation suppression. Thus, insertion of a G4 structure at the appropriate position downstream of a slippery sequence, which stimulates ribosomal frameshifting, enhances the efficiency of the frameshift.

Arrhythmic translation also affects protein structure because the folding of the nascent protein occurs during elongation, a process called co-translational protein folding. A temporary halt in translation elongation would provide a time window for the nascent protein to fold some domain structures on the ribosome. Synonymous mutations from abundant to rare codons, and vice versa, have altered the structures and functions of translated products by changing the translation elongation rate and the kinetic landscape of co-translational protein folding (Sharma and O’Brien 2018). From the same viewpoint, arrhythmic translation caused by a non-double helical structure on mRNA potentially impacts the co-translational folding of the protein. An example is a G4 structure formed on the mRNA of human estrogen receptor alpha, which affected the sensitivity of the translated product to proteolysis inside cells (Endoh et al. 2013). The proteolysis sensitivity depended on the stability of the G4 variants under the crowding environment.
Since G4 stability is sensitive not only to cation concentration but also to water activity, the folding of the natural estrogen receptor alpha is expected to be perturbed in response to such physicochemical factors inside cells, even though the mRNA carries the same G4-forming sequence. Based on these phenomena, non-double helical structures on mRNA are expected to function like protein folding codes that modulate the translation elongation rate and co-translational protein folding.
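Because the suppression efficiency described above depends on where the guanine tracts begin relative to the codons, a small script can help locate candidate G4-forming stretches in an open reading frame and report their codon phase. The sketch below is only an illustration: the regular expression (four runs of at least three guanines separated by loops of 1 to 7 nucleotides) is a widely used sequence heuristic that is assumed here and is not taken from this chapter, and the function name is hypothetical.

```python
import re

# Common sequence heuristic for putative G4-forming motifs (an assumption,
# not this chapter's method): four G-runs of >= 3 nt with 1-7 nt loops.
G4_PATTERN = re.compile(r"G{3,}(?:[ACGU]{1,7}?G{3,}){3}")


def find_putative_g4(orf):
    """Return (start, end, frame) for each putative G4 motif in an ORF.

    start and end are 0-based indices into the ORF (end exclusive).
    frame is the position of the first G within its codon (0, 1, or 2),
    assuming the string begins at the first nucleotide of the start codon.
    """
    orf = orf.upper().replace("T", "U")   # accept DNA or RNA letters
    return [(m.start(), m.end(), m.start() % 3)
            for m in G4_PATTERN.finditer(orf)]


if __name__ == "__main__":
    # Toy ORFs that differ by one inserted nucleotide upstream of the G-tracts.
    for orf in ("AUGGCUGGGAGGGAGGGAGGGUAA", "AUGGCUAGGGAGGGAGGGAGGGUAA"):
        print(orf, find_putative_g4(orf))
```

A sequence match only indicates G4-forming potential; whether the structure actually forms, and how stable it is, depends on the cation and crowding conditions discussed in this chapter.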

Effects of Nucleic Acid Structures on Concurrent Reactions

One of the characteristic features of the biological reactions in gene expression processes, such as replication, transcription, and translation, is unidirectional polymerization. In RNA synthesis during transcription, the region at the 5′ end is synthesized before that at the 3′ end, and nascent RNA structures form co-transcriptionally. Because folding is sequential, the co-transcriptional folding process gives rise to conformational dynamics of the RNAs. An immaturely folded metastable structure formed during transcription transitions to the most stable mature structure after the 3′ region is synthesized. The conformational dynamics of nascent RNAs can be a trigger of gene regulation. One example is the modulation of transcription termination by a riboswitch, in which the formation of a transcription termination signal is modulated by the conformational transition of the RNA upon binding of the target metabolite during transcription. Not only transcription but also other reaction steps, such as posttranscriptional RNA editing typified by alternative splicing, and translation can be modulated by RNA conformational dynamics, because the reactions in the central dogma occur concurrently and consecutively. In general, the formation of secondary structures in nucleic acids is kinetically faster than that of tertiary structures. Thus, if a metastable secondary structure that differs from the secondary structure of the mature functional RNA is formed during transcription, the metastable structure potentially induces further misfolding of the 3′ region synthesized later, which results in a dysfunctional RNA (Endoh and Sugimoto 2017). In contrast, metastable secondary structures formed during transcription transform to thermodynamically more stable structures more rapidly in cells than in dilute solution in vitro (Mahen et al. 2010). Proteins such as helicases and other RNA-interacting proteins promote the unstructured state of RNA inside cells and thereby speed up the formation of the most stable structures. In addition, destabilization of secondary structure units under a crowding environment with reduced water activity would reduce the energy required to dissociate metastable secondary structures, which facilitates the transition to the most stable structure. For example, the folding rate of G4 is generally slower than that of simple secondary structural units. Therefore, co-transcriptional folding of nascent RNA tends to form metastable secondary structures in a region with G4-forming potential, and the secondary structure subsequently undergoes a posttranscriptional transition to G4. The rate of this posttranscriptional transition to G4 was demonstrated to be accelerated under the crowding environment induced by PEG 200 (Endoh et al. 2016). As described above, the formation of G4s on mRNA has been demonstrated to impact translation.
Given that the transition rate from the metastable secondary structure to the G4 structure is affected by the crowding environment, it is possible that the effect of G4 on translation varies in response to the intracellular molecular environment. Additionally, because biological reactions occur concurrently and consecutively, the balance between the timescales of conformational dynamics and of gene expression processes is also important. For example, translation in prokaryotes starts before the transcription of mRNA finishes. When the co-transcriptionally folded metastable structure persists until the ribosome translates the region, the function of G4 as a roadblock to translation is diminished. However, when there is a sufficient time lag between transcription and translation, as in eukaryotes, where translation is posttranscriptional, the RNA can undergo the conformational transition to the G4 structure, which enables efficient suppression of translation (Endoh and Sugimoto 2019) (Fig. 24).

Fig. 24 Effects of RNA conformational dynamics through co-transcriptionally formed metastable structure on gene expression in different biological systems such as eukaryotes and prokaryotes. An intracellular crowding environment accelerates the conformational transition from the metastable to the thermodynamically most stable conformer
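The timescale argument above can be made concrete with a toy kinetic model. The sketch below treats the metastable-to-G4 transition as an irreversible first-order process and asks what fraction of the RNA has reached the G4 state when the ribosome arrives. The first-order assumption, the rate constants, and the time lags are all hypothetical values chosen only to illustrate the trend; none of them are taken from the cited studies.

```python
import math


def fraction_converted(rate_constant_per_s, elapsed_s):
    """Fraction of RNA converted from the metastable structure to G4.

    Assumes an irreversible first-order transition (illustrative only).
    """
    return 1.0 - math.exp(-rate_constant_per_s * elapsed_s)


def g4_fraction_at_ribosome_arrival(rate_constant_per_s, lag_s):
    """G4 fraction when the ribosome reaches the region after a time lag."""
    if lag_s < 0:
        raise ValueError("lag_s must be non-negative")
    return fraction_converted(rate_constant_per_s, lag_s)


if __name__ == "__main__":
    # Hypothetical rate constants (s^-1): crowding is assumed to accelerate
    # the transition tenfold, purely for illustration.
    k_dilute, k_crowded = 0.001, 0.01
    for label, lag in (("short lag (coupled)", 10.0),
                       ("long lag (uncoupled)", 600.0)):
        print(label,
              round(g4_fraction_at_ribosome_arrival(k_dilute, lag), 3),
              round(g4_fraction_at_ribosome_arrival(k_crowded, lag), 3))
```

In this picture, a short lag leaves most of the RNA in the metastable state regardless of crowding, whereas a long lag combined with a faster transition lets nearly all of it reach the G4 state before translation.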

Conclusion

In this chapter, the effects of molecular crowding environments on the structures, stabilities, and functions of nucleic acids are described, with a focus on the general properties of the crowding environment and the physicochemical factors that are altered in it. Crowding environments impact nucleic acids mainly through three factors: the excluded volume, which arises because crowders occupy physical space; a reduction of the dielectric constant of the solution, because the dielectric constant of crowders is lower than that of water; and a reduction of water activity, because hydration of co-solutes depletes free water in solution. In such crowding environments, canonical duplexes consisting of Watson-Crick base pairs tend to be destabilized; in contrast, noncanonical structures, including compact tertiary structures, are stabilized relative to dilute solution. In addition, nucleic acids tend to show dynamic behaviors in crowding environments. The sequence complementarity based on Watson-Crick base pairs in the canonical duplex is fundamental for maintaining genetic information in all life forms from one generation to the next. Any mutation in the sequence information is retained in the organism or in later generations and is critical to the phenotype of the organism. On the other hand, the structures and stabilities of nucleic acids can be altered temporarily and reversibly depending on changes in the physical and chemical factors in cells. As the examples described above show, the crowding environment is one of the causes that alter these factors.
Although the temporary and reversible changes may not seem significant, when their effects on gene expression processes and on chemical reactions such as self-splicing are considered, the structures and stabilities of nucleic acids, including their dynamics in crowding conditions, can impact the activities of organisms and may also trigger subsequent mutation of genomic information.

References

Akabayov B, Akabayov SR, Lee SJ, Wagner G, Richardson CC (2013) Impact of macromolecular crowding on DNA replication. Nat Commun 4:1615
Bacolla A, Wang G, Vasquez KM (2015) New perspectives on DNA and RNA triplexes as effectors of biological activity. PLoS Genet 11:e1005696
Bae S, Kim D, Kim KK, Kim YG, Hohng S (2011) Intrinsic Z-DNA is stabilized by the conformational selection mechanism of Z-DNA-binding proteins. J Am Chem Soc 133:668–671
Biffi G, Tannahill D, McCafferty J, Balasubramanian S (2013) Quantitative visualization of DNA G-quadruplex structures in human cells. Nat Chem 5:182–186
Bryan TM (2019) Mechanisms of DNA replication and repair: insights from the study of G-quadruplexes. Molecules 24
Bugaut A, Balasubramanian S (2012) 5'-UTR RNA G-quadruplexes: translation regulation and targeting. Nucleic Acids Res 40:4727–4741
Burge S, Parkinson GN, Hazel P, Todd AK, Neidle S (2006) Quadruplex DNA: sequence, topology and structure. Nucleic Acids Res 34:5402–5415
Buscaglia R, Miller MC, Dean WL, Gray RD, Lane AN, Trent JO, Chaires JB (2013) Polyethylene glycol binding alters human telomere G-quadruplex structure by conformational selection. Nucleic Acids Res 41:7934–7946
Buske FA, Mattick JS, Bailey TL (2011) Potential in vivo roles of nucleic acid triple-helices. RNA Biol 8:427–439
Cai S, Yan J, Xiong H, Liu Y, Peng D, Liu Z (2018) Investigations on the interface of nucleic acid aptamers and binding targets. Analyst 143:5317–5338
Campos M, Surovtsev IV, Kato S, Paintdakhi A, Beltran B, Ebmeier SE, Jacobs-Wagner C (2014) A constant size extension drives bacterial cell size homeostasis. Cell 159:1433–1446
Chalikian TV, Volker J, Srinivasan AR, Olson WK, Breslauer KJ (1999) The hydration of nucleic acid duplexes as assessed by a combination of volumetric and structural techniques. Biopolymers 50:459–471
Desai R, Kilburn D, Lee H-T, Woodson SA (2014) Increased ribozyme activity in crowded solutions. J Biol Chem 289:2972–2977
Dix JA, Verkman AS (2008) Crowding effects on diffusion in solutions and cells. Annu Rev Biophys 37:247–263
Ellington AD, Szostak JW (1990) In vitro selection of RNA molecules that bind specific ligands. Nature 346:818–822
Endoh T, Sugimoto N (2017) Conformational dynamics of mRNA in gene expression as new pharmaceutical target. Chem Rec (New York) 17:817–832
Endoh T, Sugimoto N (2019) Conformational dynamics of the RNA G-quadruplex and its effect on translation efficiency. Molecules 24:1613
Endoh T, Kawasaki Y, Sugimoto N (2013) Stability of RNA quadruplex in open reading frame determines proteolysis of human estrogen receptor α. Nucleic Acids Res 41:6222–6231
Endoh T, Rode AB, Takahashi S, Kataoka Y, Kuwahara M, Sugimoto N (2016) Real-time monitoring of G-quadruplex formation during transcription. Anal Chem 88:1984–1989
Gellert M, Lipsett MN, Davies DR (1962) Helix formation by guanylic acid. Proc Natl Acad Sci U S A 48:2013–2018
Ghosh S, Takahashi S, Ohyama T, Endoh T, Tateishi-Karimata H, Sugimoto N (2020) Nearest-neighbor parameters for predicting DNA duplex stability in diverse molecular crowding conditions. Proc Natl Acad Sci 117:14194–14201
Gilbert DM (2002) Replication timing and transcriptional control: beyond cause and effect. Curr Opin Cell Biol 14:377–383
Heddi B, Phan AT (2011) Structure of human telomeric DNA in crowded solution. J Am Chem Soc 133:9824–9833
Jana J, Mohr S, Vianney YM, Weisz K (2021) Structural motifs and intramolecular interactions in non-canonical G-quadruplexes. RSC Chem Biol 2:338–353
Ji J, Hogan ME, Gao X (1996) Solution structure of an antiparallel purine motif triplex containing a T.CG pyrimidine base triple. Structure 4:425–435
Kim D, Lee YH, Hwang HY, Kim KK, Park HJ (2010) Z-DNA binding proteins as targets for structure-based virtual screening. Curr Drug Targets 11:335–344
Kozak M (1986) Influences of mRNA secondary structure on initiation by eukaryotic ribosomes. Proc Natl Acad Sci 83:2850–2854
Kruger K, Grabowski PJ, Zaug AJ, Sands J, Gottschling DE, Cech TR (1982) Self-splicing RNA: autoexcision and autocyclization of the ribosomal RNA intervening sequence of Tetrahymena. Cell 31:147–157

Kumar V, Endoh T, Murakami K, Sugimoto N (2012) Dehydration from conserved stem regions is fundamental for ligand-dependent conformational transition of the adenine-specific riboswitch. Chem Commun (Camb) 48:9684–9686
Lambert D, Leipply D, Draper DE (2010) The osmolyte TMAO stabilizes native RNA tertiary structures in the absence of Mg2+: evidence for a large barrier to folding from phosphate dehydration. J Mol Biol 404:138–157
Lanctot C, Cheutin T, Cremer M, Cavalli G, Cremer T (2007) Dynamic genome architecture in the nuclear space: regulation of gene expression in three dimensions. Nat Rev Genet 8:104–115
Leamy KA, Yennawar NH, Bevilacqua PC (2017) Cooperative RNA folding under cellular conditions arises from both tertiary structure stabilization and secondary structure destabilization. Biochemistry 56:3422–3433
Lee C-Y, McNerney C, Ma K, Zhao W, Wang A, Myong S (2020) R-loop induced G-quadruplex in non-template promotes transcription by successive R-loop formation. Nat Commun 11:3392
Leontis NB, Kwok W, Newman JS (1991) Stability and structure of three-way DNA junctions containing unpaired nucleotides. Nucleic Acids Res 19:759–766
Li L, Li C, Zhang Z, Alexov E (2013) On the dielectric “constant” of proteins: smooth dielectric function for macromolecular modeling and its implementation in DelPhi. J Chem Theory Comput 9:2126–2136
Mahen EM, Watson PY, Cottrell JW, Fedor MJ (2010) mRNA secondary structures fold sequentially but exchange rapidly in vivo. PLoS Biol 8:e1000307
McMurray CT (2010) Mechanisms of trinucleotide repeat instability during human development. Nat Rev Genet 11:786–799
Minton AP (2001) The influence of macromolecular crowding and macromolecular confinement on biochemical reactions in physiological media. J Biol Chem 276:10577–10580
Mishra SK, Tawani A, Mishra A, Kumar A (2016) G4IPDB: a database for G-quadruplex structure forming nucleic acid interacting proteins. Sci Rep 6:38144
Miyoshi D, Karimata H, Sugimoto N (2006) Hydration regulates thermodynamics of G-quadruplex formation under molecular crowding conditions. J Am Chem Soc 128:7957–7963
Miyoshi D, Nakamura K, Tateishi-Karimata H, Ohmichi T, Sugimoto N (2009) Hydration of Watson-Crick base pairs and dehydration of Hoogsteen base pairs inducing structural polymorphism under molecular crowding conditions. J Am Chem Soc 131:3522–3531
Muhuri S, Mimura K, Miyoshi D, Sugimoto N (2009) Stabilization of three-way junctions of DNA under molecular crowding conditions. J Am Chem Soc 131:9268–9280
Nakano S-I, Karimata HT, Kitagawa Y, Sugimoto N (2009) Facilitation of RNA enzyme activity in the molecular crowding media of cosolutes. J Am Chem Soc 131:16881–16888
Nakano S-I, Kitagawa Y, Miyoshi D, Sugimoto N (2014a) Hammerhead ribozyme activity and oligonucleotide duplex stability in mixed solutions of water and organic compounds. FEBS Open Bio 4:643–650
Nakano S, Miyoshi D, Sugimoto N (2014b) Effects of molecular crowding on the structures, interactions, and functions of nucleic acids. Chem Rev 114:2733–2758
Patel A, Malinovska L, Saha S, Wang J, Alberti S, Krishnan Y, Hyman AA (2017) ATP as a biological hydrotrope. Science 356:753–756
Pramanik S, Nagatoishi S, Sugimoto N (2012) DNA tetraplex structure formation from human telomeric repeat motif (TTAGGG):(CCCTAA) in nanocavity water pools of reverse micelles. Chem Commun (Camb) 48:4815–4817
Predeus AV, Gul S, Gopal SM, Feig M (2012) Conformational sampling of peptides in the presence of protein crowders from AA/CG-multiscale simulations. J Phys Chem B 116:8610–8620
Preisler RS, Chen HH, Colombo MF, Choe Y, Short Jr BJ, Rau DC (1995) The B form to Z form transition of poly(dG-m5dC) is sensitive to neutral solutes through an osmotic stress. Biochemistry 34:14400–14407
Puglisi JD, Tinoco I (1989) Absorbance melting curves of RNA. In: Methods in enzymology. Academic Press, pp 304–325

Rajendran A, Nakano S, Sugimoto N (2010) Molecular crowding of the cosolutes induces an intramolecular i-motif structure of triplet repeat DNA oligomers at neutral pH. Chem Commun (Camb) 46:1299–1301
Rode AB, Endoh T, Sugimoto N (2018) Crowding shifts the FMN recognition mechanism of riboswitch aptamer from conformational selection to induced fit. Angew Chem Int Ed 57:6868–6872
Rothemund PW (2006) Folding DNA to create nanoscale shapes and patterns. Nature 440:297–302
Selvam S, Mandal S, Mao H (2017) Quantification of chemical and mechanical effects on the formation of the G-quadruplex and i-motif in duplex DNA. Biochemistry 56:4616–4625
Serganov A, Yuan YR, Pikovskaya O, Polonskaia A, Malinina L, Phan AT, Hobartner C, Micura R, Breaker RR, Patel DJ (2004) Structural basis for discriminative regulation of gene expression by adenine- and guanine-sensing mRNAs. Chem Biol 11:1729–1741
Serganov A, Huang L, Patel DJ (2009) Coenzyme recognition and gene regulation by a flavin mononucleotide riboswitch. Nature 458:233–237
Sharma AK, O'Brien EP (2018) Non-equilibrium coupling of protein structure and function to translation-elongation kinetics. Curr Opin Struct Biol 49:94–103
Shin SI, Ham S, Park J, Seo SH, Lim CH, Jeon H, Huh J, Roh TY (2016) Z-DNA-forming sites identified by ChIP-Seq are associated with actively transcribed regions in the human genome. DNA Res 23:477–486
Singleton CK, Klysik J, Stirdivant SM, Wells RD (1982) Left-handed Z-DNA is induced by supercoiling in physiological ionic conditions. Nature 299:312–316
Spiegel J, Adhikari S, Balasubramanian S (2020) The structure and function of DNA G-quadruplexes. Trends Chem 2:123–136
Stryer L (1965) The interaction of a naphthalene dye with apomyoglobin and apohemoglobin. A fluorescent probe of non-polar binding sites. J Mol Biol 13:482–495
Stühmeier F, Welch JB, Murchie AI, Lilley DM, Clegg RM (1997) Global structure of three-way DNA junctions with and without additional unpaired bases: a fluorescence resonance energy transfer analysis. Biochemistry 36:13530–13538
Sugimoto N (2014) Noncanonical structures and their thermodynamics of DNA and RNA under molecular crowding: beyond the Watson-Crick double helix. Int Rev Cell Mol Biol 307:205–273
Sugimoto N, Kierzek R, Turner DH (1987) Sequence dependence for the energetics of terminal mismatches in ribooligonucleotides. Biochemistry 26:4559–4562
Sugimoto N, Endoh T, Takahashi S, Tateishi-Karimata H (2021) Chemical biology of double helical and non-double helical nucleic acids: “to B or not to B, that is the question”. Bull Chem Soc Jpn 94:1970–1998
Sun D, Hurley LH (2009) The importance of negative superhelicity in inducing the formation of G-quadruplex and i-motif structures in the c-Myc promoter: implications for drug targeting and control of gene expression. J Med Chem 52:2863–2874
Suseela YV, Narayanaswamy N, Pratihar S, Govindaraju T (2018) Far-red fluorescent probes for canonical and non-canonical nucleic acid structures: current progress and future implications. Chem Soc Rev 47:1098–1131
Takahashi S, Sugimoto N (2013) Effect of pressure on the stability of G-quadruplex DNA: thermodynamics under crowding conditions. Angew Chem Int Ed 52:13774–13778
Takahashi S, Sugimoto N (2021) Watson–Crick versus Hoogsteen base pairs: chemical strategy to encode and express genetic information in life. Acc Chem Res 54:2110–2120
Takahashi S, Kotar A, Tateishi-Karimata H, Bhowmik S, Wang Z-F, Chang T-C, Sato S, Takenaka S, Plavec J, Sugimoto N (2021) Chemical modulation of DNA replication along G-quadruplex based on topology-dependent ligand binding. J Am Chem Soc 143:16458–16469
Tateishi-Karimata H, Sugimoto N (2012) A-T base pairs are more stable than G-C base pairs in a hydrated ionic liquid. Angew Chem Int Ed 51:1416–1419

Tateishi-Karimata H, Isono N, Sugimoto N (2014) New insights into transcription fidelity: thermal stability of non-canonical structures in template DNA regulates transcriptional arrest, pause, and slippage. PLoS One 9:e90580
Tateishi-Karimata H, Nakano M, Pramanik S, Tanaka S, Sugimoto N (2015) i-Motifs are more stable than G-quadruplexes in a hydrated ionic liquid. Chem Commun (Camb) 51:6909–6912
Teng Y, Tateishi-Karimata H, Sugimoto N (2020) RNA G-quadruplexes facilitate RNA accumulation in G-rich repeat expansions. Biochemistry 59:1972–1980
Trajkovski M, Endoh T, Tateishi-Karimata H, Ohyama T, Tanaka S, Plavec J, Sugimoto N (2018) Pursuing origins of (poly)ethylene glycol-induced G-quadruplex structural modulations. Nucleic Acids Res 46:4301–4315
Tuerk C, Gold L (1990) Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science 249:505–510
Verdolino V, Cammi R, Munk BH, Schlegel HB (2008) Calculation of pKa values of nucleobases and the guanine oxidation products guanidinohydantoin and spiroiminodihydantoin using density functional theory and a polarizable continuum model. J Phys Chem B 112:16860–16873
Verkman AS (2002) Solute and macromolecule diffusion in cellular aqueous compartments. Trends Biochem Sci 27:27–33
Winkler W, Nahvi A, Breaker RR (2002) Thiamine derivatives bind messenger RNAs directly to regulate bacterial gene expression. Nature 419:952–956
Wölfl S, Wittig B, Rich A (1995) Identification of transcriptionally induced Z-DNA segments in the human c-myc gene. Biochim Biophys Acta 1264:294–302
Woolley P, Wills PR (1985) Excluded-volume effect of inert macromolecules on the melting of nucleic acids. Biophys Chem 22:89–94
Yakovchuk P, Protozanova E, Frank-Kamenetskii MD (2006) Base-stacking and base-pairing contributions into thermal stability of the DNA double helix. Nucleic Acids Res 34:564–574
Yancey PH (2015) Water stress, osmolytes and proteins. Am Zool 41:699–709
Zeraati M, Langley DB, Schofield P, Moye AL, Rouet R, Hughes WE, Bryan TM, Dinger ME, Christ D (2018) I-motif DNA structures are formed in the nuclei of human cells. Nat Chem 10:631–637
Zheng KW, Xiao S, Liu JQ, Zhang JY, Hao YH, Tan Z (2013) Co-transcriptional formation of DNA:RNA hybrid G-quadruplex and potential function as constitutional cis element for transcription control. Nucleic Acids Res 41:5533–5541

Structure-Guided Optimization of siRNA and Anti-miRNA Properties

36

Kevin M. Pham and Peter A. Beal

Contents
Introduction
RNAi and MicroRNA Pathways
Structures of Argonaute Protein Domains and Argonaute-RNA Complexes
siRNA Modifications Whose Design Was Inspired by Ago2-RNA Complexes
siRNA and Anti-miRNA Modifications from Computational Screening
Conclusions
References

Abstract

Chemical modifications to therapeutic nucleic acids are used to modulate properties such as nuclease resistance, target engagement, enzymatic activity, and cell uptake. Therapeutic nucleic acids have different mechanisms of action and often engage nucleic acid binding proteins to elicit their effects. In some cases, high-resolution structures of important effector protein-RNA complexes are known and can be used to guide the design of chemical modifications of the therapeutic nucleic acid. This is the case for siRNAs and anti-miRNAs, for which crystal structures of RNA-bound complexes of the human effector protein Argonaute 2 (Ago2) have been reported along with structures of isolated domains of Ago2. In this chapter, examples of the use of these structures in the rational design of chemical modifications intended to improve siRNA or anti-miRNA properties are described.

K. M. Pham · P. A. Beal (*)
University of California Davis, Davis, CA, USA
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2023
N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_41

Introduction

Nucleic acid drugs elicit therapeutic effects in a variety of ways. Approved oligonucleotide drugs include antisense oligonucleotides (ASOs) that direct RNase H-mediated cleavage of target RNAs (Crooke 2017), ASOs that promote exon skipping (Aartsma-Rus et al. 2017), and siRNAs whose guide strand directs Ago2 to cleave target transcripts (Wang et al. 2021). In addition, new classes of oligonucleotide drugs that operate by other mechanisms are being developed, such as antagonists of miRNAs (anti-miRs) (Lima et al. 2018), single guide RNAs (sgRNAs) for CRISPR-Cas-mediated genome editing (Nowak et al. 2016), and ADAR-directed guide strands for directed therapeutic RNA editing (Montiel-Gonzalez et al. 2019). In several of these examples, high-resolution structures have been reported for key effector protein-RNA complexes. These structures provide useful tools for the discovery of new chemical modifications to enhance the properties of oligonucleotide drug candidates. This chapter focuses on examples of the optimization of siRNA and anti-miRNA properties, including cases where high-resolution structures of domains of Ago2 or of Ago2-RNA complexes provided clear inspiration for the design of new analogs. In addition, cases where these structures provided the basis for in silico screening that identified modifications for further testing are described. While it is much more common, and often very illuminating, to use structural information to explain the properties of newly developed oligonucleotide modifications, this “after-the-fact” use of structural data is not the focus of this chapter.

RNAi and MicroRNA Pathways RNA interference (Fig. 1) is a gene regulatory process that uses small interfering RNAs (siRNAs) that base pair with full complementarity to an mRNA target strand and induce mRNA cleavage by the RNAi active component Argonaute2 (Ago2) (Hu et al. 2020). Synthetic siRNAs bearing chemical modifications can be transfected into cells to induce RNAi. The RISC loading complex (RLC), made up of Dicer, TAR RNA binding protein (TRBP), and the RNAse H-like endonuclease Ago2, then binds the siRNA duplex and loads one strand (i.e., the guide strand) into Ago2 and discards the other strand (i.e., the passenger strand) (Wilson and Doudna 2013). Ago2 loaded with guide binds target mRNA that is Watson-Crick complementary to the guide and cleaves it, resulting in inhibition of expression of the gene product (i.e., silencing). Single-stranded siRNAs (ss-siRNAs) have also been used to initiate RNAi (Lima et al. 2012). These oligonucleotides are designed to load directly into Ago2 and function as a guide strand. Because they lack the protective effect of an RNA duplex and bypass the RLC, ss-siRNAs require extensive chemical modification for metabolic stability and efficient Ago2 binding. On the other hand, microRNAs (miRNAs) are transcribed in the nucleus and can undergo processing into mature miRNA duplexes by the Drosha/DGCR8 complex followed by Dicer and TRBP. MiRNA and siRNA duplexes contain 30 -overhangs on both the passenger

36

Structure-Guided Optimization of siRNA and Anti-miRNA Properties

1175

Fig. 1 RNAi pathway by siRNA or miRNA

and guide strands. Mature miRNAs are loaded into Ago2, and the resulting complex binds to target mRNAs with complementarity to the seed region of the miRNA (nucleotides 2–8), resulting in reduced expression of those targets. Because every siRNA guide strand also contains a seed sequence that can be read like that of a miRNA, miRNA-like off-targeting effects can occur and are a particular challenge to effective siRNA-based therapies (Jackson and Linsley 2010).
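The seed-based recognition described above can be illustrated with a minimal Python sketch. It takes the seed as guide nucleotides 2–8, as stated in the text, and scans a transcript for perfectly complementary sites. The function names and both sequences are hypothetical and chosen only for illustration; real off-target prediction considers further features (site context, pairing outside the seed, G:U wobbles) that are ignored here.

```python
# Minimal sketch: locate seed-complementary sites for a guide strand.
# All names and sequences are hypothetical, for illustration only.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the Watson-Crick reverse complement of an RNA sequence (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed(guide: str) -> str:
    """Return guide nucleotides 2-8 (1-based numbering), written 5'->3'."""
    return guide[1:8]


def seed_match_sites(guide: str, transcript: str) -> list[int]:
    """Return 0-based start indices of sites in the transcript that are
    perfectly complementary to the guide seed."""
    site = reverse_complement(seed(guide))
    hits = []
    start = transcript.find(site)
    while start != -1:
        hits.append(start)
        start = transcript.find(site, start + 1)
    return hits


guide = "UCGAAGUAUUCCGCGUACGUU"          # made-up 21-nt guide
transcript = "GGCAUACUUCGAAACCUACUUCGUU"  # made-up transcript fragment
print(seed(guide))                          # seed, nucleotides 2-8
print(seed_match_sites(guide, transcript))  # start positions of seed matches
```

Any transcript that returns one or more hits carries a potential miRNA-like site for this guide, whether or not it is the intended target.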

1176

K. M. Pham and P. A. Beal

Structures of Argonaute Protein Domains and Argonaute-RNA Complexes

Crystal structures of human Ago2 (hAgo2) have aided efforts to design chemical modifications of RNAi-related oligonucleotides for potential therapeutic effects (Schirle and MacRae 2012; Schirle et al. 2014, 2015). Briefly, hAgo2 has a bilobed structure and can cradle a target RNA strand when bound to a loaded guide strand (Fig. 2a, b). The PAZ domain anchors the 3′-end of an siRNA guide or miRNA strand (g20, g21 in Fig. 2c), whereas a phosphate-binding pocket within the MID domain can bind the phosphorylated 5′-end (Fig. 2d) (Lingel et al. 2004; Frank et al. 2010). Indeed, prior to the publication of structures of full-length Ago2 bound to RNA, structures of isolated PAZ and MID domains were reported and were used to inform the design of novel siRNA modifications (Song et al. 2003; Lingel et al. 2004; Ma et al. 2004; Boland et al. 2010; Frank et al. 2010). The PIWI domain of Ago2 contains an RNase H-like active site and is responsible for slicing activity

Fig. 2 The crystal structure of hAgo2 bound to guide and target RNA. (a) Domain map of full-length hAgo2. (b) Crystal structure of full-length hAgo2 with bound guide-target RNA duplex (PDB: 4W5O) (Schirle et al. 2014). The guide RNA strand is in red, and the target strand is in blue. Key interactions of nucleotides with hAgo2 binding pockets are encased in black boxes. (c) Close-up of PAZ domain interactions with the 3′-end guide RNA nucleotide (guide strand nucleotide 21, g21). (d) Close-up of guide strand 5′-phosphate interactions with the MID domain phosphate-binding pocket. An A859 residue (green) from the N domain and an R812 residue (light blue) from the PIWI domain also make contacts within the pocket. g1 = guide strand nucleotide position 1. (e) Close-up of the 3′-end target strand nucleotide adenosine (t1A) with hydrogen bonding interactions toward the highly solvated t1-adenosine binding pocket (Schirle et al. 2015). The side chain from S561 makes a direct hydrogen bonding contact with the exocyclic amine of the t1-adenosine base. The water molecules in the pocket are indicated as cyan-colored spheres. W# = water molecule number 1–4


(Song et al. 2004). Furthermore, a solvated, nucleotide-binding pocket located between the L2 and MID domains was found to have higher binding affinity for a target RNA strand with adenosine at position 1 (t1A) over other canonical and modified bases (Fig. 2e) (Schirle et al. 2015).

siRNA Modifications Whose Design Was Inspired by Ago2-RNA Complexes

Since the first reports of structures of the isolated domains of human Ago2, researchers have used these structures for inspiration in the design of siRNA modifications that might improve performance. Several examples where a structural analysis suggested that specific modifications could have a beneficial effect are described below. RNAs bearing these modifications were then synthesized and tested.

(a) Modifications to influence binding of the guide strand 5′ end into the Ago2 MID domain. Structures of the Ago2 MID domain bound to nucleoside 5′-monophosphates and of full-length Ago2 bound to guide strand illustrated the ability of this domain to interact with a 5′-phosphorylated nucleoside (Fig. 2d). Indeed, the presence of the 5′-phosphate on the guide strand of the small RNA duplex is a key feature for proper strand selection and loading into Ago2 by the RLC (Varley et al. 2020; Varley and Desaulniers 2021). Loading the passenger strand can otherwise lead to off-target effects. Furthermore, when designing synthetic siRNAs, it is important to consider the fate of the 5′-phosphate after introduction into cells because the 5′-phosphate can be removed by phosphatases. Alternatively, if the 5′-end bears a hydroxyl group, it must be recognized and phosphorylated by the kinase Clp1 prior to loading by the RLC. The works presented below illustrate how 5′-phosphate analogs present on the component strands of an siRNA can alter interactions with the 5′-end nucleotide binding pocket in the MID domain.

Investigators from Ionis Pharmaceuticals reported the identification and implementation of 5′-phosphate analogs on ss-siRNAs that were rationally designed using the crystal structure of the 5′-phosphate binding pocket of Ago2 bound to an siRNA guide strand (Prakash et al. 2015). Their goal was to see if altering the stereoelectronics at the 5′-phosphate group could elicit and/or improve RNAi activity.
They designed 5′-phosphate analogs using two approaches: making derivatives at the 5′-carbon position of the 5′-terminal nucleotide, or making metabolically stable methylenephosphonate analogs where the oxygen bound to phosphorus on the phosphate is replaced with carbon (Fig. 3a). Their design was based on their lead PTEN ss-siRNA that was previously characterized by Lima et al. (2012). For the first approach, they designed 5′-phosphate analogs by adding substituents at the 5′-carbon position of the 5′-end nucleotide. They used the crystal structure of hAgo2 bound to guide RNA by Schirle and MacRae (2012) to reason that, at the 5′-phosphate binding pocket in the MID domain, a methyl group as either the R- or S-isomer at the 5′-position should be well tolerated. They synthesized the corresponding (R)- or (S)-5′-methylated, 2′-methoxyethyl (MOE) thymidine phosphoramidites and incorporated them onto the 5′-end of chemically modified


Fig. 3 Chemical modification strategies at the 5′-end of siRNA to target the Ago2 MID domain. (a) 5′-phosphate analogs by Prakash et al. Left: 5′-carbon derivatives on 2′-MOE thymidine. Right: 5′-methylenephosphonate analogs. (b) 5′-methylenephosphonate analogs 5′-CH2-P-I and 5′-CH2-P-II. (c) 5′-deoxy-5′-morpholino-2′-O-methyl uridine (Mo) modification by Kumar et al. (d) 5′-phosphate analogs by Shiohama et al. Left: 5′-O-methylthymidine (X). Right: 5′-amino-2′,5′-dideoxythymidine (Z)

PTEN ss-siRNA (containing phosphorothioate (PS) linkages, 2′-MOE, and 2′-OMe modifications) and transfected them into HeLa cells to determine PTEN mRNA reduction by qRT-PCR. The (R)-5′-methyl ss-siRNA was fivefold more potent than the (S)-5′-methyl ss-siRNA and threefold more potent than parent ss-siRNA. Structural analysis of the R- and S-5′-methylated 5′-end nucleotides modeled into the hAgo2 5′-phosphate binding pocket revealed that the R-5′-methyl isomer can adopt a conformation much like the natural 5′-phosphate moiety and would avoid the unfavorable contacts with the 5′-methylene and oxygen atoms on the phosphodiester linkage from the neighboring nucleotide that are seen with the S-5′-methyl isomer. Further structure-activity relationship studies were carried out at the 5′-position by designing other 5′-alkyl modifications such as sterically bulky 5′-methoxymethyl, hydrophobic 5′-fluoromethyl, 5′-aminomethyl bearing a positive charge, and 5′-carboxylate bearing a negative charge. The 5′-methoxymethyl and 5′-fluoromethyl ss-siRNAs exhibited potencies similar to that of 5′-(R)-methyl ss-siRNA when transfected into HeLa cells, whereas 5′-aminomethyl and 5′-carboxymethyl ss-siRNAs were less potent than unmodified or 5′-(R)-methyl ss-siRNA. The data confirm that changing only the structure at the 5′-position can produce significant differences in potency. For the second modification approach, 5′-methylenephosphonate (5′-CH2-P) analogs were designed for the ss-siRNA. First, ss-siRNAs bearing a 5′-methylenephosphonate-modified 5′-end nucleotide with either 2′-OMe (5′-CH2-P-I) or 2′-MOE (5′-CH2-P-II) were transfected into HeLa cells and were found to be seven- to tenfold less potent than when the


5′-end of ss-siRNA is phosphate. Although less potent than 5′-phosphorylated ss-siRNA, 5′-CH2-P-I and 5′-CH2-P-II (Fig. 3b) had similar potencies, likely because modification at the 2′-position, which does not interact with hAgo2, is well tolerated. On the other hand, the differences in conformation and stereoelectronics in the methylenephosphonate moiety may disrupt the hydrogen bonding interactions with residues Y529, K533, N545, and K566 that normally engage the 5′-phosphate on guide RNA, therefore resulting in lowered activity (Schirle and MacRae 2012). These investigators decided to alter the electronics, spatial disposition, charge density, and steric crowding of 5′-CH2-P by incorporating the following modifications on ss-siRNA: α,α-difluoromethylenephosphonate (5′-CF2-P) and α-fluoromethylenephosphonate (5′-CHF-P) to serve as isosteric and isopolar analogs of phosphate esters; 5′-O-methylenephosphonate (5′-O-CH2-P), where an oxygen is inserted between the 5′-carbon and the methylenephosphonate group, to see if it would perturb positioning of the methylenephosphonate group in the 5′-binding pocket of Ago2; and a bisphosphonate (5′-CHP2) modification to see how increasing the charge density could affect binding interactions with Ago2. They found that mono-fluorinated 5′-CHF-P ss-siRNA was slightly more potent than its non-fluorinated counterpart 5′-CH2-P-II, whereas difluorinated 5′-CF2-P was significantly less potent. They hypothesize that one fluorine at the 5′-position can shift the electronics closer to those of a phosphate, but an extra fluorine added to make an isopolar CF2 group is detrimental. The 5′-O-CH2-P ss-siRNA was significantly less potent than parent ss-siRNA, consistent with their hypothesis that inserting the oxygen shifts the positioning of the phosphonate group in the 5′-binding pocket of hAgo2 and perturbs activity.
Next, they hypothesized that increasing charge density on the methylenephosphonate group by adding a second phosphonate could increase ss-siRNA binding to hAgo2, based on structural analysis of hAgo2 bound to siRNA guide strand that reveals electrostatic and hydrogen bonding interactions with the K533 and K566 residues. However, the bisphosphonate 5′-CHP2 ss-siRNA was also significantly less potent than parent ss-siRNA and the 5′-CH2-P ss-siRNA with one phosphonate group, suggesting that a combination of increased charge density, steric crowding, and positioning affects the activity of ss-siRNA. They also explored whether the previously reported 5′-(E)-vinylphosphonate [(E)-5′-VP] and the 5′-(Z)-vinylphosphonate [(Z)-5′-VP] analogs could rigidify the conformation of the 5′-methylenephosphonate moiety and improve interaction with hAgo2 by better mimicking the 5′-phosphate. The trans (E)-5′-VP analog, unlike cis (Z)-5′-VP, was found to adopt a conformation similar to that of the 5′-phosphate in the hAgo2 crystal structure. Indeed, (E)-5′-VP-modified ss-siRNA was much more potent than the (Z)-5′-VP isomer when each was compared to parent ss-siRNA. Similar potencies were also found when the 5′-vinylphosphonate was fluorinated on either isomer to check for isopolar dependency effects. In conclusion, Prakash et al. established the complexity of designing stereoelectronic-altering analogs of the 5′-phosphate group on the 5′-end of ss-siRNAs to promote gene silencing activity.


Kumar and colleagues designed modifications in place of the 5′-phosphate at the 5′-end of either strand of siRNA to disrupt interactions with the 5′-phosphate binding pocket of the hAgo2 MID domain and thus study the impact on strand choice during Ago2 loading and on RNAi activity (Kumar et al. 2019). One of the modifications was 5′-deoxy-5′-morpholino-2′-O-methyl uridine (Mo) (Fig. 3c), which was modeled into the template MID domain in complex with UMP (PDB ID: 3LUJ) using UCSF Chimera (Frank et al. 2010). After refinement, they found that the morpholino modification fits well into the MID domain, as indicated by only slight rearrangements of the adjacent amino acid residues. Importantly, all interactions that were originally between the 5′-phosphate group of UMP and the binding pocket were lost except for one hydrogen bond between Lys570 and the oxygen on the morpholino ring. They proposed that this hydrogen bonding interaction should be negligible as the morpholino group is positively charged; thus the oxygen would not serve as a strong acceptor. Overall, the morpholino group would experience repulsive interactions with Lys570 and Lys533. Indeed, the modeling analysis supports their in vivo findings, in which siRNA-GalNAc conjugates bearing the 5′-morpholino modification on the passenger strand had higher ApoB gene silencing activity in mouse liver than parent siRNA for as long as 15 days, suggesting that loading of the guide strand into Ago2 was favored. Furthermore, no Ago2 loading of 5′-morpholino-modified guide strand was observed by Ago2 immunoprecipitation from mouse liver lysates.

Shiohama et al. prepared siRNAs bearing 5′-O-methylthymidine (X) or 5′-amino-2′,5′-dideoxythymidine (Z) (Fig. 3d) at the 5′-end of the passenger and/or guide strands to see how strand selection into hAgo2 and the resulting RNAi activity could be altered (Shiohama et al. 2022).
They hypothesized that modifying the 5′-end of the guide strand with X or Z would destabilize the complex with Ago2 and decrease RNAi efficiency. The cationic charge from the protonated amine of Z would, in particular, destabilize complex formation as the charge would repel cationic residues within the Ago2 MID binding pocket. Conversely, modifying the passenger strand with these substituents could reduce unwanted loading of the passenger strand into hAgo2, therefore lowering off-targeting effects arising from passenger strand-mediated knockdown. To test their hypothesis, they designed siRNAs that targeted EGFP mRNA in HeLa cells and evaluated RNAi activity by RT-qPCR. The siRNAs were made with combinations of modifications at the 5′-end of the passenger strand, the guide strand, or both. Most notably, at 1, 10, and 100 nM, siRNAs with X or Z at the 5′-end of the guide strand and canonical U or T at the 5′-end of the passenger strand (sU/asX, sU/asZ, sT/asX, or sT/asZ) did not exert any RNAi activity, suggesting that the 5′-end modification of the guide strand was destabilizing and that the RLC would prefer to load the unmodified passenger strand into hAgo2. Interestingly, when both strands of the siRNA were modified with X, Z, or a combination of both, gene silencing was moderately reduced, indicating that probability plays a role in which strand is ultimately selected for loading into hAgo2. Finally, as expected, modifying only the 5′-end of the passenger strand with X or Z resulted in the greatest degree of RNAi activity compared to the other designs. Overall, modulating the


Ago2-loading mechanism by modifying the 5′-end of the passenger strand with a substituent that destabilizes interactions with the 5′-phosphate-binding MID domain was shown to be an additional strategy for reducing off-targeting effects by siRNA.

(b) Modifications to influence binding of the guide strand 3′ end into the Ago2 PAZ domain. PAZ domains, along with PIWI domains, are characteristic of the Argonaute family and bind RNA 3′ ends. It has been suggested that siRNA-like recognition and target strand cleavage, involving duplex formation beyond the seed region, requires release of the 3′ end from the PAZ domain and re-binding during catalytic turnover (Deerberg et al. 2013). Therefore, modulating binding affinity to the Ago2 PAZ domain by guide strand 3′ end modification could significantly impact RNAi activity. Xu and colleagues at the Institute of Medicinal Biotechnology in Beijing, China, designed aromatic modifications on the 3′-overhang of siRNA with the intent of enhancing interactions with the PAZ domain and affecting gene silencing potency (Xu et al. 2015). They used the molecular modeling suite Sybyl to analyze the structure of the Ago1 PAZ domain with bound 9-mer siRNA-like duplex (PDB 1SI2) (Ma et al. 2004) and determine which interactions between the Ago2 PAZ domain and the two 3′-overhang nucleotides in siRNA could be enhanced by chemical modification. In their analysis, they found that hydrophilic residues His269, Tyr312, and Tyr309 can interact with the phosphate backbone at the 3′-terminus. The terminal nucleotide is buried in the pocket and interacts with hydrophobic residues Leu294, Phe292, Leu337, Pro293, and Phe310. The penultimate nucleotide, however, sits at the edge of the pocket and is surrounded by mostly neutral residues. Using this information, they decided to pursue a modification strategy that takes advantage of the hydrophilic and hydrophobic interactions in the 3′-overhang binding pocket of PAZ.
The pentose ring was replaced with a straight chain that retains hydrogen-bonding interactions with Tyr312 and Tyr309, whereas the thymine base was replaced with a benzene ring to enhance interactions with Phe292, Leu337, and Phe310. This straight chain and benzene modification (Fig. 4) was applied to either the 3′-terminal nucleotide (XdT), the penultimate nucleotide (dTX), or both nucleotides (XX) and compared in binding and activity to unmodified siRNA with two 2′-deoxythymidine nucleotides at the 3′-overhang (dTdT).

Fig. 4 Modification at the 3′-terminal dinucleotide overhang of siRNA by Xu et al.


They next superimposed the dTdT, XdT, dTX, and XX siRNA strands on the RNA-PAZ structure, which revealed only slight overall conformational changes when compared to the original RNA duplex in the crystal structure. At the site of modification, however, the conformational changes varied depending on the position modified. The XdT strand resulted in a large structural distortion, whereas dTX and XX had a conformation consistent with dTdT. Embrace Minimization in MacroModel was used to calculate the binding energies with the 3′-overhang modified siRNA as the ligand and the PAZ domain (PDB 1SI2) as the receptor. The mean total binding energies (a combination of van der Waals, electrostatic, and solvation terms) for the modifications, from highest to lowest, were ranked as follows: dTdT > XdT > dTX ≈ XX. Because a lower binding energy corresponds to more favorable binding, this ranking suggested that the dTX or XX modifications would have higher binding affinity to Ago2 than dTdT and XdT. Indeed, their surface plasmon resonance (SPR) analyses of modified siRNA bound to purified Ago2 protein showed that the calculated binding energies correlated with the measured response unit (RU) values. They next determined the silencing efficiency of the modified siRNAs by a luciferase reporter assay. Using HepG2 cells that stably express luciferase, they found that all siRNAs containing the chemical modifications exhibited higher potency than the unmodified dTdT siRNA. Interestingly, the dTX modification at the 3′-overhang of either the guide or passenger strand of the siRNA resulted in the greatest amount of luciferase inhibition, regardless of the modification pattern on the 3′-overhang nucleotides of the respective complementary strand. Furthermore, the XdT modification on the 3′-overhang of the passenger strand in combination with dTX on the guide strand resulted in the highest potency and in preferential binding of the guide strand over the passenger strand to Ago2.
It remains unclear why the XX modification at the 3′-end did not result in increased potency compared to unmodified siRNA, even though the in silico and in vitro binding assays of XX-modified siRNA to Ago2 were determined to be the most favorable compared to the other modifications. They hypothesize that the aromatic and hydrophobic structure at the first nucleotide could negatively affect the conformation of the PAZ domain, thus lowering the binding and potency of the siRNA. This hypothesis is also consistent with the binding preference for the dTX modification over XdT. Lastly, they found that the XdT/dTX- (passenger/guide-) modified siRNA was the most effective at minimizing expression of Mdm2, a negative regulator of the p53 tumor suppressor, in MCF-7 breast cancer cells, as well as MOV10, a lentivirus replication inhibitor, in HEK293 cells, compared to unmodified dTdT/dTdT and dTX/XdT-modified siRNA. MCF-7 growth was inhibited, and HEK293 cytotoxicity was minimized to the greatest extent upon transfection of the XdT/dTX-modified siRNA as well.

siRNA and Anti-miRNA Modifications from Computational Screening

The PAZ domain, MID domain, and t1A binding site of Ago2 each contain a binding pocket that can be occupied by distinct nucleotides of Ago-RNA complexes. The similarity between these nucleotide binding sites on Ago2 and small molecule


binding sites in other proteins suggested that computational techniques used to optimize small molecule ligands might be redeployed for the purpose of optimizing nucleotide modifications for Ago2-binding RNAs (Onizuka et al. 2013; Greenidge et al. 2019; Shinohara et al. 2021). In this section, examples where automated docking protocols were used to carry out in silico screening of possible siRNA modifications prior to synthesis and testing are described.

(a) Targeting the MID domain using computational screening. Onizuka and colleagues reported a computational screening approach to rationally design nucleotide modifications at the 5′-end of the siRNA guide strand that target the 5′-binding pocket of the MID domain in hAgo2 (Onizuka et al. 2013). Using copper(I)-catalyzed azide/alkyne cycloaddition (CuAAC) reactions, 1,2,3-triazol-4-yl bases bearing a variety of substituents can be formed at the 5′-nucleotide from azides and 1-ethynylribose (1-ER), and such bases were screened and tested. The OpenEye suite of molecular docking programs was used to dock a library of synthetically accessible molecules into a rigid receptor generated from the hAgo2-RNA crystal structure of Schirle and MacRae (2012). The resulting docking scores, based on non-covalent interactions such as hydrogen bonding, sterics, and π-stacking, were used to measure the quality of docked ligand fit into the binding pocket, in addition to analyses of docked poses overlaid on the 5′-terminal adenosine from the hAgo2-RNA structure. To start, purine analogs generated in previous work were docked, including 7-ethynyl-8-aza-7-deazaadenosine (7-EAA), 57 triazoles derived from 7-EAA including a triazole bearing an N-ethyl-piperidine functional group (7-EAA triazole), and an N-ethyl-piperidine-derived triazole from 2-propargylaminopurine (2-AP-triazole).
The 7-EAA analog did not score well relative to adenosine; the lowest-energy pose had the ethyne group clashing with the aromatic ring of Tyr815 in the neighboring PIWI domain. Additionally, the 57 triazoles generally did not score well and could not reasonably fit into the binding site receptor. The 2-AP-triazole ligand also did not dock well into the receptor. Overall, the computational screen suggested that these purine analogs would likely hinder gene silencing activity if placed at the 5′-nucleotide of the guide strand. Indeed, 7-EAA, 7-EAA triazole, and 2-AP triazole exhibited much lower potency than adenosine in Renilla/Firefly dual luciferase assays when placed at the 5′-end of a PIK3CB siRNA guide strand. Next, 1,2,3-triazol-4-yl bases prepared from 1-ethynylribose (1-ER) were generated to see if a reduction in base size could restore, or even improve, RNAi activity. It was hypothesized that, because the 5′-end nucleotide on the siRNA guide strand does not participate in Watson-Crick face hydrogen bonding with the target RNA strand, replacing the adenosine base entirely while still maintaining stacking interactions with the Tyr529 residue of Ago2 would not be too detrimental. Thus, 58 different 1,2,3-triazol-4-yl bases in addition to the 1-ER analog were docked. Interestingly, these ligands scored well and significantly better than the adenosine analogs described above. After synthesizing the 1-ER phosphoramidite, CuAAC reactions were performed to generate two triazole products from azides. 1-ER triazole I (Fig. 5a), which


Fig. 5 Computationally screened siRNA 5′-end modifications. (a) Structure of 1-ethynylribose (1-ER) triazole I modification. (b) Crystal structure of 1-ER triazole I substituent overlaying 5′-end nucleotide uridine (g1) in the MID domain 5′-end nucleotide-binding pocket (Suter et al. 2016). (c) Structure of 6-(3-(2-carboxyethyl)phenyl)purine (6-mCEPh-purine) (Shinohara et al. 2021). (d) Structure of 8-bromo-adenosine-monophosphate (8-Br AMP)

contains a phenyl-imidazole substituent on the triazole, and 1-ER triazole II, which contains an N-ethyl-piperidine substituent, both scored highly relative to adenosine. When transfected into HeLa cells, the 1-ER modification at guide strand position 1 was well tolerated and gave potencies similar to adenosine. However, 1-ER triazole I and II both gave higher potencies than adenosine at 0.1 nM. These experimental results agree with the computational screening analysis and are distinctly opposite to what was found for the purine analogs. In conclusion, the 1-ER-derived triazole modifications can adequately fit into a cleft that lies near the 5′-binding pocket of hAgo2, between MID domain residues K525 and P526 and PIWI domain residues Y815 and L817 (Fig. 5b). To further characterize the Ago2 binding of these novel 1-ER-derived triazoles, Suter and colleagues collaborated with Ian MacRae’s group at The Scripps Research Institute and solved the crystal structure of hAgo2 with bound siRNA guide strand


bearing a 1-ER-derived triazole modification at the g1 (5′-end) nucleotide (Fig. 5b) (Suter et al. 2016). A 22 nt guide RNA with a sequence equivalent to human miR-122 was modified with the 1-ER triazole I modification at g1, loaded into recombinant hAgo2, purified with a complementary 2′-O-methyl capture oligonucleotide, and crystallized based on methods previously reported by Schirle et al. (2014). The protein atoms from the original hAgo2 structure (PDB 4OLA) were used as an initial model, and the guide RNA containing the g1 1-ER triazole I modification was then added during iterative rounds of model building and refinement. The 2.3 Å resolution crystal structure showed that the g1 1-ER triazole modification can interact with the MID domain binding pocket and extend towards the hAgo2 central cleft, where the guide RNA strand pairs with a complementary target RNA. Further examination of this structure revealed that the 5′-phosphate and ribose of the g1 triazolyl nucleotide occupy the same position as the unmodified g1 nucleotide in the original hAgo2 complex (Fig. 5b). The triazole ring also stacks against the phenyl ring of Tyr529 in a manner equivalent to the unmodified g1 nucleobase. The triazole ring does not, however, make direct contacts with the nucleotide selectivity loop, corresponding to amino acids 523–527 in the hAgo2 sequence (Frank et al. 2010). The imidazole and phenyl rings of the analog are stacked between Tyr815 and the aliphatic region of Lys525 and extend out towards the hAgo2 central cleft in the direction of the paired guide-target RNA duplex. As a result, a steric clash between the imidazole and phenyl groups and the sugar-phosphate backbone of the target RNA between t7 and t8 is revealed when the crystal structure is superimposed on the hAgo2-miRNA recognition complex, in which only the seed region of the guide strand is complementary to the target RNA strand.
This analysis suggested that the modification may be able to disrupt pairing of seed-only, miRNA-like duplexes in hAgo2 but not fully complementary duplex pairing, as the central cleft must open substantially to allow a fully complementary target RNA to bind to a guide RNA (Schirle et al. 2014). In other words, the 1-ER triazole I modification may be better accommodated in an siRNA-like recognition complex than in a miRNA-like complex. Indeed, equilibrium binding assays confirmed that a target RNA with full complementarity (nucleotides 2–21) bound the guide RNA bearing the 1-ER triazole I modification with higher affinity than it bound the unmodified guide RNA strand, whereas a miRNA-like target RNA with complementarity to nucleotides 2–8 bound the modified guide RNA with 2.5-fold lower affinity than the unmodified guide. Additionally, in cell-based RNAi experiments, an siRNA targeting human PIK3CB mRNA with a 1-ER triazole I modification at g1 not only had a twofold lower IC50 (higher potency) than unmodified siRNA but also had increased IC50 values for YY1 and FADD “off-target” reporter sequences that have seed complementarity to the PIK3CB siRNA guide strand, demonstrating that the modification reduced miRNA-like off-targeting and increased on-target knockdown activity. Lastly, replacing the phenyl group of the triazole modification with a methyl greatly reduced off-targeting potency and only modestly improved on-target potency, indicating that the phenyl ring interactions with the aliphatic portion of the Lys525 residue are important for siRNA efficiency. In conclusion, this analog was capable of modulating siRNA target selectivity by projecting substituent groups towards the hAgo2 central cleft.
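The distinction between a fully complementary (nucleotides 2–21) target and a seed-only (nucleotides 2–8) target can be made concrete with a short Python sketch. It counts how far contiguous Watson-Crick pairing extends from guide position 2 and labels the site accordingly. The guide sequence, the function names, and the target-building helper are hypothetical; the thresholds 8 and 21 are taken from the pairing ranges quoted above, and G:U wobble pairs and bulges are deliberately ignored.

```python
# Minimal sketch: classify a target site as siRNA-like or miRNA-like from the
# extent of contiguous pairing starting at guide position 2.
# Sequences and names are hypothetical; wobble pairs and bulges are ignored.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def make_target(guide: str, through: int) -> str:
    """Build a target site (5'->3') complementary to g2..g<through>.
    t1 is set to A (the t1A preference noted in the chapter); positions
    beyond <through> reuse the guide base, which never forms a
    Watson-Crick pair with itself."""
    t = ["A"]  # t1, opposite g1
    for pos in range(2, len(guide) + 1):
        g = guide[pos - 1]
        t.append(COMPLEMENT[g] if pos <= through else g)
    return "".join(reversed(t))  # t1 ends up at the 3' end of the site


def paired_through(guide: str, target_site: str) -> int:
    """Return the last guide position reached by contiguous Watson-Crick
    pairing that starts at g2 (returns 1 if g2 itself is unpaired)."""
    t = target_site[::-1]  # t[0] is t1, t[1] is t2, ...
    last = 1
    for pos in range(2, min(len(guide), len(t)) + 1):
        if COMPLEMENT[guide[pos - 1]] == t[pos - 1]:
            last = pos
        else:
            break
    return last


def classify(guide: str, target_site: str) -> str:
    n = paired_through(guide, target_site)
    if n >= 21:
        return "siRNA-like"
    if n >= 8:
        return "miRNA-like"
    return "no full seed match"


guide = "UCGAAGUAUUCCGCGUACGUUG"  # made-up 22-nt guide
for through in (21, 8, 5):
    print(through, classify(guide, make_target(guide, through)))
```

In this picture, a g1 modification that clashes with the target backbone near t7–t8 would be expected to penalize the "miRNA-like" class more than the "siRNA-like" class, which is the selectivity the binding assays above report.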


Modifying guide position 12, a position where electron density was lacking in the hAgo2-RNA crystal structures at the time and lies in the middle of the siRNA strand that pairs with the passenger strand, with 1-ER and 1-ER triazole I both resulted in a drastic reduction in potency when compared to adenosine. Intriguingly, 1-ER triazole II had similar potency to adenosine, which could be explained by a stabilizing effect on RNA-RNA interactions from the positive charge on piperidine. Yet, it was found that, through thermal melting temperature (Tm) studies, the 1-ER triazole II modification actually destabilized the siRNA duplex. Regardless, guide strand position-dependent effects by 1-ER modifications were observed. Then, modifications at the passenger strand position 19 were generated to see if the potency differences from guide 50 -end modified nucleotides were simply because of increased duplex unwinding and therefore enhanced loading of guide strand to hAgo2. As expected, modifying the passenger 19 position did not result in an apparent change in activity compared to unmodified PIK3CB siRNA, indicating the likelihood that the improved potency by 1-ER-triazole modifications at the 50 -end is not due to duplex destabilization. Finally, serum stability assays indicated that the 1-ER-modified guide strands are not more nuclease resistant than unmodified siRNA control, therefore further suggesting that the improved RNAi activity is based on enhanced interactions to the 50 -binding pocket of hAgo2. This example demonstrated the benefits of chemically modifying the 50 -end of siRNA guide strands based on computational screening using hAgo2-RNA structures as the starting point. Similar to the work by Onizuka et al., Shinohara et al. described an in silico modeling analysis of the hAgo2 MID domain bound with nucleoside monophosphate to then screen and test 6-(3-(2-carboxyethyl)phenyl)purine (6-mCEPhpurine) (Fig. 
5c) that can significantly improve mRNA target knockdown both in vitro and in vivo (Shinohara et al. 2021). They started by taking the crystal structure of AMP-bound MID domain (PDB 3LUD) (Frank et al. 2010) and performing a computational analysis using Molecular Operating Environment to identify purine analogs that could bind to MID domain with higher affinity than natural nucleotides. In the structure, they discovered that spaces surrounding N6 and C8 of adenine could be potential sites for occupation via chemical modifications applied onto the adenine nucleobase. An 8-bromo-substituted AMP (8-Br AMP) (Fig. 5d) was selected as a starting point as halogenated substituents at the C8 position were found to be favorable. The crystal structure of 8-Br AMP bound to MID domain was then solved to determine the exact binding mode of 8-Br AMP to MID. Using the AMP-MID complex structure as a reference, they found that 8-Br AMP and AMP had nearly identical binding modes with MID. Both nucleobases from AMP and 8-Br AMP gave stacking interactions with Tyr529, an important residue within the MID 5′-binding pocket for recognition of a 5′-nucleotide. One important difference, however, was that the bromine from 8-Br AMP was able to contribute an additional hydrophobic interaction with Tyr529. Next, they added a carboxy-ethyl-phenyl group onto the N6 position of adenine that could occupy the space above N6 from the natural adenine base and provide

36

Structure-Guided Optimization of siRNA and Anti-miRNA Properties


both hydrophobic and hydrogen bonding interactions around the phenyl and carboxylic acid moieties, respectively. Thus, the 6-mCEPh-purine analog was developed, and its crystal structure when bound to MID domain was solved for binding mode analysis. Surprisingly, although interactions between the purine and ribose regions and MID were consistent, the carboxyl group from the modification formed an interaction with the carbonyl backbone of Thr527 and a water molecule instead of an expected ionic interaction with Lys525. Binding affinities of 8-Br AMP and 6-mCEPh-purine with the MID domain were then determined by 1H-15N HSQC NMR titration experiments. Indeed, the 8-Br AMP and 6-mCEPh-purine analogs were found to have KD values approximately four- and ninefold lower than AMP control, thus indicating that the analogs have increased interactions with MID domain. The 6-mCEPh-purine analog had a higher binding affinity and was therefore used for subsequent activity experiments. A luciferase-targeting siRNA guide strand modified with 6-mCEPh-purine at the 5′-end was found to be 2.4- (IC50 = ~6 pM) and 2.9-fold (IC50 = ~15 pM) more potent than two different luciferase-targeting siRNA sequences with 5′-end adenosine (IC50 = 14.7 pM and 42.6 pM, respectively) when transfected into luciferase-expressing HeLa cells. Additionally, Ago2 immunoprecipitation from 5′-end 6-mCEPh-purine-modified siRNA-treated HeLa cell lysates revealed a 1.9- (5 pM) and 2.4-fold (50 pM) higher amount of modified guide strand bound to Ago2 compared to unmodified control. Based on time course experiments, 5′-end 6-mCEPh-purine-modified siRNAs were also taken up by RISC at a faster rate than 5′-end adenosine siRNA control over the course of 3 days after 50 pM siRNA treatment.
ApoB siRNAs bearing 6-mCEPh-purine, or a 6-(phosphonooxy-butylsulfide)purine (6-PBuS-purine) analog used as a negatively charged side chain control, at the 5′-end were also found to be four- to sixfold more potent than 5′-A siRNA in cell culture and could reduce ApoB mRNA by ~80%, compared to 50% with unmodified siRNA, at a 0.1 mg/kg dose in vivo. The 5′-end modified siRNAs also persisted longer and reduced plasma cholesterol levels in mice 7 days after treatment. As follow-up to the investigation by Shinohara et al., Brechin et al. described the results from a series of cell-free biochemical assays to better identify, on a molecular basis, how the 6-mCEPh-purine modification at the 5′-end of the siRNA guide strand improves RNAi activity (Brechin et al. 2021). They found that 6-mCEPh-purine can affect the formation of mature RISC by either fixing the loading orientation of the siRNA duplex or improving the stability of mature RISC once the passenger strand has been ejected. Interestingly, Brechin et al. also found that the 6-mCEPh-purine modification did not seem to improve the efficiency of pre-Ago2-RISC formation, which could be explained by a masking effect on the anchoring by the 5′-nucleobase when the siRNA is in a double-stranded state, or structural constraints that prevent the 5′-nucleobase from anchoring efficiently into the 5′-nucleotide binding pocket prior to passenger strand ejection. They conclude by suggesting the potential enhancement of siRNA activity by combining the 6-mCEPh-purine modification with 5′-E-VP at the 5′-end of siRNA guide strands.


(b) Targeting the PAZ domain using computational screening. In 2013, Kandeel and Kitade reported a computational analysis of modified siRNA with the PAZ domain of hAgo2 to determine whether stronger or weaker binding of the 3′-end of siRNA corresponds to a strengthening or weakening effect of RNAi activity (Kandeel and Kitade 2013). When comparing their computational results with previous experimentally determined in vivo siRNA efficacy data, they overall found that higher RNAi efficacy was correlated with lower inhibition constant Ki (derived from the free energy of modified nucleotide binding to PAZ), lower total intermolecular energy, lower total free energy, higher hydrogen bonding interactions, fewer van der Waals interactions, and a small total surface interaction energy. They docked 26 modified nucleotides or nucleotide analogs where the siRNA 3′-overhang would be in the binding site of PAZ domain using iGEMDOCK or a docking server (http://www.dockingserver.com/), most of which contained a benzyl, pyridinyl, or other aromatic derivative linking phosphate groups together. The iGEMDOCK docking scores were based on total energy, van der Waals interactions, hydrogen bonding, and electrostatic interactions and were compared alongside their previously determined dual-luciferase reporter assay potencies at 1 nM siRNA. In general, the docked compounds that had the lowest interaction energy were dimers or trimers. Monomeric compounds, on the contrary, had the highest interaction total energy with PAZ. Of note, the larger, more hydrophobic compounds did not yield inhibition constants when docked on the docking server. Instead of ranking the docked compounds by energy or affinity scores, the iGEMDOCK data were organized in clusters and were scored using an evolutionary method to make dendrograms. The dendrograms were generated based on interaction profiles with PAZ and the atomic makeup of the compounds docked.
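The link between a docking free energy and an inhibition constant, and the comparison of docking-derived values with reporter assay readouts, can be sketched as follows. This is a minimal illustration only, assuming the standard thermodynamic relation Ki = exp(ΔG/RT) at 298.15 K; whether the docking server applies exactly this relation is an assumption, and the function names and numerical values below are hypothetical and are not taken from Kandeel and Kitade (2013).

```python
import math

# Gas constant in kcal/(mol*K); docking free energies are commonly given in kcal/mol.
GAS_CONSTANT_KCAL = 1.98720425864083e-3


def ki_from_free_energy(delta_g_kcal_per_mol, temperature_k=298.15):
    """Convert a binding free energy (kcal/mol) to an inhibition constant (mol/L).

    Assumes Ki = exp(dG / (R * T)); more negative dG gives a smaller Ki.
    """
    return math.exp(delta_g_kcal_per_mol / (GAS_CONSTANT_KCAL * temperature_k))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical docking free energies (kcal/mol) and reporter ratios (illustrative only).
    hypothetical_dg = [-6.0, -5.0, -4.0]
    hypothetical_reporter_ratio = [0.2, 0.4, 0.7]
    log_ki = [math.log10(ki_from_free_energy(dg)) for dg in hypothetical_dg]
    print(pearson_r(log_ki, hypothetical_reporter_ratio))
```

Correlating against log10(Ki) rather than Ki itself is a design choice made here because Ki spans orders of magnitude; it is not something the source specifies.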
Each type of non-covalent interaction (electrostatic, hydrogen bonding, and van der Waals) from each compound was assigned to the main chain or side chain of each amino acid in PAZ. Through this method, they found that residues F669, Y681, I706, and L707 were important contributors of van der Waals interactions, while Y641, R562, Y681, and K704 were important for hydrogen bonding. Plotting the docking data against Renilla/Firefly luciferase activity ratio (RL/FF) revealed a handful of correlations based on interaction type. The higher (less negative) the total free and intermolecular energy values are, the lower the RL/FF ratio is (higher RNAi activity), thus suggesting that there is a negative correlation between RNAi efficacy and free energy. As total and free energy indicate binding affinity strength, this correlation therefore also suggests that lower binding affinity to PAZ is associated with better RNAi activity. Interestingly, they found that compounds with higher Ki values were associated with higher RL/FF ratios, thus lower RNAi efficacy. In other words, the nucleotide analogs that bound to PAZ too weakly resulted in lower siRNA activity, yet if the compounds bound to PAZ too tightly (lower Ki), the RNAi activity would also be reduced. Electrostatic interaction values plotted against RL/FF ratios showed only a moderate correlation, which agrees with their docking data showing that electrostatic interactions do not contribute largely to compound binding to PAZ. As a result, they hypothesize that modifying the phosphate groups of nucleotides, which would contribute the most to electrostatic


interactions, would not result in large changes in binding to PAZ. Next, as the interaction surface area of a compound increases, the lower the RNAi efficacy becomes, based on a moderately high correlation. Smaller compounds are therefore more desirable for modification and use for binding into PAZ domain. Last, a high correlation was found between van der Waals and hydrogen bonding interactions (free energy of ligand binding to receptor) and RNAi efficacy. Lower (more negative) van der Waals and hydrogen bonding interaction energies were associated with higher RNAi efficacy. Overall, their computational analyses suggest that, because the PAZ domain contains an unfavorable steric environment, designing smaller, more hydrophobic compounds for the siRNA 3′-end that can only perform weak interactions with the PAZ domain is conducive to higher RNAi efficacy. Lower free energy of interaction, lower intermolecular energy, higher values of hydrogen bonding, and lower Ki values are key parameters to improve RNAi activity when modifying the 3′-end of siRNA nucleotides targeting the PAZ domain. Greenidge et al. described an in silico, structure-based and fragment-based screening (FBS) approach that combines virtual screening and NMR-based techniques to identify small molecules that could replace the 3′-dinucleotide overhang of siRNA guide strands (Greenidge et al. 2019). Overall, this approach led to the discovery of a number of ligands that can bind to PAZ with similar affinities to a reference dinucleotide uridylyl-(3′,5′)-uridine, UpU. Such ligands could be conjugated to the 3′-end of siRNA guide strands for improved activity and characteristics described above. First, the PAZ domain from the human Ago2 crystal structure (PDB ID 4F3T) (Elkayam et al. 2012) was used to examine the ligand-binding interactions of a microRNA bound to Ago2, as displayed by Maestro 9.0.
These authors make note of key residues His336, Tyr338, Phe294, Tyr311, His271, His316, and Arg315 in PAZ that could engage in non-covalent interactions (Fig. 6) with the 3′-dinucleotide of guide strand siRNA. The 2′-hydroxyl group on ribose of the 3′-terminal nucleotide donates a hydrogen bond to the backbone carbonyl oxygen of His336, whereas its 3′-hydroxyl group can make hydrogen bonding contacts with the carbonyl oxygen of Tyr338 (Fig. 6a). The terminal nucleotide base makes van der Waals interactions with Phe294 and can π-stack with Tyr311 (Fig. 6b). The negatively charged oxygen atoms in the bridging phosphate groups between the dinucleotide make a network of hydrogen bonding interactions with His271, His316, and Tyr311 (Fig. 6c). These interactions were used as a reference to screen for nucleotide mimetics that can improve ligand-binding interactions with the PAZ dinucleotide binding pocket. Next, a library of structurally simple diols joined by phosphodiesters was used to virtually screen small molecules binding into PAZ. As a result, a biphenyl-containing overhang ligand was identified, and a subset of the biphenyl-containing ligands from the Novartis Compound Library was furthermore used for docking analysis prior to NMR assays. Docking suites Glide and GOLD were used to dock 13,000 biphenyl compounds containing derivatives to the PAZ domain of hAgo2. After a first screening of ten compounds from Glide and nine compounds


Fig. 6 Non-covalent interactions of Ago2 PAZ domain (PDB 4F3T) with terminal 3′-end of guide miR-20a. (a) Hydrogen bonding interactions between backbone carbonyl of His336 and Tyr338 to the 2′- and 3′-hydroxyl of 3′-end guanosine (g1), respectively. (b) π-stacking and van der Waals interactions to g1 base by Tyr311 and Phe294, respectively. (c) Hydrogen bonding interactions from His271, His316, and Tyr311 to the phosphate backbone of g1

from GOLD, a total of six compounds were found to unambiguously bind Ago2 PAZ with affinity competitive with the reference RNA dinucleotide UpU. Representative compounds 1, 2, and 3 (Fig. 7) bound to PAZ with Kd values of 83, 165, and 245 μM, respectively. Of note, a force-field calculated protein-ligand interaction energy profile built from AmberTools10 (AMBER) for these docking analyses was used to discriminate between active and inactive ligands. AMBER uses a molecular mechanics/generalized Born surface area (MM/GBSA) method to generate force fields. A common ambiguity computational chemists face is a high scoring function for a given ligand, yet the corresponding pose towards a receptor is not plausible, making identification of a pharmacophore more difficult. Greenidge et al. employed molecular mechanics energy cutoffs to be 42 kcal/mol for MM/GBSA total energy and 27

170 enzymatic ribose and nucleobase modifications in all forms of RNA, with a subset of about 50 modifications in every organism. tRNA is the most frequently modified, with about 10% of its ribonucleotides chemically modified, most variably at the wobble position of the anticodon. While tRNA modifications have long been known to facilitate tRNA maturation, prevent degradation, and reduce frameshifting errors, recent discoveries have revealed that tRNA modifications and the number of copies of each tRNA operate as a system to regulate the translation of mRNAs possessing a system of codon biases that define functional gene families. In this review, the first part focuses on this system of codon-biased translation and its importance in pathology and disease. Attention is then turned to the tools and techniques needed to understand the tRNA epitranscriptome and codon-biased translation, including quantification of tRNAs, tRNA modifications, translation efficiency, and codon usage analysis. Focused attention is given to emerging technologies for mass spectrometry-based tRNA modification mapping and analysis of ribosome-bound tRNAs, which expand the toolbox for quantitatively understanding the chemical biology of the tRNA epitranscriptome.

Keywords

Epitranscriptome · tRNA modification · Codon-biased translation · RNA sequencing · Mass spectrometry

J. Wu · P. C. Dedon (*)
Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
Antimicrobial Resistance Interdisciplinary Research Group, Singapore-MIT Alliance for Research and Technology, Singapore, Singapore
e-mail: [email protected]

T. J. Begley
The RNA Institute, University at Albany, State University of New York, Albany, NY, USA
Department of Biological Sciences, University at Albany, State University of New York, Albany, NY, USA

© Springer Nature Singapore Pte Ltd. 2023
N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_42

Introduction

The Central Dogma of Biology articulates the flow of genetic information from DNA to RNA to protein. The information content of a gene is transcribed into messenger RNA (mRNA) that is then translated into a protein, with amino acid-bearing transfer RNAs (tRNAs) bridging the three-nucleotide code and the growing polypeptide chain. By itself, the genome is inert: built-in instructions for when to schedule transcription and translation or how many mRNAs and protein molecules to synthesize are absent. There is an emerging appreciation for the role of the epigenome and the epitranscriptome in scheduling gene expression in all forms of life. The “epi” prefix denotes the dozens of chemical modifications of DNA, RNA, and histone proteins, which do not change the genetic code of three-nucleotide codons specifying each amino acid. Instead, growing evidence indicates that the epigenome and the epitranscriptome regulate transcription and translation, respectively, by reprogramming in response to environmental cues, signals, or stresses. This review focuses on the central role of the tRNA epitranscriptome in regulating gene expression at the level of translation and the tools required to quantify this system.

The tRNA Epitranscriptome

Even before the first sequence of a tRNA was described in 1961, tRNA modifications were chemically identified with the discovery of pseudouridine. Since then, more than 150 unique modified ribonucleotides (nt) have been identified in all forms of RNA, collectively termed the epitranscriptome. Most of these modifications occur in tRNAs, with an average of 13 modifications scattered along the 70–90 nt cloverleaf-structured tRNA molecule (Fig. 1). These tRNA modifications regulate tRNA stability, facilitate tRNA maturation, stabilize against degradation, prevent frameshifting errors, and regulate translation speed and fidelity.
In general, tRNA modifications in the D- and T-arms influence tRNA structure stability, while tRNA modifications in the anticodon arm affect codon matching, specifically

37

Tools for Understanding the Chemical Biology of the tRNA Epitranscriptome


Fig. 1 tRNA modifications in relation to the secondary and tertiary structure of tRNA. Lower right: The cloverleaf structure of tRNA with post-transcriptional modifications found in S. cerevisiae (Phizicky and Hopper 2010) and H. sapiens (Suzuki 2021) tRNA. Conventional abbreviations are used based on the Modomics database. The circles’ colors coincide with the analogous locations in the tertiary structure of S. cerevisiae tRNAPhe shown on the upper left. tRNA structural coordinates were obtained from Protein Data Bank 1EHZ

in the wobble position (34) and the codon-adjacent position 37. For example, the well-studied 1-methyladenosine (m1A) at position 9 (m1A9) in human mitochondrial tRNALys disrupts a possible Watson-Crick interaction of A9–U64, allowing U64 to bind instead to A50 to maintain the L-shaped structure (Helm et al. 1999). A more recent study using NMR spectroscopy demonstrates that 4-thiouridine (s4U), dihydrouridine (D), thymidine (T), and pseudouridine (Ψ) modifications in tRNAfMet from Escherichia coli increase the stability of the overall tertiary structure (Biedenbänder et al. 2022). A wide variety of modifications are found at the wobble position (34) in prokaryotes and eukaryotes, such as 5-methylaminomethyl-2-thiouridine (mnm5s2U), uridine 5-oxyacetic acid methyl ester (mcmo5U), and queuosine (Q). These wobble modifications improve the reading of the cognate codons in mRNA by enhancing noncanonical Watson-Crick base pairing, for example, mnm5s2U pairing with G, mcmo5U pairing with G/C/U, and Q pairing with U


(Grosjean and Westhof 2016). Modifications at position 37, for example, N6-isopentenyladenosine (i6A) and N6-threonylcarbamoyladenosine (t6A), stabilize codon–anticodon interactions in the ribosomal A-site, thus affecting translation speed and fidelity (Lorenz et al. 2017). The presence of these modifications in tRNA and the dynamics of their insertion in some cases pose significant challenges to current methods for quantitative mapping of the modifications within the tRNA sequences. For example, the reverse transcriptase step of next-generation RNA sequencing is often confounded by variable polymerase falloff or mis-insertions at i6A, methyl groups at the base-pairing face, and other modifications (Limbach and Paulines 2017). These errors can actually be used to indirectly map and quantify several modifications (Limbach and Paulines 2017), though this is less desirable for systems-level analyses of many modifications.

The tRNA Pool

In coordination with the tRNA epitranscriptome, the levels of individual tRNA molecules in the cytosolic and mitochondrial pools play important roles in the system of tRNA reprogramming and codon-biased translation. As specialized noncoding RNAs, tRNAs possess a cloverleaf secondary structure with four helical stems and a central four-way junction that renders the RNA relatively stable compared to mRNAs, rRNAs, and lncRNAs (Fig. 1). tRNAs charged with the same amino acid yet possessing different anticodons are termed “isoacceptors,” while subsets of isoacceptors possessing the same anticodon but different body sequences are termed “isodecoders.” The human genome contains 429 high-confidence tRNA genes with up to 38 isoacceptors per amino acid and up to 29 isodecoders per anticodon. Of these 429, only about 250 are expressed. The sequence similarities among isodecoders and some isoacceptors pose challenges to complete and accurate sequencing, quantification, and localization of modifications within the sequences (i.e., mapping).
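The isoacceptor and isodecoder definitions above amount to two levels of grouping, which can be sketched in a few lines. This is a toy illustration under stated assumptions: the `TRNAGene` record, the function names, and the placeholder body sequences are hypothetical and do not correspond to any tRNA database schema.

```python
from collections import defaultdict
from typing import NamedTuple


class TRNAGene(NamedTuple):
    """Hypothetical minimal record for one tRNA gene."""
    amino_acid: str
    anticodon: str
    body_sequence: str  # sequence outside the anticodon (placeholder strings here)


def isoacceptors(genes):
    """Group by amino acid: same amino acid, different anticodons."""
    groups = defaultdict(set)
    for gene in genes:
        groups[gene.amino_acid].add(gene.anticodon)
    return dict(groups)


def isodecoders(genes):
    """Group by (amino acid, anticodon): same anticodon, different body sequences."""
    groups = defaultdict(set)
    for gene in genes:
        groups[(gene.amino_acid, gene.anticodon)].add(gene.body_sequence)
    return dict(groups)


# Toy input; the body sequence labels are placeholders, not real sequences.
TOY_GENES = [
    TRNAGene("Leu", "CAA", "body_variant_1"),
    TRNAGene("Leu", "CAA", "body_variant_2"),
    TRNAGene("Leu", "UAA", "body_variant_1"),
    TRNAGene("Arg", "UCU", "body_variant_1"),
]

if __name__ == "__main__":
    print(isoacceptors(TOY_GENES))
    print(isodecoders(TOY_GENES))
```

In this toy set, Leu has two isoacceptors (CAA and UAA), and the CAA isoacceptor has two isodecoders. Real isodecoders often differ by only a few nucleotides, which is the source of the sequencing and mapping difficulty noted above.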
As part of the dynamic system of codon-biased translation, the levels of individual tRNAs in the pool of 250 in humans are regulated by an interplay between transcriptional mechanisms, stabilization by tRNA modifications, degradation by nuclease-mediated pathways (e.g., TRAMP), and possibly sequestration in RNA-protein condensates (Dutagaci et al. 2021). The dynamic complexity of the tRNA pool necessitates analytical methods that provide not only precise relative quantification but also accurate absolute quantification, which is illustrated by recent observations of nearly a 100-fold range in the number of copies of the various tRNAs in the pool (Hu et al. 2021).

tRNA Modification Reprogramming

The mechanism linking the tRNA epitranscriptome and tRNA pool with translational regulation of gene expression involves two recently discovered, interdependent phenomena: dynamic reprogramming of tRNA modifications and a system of biased codon usage in families of stress response genes. As with other cell systems, tRNA biogenesis has been shown to be a dynamic process that is regulated by cell stresses and environmental changes (Huang and Hopper 2016). That tRNA modifications are dynamically regulated by the cell environment has now been shown in bacteria, parasites, yeast, rodent, and human cells (Chan et al. 2012; Pang et al. 2014; Chan et al. 2015;


Deng et al. 2015; Endres et al. 2015; Chionh et al. 2016; Ng et al. 2018; Rothenberg et al. 2018). This was originally observed in 2010 in the response of S. cerevisiae to exposure to the alkylating agent, methyl methanesulfonate (MMS), with exposure-induced increases in TRM9-catalyzed mcm5U and mcm5s2U, while oxidative stress uniquely caused an increase in Trm4-mediated 5-methylcytidine (m5C) levels (Chan et al. 2012). Subsequent studies with a battery of alkylating and oxidizing agents revealed highly predictive exposure-induced reprogramming of the relative quantities of 23 tRNA modifications, with >85% sensitivity and specificity for predicting the toxicant (Chan et al. 2015). This stress-induced reprogramming of tRNA modifications results from a combination of increased activity or levels of >100 modification writer and reader proteins (de Crécy-Lagard et al. 2019), dynamic regulation of the tRNA pool by mechanisms noted earlier (Pang et al. 2014; Chionh et al. 2016; Hu et al. 2021), and likely changes in cofactor levels. Modification-removing “erasers,” such as the m6A demethylase FTO for mRNA, remain somewhat elusive for tRNA, with a notable example being the ALKBH1 demethylation of m1A in tRNA, which regulates translation in response to glucose.

Codon-Biased Translation

One mechanism that has emerged as a link between tRNA pool dynamics and translational regulation of gene expression involves the selective translation of mRNAs possessing patterns of synonymous codon usage that match the reprogrammed tRNA pool, termed codon-biased translation. It has long been appreciated that genes for some highly expressed proteins, such as ribosomal proteins, are enriched with codons that are the most frequently used among a group of synonymous codons across a genome, the so-called optimal codons. Further, biased use of synonymous codons was found to dictate translation rate and efficiency.
Stress-induced tRNA reprogramming represented a mechanistic link between observations of codon biases in translation and stress-induced changes in the proteome and cell phenotype. For example, in yeast exposed to hydrogen peroxide, the oxidative stress caused an increase in Trm4-mediated m5C levels in tRNALeuCAA, which led to selective translation of UUG-enriched mRNAs (Chan et al. 2012) (Fig. 2). Similarly, in yeast exposed to MMS-induced alkylation stress, exposure-induced increases in TRM9-catalyzed mcm5U and mcm5s2U modifications were associated with increased levels of proteins encoded by genes enriched in the codons read by Trm9-modified tRNAs: tRNAArgUCU and tRNAGluUUC (Begley et al. 2007). Loss of Trm4 and Trm9 prevented the modification of the tRNAs and the associated selective translation of codon-biased mRNAs (Begley et al. 2007; Chan et al. 2012; Deng et al. 2015), which provided genetic and biochemical validation of the mechanism. In these studies, codon usage patterns across the yeast genome were quantified using a codon counting algorithm (Begley et al. 2007; Doyle et al. 2016). For example, in the Trm9 studies, researchers identified a set of DNA damage response genes overusing specific arginine and glutamic acid codons, such as translation elongation factor 3 (YEF3) and ribonucleotide reductase (RNR1 and RNR3) large subunits. They demonstrated that Trm9 significantly regulates Yef3, Rnr1, and Rnr3 protein levels by methylating the uridine wobble base found in


Fig. 2 RNA modifications control cellular stress response at the level of translation. Exposure to oxidative stress leads to an elevation in the level of m5C at the wobble position in tRNALeuCAA, which leads to selective translation of UUG-enriched mRNAs, for example, RPL22A

tRNAArgUCU and tRNAGluUUC and leading to codon-biased translation of these mRNAs. A growing number of publications provide associational, genetic, and biochemical validation of the phenomenon of codon-biased translation in organisms ranging from bacteria to humans (Endres et al. 2015; Chionh et al. 2016; Lin et al. 2018; Ng et al. 2018; Rapino et al. 2018; Leonardi et al. 2020; Hammam et al. 2021; Hu et al. 2021; Orellana et al. 2021). For example, a microbial pathogen, Mycobacterium bovis BCG, responded to each stage of hypoxia and aerobic resuscitation by uniquely reprogramming 40 tRNA modifications, thus leading to the prioritized translation of critical stress response genes such as the dosR transcription regulator for the DosR regulon that remodels the cell for hypoxic dormancy (Chionh et al. 2016). Similar evidence has also been observed in malaria parasites; the selective cytosine methylation of tRNAAspGTC at position C38 is critical to maintain the


stable translation of genes with GAC codon bias (Hammam et al. 2021). Moreover, a recent study in mice demonstrated that mammalian alkylation repair homolog 8 (ALKBH8), a writer enzyme for wobble uridines on tRNASec, plays a protective role against naphthalene-induced lung dysfunction (Leonardi et al. 2020).

Defects in tRNA Reprogramming and Codon-Biased Translation in Disease

Beyond stress response with model organisms, there is growing appreciation for defects in all aspects of tRNA reprogramming and codon-biased translation as drivers of a wide variety of human diseases. This diversity ranges from mitochondrial diseases and neurological disorders to HIV and cancer (Orellana et al. 2021; Rak et al. 2021; Suzuki 2021). For example, two tRNA modifications, wybutosine (yW) and 2-methylthio-N6-threonylcarbamoyladenosine (ms2t6A), are reduced dramatically during T cell activation. tRNAs without these modifications in the anticodon loop are prone to ribosomal frameshifting that is needed for the translation of HIV-1 polymerase enzyme. These results may help explain HIV’s specific tropism toward proliferating T cells (Rak et al. 2021). In terms of cancer, disruption of this system by loss, mutation, amplification, or dysregulated transcription of genes for RNA-modifying enzymes or tRNAs is now recognized as a major cancer driver. For example, Close et al. observed that overexpression of ELP3 drives melanomas with mutated BRAFV600E (Rapino et al. 2018) by (1) increasing modification of tRNAs with ELP3-catalyzed wobble U modifications, which (2) increases the number of copies of these tRNAs, thus (3) enhancing the translational efficiency of mRNAs enriched in their cognate codons, including tumor-driving proteins such as cell cycle regulators (Rapino et al. 2018). This systems-level behavior has been recapitulated in multiple studies. For example, Goodarzi et al.
showed that overexpression of specific tRNA isoacceptors correlated with increased levels of oncoproteins from genes enriched with cognate codons of the tRNAs (Goodarzi et al. 2016), while Gingold et al. also showed that codon biases and matching tRNA overexpression distinguish proliferating from differentiating cells (Gingold et al. 2014). Perhaps the best-developed model for cancer-driving tRNA modifications and codon-biased translation involves overexpression of METTL1 in AML, gliomas, sarcomas, and other cancers (Institute 2021; Orellana et al. 2021). The m7G46 modification catalyzed by METTL1 stabilizes tRNAs whose codons are enriched in mRNAs for cancer-driving cell proliferation genes (Orellana et al. 2021).
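The notion of mRNAs “enriched” in particular codons, which runs through the studies above, rests on counting synonymous codon usage gene by gene. The following is a minimal sketch of that idea and not the published codon counting algorithm (Begley et al. 2007; Doyle et al. 2016); the function names and the toy coding sequence are hypothetical.

```python
from collections import Counter


def count_codons(coding_sequence):
    """Count in-frame codons in a coding sequence given as DNA or RNA.

    The sequence is normalized to uppercase RNA (T -> U) and must be a
    multiple of three nucleotides long.
    """
    seq = coding_sequence.upper().replace("T", "U")
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return Counter(seq[i:i + 3] for i in range(0, len(seq), 3))


def codon_fraction(coding_sequence, codon):
    """Fraction of all codons in the sequence that match the given codon."""
    counts = count_codons(coding_sequence)
    total = sum(counts.values())
    target = codon.upper().replace("T", "U")
    return counts[target] / total if total else 0.0


if __name__ == "__main__":
    # Toy open reading frame (start, two UUG leucine codons, stop); illustrative only.
    toy_orf = "AUGUUGUUGUAA"
    print(count_codons(toy_orf))
    print(codon_fraction(toy_orf, "UUG"))
```

A genome-scale analysis would compare each gene's codon fractions with the genome-wide distribution (for example, as a z-score) to flag genes that over- or under-use a codon such as UUG; that normalization step is omitted here.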

Current Tools to Study the tRNA Epitranscriptome and tRNA Reprogramming

Because it is a multisystem transcendent property of cells, the study of tRNA reprogramming and codon-biased translation requires multiple “omic” technologies to quantitatively interrogate each of the information-rich functional levels in human cells: 50 tRNA modifications occur on 250 tRNA isoacceptors that decode 61 codons that encode 21 amino acids that make up 20,000 proteins and their 98,000 proteoforms from genes bearing 250,000,000 codons. Ribosomal


Fig. 3 Current methods for the quantification of tRNAs and tRNA modifications and for mapping of modifications in specific tRNAs

components are excluded here, which may themselves be subject to reprogramming of rRNA modifications and modifications on the dozens of ribosomal proteins. The most common techniques, depicted in Fig. 3, are reviewed here, focusing on the detection and quantification of tRNAs and tRNA modifications, on the localization of tRNA modifications, and on establishing codon-biased translation, discussing the underlying principles, recent advances, and strengths and weaknesses of the different approaches. However, codon analytics and the algorithms for quantifying codon usage patterns in genes and genomes will not be discussed here, as these methods are beyond the bioanalytical chemistry-centric focus of this review and are reviewed elsewhere (Dedon and Begley 2014; Doyle et al. 2016; Huber et al. 2019).

tRNA Modification Analytics

Mass spectrometry (MS) is the ideal tool for discovery, detection, and absolute and relative quantification of RNA modifications (Cai et al. 2015; Wetzel and Limbach 2016; Yoluc et al. 2021; Amalric et al. 2022). All forms of MS are applicable to various aspects of RNA modification analysis: low-resolution single- and triple-quadrupole and ion trap systems for discovering, detecting, and quantifying modified ribonucleosides and oligonucleotides, and high-resolution Orbitrap and quadrupole time-of-flight instruments for structural validation and quantification of RNA modifications. Analytical sensitivity, specificity, and dynamic range are all enhanced by coupling the mass spectrometers to chromatography systems including high-performance liquid chromatography (HPLC), gas chromatography (GC), and capillary electrophoresis (CE) (Cai et al. 2015; Wetzel and Limbach 2016; Yoluc et al. 2021; Amalric et al. 2022).


Tools for Understanding the Chemical Biology of the tRNA Epitranscriptome


Regardless of the MS system, there is a canonical workflow for the analysis of modified ribonucleosides, which starts with the purification of RNA analytes. A discussion of RNA isolation and purification methods and technologies is beyond the scope of this review, and there are numerous relevant publications (Cai et al. 2015; Richter et al. 2022). However, it cannot be stressed too strongly that the relevance of the mass spectrometer signals to biochemical and biological reality is directly related to the purity and integrity of the RNA sample. There are numerous examples of incorrect conclusions based on contamination of one form of RNA with another, due to fragmentation, entanglement, or poor resolution (Richter et al. 2022), or on adventitious chemical modification of the RNA by enzymes or stresses during isolation, purification, or subsequent processing (Dong and Dedon 2006; Su et al. 2014; Cai et al. 2015). For example, artifacts can arise during the hydrolysis protocol, such as epimerization of cyclic N6-threonylcarbamoyladenosine (ct6A) under mild alkaline conditions (Matuszewski et al. 2017), while amino/imino ribonucleoside artifacts arise by amination/imination of carbonothiolated nucleosides under ammonium-buffered mild basic conditions (Jora et al. 2021). Thus, a well-established protocol is needed for reliable and accurate RNA modification quantification (Kaiser et al. 2021). With purified RNA in hand, the next step in the canonical workflow is the hydrolysis of RNA to ribonucleosides using a combination of endonucleases (e.g., nucleases P1 or S1, benzonase) and exonucleases (e.g., phosphodiesterase 1) to release mononucleotides, followed by dephosphorylation to ribonucleoside forms using alkaline or other phosphatases (Su et al. 2014; Cai et al. 2015). Adventitious chemical reactions can be minimized by adding deaminase inhibitors and antioxidants to the RNA isolation, purification, and processing steps (Su et al. 2014; Cai et al. 2015).
In the most frequently used methods, the resulting ribonucleoside mixture is then resolved using liquid chromatography (LC) with reversed-phase (RP) HPLC, UHPLC, or capillary columns, with the outflow coupled to a mass spectrometer. Subsequent data mining to validate signals, determine peak areas, and normalize data is also fairly well standardized (Su et al. 2014; Cai et al. 2015). This general workflow opens the door to a variety of different analytical methods for targeted and untargeted mass spectrometric analysis, including discovery of existing and new modifications, quantification of known modifications with chemical standards, and “omic” analysis of all or a subset of RNA modifications. One example that illustrates this last approach involves a targeted method to quantify many or all of the tRNA modifications present in a cell or tissue using chromatography-coupled triple-quadrupole MS (LC-MS/MS or LC-QQQ) (Su et al. 2014; Cai et al. 2015). As illustrated in Fig. 4a, the RNA hydrolysate is resolved by RP HPLC, and the eluting ribonucleosides enter the QQQ mass spectrometer where they are ionized in an electrospray ionization (ESI) source. The ion stream enters the first mass quadrupole (Q1) for selection of ions with specific mass-to-charge (m/z) ratios, which are then directed to enter the collision cell (Q2) where the ribonucleosides dissociate into product ions by collision-induced dissociation (CID) with hot nitrogen gas. The most common first fragmentation involves


Fig. 4 Quantification of tRNA modifications in dynamic MRM mode using QQQ-MS. (a) Workflow for quantification of tRNA modifications. tRNA is hydrolyzed into mononucleosides, which are separated by reversed-phase LC and identified by QQQ-MS. Only selected nucleosides with specific m/z values can leave the first quadrupole (Q1) and enter the collision cell (Q2), where the nucleosides dissociate into product ions. Only the desired product ions can leave Q3 and enter the detector, where signals are recorded. (b) LC-MS/MS analysis of a mixture of modification standards, demonstrating the resolution of dynamic MRM
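The transition table that drives the MRM workflow in Fig. 4 can be sketched in a few lines. This is a minimal illustration rather than the authors' software. It assumes positive-mode [M+H]+ precursors, takes the product ion to be the protonated nucleobase left after neutral loss of ribose (or of 2′-O-methylribose for sugar-methylated nucleosides), and covers only a small hand-picked set of nucleosides. All function and variable names are hypothetical, and retention times are omitted because they depend on the chromatographic method.

```python
import re

# Monoisotopic masses (u) of the elements used here, and the proton mass.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727646


def mono_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C10H13N5O4'."""
    return sum(MONO[el] * int(n or 1)
               for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula))


# Neutral losses on cleavage of the glycosidic bond.
RIBOSE = mono_mass("C5H8O4")
METHYLRIBOSE = mono_mass("C6H10O4")  # 2'-O-methylated sugar

# name -> (neutral formula, True if the methyl group sits on the ribose)
NUCLEOSIDES = {
    "A": ("C10H13N5O4", False),
    "G": ("C10H13N5O5", False),
    "C": ("C9H13N3O5", False),
    "m6A": ("C11H15N5O4", False),
    "Am": ("C11H15N5O4", True),
}


def mrm_transition(name):
    """Return (Q1 precursor m/z, Q3 product m/z) for one nucleoside."""
    formula, sugar_methylated = NUCLEOSIDES[name]
    precursor = mono_mass(formula) + PROTON
    loss = METHYLRIBOSE if sugar_methylated else RIBOSE
    return round(precursor, 4), round(precursor - loss, 4)


for name in NUCLEOSIDES:
    q1, q3 = mrm_transition(name)
    print(f"{name:>4}: Q1 {q1:.4f} -> Q3 {q3:.4f}")
```

In this sketch m6A and Am share a precursor m/z but give different product ions, which is why the second selection step adds specificity. Pseudouridine is left out because its C-glycosidic bond does not fragment in this way.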

cleavage of the bond between the ribose sugar and the nucleobase (deglycosylation). The full set of product ions then enters the third quadrupole (Q3) from which selected product ions – usually the deglycosylated nucleobases – are sent to the detector where the m/z value and abundance of each ion are recorded. Since the ribonucleosides (except pseudouridine) predictably fragment to lose the ribose during CID, a table of m/z values (transitions) for the parent ribonucleosides and their product nucleobase ions, along with their HPLC retention times, can be created to automatically perform the mass spectrometric analysis during the course of the HPLC elution. This multiple reaction monitoring (MRM) approach is highly sensitive because the double selection of target analytes dramatically reduces noise, which allows the detection and quantification of RNA modifications in the low femtomole range in complex RNA hydrolysis matrices. An example of a mixture of modified ribonucleosides analyzed by MRM LC-MS/MS is shown in Fig. 4b. Recent advances in mass spectrometric analysis of RNA modifications have mainly focused on improving LC separation efficiency and mass quantification


sensitivity and accuracy. To improve LC separation efficiency, ultrahigh-performance liquid chromatography (UHPLC) with particle sizes less than 2 μm in diameter has been applied to RNA modifications, allowing efficient resolution and analysis of modified ribonucleosides with run times of less than 10 min (Gregorova et al. 2021). Compared with conventional HPLC, which has typical run times of ~30 min, UHPLC is more suitable for high-throughput analysis of large numbers of samples. While the separation of polar ribonucleosides remains challenging with RP HPLC, some new stationary phases, for example, hydrophilic interaction liquid chromatography (HILIC) (Guo et al. 2018) and porous graphitic carbon (PGC) (Sarin et al. 2018), have been tested to overcome this obstacle, allowing the separation of highly polar cytidine derivatives. Another improvement in this area is the use of nanoflow liquid chromatography (nanoLC) to improve quantification sensitivity to the attomole range, allowing the analysis of low-abundance samples, such as tissue samples from clinical studies (Sarin et al. 2018). Since each molecule has a unique and unpredictable ionization efficiency, MS signal output does not directly correlate with molecular abundance. The high precision of MS signals means that, assuming similar biochemical environments (i.e., matrices), relative quantification can be used to compare the levels of RNA modifications in one sample with those in another, such as treated versus control. Absolute quantification by MS requires synthetic ribonucleoside standards to calibrate ionization efficiency and signal intensity, using an external calibration curve with unlabeled standards or isotope dilution with labeled standards. The practice of quantitative analysis of nucleosides is beyond the scope of this review and the reader is referred to the many excellent publications on MS-based bioanalytics.
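The two routes to absolute quantification mentioned above can be written out as simple arithmetic. The following is a minimal sketch under stated assumptions: the detector response is taken to be linear over the calibrated range, the labeled standard is assumed to co-elute and ionize like the analyte unless a response factor is supplied, and every number is illustrative rather than measured. The function names are hypothetical.

```python
def fit_line(amounts, areas):
    """Ordinary least-squares fit of peak area = slope * amount + intercept."""
    n = len(amounts)
    mean_x = sum(amounts) / n
    mean_y = sum(areas) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, areas))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantify_external(area, slope, intercept):
    """Amount of analyte read off an external calibration curve."""
    return (area - intercept) / slope


def quantify_isotope_dilution(area_analyte, area_labeled, amount_labeled,
                              response_factor=1.0):
    """Amount of analyte from the ratio to a spiked isotope-labeled standard."""
    return area_analyte / area_labeled * amount_labeled / response_factor


# Illustrative calibration series: amounts in fmol, peak areas in arbitrary units.
standards_fmol = [1.0, 5.0, 10.0, 50.0]
standard_areas = [250.0, 1050.0, 2050.0, 10050.0]
slope, intercept = fit_line(standards_fmol, standard_areas)

print(quantify_external(2050.0, slope, intercept))       # external curve
print(quantify_isotope_dilution(3000.0, 1500.0, 5.0))    # 5 fmol labeled spike
```

The isotope-dilution route needs no calibration series for each run, because analyte and labeled standard experience the same matrix, which is one reason labeled standards are attractive for complex hydrolysates.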
The Helm and Kaiser/Kellner groups have been instrumental in developing cell-based methods for low-cost preparation of virtually any isotope-labeled ribonucleoside standard (Kellner et al. 2014; Borland et al. 2019; Heiss et al. 2021).

Quantifying tRNAs

Environmentally induced changes in the levels of the individual modified ribonucleosides occur on intact tRNA molecules, so tools to quantify the tRNAs themselves are critical to interrogating the system of tRNA modification reprogramming and codon-biased translation. The pool of tRNA molecules in the cytoplasm of all organisms and in the mitochondria of eukaryotic cells is as dynamic as the epitranscriptome, which necessitates the use of precise and accurate tools for rigorous quantification of individual tRNA isoacceptors and isodecoders. Two general strategies have been used for tRNA quantification: hybridization-based approaches, including microarrays and northern blots, and reverse transcription-based methods, including reverse transcription PCR (RT-PCR) and RNA sequencing (RNA-seq). The major inherent challenges for tRNA quantification involve the highly stable secondary structure of tRNAs, with a tight stem-loop structure welded by four duplex regions in the stems (Fig. 1), and the density of modified ribonucleosides (~8–13 per tRNA) that can interfere with hybridization and with reverse transcription polymerases. Recent developments in approaches to tRNA quantification, all aiming to circumvent these issues, are reviewed here.


Hybridization-based approaches to tRNA quantification have been widely used and do not require reverse transcription or amplification steps that are potential sources of bias. In the general northern blot technique, which has been adapted to tRNAs, native tRNAs are separated by 1D or 2D gel electrophoresis, transferred onto a membrane via a capillary or vacuum blotting system, and immobilized on the blotting membrane by the application of UV light or heat. Oligonucleotide probes containing radioactive, fluorescent, or chemiluminescent tags are then hybridized to the tRNA sequence of interest for quantification. Northern blots are very labor-intensive and have a limited dynamic range for quantification. Microarrays, on the other hand, provide far more efficient detection and quantification of a much larger number of tRNA molecules than northern blots (Schwartz and Pan 2017). In microarray analysis, tRNAs are labeled with a fluorescent or radioisotope-labeled nucleotide and then hybridized onto an array of oligonucleotides with sequences specific to each tRNA isoacceptor or isodecoder that is discernible by hybridization. To reduce experimental bias, fluorescent nucleotides with different colors can be used to label different samples, and these samples can then be mixed and hybridized onto the same microarray. Ratios between the different fluorescent signals are then used for the relative quantification of tRNAs between samples (Schwartz and Pan 2017). The quantification of tRNA by hybridization-based approaches has several drawbacks. Neither northern blots nor microarrays can distinguish near-cognate tRNA species with similar sequences, with cross-hybridization leading to false-positive results. In addition, the methods are somewhat insensitive and biased toward relatively abundant tRNAs.
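The ratio-based readout of a two-color array can be illustrated as follows. This is a minimal sketch, not a published pipeline: it assumes background-corrected intensities are already available per probe, and it centers the log2 ratios on their median to absorb global differences in labeling or loading. That normalization choice is an assumption made for the example, and the probe names and intensities are hypothetical.

```python
import math
import statistics


def log2_ratios(sample, control):
    """Median-centered log2(sample/control) for probes present in both channels.

    sample, control: dicts mapping probe name -> background-corrected intensity.
    """
    shared = [probe for probe in sample if probe in control]
    raw = {p: math.log2(sample[p] / control[p]) for p in shared}
    center = statistics.median(raw.values())
    return {p: value - center for p, value in raw.items()}


# Hypothetical intensities for three tRNA probes in two dye channels.
treated = {"tRNA-probe-1": 400.0, "tRNA-probe-2": 100.0, "tRNA-probe-3": 50.0}
untreated = {"tRNA-probe-1": 100.0, "tRNA-probe-2": 100.0, "tRNA-probe-3": 100.0}

for probe, ratio in log2_ratios(treated, untreated).items():
    print(f"{probe}: log2 ratio {ratio:+.2f}")
```

Because both samples hybridize to the same spots, probe-specific hybridization efficiency cancels in the ratio, but cross-hybridization between similar tRNAs does not.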
Reverse transcription-based approaches include RT-PCR and next-generation sequencing (NGS) approaches, in which tRNAs and their fragments are converted into unmodified and amplifiable complementary DNAs (cDNA) that are quantified or sequenced, respectively, as a surrogate for the tRNAs. While the main advantage of arrays is the ease of multiplexing the analysis for dozens of tRNA sequences, RT-PCR has the potential to be more efficient than hybridization-based approaches owing to its high sensitivity due to amplification, the ability to distinguish minor sequence variants, and the relative accessibility of the reagents and PCR equipment. The major problem facing all reverse transcription approaches is the stability of tRNA secondary structure and the presence of polymerase-blocking modifications. To circumvent this issue, Honda and coworkers reported a method called four-leaf clover qRT-PCR (FL-PCR) (Honda et al. 2015). In this method, a stem-loop adapter is ligated to the protruding ends of mature tRNAs to form a “four-leaf clover” secondary structure. During RT-qPCR, the primers are designed to target the T- and D-arms of the tRNA and amplify the unmodified stem-loop adapter; thus, the influences of tRNA modifications on quantification are minimized. One major limitation of this method is that it can only quantify mature tRNAs with full CCA-3′ ends and cannot detect tail-truncated isoforms missing an A or a CA at the CCA-3′ end. This is important since an MS-based method found that only 17% of the commercially prepared tRNAPhe has the full CCA-3′ tail (Zhang et al. 2020). Compared to RT-qPCR, next-generation sequencing (NGS) technology has the inherent advantage of the potential to identify nearly all tRNA sequences, including


immature forms, isodecoders, and fragments, as well as to provide the context of all other coding and noncoding RNAs in a sample. The basis for these advantages in most RNA-seq methods for small RNAs lies in the capture of all RNA species by ligating defined-sequence linkers to the 3′- and 5′-ends of each RNA molecule, followed by reverse transcription and PCR amplification (Fig. 5). The major disadvantage of most RNA-seq methods is that they cannot be quantitative for tRNAs. This is partly rooted in biased ligation of sequencing linkers to the 3′- and 5′-ends of RNAs, with sequence-dependent 10³-fold variation in efficiency (Fuchs et al. 2015) causing 10⁶-fold artifacts in sequencing read counts (Pang et al. 2014; Fuchs et al. 2015). Highly structured and modified RNA molecules, such as tRNAs, further challenge the quantitative accuracy of RNA-seq by causing polymerase falloff during cDNA synthesis. Here, the discussion focuses on recent technological advances in NGS-based tRNA quantification and modification mapping. Several solutions have been advanced to overcome biased adapter ligation and polymerase falloff. First, RT-blocking methyl modifications can be reduced by AlkB demethylase treatment, which can increase the read length and the proportion of RNA-seq reads mapping to tRNAs (Pinkard et al. 2020; Hu et al. 2021). Second, to improve the capture of all RNA sequences, modification- or structure-induced truncated cDNAs arising from polymerase falloff during reverse transcription can be captured together with full-length cDNA using a sequential adapter ligation method or a cDNA circularization method. In sequential adapter ligation, the first adapter is ligated to the 3′-end of the tRNA, followed by reverse transcription, which yields either full-length or truncated cDNAs. All cDNA species are then captured with the second adapter ligation (Fig. 5), which enables PCR amplification and sequencing of both truncated and full-length products (Pang et al. 2014; Hu et al. 2021). Another strategy is cDNA circularization, which involves 3′ adapter ligation and reverse

Fig. 5 Conventional next-generation RNA sequencing is inherently nonquantitative due to biased linker ligation and loss of sequence information due to RT polymerase falloff


transcription followed by cDNA circularization with a CircLigase and subsequent PCR amplification and sequencing (Pinkard et al. 2020; Behrens et al. 2021). Third, as detailed shortly, ligation biases can be minimized by molecular crowding (e.g., polyethylene glycol) to increase ligation efficiency, by randomization of the terminal nucleotides on adapters, or by using splint ligation approaches. For example, ligation bias can be reduced using hairpin or double-strand adapters that are specific to the CCA-3′ end of mature tRNAs (Pinkard et al. 2020). However, this strategy introduces another bias due to tRNA isoforms that lack the full-length CCA-3′ tail (Zhang et al. 2020). Recently, an RNA-seq method – AQRNA-seq – was developed to minimize ligation bias and capture all small RNA species, thus providing a linear correlation between sequencing read count and RNA abundance (Hu et al. 2021). First, the adapter that ligates to the 3′ end of the tRNA has two randomized nucleotides at its 5′ end to minimize ligation bias and maximize T4 ligation efficiency. Unlike the tRNA CCA-3′ end-specific adapter, this optimized adapter ligation strategy extends the application of AQRNA-seq to tRNA fragments and other types of small RNAs, such as miRNA, piRNA, and therapeutic siRNAs. The use of sequential adapter ligation obviates the need to remove RT-blocking methyl modifications with AlkB demethylase, but this step can be added to map the location of some methyl modifications in comparative sequencing runs. Following removal of the excess adapter with the ssDNA-specific 5′-to-3′ exonuclease RecJ, cDNA is synthesized and ligated with the second adapter, which harbors a random hexamer sequence to enhance cDNA ligation efficiency by forming a hairpin splint. RecJ removal of the excess adapter is followed by PCR amplification and sequencing.
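A toy model makes it easier to see how ligation bias propagates into read counts. The sketch below is an illustration only and is not taken from the cited studies: it assumes that each of the two ligation steps acts as an independent multiplicative efficiency on the number of captured molecules, which is one simple way in which a 10³-fold per-step variation could compound to 10⁶-fold. The function names are hypothetical.

```python
def expected_reads(copies, eff_3p, eff_5p):
    """Molecules captured after two ligations with the given efficiencies (0 to 1)."""
    return copies * eff_3p * eff_5p


def worst_case_fold(eff_min, eff_max, steps=2):
    """Largest read-count distortion between two equally abundant RNAs.

    One RNA ligates with eff_max at every step, the other with eff_min.
    """
    return (eff_max / eff_min) ** steps


# Per-step efficiencies spanning three orders of magnitude.
print(f"{worst_case_fold(1e-3, 1.0):.0e}-fold")

# Both steps tuned so that every sequence ligates at 90% or better.
print(f"{worst_case_fold(0.9, 1.0):.2f}-fold")
```

Under this assumption, raising the floor of ligation efficiency matters more than raising its average, because the distortion depends on the ratio between the best and worst substrates.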
In this method, the two ligation steps have been optimized for >90% ligation efficiency, and the truncated cDNA products from reverse transcription are conserved for accurate quantification, while a custom data workflow allows accurate quantification of all expressed small RNAs, as well as detection of RNA modifications, sequence alterations, and RNA structural changes. These optimizations provide a direct, linear correlation between sequencing read count and the number of RNA copies, validated with synthetic oligos and a panel of 963 miRNA sequences. The main weakness of AQRNA-seq and all other RNA-seq methods is the limited number of RNA modifications that can be localized or “mapped.”

Mapping tRNA Modifications

RNA modifications can only be understood within the context of the RNA sequences in which they reside, especially for those tRNA modifications involved in decoding. However, the geographical mapping of tRNA modifications has proven to be more technologically challenging than quantifying component ribonucleosides. Among all the instrumental analysis methods for tRNA modifications, only RNA sequencing and MS are able to accurately define the location of tRNA modifications in an RNA sequence and to quantify the proportion of modified and unmodified sequences. While RNA-seq exceeds the capacity of MS-based mapping for broad and deep sequence coverage in a single run, MS provides unambiguous chemical identification of the modifications, with RNA-seq requiring specialized methods for each class of RNA modification. Thus, sequencing


and MS are orthogonal techniques that provide complementary information toward the goal of a complete and dynamic map of tRNA modifications in codon-biased translation. Except for direct RNA sequencing by single-molecule approaches (e.g., Nanopore), all other deep sequencing-based protocols for RNA modification mapping rely on the conversion of RNA into cDNA by RT, as described earlier. During RT, modifications such as m1A, 1-methylguanosine (m1G), 3-methylcytidine (m3C), and N2,N2-dimethylguanosine (m2,2G) alter RT activity to cause misincorporation, arrest, or nucleotide skipping. These defects can be used as signatures for known modifications (Werner et al. 2020; Yoluc et al. 2021) and can also be used to discover unknown modification sites (Kimura et al. 2020). Since the RT signature of each modification is variable, depending on local sequence context, RT reaction conditions, and RT enzymes, the accuracy of mapping these modifications can be improved with machine learning-based computational models (Werner et al. 2020). The drawback of these RT signatures is the presumptive nature of the identification, which requires historical precedent for rigorous chemical identification of the modifications at each site in each tRNA molecule to be confident in the assignment. The chemical specificity of RNA modification mapping can be improved moderately by chemical derivatization (Kellner et al. 2010; Helm et al. 2021) or by using modification-selective antibodies. For the former, a relatively small subset of RNA modifications can be cleaved or protected from cleavage on the adjacent ribose-phosphate backbone by chemical reagents and thus can be distinguished from unmodified residues, with localization at single-nucleotide resolution in subsequent RNA-seq (Kellner et al. 2010; Helm et al. 2021).
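The idea of an RT signature can be made concrete with a small per-position tally. The sketch below is a simplified illustration, not a published pipeline: reads are assumed to be already aligned to the reference without gaps and given in RNA coordinates, an arrest at a position is inferred from reads whose 5′-most aligned base lies one nucleotide downstream, and nucleotide skipping is ignored. The coordinate convention and all names are assumptions made for the example.

```python
def rt_signature(reference, reads):
    """Per-position mismatch and arrest rates from gap-free alignments.

    reference: RNA reference sequence (string).
    reads: list of (start, sequence) pairs, where `sequence` aligns to
           reference[start:start + len(sequence)].
    Returns a list of (position, mismatch_rate, arrest_rate) tuples.
    """
    n = len(reference)
    coverage = [0] * n
    mismatches = [0] * n
    starts = [0] * n
    for start, seq in reads:
        starts[start] += 1
        for offset, base in enumerate(seq):
            pos = start + offset
            coverage[pos] += 1
            if base != reference[pos]:
                mismatches[pos] += 1

    profile = []
    for pos in range(n):
        mismatch_rate = mismatches[pos] / coverage[pos] if coverage[pos] else 0.0
        # cDNAs that terminated just before reaching `pos` begin at pos + 1.
        stopped = starts[pos + 1] if pos + 1 < n else 0
        read_through = coverage[pos]
        total = stopped + read_through
        arrest_rate = stopped / total if total else 0.0
        profile.append((pos, mismatch_rate, arrest_rate))
    return profile


# Hypothetical example: two full-length reads (one with a miscall at position 3)
# and two truncated reads that begin at position 4.
reference = "GCAUGCAU"
reads = [(0, "GCAUGCAU"), (0, "GCAGGCAU"), (4, "GCAU"), (4, "GCAU")]
for pos, mm, arrest in rt_signature(reference, reads):
    print(pos, round(mm, 2), round(arrest, 2))
```

Note that truncated cDNAs only contribute to such a profile if the library preparation captures them, for example by the second adapter ligation or cDNA circularization described earlier.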
Further, the chemical derivatizations are often nonspecific or at least only selective, such as m7G mapping by mild alkaline pH or NaBH4 reduction in TRAC-seq, m7G-MeRIP-seq, AlkAnilineSeq, and other methods (Lin et al. 2018; Zhang et al. 2019). m7G, yW, ac4C, and D are all sensitive to NaBH4 reduction, while m3C, D, and 5-hydroxycytidine (ho5C) are all sensitive to alkaline conditions, which confounds the chemical specificity of these methods. In TRAC-seq, the specificity is improved by using AlkB, which reduces m1A and m3C levels but not m7G, while parallel sequencing of immunoprecipitated m7G-containing RNA confers m7G selectivity in m7G-MeRIP-seq (Lin et al. 2018; Zhang et al. 2019). To obviate the complexities of cDNA-based NGS, direct RNA-sequencing technologies are emerging for the detection of modifications, including single-molecule real-time sequencing (SMRT) (Vilfan et al. 2013) and nanopore sequencing (Pratanwanich et al. 2021). By replacing the DNA polymerase in traditional SMRT DNA sequencing with an RT on the zero-mode waveguide (ZMW) arrays, SMRT sequencing has been applied to follow the synthesis of cDNA from an RNA template in real time, with modification-induced changes in the kinetics of the RT used to identify the location of the modification (Vilfan et al. 2013), as originally applied for mapping DNA modifications. However, this proof-of-concept study exposed several shortcomings, including multiple fluorescence pulses from the same nucleotide, relatively short read-lengths, potential enzyme falloff on


RT-blocking modifications, and low single-pass accuracy. More recently, nanopore sequencing has been successfully applied to full-length tRNA sequencing and modification detection, which detected all 43 expected isoacceptors from E. coli tRNAs and revealed systematic miscalls occurring at or near known RNA modifications (Thomas et al. 2021). To further identify modifications, several technical developments will be required. First, nanopore sequencing is limited by low base-calling accuracy, with a median accuracy of 86% at best. Second, well-established modification-dependent ionic current patterns are needed to detect modifications with high accuracy, as in previous work on m6A, m7G, and pseudouridine (Pratanwanich et al. 2021). These emerging technologies offer significant promise for fast, low-cost tRNA modification mapping.

tRNA Modification Mapping by Mass Spectrometry

As discussed earlier, MS is a powerful tool for discovering, identifying, and quantifying RNA modifications. However, it is also emerging as a tool for mapping RNA modifications with the advantage of chemical specificity. There are two major MS-based approaches to tRNA sequencing: bottom-up analysis of short fragments derived from tRNA and top-down analysis of intact, full-length tRNA. Both approaches are reviewed here. Theoretically, top-down MS is an attractive method that should provide complete information about tRNA sequence identification and the location of all modifications within each tRNA sequence based on CID fragmentation and high-resolution analysis of the fragments. However, the technology for top-down analysis is still emerging and one of its major shortcomings is that it cannot provide complete sequence and modification coverage due to the limited number of fragments observed in CID.
For example, with electron detachment dissociation (EDD), the sequence coverage from d and w fragments for 22 nt and 34 nt fragments and 76 nt full-length tRNA is 100%, 97%, and 80%, respectively (Taucher and Breuker 2012) (fragment nomenclature detailed in Fig. 6b). Other studies reported ~60% coverage of the yeast tRNAPhe sequence by ion trap CID, with the product ions mainly located at the 5′- and 3′-ends but not in the middle of the tRNA (Huang et al. 2010). Recently, Breuker and coworkers developed a novel dissociation technique – radical transfer dissociation (RTD) – which dissociates RNA into d- and w-type fragments by including cobalt(III) hexammine during the analysis (Calderisi et al. 2020). This method achieved full-sequence coverage for a 39 nt RNA fragment and is especially useful for analyzing labile modifications such as 5-hydroxymethylcytidine (hm5C) and 5-formylcytidine (f5C). Another issue for top-down MS is the resolution of specific full-length tRNA isoacceptors that range between 70 and 100 nucleotides. Such resolution requires the purification of individual tRNAs by DNA probe hybridization, gel electrophoresis, or liquid chromatography. Among these methods, liquid chromatography is the most compatible with MS. One-dimensional liquid chromatography (1D-LC) using ion-pair RP or weak anion exchange resins is the most frequently used approach to separate full-length tRNAs. However, neither method is capable of direct isolation of individual tRNAs. Recently, a two-dimensional liquid chromatography (2D-LC) method integrating weak anion exchange resin and ion-pair RP was developed for the separation of individual tRNAs from E. coli MRE600 (Cao


Fig. 6 tRNA modification mapping by bottom-up MS. (a) Workflow for MS-based tRNA modification mapping: tRNA is fragmented into short oligonucleotides that are resolved by reversed-phase HPLC and sequenced by quadrupole time-of-flight (QTOF) or Orbitrap mass spectrometry. Full-length precursor oligonucleotides are identified in MS1, and selected precursor ions are sent to the collision cell for fragmentation into sequence ladders that are identified by MS3. (b, c) Two illustrative CID spectra identified in a mixture of oligos generated by RNase T1 digestion of a synthetic RNA sequence. The oligo mixture was analyzed on a Q Exactive Orbitrap mass spectrometer in positive ion mode with MS1 at 70,000 resolution and in data-dependent mode with the top five MS1 species selected for fragmentation at normalized collision energy 15 and MS2 at 70,000 resolution. Data were processed using OpenMS and NASE (Wein et al. 2020) with 5 ppm mass tolerance and the full list


et al. 2020). Two tRNAs, tRNAValUAC and tRNALeuCAG, were isolated by 2D-LC and further identified using bottom-up MS. With further development of novel CID techniques and online 2D-LC separation methods, an increased interest is expected in the analysis of full-length tRNA with top-down MS. Building on the general framework developed by McCloskey and coworkers in 1993, bottom-up MS mapping of tRNA modifications is analogous to bottom-up proteomics: enzymatic digestion of full-length tRNAs into short oligonucleotides, LC separation of the oligonucleotide mixtures, tandem MS fragmentation to obtain precursor-product spectra of fragment ions (fragment nomenclature detailed in Fig. 6b), reconstruction of oligonucleotide sequences, and modification localization based on the MS spectra and oligonucleotide alignment to full-length tRNAs (Fig. 6). Recent advances and improvements in these steps have made bottom-up MS an indispensable tool for tRNA modification mapping. The most advanced recent workflow achieves overall sequence coverage ranging from 75% to 100% for specific tRNAs in an E. coli tRNA mixture (Thakur et al. 2020). In this section, recent major improvements are reviewed, including sample preparation, data acquisition, and data analysis, as well as current issues in this method that impede its application in codon-biased translation analysis. In bottom-up MS, tRNAs are digested to short oligonucleotides with ribonucleases (RNases) such as RNase T1 (G-specific), RNase U2 (G/A-selective), and RNase A (C/U-specific). These sequence-specific RNases generate uniform oligonucleotide fragments from tRNAs, which allows the quantitation of specific tRNAs, as long as the fragments are long enough to map uniquely back to specific tRNA reference sequences. Among these RNases, RNase T1 generates fragments that are longer than those from RNase U2 and RNase A (G vs G/A vs C/U cleavage) due to its G specificity.
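The effect of nuclease specificity on fragment length and uniqueness is easy to explore in silico. The following minimal sketch assumes cleavage on the 3′ side of every listed residue and complete digestion, and it ignores modifications (which can block cleavage), missed cleavages, and terminal phosphate chemistry. The sequences are short hypothetical strings, not real tRNAs, and the function names are illustrative.

```python
from collections import defaultdict


def digest(sequence, cut_after):
    """Split an RNA sequence on the 3' side of every residue in `cut_after`."""
    fragments, current = [], []
    for base in sequence:
        current.append(base)
        if base in cut_after:
            fragments.append("".join(current))
            current = []
    if current:  # 3'-terminal piece that does not end in a cleavage residue
        fragments.append("".join(current))
    return fragments


def unique_fragments(sequences, cut_after, min_len=4):
    """Fragments of at least `min_len` nt that arise from exactly one sequence.

    sequences: dict mapping a name to its RNA sequence.
    Returns a dict mapping each unique fragment to the name it came from.
    """
    owners = defaultdict(set)
    for name, seq in sequences.items():
        for fragment in digest(seq, cut_after):
            if len(fragment) >= min_len:
                owners[fragment].add(name)
    return {frag: next(iter(names))
            for frag, names in owners.items() if len(names) == 1}


RNASE_T1 = "G"    # G-specific
RNASE_A = "CU"    # C/U-specific

toy = {"rna_x": "AUGCCGAAGU", "rna_y": "AUGCCGUUAAG"}
print(digest(toy["rna_x"], RNASE_T1))
print(digest(toy["rna_x"], RNASE_A))
print(unique_fragments(toy, RNASE_T1, min_len=3))
```

Running the same sequences through both specificities shows how the choice of enzyme changes which fragments remain long enough to be assigned to a single parent sequence.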
However, RNase T1 generates many short fragments (10 times more fragments than enzymatic digestion approaches, which can overwhelm the bottom-up MS analysis when dealing with complex samples. In summary, a tRNA fragmentation approach that generates relatively long and overlapping fragments (e.g., RNase U2-E49A mutant) will greatly benefit bottom-up MS sequencing. As the fragment length increases, improved LC separation, MS dissociation, and MS data acquisition and analysis methods are also needed. After digestion, the RNA fragments must be chromatographically resolved to reduce the complexity of the mixture entering the MS at any one time. The most common chromatography method is RP HPLC together with ion-pairing reagents, such as triethylammonium acetate (TEAA). However, TEA and other secondary and tertiary alkylamines are prone to adsorb to the surfaces of HPLC and MS devices, leading to contamination of LC-MS runs that causes suppression of analyte ionization and interference with low-molecular-weight ion detection, especially in positive ion mode. Thus, devices using ion-pairing reagents are usually dedicated to ion-pairing work only. However, several ion-pairing reagents, including heptafluorobutylamine (HFBAm) and butylamine (BA), have been reported to give lower contamination and signal suppression and easier washout than classic TEAA, with comparable retention of oligonucleotides. The application of ion-pair RP chromatography (IP-LC) together with nanoflow LC permits the analysis of complex mixtures of human tRNAs (~250 species) with as little as 100 ng of RNA (Wein et al. 2020). Several alternatives not requiring ion-pairing agents have been developed, including C18 RP HPLC, hydrophilic interaction liquid chromatography (HILIC), and two-dimensional liquid chromatography (2D-LC). Kellner and coworkers reported using C18 RP HPLC with ammonium acetate in the running buffer to map modifications in synthetic oligonucleotides and tRNAValAAC (Hagelskamp et al. 2020).
Exploiting the strong retention properties of HILIC for hydrophilic compounds such as oligonucleotides, the Patrick and Marianne laboratories have evaluated several different HILIC stationary phases for LC-MS analysis of a series of oligonucleotide ladders and found that diol and amide stationary phases provide high resolving power and good precision for studying oligonucleotides (Demelenne et al. 2020). Another advantage of HILIC is that oligonucleotides are usually eluted in a high volume ratio of organic solvent, which increases sensitivity through more efficient droplet formation and ionization in ESI-MS. To take advantage of the separation potency and obviate ion-pairing agent contamination issues, 2D-LC approaches have been developed with ion-pair RP chromatography or anion-exchange chromatography as the first dimension, and regular C18 RP (without IP or HILIC) as the second dimension. The first dimension provides high resolving power for oligonucleotides and the second dimension provides online desalting and development of MS-compatible solvent conditions as well as further resolution (Li et al. 2020). While these alternative approaches have been demonstrated with relatively simple samples, such as synthetic oligonucleotides or purified tRNAs, further studies are required to test their efficacy with complex tRNA digestion mixtures.


Following HPLC resolution, RNA fragments are injected into a high-resolution mass spectrometer for analysis. Successful mapping of RNA modifications requires careful attention to a variety of MS parameters, which merits a brief review of MS operation. The most commonly used high-resolution MS instruments are quadrupole time-of-flight (QTOF) and Orbitrap systems using ESI and CID, both of which offer much higher mass accuracy but poorer sensitivity than triple-quadrupole instruments. First, the RNA fragments are ionized into multiple charge states in ESI and sent to the mass analyzer (TOF or Orbitrap) for precursor ion scanning (MS1). Depending on the MS1 results, target precursor ions are selected by the first quadrupole (Q1) based on the signal strength, with the precursors of highest abundance – the “top N” – selected in data-dependent acquisition (DDA) mode. The selected precursor ions are then fragmented in the CID cell of a QTOF MS or the higher-energy collisional dissociation (HCD) cell of an Orbitrap MS, with the fragment ions sent back to the mass analyzer (TOF or Orbitrap) for fragment ion scanning (MS2). High-quality MS2 data containing enough phosphodiester backbone cleavage information are critical for successful, sensitive, and deep RNA sequencing and modification mapping. One of the major challenges for bottom-up MS analysis is that signal intensities of different RNA sequences in equimolar mixtures can span several orders of magnitude (Wein et al. 2020), which poses a problem given the highly variable abundance of the ~250 different tRNA species expressed in human cells. To improve the detection of low-abundance and low-response RNA fragments, different precursor ion selection strategies have been explored. Precursor ion selection is commonly done in DDA mode, with the most intense precursor ions (top N) selected for MS2 analysis.
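Top-N precursor selection is simple enough to express directly. The sketch below is a schematic of the selection logic only, not instrument control software: an MS1 scan is represented as a list of (m/z, intensity) pairs, and an optional list of m/z values to skip is matched within a ppm tolerance. The parameter names and the default values are assumptions chosen for illustration.

```python
def select_top_n(ms1_peaks, n=5, exclusion=(), tol_ppm=5.0):
    """Pick the n most intense MS1 peaks that are not on the exclusion list.

    ms1_peaks: list of (mz, intensity) tuples from one MS1 scan.
    exclusion: m/z values to skip, matched within tol_ppm.
    """
    def is_excluded(mz):
        return any(abs(mz - ex) / ex * 1e6 <= tol_ppm for ex in exclusion)

    candidates = [peak for peak in ms1_peaks if not is_excluded(peak[0])]
    return sorted(candidates, key=lambda peak: peak[1], reverse=True)[:n]


# Hypothetical MS1 scan with four co-eluting precursors.
scan = [(500.0, 1.0e5), (600.0, 5.0e4), (700.0, 2.0e5), (800.0, 1.0e3)]

print(select_top_n(scan, n=2))
print(select_top_n(scan, n=2, exclusion=[700.0]))
```

In the second call the most intense peak is skipped, so a weaker co-eluting precursor that would otherwise be ignored is sent for fragmentation instead.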
However, because RNA fragments coelute, some low-abundance RNA fragments are not detected in top N mode. One way to enhance data acquisition in DDA mode is to use an exclusion list for MS/MS analysis. Precursor ions corresponding to unmodified RNA fragments are excluded from subsequent MS2 analyses, which reduces sample complexity and improves the detection of modified fragments, especially low-abundance ones (Cao and Limbach 2015). Another experimental paradigm is data-independent acquisition (DIA), which has been widely used in proteomics to achieve high protein coverage, reproducibility, and accuracy (Li et al. 2021). In DIA mode, all precursor ions in each predefined m/z window are fragmented and analyzed together. Thus, the fragment ion spectrum (MS2) is highly multiplexed, and the direct link between the precursor ion and its fragment ions is lost, which makes data analysis more complex than in the DDA approach. To analyze DIA results, a reference spectral library is used for spectrum matching. However, DIA has not yet been applied to RNA sequencing; a well-designed DIA acquisition and data analysis protocol would benefit the analysis of complex tRNA samples. Another major challenge for bottom-up as well as top-down MS is the identification of positional isomers, especially for methylation modifications (e.g., Am, m1A, m2A, m6A). Sugar methylation (e.g., Am) can be distinguished from base methylation (e.g., m1A, m2A, m6A) based on the a-B type ion (loss of the base from an a-type ion). The a-B type ion can also be used to distinguish pseudouridine (ψ) from its isomeric precursor U, based on the absence of the a-B type ion at the ψ site due to its

C-glycosidic bond. However, a-B type ions are not detected at all positions; they are especially difficult to observe at the 5′- and 3′-ends of an RNA fragment and in low-quality MS2 data from low-abundance RNA fragments. To further enhance the detection of isomers, several alternative approaches have been developed, including derivatization and pseudo-MS3. The derivatization approach has been applied to distinguish ψ from its isomeric precursor U by reaction with N-cyclohexyl-N′-β-(4-methylmorpholinium)ethylcarbodiimide (CMC). CMC reacts with G, U, as well as ψ at high pH (10.4). Under mild alkaline conditions, N1-CMC-ψ, CMC-G, and CMC-U are cleaved to release CMC, while N3-CMC-ψ is stable enough for downstream purification and LC-MS/MS analysis (Durairaj and Limbach 2008). However, s4U and 2-methylthio-N6-isopentenyladenosine (ms2i6A) can also be labelled with CMC, which interferes with the detection of ψ. A systematic evaluation of derivatization approaches on complex tRNA samples is needed, since these conditions may destroy labile modifications. Another approach is to use a pseudo-MS3 fragmentation strategy to distinguish isomers directly, without chemical derivatization. In this approach, a standard LC-MS/MS protocol is used to generate MS1 and MS2 spectra, followed by selection of target B ions in MS2 for MS3 analysis. For example, the MS2 ion at m/z 148.063 covers the nucleobase methylation isomers of adenosine: m1A, m2A, m6A, and m8A. These B ions are then subjected to nucleobase fragmentation in another round of CID to generate a series of fragment spectra unique to each isomer. By matching the MS3 results with reference spectra derived from authentic isomers, methylation modifications and ψ have been distinguished and mapped in rRNAs from Leishmania donovani (Nakayama et al. 2019).
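The shared m/z value quoted above can be checked with a short calculation. The sketch below assumes that the B ion is the singly deprotonated methyladenine base (elemental composition C6H7N5) observed in negative-ion mode; that assignment, the helper names, and the rounded isotope masses are assumptions made here for illustration, not details stated in the text.

```python
# Monoisotopic masses (u) of the most abundant isotopes, rounded to 8 decimals.
MONO_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
PROTON_MASS = 1.00727647


def monoisotopic_mass(formula: dict) -> float:
    """Sum isotope masses for an elemental composition such as {'C': 6, 'H': 7}."""
    return sum(MONO_MASS[element] * count for element, count in formula.items())


def mz_deprotonated(formula: dict) -> float:
    """m/z of the singly charged [M-H]- ion (assumed negative-ion mode)."""
    return monoisotopic_mass(formula) - PROTON_MASS


# All four base-methylated adenine isomers share one elemental composition.
METHYLADENINE = {"C": 6, "H": 7, "N": 5}
for isomer in ("m1A", "m2A", "m6A", "m8A"):
    print(isomer, round(mz_deprotonated(METHYLADENINE), 3))
```

Because the four isomers have the same elemental composition, they give the same value at this stage, which is why a further round of fragmentation (MS3) is needed to tell them apart.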
The LC-MS analysis of a complex tRNA digestion mixture produces thousands of spectra, with up to nine different types of fragment ions at each site in each spectrum. To deal with this large amount of data, automated data analysis software has been developed in recent years, including simple oligonucleotide sequencer (SOS), Ariadne, oligonucleotide mass assembler (OMA) and oligonucleotide peak analyzer (OPA), RNAModMapper (RAMM), and, most recently, NucleicAcidSearchEngine (NASE) (Wein et al. 2020). RAMM and NASE are capable of analyzing large datasets arising from complex oligonucleotide mixtures and biological samples. For both tools, the inputs are a mass spectrometry data file (mzML or MGF format) and reference RNA sequences in FASTA format. The RNA sequences are then digested in silico according to the user-specified RNase (e.g., RNase T1, RNase U2, RNase A, RNase MC1, Cusativin). NASE further supports MazF, colicin E5, RNase H, and unspecific cleavage. Tandem mass spectra are mapped to RNA fragments based on the predicted precursor and product ions, within a user-defined mass tolerance. For each oligonucleotide-spectrum match, NASE calculates the false discovery rate (FDR), which can be used to filter for high-confidence matches. For modification mapping, both tools use the MODOMICS database and nomenclature and also support custom RNA modifications. In particular, NASE was developed within the OpenMS framework, which enables data visualization with TOPPView and access to other common OpenMS interfaces for downstream data analysis, such as label-free quantification (Wein et al. 2020). Furthermore, NASE supports correction of the “monoisotopic peak” that is

selected by the data acquisition software from the isotopologue peaks based on their intensity. For long sequences, other isotopic peaks may be more intense than the monoisotopic peak. Thus, a correction such as that supported by NASE is necessary for rigorous sequence assignment. Further development of these software tools to reduce manual curation demands and improve mapping accuracy will greatly benefit the field. There is still significant room to improve bottom-up MS as a tool for sensitive, precise, and accurate tRNA modification mapping. Accurate quantitation would be enhanced by abundant and evenly distributed data points for each fragment. In this regard, multiple reaction monitoring (MRM) using triple-quadrupole instruments would be an ideal method for quantification due to its high specificity and sensitivity. A single cycle of MS1 and MS/MS analysis for 50 MRM channels usually takes 500 ms, which is enough to get more than 20 data points from a typical HPLC peak. MRM mode has been applied to quantify RNA fragments using unlabeled standards with a calibration curve or using isotope-labelled standards (Chan et al. 2012; Hagelskamp et al. 2020). However, MRM mode is not compatible with RNA sequence mapping. The Benjamin group reported a label-free quantification method for bottom-up MS-based RNA sequencing using DDA mode. Quantification here is based on the MS1 signal intensity, summed over multiple charge states and adducts (Wein et al. 2020). One drawback of label-free quantification is that a single cycle of tandem mass analysis usually takes 2–3 s, depending on the instrument configuration and the number of precursor ions, so it may not be possible to acquire enough data points for each RNA fragment for accurate quantification, especially for low-abundance fragments. Another drawback is that sample preparation and LC-MS/MS conditions may differ between samples and lead to errors in quantification.
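The sampling argument can be made concrete with a little arithmetic. The sketch below assumes a chromatographic peak 12 s wide at the base; that width and the helper name `points_across_peak` are illustrative assumptions, while the cycle times of 0.5 s (MRM) and 2–3 s (DDA) come from the text.

```python
import math


def points_across_peak(peak_width_s: float, cycle_time_s: float) -> int:
    """Number of complete acquisition cycles that fit within one LC peak."""
    return math.floor(peak_width_s / cycle_time_s)


PEAK_WIDTH_S = 12.0  # assumed peak width; real values depend on the LC method

for label, cycle_s in (("MRM, 50 channels", 0.5), ("DDA, fast", 2.0), ("DDA, slow", 3.0)):
    print(f"{label}: {points_across_peak(PEAK_WIDTH_S, cycle_s)} points per peak")
```

Under this assumption the MRM cycle samples the peak more than 20 times, while a 2–3 s tandem cycle yields only a handful of points, which is the quantification limitation described above.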
To overcome these problems, two different RNA-labelling approaches have been explored: metabolic labelling and enzymatic labelling. The Williamson group generated mature 15N-labelled rRNA standards by culturing E. coli in 15N-labelled media (Popova and Williamson 2014). 14N-rRNA (unlabeled) and 15N-labelled rRNA are mixed at a 1:1 ratio, digested with RNase, and analyzed by LC-MS. The quantification results are calculated using a least-squares Fourier transform convolution (LS-FTC) algorithm, accounting for isotope distributions and metabolic scrambling. The metabolic isotope labelling approach provides an accurate and reproducible quantification method to compare modification levels. However, the approach only allows the comparison of two samples in each injection, and it has not yet been demonstrated for tRNA analysis. Alternatively, stable isotope labelling can be done during the enzymatic digestion step. In this approach, full-length RNA is digested with RNase in two separate tubes containing H₂¹⁶O or H₂¹⁸O, yielding a 3′ cyclic phosphate intermediate that is further hydrolyzed by water and labelled with one of the isotopic oxygens at the final 3′ phosphate group (Castleberry and Limbach 2010). After digestion, the 16O- and 18O-labelled RNA fragments are combined and subjected to LC-MS analysis for relative quantification of RNAs. However, the 2 Da mass difference between 16O and 18O may not be enough to distinguish long RNA fragments due to the overlap of isotopologue peaks. Moreover, back-exchange between 16O and 18O can impede the quantification analysis. Looking forward, alternative approaches are much needed for quantitative MS-based RNA sequencing and modification mapping.
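The overlap problem with the 2 Da label can be sketched numerically. The calculation below is an illustration, not the published method: it assumes a single 18O incorporated per fragment and compares the resulting m/z shift with the spacing between natural isotopologue peaks (approximated here by the 13C/12C mass difference). The function names are hypothetical.

```python
MASS_16O = 15.99491462       # u
MASS_18O = 17.99915961       # u
C13_MINUS_C12 = 1.00335484   # u, approximate spacing between isotopologue peaks


def label_shift_mz(charge: int) -> float:
    """m/z shift produced by one 16O -> 18O substitution at a given charge."""
    return (MASS_18O - MASS_16O) / abs(charge)


def isotopologue_spacing_mz(charge: int) -> float:
    """Approximate m/z spacing between adjacent isotopologue peaks."""
    return C13_MINUS_C12 / abs(charge)


for z in (1, 2, 4, 8):
    shift = label_shift_mz(z)
    spacing = isotopologue_spacing_mz(z)
    print(f"z={z}: label shift {shift:.4f} m/z, "
          f"isotopologue spacing {spacing:.4f} m/z, ratio {shift / spacing:.3f}")
```

The ratio stays close to 2 at every charge state, so the monoisotopic peak of the 18O-labelled fragment falls almost on top of the second isotopologue peak of the unlabelled fragment. For long fragments, whose higher isotopologues are intense, the two envelopes therefore overlap.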

Current Tools to Study Codon-Biased Translation

While beyond the scope of this review, the last analytical leg of the tRNA reprogramming and codon-biased translation model involves quantifying the translation of mRNAs into proteins, either as newly translated proteins or as actively translated mRNAs, and then quantifying codon usage patterns in genes associated with the up- and downregulated proteins and mRNAs relative to unstressed controls. There are a variety of tools available for quantifying codon-biased translation, including proteomics (Altelaar et al. 2013), ribosome profiling (e.g., ribo-seq) (Andreev et al. 2017; Ingolia et al. 2019), and polysome profiling (Mazzoni-Putman and Stepanova 2018). Bottom-up MS-based proteomics has been extensively reviewed as an expensive means to quantify environmentally induced changes in the steady-state proteome (Altelaar et al. 2013), which represents the balance between translation and protein degradation. Ribosome profiling has also been extensively reviewed as a means to identify actively translating mRNAs and their associated codon usage (Andreev et al. 2017; Ingolia et al. 2019). While ribosome profiling is a technically challenging method that involves sequencing the limit nuclease digest of 20–30 nt of mRNA bound to the ribosome, polysome profiling is a much less demanding NGS-based analysis of mRNAs and tRNAs loaded onto actively translating polysomes; it has been widely used to investigate translation activity and to reveal the genome-wide landscape of the translatome at lower cost and greater depth than proteomics (Mazzoni-Putman and Stepanova 2018).
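Quantifying codon usage in a gene reduces, at its simplest, to counting in-frame codons. The following is a minimal sketch under stated assumptions: the coding sequence is invented, the helper names are hypothetical, and the published tools cited in this chapter apply additional statistics that are not reproduced here.

```python
from collections import Counter
from typing import Dict


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons of a coding sequence (DNA or RNA alphabet)."""
    seq = cds.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return Counter(seq[i:i + 3] for i in range(0, len(seq), 3))


def codon_frequencies(cds: str) -> Dict[str, float]:
    """Fraction of all codons in the sequence contributed by each codon."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


# Invented six-codon open reading frame: ATG GAA GAA GAG AAA TAA
example_cds = "ATGGAAGAAGAGAAATAA"
print(codon_counts(example_cds))
print(codon_frequencies(example_cds)["GAA"])
```

Comparing such frequencies between the genes for up- and downregulated proteins and the rest of the genome is one simple way to look for the codon bias discussed above.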

Conclusion

The explosive growth of interest in the tRNA epitranscriptome and codon-biased translation has resulted in part from the development of convergent technologies for systems-level analyses of tRNA modifications, tRNA molecules, the translatome, and the genome. The field is now poised to move understanding of translational regulation of gene expression to the larger context of the mRNA and rRNA epitranscriptomes and ribosome reprogramming. However, new technologies, tools, and approaches are needed to fully understand the chemical biology of RNA modifications. For example, one immediate challenge is to improve the ability to quantitatively map modifications in all forms of RNA. The most attractive technologies are RNA-seq and MS, but RNA-seq lacks chemical accuracy, whereas MS lacks sensitivity and depth of coverage. As an example of where the field is heading, Kimura and coworkers used a hybrid approach of sequencing and MS to discover a novel modification in tRNA from Vibrio cholerae, namely, acetylated 3-(3-amino-3-carboxypropyl)uridine (acacp3U) (Kimura et al. 2020). This approach remains highly labor-intensive, requiring isolation of individual tRNA species for MS analysis and PAGE gel purification during sequencing library preparation, both of which limit an exhaustive systems-level analysis of altered tRNA modification dynamics in cell stress or disease. Under the umbrella of “translatomics,” the goal now is to adapt and develop technologies for isolating ribosome-bound tRNAs, quantitatively mapping all modifications in the purified tRNAs, and correlating these data with tRNA occupancy,

codon usage, and translation speed and fidelity. This kind of granular mechanistic understanding of the molecular and chemical biology of RNA modifications will accelerate the identification of new therapeutic targets in translationally driven diseases; motivate the development of new therapeutic modalities, such as molecular glues (Spradlin et al. 2021) or RNA-binding molecules (Messner et al. 2022); and facilitate the identification of biomarkers for drug discovery and development.

References

Altelaar AF, Munoz J, Heck AJ (2013) Next-generation proteomics: towards an integrative view of proteome dynamics. Nat Rev Genet 14(1):35–48
Amalric A, Bastide A, Attina A et al (2022) Quantifying RNA modifications by mass spectrometry: a novel source of biomarkers in oncology. Crit Rev Clin Lab Sci 59(1):1–18
Andreev DE, O’Connor PB, Loughran G et al (2017) Insights into the mechanisms of eukaryotic translation gained with ribosome profiling. Nucleic Acids Res 45(2):513–526
Begley U, Dyavaiah M, Patil A et al (2007) Trm9-catalyzed tRNA modifications link translation to the DNA damage response. Mol Cell 28(5):860–870
Behrens A, Rodschinka G, Nedialkova D (2021) High-resolution quantitative profiling of tRNA abundance and modification status in eukaryotes by mim-tRNAseq. Mol Cell 81(8):1802–1815 e1807
Biedenbänder T, De Jesus V, Schmidt-Dengler M et al (2022) RNA modifications stabilize the tertiary structure of tRNAfMet by locally increasing conformational dynamics. Nucleic Acids Res 50(4):2334–2349
Borland K, Diesend J, Ito-Kureha T et al (2019) Production and application of stable isotope-labeled internal standards for RNA modification analysis. Genes 10(1):26
Cai WM, Chionh YH, Hia F et al (2015) A platform for discovery and quantification of modified ribonucleosides in RNA: application to stress-induced reprogramming of tRNA modifications. Methods Enzymol 560:29–71
Calderisi G, Glasner H, Breuker K (2020) Radical transfer dissociation for de novo characterization of modified ribonucleic acids by mass spectrometry. Angew Chem Int Ed Engl 59(11):4309–4313
Cao X, Limbach PA (2015) Enhanced detection of post-transcriptional modifications using a mass-exclusion list strategy for RNA modification mapping by LC-MS/MS. Anal Chem 87(16):8433–8440
Cao KY, Pan Y, Yan TM et al (2020) Purification, characterization and cytotoxic activities of individual tRNAs from Escherichia coli. Int J Biol Macromol 142:355–365
Castleberry CM, Limbach PA (2010) Relative quantitation of transfer RNAs using liquid chromatography mass spectrometry and signature digestion products. Nucleic Acids Res 38(16):e162
Chan CT, Pang YL, Deng W et al (2012) Reprogramming of tRNA modifications controls the oxidative stress response by codon-biased translation of proteins. Nat Commun 3:937
Chan CT, Deng W, Li F et al (2015) Highly predictive reprogramming of tRNA modifications is linked to selective expression of codon-biased genes. Chem Res Toxicol 28(5):978–988
Chionh YH, McBee M, Babu IR et al (2016) tRNA-mediated codon-biased translation in mycobacterial hypoxic persistence. Nat Commun 7:13302
de Crécy-Lagard V, Boccaletto P, Mangleburg CG et al (2019) Matching tRNA modifications in humans to their known and predicted enzymes. Nucleic Acids Res 47(5):2143–2159
Dedon PC, Begley TJ (2014) A system of RNA modifications and biased codon use controls cellular stress response at the level of translation. Chem Res Toxicol 27(3):330–337
Demelenne A, Gou MJ, Nys G et al (2020) Evaluation of hydrophilic interaction liquid chromatography, capillary zone electrophoresis and drift tube ion-mobility quadrupole time of flight mass spectrometry for the characterization of phosphodiester and phosphorothioate oligonucleotides. J Chromatogr A 1614:460716
Deng W, Babu IR, Su D et al (2015) Trm9-catalyzed tRNA modifications regulate global protein expression by codon-biased translation. PLoS Genet 11(12):e1005706
Dong M, Dedon PC (2006) Relatively small increases in the steady-state levels of nucleobase deamination products in DNA from human TK6 cells exposed to toxic levels of nitric oxide. Chem Res Toxicol 19(1):50–57
Doyle F, Leonardi A, Endres L et al (2016) Gene- and genome-based analysis of significant codon patterns in yeast, rat and mice genomes with the CUT Codon UTilization tool. Methods 107:98–109
Durairaj A, Limbach PA (2008) Improving CMC-derivatization of pseudouridine in RNA for mass spectrometric detection. Anal Chim Acta 612(2):173–181
Dutagaci B, Nawrocki G, Goodluck J et al (2021) Charge-driven condensation of RNA and proteins suggests broad role of phase separation in cytoplasmic environments. eLife 10:e64004
Endres L, Begley U, Clark R et al (2015) Alkbh8 regulates selenocysteine-protein expression to protect against reactive oxygen species damage. PLoS One 10(7):e0131335
Fuchs RT, Sun Z, Zhuang F et al (2015) Bias in ligation-based small RNA sequencing library construction is determined by adaptor and RNA structure. PLoS One 10(5):e0126049
Gingold H, Tehler D, Christoffersen NR et al (2014) A dual program for translation regulation in cellular proliferation and differentiation. Cell 158(6):1281–1292
Goodarzi H, Nguyen HCB, Zhang S et al (2016) Modulated expression of specific tRNAs drives gene expression and cancer progression. Cell 165(6):1416–1427
Gregorova P, Sipari NH, Sarin LP (2021) Broad-range RNA modification analysis of complex biological samples using rapid C18-UPLC-MS. RNA Biol 18(10):1382–1389
Grosjean H, Westhof E (2016) An integrated, structure- and energy-based view of the genetic code. Nucleic Acids Res 44(17):8020–8040
Guo C, Xie C, Chen Q et al (2018) A novel malic acid-enhanced method for the analysis of 5-hydroxymethyl-2′-deoxycytidine, 5-methylcytidine and 5-methyl-2′-deoxycytidine, 5-hydroxymethylcytidine in human urine using hydrophilic interaction liquid chromatography-tandem mass spectrometry. Anal Chim Acta 1034:110–118
Hagelskamp F, Borland K, Ramos J et al (2020) Broadly applicable oligonucleotide mass spectrometry for the analysis of RNA writers and erasers in vitro. Nucleic Acids Res 48(7):e41
Hammam E, Sinha A, Baumgarten S et al (2021) Malaria parasite stress tolerance is regulated by DNMT2-mediated tRNA cytosine methylation. mBio 12(6):e0255821
Heiss M, Borland K, Yoluc Y et al (2021) Quantification of modified nucleosides in the context of NAIL-MS. Methods Mol Biol 2298:279–306
Helm M, Giege R, Florentz C (1999) A Watson-Crick base-pair-disrupting methyl group (m1A9) is sufficient for cloverleaf folding of human mitochondrial tRNALys. Biochemistry 38(40):13338–13346
Helm M, Schmidt-Dengler MC, Weber M et al (2021) General principles for the detection of modified nucleotides in RNA by specific reagents. Adv Biol 5(10):e2100866
Honda S, Shigematsu M, Morichika K et al (2015) Four-leaf clover qRT-PCR: a convenient method for selective quantification of mature tRNA. RNA Biol 12(5):501–508
Hu JF, Yim D, Ma D et al (2021) Quantitative mapping of the cellular small RNA landscape with AQRNA-seq. Nat Biotechnol 39(8):978–988
Huang HY, Hopper AK (2016) Multiple layers of stress-induced regulation in tRNA biology. Life 6(2):16
Huang TY, Liu J, McLuckey SA (2010) Top-down tandem mass spectrometry of tRNA via ion trap collision-induced dissociation. J Am Soc Mass Spectrom 21(6):890–898
Huber SM, Leonardi A, Dedon PC et al (2019) The versatile roles of the tRNA epitranscriptome during cellular responses to toxic exposures and environmental stress. Toxics 7(1):17
Ingolia NT, Hussmann JA, Weissman JS (2019) Ribosome profiling: global views of translation. Cold Spring Harb Perspect Biol 11(5):a032698
Institute N C (2021) The cancer genome Atlas. https://www.cancer.gov/tcga. Accessed 20 May 2022
Jiang T, Yu N, Kim J et al (2019) Oligonucleotide sequence mapping of large therapeutic mRNAs via parallel ribonuclease digestions and LC-MS/MS. Anal Chem 91(13):8500–8506
Jora M, Borland K, Abernathy S et al (2021) Chemical amination/imination of carbonothiolated nucleosides during RNA hydrolysis. Angew Chem Int Ed Engl 60(8):3961–3966
Kaiser S, Byrne SR, Ammann G et al (2021) Strategies to avoid artifacts in mass spectrometry-based epitranscriptome analyses. Angew Chem Int Ed Engl 60(44):23885–23893
Kellner S, Burhenne J, Helm M (2010) Detection of RNA modifications. RNA Biol 7(2):237–247
Kellner S, Neumann J, Rosenkranz D et al (2014) Profiling of RNA modifications by multiplexed stable isotope labelling. Chem Commun 50(26):3516–3518
Kimura S, Dedon PC, Waldor MK (2020) Comparative tRNA sequencing and RNA mass spectrometry for surveying tRNA modifications. Nat Chem Biol 16(9):964–972
Leonardi A, Kovalchuk N, Yin L et al (2020) The epitranscriptomic writer ALKBH8 drives tolerance and protects mouse lungs from the environmental pollutant naphthalene. Epigenetics 15(10):1121–1138
Li F, Su X, Baurer S et al (2020) Multiple heart-cutting mixed-mode chromatography-reversed-phase 2D-liquid chromatography method for separation and mass spectrometric characterization of synthetic oligonucleotides. J Chromatogr A 1625:461338
Li J, Smith LS, Zhu HJ (2021) Data-independent acquisition (DIA): an emerging proteomics technology for analysis of drug-metabolizing enzymes and transporters. Drug Discov Today Technol 39:49–56
Limbach PA, Paulines MJ (2017) Going global: the new era of mapping modifications in RNA. Wiley Interdiscip Rev RNA 8(1):10.1002
Lin S, Liu Q, Lelyveld VS et al (2018) Mettl1/Wdr4-mediated m(7)G tRNA methylome is required for normal mRNA translation and embryonic stem cell self-renewal and differentiation. Mol Cell 71(2):244–255 e245
Lorenz C, Lunse CE, Morl M (2017) tRNA modifications: impact on structure and thermal adaptation. Biomol Ther 7(2):35
Matuszewski M, Wojciechowski J, Miyauchi K et al (2017) A hydantoin isoform of cyclic N6-threonylcarbamoyladenosine (ct6A) is present in tRNAs. Nucleic Acids Res 45(4):2137–2149
Mazzoni-Putman SM, Stepanova AN (2018) A plant biologist’s toolbox to study translation. Front Plant Sci 9:873
Messner K, Vuong B, Tranmer GK (2022) The boron advantage: the evolution and diversification of boron’s applications in medicinal chemistry. Pharmaceuticals 15(3):264
Nakayama H, Yamauchi Y, Nobe Y et al (2019) Method for direct mass-spectrometry-based identification of monomethylated RNA nucleoside positional isomers and its application to the analysis of Leishmania rRNA. Anal Chem 91(24):15634–15643
Ng CS, Sinha A, Aniweh Y et al (2018) tRNA epitranscriptomics and biased codon are linked to proteome expression in Plasmodium falciparum. Mol Syst Biol 14(10):e8009
Orellana EA, Liu Q, Yankova E et al (2021) METTL1-mediated m(7)G modification of Arg-TCT tRNA drives oncogenic transformation. Mol Cell 81(16):3323–3338
Pang YL, Abo R, Levine SS et al (2014) Diverse cell stresses induce unique patterns of tRNA up- and down-regulation: tRNA-seq for quantifying changes in tRNA copy number. Nucleic Acids Res 42(22):e170
Phizicky EM, Hopper AK (2010) tRNA biology charges to the front. Genes Dev 24(17):1832–1860
Pinkard O, McFarland S, Sweet T et al (2020) Quantitative tRNA-sequencing uncovers metazoan tissue-specific tRNA regulation. Nat Commun 11(1):4104
Popova AM, Williamson JR (2014) Quantitative analysis of rRNA modifications using stable isotope labeling and mass spectrometry. J Am Chem Soc 136(5):2058–2069
Pratanwanich PN, Yao F, Chen Y et al (2021) Identification of differential RNA modifications from nanopore direct RNA sequencing with xPore. Nat Biotechnol 39(11):1394–1402
Rak R, Polonsky M, Eizenberg-Magar I et al (2021) Dynamic changes in tRNA modifications and abundance during T cell activation. Proc Natl Acad Sci U S A 118(42):e2106556118
Rapino F, Delaunay S, Rambow F et al (2018) Codon-specific translation reprogramming promotes resistance to targeted therapy. Nature 558(7711):605–609
Richter F, Plehn JE, Bessler L et al (2022) RNA marker modifications reveal the necessity for rigorous preparation protocols to avoid artifacts in epitranscriptomic analysis. Nucleic Acids Res 50(8):4201–4215
Rothenberg DA, Taliaferro JM, Huber SM et al (2018) A proteomics approach to profiling the temporal translational response to stress and growth. iScience 9:367–381
Sarin LP, Kienast SD, Leufken J et al (2018) Nano LC-MS using capillary columns enables accurate quantification of modified ribonucleosides at low femtomol levels. RNA 24(10):1403–1417
Schwartz MH, Pan T (2017) Determining the fidelity of tRNA aminoacylation via microarrays. Methods 113:27–33
Spradlin JN, Zhang E, Nomura DK (2021) Reimagining druggability using chemoproteomic platforms. Acc Chem Res 54(7):1801–1813
Su D, Chan CT, Gu C et al (2014) Quantitative analysis of ribonucleoside modifications in tRNA by HPLC-coupled mass spectrometry. Nat Protoc 9(4):828–841
Suzuki T (2021) The expanding world of tRNA modifications and their disease relevance. Nat Rev Mol Cell Biol 22(6):375–392
Taucher M, Breuker K (2012) Characterization of modified RNA by top-down mass spectrometry. Angew Chem Int Ed Engl 51(45):11289–11292
Thakur P, Estevez M, LoBue PA et al (2020) Improved RNA modification mapping of cellular non-coding RNAs using C- and U-specific RNases. Analyst 145(3):816–827
Thomas NK, Poodari VC, Jain M et al (2021) Direct Nanopore sequencing of individual full length tRNA strands. ACS Nano 15(10):16642–16653
Vilfan ID, Tsai YC, Clark TA et al (2013) Analysis of RNA base modification and structural rearrangement by single-molecule real-time detection of reverse transcription. J Nanobiotechnology 11:8
Wein S, Andrews B, Sachsenberg T et al (2020) A computational platform for high-throughput analysis of RNA sequences and modifications by mass spectrometry. Nat Commun 11(1):926
Werner S, Schmidt L, Marchand V et al (2020) Machine learning of reverse transcription signatures of variegated polymerases allows mapping and discrimination of methylated purines in limited transcriptomes. Nucleic Acids Res 48(7):3734–3746
Wetzel C, Limbach PA (2016) Mass spectrometry of modified RNAs: recent developments. Analyst 141(1):16–23
Wu J, McLuckey SA (2004) Gas-phase fragmentation of oligonucleotide ions. Int J Mass Spectrom 237(2–3):197–241
Yoluc Y, Ammann G, Barraud P et al (2021) Instrumental analysis of RNA modifications. Crit Rev Biochem Mol Biol 56(2):178–204
Zhang LS, Liu C, Ma H et al (2019) Transcriptome-wide mapping of internal N(7)-methylguanosine methylome in mammalian mRNA. Mol Cell 74(6):1304–1316
Zhang N, Shi S, Wang X et al (2020) Direct sequencing of tRNA by 2D-HELS-AA MS Seq reveals its different isoforms and dynamic base modifications. ACS Chem Biol 15(6):1464–1472

Further Reading

Alon S, Vigneault F, Eminaga S et al (2011) Barcoding bias in high-throughput multiplex sequencing of miRNA. Genome Res 21(9):1506–1511
Bauer F, Hermand D (2012) A coordinated codon-dependent regulation of translation by Elongator. Cell Cycle 11(24):4524–4529
Bauer F, Matsuyama A, Candiracci J et al (2012) Translational control of cell division by Elongator. Cell Rep 1(5):424–433
Berg MD, Brandl CJ (2021) Transfer RNAs: diversity in form and function. RNA Biol 18(3):316–339
Björk GR, Ericson JU, Gustafsson CE et al (1987) Transfer RNA modification. Annu Rev Biochem 56:263–287

Boccaletto P, Machnicka MA, Purta E et al (2018) MODOMICS: a database of RNA modification pathways. 2017 update. Nucleic Acids Res 46(D1):D303–D307 Boccaletto P, Stefaniak F, Ray A et al (2022) MODOMICS: a database of RNA modification pathways. 2021 update. Nucleic Acids Res 50(D1):D231–D235 Boel G, Letso R, Neely H et al (2016) Codon influence on protein expression in E. coli correlates with mRNA levels. Nature 529(7586):358–363 Cao B, Chen C, Demott MS et al (2014) Genomic mapping of phosphorothioates reveals partial modification of short consensus sequences. Nat Commun 5:3951 Cao KY, Pan Y, Yan TM et al (2020) Purification, characterization and cytotoxic activities of individual tRNAs from Escherichia coli. Int J Biol Macromol 142:355–365 Chan C, Pham P, Dedon PC et al (2018) Lifestyle modifications: coordinating the tRNA epitranscriptome with codon bias to adapt translation during stress responses. Genome Biol 19(1):228 Chan CT, Dyavaiah M, Demott MS et al (2010) A quantitative systems approach reveals dynamic control of tRNA modifications during cellular stress. PLoS Genet 6(12):e1001247 Chionh YH, Ho CH, Pruksakorn D et al (2013) A multidimensional platform for the purification of non-coding RNA species. Nucleic Acids Res 41(17):e168 Cozen AE, Quartley E, Holmes AD et al (2015) ARM-seq: AlkB-facilitated RNA methylation sequencing reveals a complex landscape of modified tRNA fragments. Nat Methods 12(9): 879–884 Dai Z, Liu H, Liao J et al (2021) N(7)-Methylguanosine tRNA modification enhances oncogenic mRNA translation and promotes intrahepatic cholangiocarcinoma progression. Mol Cell 81(16):3339–3355 e3338 Damon JR, Pincus D, Ploegh HL (2015) tRNA thiolation links translation to stress responses in Saccharomyces cerevisiae. Mol Biol Cell 26(2):270–282 Dittmar KA, Goodenbour JM, Pan T (2006) Tissue-specific differences in human transfer RNA expression. PLoS Genet 2(12):e221 Dolan JW (2008) Ion pairing - blessing or curse? 
LCGC Europe 21(5):258–263 Donegan M, Nguyen JM, Gilar M (2022) Effect of ion-pairing reagent hydrophobicity on liquid chromatography and mass spectrometry analysis of oligonucleotides. J Chromatogr A 1666:462860 Endres L, Dedon PC, Begley TJ (2015b) Codon-biased translation can be regulated by wobble-base tRNA modification systems during cellular stress responses. RNA Biol 12(6):603–614 Erber L, Hoffmann A, Fallmann J et al (2020) LOTTE-seq (Long hairpin oligonucleotide based tRNA high-throughput sequencing): specific selection of tRNAs with 30 -CCA end for highthroughput sequencing. J RNA Biol 17(1):23–32 Esteve-Puig R, Bueno-Costa A, Esteller M (2020) Writers, readers and erasers of RNA modifications in cancer. Cancer Lett 474:127–137 Fernandez-Vazquez J, Vargas-Perez I, Sanso M et al (2013) Modification of tRNA(Lys) UUU by elongator is essential for efficient translation of stress mRNAs. PLoS Genet 9(7):e1003647 Goyon A, Zhang K (2020) Characterization of antisense oligonucleotide impurities by ion-pairing reversed-phase and anion exchange chromatography coupled to hydrophilic interaction liquid chromatography/mass spectrometry using a versatile two-dimensional liquid chromatography setup. Anal Chem 92(8):5944–5951 Grelet S, Mcshane A, Hok E et al (2017) SPOt: a novel and streamlined microarray platform for observing cellular tRNA levels. PLoS One 12(5):e0177939 Gu C, Begley TJ, Dedon PC (2014) tRNA modifications regulate translation during cellular stress. FEBS Lett 588(23):4287–4296 Hafner M, Renwick N, Brown M et al (2011) RNA-ligase-dependent biases in miRNA representation in deep-sequenced small RNA cDNA libraries. RNA 17(9):1697–1712 Hanson G, Coller J (2018) Codon optimality, bias and usage in translation and mRNA decay. Nat Rev. Mol Cell Biol 19(1):20–30

Helm M, Motorin Y (2017) Detecting RNA modifications in the epitranscriptome: predict and validate. Nat Rev. Genet 18(5):275–291 Hia F, Chionh YH, Pang YL et al (2015) Mycobacterial RNA isolation optimized for non-coding RNA: high fidelity isolation of 5S rRNA from Mycobacterium bovis BCG reveals novel posttranscriptional processing and a complete spectrum of modified ribonucleosides. Nucleic Acids Res 43(5):e32 Holley RW, Apgar J, Everett GA et al (1965) Structure of a ribonucleic acid. Science 147(3664): 1462–1465 Ivanov P, Kedersha N, Anderson P (2019) Stress granules and processing bodies in translational control. Cold Spring Harb Perspect Biol 11(5):a032813 Jaroensuk J, Atichartpongkul S, Chionh YH et al (2016) Methylation at position 32 of tRNA catalyzed by TrmJ alters oxidative stress response in Pseudomonas aeruginosa. Nucleic Acids Res 44(22):10834–10848 Jia G, Fu Y, Zhao X et al (2011) N6-methyladenosine in nuclear RNA is a major substrate of the obesity-associated FTO. Nat Chem Biol 7(12):885–887 Khalique A, Mattijssen S, Maraia RJ (2022) A versatile tRNA modification-sensitive northern blot method with enhanced performance. RNA 28(3):418–432 Kowalak JA, Pomerantz SC, Crain PF et al (1993) A novel method for the determination of posttranscriptional modification in RNA by mass spectrometry. Nucleic Acids Res 21(19): 4577–4585 Ladang A, Rapino F, Heukamp LC et al (2015) Elp3 drives Wnt-dependent tumor initiation and regeneration in the intestine. J Exp Med 212(12):2057–2075 Lajin B, Goessler W (2020) Fluoroalkylamines: novel, highly volatile, fast-equilibrating, and electrospray ionization-mass spectrometry signal-enhancing cationic ion-interaction reagents. Anal Chem 92(14):10121–10128 Lechner A, Wolff P, Leize-Wagner E et al (2020) Characterization of post-transcriptional RNA modifications by sheathless capillary electrophoresis-high resolution mass spectrometry. 
Anal Chem 92(10):7363–7370 Li F, Lammerhofer M (2021) Impurity profiling of siRNA by two-dimensional liquid chromatography-mass spectrometry with quinine carbamate anion-exchanger and ion-pair reversed-phase chromatography. J Chromatogr A 1643:462065 Linsen SE, De Wit E, Janssens G et al (2009) Limitations and possibilities of small RNA digital gene expression profiling. Nat Methods 6(7):474–476 Liu F, Clark W, Luo G et al (2016) ALKBH1-mediated tRNA demethylation regulates translation. Cell 167(7):1897 Lobue PA, Jora M, Addepalli B et al (2019) Oligonucleotide analysis by hydrophilic interaction liquid chromatography-mass spectrometry in the absence of ion-pair reagents. J Chromatogr A 1595:39–48 Loos G, Van Schepdael A, Cabooter D (2016) Quantitative mass spectrometry methods for pharmaceutical analysis. Philos Trans A Math Phys Eng Sci 374(2079):20150366 Machnicka MA, Olchowik A, Grosjean H et al (2014) Distribution and frequencies of posttranscriptional modifications in tRNAs. RNA Biol 11(12):1619–1629 Marchand V, Ayadi L, Ernst FGM et al (2018) AlkAniline-Seq: profiling of m(7)G and m(3)C RNA modifications at single nucleotide resolution. Angew Chem Int Ed Engl 57(51):16785–16790 Maslov DL, Trifonova OP, Balashova EE et al (2019) n-Butylamine for improving the efficiency of untargeted mass spectrometry analysis of plasma metabolite composition. Int J Mol Sci 20(23):5957 Megel C, Morelle G, Lalande S et al (2015) Surveillance and cleavage of eukaryotic tRNAs. Int J Mol Sci 16(1):1873–1893 Miyauchi K, Ohara T, Suzuki T (2007) Automated parallel isolation of multiple species of non-coding RNAs by the reciprocal circulating chromatography method. Nucleic Acids Res 35(4):e24

1230

J. Wu et al.

Motorin Y, Helm M (2019) Methods for RNA modification mapping using deep sequencing: established and new emerging technologies. Genes 10(1):35 Motorin Y, Muller S, Behm-Ansmant I et al (2007) Identification of modified residues in RNAs by reverse transcription-based methods. Methods Enzymol 425:21–53 Nagaraja S, Cai MW, Sun J et al (2021) Queuine is a nutritional regulator of entamoeba histolytica response to oxidative stress and a virulence attenuator. MBio 12(2):e03549–e03520 Nakayama H, Akiyama M, Taoka M et al (2009) Ariadne: a database search engine for identification and chemical analysis of RNA using tandem mass spectrometry data. Nucleic Acids Res 37(6):e47 Nyakas A, Blum LC, Stucki SR et al (2013) OMA and OPA – software-supported mass spectra analysis of native and modified nucleic acids. J Am Soc Mass Spectrom 24(2):249–256 Patil A, Dyavaiah M, Joseph F et al (2012) Increased tRNA modification and gene-specific codon usage regulate cell cycle progression during the DNA damage response. Cell Cycle 11(19): 3656–3665 Quax TE, Claassens NJ, Soll D et al (2015) Codon bias as a means to fine-tune gene expression. Mol Cell 59(2):149–161 Rost HL, Sachsenberg T, Aiche S et al (2016) OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nat Methods 13(9):741–748 Rozenski J, Mccloskey JA (2002) SOS: a simple interactive program for ab initio oligonucleotide sequencing by mass spectrometry. J Am Soc Mass Spectrom 13(3):200–203 Sarkar A, Gasperi W, Begley U et al (2021) Detecting the epitranscriptome. Wiley Interdiscip Rev. RNA 12(6):e1663 Scannell JP, Crestfield AM, Allen FW (1959) Methylation studies on various uracil derivatives and on an isomer of uridine isolated from ribonucleic acids. Biochim Biophys Acta 32:406–412 Sharp PM, Li WH (1987) The codon Adaptation Index – a measure of directional synonymous codon usage bias, and its potential applications. 
Nucleic Acids Res 15(3):1281–1295 Shigematsu M, Honda S, Loher P et al (2017) YAMAT-seq: an efficient method for high-throughput sequencing of mature transfer RNAs. Nucleic Acids Res 45(9):e70 Smith AM, Jain M, Mulroney L et al (2019) Reading canonical and modified nucleobases in 16S ribosomal RNA using nanopore native RNA sequencing. PLoS One 14(5):e0216709 Solivio B, Yu N, Addepalli B et al (2018) Improving RNA modification mapping sequence coverage by LC-MS through a nonspecific RNase U2-E49A mutant. Anal Chim Acta 1036: 73–79 Streit S, Michalski CW, Erkan M et al (2009) Northern blot analysis for detection and quantification of RNA in pancreatic cancer cells and tissues. Nat Protoc 4(1):37–43 Taliaferro JM, Wang ET, Burge CB (2014) Genomic analysis of RNA localization. RNA Biol 11(8): 1040–1050 Tavares JF, Davis NK, Poim A et al (2021) tRNA-modifying enzyme mutations induce codonspecific mistranslation and protein aggregation in yeast. RNA Biol 18(4):563–575 Torres AG (2019) Enjoy the silence: nearly half of human tRNA genes are silent. Bioinf Biol Insights 13:1177932219868454 Torres AG, Reina O, Stephan-Otto Attolini C et al (2019) Differential expression of human tRNA genes drives the abundance of tRNA-derived fragments. Proc Natl Acad Sci U S A 116(17): 8451–8456 Yu N, Lobue PA, Cao X et al (2017) RNAModMapper: RNA modification mapping software for analysis of liquid chromatography tandem mass spectrometry data. Anal Chem 89(20): 10744–10752

Sulfur- and Selenium-Modified Bacterial tRNAs

38

B. Nawrot, M. Sierant, and P. Szczupak

Contents
Introduction
Sulfur- and Selenium-Containing Nucleosides in the Wobble Position of the Bacterial tRNAs
Sulfur-Containing Nucleosides in the tRNA Chain
S-geranyl- and Selenonucleosides
The Modification Pathways of Thio-, S-geranyl, and Selenonucleosides
Escherichia coli tRNA 2-Selenouridine Synthase (SelU), the Enzyme Modifying the R5-Substituted 2-Thiouridines in the Anticodon of Bacterial tRNAs
Structure of SelU
SelU Is a tRNA-Bound Nucleoprotein
Substrate Specificity of SelU
Readout of 5′-NNA-3′ and 5′-NNG-3′ Synonymous mRNA Codons by Sulfur- and Selenium-Modified tRNA Anticodons
Synonymous Codons Specific for Lys, Glu, and Gln
U-A and U-G Base Pairing Modes
Tautomeric Forms of Modified Uridines and Their Base Pairs with Guanosine
Theoretical Modeling of U-G Base Pairs with mnm5S2Ura and mnm5Se2Ura
Crystal Structures of U*-G Base Pairs in tRNA-mRNA at the Ribosome Context
Conclusions
References

Abstract

Transfer RNAs (tRNAs) are universal components of cells found in organisms from all three domains of life. An important function of tRNAs in the cell is their involvement in the accurate translation of messenger RNA (mRNA) codons and the delivery of the corresponding amino acids to the ribosomal machinery during protein synthesis. Therefore, tRNAs are key players in the dynamic process of regulating cellular gene expression. This review focuses primarily on the specific epitranscriptome of bacterial tRNAs containing 2-thio- and 2-selenouridines at the wobble position and describes recent achievements in understanding their biosynthetic pathways and their function in the correct recognition of synonymous codons in mRNA.

Keywords

Sulfur · Selenium · Transfer ribonucleic acid · Post-transcriptional nucleoside modifications · 2-thiouridine · S-geranyl-2-thiouridine · 2-selenouridine

B. Nawrot (*) · M. Sierant · P. Szczupak
Department of Bioorganic Chemistry, Centre of Molecular and Macromolecular Studies, Polish Academy of Sciences, Lodz, Poland

© Springer Nature Singapore Pte Ltd. 2023
N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_43

Introduction

Since the discovery of pseudouridine, the “fifth nucleoside,” in the mid-1950s (Cohn 1957; Davis and Allen 1957), the number of known nucleoside modifications has steadily increased. According to the RNA modification databases (Modomics, RNAMDB) (Boccaletto et al. 2022; Cantara et al. 2011), the total number of ribonucleoside modifications currently known is ~150. Some of them are widely distributed in all domains of life, while others are detected only in certain species, resembling an “organism-specific code” (McCown et al. 2020), Fig. 1. Nucleoside modifications can be relatively simple, such as methylation, hydroxylation, pseudouridylation, dihydrouridylation, or thiolation; or more complex, such as ring closure, glycosylation, acylation, or aminoacylation; or unusual, such as the incorporation of a geranyl group or selenium. Some RNA modifications are constant and conserved, such as pseudouridine (Ψ), dihydrouridine (D), N1-methyladenosine (m1A), and N6-methyladenosine (m6A), while others are dynamic and reversible, occurring as a rapid response to environmental changes (Schauerte et al. 2021; Sierant et al. 2018; Björk and Hagervall 2014). All types of cellular RNAs, including messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), and other non-coding RNAs (ncRNAs), are post-transcriptionally modified and together form the so-called “epitranscriptome” (McCown et al. 2020), but tRNAs are the most heavily modified in terms of both the number (~100) and the diversity of modifications. In bacterial tRNAs up to 10% of the nucleosides per molecule are modified, while in eukaryotic tRNAs the fraction reaches 25% (Björk and Hagervall 2014).

Fig. 1 Euler diagrams showing the currently known phylogenetic distribution of ribonucleoside modifications in tRNA, rRNA, mRNA, and ncRNA classes. Figure created based on McCown et al. 2020 and Modomics database http://genesilico.pl/modomics/ (Boccaletto et al. 2022)

tRNA is one of the best characterized classes of noncoding RNAs found in all living organisms. Originally transcribed in the nucleus as a longer precursor, the pre-tRNA becomes fully functional through a well-defined maturation process involving the following major steps: the removal of the 5′-leader and the 3′-trailer sequences from the pre-tRNA transcript, the removal of the intronic fragments, the addition of the 3′-CCA amino acid acceptor fragment, and the incorporation of the numerous post-transcriptional chemical modifications. The four canonical RNA bases A, C, G, and U are modified in different ways by specific enzymes (de Crécy-Lagard and Jaroch 2021). Nucleoside modifications are present throughout the tRNA molecule and their functions can be predicted based on their position within the tRNA structure (Cochella and Green 2005; Agris et al. 2018), Fig. 2.
Modifications in the tRNA “core” region, including the acceptor stem, D-loop, and T-loop (e.g., S4U, D, Ψ, Gm, m5S2U), are involved in structural stabilization and serve as identification elements in RNA-protein recognition. They interact with a variety of partners in translation, such as aminoacyl-tRNA ligases, elongation factors, and ribosomal proteins, and directly with rRNA (Björk and Hagervall 2014; Edwards et al. 2020). Modified nucleosides in the anticodon loop influence translation fidelity and efficiency by maintaining the reading frame and ensuring correct codon-anticodon base pairing. Positions 34 and 37 in the anticodon loop have the greatest diversity of modifications among all RNAs, supporting the idea that modified nucleosides at both positions play critical roles in tRNA functions.

Post-transcriptional modifications in tRNA can be divided into two groups based on their complexity. The first group of modifications is introduced by a single enzymatic reaction, such as methylation, and is found at numerous positions in the tRNA core region. The second group includes complex modifications whose synthesis requires the sequential activity of multiple enzymes. These hypermodifications are mainly found in the anticodon loop, where they maintain its structure as a prerequisite for efficient translation (Cochella and Green 2005; Agris et al. 2018).

The anticodon stem-loop (ASL) is the most frequently modified part of the tRNA molecule, particularly the nucleoside at the wobble position (also referred to as the first position of the anticodon or the 34th position in the tRNA chain), which interacts with the third base of the codon in the mRNA. The second frequently modified position in the ASL is the 37th position, which is located immediately after the last base of the anticodon, Fig. 2.

Fig. 2 The set of possible tRNA nucleoside modifications in the Escherichia coli model strain K12

Mature tRNAs adopt a “cloverleaf” secondary structure that includes five characteristic structural elements: the acceptor stem, the D-stem loop (D-arm), the anticodon stem loop (ASL, ASL-arm), the variable loop, and the T-stem loop (T-arm). In addition, tRNA adopts complex three-dimensional folds to precisely present chemical constituents that are essential for its function as a translator of genetic information (Batey et al. 1999; Shepherd and Ibba 2015). The tertiary structure of tRNA arises primarily from interactions between different elements of the secondary structure. The L-shaped tertiary structure is folded by coaxial stacking and by interactions between conserved or semiconserved nucleotides, mainly in the D arm and T arm: between G18 and Ψ55, between G19 and C56, between U8 and A14, and between U9 and A23 in the D arm.

In this review, we focus on the characterization of sulfur-, S-geranyl-, and selenium-modified nucleosides present at the wobble position of bacterial tRNAs, their biosynthesis, function, influence on tRNA structure, and proper interaction with 5′-NNA-3′ and 5′-NNG-3′ synonymous codons in mRNA during the translation process.
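The pairing geometry described in this introduction, in which nucleoside 34 of the anticodon faces the third base of the codon, can be made concrete with a minimal Python sketch. It assumes the standard anticodons of bacterial tRNALys (UUU), tRNAGlu (UUC), and tRNAGln (UUG) and a deliberately simplified pairing rule in which a wobble uridine reads only A and G. The function and table names are hypothetical and serve only as an illustration.

```python
# Simplified illustration: which codons can an anticodon with U34 read?
# Assumption (a simplification, not a rule stated in the chapter): a wobble U
# pairs with A (Watson-Crick) and G (wobble) only; nucleosides 35 and 36
# pair strictly Watson-Crick.

WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}
WOBBLE_PARTNERS = {"U": ("A", "G")}  # partners allowed for nucleoside 34

# Anticodons written 5'->3' (positions 34, 35, 36); standard assignments.
ANTICODONS = {"Lys": "UUU", "Glu": "UUC", "Gln": "UUG"}


def codons_read(anticodon: str) -> list[str]:
    """Return the codons (5'->3') decoded by an anticodon given 5'->3'."""
    n34, n35, n36 = anticodon
    first = WATSON_CRICK[n36]    # codon position 1 pairs with nucleoside 36
    second = WATSON_CRICK[n35]   # codon position 2 pairs with nucleoside 35
    # Codon position 3 pairs with the wobble nucleoside 34.
    thirds = WOBBLE_PARTNERS.get(n34, (WATSON_CRICK[n34],))
    return [first + second + third for third in thirds]


if __name__ == "__main__":
    for amino_acid, anticodon in ANTICODONS.items():
        print(amino_acid, anticodon, codons_read(anticodon))
```

Under these assumptions each of the three anticodons reads one 5′-NNA-3′ and one 5′-NNG-3′ codon, which is the synonymous codon pair this review is concerned with.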

Sulfur- and Selenium-Containing Nucleosides in the Wobble Position of the Bacterial tRNAs

Sulfur-Containing Nucleosides in the tRNA Chain

Bacterial tRNAs contain thionucleosides at five positions: 4-thiouridine at position 8 (S4U8), 2-thiocytidine at position 32 (S2C32), 5-methylaminomethyl-2-thiouridine or 5-carboxymethylaminomethyl-2-thiouridine at position 34 (mnm5S2U34 and cmnm5S2U34, respectively), and 2-methylthio-N6-isopentenyladenosine, 2-methylthio-N6-hydroxyisopentenyladenosine, or 2-methylthio-N6-threonylcarbamoyladenosine at position 37 (mS2i6A37, mS2io6A37, mS2t6A37). In addition, some thermophilic bacteria have a 5-methyl-2-thiouridine at position 54 (m5S2U54), Fig. 3. Thionucleosides are also found in the tRNA of organisms from other domains of life, e.g., archaea (S4U8, S4U9, S2C32, R5S2U34, m5S2t6A37, m5S2U54) and eukaryotes (R5S2U34, mS2t6A37, mS2io6A37). S4U8 was also detected in the Enterobacteria phage T4, suggesting that this modification occurs in viruses as well (McClain and Foss 1984).

The sulfur-modified tRNA nucleosides serve different functions depending on their position in the tRNA. The 4-thiouridine, present at the root of the acceptor stem, stabilizes the conformation of the tRNA and prevents its degradation or damage in the cell (Kimura and Waldor 2019). In addition, S4U8 absorbs near-ultraviolet light (~340 nm) and acts as a UV sensor that blocks translation through intramolecular cross-linking with cytidine at position 13 (Carré et al. 1974). The thio-modifications located in or around the anticodon are important for accurate codon recognition. The 2-thiocytidine at position 32 of the anticodon loop impairs translation efficiency by recognizing rare codons (e.g., AGG, AGA for Arg) that are normally inefficiently decoded by tRNAs that recognize common codons (Carré et al. 1974).
The R5-substituted 2-thiouridines at position 34 of tRNAGlu, tRNALys, and tRNAGln are universal thio-modifications found in all three domains of life. Depending on the organism and subcellular localization, the R5 substituent may be hypermodified to methylaminomethyl- (mnm) or carboxymethylaminomethyl- (cmnm) (Fig. 4a) in bacteria and archaea, methylcarboxymethyl- (mcm) in the eukaryotic cytosol, or taurinomethyl- (τm-) in mammalian mitochondria. Several functions have been proposed for the R5S2U34 modification. The function of R5S2U34 in codon-anticodon recognition is discussed in detail in the following sections of this review. The sulfur atom at position C2 of the uracil ring is an identity-promoting element in aminoacylation reactions (Madore et al. 1999) and enhances translation efficiency at the ribosome by increasing the binding affinity of aminoacylated tRNAs to the A site of the ribosome, as well as the GTP hydrolysis rate (Rodriguez-Hernandez et al. 2013). In addition, the R5S2U34 modification preserves translation fidelity by preventing +1 and +2 ribosomal frameshifting (Björk et al. 1999; Urbonavicius et al. 2003; Brégeon et al. 2001).

Fig. 3 Thionucleosides found in bacterial tRNA. (a) Positions on the tRNA molecule where thionucleosides occur, (b) structural formulas of known thionucleosides. Abbreviations: S4U 4-thiouridine, S2C 2-thiocytidine, mnm5S2U 5-methylaminomethyl-2-thiouridine, cmnm5S2U 5-carboxymethylaminomethyl-2-thiouridine, mnm5geS2U 5-methylaminomethyl-S-geranyl-2-thiouridine, cmnm5geS2U 5-carboxymethylaminomethyl-S-geranyl-2-thiouridine, mnm5Se2U 5-methylaminomethyl-2-selenouridine, cmnm5Se2U 5-carboxymethylaminomethyl-2-selenouridine, mS2i6A 2-methylthio-N6-isopentenyladenosine, mS2io6A 2-methylthio-N6-hydroxyisopentenyladenosine, mS2t6A 2-methylthio-N6-threonylcarbamoyladenosine, m5S2U 5-methyl-2-thiouridine (2-thioribothymidine)

The thio-modification of adenosine at position 37, adjacent to the anticodon, is present in tRNAs that decode the UNN codons. In bacteria, there are three types of A37 hypermodification: 2-methylthio-N6-isopentenyladenosine (mS2i6A37), 2-methylthio-N6-hydroxyisopentenyladenosine (mS2io6A37), and 2-methylthio-N6-threonylcarbamoyladenosine (mS2t6A37) (Björk and Hagervall 2014; Arragain et al. 2010), Fig. 3b. The thio-modifications of A37 contribute to translation fidelity and efficiency by stabilizing the A-U base pair at the first codon position and preventing a +1 frameshift (Gustilo et al. 2008). In bacteria, mS2i6A37 deficiency resulted in a reduction in the elongation rate of the polypeptide chain, leading to a reduced growth rate and a pleiotropic phenotype (Björk and Hagervall 2014).

The 5-methyl-2-thiouridine at position 54 is essential for the growth of thermophilic microorganisms at high temperatures. The presence of m5S2U54 in the T-loop stabilizes the double-stranded core structure of the elbow region formed between the D-loop and the T-loop. This interaction improves the thermal stability of the entire tRNA molecule (Horie et al. 1985).

Fig. 4 Modified uridines at position 34 (U34) of bacterial tRNAs specific for lysine and glutamate: 5-methylaminomethyl-2-thiouridine (mnm5S2U) or 5-aminomethyl-2-thiouridine (nm5S2U), 5-methylaminomethyl-S-geranyl-2-thiouridine (mnm5geS2U) or 5-aminomethyl-S-geranyl-2-thiouridine (nm5geS2U), and 5-methylaminomethyl-2-selenouridine (mnm5Se2U) or 5-aminomethyl-2-selenouridine (nm5Se2U). In addition, the modified U34 specific for glutamine: 5-carboxymethylaminomethyl-2-thiouridine (cmnm5S2U), 5-carboxymethylaminomethyl-S-geranyl-2-thiouridine (cmnm5geS2U), and 5-carboxymethylaminomethyl-2-selenouridine (cmnm5Se2U). Figure created based on Szczupak et al. 2022a

S-geranyl- and Selenonucleosides

So far, selenonucleosides have been found only in the form of derivatives of uridine at position 34 in tRNAs specific for glutamate, glutamine, and lysine (tRNAGlu, tRNAGln, tRNALys). Four selenouridines have been identified in bacterial tRNAs: 2-selenouridine (Se2U), 5-methylaminomethyl-2-selenouridine (mnm5Se2U), 5-carboxymethylaminomethyl-2-selenouridine (cmnm5Se2U), and 5-aminomethyl-2-selenouridine (nm5Se2U) (Boccaletto et al. 2022; Cantara et al. 2011), Fig. 4c. In addition, selenium-containing tRNAs were found in mammalian cells, namely mouse leukemia cells (Ching 1984) and bovine liver cells (Mizutani et al. 1999), in the archaeon Methanococcus vannielii (Politino et al. 1990), and in plants: germinating barley (Huang et al. 2001), algae, wild carrot, tobacco, bamboo, rice, mung bean, and soybean seedlings (Chen et al. 1985). However, despite these reports on the presence of selenonucleosides in eukaryotes, research has not continued and the exact positions of selenium atoms in eukaryotic tRNAs have not yet been described.

The S-geranyl modification was first described in 2012 by Dumelin et al. as a new, unusual hydrophobic group conjugated to the sulfur atom of 2-thiouridine (Dumelin et al. 2012). The S-geranyl group was found in bacterial tRNAs (tRNAGlu, tRNALys, tRNAGln) from Escherichia coli, Enterobacter aerogenes, Pseudomonas aeruginosa, and Salmonella typhimurium in the first (wobble) position of the anticodon (Fig. 4b) with a frequency of up to 6.7%. Originally, geranylation of tRNA (S2U-tRNA → geS2U-tRNA) was thought to be an alternative to the selenation process (S2U-tRNA → Se2U-tRNA), which occurs at low selenium concentrations. Recently, we published evidence that S-geranyl-tRNA is an intermediate formed during the conversion of S2U-tRNA to Se2U-tRNA in a two-step linear reaction (S2U-tRNA → geS2U-tRNA → Se2U-tRNA) catalyzed by tRNA 2-selenouridine synthase (SelU) (Sierant et al. 2018; Szczupak et al. 2022a).

The Modification Pathways of Thio-, S-geranyl, and Selenonucleosides

Biosynthesis of 2-Thiouridines

Biosynthesis of sulfur-containing nucleosides begins with activation of sulfur in free L-cysteine by cysteine desulfurase (Nilsson et al. 2002), the IscS enzyme in bacteria. IscS is a pyridoxal-5′-phosphate (PLP)-dependent enzyme that catalyzes the conversion of L-cysteine to L-alanine and sulfane sulfur via the formation of a protein-bound cysteine persulfide intermediate at a conserved cysteine residue (Black and Santos 2015). Subsequently, the enzyme-bound persulfide sulfur is transferred to sulfur carrier proteins, and finally the sulfur-modified base is synthesized by the appropriate sulfurtransferases. The ability of IscS to interact with several different acceptor proteins is due to the conformational plasticity of a long loop containing the catalytic Cys (Shi et al. 2010). In addition to tRNA thiolation, the persulfide on cysteine desulfurase is the sulfur donor in the biosynthesis of [Fe-S] clusters and many sulfur-containing vitamins (Black and Santos 2015).

The tRNA thiolation enzymes are either dependent on or independent of [Fe-S] clusters. Accordingly, the subsequent steps of thionucleoside biosynthesis fall into two pathways. For example, the synthesis of S4U8 in bacteria (IscS, ThiI) is independent of [Fe-S] clusters, but in the methanogenic archaeon Methanococcus maripaludis it is [Fe-S]-dependent. Similarly, the synthesis of the mcm5S2U modification, which is in the wobble position of tRNAGlu, tRNAGln, and tRNALys in eukaryotes (Nfs1, Tum1-RLD, Urm1, Uba4-RLD, Ncs2/Ncs6), is [Fe-S]-dependent, but the synthesis of the related (c)mnm5S2U modifications in bacteria (IscS, TusA, TusBCD, TusE, MnmA) is [Fe-S]-independent (Shigi 2014).


Biosynthesis of S2U34 in E. coli is independent of [Fe-S] clusters. The persulfide sulfur from cysteine desulfurase IscS is transferred to the MnmA enzyme (tRNA uridine-2-sulfur transferase) via several intermediate sulfur carriers (TusA, the TusBCD complex, and TusE), Fig. 5a. TusA interacts with IscS to accept the sulfane sulfur, which is transferred via TusD (in a complex with TusB and TusC) and TusE to Cys199 in the active site of the MnmA sulfurtransferase. MnmA is a member of the ATP pyrophosphatase family and has the PP-loop as its signature motif. This enzyme recognizes nucleotides U34 and U35 in the anticodon of tRNALys, tRNAGlu, and tRNAGln. ATP bound to the PP-loop is consumed to activate the C2 atom of U34 by O2-adenylation, resulting in the adenylated intermediate. The first catalytic residue, Cys199, takes up the sulfur and generates a persulfide enzyme adduct. Then, the second catalytic residue, Cys102, forms a disulfide bond with Cys199, which facilitates the release of sulfur from the MnmA persulfide. This sulfur is finally introduced into the activated U34 to form S2U34, Fig. 5a.

The biosynthesis of the R5 side chain (nm-, mnm-, or cmnm-) and the thiolation of uracil at the C2 position described above proceed independently, i.e., the absence of one modification does not affect the synthesis of the other (Armengod et al. 2014; Jäger et al. 2016). The modification of U34 at position C5 is initiated by the MnmEG complex, which consists of the multidomain proteins MnmE and MnmG. MnmE is a GTP- and tetrahydrofolate (THF)-binding protein, whereas MnmG is a FAD- and NADH-binding protein, Fig. 5b. Within the complex, both proteins function interdependently. The complex catalyzes the addition of the aminomethyl- (nm) and carboxymethylaminomethyl- (cmnm) groups at position C5 of U34 using ammonium and glycine, respectively. Both reactions require GTP, FAD, and a THF derivative, such as methylene-THF, which serves as a donor for the methylene carbon directly attached to the C5 atom. In the next step of R5 side chain synthesis, after the action of the MnmEG complex, the wobble uridine can be further modified by the bifunctional enzyme MnmC, which converts the MnmEG products (nm5U and cmnm5U) into mnm5U. The C-terminal domain of MnmC is a FAD-dependent oxidoreductase that catalyzes the deacetylation of cmnm5U to generate nm5U, while the N-terminal domain of MnmC is a SAM-dependent methylase that converts nm5U to mnm5U (Armengod et al. 2014), Fig. 5b.

Fig. 5 Synthesis of the nucleoside R5S2U in the wobble position of tRNAGlu, tRNAGln, and tRNALys in E. coli. (a) Thiolation biosynthetic pathway of U34, (b) synthesis of the R5 substituent. Thiolation at the C2 position of U34 is catalyzed by MnmA, whereas the MnmEG protein complex acts at the C5 position of U34. In vitro, the MnmEG complex uses glycine and ammonium to incorporate cmnm- and nm-, respectively, at the C5 position of U34 into the tRNA substrates. The SAM-dependent enzyme TrmL methylates the 2′-OH group of U-ribose in tRNALeu (cmnm5Um). MnmEG- and MnmA-catalyzed modifications occur independently; thus, thiolation may precede or follow synthesis of the side chain at position 5. The FAD-dependent activity of the MnmC(o) domain of MnmC converts cmnm5S2U to nm5S2U, whereas the activity of MnmC(m), the second domain of MnmC, converts nm5S2U to mnm5S2U, using SAM as the methyl donor

Biosynthesis of S-Geranyl- and 2-Selenonucleosides

The biosynthesis of 2-selenouridines in bacterial tRNAs was originally considered to be similar to 2-thiouridine biosynthesis, i.e., the deselenation of L-selenocysteine by the selenium-specific analogue of cysteine desulfurase (IscS) (Mihara et al. 2002), followed by incorporation of selenium into uridine by a 2-selenouridine-specific synthetase, which was thought to be analogous to 2-thiouridine synthetase (MnmA) (Shigi 2014). Mihara et al. reported that a bacterial strain lacking IscS was unable to synthesize 5-methylaminomethyl-2-selenouridine and its precursor 5-methylaminomethyl-2-thiouridine (mnm5S2U) in tRNA.

Current knowledge suggests that bacteria synthesize selenium-modified tRNAs from the corresponding 2-thio precursors, using selenophosphate, which is produced by selenophosphate synthetase (SelD), as the active selenium donor, in a reaction catalyzed by tRNA 2-selenouridine synthase (SelU) (Wolfe et al. 2004; Jäger et al. 2016). In numerous bacteria (Escherichia coli, Enterobacter aerogenes, Pseudomonas aeruginosa, and Salmonella enterica var. Typhimurium), S-geranylated derivatives of 2-thiouridine (mnm5geS2U and cmnm5geS2U) were found in addition to selenium-modified uridine, although in relatively low abundance (up to 6.7%) (Dumelin et al. 2012). The tRNA 2-selenouridine synthase SelU, the enzyme responsible for the S → Se replacement, also catalyzes the S-geranylation of 2-thiouridine in tRNAs (Dumelin et al. 2012). Contrary to previous assumptions, S-geranylated tRNA has been shown to function primarily as the intermediate in the S2U-tRNA → Se2U-tRNA transformation (Sierant et al. 2018; Szczupak et al. 2022b; Bartos et al. 2016), and not to serve as an amino acid carrier on the ribosome or to be bound to bacterial cell membranes (Dumelin et al. 2012; Jäger et al. 2016; Chen et al. 2005) (Fig. 6).

Fig. 6 The proposed cellular pathway of conversion of R5S2U-tRNA to R5Se2U-tRNA with R5geS2U-tRNA as the intermediate in the two-step linear reaction. Abbreviations: GePP, geranyl diphosphate; SePO₃³⁻, selenophosphate; R = mnm or cmnm. Figure created based on Sierant et al. 2018

Escherichia coli tRNA 2-Selenouridine Synthase (SelU), the Enzyme Modifying the R5-Substituted 2-Thiouridines in the Anticodon of Bacterial tRNAs

The bacterial enzyme tRNA 2-selenouridine synthase (SelU) belongs to the group of tRNA-modifying enzymes involved in the post-transcriptional modification of 2-thiouridine in the wobble position of the anticodon in bacterial tRNAs specific for Lys, Glu, and Gln (tRNALys, tRNAGlu, and tRNAGln). SelU catalyzes the conversion of 5-substituted 2-thiouridines (R5S2Us) to 5-substituted 2-selenouridines (R5Se2Us). The enzymatic activity of an extract from Salmonella enterica var. Typhimurium catalyzing the conversion of mnm5S2U to mnm5Se2U was first described by Veres et al. (1992). Two years later, Veres and Stadtman isolated and purified the enzyme responsible for this conversion and introduced the name tRNA 2-selenouridine synthase (SelU) (Veres and Stadtman 1994). In 2012, Dumelin et al. demonstrated that SelU is also responsible for the introduction of the geranyl group into two types of R5-substituted thionucleosides (mnm5S2U and cmnm5S2U) in bacterial tRNAs (Dumelin et al. 2012).

Initially, the SelU enzyme was considered to be the biocatalyst of two independent modification pathways: direct R5S2U-tRNA selenation (R5S2U → R5Se2U) and S-geranylation of R5S2U-tRNA (R5S2U → R5geS2U). For several years, selenation of R5S2U-tRNA and geranylation were thought to occur independently and in parallel. In 2018, using 17-mer model ASL RNAs, Sierant et al. demonstrated that the conversion of 2-thiouridine RNA to 2-selenouridine RNA occurs via the S-geranyl-S2U-RNA intermediate, corresponding to two consecutive reactions: S2U-RNA → geS2U-RNA → Se2U-RNA (Sierant et al. 2018). That the same reaction sequence occurs in natural tRNA has recently been confirmed (Szczupak et al. 2022a).
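To illustrate what a two-step linear reaction with an intermediate implies for the populations of the three RNA species over time, the following Python sketch evaluates the textbook closed-form solution for two consecutive irreversible first-order steps. This is only a toy model: treating both SelU-catalyzed steps as (pseudo-)first-order is an assumption made here, the rate constants are arbitrary and not measured values, and the function name is hypothetical.

```python
import math


def two_step_fractions(t: float, k1: float, k2: float) -> tuple[float, float, float]:
    """Fractions of S2U-, geS2U- and Se2U-RNA at time t for S2U -> geS2U -> Se2U.

    Assumes irreversible (pseudo-)first-order steps with k1 != k2 and pure
    S2U-RNA at t = 0. The rate constants are illustrative, not measured.
    """
    if k1 == k2:
        raise ValueError("this closed form requires k1 != k2")
    s2u = math.exp(-k1 * t)
    ges2u = k1 / (k2 - k1) * (math.exp(-k1 * t) - math.exp(-k2 * t))
    se2u = 1.0 - s2u - ges2u  # mass balance: the three fractions sum to 1
    return s2u, ges2u, se2u


if __name__ == "__main__":
    K1, K2 = 0.05, 0.5  # hypothetical rate constants (per minute)
    for minutes in (0, 10, 30, 60, 120):
        s2u, ges2u, se2u = two_step_fractions(minutes, K1, K2)
        print(f"{minutes:>4} min  S2U={s2u:.3f}  geS2U={ges2u:.3f}  Se2U={se2u:.3f}")
```

In this toy model the geranylated species appears only transiently and all material eventually ends up as Se2U-RNA, which is the qualitative signature that distinguishes a linear pathway from two parallel, independent reactions.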

Structure of SelU

The tRNA 2-selenouridine synthase (SelU, MnmH, YbbB) is a 41.1-kDa protein consisting of a 364-amino acid chain divided into two structural domains: an N-terminal domain with rhodanese homology and a -Cys-X-X-Cys- active site, and a C-terminal P-loop domain containing the Walker A motif and an isoleucine-tRNA synthetase (IleS)-like helical region (Wolfe et al. 2004), Fig. 7a. The P-loop motif is found in proteins that bind ATP or GTP (Walker et al. 1982; Koonin 1993). An intact Walker motif is required for the geranylation activity of the enzyme, implying that it is the binding site for geranyl pyrophosphate (GePP), the donor molecule in the geranyl transfer reaction in vitro (Jäger et al. 2016; Szczupak et al. 2022b). To date, the crystal structure of the SelU protein has not been determined. A putative 3D structure of SelU was recently predicted from its amino acid sequence with AlphaFold v2.0. This structure can be examined on the AlphaFold Protein Structure Database website (https://alphafold.ebi.ac.uk/entry/P33667) (Jumper et al. 2021; Varadi et al. 2022), Fig. 7b.


Fig. 7 The structure of SelU. (a) The organization of domains in the SelU polypeptide chain: the N-terminal rhodanese homology domain with the active site -Cys-X-X-Cys- and the C-terminal P-loop domain with the IleS-like helical region. (b) The putative structure of SelU predicted from its amino acid sequence in AlphaFold v2.0, https://alphafold.ebi.ac.uk/entry/P33667

SelU Is a tRNA-Bound Nucleoprotein

SelU synthase is a nucleoprotein containing tightly bound tRNA. In recent studies, Szczupak et al. determined the SelU-bound tRNA fraction and estimated it at one tRNA molecule per protein molecule (Szczupak et al. 2022a). As a result, the purified protein has an unusual absorption spectrum, with a maximum at 260 nm, as for nucleic acids, and no peak at 280 nm, the wavelength characteristic of proteins. Three types of tRNA are associated with the SelU protein: tRNAGlu, tRNALys, and tRNAGln. Recently, Szczupak et al. published the composition of the tRNA fraction isolated from SelU and analyzed by UPLC-PDA-ESI(−)-MS. The pool of identified full-length tRNAs mainly included mnm5geS2U-tRNALys, nm5geS2U-tRNALys, geS2U-tRNALys, mnm5geS2U-tRNAGlu, nm5geS2U-tRNAGlu, and cmnm5geS2U-tRNAGln (Szczupak et al. 2022a), Table 1. A detailed analysis of the modifications occurring in the wobble position of the anticodon was performed after nucleolytic hydrolysis of the SelU-bound tRNA fraction. UPLC-PDA-ESI(−)-HRMS analysis showed that mainly geranylated nucleosides (mnm5geS2U, cmnm5geS2U, nm5geS2U, and geS2U derivatives) are present, together with a small amount of R5-substituted uridines lacking the 2-thio modification, including mnm5U, cmnm5U, and nm5U (Szczupak et al. 2022a), Fig. 8.


Table 1 UPLC-PDA-ESI(−)-MS-based identification of the full-length tRNAs associated with the SelU protein

Name of identified tRNA       Formula                    Mass (Da)(a), calculated   Mass (Da)(a), m/z
mnm5geS2U tRNALys (b)         C744H939N285O541P76S       24916.1                    24915.4
nm5geS2U tRNAGlu1 (c)         C733H924N288O532P76S       24666.9                    24666.4
nm5geS2U tRNAGlu2 (d)         C733H924N288O531P76S       24650.9                    24649.3
mnm5geS2U tRNAGlu             C734H926N288O531P76S       24664.9                    24665.0
nm5geS2U tRNAGln              C727H915N284O524P75S2      24425.8                    24423.0
cmnm5geS2U tRNAGln (e)        C729H917N284O526P75S2      24460.9                    24459.6

(a) The results were obtained after deconvolution of the raw ESI mass spectra to a zero-charge state mass using the MaxEnt1 algorithm
(b) tRNALys 5′-GGGUCGUUAGCUCAGDDGGDAGAGCAGUUGACUSUU6APCAAUUG7XCGCAGGTPCGAAUCCUGCACGACCCACCA-3′
(c) tRNAGlu1 5′-GUCCCCUUCGUCPAGAGGCCCAGGACACCGCCCUSUC/CGGCGGUAACAGGGGTPCGAAUCCCCUGGGGGACGCCA-3′
(d) tRNAGlu2 5′-GUCCCCUUCGUCPAGAGGCCCAGGACACCGCCCUSUC/CGGCGGUAACAGGGGTPCGAAUCCCCUAGGGGACGCCA-3′
(e) tRNAGln 5′-UGGGGUA4CGCCAAGC#GDAAGGCACCGGUJUNUG/PACCGGCAUUCCCUGGTPCGAAUCCAGGUACCCCAGCCA-3′ (N = A or G)
Sequences according to http://trnadb.bioinf.uni-leipzig.de/
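The calculated masses in Table 1 can be checked from the molecular formulas. The following is a minimal sketch, assuming rounded standard average atomic masses and the first table entry (mnm5geS2U tRNALys); the helper names `average_mass` and `mass_error_ppm` are illustrative and are not taken from the cited work.

```python
import re

# Rounded standard average atomic masses in g/mol (an assumption of this sketch).
AVERAGE_MASS = {"C": 12.011, "H": 1.008, "N": 14.007,
                "O": 15.999, "P": 30.974, "S": 32.06}

def average_mass(formula):
    """Sum average atomic masses for a formula such as 'C744H939N285O541P76S'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A missing count (as for the single S atom) means one atom.
        total += AVERAGE_MASS[element] * (int(count) if count else 1)
    return total

def mass_error_ppm(calculated, observed):
    """Relative deviation of the observed mass from the calculated one, in ppm."""
    return (observed - calculated) / calculated * 1e6

# First row of Table 1: formula, calculated mass, deconvoluted mass.
formula = "C744H939N285O541P76S"
calculated, observed = 24916.1, 24915.4

print(f"average mass from formula: {average_mass(formula):.1f} Da")
print(f"deviation of observed mass: {mass_error_ppm(calculated, observed):.1f} ppm")
```

With these atomic masses the formula-derived value falls within a few tenths of a dalton of the tabulated calculated mass; the same two functions can be applied to the remaining rows.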

Substrate Specificity of SelU

Using synthetic RNA oligonucleotide substrates with a single S2U or geS2U modification at the position corresponding to the wobble site of tRNALys from E. coli, Szczupak et al. demonstrated that SelU recognizes the position of S2U in the anticodon stem-loop (ASL) RNA chain, the sequences flanking S2U, and the length of the RNA substrate. Recombinant SelU showed very high activity toward the 17-mer RNAs mimicking the ASL of tRNA. Shifting the S2U modification to other positions in the anticodon loop (33, 35, or 36), or changing the nucleoside at the position corresponding to position 35 of the anticodon sequence, decreased enzyme efficiency (Szczupak et al. 2022a), Table 2. SelU thus also recognizes position 35, immediately after S2U at position 34. Substrates containing a purine residue (A or G) at position 35 were barely accepted by the enzyme, while the natural U35 was a better substrate than the C35 variant. The length of the oligonucleotide substrate mimicking the structural elements of tRNALys also affected recognition by SelU. The oligonucleotide truncated to 7 nt, mimicking the anticodon loop of tRNALys, and the oligonucleotide truncated to 3 nt, mimicking the anticodon, were accepted less well: the efficiency of the enzyme in the geranylation reaction decreased to 25% and 14% of the initial efficiency, respectively, indicating that the enzyme requires the fully structured loop to exert its catalytic activity. The single nucleosides (S2U or geS2U) did not serve as substrates for SelU. The enzyme was less demanding in terms of substrate specificity in the selenation reaction, as all tested geS2U-ASL models were converted to their seleno analogues either quantitatively (geS2U at position 33, 34, or 35 of the ASL) or in high yield (geS2U at position 36 of the ASL) (Szczupak et al. 2022a, b).


Fig. 8 UPLC-PDA-ESI(−)-HRMS analysis of modified nucleosides in the wobble position of tRNAs associated with SelU protein. (a) The extracted ion chromatogram (XIC) of the mixture of 15 nucleoside standards (used in the same amount), (b) modified nucleosides in the wobble position of tRNAs associated with SelU protein. (Data taken from Szczupak et al. 2022a)

Several prenyl compounds (iPP, isopentenyl pyrophosphate; dmaPP, dimethylallyl pyrophosphate; GePP, geranyl pyrophosphate; fPP, farnesyl pyrophosphate; GeGePP, geranylgeranyl pyrophosphate) are present in bacterial cells, but only GePP is accepted by SelU as the second substrate and donor molecule in geranyl transfer during modification of 2-thiouridine in tRNA (Szczupak et al. 2022b). Haruehanroengra et al. investigated the influence of terpene chain length on the stability and specificity of base cleavage, using a series of 2-thiouridine analogues containing methyl-, dimethylallyl-, and farnesyl-modified nucleosides in a DNA duplex (Haruehanroengra et al. 2017). They showed that a chain longer than C10 (geranyl group) is required to maintain the base pairing discrimination of thymidine between G and A, to insert into the minor groove, and to stabilize the overall structure of the duplex. However, using 17-mer RNA oligonucleotides and prenyl-PP analogues (iPP, dmaPP, fPP, GePP, and GeGePP), Szczupak et al. discovered that only the geranyl group of GePP can be incorporated into the oligonucleotide chain in the prenylation reaction catalyzed by SelU synthase (Szczupak et al. 2022b). Molecular modeling studies revealed the highest binding energies to the SelU protein for dimethylallyl and geranyl pyrophosphates (~100.1 ± 33 and ~95 ± 13 kcal/mol, respectively). However, a remarkable variation in binding energy was observed for the former ligand, suggesting that, despite the slightly higher binding energy of dimethylallyl pyrophosphate, geranyl pyrophosphate is preferred by the protein (Szczupak et al. 2022b) (Fig. 9 and Table 3).

Table 2 The efficiency of recombinant SelU, in fusion with the MBP protein (MBP-SelU), in geranylation and selenation reactions depends on the position of S2U in the RNA chain and the length of the RNA chain. (Data taken from Szczupak et al. 2022a)

Substrate                                  Yield (%), geranylation   Yield (%), selenation
Nucleoside S2U                             0                         0
Nucleoside mnm5S2U                         0                         0
Nucleoside geS2U                           –                         0
Anticodon (A), 3-mer                       14 ± 0.7                  Nd
Anticodon loop (AL), 7-mer                 25 ± 4.4                  Nd
Anticodon stem-loop (ASL), 17-mer          9 ± 0.9                   100
Anticodon stem-loop (ASL), 17-mer          91 ± 4.7                  100
Anticodon stem-loop (ASL), 17-mer          65 ± 4.1                  100
Anticodon stem-loop (ASL), 17-mer          13 ± 2.0                  66.0 ± 1
Anticodon stem-loop (ASL), 17-mer          10 ± 1.0                  94 ± 2.6

Fig. 9 The putative structure of SelU and structure of SelU – prenyl-PP complex studied by in silico analysis. The space occupied by the docked ligands is shown as a multicolored area. (Figure taken from Szczupak et al. 2022b)

Table 3 In silico analysis of binding of methyl and prenyl pyrophosphates to SelU protein. (Data taken from Szczupak et al. 2022b)

Ligand                           Molecular docking:             Molecular dynamics: relative free
                                 average affinity [kcal/mol]    energy of binding ΔG [kcal/mol]
Methyl pyrophosphate             4.8 ± 0.5                      75 ± 30
Isopentenyl pyrophosphate        5.4 ± 0.4                      80 ± 12
Dimethylallyl pyrophosphate      5.5 ± 0.5                      100.1 ± 33
Neryl pyrophosphate              6.0 ± 0.3                      61 ± 29
Geranyl pyrophosphate            6.2 ± 0.4                      95 ± 13
Farnesyl pyrophosphate           6.8 ± 0.5                      78 ± 21
Geranylgeranyl pyrophosphate     6.6 ± 0.6                      81 ± 11
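The argument that the binding energy of dimethylallyl pyrophosphate varies too much to rank it above geranyl pyrophosphate can be illustrated with simple arithmetic. The sketch below is only an illustration: it assumes the second number of each molecular dynamics entry in Table 3 is a symmetric spread around the mean, uses the values as printed, and the function names are hypothetical.

```python
# (mean, spread) of the relative free energy of binding in kcal/mol,
# copied as printed from the molecular dynamics column of Table 3.
MD_BINDING = {
    "dimethylallyl pyrophosphate": (100.1, 33.0),
    "geranyl pyrophosphate": (95.0, 13.0),
}

def relative_spread(mean, spread):
    """Spread expressed as a fraction of the mean value."""
    return spread / abs(mean)

def intervals_overlap(a, b):
    """True if the ranges mean - spread .. mean + spread of a and b overlap."""
    a_low, a_high = a[0] - a[1], a[0] + a[1]
    b_low, b_high = b[0] - b[1], b[0] + b[1]
    return a_low <= b_high and b_low <= a_high

dma = MD_BINDING["dimethylallyl pyrophosphate"]
ger = MD_BINDING["geranyl pyrophosphate"]

for name, (mean, spread) in MD_BINDING.items():
    print(f"{name}: relative spread {relative_spread(mean, spread):.0%}")
print("ranges overlap:", intervals_overlap(dma, ger))
```

Under these assumptions the spread for dimethylallyl pyrophosphate is roughly a third of its mean, more than twice the relative spread for geranyl pyrophosphate, and the two ranges overlap, so the difference between the means alone does not separate the ligands.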


The selenation reaction of chemically synthesized S-prenylated 17-mer RNAs (with S-methyl, S-dimethylallyl, S-geranyl, and S-farnesyl modifications) proceeded with S-prenyl → Se conversion (Table 2). These results suggest that SelU has very high substrate specificity in the prenylation reaction, whereas the selenation reaction is less demanding.

Readout of 5′-NNA-3′ and 5′-NNG-3′ Synonymous mRNA Codons by Sulfur- and Selenium-Modified tRNA Anticodons

Synonymous Codons Specific for Lys, Glu, and Gln

During translation, the specific aa-tRNAs (aminoacyl-tRNAs) are accommodated at the ribosome and serve as donors of amino acid residues for the growing peptide chain, according to the genetic code information in the mRNA template. The readout process occurs through the interaction of the mRNA codons with their cognate tRNA anticodons. In particular, the decoding activities of tRNAs depend on base modifications at positions 34 and 37 within the tRNA anticodon loop, which critically affect the rate and fidelity of translation and the integrity of the proteome (Nedialkova and Leidel 2015). To date, numerous biological experiments have shown that 2-thio- and 2-selenouridines in the tRNA wobble position play an important role in the precise reading of genetic information and in maintaining the reading frame in protein synthesis (Urbonavicius et al. 2001). However, the question arises as to why nature has introduced such a complex modification system of wobble uridines to improve reading-frame maintenance and how these modifications exert their reading function. The tRNAs containing S2U or Se2U units at position 34 read three two-codon sets of synonymous codons specific for glutamine, glutamic acid, and lysine (from the codon boxes of twofold-degenerate amino acids, Table 4), which differ in the 3′-end letter, A or G (Grosjean et al. 2010; Demeshkina et al. 2012). Read preferences for synonymous codons are influenced by the cellular amount of isoacceptor tRNA species, the strength of codon-anticodon pairings, and directional mutational pressure (i.e., bias in genomic G + C base composition) (Kunisawa et al. 1998). In higher organisms, multiple tRNA isoacceptors with different anticodons are available to read the mRNA codons specific for these amino acids. However, in the bacterial system, a single tRNAGluUUC molecule with the anticodon 5′-U*34UC-3′ recognizes both synonymous codons specific for glutamic acid (5′-GAA-3′ and 5′-GAG-3′), and a single tRNALysUUU molecule with the anticodon 5′-U*34UU-3′ recognizes both synonymous codons specific for lysine (5′-AAA-3′ and 5′-AAG-3′), with a slight preference for adenosine-ending codons (Shigi 2014; Wittwer and Ching 1989). In the case of the Gln codons 5′-CAA-3′ and 5′-CAG-3′, two isoaccepting tRNAs containing either U* or C at position 34 are used to decode both cognate codons. The selection of a cognate codon by the aa-tRNA is influenced by the reading accuracy and determines the elongation rate during translation, as some codons are translated more readily than others (Rocha 2004). In general, in bacteria, the AAA codons are used about three times more frequently than the AAG codons (Nakamura 2007), unless the codon following the lysine-encoding codon starts with cytidine (Dardel 2006), in which case the AAG codon is preferred.

Table 4 Assignment of amino acid codons to sulfur- and selenium-modified tRNA isoacceptors found in the bacterial system

Amino acid             Synonymous codons (5′-3′)   tRNA (5′-N*34NN-3′)         U*34
Lysine (Lys)           AAA, AAG                    tRNALysU*UU                 mnm5S2U, mnm5Se2U
Glutamic acid (Glu)    GAA, GAG                    tRNAGluU*UC                 mnm5S2U, mnm5Se2U
Glutamine (Gln)        CAA, CAG                    tRNAGlnU*UG, tRNAGlnCUG     cmnm5S2U, cmnm5Se2U
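The codon assignments described above (Table 4) follow from pairing the anticodon with the codon in antiparallel orientation. The sketch below is a simplified illustration, not a model of the ribosome: it assumes Watson-Crick pairing at anticodon positions 35 and 36, writes the modified wobble uridine U*34 simply as U, and lets it read both A and G as stated in the text. The names `WOBBLE_PARTNERS` and `codons_read` are hypothetical.

```python
# Watson-Crick complements for RNA.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Third-codon-letter partners allowed for the nucleoside at anticodon position 34.
# U here stands for the modified wobble uridine U*34 (an assumption of this sketch).
WOBBLE_PARTNERS = {"U": ["A", "G"], "C": ["G"]}

def codons_read(anticodon):
    """Return the codons (5'->3') read by an anticodon given 5'->3' as N34 N35 N36."""
    n34, n35, n36 = anticodon
    first = COMPLEMENT[n36]    # codon position 1 pairs with anticodon position 36
    second = COMPLEMENT[n35]   # codon position 2 pairs with anticodon position 35
    return [first + second + third for third in WOBBLE_PARTNERS[n34]]

for name, anticodon in [("tRNALys", "UUU"), ("tRNAGlu", "UUC"),
                        ("tRNAGln", "UUG"), ("tRNAGln", "CUG")]:
    print(name, anticodon, "->", ", ".join(codons_read(anticodon)))
```

In this simplified picture the U*-containing anticodons each cover a pair of synonymous codons, while the C34 glutamine isoacceptor reads only the G-ending codon.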

U-A and U-G Base Pairing Modes

The modified uridine (U*), located at the wobble position of the anticodon, can form base pairs with both A and G. Such differential base pairing of U*, with A via the classical Watson-Crick mode and with G via the wobble mode, was initially proposed by Francis Crick in his wobble hypothesis (Crick 1966) and extended by Paul F. Agris in his modified wobble theory (Agris 1991). According to this theory, only the first two bases of the codon form precise Watson-Crick pairs with the nucleosides of the anticodon, whereas the third base of the codon and an anticodon nucleoside with a specific base modification can form a wobble pair that selects specific codons. This “classical wobble” base pair G-U (Fig. 10) has been identified as a fundamental building block in several RNA molecules (rRNA, mRNA, tRNA, ribozymes) and is an important structural and functional player in biological systems (Varani and McClain 2000). It has also been found as a standard U-G34 wobble pair in the mRNA/tRNA ribosomal context (Westhof et al. 2014). However, the reversed U34-G base pair in the proposed wobble bond (Fig. 10) is not isosteric to G34-U, and such a base pair is not well accommodated on the ribosome (Westhof et al. 2014). Therefore, it has been proposed that the third base pair must adapt to the allowed spatial fit by tautomeric and geometric rearrangement (Takai and Yokoyama 2003; Murphy et al. 2004; Vendeix et al. 2012; Rozov et al. 2016a, b; Weixlbaumer et al. 2007), depending on the structure of the modified uridine, which contains a chalcogen atom at position 2 and/or a specific substituent at position 5 of the uracil ring. The main function of sulfur and selenium modifications in tRNA wobble uridines is their role in codon-anticodon interaction and in the precise fine-tuning of the reading of genetic information stored in mRNA.

Unmodified uridines in tRNA preferentially recognize A through Watson-Crick interactions and, with lower affinity, G through the wobble hydrogen-bonding pattern. The introduction of sulfur or selenium atoms at the C2 position of the uridine increases the thermodynamic stability of RNA duplexes containing S2U-A or Se2U-A base pairs and disfavors the formation of base pairs with G (Agris et al. 1992; Sun et al. 2012). The observed effect is due to the lower electronegativity of the sulfur and selenium atoms compared to the oxygen atom, which weakens the hydrogen bond originally formed by the 2-oxygen in the wobble pair. Moreover, the steric repulsion between the S/Se atoms (which have larger atomic radii than oxygen) and the 2′-oxygen atom leads to the adoption of the C3′-endo conformation of the sugar ring, which allows the formation of S2U-A and Se2U-A pairs.

Fig. 10 (a) Base pairing of U with A according to the Watson-Crick mode, and U with G according to the F. Crick wobble hypothesis. The structures of the non-isosteric base pairs U34-G and G34-U are shown, of which only G34-U is well accommodated on the ribosome, while U34-G does not fit well in the ribosome context. (b) Sulfur- and selenium-modified uridines pairing with A and G, respectively. Indicated are unfavorable S and Se interactions with the NH1 acceptor of G. The third structure represents a new wobble S2U/Se2U34-G base pair

Tautomeric Forms of Modified Uridines and Their Base Pairs with Guanosine

Tautomers are structural isomers that differ in the position of protons and double bonds. In aqueous solution, the conversion between the different prototropic tautomers occurs by acid-base catalysis. The mechanism has been shown to be a water-mediated exchange of protons between donor and acceptor atoms, with equilibrium reached on the nanosecond timescale (Peng et al. 2013).


Nucleic acid bases can exist in many tautomeric forms due to the presence of solvent-exchangeable protons. Under physiological conditions, the tautomeric equilibrium of nucleobases is shifted toward the keto and amine forms, which are the predominant tautomeric forms. The presence of only one major tautomeric form at physiological pH (pH ≈ 7) is critical for maintaining the integrity of genomic information during DNA and RNA replication and the biological activity of functional RNAs (Singh et al. 2015). The concentration of ionized intermediates that promote tautomerism is maximal, regardless of their charge state, when the pH is close to the pKa of the functional groups involved in tautomerism. The functional groups in RNA bases have unperturbed pKa values that are either well below neutral pH (pKa = 4.2 for N3 of cytidine and pKa = 3.5 for N1 of adenosine) or above it (pKa = 9.2 for N1 of guanosine and pKa = 9.2 for N3 of uridine) (Bevilacqua et al. 2004). One consequence of the presence of multiple tautomeric forms is ambiguous base pairing, since the different tautomeric forms can be expected to have distinct base pairing preferences. However, it is known that chemical modifications of nucleobases (as in tRNA and rRNA) contribute to the increased chemical versatility of RNA by forming relatively stable alternative tautomeric forms (Singh et al. 2015; Takai and Yokoyama 2003; Murphy et al. 2004; Rozov et al. 2016a; Vendeix et al. 2012; Weixlbaumer et al. 2007). In 2003, Takai and Yokoyama published a seminal work on the role of the 5-substituents of tRNA wobble uridines in the recognition of codons with 3′-purine ends (Takai and Yokoyama 2003). They proposed that the wobble uridines could recognize G in a non-canonical mode by rearranging into different tautomeric forms, depending on the nature of the substituent at position 5 and the chalcogen atom at position 2 of the uracil ring.

This hypothesis was based on the assumption that the frequency of the pre-structured form of a nucleoside in solution at a given pH is related to the acidity of the N3H function in the nucleobase, which in turn depends on the electron-withdrawing or electron-donating properties of the substituent at position C5, Fig. 11.

Fig. 11 Tautomers of 5-modified 2-chalcogen-containing uridines (X = O, S, Se). For R5 substituents with an aminoalkyl side chain (as in mnm, cmnm, τm), the tautomer is protonated at the side chain and the negative charge is distributed along the chalcogen, nitrogen N3, and oxygen O4 edge. Tautomers: K, 2,4-diketo; E2, 4-keto-2-enol; E4, 2-keto-4-enol; ZI, zwitterion


Among the more than 40 modified nucleosides found in the wobble position of tRNAs, most are 5-substituted uridines and 2-thiouridines containing -O-R or -CH2-R substituents at the C5 position of the uracil ring. The -OR substituents (-OH, -OCH3, and -OCH2COOH, denoted ho, mo, and cmo, respectively) are expected to increase the electron density of the uracil ring by a mesomeric effect arising from the overlap of the p orbital of the oxygen atom with the π orbitals of the uracil ring (Egert et al. 1980). The electron-donating properties of the -CH2-R substituents, e.g., -CH3 (m) or -CH2COOCH3 (mcm), are weak and their contribution to the electron density of the pyrimidine ring is limited. However, the substituents containing aminoalkyl groups, e.g., -CH2NHCH3 (mnm), -CH2NHCH2COOH (cmnm), and -CH2NHCH2CH2S(O)(O)OH (τm, taurinomethyl substituent), significantly affect the electron density of the nucleobases because their nitrogen atoms are essentially fully protonated at physiological pH 7.4; the pKa values of the secondary amines exceed 9 (Clayden et al. 2001). The protonated 5-aminoalkyl substituents are strongly electron-withdrawing and promote deprotonation of the N3H function of the uracil ring in the 2-thio- and 2-selenouridines present in bacterial tRNA (and also in some tRNAs of higher organisms). Interestingly, this property is also crucial for the pathogenesis of human mitochondrial diseases caused by a lack of taurine modification in mitochondrial tRNAs (Suzuki et al. 2011).

Ionizable Tautomeric Forms of 2-Thio- and 2-Selenouridines

The pKa values of the ionizable functions of the aminoalkyl-modified units are given in Table 5 (Sochacka et al. 2017; Leszczynska et al. 2020). Deprotonation of the N3H function of 5-methylaminomethyl-2-selenouridine (mnm5Se2U, pKa 6.43) is much easier than that of the most abundant mnm5S2U (pKa 7.28) (Leszczynska et al. 2020). This phenomenon is due to the higher polarizability of the selenium atom compared to the sulfur or oxygen atoms at C2 of the uracil ring (Reich and Hondal 2016) and is also observed as a difference in pKa of 3–4 units between natural selenols (R-SeH) and thiols (R-SH) (Huber and Criddle 1967).

Table 5 pKa values of bacterial tRNA wobble nucleosides determined by pH-dependent potentiometric titration. The pKa values (determined at 25 °C) were evaluated for the dissociation of the N3H proton and for the protonation/deprotonation of the aminoalkyl and carboxyl groups present in the C5 side chains (SD = 0.01). The content (%) of fractions of nucleosides ionized at N3H at pH 7.4 is given in brackets

C5-substituent (abbreviated name)   Deprotonation/protonation site   R5Se2U        R5S2U        R5U
CH3NHCH2- (mnm)                     N3H                              6.43 (>90)    7.28 (57)    8.15 (15)
                                    NHCH2                            9.36          9.51         10.02
HOOC-CH2NHCH2- (cmnm)               N3H                              6.55 (89)     7.36 (52)    8.24 (13)
                                    NHCH2                            8.89          9.10         10.13
                                    COOH                             2.26          2.50         3.05
H-                                  N3H                              7.30 (58)     8.09 (17)    9.15 (2)


Fig. 12 (a) Scheme of the ionization of the N3H function in wobble nucleosides; and (b) the content (%) of their ionized fractions at pH 7.4, calculated according to the Henderson–Hasselbalch equation (log([BH]/[B−]) = pKa − pH), as listed in Table 5. 5-substituent: blue = mnm, green = H

Consequently, at physiological pH, mnm5Se2U exists predominantly (approximately 90%) and mnm5S2U to more than half (57%) in the ionized form (in this case, a zwitterionic structure, ZI), as shown in Table 5 (in parentheses) and Fig. 12. Similar data were obtained for the 5-cmnm-substituted Se/S nucleosides (89 and 52%, respectively). The calculated ionized fraction for 5-unsubstituted Se2U was much lower (about 58%) than for the 5-substituted selenium units, but similar to that for mnm5S2U (57%) and much higher than for 2-thiouridine (15%). These results indicate that mnm5Se2U and cmnm5Se2U are most likely to tautomerize to the ionized zwitterionic form at physiological pH, whereas U, mnm5U, cmnm5U, and S2U are mainly in the diketo form (K). The remaining mnm5S2U, cmnm5S2U, and Se2U, with pKa 7.0–7.5, can adopt the tautomeric K and ZI forms to a similar extent and are used for translation depending on the codon usage preference (Plotkin and Kudla 2011). Therefore, based on the obtained pKa data, it was proposed that R5Se2U with the positive charge on the aminoalkyl side chain and the negative charge on the Se2-N3-O4 edge can read synonymous mRNA codons ending with 3′-A and 3′-G under physiological conditions (pH 7.4) by the Watson-Crick (UK-A) mode and a new UZI-G mode, respectively; the latter was later referred to as “a new wobble” (Fig. 13) (Rozov et al. 2016a, b). As indicated by literature data (Sochacka et al. 2017), the pKa values of 2-thiouridines found in higher organisms can vary depending on the nature of the substituent at position 5, and therefore the nature of their binding to G can also differ, as will be discussed later.
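The ionized fractions quoted above follow from the Henderson–Hasselbalch relation given in the caption of Fig. 12, log([BH]/[B−]) = pKa − pH, which rearranges to a fraction of 1/(1 + 10^(pKa − pH)) for the deprotonated form B−. The following minimal sketch applies this to several N3H pKa values from Table 5; the function name `ionized_fraction` is illustrative only.

```python
def ionized_fraction(pka, ph=7.4):
    """Fraction of the N3-deprotonated form B- at a given pH.

    From log([BH]/[B-]) = pKa - pH it follows that
    [B-] / ([BH] + [B-]) = 1 / (1 + 10 ** (pKa - pH)).
    """
    return 1.0 / (1.0 + 10 ** (pka - ph))

# N3H pKa values taken from Table 5.
N3H_PKA = {
    "mnm5Se2U": 6.43,
    "mnm5S2U": 7.28,
    "mnm5U": 8.15,
    "S2U": 8.09,
    "U": 9.15,
}

for nucleoside, pka in N3H_PKA.items():
    print(f"{nucleoside}: {100 * ionized_fraction(pka):.0f}% ionized at pH 7.4")
```

For the entries used here the rounded results reproduce the bracketed percentages in Table 5; for the remaining entries this simple formula gives values within a few percentage points of those listed.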

Theoretical Modeling of U-G Base Pairs with mnm5S2Ura and mnm5Se2Ura

In search of energetically favored structures exhibiting base pairing between the R5X2U nucleosides and guanosine (Fig. 13), the enthalpies of formation of hydrogen-bonded complexes between the most stable K, E4, and ZI tautomers of the model units, 1-methyl-5-substituted X2-uracils (with R5 = H or mnm and X = O, S, Se), and 9-methylguanine (m9Gua) in the most stable 6-keto form were calculated (Table 6) (Sochacka et al. 2017; Leszczynska et al. 2020). The optimization resulted in structures that are fairly close to planar. The UK-G complexes proved to be the preferred binding mode for all tested models, except for the base pair m1mnm5Se2Ura-m9Gua. The enthalpies of the C-G-like binding modes for the S2 and Se2 models (UE4-G) were quite similar, and such U-G base pairing was indeed confirmed by crystallographic analysis of mcm5S2U in complex with guanosine in the ribosome context (Vendeix et al. 2012), as described in the following section. Interestingly, the “new wobble” base pair proved to be significantly stronger for the selenium model than for the oxo- and sulfur-substituted m1mnm5X2Ura models. In this pre-structured ionic form, mnm5S/Se2U can interact with the N1H and N2H donors of guanosine using either the S2 and N3 acceptors or the N3 and O4 acceptors. Since the base pair UZI(2,3)-G has a significantly reduced interaction enthalpy due to its twisted geometry and repulsive interactions between O4 of m1mnm5S/SeUra and O6 of m9Gua, UZI(3,4)-G was assumed to be the preferred binding mode of 5-mnm-substituted 2-thio- and 2-selenouridines to guanosine, as an alternative to the classical UK-G wobble pattern, Fig. 13.

The above conclusion was supported by the analysis of the differences in the electrostatic potential-derived (ESP) atomic charges on the O4, N3, and X2 atoms in the X2-C2-N3-C4-O4 bond region of the zwitterionic forms of the m1mnm5X2Ura (X = O, S, Se) models before and after the formation of the base pairs with m9Gua. The corresponding charge distributions of the free and bound forms of 2-oxo-, 2-thio-, and 2-seleno-uracil are shown in Fig. 14. The graph in Fig. 15 shows the change in atomic charges on O4, N3, and the chalcogen atom O, S, or Se (in blue, green, or brown, respectively) in the form of two adjacent bars, with the left bar representing the charge before and the right bar the charge after binding to the G partner. While the negative charge on O4 and the chalcogen atom changed only gradually upon binding (increased Δq), a dramatic charge transfer was observed at the nitrogen atom of the selenium-modified base (indicated by the arrow), much larger (Δq = 0.861e) than that in the sulfur-modified base (Δq = 0.546e) and in the uracil base (Δq = 0.178e). Moreover, the hydrogen bond N3...HN2 in the Se2U-G base pair was shorter than the same hydrogen bond in the corresponding S2U-G and U-G base pairs (Fig. 14). The largest charge transfer from X2Ura to Gua occurred in the Se2U base, and this feature distinguishes the 2-Se uridines from their 2-oxo and 2-thio precursors, so that tRNA anticodons with selenium-modified uridines at position 34 are more likely to read 5′-NNG-3′ codons than those with mnm5U and mnm5S2U. However, it cannot be excluded that sulfur-modified tRNA anticodons also read 5′-NNG-3′ codons according to the “new wobble” pattern, and this has already been demonstrated (Rozov et al. 2016a, b), as discussed in the following section (Fig. 15).

Fig. 13 The structures of the complexes between m1R5X2Ura and m9Gua. The wobble and C-G-like base pairs represent the complexes formed between the K and E4 tautomers of the pyrimidine nucleobase models and m9Gua. The “new wobble” base pair represents the most stable complexes of the zwitterionic form ZI(3,4) of m1mnm5X2Ura (with the protonated mnm side chain) with m9Gua, X = S, Se

Fig. 14 Comparison of selected electrostatic potential-derived (ESP) atomic charges and distances in zwitterionic m1mnm5Ura, m1mnm5S2Ura, and m1mnm5Se2Ura and in their complexes with 9-methyl guanine (m9Gua) in water; bond distances are marked in blue

Fig. 15 Comparison of the selected atomic charges q [e] on O4, N3, and X2 atoms in zwitterionic structures of m1mnm5Ura, mnm5S2Ura, and mnm5Se2Ura and in their complexes with m9Gua in water (as shown in Fig. 14). Each pair of bars represents the charge value at the nucleobase in the free state (left bar) and after its binding to m9Gua (right bar). Blue bars represent m1mnm5Ura, green m1mnm5S2Ura, and brown m1mnm5Se2Ura. The arrow indicates the most pronounced charge transfer, observed at the N3 atom of m1mnm5Se2U upon binding to m9Gua. This is the feature that distinguishes 2-Se uridines from their 2-oxo and 2-thio precursors (Figure taken from Leszczynska et al. 2020)

Table 6 Enthalpies of formation of the complexes of 9-methyl guanine (m9Gua) and modified uracils (m1R5X2Ura, where X = O, S, Se, R = H or mnm) in water calculated using the CPCM-B3LYP-GD3/6-311++G(3df,2p)//B3LYP/6-31+G(d) method (in kcal/mol). The calculation error is 0.5 kcal/mol. The corresponding structures of the base pairs are shown in Fig. 13. (Data taken from Leszczynska et al. 2020; Sochacka et al. 2017)

ΔH298 of a base pair of m9Gua with the m1R5X2Ura component (kcal/mol)

                               R = H                      R = mnm
Base pair mode                 X = O   X = S   X = Se     X = O   X = S   X = Se
UK-G (‘classical wobble’)      10.0    8.1     7.9        10.2    8.4     8.4
UE4-G (C-G-like)               7.8     6.6     6.2        7.6     6.4     6.8
UZI-G (‘new wobble’)           –       –       –          5.9     7.3     8.6
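The comparison of binding modes summarized in Table 6 can be reproduced with a few lines of code. This is a sketch under stated assumptions: the enthalpy values are entered as printed in Table 6 and treated as magnitudes, so that a larger value is taken to mean a more stable complex; the dictionary layout and the function `preferred_mode` are hypothetical.

```python
# Enthalpy magnitudes (kcal/mol) as printed in Table 6, keyed by (R, X).
# Assumption of this sketch: a larger magnitude means a more stable complex.
ENTHALPY = {
    ("H", "O"):    {"UK-G": 10.0, "UE4-G": 7.8},
    ("H", "S"):    {"UK-G": 8.1,  "UE4-G": 6.6},
    ("H", "Se"):   {"UK-G": 7.9,  "UE4-G": 6.2},
    ("mnm", "O"):  {"UK-G": 10.2, "UE4-G": 7.6, "UZI-G": 5.9},
    ("mnm", "S"):  {"UK-G": 8.4,  "UE4-G": 6.4, "UZI-G": 7.3},
    ("mnm", "Se"): {"UK-G": 8.4,  "UE4-G": 6.8, "UZI-G": 8.6},
}
CALCULATION_ERROR = 0.5  # kcal/mol, as stated in the table caption

def preferred_mode(model):
    """Return the base pair mode with the largest enthalpy magnitude."""
    modes = ENTHALPY[model]
    return max(modes, key=modes.get)

for model in ENTHALPY:
    best = preferred_mode(model)
    ranked = sorted(ENTHALPY[model].values(), reverse=True)
    margin = ranked[0] - ranked[1]
    note = " (within calculation error)" if margin < CALCULATION_ERROR else ""
    print(f"R = {model[0]}, X = {model[1]}: {best}, margin {margin:.1f}{note}")
```

Read this way, the classical wobble mode comes out on top for every model except the mnm/Se combination, where the new wobble mode leads by a margin smaller than the stated calculation error.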

Crystal Structures of U*-G Base Pairs in tRNA-mRNA at the Ribosome Context How modified uridines 34 contribute to the recognition of 30 -terminal purines in synonymous codons has been the subject of several crystallographic studies performed in the context of the mRNA programmed ribosomes. The small ribosomal subunit is responsible for decoding genetic information during translation. The structure of the decoding center of the 30S ribosome of Thermus thermophilus, a

38

Sulfur- and Selenium-Modified Bacterial tRNAs

1257

single chain of mRNA and three molecules of tRNAs in A, P, and E sites is composed entirely of RNA (Schluenzen et al. 2000). The decoding site around the ribosomal A site is formed by a cavity that accommodates a short double helix of the mRNA codon and the tRNA anticodon (nucleotides at positions 34–36). The crystal structure of the complex of short mRNA, anticodon stem-loop fragment of tRNA (ASL), and 30S ribosomal unit allowed to determine the precise geometry of codonanticodon base pairing and to learn that nearly cognate tRNAs are distinguished by the series of interactions of the first and second base pairs with the ribosome (Ogle et al. 2001). In contrast, the third or wobble position is subject to less stringent constraints and therefore can allow for a broader range of base pairing geometries consistent with the requirements of the genetic code. The crystal structure of the bacterial 70S ribosome, refined to 2.8 Angstrom resolution, revealed atomic details of the ribosome’s interactions with mRNA and tRNA (Selmer et al. 2006). In this structure, metal ions located at the interface between the subunits and between the ribosome and tRNA and mRNA play a role in stabilizing a kink in the mRNA at the boundary between the A- and P-site codons, which may be important in preventing slippage during translation. In 2004, the first crystal structure of a modified tRNA fragment bound to the 30S ribosomal subunit of Thermus thermophilus and a short RNA oligonucleotide with the lysine codons AAA or AAG in the A site was solved (Murphy et al. 2004). The experiment was performed by soaking of the native ribosome crystals with a solution containing doubly modified ASLLys (mnm5U34 and t6A37) and the mRNA fragment 50 -AAGAAA-30 . The obtained structures allowed to understand how the modifications in the anticodon loop enable the decoding of the two lysine codons AAA and AAG. 
The structure with the AAG codon shows the mnm5U-G base pair formed by bifurcated hydrogen bonds between the acceptor O2 of mnm5U and the two donors N1H and NH2 of guanosine (Fig. 16a). A subsequent study, published in 2012, solved the crystal structure of the triply modified ASL fragment of human tRNALys3 bound to an RNA template accommodated in the 30S ribosomal subunit (Vendeix et al. 2012). It was discovered that 5-methoxycarbonylmethyl-2-thiouridine 34 (mcm5S2U34) reads the 3′-G of the lysine codon via a C-G-like base pairing (Watson-Crick geometry), Fig. 16b, with mcm5S2U adopting the tautomeric E4 structure as described in the previous section. In contrast, binding to the 3′-A of the Lys codon occurs via the tautomeric diketo form (K) of U*34. Moreover, spectral and physicochemical studies have confirmed that modifications in the human ASLLys loop at positions 34 (mcm5S2U34), 37 (mS2t6A37), and 39 (Ψ39, pseudouridine) are involved in loop stability and enable efficient anticodon-codon interactions. A similar C-G-like U*-G geometry was identified for the base pair cmo5U-G (Weixlbaumer et al. 2007), in which U* is a modified uridine found in Escherichia coli and Bacillus subtilis. Finally, thanks to recent studies by the team of Yusupova and Westhof, the crystal structure of Escherichia coli tRNALys in complex with the 70S ribosome and a short mRNA fragment has been solved (Rozov et al. 2016a, b). Based on several X-ray structures of the 70S ribosome complex, it was shown how Escherichia coli tRNALysUUU with hypermodified 5-methylaminomethyl-2-thiouridine (mnm5S2U) at the wobble position discriminates between the cognate codons AAA and AAG

B. Nawrot et al.

Fig. 16 The pre-structured non-standard base pairs of hypermodified wobble uridines with G-unit as found in the crystal structures of the corresponding tRNAs in complex with ribosome and mRNA fragments with 5′-NNG-3′ codons (PDB: 1XMO, 3T1Y, and 5E81, respectively). The structures of the base pairs in the crystals are shown on the left

and the near-cognate stop codon UAA (ochre) or the isoleucine codon AUA, with which it forms pyrimidine-pyrimidine mismatches. An unusual base pair mnm5S2U34-G found in a biological context has the previously postulated UZI-G geometry (Fig. 16c, Tables 5 and 6) (Takai and Yokoyama 2003; Duechler et al. 2016; Sochacka et al. 2017; Leszczynska et al. 2020). This “new wobble” base pair is realized by the zwitterionic form of the modified nucleoside, in which the methylaminomethyl substituent is protonated and the 2-thiouracil ring is negatively charged, resulting in two hydrogen bonds between the uracil acceptors N3 and O4 and the respective NH2 and N1H donors of guanosine. The presence of a 2-thio modification together with a C5 aminoalkyl substituent appears to be essential for the formation of this novel wobble U34-G base pair with excellent spatial fit to the decoding center. These data indicate that the prestructuring of U*34 is an important requirement for efficient and accurate recognition of cognate and wobble codons. The U*-G base pairs described here, which are either accommodated on the ribosome in a Watson-Crick C-G-like geometry or, in the latter case, form a new wobble base pair by shifting the mnm5S2U toward the minor groove, show that the mechanism of translational fidelity is controlled by steric complementarity and shape acceptance of the decoding machinery. All these examples extend the general knowledge of the degeneracy of the genetic code and reveal an important role of tRNA modifications in translation.

Conclusions

Epitranscriptomics of transfer RNA adds a further level to the regulation of gene expression, one that depends on the modified units of tRNA. In bacteria, the main players in this process are sulfur- and selenium-containing modified uridines located in the wobble position of anticodons. These modified tRNAs promote the reading of synonymous mRNA codons ending with a 3′-purine nucleoside (A or G), delivering the proteins that the cell requires under critical stress conditions. In this chapter, we present the biosynthetic pathways involved in the formation of 2-selenouridines and characterize the enzyme that exhibits dual enzymatic activity, namely in S2U-tRNA geranylation and in the selenation of S-geranyl-2-thiouridine-tRNA. Previously, this geranylated intermediate was thought to contribute directly to the translation process, but as recently found, it is a canonical component of the SelU nucleoprotein that serves as an intermediate in the replacement of S by Se at the tRNA level. We also aimed to determine how these modifications exert their reading function. As shown previously, the classical wobble base pair U34-G does not accommodate well on the ribosome, and for years it has been suspected that it can only adapt to the available space after a tautomeric and geometric rearrangement. A comparison of the pKa values of a series of 5-(c)mnm-substituted uridines, 2-thiouridines, and 2-selenouridines shows that only these last modifications are almost completely ionized under physiological conditions (pH 7.4) and can readily tautomerize to the zwitterionic form, which in turn forms the most stable base pair with G, the so-called “new wobble.” Such a UZI-G base pair has already been found in the crystal structure of a tRNALys with the wobble mnm5S2U in complex with the NNG mRNA-programmed 70S ribosome.
Theoretical calculations on model U*-G base pairs confirmed that the dramatic charge transfer between the selenium-modified and guanosine bases confers the lowest enthalpy to the mnm5Se2U-G base pair compared to the 2-thio and 2-oxo analogues. These physicochemical and theoretical studies demonstrate the highest pre-patterning potential of mnm5Se2U34, an important requirement for efficient and accurate recognition of wobble codons. Thus, in summary, regulation of gene expression may occur through expression of the SelU protein, which exerts its enzymatic activity via reprogramming of the tRNA epitranscriptome and production of selenium-modified tRNAs, which in turn can tune the translation of NNG codon-containing mRNAs critical for stress response proteins.

Acknowledgments This review was prepared within the project funded by The National Science Center in Poland (projects UMO-2018/29/B/ST5/02509 to B.N. and UMO-2016/23/B/NZ1/02316 to M.S.).

References

Agris PF (1991) Wobble position modified nucleosides evolved to select transfer RNA codon recognition: a modified-wobble hypothesis. Biochimie 73(11):1345–1349. https://doi.org/10.1016/0300-9084(91)90163-u
Agris PF, Sierzputowska-Gracz H, Smith W, Malkiewicz A, Sochacka E, Nawrot B (1992) Thiolation of uridine carbon-2 restricts the motional dynamics of the transfer RNA wobble position nucleoside. J Am Chem Soc 114:2652–2656. https://doi.org/10.1021/Ja00033A044
Agris PF, Eruysal ER, Narendran A, Väre VYP, Vangaveti S, Ranganathan SV (2018) Celebrating wobble decoding: half a century and still much is new. RNA Biol 15:537–553. https://doi.org/10.1080/15476286.2017.1356562
Armengod ME, Meseguer S, Villarroya M, Prado S, Moukadiri I, Ruiz-Partida R, Garzón MJ, Navarro-González C, Martínez-Zamora A (2014) Modification of the wobble uridine in bacterial and mitochondrial tRNAs reading NNA/NNG triplets of 2-codon boxes. RNA Biol 11:1495–1507. https://doi.org/10.4161/15476286.2014.992269
Arragain S, Handelman SK, Forouhar F, Wei FY, Tomizawa K, Hunt JF, Douki T, Fontecave M, Mulliez E, Atta M (2010) Identification of eukaryotic and prokaryotic methylthiotransferase for biosynthesis of 2-methylthio-N6-threonylcarbamoyladenosine in tRNA. J Biol Chem 285(37):28425–28433. https://doi.org/10.1074/jbc.M110.106831
Batey RT, Rambo RP, Doudna JA (1999) Tertiary motifs in RNA structure and folding. Angew Chem Int Ed Engl 38:2326–2343. https://doi.org/10.1002/(sici)1521-3773(19990816)38:16 […]

[…] G > A mutations generate new binding sites for the GABP transcription factor, a member of the broader ETS transcription factor family (core sequence CCGGAA) (Bell et al. 2015). GABP exists as a heterodimer that binds DNA sequence-specifically and activates transcription. Furthermore, a subset of GABP isoforms contain a leucine-zipper domain, allowing two precisely spaced GABP heterodimers to cooperatively form a heterotetrameric complex that can more strongly induce transcription (Fig. 6). The wt TERT promoter contains two ETS binding sites; however, these are positioned ~100 base-pairs apart and therefore do not support cooperative GABP binding. It is proposed that the de novo GABP binding sites generated through mutation cooperate with the native binding sites to activate transcription (Bell et al. 2015). This mechanism explains the specificity of the observed mutations (only G > A) and their precise positions (228 or 250), generating GABP binding sites that are separated by full helical turns corresponding to the periodicity of DNA (Fig. 6).

A GABP heterotetramer is not the only transcription factor complex to bind cooperatively to the mutant TERT promoter. In an analogous mechanism, the transcription factors ETS1 and p52 (a subunit of NFκB) can also form a heterodimeric complex, wherein ETS provides the DNA binding interactions (Li et al. 2015). In the presence of the promoter mutation, p52 provides an interface for cooperative heterotetramerization and transcriptional activation of the TERT gene.

An alternative mechanism of TERT activation was revealed through genomic sequencing of a panel of neuroblastomas, a pediatric tumor. Approximately 50% of neuroblastomas are classified as “low-risk” and regress spontaneously, whereas those classified as “high-risk” display poor clinical outcome. The sequencing revealed large-scale genomic rearrangements, including interchromosomal translocations, which placed strong enhancer elements

T. M. Bryan and S. B. Cohen

Fig. 6 Transcriptional activation of TERT expression through cooperative GABP binding. The transcription factor GABP exists as a heterodimer of a DNA binding protein (α) and a regulatory protein with a leucine zipper (β). The wt TERT promoter contains independent GABP binding sites (left); a precisely positioned G > A mutation within the TERT promoter (here shown at position G250) generates a de novo additional GABP binding site spaced by full helical turns. Maintaining the periodicity of DNA allows two GABPs to cooperate through their leucine zipper domains for enhanced binding affinity and transcriptional activation of hTERT expression

proximal to the TERT coding sequence in about a third of the high-risk cancers. The rearrangements induce chromatin remodeling and activate the gene for transcription (Peifer et al. 2015). The TERT rearrangements were strongly associated with poor clinical outcome and were not observed in low-risk neuroblastomas, lending further support to telomerase as a molecular target for cancer therapy. Other mechanisms used by cancer cells to activate TERT expression include gene amplification and an increase in expression of MYC, an oncogenic transcription factor that drives TERT transcription (Roake and Artandi 2020; Nassour et al. 2021), but it is likely that further mechanisms remain to be discovered.
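The helical-phasing argument made earlier for the de novo GABP sites (binding sites separated by full helical turns) can be illustrated numerically. The following is a minimal sketch, assuming a B-DNA helical repeat of about 10.5 bp per turn; the spacings passed to the functions are hypothetical examples and are not the actual TERT promoter distances, and the 45° tolerance is an arbitrary illustrative threshold.

```python
HELICAL_REPEAT_BP = 10.5  # assumed approximate helical repeat of B-DNA (bp per turn)


def helical_phase_deg(spacing_bp: float, repeat_bp: float = HELICAL_REPEAT_BP) -> float:
    """Rotational offset (degrees, between -180 and 180) of two sites spaced `spacing_bp` apart."""
    turns = spacing_bp / repeat_bp
    fractional_turn = turns - round(turns)  # distance from the nearest whole turn
    return fractional_turn * 360.0


def on_same_face(spacing_bp: float, tolerance_deg: float = 45.0) -> bool:
    """True if the two sites lie on roughly the same face of the helix (hypothetical tolerance)."""
    return abs(helical_phase_deg(spacing_bp)) <= tolerance_deg


# Hypothetical spacings: two full turns versus two and a half turns.
for spacing in (21.0, 26.25):
    print(spacing, helical_phase_deg(spacing), on_same_face(spacing))
```

Sites spaced by a whole number of turns face the same side of the duplex, which is the geometric condition the text invokes for two heterodimers to contact each other; a half-turn offset places them on opposite faces.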

Telomerase Inhibitors

The idea of inhibiting telomerase as a cancer therapy has been a major driver of basic research on telomerase for almost three decades. The use of small molecules to inhibit telomerase enzymatic activity has yielded only modest success, with just two small organic molecules and one oligonucleotide demonstrating sufficient enzyme affinity and specificity to inhibit telomerase in a direct (i.e., non-PCR-based) activity

40

Telomerase

assay and display phenotypic effects at the cellular level. At the time of this writing (2022), only one compound has proceeded to clinical trials. Major challenges to progress in this endeavor have been, and still are: (i) the extremely low cellular abundance of human telomerase, even in immortal cell lines that display “robust” telomerase activity, measured at ~50–200 molecules per cell (Cohen et al. 2007; Xi and Cech 2014); (ii) the inability to generate sufficient quantities of purified enzyme to carry out high-throughput screening of diverse (>10⁶) compound libraries with a direct activity assay; and (iii) the lack of atomic-resolution (~2 Å) structural data for the complete enzyme complex to guide structure-based drug design.
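To give a feel for what ~50–200 molecules per cell means as a concentration, the copy number can be converted to molarity. This is only a rough sketch: the cell volume of 2 pL is an assumed, illustrative value (it is not given in the text), and the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molar_concentration(molecules_per_cell: float, cell_volume_litres: float) -> float:
    """Convert a per-cell copy number into a molar concentration (mol/L)."""
    return molecules_per_cell / (AVOGADRO * cell_volume_litres)


ASSUMED_CELL_VOLUME_L = 2e-12  # hypothetical cell volume of ~2 pL (assumption)

for copies in (50, 200):  # abundance range quoted in the text
    conc = molar_concentration(copies, ASSUMED_CELL_VOLUME_L)
    print(f"{copies} molecules/cell is about {conc * 1e12:.0f} pM")
```

Under this assumption the enzyme is present at tens to a few hundred picomolar, which helps explain why purifying enough telomerase for large direct-assay screens is difficult.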

Potential Caveats of Telomerase Inhibition

Telomerase is expressed in ~90% of all cancers, and inhibition of telomerase holds promise as a therapy of exceptionally broad scope. Nonetheless, there are aspects of telomerase as an anticancer target that must be considered (Harley 2008).

The Lag Phase

The basic premise of telomerase inhibition is that in the absence of telomerase activity, telomeres will eventually shorten sufficiently to initiate cell growth arrest or cell death. However, telomere shortening will require many population doublings to reach a critically short length, resulting in a lag phase between the time telomerase is inhibited and the time when the telomeres of cancer cells are sufficiently short to signal growth arrest. This lag phase will vary depending on the length of telomeres in the tumor and would necessitate the sustained administration of a telomerase inhibitor.

Somatic Cells That Express Telomerase

Telomerase is expressed under careful control in hematopoietic and other stem cells, germline cells, and rapidly dividing cells such as cells of the basal layer of the epidermis and intestinal crypts (Wright et al. 1996). Inhibitors of telomerase could potentially affect the function of these cells. However, telomeres of normal (non-diseased) cells have been observed to be longer than telomeres of the corresponding cancerous cells (Engelhardt et al. 1997; Marian et al. 2010), potentially providing a therapeutic window where the effect of telomerase inhibition would be negligible on normal cells. Furthermore, many stem cells proliferate only intermittently; when these cells are quiescent, telomerase activity is low or undetectable and telomere shortening does not occur (Harley 2008). While these favorable telomere dynamics suggest that telomerase inhibition may be a specific and safe cancer therapy, sustained administration of a telomerase inhibitor would require careful monitoring for deleterious effects on these normal cell compartments.
Resistance of ALT Tumors

Telomerase inhibitors would not be effective against the ~10% of human cancers that use the telomerase-independent, recombination-based ALT pathway of telomere maintenance and lack detectable telomerase activity (Bryan et al. 1995; reviewed in Sobinoff and Pickett 2020). Furthermore, telomerase inhibition in telomerase-positive immortal cells may create selection pressure to activate ALT. This has been observed in cell culture experiments, although it was a rare event (Bechter et al. 2004). Ideally, a telomerase inhibitor could be administered in combination with an ALT inhibitor, thereby covering both telomere maintenance mechanisms.

BIBR1532

In 2001, scientists from Boehringer Ingelheim Pharma (Germany) reported the first organic small-molecule inhibitor of human telomerase, designated BIBR1532 (Chart 1) (Damm et al. 2001). BIBR1532 was found to inhibit telomerase purified from HeLa cell lysate with an IC50 of ~90 nM. The authors measured the selectivity of BIBR1532 for telomerase over human DNA polymerases α, β, and γ and RNA polymerases I, II, and III; none of these enzymes was inhibited at concentrations up to 50 μM, demonstrating a specificity of 10³. The profile of telomerase extension products upon titration of BIBR1532 revealed an unexpected mode of enzymatic inhibition (Fig. 7a) (Pascolo et al. 2002).
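The specificity figure quoted above follows from the ratio of the two concentrations. The sketch below uses only the values given in the text (IC50 of ~90 nM for telomerase; no inhibition of the polymerases up to 50 μM); because the off-target value is a "no inhibition up to" limit, the computed ratio is a lower bound. The function and constant names are illustrative.

```python
import math


def selectivity_fold(ic50_target_nM: float, ic50_offtarget_nM: float) -> float:
    """Fold selectivity: off-target IC50 divided by target IC50."""
    return ic50_offtarget_nM / ic50_target_nM


TELOMERASE_IC50_NM = 90.0              # ~90 nM, from the text
OFFTARGET_NO_EFFECT_UP_TO_NM = 50_000  # 50 uM: highest concentration tested without inhibition

lower_bound = selectivity_fold(TELOMERASE_IC50_NM, OFFTARGET_NO_EFFECT_UP_TO_NM)
print(f"selectivity >= {lower_bound:.0f}-fold (log10 = {math.log10(lower_bound):.2f})")
```

The lower bound is several hundred-fold, that is, close to three orders of magnitude, which matches the order-of-magnitude specificity stated in the text.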

Fig. 7 Small-molecule inhibitors of human telomerase. (a) BIBR1532 preferentially inhibits the formation of longer products, suggesting the compound inhibits the translocation step of enzyme processivity. Generation of the first repeat (+TTAG) in a direct activity assay is marginally inhibited up to 3 μM, indicating BIBR1532 does not primarily inhibit nucleotide addition. (From Pascolo et al. 2002). (b) A549 (lung carcinoma) or MCF7 (breast carcinoma) cells were treated with 10 μM NU-1 or control compound NU-2 for 24 h; DMSO is a negative (vehicle) control. At 24 h cells were harvested and the telomerase immunopurified for analysis with a direct activity assay. (*) is a recovery and loading control. (From Betori et al. 2020)

BIBR1532 does not alter the six-nucleotide product profile, indicating the compound is not impeding reverse transcription along the RNA template, which would be expected to disrupt the distribution of products. Rather, BIBR1532 preferentially inhibits formation of longer DNA extension products (three or more repeats) compared to the first round of nucleotide addition (before translocation). These observations suggest that BIBR1532 inhibits the process of translocation. Mechanistically, this could occur by increasing koff of the DNA product when reverse transcription has reached the end of the template. However, kinetic experiments revealed that BIBR1532 does not compete for DNA binding (Pascolo et al. 2002). It is hypothesized that BIBR1532 inhibits a conformational change associated with translocation.

Extended treatment of immortal human cell lines derived from cancers of the lung, breast, and prostate with 10 μM BIBR1532 led to significant telomere shortening over 100–140 population doublings (PDs), from 4–5 kb (typical for a telomerase-positive cell line) to 1.5 kb (Damm et al. 2001). The doubling rates of these cell lines were essentially unchanged for at least 100 PDs. This illustrates the lag phase described above: the proliferative potential that must be exhausted before growth rates are affected. However, by PD ~120, a clear decrease in doubling rate ensued, followed by an almost complete cessation of proliferation by PD ~140. This was accompanied by morphological changes consistent with senescence. For unclear reasons, BIBR1532 never progressed to clinical trials despite impressive specificity and telomerase inhibition data.
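The numbers above allow a rough estimate of the telomere shortening rate and, from it, of the lag phase for a given starting length. This is a back-of-the-envelope sketch that assumes linear shortening per population doubling (a simplification not stated in the text); the function names are illustrative.

```python
def shortening_rate_bp_per_pd(start_kb: float, end_kb: float, doublings: float) -> float:
    """Average telomere loss in base pairs per population doubling (PD)."""
    return (start_kb - end_kb) * 1000.0 / doublings


def doublings_to_reach(start_kb: float, critical_kb: float, rate_bp_per_pd: float) -> float:
    """PDs needed to shorten from start_kb to critical_kb at a constant rate (assumed linear)."""
    return (start_kb - critical_kb) * 1000.0 / rate_bp_per_pd


# Bounds taken from the text: 4-5 kb shortening to 1.5 kb over 100-140 PDs.
slowest = shortening_rate_bp_per_pd(4.0, 1.5, 140)
fastest = shortening_rate_bp_per_pd(5.0, 1.5, 100)
print(f"shortening rate: {slowest:.1f} to {fastest:.1f} bp per PD")

# Lag estimate for a 4.5 kb starting length at the slowest and fastest rates.
print(doublings_to_reach(4.5, 1.5, slowest), doublings_to_reach(4.5, 1.5, fastest))
```

Under these assumptions the rate falls in the range of roughly 18 to 35 bp per doubling, so a cell line starting at a few kilobases needs on the order of a hundred doublings before growth is affected, consistent with the lag phase observed.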

NU-1

NU-1, a small-molecule telomerase inhibitor developed by the Scheidt lab at Northwestern University (Chicago, USA), operates through a covalent mechanism by reacting irreversibly with the thiol of a cysteine residue of hTERT (Chart 1) (Betori et al. 2020). The design of NU-1 was inspired by the complex natural product chrolactomycin, which had been proposed to inhibit telomerase. The total synthesis of chrolactomycin has (as of 2022) yet to be achieved, despite significant effort from multiple labs. NU-1 represents the reactive core of chrolactomycin and is produced in just five synthetic steps on the gram scale (Betori et al. 2020). To facilitate biochemical and biological studies, a control compound (NU-2) lacking the electrophilic exo-methylene carbon was also synthesized; NU-2 is inert towards thiols. Based on in silico modelling of NU-1 with TcTERT, it is hypothesized that NU-1 reacts with the thiol of hTERT cysteine 931. Treating purified telomerase with saturating (100 μM) NU-1 revealed a gradual but complete loss of telomerase activity over ~6 h; in contrast, 100 μM NU-2 had no effect (Betori et al. 2020). These contrasting kinetic profiles are consistent with a covalent mode of inhibition. Most compelling was NU-1’s efficacy in inhibiting endogenous telomerase in immortal cells. Treating A549 (lung carcinoma) or MCF7 (breast carcinoma) cells with 10 μM NU-1 for 24 h led to a complete loss of telomerase activity; in contrast, NU-2 was indistinguishable from the DMSO (vehicle) control (Fig. 7b). NU-1 has also been shown to sensitize immortal cells to chemotherapeutic agents and radiation (Liu et al. 2022). Treating A549 cells with NU-1 reduced the IC50 of etoposide by ~10-fold. In a mouse tumor model, radiation alone led to a temporary reduction in tumor growth, after which the tumors recovered; however, administering NU-1 prior to radiation resulted in complete and irreversible tumor regression (Liu et al. 2022).

The affinity of NU-1 is modest (μM) and future research will be needed to improve its affinity and specificity through structure-guided design; as a covalent inhibitor, a complex of telomerase bound by NU-1 may be particularly amenable to structural studies.
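The contrast between the gradual, complete loss of activity with NU-1 and the absence of any effect with NU-2 is what a simple irreversible-inactivation model would predict. The sketch below is a generic first-order decay model, not the analysis used in the cited study; the rate constant is hypothetical and was chosen only so that about 1% of activity remains at 6 h.

```python
import math


def remaining_activity(t_hours: float, k_obs_per_hour: float) -> float:
    """Fraction of enzyme still active after t hours of first-order, irreversible inactivation."""
    return math.exp(-k_obs_per_hour * t_hours)


# Hypothetical observed rate constant: ~1% activity left after 6 h (illustrative only).
K_OBS_REACTIVE = math.log(100) / 6.0
# A compound that cannot form the covalent bond is modeled with a rate constant of zero.
K_OBS_INERT = 0.0

for t in (0, 1, 3, 6):
    reactive = remaining_activity(t, K_OBS_REACTIVE)
    inert = remaining_activity(t, K_OBS_INERT)
    print(f"t = {t} h: reactive {reactive:.3f}, inert control {inert:.3f}")
```

In this picture inhibition accumulates with time at a fixed, saturating inhibitor concentration, whereas a rapidly reversible inhibitor would reach its full effect almost immediately.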

Imetelstat

As of 2022, only one telomerase inhibitor has progressed to clinical trials: the synthetic antisense oligonucleotide Imetelstat (GRN163L), developed by Geron Corporation (Menlo Park, CA, USA) (Chart 1). The idea of using antisense oligomers to block the template region of the telomerase RNA (hTR) through complementary base pairing became readily apparent with the realization that, biochemically, telomerase is a reverse transcriptase. As part of functionally validating the hTR gene at the time of its discovery, the authors demonstrated inhibition of telomerase with an antisense RNA (Feng et al. 1995). DNA and RNA oligomers are rapidly degraded in vivo by nucleases and thus are not ideal as drug candidates. To circumvent nuclease susceptibility, the developers of Imetelstat utilized N3′→P5′ thiophosphoramidates (NPS, Chart 1) of the same base sequence. The NPS oligonucleotide displayed high duplex stability with hTR, with a melting temperature of 70 °C (Shea-Herbert et al. 2002). In practical terms, such stability would translate to essentially irreversible inhibition of telomerase once the duplex is formed; the complex of telomerase with NPS-TAGGGTTAGACAA can be observed as a discrete band on a native polyacrylamide gel, attesting to its stability (Shea-Herbert et al. 2002). Treating immortal cells in culture with GRN163 in conjunction with a lipid-based carrier reduced the IC50 values by ~10²–10³-fold; these observations inspired the addition of a lipid (palmitoyl) to the 5′ end, creating GRN163L (Imetelstat) (Herbert et al. 2005). Consistent with telomerase being a target of broad scope, Imetelstat displayed antitumor efficacy in preclinical studies of in vivo mouse xenograft models for a range of human cancers derived from the lung, liver, breast, and others (Harley 2008).
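The antisense principle described above (blocking an RNA through complementary base pairing) can be made concrete by computing the RNA sequence that the quoted oligonucleotide would pair with. This is a minimal sketch assuming standard Watson-Crick pairing and assuming the sequence TAGGGTTAGACAA is written 5′→3′; the function name is illustrative, and the code makes no claim about where this site lies within hTR.

```python
# RNA base that pairs with each base of a DNA-type oligonucleotide (Watson-Crick).
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}


def rna_target_site(oligo_5to3: str) -> str:
    """Return the RNA sequence (5'->3') fully complementary to an antisense oligo (5'->3')."""
    # Pairing is antiparallel, so the oligo is read backwards while complementing.
    return "".join(RNA_PARTNER[base] for base in reversed(oligo_5to3.upper()))


OLIGO = "TAGGGTTAGACAA"  # base sequence quoted in the text; 5'->3' orientation assumed

print(rna_target_site(OLIGO))  # UUGUCUAACCCUA
```

The 13-nucleotide RNA site produced here is the sequence that must be present, and accessible, in the target RNA for the duplex described in the text to form.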
Imetelstat has also shown efficacy against cancer stem cells: rare cells within a tumor that can self-renew, display greater resistance to radiation and chemotherapy, and contribute to tumor recurrence. In the brain cancer glioblastoma, telomere length analysis supported telomerase inhibition as a viable approach: normal (mortal) brain cells displayed an average telomere length of 12 kb, compared to 6 kb for bulk glioblastoma tumor cells, and just ~3.5 kb for glioblastoma stem cells (Marian et al. 2010). These observations are consistent with the prevailing hypothesis that normal cells have longer telomeres than tumor cells, providing a therapeutic window to apply a telomerase inhibitor. Continuous culture of glioblastoma stem cells with 2 μM GRN163L resulted in a decrease in their rate of proliferation and their ability to seed new tumor masses when plated at low density, accompanied by progressive telomere shortening, from ~3.5 to […]

[…] P5′ thiophosphoramidate oligonucleotide, enhances the potency of telomerase inhibition. Oncogene 24:5262–5268
Hoffman H, Rice C, Skordalakes E (2017) Structural analysis reveals the deleterious effects of telomerase mutations in bone marrow failure syndromes. J Biol Chem 292:4593–4601
Horn S, Figl A, Rachakonda PS, Fischer C, Sucker A, Gast A, Kadel S, Moll I, Nagore E, Hemminki K et al (2013) TERT promoter mutations in familial and sporadic melanoma. Science 339:959–961
Huang FW, Hodis E, Xu MJ, Kryukov GV, Chin L, Garraway LA (2013) Highly recurrent TERT promoter mutations in human melanoma. Science 339:957–959
Jacobs SA, Podell ER, Cech TR (2006) Crystal structure of the essential N-terminal domain of telomerase reverse transcriptase. Nat Struct Mol Biol 13:218–225
Jansson LI, Hentschel J, Parks JW, Chang TR, Lu C, Baral R, Bagshaw CR, Stone MD (2019) Telomere DNA G-quadruplex folding within actively extending human telomerase. Proc Natl Acad Sci U S A 116:9350
Jurczyluk J, Nouwens AS, Holien JK, Adams TE, Lovrecz GO, Parker MW, Cohen SB, Bryan TM (2011) Direct involvement of the TEN domain at the active site of human telomerase. Nucleic Acids Res 39:1774–1788
Killela PJ, Reitman ZJ, Jiao Y, Bettegowda C, Agrawal N, Diaz Jr LA, Friedman AH, Friedman H, Gallia GL, Giovanella BC et al (2013) TERT promoter mutations occur frequently in gliomas and a subset of tumors derived from cells with low rates of self-renewal. Proc Natl Acad Sci U S A 110:6021–6026

Kim NW, Piatyszek MA, Prowse KR, Harley CB, West MD, Ho PL, Coviello GM, Wright WE, Weinrich SL, Shay JW (1994) Specific association of human telomerase activity with immortal cells and cancer. Science 266:2011–2015
Latrick CM, Cech TR (2010) POT1-TPP1 enhances telomerase processivity by slowing primer dissociation and aiding translocation. EMBO J 29:924–933
Lewis KA, Wuttke DS (2012) Telomerase and telomere-associated proteins: structural insights into mechanism and evolution. Structure 20:28–39
Li Y, Zhou QL, Sun W, Chandrasekharan P, Cheng HS, Ying Z, Lakshmanan M, Raju A, Tenen DG, Cheng SY et al (2015) Non-canonical NF-κB signalling and ETS1/2 cooperatively drive C250T mutant TERT promoter activation. Nat Cell Biol 17:1327–1338
Lim CJ, Cech TR (2021) Shaping human telomeres: from shelterin and CST complexes to telomeric chromatin organization. Nat Rev Mol Cell Biol 22:283–298
Lin J, Ly H, Hussain A, Abraham M, Pearl S, Tzfati Y, Parslow TG, Blackburn EH (2004) A universal telomerase RNA core structure includes structured motifs required for binding the telomerase reverse transcriptase protein. Proc Natl Acad Sci U S A 101:14713–14718
Lingner J, Cech TR (1996) Purification of telomerase from Euplotes aediculatus: requirement of a primer 3′ overhang. Proc Natl Acad Sci U S A 93:10712–10717
Lingner J, Hughes TR, Shevchenko A, Mann M, Lundblad V, Cech TR (1997) Reverse transcriptase motifs in the catalytic subunit of telomerase. Science 276:561–567
Liu Y, Betori RC, Pagacz J, Frost GB, Efimova EV, Wu D, Wolfgeher DJ, Bryan TM, Cohen SB, Scheidt KA et al (2022) Targeting telomerase reverse transcriptase with the covalent inhibitor NU-1 confers immunogenic radiation sensitization. Cell Chem Biol 29:1517–1531.e1517
Marian CO, Cho SK, McEllin BM, Maher EA, Hatanpaa KJ, Madden CJ, Mickey BE, Wright WE, Shay JW, Bachoo RM (2010) The telomerase antagonist, imetelstat, efficiently targets glioblastoma tumor-initiating cells leading to decreased proliferation and tumor growth. Clin Cancer Res 16:154–163
Mascarenhas J, Komrokji RS, Palandri F, Martino B, Niederwieser D, Reiter A, Scott BL, Baer MR, Hoffman R, Odenike O et al (2021) Randomized, single-blind, multicenter phase II study of two doses of imetelstat in relapsed or refractory myelofibrosis. J Clin Oncol 39:2881–2892
Massenet S, Bertrand E, Verheggen C (2017) Assembly and trafficking of box C/D and H/ACA snoRNPs. RNA Biol 14:680–692
Mitchell JR, Cheng J, Collins K (1999a) A box H/ACA small nucleolar RNA-like domain at the human telomerase RNA 3′ end. Mol Cell Biol 19:567–576
Mitchell JR, Wood E, Collins K (1999b) A telomerase component is defective in the human disease dyskeratosis congenita. Nature 402:551–555
Morin GB (1989) The human telomere terminal transferase enzyme is a ribonucleoprotein that synthesizes TTAGGG repeats. Cell 59:521–529
Moyzis RK, Buckingham JM, Cram LS, Dani M, Deaven LL, Jones MD, Meyne J, Ratliff RL, Wu JR (1988) A highly conserved repetitive DNA sequence, (TTAGGG)n, present at the telomeres of human chromosomes. Proc Natl Acad Sci U S A 85:6622–6626
Musgrove C, Jansson LI, Stone MD (2018) New perspectives on telomerase RNA structure and function. WIREs RNA 9:e1456
Nakamura TM, Morin GB, Chapman KB, Weinrich SL, Andrews WH, Lingner J, Harley CB, Cech TR (1997) Telomerase catalytic subunit homologs from fission yeast and human. Science 277:955–959
Nandakumar J, Bell CF, Weidenfeld I, Zaug AJ, Leinwand LA, Cech TR (2012) The TEL patch of telomere protein TPP1 mediates telomerase recruitment and processivity. Nature 492:285–289
Narayanan A, Lukowiak A, Jady BE, Dragon F, Kiss T, Terns RM, Terns MP (1999) Nucleolar localization signals of box H/ACA small nucleolar RNAs. EMBO J 18:5120–5130
Nassour J, Schmidt TT, Karlseder J (2021) Telomeres and cancer: resolving the paradox. Annu Rev Cancer Biol 5:59–77
Nguyen THD, Tam J, Wu RA, Greber BJ, Toso D, Nogales E, Collins K (2018) Cryo-EM structure of substrate-bound human telomerase holoenzyme. Nature 557:190–195

Olovnikov AM (1973) A theory of marginotomy. The incomplete copying of template margin in enzymic synthesis of polynucleotides and biological significance of the phenomenon. J Theor Biol 41:181–190
Parks JW, Stone MD (2014) Coordinated DNA dynamics during the human telomerase catalytic cycle. Nat Commun 5:4146
Pascolo E, Wenz C, Lingner J, Hauel N, Priepke H, Kauffmann I, Garin-Chesa P, Rettig WJ, Damm K, Schnapp A (2002) Mechanism of human telomerase inhibition by BIBR1532, a synthetic, non-nucleosidic drug candidate. J Biol Chem 277:15566–15572
Peifer M, Hertwig F, Roels F, Dreidax D, Gartlgruber M, Menon R, Kramer A, Roncaioli JL, Sand F, Heuckmann JM et al (2015) Telomerase activation by genomic rearrangements in high-risk neuroblastoma. Nature 526:700–704
Qin J, Autexier C (2021) Regulation of human telomerase RNA biogenesis and localization. RNA Biol 18:305–315
Roake CM, Artandi SE (2020) Regulation of human telomerase in homeostasis and disease. Nat Rev Mol Cell Biol 21:384–397
Rouda S, Skordalakes E (2007) Structure of the RNA-binding domain of telomerase: implications for RNA recognition and binding. Structure 15:1403–1412
Salloum R, Hummel TR, Kumar SS, Dorris K, Li S, Lin T, Daryani VM, Stewart CF, Miles L, Poussaint TY et al (2016) A molecular biology and phase II study of imetelstat (GRN163L) in children with recurrent or refractory central nervous system malignancies: a pediatric brain tumor consortium study. J Neurooncol 129:443–451
Sauerwald A, Sandin S, Cristofari G, Scheres SH, Lingner J, Rhodes D (2013) Structure of active dimeric human telomerase. Nat Struct Mol Biol 20:454–460
Schmidt JC, Dalby AB, Cech TR (2014) Identification of human TERT elements necessary for telomerase recruitment to telomeres. eLife 3:e03563
Shay JW, Bacchetti S (1997) A survey of telomerase activity in human cancer. Eur J Cancer 33:787–791
Shea-Herbert B, Pongracz K, Shay JW, Gryaznov SM (2002) Oligonucleotide N3′→P5′ phosphoramidates as efficient telomerase inhibitors. Oncogene 21:638–642
Smith EM, Pendlebury DF, Nandakumar J (2020) Structural biology of telomeres and telomerase. Cell Mol Life Sci 77:61–79
Sobinoff AP, Pickett HA (2020) Mechanisms that drive telomere maintenance and recombination in human cancers. Curr Opin Genet Dev 60:25–30
Sun D, Lopez-Guajardo CC, Quada J, Hurley LH, von Hoff DD (1999) Regulation of catalytic activity and processivity of human telomerase. Biochemistry 38:4037–4044
Tomlinson CG, Moye AL, Holien JK, Parker MW, Cohen SB, Bryan TM (2015) Two-step mechanism involving active-site conformational changes regulates human telomerase DNA binding. Biochem J 465:347–357
Tomlinson CG, Holien JK, Mathias JA, Parker MW, Bryan TM (2016) The C-terminal extension of human telomerase reverse transcriptase is necessary for high affinity binding to telomeric DNA. Biochimie 128–129:114–121
Tomlinson CG, Sasaki N, Jurczyluk J, Bryan TM, Cohen SB (2017) Quantitative assays for measuring human telomerase activity and DNA binding properties. Methods 114:85
Wallweber G, Gryaznov S, Pongracz K, Pruzan R (2003) Interaction of human telomerase with its primer substrate. Biochemistry 42:589–600
Wang F, Podell ER, Zaug AJ, Yang Y, Baciu P, Cech TR, Lei M (2007) The POT1-TPP1 telomere complex is a telomerase processivity factor. Nature 445:506–510
Wright WE, Piatyszek MA, Rainey WE, Byrd W, Shay JW (1996) Telomerase activity in human germline and embryonic tissues and cells. Dev Genet 18:173–179
Xi L, Cech TR (2014) Inventory of telomerase components in human cells reveals multiple subpopulations of hTR and hTERT. Nucleic Acids Res 42:8565–8577

Telomeres: Structure and Function

41

Scott B. Cohen and Tracy M. Bryan

Contents
Introduction
Telomeres
  Telomeric DNA and Telomere Secondary Structures
  Telomere Binding Proteins
  Role of Telomere Binding Proteins in End Protection
  Role of Telomere Binding Proteins in Telomere Length Regulation
Telomere Dynamics in Aging and Disease
  Aging and Cancer
  Alternative Lengthening of Telomeres (ALT)
  Telomere Biology Disorders
Conclusions and Future Directions
References

Abstract

Telomeres are specialized, highly conserved DNA-protein complexes at the ends of linear eukaryotic chromosomes. Human telomeric DNA is composed of tandem repeats of the sequence 5′-(TTAGGG)n-3′ and is complexed with sequence-specific DNA binding proteins, forming a distinctive “cap” at the ends of chromosomes. Telomeres distinguish the end of the chromosome from an internal DNA break and shield it from the cellular DNA repair machinery, thereby protecting chromosomes from deleterious end-to-end fusions. Telomeres are dynamic structures, shortening during each cycle of DNA replication and cell division. In the absence of a compensatory telomere-lengthening mechanism, progressive telomere shortening imposes limits on the proliferative capacity of

S. B. Cohen · T. M. Bryan (*)
Children’s Medical Research Institute, Faculty of Medicine & Health, University of Sydney, Sydney, NSW, Australia
© Springer Nature Singapore Pte Ltd. 2023
N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_104



cells, contributing to organismal aging. Cells with unlimited proliferative capacity – notably stem cells and cancer cells – must activate a telomere lengthening mechanism. There are two such mechanisms known in humans: the ribonucleoprotein enzyme telomerase (discussed in ▶ Chap. 40, “Telomerase”) and the recombination-based mechanism Alternative Lengthening of Telomeres. Most human somatic cells do not maintain their telomeres and undergo telomere shortening. In contrast, dysregulated telomere maintenance is a universal property of cancer cells. Understanding the biology and dynamics of telomeres has far-reaching implications for human health and medicine.

Keywords

Telomere · Shelterin · Telomere shortening · Cancer · Cellular immortality

Introduction
The evolution from prokaryotic life with circular chromosomes to eukaryotes with linear chromosomes posed a new challenge for cellular biology, arising from the simple fact that linear chromosomes have ends. How can these DNA ends be protected and distinguished from an internal DNA break that would require repair? Telomeres, distinctive DNA-protein “caps” at the ends of linear chromosomes, provide a structural solution to this problem, and are an almost universal feature of eukaryotic genomes.

Telomeres also provide a functional solution to another challenge that life encountered: with the development of complex multicellular organisms, biology required a mechanism to control cellular proliferation. Telomeres are dynamic structures that shorten through each cycle of DNA replication – essentially a molecular counting mechanism – and can thereby provide a barrier to unlimited cellular proliferation. Conversely, activation of a telomere maintenance mechanism, either the ribonucleoprotein enzyme telomerase (▶ Chap. 40, “Telomerase”) or the recombination-based mechanism Alternative Lengthening of Telomeres (ALT), replenishes telomeres, often conferring on the cell unlimited proliferative potential, i.e., immortality.

In humans, telomere dynamics are inextricably linked to cellular lifespan. Aberrant telomere maintenance is an almost universal property of cancer cells, sustaining their immortal phenotype. Conversely, insufficient telomere maintenance is associated with age-related degeneration and inherited disorders of stem cell maintenance. This chapter presents a high-level overview of human telomere structure and binding proteins, the role of telomeres in the DNA damage response and genome integrity, and telomere dynamics as related to disease. For further details and references, the reader is referred to the many excellent reviews on these topics (de Lange 2018; Musgrove et al. 2018; Roake and Artandi 2020; Lim and Cech 2021, and others cited below).


Telomeres

Telomeric DNA and Telomere Secondary Structures
Telomeric DNA is almost universally composed of tandem repeats of a short (6–8 base pairs) G-rich sequence. The first sequence of a telomere was obtained by Elizabeth Blackburn and Joe Gall from the model single-celled protozoan Tetrahymena thermophila (Blackburn and Gall 1978) as tandem repeats of the sequence 5′-(TTGGGG)n-3′. This unique array of DNA distinguishes the telomere from the rest of the chromosome, providing a way to prevent deleterious end-to-end fusions through the cell’s DNA repair pathways. As discussed below, a complex system of sequence-specific binding proteins has evolved to further protect the telomere.

Human telomeres are tandem repeats of 5′-(TTAGGG)n-3′ (Moyzis et al. 1988). The similarity to the Tetrahymena telomeric sequence illustrates the high conservation of telomeres and their essential role in genomic stability. Human telomeres range from ~5000 to 10,000 base pairs of double-stranded TTAGGG repeats and are terminated by a 3′ single-stranded overhang of ~130–250 nucleotides of TTAGGG repeats (reviewed in Bonnell et al. 2021) (Fig. 1a).

Telomeric DNA is dynamic and can adopt at least two alternative secondary structures. The single-stranded 3′ overhang can strand-invade into the region of double-stranded DNA (dsDNA), hybridizing with the C-rich strand and displacing the G-rich strand, generating a T-loop (Griffith et al. 1999; de Lange 2018). T-loops have been observed directly from cellular telomeric DNA preparations using electron microscopy (Fig. 1b). T-loops vary in size, from a few hundred base pairs to several thousand; as described below, they play an essential role in protecting the end of the chromosome from being recognized as a site of DNA damage.

The G-rich nature of telomeric DNA also allows the formation of compact G-quadruplex structures (Fig. 1c) (Williamson et al. 1989). Four guanines can associate through Hoogsteen base-pairing in a planar tetrad; when three or more tetrads can stack, the result is a G-quadruplex (G4). In vitro studies of telomeric DNA oligonucleotides have revealed extensive heterogeneity in the structure of G4s (Phan 2010). Telomeric G4s can be made up of one, two, or four strands of DNA; furthermore, the 5′→3′ orientation of the strands can be in the same direction (parallel quadruplex) or opposite (anti-parallel) (Bryan 2020). Some of these structures display remarkable stability, with thermal melting temperatures of 80–90 °C. A fluorescently labeled antibody against G4 has provided compelling evidence for the existence of these structures throughout the genome in human cells (Biffi et al. 2013). When combined with a fluorescently labeled telomeric PNA probe, G4s were observed at telomeres (Moye et al. 2015), although it is not yet known if they are present on the single-stranded overhangs. A specific biological function for telomeric G4s has not yet been established; however, it is hypothesized that the variety of potential G4 conformations may underlie different biological properties (Bryan 2020).
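The sequence requirement for an intramolecular G4 (several runs of three or more guanines that can stack as tetrads) can be illustrated with a short scan of the human telomeric repeat. The sketch below is only a sequence-level heuristic: the regular expression (four G-runs of at least three guanines, separated by loops of 1–7 nucleotides) and the function names are assumptions introduced for illustration, not a method described in this chapter, and a match says nothing about whether a G4 actually folds in vitro or in cells.

```python
import re

# Assumed heuristic, not taken from the chapter: four runs of >= 3 guanines
# separated by loops of 1-7 nucleotides, read on the G-rich strand 5'->3'.
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def telomeric_repeats(n, unit="TTAGGG"):
    """Return n tandem copies of the human telomeric repeat unit."""
    return unit * n


def find_putative_g4(seq):
    """Return (start, end) index pairs of non-overlapping pattern matches."""
    return [(m.start(), m.end()) for m in G4_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Four repeats supply the four G-runs needed for one intramolecular G4.
    print(find_putative_g4(telomeric_repeats(4)))
    # Three repeats contain only three G-runs, so nothing matches.
    print(find_putative_g4(telomeric_repeats(3)))
```

Because a single-stranded stretch of four TTAGGG repeats already satisfies the pattern, an overhang of ~130–250 nucleotides could in principle accommodate several such motifs; whether it does so in cells remains open, as noted above.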


Fig. 1 Telomeric DNA. (a) Human telomeric DNA is composed of double- and single-stranded 5′-TTAGGG-3′ repeats. (b) Electron micrograph of a T-loop from the human cell line HeLa; this T-loop is ~10 kb in size (from Griffith et al. 1999). (c) A G-tetrad: four guanines in a planar geometry stabilized by Hoogsteen base pairing and further stabilized by a monovalent cation, usually potassium. Three or more G-tetrads can stack to form a G4; shown are four-stranded (left) and two-stranded (right) parallel telomeric G4s (yellow, dT; green, dA; purple, dG). (d) Potential dynamics of telomere structure incorporating T-loops and G4s. A T-loop is formed by intercalation of a telomeric 3′ overhang into the duplex portion of a telomere; it is possible that G4 could form in the displaced G-strand (the “D-loop”), possibly involving association with the RNA transcribed from telomeres (green) (reviewed in Bryan 2020)

Telomere Binding Proteins
Telomeres are bound by complexes of specialized proteins that mediate their protection and replication. In humans, the six key telomere-binding proteins form a complex known as shelterin (Fig. 2), although subcomplexes of these six proteins also likely exist (Denchi and de Lange 2007; Takai et al. 2010; Lim et al. 2017; de Lange 2018). Recent progress in determining high-resolution structures of shelterin proteins and their complexes, from human and model organisms, is providing a rich source of insight into their functions at telomeric DNA (reviewed in Lewis and Wuttke 2012; Smith et al. 2020; Lim and Cech 2021).

The essential core of shelterin consists of two proteins that bind double-stranded telomeric repeats via their C-terminal myb-like domains, TRF1 and TRF2, encoded by the genes TERF1 and TERF2, respectively (Fig. 2, bottom, left and third from left). Both proteins homodimerize through a domain known as the TRF


Fig. 2 Protection of the telomere by the shelterin proteins. Center: schematic of the current understanding of interactions between shelterin components and telomeric DNA. TRF1 (greens) and TRF2 (oranges) form homodimers that interact with double-stranded telomeric repeats via their C-terminal myb domains, while POT1 (pinks) binds to the single-stranded telomeric overhang. TPP1 (blues) is a binding partner of POT1 and provides a docking site for telomerase. TIN2 (reds) tethers the ssDNA and dsDNA binding proteins together, and RAP1 (yellow) interacts with TRF2. Outer panels: crystal structures of regions of interaction; protein domains used in structure determination in brackets. Clockwise from top left: a dimer of the TRF2 TRFH domain interacting with RAP1 peptides (PDB 4RQI); a dimer of the TRF1 TRFH domain interacting with TIN2 peptides (PDB 3BQO); the POT1 C-terminal domain interacting with the TPP1 central PBD domain (PDB 5UN7 and 5H65); the OB-fold domain of TPP1, with residues shown to interact with hTERT in cyan (PDB 2I46); the POT1 OB domain in complex with ssDNA (PDB 1XJV); the TRF1 myb domain in complex with dsDNA (PDB 1W0T); the TIN2 TRFH-like domain interacting simultaneously with peptides from TRF2 and TPP1 (PDB 5XYF); the TRF2 myb domain in complex with dsDNA (PDB 1W0U); the RAP1 C-terminal RCT domain interacting with a peptide from TRF2 (PDB 3K6G). Note that subcomplexes of these six proteins also exist (Takai et al. 2010; Lim et al. 2017)

homology (TRFH) domain, resulting in a very similar cleft-like topology, which serves as a binding platform for other proteins (Fig. 2, top, left and middle). The single-stranded telomeric overhang is protected by the protein POT1 (Baumann and Cech 2001; Loayza and de Lange 2003). The N-terminal half of POT1 contains two oligonucleotide/oligosaccharide-binding folds (OB-folds), which are a conserved single-stranded DNA (ssDNA) binding motif, and in POT1 they mediate very tight and specific binding to the sequence 5′-TTAGGGTTAG-3′ (Lei et al. 2004; Fig. 2, bottom right). The other half of POT1 forms extensive


interactions with its binding partner TPP1 (encoded by the gene ACD) (Smith et al. 2020; Fig. 2, top right). TPP1 in turn binds to the protein TIN2 (encoded by the gene TINF2), which links to both TRF1 and TRF2, forming a bridge between the ssDNA- and dsDNA-binding portions of shelterin. TIN2 has a TRFH-like domain that can simultaneously bind to TRF2 and TPP1 at opposite ends (Fig. 2, bottom, second from left), while a separate domain contains a conserved TRF1-binding motif (TBM), F-X-L-X-P, that docks into the TRFH cleft of TRF1 (Lim and Cech 2021; Fig. 2, top middle). The sixth member of shelterin is RAP1, which localizes to telomeres through extensive interactions with both the central domain and the TRFH domain of TRF2 (Lewis and Wuttke 2012; Smith et al. 2020; Fig. 2, left and top left).
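Two of the recognition elements named in this section are simple enough to express as string patterns: the POT1 site 5′-TTAGGGTTAG-3′ and the TRF1-binding motif F-X-L-X-P. The following sketch, under stated assumptions, locates both. The function names, the 150-nucleotide model overhang, and the example peptide are hypothetical and chosen only for illustration; real binding depends on structure and context that a pattern match cannot capture.

```python
import re

POT1_SITE = "TTAGGGTTAG"            # 5'-TTAGGGTTAG-3', as given in the text
TBM_PATTERN = re.compile(r"F.L.P")  # F-X-L-X-P, with X taken as any residue


def pot1_site_starts(ssdna):
    """Start indices of every POT1 recognition site, overlaps included."""
    starts = []
    i = ssdna.find(POT1_SITE)
    while i != -1:
        starts.append(i)
        i = ssdna.find(POT1_SITE, i + 1)
    return starts


def non_overlapping_sites(starts, site_len=len(POT1_SITE)):
    """Greedy left-to-right choice of sites that do not overlap each other."""
    chosen, next_free = [], 0
    for s in starts:
        if s >= next_free:
            chosen.append(s)
            next_free = s + site_len
    return chosen


def tbm_positions(peptide):
    """Start indices of F-X-L-X-P matches in a one-letter peptide string."""
    return [m.start() for m in TBM_PATTERN.finditer(peptide)]


if __name__ == "__main__":
    overhang = "TTAGGG" * 25        # hypothetical 150-nt model overhang
    starts = pot1_site_starts(overhang)
    print(len(starts), len(non_overlapping_sites(starts)))
    print(tbm_positions("AAFSLAPGG"))  # hypothetical peptide, not a real TIN2 sequence
```

The greedy count is only an upper bound on simultaneous occupancy under the assumption that each bound POT1 covers exactly its 10-nucleotide site.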

Role of Telomere Binding Proteins in End Protection
Telomeres, since they are at the ends of the chromosomes, resemble broken DNA. Cells have evolved a very elaborate set of molecular defenses against broken DNA known as the DNA damage response (DDR), but it is important that undamaged telomeres evade being “repaired” by these pathways. The integrated actions of the six shelterin proteins constitute a very effective defense mechanism for telomeres; together they inhibit the activation of at least six different DDR pathways (reviewed in de Lange 2018).

TRF2, in particular, plays a major role in protecting telomeres from the DDR, by promoting formation of a T-loop structure to mask the free telomeric end (Griffith et al. 1999; Stansel et al. 2001; Doksani et al. 2013). It does this by promoting invasion of the 3′ overhang into double-stranded telomeric DNA (Amiard et al. 2007). Once formed, the junction of a T-loop resembles a Holliday Junction (HJ) recombination intermediate, and thus is in danger of stimulating recruitment of HJ resolvases that can cleave the T-loop from the telomere, resulting in loss of telomeric DNA (Wang et al. 2004; Vannier et al. 2012). In addition to helping T-loops form, TRF2 also plays a role in protecting them from this homologous recombination (HR)-mediated cleavage through binding of its N-terminal basic domain to the T-loop junction (Rai et al. 2016; Schmutz et al. 2017). RAP1 assists TRF2 in this protection against HR-mediated telomere loss (Sfeir et al. 2010; Rai et al. 2016). The promotion and protection of T-loops by TRF2 prevents telomeres from activating the major DNA damage response kinase ATM (Van Ly et al. 2018), whose activation would lead to a cascade of DDR-related signaling events at telomeres (Takai et al. 2003; Karlseder et al. 2004).

TRF2 also plays a key role in preventing telomeres from ligating to each other (van Steensel et al. 1998), which would result in dicentric chromosomes that get pulled to opposite ends of the spindle during cell division, leading to chromosome breakage and genome instability. If TRF2 is deleted in mouse cells, almost every telomere is subjected to end-to-end fusions via the process of nonhomologous end-joining (NHEJ) (Celli and de Lange 2005). While the function of TRF2 in T-loop folding undoubtedly contributes to prevention of NHEJ, since the buried 3′ telomere end would be unavailable for fusion, TRF2 also plays a separate role in direct prevention of NHEJ. If TRF2 is partially depleted from human cells or


mutated such that it can no longer maintain T-loops, an ATM-dependent DDR signal is generated at telomeres, but end-to-end fusions remain repressed (Cesare et al. 2009). Part of this repression originates from the TRFH domain of TRF2, which serves as a docking site for transient recruitment of other proteins, some of which assist in protection against NHEJ. For example, the nuclease Apollo is recruited by TRF2 to maintain leading-strand telomere overhangs, which prevent the ends being recognized as a blunt-ended substrate for NHEJ (Wu et al. 2010; reviewed in de Lange 2018). TRF2 also directly inhibits fusions by blocking the propagation of DNA damage signaling downstream of ATM (Okamoto et al. 2013).

The POT1-TPP1 branch of the shelterin complex contributes to telomere protection by blocking activation of the other master DDR kinase, ATR (Denchi and de Lange 2007), which is activated by the presence of unprotected ssDNA and can trigger HR-mediated repair (Saldivar et al. 2017). The first step in ATR activation is binding of the ssDNA by the OB fold-containing protein RPA; by coating single-stranded telomeric DNA with high affinity, POT1 blocks RPA from binding (Gong and de Lange 2010). For this function POT1 must be tethered to the telomere by TPP1, which in turn must be tethered to the telomere via TIN2, making these two proteins also indispensable for protection of the single-stranded telomere overhang (de Lange 2018).

TRF1 contributes to telomere maintenance by promoting passage of the DNA replication machinery through telomeric DNA. The repetitive nature of telomeric DNA poses a challenge for the DNA replication machinery (reviewed in Bonnell et al. 2021), likely due at least in part to formation of secondary structures such as G4 (Fig. 1c), which can stall the replication process (reviewed in Bryan 2019). If TRF1 is depleted from either human or mouse cells, there is an increase in telomere aberrations known to be caused by “stalling” of DNA replication forks (Sfeir et al. 2009). There appear to be several mechanisms by which TRF1 assists replication of telomeres, including recruitment of G4 helicases such as BLM, and its role in tethering TIN2 and POT1-TPP1, thereby preventing ATR activation (Sfeir et al. 2009; Zimmermann et al. 2014). Overall, it is now clear that the cooperative action of all six shelterin proteins is essential for telomere protection and genome integrity.

Role of Telomere Binding Proteins in Telomere Length Regulation
The ribonucleoprotein enzyme telomerase maintains telomere length in unicellular eukaryotic organisms, and in germline, embryonic and highly proliferative cells of most multicellular organisms (reviewed in Batista 2014). The protein and RNA components of telomerase assemble to form the active enzyme, which is only recruited to telomeres when needed. In mammals, the shelterin complex plays a critical role in regulating access of telomerase to the telomere, thereby keeping telomeres within a defined range of lengths. If telomeres become too short, they can trigger senescence, which leads to aging and problems in stem cell maintenance (discussed in section “Telomere Dynamics in Aging and Disease” below).


If telomeres become hyper-elongated, they trigger a “telomere trimming” mechanism that brings them back within their equilibrium telomere length range, implying that excessively long telomeres are also detrimental to mammalian cells (Pickett and Reddel 2012). In human cells using telomerase to maintain telomeres (e.g., germ cells, stem cells, and most cancers), regulation of the timing and extent of telomerase access to the telomere plays an important part in telomere length control (Nandakumar and Cech 2013; Roake and Artandi 2020). TRF1 has long been known to be a negative regulator of telomere length (van Steensel and de Lange 1997); this effect is mediated through control of telomerase at the same telomere (Ancelin et al. 2002). Depletion of TRF1 from immortal human cells revealed that this effect is mediated by restricting telomerase recruitment to telomeres: in the presence of TRF1, telomerase is recruited to telomeres primarily in the S phase of the cell cycle (Jady et al. 2006; Tomlinson et al. 2006), but depletion of TRF1 leads to inappropriate retention of telomerase at telomeres in the subsequent G2 and/or mitotic (M) phases of the cell cycle (Tong et al. 2015). The mechanism by which TRF1 regulates telomerase access is not established, but may involve its role in promoting DNA replication through telomeres (discussed above), since the replication fork stalling that occurs in the absence of TRF1 is known to act as a signal for ATR-mediated recruitment of telomerase to telomeres (Tong et al. 2015). Conversely, the shelterin component TPP1 plays a major role in tethering telomerase to telomeres, enabling telomere elongation (reviewed in Nandakumar and Cech 2013). TPP1 interacts with telomerase, and its OB fold domain is required for telomerase recruitment to telomeres. 
The region of interaction between TPP1 and telomerase has been mapped to a surface known as the “TEL patch” on TPP1, and a surface of the N-terminal TEN domain in the catalytic reverse-transcriptase protein subunit of human telomerase, hTERT (Fig. 2, center right).

Telomere Dynamics in Aging and Disease
Telomere length dynamics – the balance of telomere shortening and telomere maintenance by telomerase – is normally a highly regulated process. When the regulation is lost and the balance disrupted, usually as a consequence of mutation, the result is invariably disease: too much telomerase or aberrant telomere maintenance by ALT is strongly associated with cancer, but too little telomerase can likewise be highly deleterious to normal physiology and development.

Aging and Cancer
The DNA replication machinery cannot completely replicate the ends of linear chromosomes while also maintaining the required 3′ overhang of a telomere, leading to progressive loss of telomeric DNA with continued cell division (Fig. 3). The idea that telomere shortening would impose limits on the proliferative capacity of cells was first proposed in 1973 by the theoretical biologist Olovnikov (Olovnikov 1973)


Fig. 3 Telomere dynamics in cell biology. Stem cells and cells of the germ line (egg, sperm progenitors) express telomerase to maintain stable telomere length. In contrast, most somatic cells lack a telomere maintenance mechanism, leading to progressive telomere shortening with successive cell divisions until the cell enters the non-proliferative state of senescence. Mutations may allow a cell to bypass senescence, until further telomere shortening induces crisis and cell death. Activation of either telomerase or ALT to maintain telomeres allows a cell to continue proliferating and confers cellular immortality

and confirmed experimentally with cell lines in culture (Harley et al. 1990). Telomere shortening represents a molecular counting mechanism to limit the number of times a cell can divide and is a natural part of cellular aging. Telomere shortening will eventually result in a subset of telomeres that no longer recruit sufficient shelterin proteins to prevent a DNA damage response, which triggers the cell to enter the p53-dependent non-proliferative state of senescence (reviewed in Nassour et al. 2021). If the tumor suppressor p53 and the other senescence-controlling tumor suppressor Rb are inactivated in a cell, it can bypass the senescence trigger and continue dividing and experiencing telomere shortening (Counter et al. 1992). Ultimately, its telomeres will become too short to support any shelterin binding or T-loop formation, at which point telomeres will fuse to each other, resulting in dicentric chromosomes and genome instability. This leads to a state of autophagy-mediated cell death known as crisis (Nassour et al. 2019, 2021).

Besides limitations on telomere maintenance imposed by the mechanics of normal DNA replication, telomeres are also exquisitely sensitive to other forms of damage. For example, guanine is particularly sensitive to oxidative damage, which can result when reactive oxygen species overwhelm the cellular mechanisms for repair of such damage, generating 8-oxoguanine (8-oxo-G) (reviewed in Barnes et al. 2019). The folding of telomeric DNA into G4 can render 8-oxo-G resistant to repair by DNA glycosylases, and telomeres are also preferred sites of iron-mediated


Fenton reactions that induce cleavage 5′ to GGG sequences. Increased oxidative stress leads to accelerated telomere shortening and stochastic telomere loss in mouse tissues and cultured normal human cells, and human or mouse cells lacking enzymes involved in repair of oxidative DNA damage also exhibit telomere aberrations (Barnes et al. 2019). The use of a chemoptogenetic tool to induce formation of 8-oxo-G specifically at telomeres allowed the remarkable demonstration that oxidative telomere damage can lead to rapid cellular senescence, independent of telomere shortening or the level of shelterin proteins at telomeres (Barnes et al. 2022). Like senescence resulting from telomere shortening, oxidative damage-induced senescence involved an ATM- and p53-dependent DNA damage signal, but this was a result of replicative stress at telomeres rather than short telomeres (Barnes et al. 2022).

A telomere maintenance mechanism is essential for a cell to avoid crisis in response to short and dysfunctional telomeres and continue proliferating. While most normal somatic cells lack a telomere maintenance mechanism, dysregulated telomerase activity is associated with ~90% of human cancers (Kim et al. 1994; Shay and Bacchetti 1997), and most of the remaining cancers have activated ALT (Henson and Reddel 2010). Numerous genetic experiments performed with cell lines in culture have provided proof-of-principle for inhibiting telomere maintenance as a potential strategy to treat cancer, driving extensive research into telomerase and ALT, and the development of potent and specific small-molecule inhibitors (see ▶ Chap. 40, “Telomerase”).
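The “molecular counting” behavior described in this section can be illustrated with a deliberately simple model: a telomere loses a fixed number of base pairs per division until it reaches a threshold at which the cell stops dividing, unless a maintenance mechanism adds back what is lost. In the sketch below only the starting length (the upper end of the ~5000–10,000 bp range given earlier) comes from this chapter; the loss per division, the senescence threshold, the division cap, and the function name are assumed values chosen for illustration, and real shortening is heterogeneous across telomeres and cells.

```python
def divisions_until_arrest(initial_bp=10_000, loss_per_division_bp=100,
                           gain_per_division_bp=0, threshold_bp=4_000,
                           max_divisions=1_000):
    """Count divisions until the telomere reaches the arrest threshold.

    loss_per_division_bp, threshold_bp and max_divisions are illustrative
    assumptions; gain_per_division_bp stands in for telomerase or ALT.
    Returns (divisions, final_length_bp).
    """
    length = initial_bp
    divisions = 0
    while length > threshold_bp and divisions < max_divisions:
        length += gain_per_division_bp - loss_per_division_bp
        divisions += 1
    return divisions, length


if __name__ == "__main__":
    # No maintenance: a finite number of divisions before arrest.
    print(divisions_until_arrest())
    # Maintenance that balances loss: the cap is reached, length unchanged.
    print(divisions_until_arrest(gain_per_division_bp=100))
```

With no compensating gain the loop ends after a finite number of divisions, whereas a gain that balances the loss lets the model cell divide until the arbitrary cap, which is the toy-model analogue of cellular immortality.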

Alternative Lengthening of Telomeres (ALT)
In the absence of telomerase, both unicellular and multicellular organisms have back-up mechanisms for telomere lengthening. Telomerase-independent telomere elongation was first described in telomerase-deficient budding yeast, which avoid senescence by lengthening their telomeres through homology-directed repair (HDR), i.e., DNA recombination (Lundblad and Blackburn 1993). Subsequently, a sizeable minority of human cancers and immortal cell lines were also found to use an HDR mechanism for telomere elongation, which was named Alternative Lengthening of Telomeres (ALT) (Bryan et al. 1995, 1997). Human ALT involves invasion of one strand of a short telomere into the double-stranded region of a longer telomere, and use of the latter as a copy template for elongation (Fig. 3; Dunham


et al. 2000). In both human and yeast cells, there are at least two separate pathways of HDR-mediated telomere elongation, involving different subsets of the proteins that mediate single-stranded DNA invasion and annealing (Le et al. 1999; Verma et al. 2019; Zhang et al. 2019). Human ALT-mediated telomere synthesis utilizes DNA polymerase δ and PCNA, the polymerase complex involved in several DNA repair processes, and processing of recombination intermediates by a complex containing the BLM helicase is essential for productive telomere elongation (reviewed in Sobinoff and Pickett 2020). ALT telomeres cluster in phase-separated nuclear bodies containing the protein PML, which is where ALT telomere elongation takes place (Draskovic et al. 2009; Zhang et al. 2019).

HDR-mediated DNA recombination events are normally repressed at human telomeres, so the molecular changes that unleash telomeric recombination in ALT cells are an active area of research. One of the most common genetic alterations in ALT cancers is mutation of the chromatin-modifying protein ATRX (Heaphy et al. 2011), whose loss leads to progressive telomere decompaction (Li et al. 2019). Altered telomeric chromatin and aberrant recruitment of other chromatin modifiers such as the NuRD complex are features of ALT telomeres, and there is some evidence that this fosters a recombinogenic environment (Conomos et al. 2014; Episkopou et al. 2014; Bhargava et al. 2022). Several factors also contribute to increased replication fork stalling and DNA damage at ALT telomeres; the resulting breaks likely contribute to recombination (reviewed in Bhargava et al. 2022). The ongoing study of telomere biology in ALT cells and cancers will not only reveal vulnerabilities of these cells that could be harnessed for potential cancer therapies but will also provide fundamental insights into the regulation of DNA replication and repair at telomeres.

Telomere Biology Disorders
Embryonic cells and stem cells of many tissues express carefully regulated telomerase to maintain the length and integrity of their telomeres, thereby supporting their high proliferative capacity. During embryonic development, telomerase is gradually downregulated through silencing of hTERT expression as cells differentiate into mature tissues (Wright et al. 1996). Germline cells are also telomerase-positive, ensuring that proper telomere length and organismal lifespan are conferred onto the next generation (Wright et al. 1996). Individuals carrying mutation(s) in any of the genes encoding the components of telomerase (e.g., hTERT, the RNA subunit hTR, or the hTR-stabilizing protein dyskerin), or in telomere-related genes (section “Telomere Binding Proteins”) that are necessary for telomere maintenance, do not fully maintain telomere length in the germ line or in their stem cells. As a result, abnormally short telomeres and insufficient telomere maintenance are passed on to the offspring, leading to a collection of diseases referred to as Telomere Biology Disorders (TBDs). The first disease linked to short telomeres was dyskeratosis congenita (DC); mutations in the DKC1 gene, encoding the protein dyskerin, were identified through


familial linkage analysis (Heiss et al. 1998). It was subsequently demonstrated that dyskerin stabilizes hTR levels in the cell and is an integral component of the telomerase enzyme complex (Mitchell et al. 1999). Subsequently, the genes TERC (encoding hTR) and TERT (encoding hTERT) were also found to be mutated in patients with DC, making it clear that this is a disease of insufficient telomere maintenance (Savage 2018). There are now at least 16 genes associated with TBDs, all of which encode proteins with links to telomere biology (reviewed in Bertuch 2016; Savage 2018). Besides the genes encoding the core components of telomerase, patient-associated mutations have also been found in telomerase-binding H/ACA proteins (NOP10, NHP2 and NAF1), proteins involved in hTR biogenesis (PARN, ZCCHC8), telomerase recruitment factors (TPP1, TCAB1), shelterin components (TIN2, POT1, RAP1), and telomere replication proteins (CTC1, STN1, RTEL1). The causative mutation remains to be identified in ~30% of TBD families, so there are likely more TBD genes to be identified. The cellular outcome of mutations in any of these genes is the same: short and deprotected telomeres that trigger a DNA damage response leading to cellular senescence. TBDs are variable and multisystem disorders, with clinical manifestations that include bone marrow failure, pulmonary fibrosis, liver cirrhosis, gastrointestinal symptoms, dental abnormalities, and predisposition to malignancies (Bertuch 2016). TBD mutations can display highly variable penetrance within families, with respect to both the degree of telomere shortening and the clinical phenotype. TBDs also demonstrate anticipation, i.e., an increase in severity of the phenotype and earlier onset of disease in each generation, due to inheritance of shorter telomeres (Armanios et al. 2005). There is an age-related spectrum of phenotypes; pediatric patients often present with bone marrow failure, whereas pulmonary fibrosis tends to occur later in life. 
Even in patients with no family history of disease, TBD mutations have been identified in 1–3% of sporadic idiopathic pulmonary fibrosis (IPF). Since IPF is a very common disease, the true incidence of telomere-related disease in the population is probably not yet fully appreciated, and probably represents a spectrum going from rare, very deleterious mutations, to more common gene variants leading to age-related degenerative phenotypes (Armanios 2013; Savage 2018).

Conclusions and Future Directions While telomeres constitute only 99.6%. In addition, it was found that the small population of cells that lost the UBP did not overtake the culture, supporting the conclusion that the UBP does not impart the SSO with a significant growth disadvantage.

UBP Optimization Using the SSO

Unlike the in vitro replication of the dNaM-dTPT3 UBP, replication of the UBP in the SSO showed significant sequence biases. The in vitro and in vivo environments are distinct, and thus a screen of candidate UBPs was undertaken in which 135 paired

F. E. Romesberg

combinations drawn from 91 different unnatural triphosphates were added to the SSO growth media to identify those able to support high-level retention of a plasmid-based, corresponding UBP. While there was much agreement with the previously collected in vitro SAR, several interesting differences were observed in the in vivo environment of the SSO. Most notable among these were four additional UBPs that are replicated more efficiently and with less sequence bias than dNaM-dTPT3: dMMO2-dTPT3, d5FM-dTPT3, dCNMO-dTPT3, and dCIMO-dTPT3 (Fig. 3). These results highlight an important concept in the process of designing synthetic parts for use in living organisms: different environments can yield different behaviors. In addition, since multiple cellular systems and components are involved, the introduction of any synthetic parts may have unintended and unexpected consequences. The interested reader is directed to a review about the interplay between chemistry and synthetic biology as it pertains to UBP development and the expansion of the genetic alphabet (Feldman and Romesberg 2018).

UBP Decoding in an SSO

In the next stage of SSO development, the ability of the SSO to both store information with the UBP and retrieve information in the form of encoded protein was examined. The sfGFP gene was modified at codon 151, replacing the native TAC codon with the unnatural codon AXC (i.e., sfGFP(AXC)151; X = NaM), and the resulting sfGFP gene was placed on a plasmid along with an E. coli tRNASer gene (serT) recoded with the cognate unnatural anticodon GYT (tRNASer(GYT); Y = TPT3) (Zhang et al. 2017b) and used to transform the optimized E. coli SSO. While ultimately the goal would be to use the system to encode noncanonical amino acids (ncAAs), in the initial experiments, Ser was assigned to the unnatural codon since E. coli seryl-tRNA synthetase does not rely on anticodon recognition for tRNA aminoacylation and thus avoids complications due to poor tRNA charging. The resulting SSO was cultured in media containing both the deoxy- and ribo-variants of the unnatural triphosphates, i.e., with dNaMTP and dTPT3TP, as well as NaMTP and TPT3TP. Production of T7RNAP and tRNASer(GYT) was induced by addition of isopropyl β-D-1-thiogalactopyranoside (IPTG), and then expression of sfGFP(AXC)151 was induced by addition of anhydrotetracycline (aTc). The production of sfGFP was monitored by fluorescence, and cell lysates were further analyzed by Western blotting with an α-GFP antibody; observed results were consistent with the production of full-length sfGFP. Little fluorescence was observed in control experiments lacking tRNASer(GYT) or containing any of the four natural “near-cognate” tRNAs (tRNASer(GGT), tRNASer(GCT), tRNASer(GAT), or tRNASer(GTT)). The fidelity of decoding in the complete SSO system expressing both tRNASer(GYT) and sfGFP(AXC)151 was assessed using liquid chromatography–tandem mass spectrometry (LC–MS/MS) and showed that virtually all (98.5%) of the sfGFP produced contained the encoded Ser.
Based on these results, it was concluded that PtNTT2 is able to import all of the required unnatural nucleotides, that the native E. coli ribosome is able to decode an unnatural AXC codon with a tRNA recoded with a

43

Unnatural Base Pairs to Expand the Genetic Alphabet and Code

cognate unnatural anticodon, and that both the efficiency and fidelity of this decoding are high. With this success, the system was next applied to the incorporation of the noncanonical amino acid (ncAA) N6-[(2-propynyloxy)carbonyl]-L-lysine (PrK). Methanosarcina mazei tRNAPyl(GYT) was encoded on a plasmid with sfGFP(AXC)151 and transformed into the SSO along with a second plasmid encoding the Methanosarcina barkeri pyrrolysyl-tRNA synthetase (PylRS). The combination of sfGFP(GXC)151 with Mm tRNAPyl(GYC) was also examined; control experiments lacked either PylRS, the cognate unnatural tRNAPyl (tRNAPyl(GYT) or tRNAPyl(GYC)), or PrK. Production of sfGFP was followed by fluorescence in intact cells, as well as by α-GFP Western blotting of cell lysates. Results indicated that sfGFP was produced with both unnatural codon systems when all required components were present. Further analysis using LC–MS/MS indicated that the purified sfGFP contained PrK. The unnatural AXC codon was also found to allow incorporation of the ncAA p-azidophenylalanine using the Methanococcus jannaschii tyrosyl-tRNA synthetase (TyrRS) with tRNApAzF(GYT). In all, 20 unnatural codons have been shown to be able to efficiently direct ncAA incorporation in the E. coli SSO, including 7 codons of the type NXN/NYN and 13 of the type NNX/NNX, where X = NaM and Y = TPT3. Additional characterization of three of the codon-anticodon pairs, AXC/GYT, GXT/AYC, and AGX/XCT, was undertaken to explore the orthogonality and the potential for simultaneous encoding of multiple ncAAs. Not only did the pairs orthogonally mediate protein synthesis, but decoding with GXT/AYC and AXC/GYT was shown to be more efficient than decoding with amber and ochre.
In an experiment with all three unnatural codon-anticodon pairs, with AXC decoded with tRNApAzF(GYT), AGX decoded with tRNAPyl(XCT), and GXT decoded with tRNASer(AYC) (because only two orthogonal tRNA–aaRS pairs had been thoroughly validated at the time), essentially all of the sfGFP produced (96%) contained all three of the encoded amino acids, and yields were comparable to those observed when decoding two unnatural codons (Fischer et al. 2020). These results have been reviewed in detail (Romesberg 2022); a generalized illustration of the components of the SSO is shown in Fig. 6.
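As a rough illustration of the codon classes mentioned above, the following Python sketch enumerates every triplet that carries the unnatural base X at the second position (NXN) or at the third position (NNX), with natural bases elsewhere. The function name and the single-letter alphabet are illustrative assumptions and are not taken from the published work; the sketch only counts candidate codons and says nothing about which of them are decoded efficiently.

```python
from itertools import product

# Natural DNA letters; "X" stands for the unnatural nucleotide (NaM in the text).
NATURAL = "ACGT"


def codons_with_x(position):
    """Return all triplets with X at `position` (0-based) and natural bases elsewhere.

    Hypothetical helper written for illustration only.
    """
    codons = []
    for first, second in product(NATURAL, repeat=2):
        bases = [first, second]
        bases.insert(position, "X")  # place the unnatural base in the chosen slot
        codons.append("".join(bases))
    return codons


nxn = codons_with_x(1)  # X in the middle position, e.g. AXC or GXT
nnx = codons_with_x(2)  # X in the third position, e.g. AGX

print(len(nxn), "possible NXN codons;", len(nnx), "possible NNX codons")
```

Under these assumptions each class contains 16 possible codons, so the 7 NXN and 13 NNX codons reported above would cover 20 of the 32 candidates in these two classes.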

Applications of the SSO

The integration of manmade parts into a living organism to both store information in its genome and retrieve it in the form of copies of DNA and mRNA, as well as protein, provides ample opportunity to explore very basic biological questions, but the full potential of the advances will be achieved through its practical application. In the case of the UBP and the E. coli SSO, the biotech company Synthorx was founded in 2014 (and acquired by Sanofi in 2019) to apply these advances to the development of novel protein therapeutics for cancer immunotherapy. As an example, Synthorx focused on interleukin-2 (IL-2), a 15.5 kDa cytokine produced mainly by CD4+ T cells that is involved in T cell proliferation and activation. There are two different forms of the IL-2 receptor found on cells that differ based on the combination of its component subunits: IL-2Rα, IL-2Rβ, and IL-2Rγ. When all three subunits combine,

Fig. 6 Illustration of the components of an SSO. Unnatural triphosphates ((d)XTP and (d)YTP) are imported by the PtNTT2 transporter protein (purple) and used to maintain and transcribe a gene that contains unnatural codons (NXN/NYN) to encode a protein that contains a noncanonical amino acid (ncAA; green)

what results is a high affinity trimeric form, IL-2Rαβγ. IL-2Rα acts to increase affinity for IL-2 by a factor of 10–100, and without it, the dimeric form, IL-2Rβγ, has lower affinity for IL-2. IL-2Rβγ is found on memory CD8+ T cells and natural killer cells, and the action of IL-2 to specifically stimulate these cancer-fighting cells is what made IL-2 a candidate for cancer immunotherapy. As early as 1985, administration of IL-2 was found to be able to mediate tumor regression, and recombinant IL-2 was approved for treatment of metastatic renal cell carcinoma in 1992 and for metastatic melanoma in 1998 (Jiang et al. 2016). IL-2 unfortunately has a short circulating half-life, and due to its preferential stimulation of regulatory T cells (Tregs) and type II innate lymphoid cells, both of which constitutively express the high affinity IL-2Rαβγ receptor, IL-2 monotherapy does not typically improve survival and has significant dose-limiting toxicity. Investigation of IL-2’s effectiveness in lower doses and in combination with other therapies or with added mutations is an active area of research in the cancer immunotherapy field. The approach Synthorx has taken is unique. Using the dNaM-dTPT3 UBP in the AXC codon and the M. mazei tRNAPyl(GYT), Synthorx is producing IL-2 containing the ncAA AzK, and then using the AzK side chain and click chemistry to site-selectively incorporate alkyne-PEG moieties at positions

throughout IL-2. The goal of these efforts is to extend half-life and shift the preferential binding of IL-2 from IL-2Rαβγ to IL-2Rβγ. PEGylation was observed to have wide-ranging effects among the various sites examined, and the variant with PEG attached at position 65 showed promising results. This variant, referred to as THOR-707 (SAR444245), retained native-like affinity for the lower affinity, dimeric IL-2Rβγ receptor, while also exhibiting a virtually complete loss of preferential recognition for the high affinity, trimeric IL-2Rαβγ receptor. This binding affinity profile was confirmed in both murine tumor models and nonhuman primates, with THOR-707 powerfully stimulating the expansion and activation of CD8+ T cells and NK cells without activity on either Tregs or eosinophils (reviewed in Romesberg 2022). Clinical trials with THOR-707 are ongoing (Synthorx Inc, a Sanofi company 2019; Sanofi 2021a, b, c, d, 2022).

Conclusion

This chapter has reviewed the development of UBPs to expand the genetic alphabet and code, and their deployment in SSOs to produce proteins that contain ncAAs. The project began with unnatural nucleobases that were stable in duplex DNA but that were accepted by Kf polymerase with only marginal success, prone to mispairing with dA, and very poorly extended if at all. Through the combined efforts of many very talented graduate and postdoctoral students and the financial support of NIH over nearly 15 years of in vitro work, UBPs were discovered that had near native-like rates of replication in 2012 (Malyshev et al. 2012). From there, additional milestones were achieved relatively quickly: in vivo replication of a plasmid in E. coli in 2014 (Malyshev et al. 2014), use of unnatural codons to incorporate ncAAs in 2017 (Zhang et al. 2017b), the use of up to three unnatural codons in one gene in 2020 (Fischer et al. 2020), clinical trials with a protein produced with the UBP/SSO in 2019 (Synthorx Inc, a Sanofi company 2019), and the first steps to expand the system to eukaryotic cells in 2019 (Zhou et al. 2019). Along the way, the project has advanced our understanding of nucleic acid structure and function, challenged the uniqueness of the genetic alphabet and code, and served as an example of what can be achieved through the combined application of chemical and biological principles.

References

Benner SA, Karalkar NB, Hoshika S, Laos R, Shaw RW, Matsuura M, Fajardo D, Moussatche P (2016) Alternative Watson-Crick synthetic genetic systems. Cold Spring Harb Perspect Biol 8:a023770
Benner SA (2023) Rethinking nucleic acids from their origins to their applications. Phil Trans R Soc B 378:20220027
Betz K, Malyshev DA, Lavergne T, Welte W, Diederichs K, Dwyer TJ, Ordoukhanian P, Romesberg FE, Marx A (2012) KlenTaq polymerase replicates unnatural base pairs by inducing a Watson-Crick geometry. Nat Chem Biol 8:612–614

Betz K, Malyshev DA, Lavergne T, Welte W, Diederichs K, Romesberg FE, Marx A (2013) Structural insights into DNA replication without hydrogen bonds. J Am Chem Soc 135:18637–18643
Biondi E, Benner SA (2018) Artificially expanded genetic information systems for new aptamer technologies. Biomedicine 6:509
Dhami K, Malyshev DA, Ordoukhanian P, Kubelka T, Hocek M, Romesberg FE (2014) Systematic exploration of a class of hydrophobic unnatural base pairs yields multiple new candidates for the expansion of the genetic alphabet. Nucleic Acids Res 42:10235–10244
Feldman AW, Romesberg FE (2018) Expansion of the genetic alphabet: a chemist’s approach to synthetic biology. Acc Chem Res 51:394–403
Fischer EC, Hashimoto K, Zhang Y, Feldman AW, Dien VT, Karadeema RJ, Adhikary R, Ledbetter MP, Krishnamurthy R, Romesberg FE (2020) New codons for efficient production of unnatural proteins in a semisynthetic organism. Nat Chem Biol 16:570–576
Hamashima K, Kimoto M, Hirao I (2018) Creation of unnatural base pairs for genetic alphabet expansion toward synthetic xenobiology. Curr Opin Chem Biol 46:108–114
Henry AA, Romesberg FE (2003) Beyond A, C, G and T: augmenting nature’s alphabet. Curr Opin Chem Biol 7:727–733
Henry AA, Olsen AG, Matsuda S, Yu C, Geierstanger BH, Romesberg FE (2004) Efforts to expand the genetic alphabet: identification of a replicable unnatural DNA self-pair. J Am Chem Soc 126:6923–6931
Hirao I (2023) Genetic alphabet expansion of DNA, Ch 48. In: Burrows C (ed) Springer handbook of chemical biology of nucleic acids
Hoshika S et al (2019) Hachimoji DNA and RNA: a genetic system with eight building blocks. Science 363:884–887
Jiang T, Zhou C, Ren S (2016) Role of IL-2 in cancer immunotherapy. Onco Targets Ther 5:e1163462
Kimoto M, Hirao I (2020) Genetic alphabet expansion technology by creating unnatural base pairs. Chem Soc Rev 49:7602–7626
Leconte AM, Romesberg FE (2009) Engineering nucleobases and polymerases for an expanded genetic alphabet. In: Köhrer C, RajBhandary UL (eds) Protein engineering. Berlin, Heidelberg, Springer Berlin Heidelberg, pp 291–313
Leconte AM, Hwang GT, Matsuda S, Capek P, Hari Y, Romesberg FE (2008) Discovery, characterization, and optimization of an unnatural base pair for expansion of the genetic alphabet. J Am Chem Soc 130:2336–2343
Ledbetter MP, Karadeema RJ, Romesberg FE (2018) Reprograming the replisome of a semisynthetic organism for the expansion of the genetic alphabet. J Am Chem Soc 140:758–765
Malyshev DA, Romesberg FE (2015) The expanded genetic alphabet. Angew Chem Int Ed Engl 54:11930–11944
Malyshev DA, Pfaff DA, Ippoliti SI, Hwang GT, Dwyer TJ, Romesberg FE (2010) Solution structure, mechanism of replication, and optimization of an unnatural base pair. Chem Eur J 16:12650–12659
Malyshev DA, Dhami K, Quach HT, Lavergne T, Ordoukhanian P, Torkamani A, Romesberg FE (2012) Efficient and sequence-independent replication of DNA containing a third base pair establishes a functional six-letter genetic alphabet. Proc Natl Acad Sci U S A 109:12005–12010
Malyshev DA, Dhami K, Lavergne T, Chen T, Dai N, Foster JM, Correa Jr IR, Romesberg FE (2014) A semi-synthetic organism with an expanded genetic alphabet. Nature 509:385–388
Matsuda S, Henry AA, Romesberg FE (2006) Optimization of unnatural base pair packing for polymerase recognition. J Am Chem Soc 128:6369–6375
Matsuda S, Fillo JD, Henry AA, Rai P, Wilkens SJ, Dwyer TJ, Geierstanger BH, Wemmer DE, Schultz PG, Spraggon G, Romesberg FE (2007a) Efforts toward expansion of the genetic alphabet: structure and replication of unnatural base pairs. J Am Chem Soc 129:10466–10473
Matsuda S, Leconte AM, Romesberg FE (2007b) Minor groove hydrogen bonds and the replication of unnatural base pairs. J Am Chem Soc 129:5551–5557

McMinn DL, Ogawa AK, Wu Y, Liu J, Schultz PG, Romesberg FE (1999) Efforts toward expansion of the genetic alphabet: DNA polymerase recognition of a highly stable, self-pairing hydrophobic base. J Am Chem Soc 121:11585–11586
Morales JC, Kool ET (1998) Efficient replication between non-hydrogen-bonded nucleoside shape analogs. Nat Struct Biol 5:950–954
Moran S, Ren RX, Kool ET (1997a) A thymidine triphosphate shape analog lacking Watson-Crick pairing ability is replicated with high sequence selectivity. Proc Natl Acad Sci U S A 94:10506–10511
Moran S, Ren RXF, Rumney S, Kool ET (1997b) Difluorotoluene, a nonpolar isostere for thymine, codes specifically and efficiently for adenine in DNA replication. J Am Chem Soc 119:2056–2057
Oh J, Shin J, Unarta IC, Wang W, Feldman AW, Karadeema RJ, Xu L, Xu J, Chong J, Krishnamurthy R, Huang X, Romesberg FE, Wang D (2021) Transcriptional processing of an unnatural base pair by eukaryotic RNA polymerase II. Nat Chem Biol 17:906–914
Piccirilli JA, Krauch T, Moroney SE, Benner SA (1990) Enzymatic incorporation of a new base pair into DNA and RNA extends the genetic alphabet. Nature 343:33–37
Rich A (1962) On the problems of evolution and biochemical information transfer. In: Kasha MPB (ed) Horizons in biochemistry. Academic Press, New York
Romesberg FE (2022) Creation, optimization, and use of semi-synthetic organisms that store and retrieve increased genetic information. J Mol Biol 434:167331
Sanofi (sponsor) (2021a) A study of SAR444245 combined with other anticancer therapies for the treatment of participants with gastrointestinal cancer (master protocol) (pegathor gastrointestinal 203). ClinicalTrials.gov Identifier: NCT05104567
Sanofi (sponsor) (2021b) A study of SAR444245 combined with other anticancer therapies for the treatment of participants with lung cancer or mesothelioma (pegathor lung 202). ClinicalTrials.gov Identifier: NCT04914897
Sanofi (sponsor) (2021c) A study of SAR444245 combined with cemiplimab for the treatment of participants with various advanced skin cancers (pegathor skin 201). ClinicalTrials.gov Identifier: NCT04913220
Sanofi (sponsor) (2021d) A study of SAR444245 combined with other anticancer therapies for the treatment of participants with HNSCC (master protocol) (pegathor head and neck 204). ClinicalTrials.gov Identifier: NCT05061420
Sanofi (sponsor) (2022) A study of SAR444245 with or without other anticancer therapies for the treatment of adults and adolescents with relapsed or refractory b cell lymphoma (master protocol) [pegathor lymphoma 205]. ClinicalTrials.gov Identifier: NCT05179603
Switzer C, Moroney SE, Benner SA (1989) Enzymatic incorporation of a new base pair into DNA and RNA. J Am Chem Soc 111:8322–8323
Synthorx Inc, a Sanofi company (sponsor) (2019) A Study evaluating safety and therapeutic activity of THOR-707 in adult subjects with advanced or metastatic solid tumors (THOR-707-101). ClinicalTrials.gov Identifier: NCT04009681
Wu Y, Fa M, Tae EL, Schultz PG, Romesberg FE (2002) Enzymatic phosphorylation of unnatural nucleosides. J Am Chem Soc 124:14626–14630
Yu C, Henry AA, Romesberg FE, Schultz PG (2002) Polymerase recognition of unnatural base pairs. Angew Chem Int Ed 41:3841–3844
Zhang Y, Lamb BM, Feldman AW, Zhou AX, Lavergne T, Li L, Romesberg FE (2017a) A semisynthetic organism engineered for the stable expansion of the genetic alphabet. Proc Natl Acad Sci U S A 114:1317–1322
Zhang Y, Ptacin JL, Fischer EC, Aerni HR, Caffaro CE, San Jose K, Feldman AW, Turner CR, Romesberg FE (2017b) A semi-synthetic organism that stores and retrieves increased genetic information. Nature 551:644–647
Zhou AX, Sheng K, Feldman AW, Romesberg FE (2019) Progress toward eukaryotic semisynthetic organisms: translation of unnatural codons. J Am Chem Soc 141:20166–20170

OGG1 at the Crossroads Between Repair and Transcriptional Regulation

44

Anne-Marie Di Guilmi, Nuria Fonknechten, and Anna Campalans

Contents
Introduction
Genome-Wide Distribution of 8-OxoG Is not Random
Role of 8-OxoG and OGG1 in Transcriptional Regulation
Regulation of Transcription Mediated by Induction of 8-OxoG at G-Quadruplex Sequences
Regulation of Transcription Mediated by 8-OxoG Induced by the Enzymatic Activity of LSD1
Role of 8-OxoG and OGG1 in the Regulation of Transcription of Inflammatory Genes
Origin of ROS as a Signaling Molecule for the Induction of 8-oxoG
Finding the 8-OxoG in the Chromatin Context: A Challenging Task for OGG1
DNA Packaging into Nucleosomes: A Barrier for OGG1 and BER Activity
BER in the Highly Compacted Nuclear Environment
Interplay Between BER, NER, and Transcription
BER Is Linked to Transcription via the Mediator Complex and the Cohesin Structure
Structure and Function of the Mediator Complex
Nuclear Condensates Related to Transcription Initiation: Mediator and Super-Enhancers
Conclusion
References

Abstract

Oxidative DNA modifications are a major challenge to genomic stability. One of the most abundant oxidative DNA base lesions is 8-oxoG, a mutagenic lesion that needs to be efficiently removed to avoid G/C to T/A transversions. It has recently been postulated that 8-oxoG could also act as an epigenetic mark. There is indeed increasing evidence for the key roles played by 8-oxoG and OGG1, the DNA glycosylase responsible for initiating its removal through the base excision repair (BER) pathway, in the regulation of transcription. While it seems clear that there should be tight coordination between both processes to avoid collisions between the DNA repair and transcription machineries, there is a real need in the field to better reconcile these two aspects, both of which are essential for the maintenance and expression of our genome. In this review, we will discuss the present knowledge concerning the role of OGG1 in both DNA repair and transcriptional regulation in order to shed light on the coordination between these two major cellular processes.

A.-M. Di Guilmi · N. Fonknechten · A. Campalans (*)
Université Paris Cité, CEA, Stabilité Génétique Cellules Souches et Radiations, LCE/iRCM/IBFJ, F-92260, Fontenay-aux-Roses, France
Université Paris-Saclay, CEA, Stabilité Génétique Cellules Souches et Radiations, LCE/iRCM/IBFJ, F-92260, Fontenay-aux-Roses, France
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2023
N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_50

Introduction

Maintenance of genetic stability is an essential cellular function. By avoiding or limiting the impact of DNA damage, induced by either the cell metabolism or genotoxic agents, DNA repair systems play a critical role in cell survival and the prevention of pathologies such as neurodegenerative diseases, cancer, or accelerated ageing. Human cells are continuously exposed to reactive oxygen species (ROS) arising from endogenous metabolism or from exposure to oxidizing agents or radiation that compromise the integrity of our genome. Due to the very low redox potential of guanine, one of the most abundant base modifications induced by ROS is 8-oxoguanine (8-oxoG). If not efficiently repaired, 8-oxoG is potentially mutagenic due to the misincorporation of adenine opposite 8-oxoG during replication, resulting in G/C to T/A transversions. Indeed, the mutational signatures 36 and 18, prevalent in several human cancers, have been linked to the presence of 8-oxoG as they are dominated by C to A mutations (Poetsch 2020). 8-oxoG is specifically recognized and excised by the DNA glycosylase OGG1, which initiates BER. Excision of 8-oxoG by the DNA glycosylase activity of OGG1 results in the generation of an abasic site (AP site) that is cleaved by the endonuclease APE1, generating a single-strand break (SSB). The resulting one-nucleotide gap is then filled in by DNA polymerase β, and the nick is sealed by DNA ligase III, completing the short-patch BER pathway (Boiteux et al. 2017). Abasic sites and SSBs, generated as intermediates during the repair of 8-oxoG, are highly cytotoxic DNA lesions as they can block both replication and transcription and induce the formation of double-strand breaks (DSBs). Thus, the different enzymatic activities during BER need to be tightly coordinated to avoid exposure of these toxic lesions to the cellular milieu. The scaffolding protein XRCC1 plays an essential role in the coordination of the different enzymatic steps (Campalans et al. 2013).
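The sequence of short-patch BER steps described above can be sketched as a simple ordered hand-off between enzymes. The Python snippet below is only a schematic bookkeeping model: the state labels, the variable names, and the function `run_short_patch_ber` are illustrative assumptions, and the sketch does not capture kinetics, XRCC1 scaffolding, or any alternative sub-pathway.

```python
# Each step: (enzyme, substrate state, product state), in the order given in the text.
# The state labels are informal names chosen for this sketch.
SHORT_PATCH_BER = [
    ("OGG1", "8-oxoG", "AP site"),
    ("APE1", "AP site", "single-strand break"),
    ("DNA polymerase beta", "single-strand break", "filled gap with nick"),
    ("DNA ligase III", "filled gap with nick", "repaired DNA"),
]

# Intermediates that the text describes as highly cytotoxic.
TOXIC_INTERMEDIATES = {"AP site", "single-strand break"}


def run_short_patch_ber(start="8-oxoG"):
    """Walk the pathway in order and return a list of (enzyme, state after the step)."""
    state = start
    trace = []
    for enzyme, substrate, product in SHORT_PATCH_BER:
        if state != substrate:
            # A step can only act on the product of the previous step.
            raise ValueError(f"{enzyme} cannot act on {state!r}")
        state = product
        trace.append((enzyme, state))
    return trace


trace = run_short_patch_ber()
for enzyme, state in trace:
    flag = " (toxic intermediate)" if state in TOXIC_INTERMEDIATES else ""
    print(f"{enzyme} -> {state}{flag}")
```

In this toy model two of the four steps leave the DNA in a state flagged as toxic, which is one way to picture why each intermediate needs to be handed to the next enzyme without delay.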
The repair of 8-oxoG by the BER pathway has been extensively reviewed as well as the properties and biological roles of the OGG1 glycosylase (Boiteux et al. 2017). During the last few years a new exciting view has emerged suggesting that 8-oxoG is not only a DNA lesion that challenges the stability of our genomes but can also be an epigenetic mark playing a major role in transcriptional regulation. A

first indication arises from the observation that many promoters of genes coding for stress-response proteins display a high GC content and are particularly sensitive to oxidation (Ding et al. 2017). Secondly, it has been shown that the induction of 8-oxoG at specific sites of the genome, together with its processing by BER, allows the regulation of transcription of certain genes (Pastukh et al. 2015). Consistently, recent mapping of the presence of 8-oxoG performed at high resolution showed an enrichment of 8-oxoG in promoters and regulatory regions (Ding et al. 2017). Although the molecular mechanisms are not completely elucidated, the general idea is that 8-oxoG is induced at promoters of oxidative stress responsive genes and that recognition of the lesion by OGG1 would trigger the recruitment of the transcriptional machinery and the activation of transcription. Several models have been proposed, involving the direct recruitment of transcription factors by BER proteins or repair intermediates, or the stabilization of G-quadruplex structures which in turn drive the recruitment of transcriptional activators (Fleming and Burrows 2021). Supporting the role of OGG1 (and of the BER pathway in general) in the regulation of transcription, several functional links between OGG1 and transcription have arisen from data showing physical and/or functional association between OGG1 and histone chaperones, chromatin remodelers, or histone modifiers involved at different steps of transcription initiation or elongation (Menoni et al. 2017). A direct relationship between OGG1 and Cohesin and Mediator complexes, both playing essential roles in genome organization and transcriptional regulation, has also been recently established (Lebraud et al. 2020). A complex network of protein-protein interactions seems required to assist OGG1 in the challenging task of finding 8-oxoG in the context of crowded nuclear architecture in which DNA is highly compacted in the chromatin structure. 
While biochemical and in vitro single-molecule studies have shown that OGG1 can very rapidly scan naked DNA (Boiteux et al. 2017), the extrusion and excision of 8-oxoG is clearly impaired by the presence of nucleosomes, likely requiring chromatin-remodeling activities (Menoni et al. 2017). Whether OGG1 needs to find the 8-oxoG to proceed to its repair through the BER pathway or to regulate transcription by mediating the recruitment of transcription factors to the site of damage, the challenge of finding the 8-oxoG in the nuclear environment remains the same. In order to better understand the coordination between BER and transcription, it is thus extremely important to consider the nuclear organization and the molecular mechanisms orchestrating the initiation and elongation of transcription, in which the Mediator complex plays a particularly important role. In this review, we will describe the multiple connections that have been established between OGG1 and transcription at different levels, going from whole-genome approaches to single-cell microscopy. Genomic studies mostly based on ChIP-seq experiments have indicated a non-random generation of 8-oxoG as well as a co-occupancy of OGG1 and different transcription factors (such as NF-κB or HIF) at the promoters of specific genes. The functional role of OGG1 in this context has been largely documented and seems important for the establishment of complex transcriptional programs in response to specific stimuli involving the generation of

reactive oxygen species (ROS) (Fleming and Burrows 2021). Considering the major links established between transcription and the NER pathway, mostly concerning the sub-pathway TC-NER, a special emphasis will be given to the description of factors shared between NER and BER, which could bring further understanding to the nuclear coordination between DNA repair and transcription. The last part of the review will be dedicated to the role of the Mediator complex in the assembly of DNA repair complexes and the transcriptional regulation, both at the molecular level and in the context of the nuclear organization. Of particular interest is the involvement of the Mediator complex in the formation of nuclear condensates, shown to play a major role in both DNA repair and transcription organization in the nuclear environment.

Genome-Wide Distribution of 8-OxoG Is not Random

Detection and quantification of 8-oxoG levels has always been a challenging task, and entire workshops were dedicated to this question in the late 1990s. One particular initiative was the creation of the European Standards Committee on Oxidative DNA Damage (ESCODD) working group to solve methodological problems and to standardize the assays to accurately determine the levels of oxidized bases in DNA (Collins et al. 2004). The same samples were distributed to more than 20 laboratories across the ESCODD network, in which the 8-oxoG quantification was performed using different protocols. A huge variability was observed depending on the protocol used, and the 8-oxoG levels measured at basal conditions were estimated to be between 0.3 and 4 per 10^6 guanines. Several methods were based on the use of OGG1 or Fpg (functional analogue of OGG1 from E. coli), resulting in the processing of 8-oxoG to SSB, the latter measured by alkaline elution, alkaline unwinding, or comet assay. These enzymatic protocols systematically yielded lower values compared to the chromatographic methods such as HPLC-MS/MS or GC-MS (the last one giving the highest values). It appeared from these comparative studies that cell isolation, lysis, and DNA extraction were critical steps, probably because spurious oxidation occurring during manipulation partly explains the higher values obtained with the chromatographic methods. However, enzymatic-based approaches are not free of drawbacks since they depend on the specificity and sensitivity of the enzymes, which might have limited access to the damaged sites in some genomic contexts. Although absolute quantification is difficult, the different methods gave very consistent relative results concerning 8-oxoG repair kinetics, following repair of the damage after exposure of cells to an acute oxidative stress.
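To give a feel for what the basal range quoted above means per cell, the following Python sketch converts a lesion frequency per million guanines into an approximate number of 8-oxoG lesions per haploid human genome. The genome size (about 3.1 billion base pairs) and the GC fraction (about 41%) are assumed round figures that do not come from this chapter, and the function names are hypothetical; the result is an order-of-magnitude estimate only.

```python
# ASSUMPTIONS (not from the text): approximate haploid human genome size and GC fraction.
HAPLOID_GENOME_BP = 3.1e9
GC_FRACTION = 0.41


def guanines_per_genome(genome_bp=HAPLOID_GENOME_BP, gc_fraction=GC_FRACTION):
    """Each G:C base pair carries exactly one guanine, on one strand or the other."""
    return genome_bp * gc_fraction


def lesions_per_genome(lesions_per_million_guanines, guanines=None):
    """Scale a frequency per 10^6 guanines to a count per haploid genome."""
    if guanines is None:
        guanines = guanines_per_genome()
    return lesions_per_million_guanines * guanines / 1e6


low = lesions_per_genome(0.3)   # lower bound of the basal range quoted in the text
high = lesions_per_genome(4.0)  # upper bound of the basal range quoted in the text
print(f"Estimated basal 8-oxoG per haploid genome: {low:.0f} to {high:.0f}")
```

Under these assumptions the basal range corresponds to roughly a few hundred to a few thousand lesions per haploid genome, and the more than tenfold spread simply mirrors the variability between measurement protocols.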
Similar repair kinetics were measured in a comparative study in which the same samples were analyzed in parallel by HPLC-MS/MS, alkaline elution, and immunofluorescence using an antibody specifically generated against 8-oxoG (Lebraud et al. 2020). The use of an anti-8-oxoG antibody, even if it is limited by the specificity of the antibody and its accessibility to the lesion, has the major advantage of allowing the analysis of the genomic distribution of the lesion. Earlier results raised the question of the

OGG1 at the Crossroads Between Repair and Transcriptional Regulation

distribution of 8-oxoG within the genome under basal conditions, which reflects the equilibrium between the induction of the damage and its repair. Studies on mitotic chromosomes had indeed shown that the lesion is not distributed homogeneously throughout the genome but is enriched in regions with a high frequency of recombination and single nucleotide polymorphisms (SNPs) (Ohno et al. 2006). After exposure of cells to oxidative stress, the use of the same antibody unveiled an uneven distribution of the lesion in the interphase nucleus (Campalans et al. 2013; Pezone et al. 2020). Immunoprecipitation experiments using an antibody against 8-oxoG followed by microarray analysis revealed a preferential accumulation of the lesion in gene deserts and a correlation between 8-oxoG and lamina-associated chromatin domains (Yoshihara et al. 2014). These results can be explained either by a higher susceptibility of these regions to ROS, due to their localization at the nuclear periphery, or by a limited recruitment of DNA repair factors to these compacted chromatin regions. Enrichment of 8-oxoG in heterochromatin domains has also been reported in yeast using the high-resolution click-code-seq methodology, which is based on the use of Fpg and APE1 followed by the incorporation of an oligonucleotide at the site of the damage by click chemistry (Wu et al. 2018). This elegant method also revealed a higher accumulation of damage in nucleosomes enriched in histone H3, while a reduction of 8-oxoG was observed in nucleosomes containing higher levels of histone modifications such as H3K4me3, known to be associated with open chromatin regions. It was concluded that the persistence of oxidative DNA damage in heterochromatin could be explained by the lower BER efficiency in these highly compacted areas of the genome (Wu et al. 2018). These data are in agreement with the reported enrichment of OGG1, and of BER proteins in general, in open chromatin regions in cells exposed to oxidizing agents (Amouroux et al. 2010).
The reduced repair efficiency in heterochromatin domains could also explain the higher mutagenesis rates found in this chromatin compartment in human cancers. Indeed, cancer genome sequencing has shown that single somatic variants are strongly correlated with repressive chromatin histone marks such as H3K9me3, H3K9me2, or H4K20me3, while an anti-correlation was observed with H3K4me3 or H3K9Ac, which are associated with open chromatin (Schuster-Böckler and Lehner 2012). This higher mutational load in heterochromatin is of course not explained solely by the presence of unrepaired 8-oxoG, as no difference was observed between transversions and transitions, but probably reflects the generally compromised accessibility of these highly compacted domains to DNA repair complexes. Indeed, reduced accessibility of heterochromatin to DNA repair proteins has been reported for several DNA repair mechanisms, which is probably related to the slower repair kinetics measured in those regions. Additional mechanisms and players have been implicated in the maintenance of genomic stability in these particular highly compacted regions (Menoni et al. 2017; Fortuny et al. 2021). Other studies have reported a higher accumulation of 8-oxoG in euchromatin regions after exposure of cells to acute oxidative stress. Immunofluorescence studies performed with the anti-8-oxoG antibody have indeed revealed the formation of distinct and bright foci, mostly excluded from heterochromatin domains, at short times after treatment (Campalans et al. 2013). Such enrichment of 8-oxoG foci in

A.-M. Di Guilmi et al.

euchromatin regions raises the question of the functional meaning of this particular distribution: either the lesions are grouped in euchromatin regions in order to be efficiently repaired, as has been postulated for double-strand breaks (Chiolo et al. 2011), or they are preferentially induced in these less condensed regions of the chromatin. Interestingly, ChIP-seq experiments performed with anti-8-oxoG antibodies unveiled that 8-oxoG residues induced by hypoxia were particularly abundant in promoter sequences of hypoxia-responsive genes that can adopt G-quadruplex (G4) secondary structures. G4 structures can fold in DNA sequences with four or more contiguous runs of three or more Gs separated by short intervening sequences that form loops between the G runs, the so-called potential G4 sequence (PQS). PQS are not homogeneously distributed in the human genome but are enriched in telomeric and gene regulatory regions. Their folding into G4 plays important roles in genome organization and dynamics and represents an extremely active research field (Fleming and Burrows 2021). A comparison of the genomic distribution of 8-oxoG between normoxia and hypoxia conditions gave rise to the concept that 8-oxoG could be important for the coordination of the transcriptional reprogramming in response to external stimuli (Clark et al. 2012). An enrichment of 8-oxoG at gene promoters was also observed using the OG-seq methodology, which takes advantage of the very low redox potential of 8-oxoG to further oxidize and biotin-label 8-oxoG. The use of streptavidin-coated magnetic beads to purify DNA fragments containing the lesion results in a very good recovery yield compared to immunoprecipitation methods (Ding et al. 2017). The analysis of enrichment peaks revealed a higher accumulation of 8-oxoG at promoters, G4 sequences, 5′-UTR and 3′-UTR sequences, indicating once again a link between 8-oxoG and regulatory regions.
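The PQS definition given above (four or more G runs separated by short loops) can be turned into a simple pattern search. The sketch below is a minimal illustration, assuming G runs of at least three guanines and loops of 1 to 7 nucleotides; the loop-length bounds and the function names are assumptions introduced here, not parameters taken from the studies cited.

```python
import re
from typing import List, Tuple

# Assumed motif: four G runs (>= 3 Gs each) separated by loops of 1-7 nt.
PQS_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_pqs(seq: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, motif) for non-overlapping PQS on the given strand."""
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in PQS_PATTERN.finditer(seq)]


if __name__ == "__main__":
    telomeric = "TTAGGG" * 4                      # human telomeric repeat, G-rich strand
    print(find_pqs(telomeric))
    # The C-rich strand is scanned by searching its reverse complement.
    print(find_pqs(reverse_complement("CCCTAA" * 4)))
```

A match only indicates that a sequence has the potential to fold into a G4; whether it actually folds in cells depends on context that a pattern search of this kind cannot capture.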
A correlation between the accumulation of 8-oxoG at promoters, RNA Pol II occupancy, and nascent transcription has also been observed using OxiDIP-seq, which couples the use of anti-8-oxoG antibodies with high-throughput sequencing (Gorini et al. 2020). Recruitment of the topoisomerase TOPIIB and PARP1 has been observed at oxidized promoters together with OGG1, suggesting that processing of 8-oxoG by BER activity may be at the origin of SSB formation leading to DSB, since phosphorylation of H2AX has also been observed in those regions (Gorini et al. 2020). The authors conclude that the accumulation of 8-oxoG at promoters correlates with some features predisposing to genomic instability, such as G4 structures, R-loops, RNA/DNA hybrids, or bidirectional promoters, and that these regions tend to accumulate both SSB and DSB. All these features are known to induce RNA Pol II collisions or pausing due to the persistence of ssDNA. Whether the presence of 8-oxoG is at the origin of genomic instability or the presence of ssDNA favors oxidation at those regions remains to be further explored. In the enTRAP-seq strategy, the use of the OGG1(K249Q) mutant, which recognizes 8-oxoG but cannot excise it, allowed the trapping of a stable enzyme-DNA complex, enriched by affinity purification and characterized by DNA next-generation sequencing (Fang and Zou 2020). 8-oxoG peaks were more frequently observed in regulatory elements such as promoters and 5′ UTR and were substantially enriched in open chromatin regions characterized by RNA Pol II occupancy, DNase I hypersensitivity, and H3K27Ac enrichment. In agreement with several

previous publications, more than 80% of the identified peaks possessed potential G4 sequences, mostly enriched in promoters and 5′ UTR. Strikingly, a reduced occurrence of 8-oxoG at G4 as well as a depletion of the lesion at transcription start sites was recently reported using the single-nucleotide resolution 8-oxoG sequencing method CLAP-seq (for Chemical Labeling And Polymerase Stalling Sequencing) (An et al. 2021). In this study, hotspots of 8-oxoG were found at potential quadruplex sequences (PQSs), suggesting a role of those regions in the 8-oxoG-mediated oxidative stress response. It has to be taken into consideration that CLAP-seq is based on the stalling of high-fidelity DNA polymerases at a chemically labeled 8-oxoG and is potentially sensitive to particular sequence contexts, repeated sequences, and secondary structures that may compromise the subsequent primer extension and PCR reactions (An et al. 2021). In conclusion, many different strategies have been developed to assess the genomic distribution of 8-oxoG, and depending on the cellular model, the resolution of the technology used, and the associated bioinformatic analysis, different situations have been described. Advantages and pitfalls can be found in all the methods. The use of antibodies has some limitations in terms of specificity and accessibility, as is also the case with the use of enzymes such as Fpg or OGG1, whose activities may be affected by the sequence surrounding the lesion. The protocols based on chemical reactions could also result in side products that are difficult to control. As mentioned before, in all the methods dealing with the measurement and the precise mapping of 8-oxoG, extreme care has to be taken to avoid artificially oxidizing DNA during manipulation. As concluded by the ESCODD working group, the use of antioxidants throughout the process is of major importance in order to reduce background and to allow reliable conclusions.
A major discrepancy between the different methods used to assess the genomic distribution of 8-oxoG concerns the enrichment of the lesion in heterochromatin or euchromatin regions. While accumulation of 8-oxoG in heterochromatin has been reported in some studies (Yoshihara et al. 2014; Wu et al. 2018), others have observed an enrichment of the lesion in promoters and regulatory regions associated with open chromatin (Clark et al. 2012; Ding et al. 2017; Gorini et al. 2020). Furthermore, although studies have reported a higher enrichment of 8-oxoG in G4 sequences (Ding et al. 2017; Clark et al. 2012), others have observed the opposite result, with an exclusion of 8-oxoG from G4 and an enrichment in PQS (An et al. 2021). A particularity of all these techniques is that they represent a snapshot of an average of millions of cells during the extremely dynamic process of induction and repair of 8-oxoG. Performing precise temporal kinetics in single cells represents an extremely challenging task but could really help our understanding of this complex process, and we are confident that future technical developments will make it possible to address this fascinating issue. Even if some contradictory results have been obtained, it is very clear that the distribution of 8-oxoG is not homogeneous across the genome. The enrichment of 8-oxoG at specific promoters observed in several independent studies raises an important and intriguing question: what are the mechanisms that permit the use of 8-oxoG as a regulatory/epigenetic mark without compromising the stability of the genome?

Role of 8-OxoG and OGG1 in Transcriptional Regulation

Many examples can be found in the literature in which the presence of 8-oxoG results in an up- or downregulation of transcription, mostly depending on the precise location of the damage and the sequence context (Fleming and Burrows 2021). Although many more studies are needed to further understand the impact of 8-oxoG at specific promoters, the role of 8-oxoG and OGG1 in the regulation of transcription has been extensively investigated, and several molecular mechanisms have been proposed. The different models are illustrated in Fig. 1 and will be briefly described here as they have already been discussed in recent reviews (Seifermann and Epe 2017; Wang et al. 2018).

Regulation of Transcription Mediated by Induction of 8-OxoG at G-Quadruplex Sequences

The first evidence suggesting that oxidative DNA damage could be biologically relevant came from the observation that hypoxia induced oxidized bases in promoters of several hypoxia-inducible genes such as VEGF, HO-1, and ET-1. The accumulation of the oxidative DNA modifications in the G-rich sequences of the hypoxia response element (HRE) and their sensitivity to Fpg pointed to 8-oxoG, suggesting for the first time the involvement of this particular base lesion in transcriptional regulation. The fact that the hypoxia-induced 8-oxoG is transient suggests that the oxidative base modification is repaired and, indeed, the involvement of the BER pathway in the repair of the damage has been clearly demonstrated. A temporal correlation was reported between the induction of 8-oxoG at the VEGF promoter, the recruitment of OGG1 and APE1, and the formation of SSB, preceding the arrival of HIF-1α and transcription of the VEGF mRNA. The formation of the SSB mediated by APE1 activity could modulate DNA topology and flexibility and facilitate loop formation between enhancers and promoters and the assembly of the transcription complex (Pastukh et al. 2015). Furthermore, reducing the levels of OGG1 with an siRNA resulted, as expected, in a higher accumulation of 8-oxoG at the VEGF promoter, in a reduced association of APE1, and in a failure of HIF-1α to bind the HRE, resulting in reduced transcription and directly demonstrating the importance of OGG1 glycosylase activity in this process (Pastukh et al. 2015). These observations extend beyond the VEGF gene, as a good correlation was established in the same study between hypoxia-induced 8-oxoG in promoter regions and the transcription levels of more than 300 hypoxia-regulated genes (Pastukh et al. 2015). Transcriptional control mediated by induction of 8-oxoG at promoter G-quadruplex sequences was already described 10 years ago by the Gillespie laboratory (Clark et al. 2012), and its role in transcriptional regulation has been largely documented (Fleming and Burrows 2021). By using reporter plasmids bearing 8-oxoG at defined positions in the VEGF promoter driving luciferase expression, the molecular mechanisms have been extensively characterized by the

[Fig. 1 graphic. Schematic labels: stimulus, ROS, 8-oxoG induction, OGG1 binding, DNA bending, 8-oxoG excision, APE1 recruitment, G4 folding, recruitment of transcription factors. Table columns: Stimuli, LSD1, OGG1, APE1 activity, G4, References. Stimuli listed: hypoxia, E2, retinoic acid, TGF-β, TNF-α, LPS, H2O2, unknown. References listed: Pastukh et al. 2015; Perillo et al. 2008; Amente et al. 2010; Zuchegna et al. 2014; Pezone et al. 2020; Wang et al. 2018; Hao et al. 2020; Seifermann and Epe 2017b; Roychoudhury et al. 2020; Cogoi et al. 2018; Fleming and Burrows 2021.]

Fig. 1 Role of 8-oxoG and OGG1 in transcriptional regulation. The schema illustrates the pathway of 8-oxoG repair highlighting the steps potentially involved in transcription. Exposure of cells to a large variety of agents (stimulus) generates the production of ROS either directly or through the function of mitochondria (dotted arrows). ROS induces the formation of the oxidative DNA damage 8-oxoG (red star) that is recognized by OGG1 (represented in green). The excision of 8-oxoG by OGG1 activity generates an abasic site and the recruitment of APE1 (light orange), proposed to play a role in the stabilization of G4 structures. The successive BER steps are figured by the curved arrows and their links with transcription by the linear arrows. The table summarizes the different models referenced in the literature describing a role of 8-oxoG and OGG1 in the regulation of transcription. The nature of the stimulus is indicated whenever it has been identified as well as the contributions of LSD1, OGG1, APE1, and the G4 structures

Burrows laboratory. The presence of 8-oxoG in the promoter resulted in a 2.5-fold increase in expression in an OGG1-dependent way, as it was not observed in OGG1 KO mouse embryonic fibroblasts. Interestingly, the introduction of the abasic site analog tetrahydrofuran (THF) resulted in a more than 4-fold increase in luciferase expression, and in this case also the BER protein APE1 was essential. The introduction of mutations in the PQS demonstrated the importance of this potential G-quadruplex in the activation of transcription. Based on these results, the authors proposed the following mechanism: potential G4 sequences (PQS), being G-rich, are highly susceptible to oxidative modification of G to 8-oxoG. This base modification has a negligible impact on DNA structure, but its excision by the DNA glycosylase activity of OGG1 results in the formation of an AP site that favors the folding of the G4 structure. The AP site is then recognized by APE1, whose activity is highly attenuated by the G4 structure, resulting in a longer residence time at the damage site that is required for the recruitment of transcription factors and the induction of transcription (Fleming and Burrows 2021). Further publications from the same laboratory and others have extended these observations to many other examples, generalizing the role of OGG1 and APE1 in the transcriptional regulation of many DNA repair genes (such as NTH1, NEIL3), cancer-associated genes (c-MYC, KRAS), and stress-responsive genes, all harboring promoter sequences with the potential to fold into G4 structures. In agreement with this model, Roychoudhury et al. (2020) demonstrated that endogenous oxidized guanine bases in PQS sequences and the subsequent activation of BER drive the spatiotemporal formation of G4 structures at the whole genome level. By genome-wide mapping of G4 structures, AP sites, OGG1, and APE1, they provided direct evidence that AP sites derived from oxidative base modifications were predominant in PQS sequences.
This study further established that stable binding and acetylation of APE1 were essential for G4-mediated gene expression. The authors stress the importance of the formation of a stable APE1–AP site pre-incision complex, rather than AP site cleavage or catalytic activity of APE1, for promoting G4 formation (Roychoudhury et al. 2020). However, the global picture is much more complex: this mechanism is probably highly regulated and might depend on the position of the 8-oxoG in the PQS (G-run or loop) or on its location in the template versus the coding strand, since both up- and downregulation of transcription have been reported. Data from the Khobta lab have shown that, while 8-oxoG does not constitute a barrier to transcription, its processing by OGG1 results in transcriptional inactivation (Allgayer et al. 2016). Using reporter plasmids bearing a single 8-oxoG in the coding sequence of EGFP, the effect of the lesion on transcription could be elegantly and quantitatively followed by flow cytometry. Interestingly, the inhibition of transcription observed was independent of the location of 8-oxoG on either the transcribed or the non-transcribed DNA strand, indicating that 8-oxoG did not directly block RNA Pol II elongation. This approach demonstrated that not only the DNA glycosylase activity of OGG1 but also the incision performed by APE1 was essential for the inhibitory effect on transcription, the resulting SSB being critical for the transcriptional silencing of many different viral and human promoters (Allgayer et al. 2016).

Regulation of Transcription Mediated by 8-OxoG Induced by the Enzymatic Activity of LSD1

Another interesting model concerning the role of 8-oxoG and OGG1 in the expression of estrogen-responsive genes was proposed by Perillo et al. (2008). Demethylation of H3 lysine 9 at both enhancer and promoter sites by the LSD1 demethylase produces hydrogen peroxide, which was postulated to induce the formation of 8-oxoG and the recruitment of OGG1. The identification of an enzymatic activity capable of inducing 8-oxoG at specific genomic locations further reinforces the concept of 8-oxoG being an epigenetic mark that needs writers and readers. Depletion or inhibition of LSD1 or the use of ROS scavengers, such as N-acetylcysteine, was shown to drastically reduce the formation of 8-oxoG. The involvement of LSD1 in the ROS production needed for transcription activation via 8-oxoG and BER proteins has also been demonstrated for the transcription of genes depending on Myc or induced by retinoic acid (Amente et al. 2010; Zuchegna et al. 2014). Binding of Myc to target genes requires the presence of H3K4me3 at the E-box binding motif and results in recruitment of LSD1 and demethylation of H3K4. In this example also, demethylation of H3K4 and the burst of 8-oxoG formation, as well as the participation of the BER enzymes OGG1 and APE1, are key events for the initiation of transcription (Amente et al. 2010). In addition to LSD1, the H3K9me3 demethylase JMJD2A has also been implicated in the transcription of genes regulated by retinoic acid, unveiling a finely tuned regulatory mechanism in which demethylation cycles of histone H3K4 and H3K9 would be important for dynamic chromatin loop formation and transcriptional activation (Zuchegna et al. 2014). Interestingly, the cohesin subunit RAD21 was recently shown to be recruited to estrogen- or retinoic acid-induced chromatin loops formed between promoter and enhancer regions (Pezone et al. 2019).
It would be interesting to investigate whether RAD21 is required in this context for the recruitment of OGG1, as has been shown to be the case after exposure to oxidative stress (Lebraud et al. 2020). BER activity was postulated to be essential during this process for the generation of transient nicks used as entry points for topoisomerase IIβ, triggering DNA conformational changes that are essential for the assembly of the transcription initiation complex. More recently, Pezone et al. (2020) highlighted the implication of a nuclear oxidative wave in the triggering of the epithelial-to-mesenchymal transition (EMT), in which polarized epithelial cells acquire mesenchymal characteristics. This process involves the implementation of a complex transcriptional program induced by transforming growth factor β1 (TGF-β1), in which some genes are activated and others repressed. In mammary epithelial MCF10A cells, they report that TGF-β1 triggers an early (30 min) oxidative wave in the nucleus, mediated by H2O2 production by LSD1. Phosphorylated SMAD2/3 was proposed to play a role in the targeted recruitment of LSD1 to the promoters of activated and repressed TGF-β1/EMT genes. At 90 min, there was a second oxidative wave, detected by the accumulation of OGG1, only at TGF-β1-repressed genes. At these promoters, a repressive complex was formed by LSD1 and HDAC3–NCoR1 with the newly synthesized

repressor SNAI1. Altogether, DNA oxidation mediated by LSD1 was necessary for both activation and repression of transcription by TGF-β1, in an OGG1-dependent way.

Role of 8-OxoG and OGG1 in the Regulation of Transcription of Inflammatory Genes

In contrast with the previously described examples, binding but not excision of 8-oxoG by OGG1 has also been proposed as a regulatory mechanism for gene expression. Indeed, it has been proposed that nonproductive binding of OGG1 to 8-oxoG in promoter sequences could be an epigenetic mechanism to modulate gene expression for a prompt innate immune response (Wang et al. 2018). Exposure to TNF-α increased 8-oxoG levels in promoter regions, induced the recruitment of OGG1, and transiently inhibited base excision repair of 8-oxoG through cysteine oxidation of OGG1. Promoter-associated OGG1 then enhanced NF-κB/RelA binding to cis-elements and facilitated recruitment of specificity protein 1, transcription initiation factor II-D, and p-RNA polymerase II, resulting in the rapid expression of chemokines/cytokines and inflammatory cell accumulation in mouse airways. Depletion of OGG1 by siRNAs or prevention of guanine oxidation significantly decreased TNF-α-induced inflammatory responses. In vitro studies further demonstrated that binding of OGG1 to 8-oxoG located upstream of the NF-κB binding site increased NF-κB DNA occupancy. ChIP-seq experiments have extended these observations to the whole genome level and have shown that enrichment peaks of OGG1 were primarily aligned with guanine runs in close proximity to the NF-κB motifs in the regulatory regions of a large number of genes involved in the inflammatory response, such as TNF-α, CXCL1, and CCL20 (Wang et al. 2018). Genome-wide distributions of OGG1 and the NF-κB enrichment sites were similar and restricted to regulatory regions, as there were no significant levels of engagement of OGG1 with exons, introns, or untranslated regions. A decrease in pro-inflammatory gene expression and inflammation induced by TNF-α has also been observed in the presence of the OGG1 inhibitor TH5487, which reduces OGG1 binding and repair of 8-oxoG (Visnes et al. 2018).
TH5487 perturbs the binding of NF-κB to 8-oxoG-containing oligonucleotides in nuclear extracts, further suggesting that recognition of 8-oxoG by OGG1 is an essential step in pro-inflammatory gene expression mediated by NF-κB. Studies performed in OGG1-deficient cells complemented with exogenously expressed WT or mutant versions of OGG1 have demonstrated that it is not the enzymatic activity of OGG1 but the recognition of 8-oxoG by the enzyme that mediates transcriptional activation. Thus, a robust increase in Cxcl2 gene expression was observed, as measured both by qRT-PCR and RNA-FISH, after TNF-α exposure of cells expressing the inactive OGG1 (K249Q) mutant, which is able to recognize 8-oxoG but lacks both N-glycosylase and AP-lyase activities (Hao et al. 2020). These results were consistent with those obtained with the OGG1 inhibitor O8, which inhibits the catalytic activity of OGG1 without blocking its substrate binding (Hao et al. 2020). In contrast with the effects

observed with TH5487, treatment of cells with TNF-α in the presence of the OGG1 inhibitor O8 impairs neither the recruitment of OGG1 to the CXCL1 promoter nor its transcriptional activation (Hao et al. 2020). However, the enzymatic activity of OGG1 was shown to be involved in the inflammatory response to LPS treatment in murine splenocytes (Seifermann et al. 2017), where a role of OGG1, but also of LSD1 and APE1, in the activation of TNF-α transcription after LPS treatment has been described. Besides the role of OGG1 as an auxiliary transcription factor when binding 8-oxoG on DNA, a complex of OGG1 with the free excised base (OGG1:8-oxoG) has been shown to be involved in the activation of canonical Ras family GTPases. It has been proposed that the OGG1:8-oxoG complex has guanine nucleotide exchange factor activity and modulates cellular signaling. Challenging mice with a mild oxidative burst or intranasal application of 8-oxoG resulted in rapid KRAS GDP→GTP exchange and activation of both MAP and PI3 kinases, leading to NF-κB-mediated pro-inflammatory chemokine/cytokine expression and inflammatory cell recruitment to the airways (Aguilera-Aguirre et al. 2015). The effects observed were dependent on OGG1, leading the authors to hypothesize that release of genomic 8-oxoG by OGG1-initiated repair was involved in inflammation and the immune response. In this case, OGG1 would act as a bona fide repair enzyme, in contrast to the mechanism described previously, in which the enzymatic activity of OGG1 was not required. Whether these apparently contradictory observations reflect different steps of the same dynamic process leading to a robust induction of inflammatory gene expression or are two independent mechanisms remains to be established.
Importantly, the implication of OGG1 in pro-inflammatory gene expression and the immune response sheds light on the molecular mechanisms behind several previous observations indicating that OGG1-deficient mice show reduced inflammatory responses upon exposure to oxidative stress induced by LPS endotoxins and various allergenic proteins (Seifermann and Epe 2017). Considering the major role of chronic inflammation in many human diseases such as cancer, atherosclerosis, asthma, neurodegenerative disorders, and many others, the identification of OGG1 as a key player opens the way to the development of new therapeutic strategies in the future (Visnes et al. 2018). The roles of BER proteins in both repair and transcription are intriguing, and they probably correspond to different steps of a coupled mechanism. Even if there is more and more evidence indicating a role of 8-oxoG and the DNA repair proteins OGG1 and APE1 in transcriptional regulation, this mechanism is certainly transient and needs to be tightly regulated, as the lesions need to be repaired at some point to avoid compromising the stability of the genome. While in some models the enzymatic activities of both OGG1 and APE1 are required (Fleming and Burrows 2021), in other examples the DNA bending induced by the DNA glycosylase when encountering 8-oxoG seems to be enough (Cogoi et al. 2018; Hao et al. 2020). The increase in the residence time of OGG1 and/or APE1 at the site of repair has been pointed out as a potential mechanism to stabilize the protein/DNA complex, allowing the recruitment of transcription factors and the induction of transcription. Interestingly, OGG1 has several cysteines that have been shown to be sensitive to

oxidation and important for modulating the affinity of the protein for its substrate as well as its DNA glycosylase and lyase enzymatic activities (Bravard et al. 2006; Wang et al. 2021). Oxidation of specific cysteines induced by exposure to oxidative stress could influence the residence time of the protein at the site of the damage and may therefore facilitate the recruitment of transcription factors such as NF-κB (Wang et al. 2018). Interestingly, acetylated APE1 has also been observed to map to promoter regulatory regions at the whole genome level (Roychoudhury et al. 2020). Furthermore, the recruitment of APE1 could also have an additional role besides its implication in SSB formation during BER. Indeed, APE1 (Ref-1) is a multifunctional protein involved not only in BER but also in transcriptional regulation through its redox activity, which has been shown to stimulate the DNA binding of numerous transcription factors (such as AP-1, NF-κB, or HIF-1α, to cite only a few) and their further assembly into transcriptional complexes (Bhakat et al. 2009; Pastukh et al. 2015). The induction of 8-oxoG as a consequence of the enzymatic activity of LSD1, observed upon exposure to estrogens, LPS, retinoic acid, TGF-β, and many other stimuli (Perillo et al. 2008; Amente et al. 2010; Zuchegna et al. 2014; Seifermann et al. 2017; Pezone et al. 2020), reinforces the idea that this oxidative base modification could indeed act as an epigenetic mark, requiring writers and readers. The role of LSD1 in the transcriptional regulation of hypoxia-induced genes, which requires the participation of OGG1, remains largely unexplored. It remains an open question whether there is indeed an enzymatically regulated induction of 8-oxoG at the promoters of the induced genes and what the underlying mechanism could be. To our knowledge, the involvement of LSD1 in the OGG1-mediated transcription of inflammatory genes has been reported in only one publication (Seifermann et al. 2017).
However, considering recent findings elucidating novel regulatory roles of LSD1 in the epigenetic control of inflammatory responses and in the demethylation and stabilization of HIF-1α under hypoxic conditions (Kim et al. 2021), it is tempting to speculate that the combined function of LSD1 and OGG1 in the regulation of transcription could be more general than previously expected. Interestingly, phosphorylation of LSD1 is induced by LPS and hypoxia and is required for the activation of NF-κB, the upregulation of transcription of the target genes, and the inflammatory response in vivo (Kim et al. 2021). These observations extend the concept of oxidative DNA damage playing a role in the regulation of transcription, as has been described for other BER-repaired lesions such as the oxidized derivatives of methylated cytosines, which have clearly been established as epigenetic marks involved in gene silencing (Fleming and Burrows 2021).

Origin of ROS as a Signaling Molecule for the Induction of 8-oxoG

In all the previously discussed models, ROS are proposed to be central players for the targeted induction of 8-oxoG at the promoters of specific genes, although the origin and the nature of the ROS involved have not been clearly established. In the

OGG1 at the Crossroads Between Repair and Transcriptional Regulation

mechanism involving LSD1, it has been proposed that H2O2 would be locally produced as a consequence of the enzymatic activity of the demethylase on histones (Perillo et al. 2008). Additionally, ROS could originate from mitochondria clustered around the nucleus, as was postulated for the transcriptional activation of hypoxia-inducible genes (Al-Mehdi et al. 2013). Indeed, inhibition of the hypoxia-induced perinuclear mitochondrial clustering results in a markedly reduced induction of 8-oxoG in the HRE sequences of the VEGF gene, resulting in reduced binding of HIF-1α and reduced levels of VEGF transcripts (Al-Mehdi et al. 2013). Changes in mitochondrial morphology and mitochondrial fragmentation mediated by the fission protein DRP1 were further described in cells exposed to hypoxic stress and are generally observed in cells exposed to oxidative stress, resulting in mitochondrial dysfunction and higher mitochondrial ROS production (Ježek et al. 2021). Indeed, mitochondria are the largest intracellular source of oxygen radicals, with superoxide being produced through the electron transport chain, mostly at Complexes I and III. Superoxide has limited reactivity and does not cross membranes, but it is rapidly metabolized to H2O2, which can freely diffuse to the cytoplasm and the nucleus, with the capacity to act as a signaling molecule. Paradoxically, the absence of OGG1 in mitochondria results in higher levels of mitochondrial ROS production (Lia et al. 2018), which could be a signal to trigger transcription of oxidative stress-regulated genes by a nuclear-OGG1-dependent mechanism. Most of the studies concerning the role of OGG1 in transcriptional regulation have been performed in cellular models deficient for both the nuclear and the mitochondrial forms of OGG1, and future studies are required to further elucidate the role of the protein in the different cellular compartments and the intertwined connections between them.
Despite many indications for the presence of OGG1, and of BER proteins in general, in mitochondria (Rong et al. 2021), and considering that in all cases both the nuclear and the mitochondrial isoforms of the enzymes are encoded by the same nuclear gene, almost nothing is known concerning the mechanisms regulating the subcellular distribution of the enzymes. What is clear is that the maintenance of mitochondrial DNA and mitochondrial function plays a major role in the control of nuclear transcriptional programs and epigenetic regulation through the retrograde response, which has already been implicated in hypoxia, inflammatory, and immune responses (Guha et al. 2014). Many fascinating questions remain unexplored concerning the interplay of nuclear and mitochondrial OGG1 functions in the coordination of transcription under oxidative stress conditions.

Finding the 8-OxoG in the Chromatin Context: A Challenging Task for OGG1

The enzymological characterization of OGG1 has been performed on naked DNA bearing the 8-oxoG lesion. In addition, crystal structures of OGG1 in complex with different forms of DNA molecules containing the damaged base 8-oxoG or the normal base G have been solved, which together reveal how OGG1 recognizes and extrudes the oxidized base (Shigdel et al. 2020). However, despite their

A.-M. Di Guilmi et al.

important contribution to the molecular characterization of OGG1, these structural and biochemical studies do not take into account the high degree of nuclear DNA condensation imposed by chromatin structure. Although a substantial amount of data obtained in vitro is now available, questions regarding the in vivo behavior of OGG1 remain open.

DNA Packaging into Nucleosomes: A Barrier for OGG1 and BER Activity

One of the most challenging tasks for DNA repair and transcription factors is finding their targets in the highly complex nuclear environment in which the DNA is strongly compacted around the histone octamer. Studies performed more than 20 years ago with in vitro reconstituted nucleosomes revealed that the nucleosome structure inhibited the initial steps of BER. Indeed, the catalytic activity of the DNA glycosylases UNG2, SMUG1, and UDG, involved in the excision of uracil, as well as the endonuclease activity of APE1 were shown to be dramatically reduced in nucleosomes compared to their measured activities on naked DNA (Nilsen et al. 2002). Menoni et al. investigated the repair of oxidized lesions in chromatin by introducing 8-oxoG in reconstituted nucleosomes and showed that OGG1 activity was strongly reduced when compared to naked DNA, but restored in the presence of SWI/SNF, suggesting that 8-oxoG repair on nucleosomes requires the assistance of chromatin remodelers. Later on, a more complex structure was designed, composed of two nucleosomes connected by linkers of different lengths in the presence or in the absence of the histone H1, in which the 8-oxoG lesion was introduced either in the linker DNA or in the nucleosomal DNA (dyad position). In this context, 8-oxoG inserted in the linker sequence was excised by OGG1 only when H1 was removed. When the lesion was present in the dyad position, two conditions were required to remove it: the absence of H1 and the remodeling of the nucleosome by RSC (remodel the structure of chromatin), a SWI/SNF family remodeling complex (Menoni et al. 2017). To characterize how DNA packaging in eukaryotes affects the activity of different DNA glycosylases involved in the BER process, Olmon and Delaney conducted a comparative study using homogeneous nucleosome core particles under identical reaction conditions (Olmon and Delaney 2017).
The different lesions were incorporated near the dyad axis with different rotational positions. It was observed that OGG1 activity was inhibited at the dyad axis regardless of the position of the lesion and that chromatin remodeling complexes must be added to restore 8-oxoG removal (Olmon and Delaney 2017). A structural model based on the crystal structures of OGG1 and nucleosomes showed how the chromatin organization might interfere with OGG1 recognition and excision of 8-oxoG (Olmon and Delaney 2017). The surface patch on OGG1 in contact with the histone core comprises the residues R197, Y203, and R204, which have been shown to interact with the DNA backbone (R197) and with the estranged cytosine (R204 and Y203) (Shigdel et al. 2020). It is

then conceivable that the histone core hinders either the contact between OGG1 and the lesion or the DNA bending induced by the 8-oxoG extrusion (Fig. 2). Altogether, these results indicate that OGG1 needs to reach the DNA molecule to carry out 8-oxoG excision and thus to initiate repair. Although spontaneous and transient unwrapping of DNA from the histones also contributes to 8-oxoG repair, this process requires an active and catalyzed removal of histones and chromatin remodeling.

Fig. 2 Finding the 8-oxoG in the highly compact nuclear chromatin is a challenging task for OGG1. The “Prime-Access-Repair-Restore” model, which fits all DNA repair pathways, is depicted here in the context of BER. More precisely, it is focused on the repair of 8-oxoG by OGG1. The factors shown to be involved in the chromatin disassembly and re-assembly allowing the accessibility of 8-oxoG and its subsequent repair are indicated

BER in the Highly Compacted Nuclear Environment

As described above, BER of a wide range of DNA lesions is strongly hampered by the condensation of DNA into nucleosomes. Histone post-translational modifications and incorporation of histone variants provide repair proteins with access to their DNA target sites within nuclear chromatin. Furthermore, the disassembly of

nucleosomes through histone removal is accomplished by histone chaperones assisted by ATP-dependent nucleosome remodelers. Whether BER takes advantage of chromatin remodeling occurring during other processes such as transcription or replication to gain access to the lesion or uses its own remodelers remains to be further explored. Very recent reviews survey and extend the interplay between histone chaperones and DNA repair (Chakraborty et al. 2021b; Ferrand et al. 2021). A common scheme depicting the steps required for efficient DNA repair emerges from the “prime-access-repair-restore” model (Fig. 2). In this model, histone modifications play a major role in determining the interactions of chromatin with specific DNA repair factors. Chromatin disassembly, mediated by histone chaperones and nucleosome remodelers, is necessary to permit the recruitment of DNA repair proteins and to stimulate their enzymatic activity. After repair of the damage, nucleosomes are reassembled to restore the chromatin organization to its original state. To initiate NER and to give access to UV-damaged chromatin, histones H2A.Z-H2B and H1 are evicted (Chakraborty et al. 2021b). Several ATP-dependent nucleosome remodelers like SWI/SNF and INO80 promote chromatin disassembly and facilitate DNA repair of DSB and UV-induced lesions. Their downregulation induces various defects, in particular inefficient removal of DNA damage and reduced recruitment of repair factors (Chakraborty et al. 2021b). Another way to facilitate the accessibility of damaged DNA is to stimulate chromatin disassembly through histone post-translational modifications, such as H3, H2A, and H4 ubiquitination (Chakraborty et al. 2021b). Compared to other DNA repair processes, our current knowledge of the factors that drive BER enzymes to the lesions in the nuclear chromatin context is still fragmentary.
Joint roles of the histone chaperone FACT and the RSC ATP-dependent nucleosome remodeler in the removal of uracil from DNA have been reported (Charles Richard et al. 2016). Depletion of the ATPase subunit of RSC, STH1, increased the sensitivity of cells to alkylating agents, establishing a functional link between ATP-dependent chromatin remodeling and BER activity in living yeast cells (Czaja et al. 2014). A recent study shows that although repair of 7meG is generally slower in STH1-depleted cells, RSC activity is not required for BER activity in nucleosome-free regions or adjacent linker DNA, as is the case for CPD repaired by the NER pathway (Bohm et al. 2021). These observations indicate a different requirement of RSC in promoting repair of lesions by NER and BER pathways. Despite significant in vivo studies that highlight the cooperation between chromatin remodelers and histone chaperones to regulate BER, detailed mechanisms need to be further elucidated. Genome-wide mapping of alkylation DNA damage and mutagenesis in yeast has revealed that BER efficiency is impacted by chromatin structure. Strongly positioned nucleosomes accumulated unrepaired 7meG and 3meA and displayed significantly higher mutation density compared to nucleosome-depleted regions (Mao et al. 2017). The same study revealed a striking correlation between the DNA repair efficiency and the presence of histone post-translational modifications, such as H3K14ac or H3K4me3, known to increase the accessibility of DNA within the nucleosome. However, considering the complexity of histone post-translational modifications and the associations between them, in particular the positive and negative correlations existing between methylation and acetylation, a link between a particular histone mark and BER efficiency has not yet been clearly established.

Interplay Between BER, NER, and Transcription

The first evidence concerning an interplay between DNA repair and transcription came from the observations made in the late 1980s indicating a preferential repair of pyrimidine dimers in expressed genes (Hanawalt 2015). TC-NER is a sub-pathway of NER involved in the removal of UV-induced photolesions, which is activated when RNA Pol II elongation is blocked by a lesion located on the transcribed strand of active genes. Under this circumstance, the transcription elongation factor CSB binds tightly to the stalled RNA Pol II, facilitating the recruitment of further key TCR components such as CSA, NER factors, and the chromatin remodeler p300. Once the damage is recognized and the RNA Pol II paused, DNA is opened by TFIIH to allow the recruitment of the NER machinery including replication factor A (RPA) and XPA. The DNA strand bearing the lesion is excised on the 3’ side by XPG and by a dimer of XPF-ERCC1 on the 5’ side. After removal of the damaged DNA fragment, DNA is re-synthesized by a specific DNA polymerase and ligated by a DNA ligase. Once the repair is complete, CSB is ubiquitinated and degraded, leading to the re-start of transcription. It is generally assumed that RNA Pol II can bypass most of the non-bulky DNA lesions (such as 8-oxoG or thymine glycol) during transcriptional elongation, but at the risk of contributing to transcriptional mutagenesis. However, transcriptional inactivation can occur at non-bulky DNA lesion sites after BER initiation and formation of SSB (Allgayer et al. 2016). In agreement with these observations, exposure of cells to oxidative stress induces a transient arrest of transcription, and XRCC1 seems to play an essential role in the transcriptional recovery. In the absence of XRCC1, persistent PARP1 trapping and activity at unrepaired BER intermediates lengthens transcriptional suppression (Adamowicz et al. 2021).
Although evidence for the role of TC-NER factors in the BER process has been provided by several groups, this issue is still under debate (Chakraborty et al. 2021a), mostly due to several retracted publications proposing the existence of a TCR of 8-oxoG involving NER factors. Despite the extensive crosstalk between NER and BER factors, further work is definitely required to better understand the interplay between these two essential DNA repair pathways (Kumar et al. 2022; d’Augustin et al. 2020). The transcription factor FACT displays a histone chaperone activity and has, among others, a role in TC-NER and potentially in BER (Charles Richard et al. 2016). In HeLa cells treated with H2O2 to induce oxidative stress, FACT pull-down and mass spectrometry analysis revealed that FACT was no longer associated with transcription protein complexes but with repair proteins and chromatin remodelers from the SWI/SNF family (Charles Richard et al. 2016). While in vitro experiments showed that FACT boosts the remodeling activity of the RSC chromatin remodeler and

facilitates the removal of uracil from DNA in in vitro reconstituted nucleosomes, further studies are required to elucidate the role of FACT in the repair of oxidative DNA damage in living cells (Charles Richard et al. 2016). The CSB protein, whose roles in both transcription and TC-NER are well established, seems to participate in BER of oxidatively damaged DNA in multiple ways. CSB is a member of the SWI2/SNF2 family of DNA-dependent ATPases that contains seven ATPase motifs and acts as both a transcription elongation factor and a nucleosome remodeler. In biochemical assays using synthetic oligonucleotides containing a single 8-oxoG lesion, a contribution of CSB to the incision of the DNA backbone was observed when whole-cell or nuclear extracts of normal and CSB-deficient cells were compared, indicating a potential role of CSB in the BER pathway (Dianov et al. 1999). Furthermore, 8-oxoG repair kinetics measured in cells isolated from OGG1 and CSB double knock-out mice suggested that CSB could be involved in the global repair of 8-oxoG in the absence of OGG1 by modulating the activities of other DNA glycosylases and/or of chromatin remodelers (Trapp et al. 2007). CSA- and CSB-deficient cells are sensitive to oxidizing agents such as potassium bromate, or X-rays, and accumulate higher levels of 8-oxoG compared to WT cells (d’Errico et al. 2007). CSB has been proposed to interact with the NAP-1L histone chaperone for efficient TC-NER (Cho et al. 2013), suggesting that chromatin remodeling and recruitment of DNA repair proteins mediated by CSB are distinguishable activities. Which of these two distinct activities of CSB is required in the BER pathway is still an open issue, which deserves further evaluation. The participation of CSB in BER of oxidative DNA damage has also been observed after induction of 8-oxoG by laser micro-irradiation, resulting in a partly transcription-dependent recruitment of CSB to the sites of damage (Menoni et al. 2012), although recruitment of OGG1 in these conditions was shown to be independent of CSB or active transcription (Menoni et al. 2018).
The p300 and CREB-binding protein (CBP) paralogs are transcriptional co-activators that associate with transcription factors and RNA Pol II complexes, contributing to the formation of large transcriptional complexes. Besides their structural roles, CBP and p300 are involved in gene transcription through their lysine acetyltransferase (HAT) catalytic activity, targeting histones, DNA replication, and DNA repair factors (Bhakat et al. 2020). Many BER factors such as OGG1, TDG, NEIL2, MPG, APE1, FEN1, and DNA Pol β have been identified as p300/CBP substrates. The connection between OGG1 and p300 was shown by co-immunoprecipitation and by nuclear co-localization. Moreover, the level of the cellular acetylated form of OGG1 (AcOGG1) correlates with p300 expression levels. The amount of AcOGG1 was estimated to be about 20% of the total OGG1 present in HeLa cells, and the predominantly acetylated residues are K338 and K341, in both in vitro and in vivo conditions. The repair efficiency of 8-oxoG in response to oxidative stress was proposed to be modulated by the amount of AcOGG1 since acetylation stimulates OGG1 activity. Mechanistically, the acetylation is assumed to increase the turnover of OGG1 by reducing its affinity for the AP product and/or by increasing its displacement by APE1 (Bhakat et al. 2020). The interaction between

OGG1 and the histone deacetylase 1 (HDAC1) suggests that OGG1 deacetylation might be required to return to a basal status. The interplay between BER and NER seems to go far beyond the link with transcription, as several interactions between BER proteins and NER factors involved in global genome NER (GG-NER), responsible for the removal of DNA lesions throughout the whole genome, have also been reported (Fig. 3). More specifically, DNA glycosylases and APE1 enzymatic activities would be facilitated by UV-DDB and XPC, involved in the first step of GG-NER, that is, the recognition of the damage. Indeed, XPC interacts with OGG1, MPG, and TDG, which initiate repair of oxidation, alkylation, and deamination base damage, respectively (D’Errico et al. 2006). Furthermore, by exploiting the laser micro-irradiation tool to locally induce 8-oxoG, it has been possible to observe the recruitment of XPC and CSB to the lesion site (Menoni et al. 2012). The role of UV-DDB1 as a general sensor of DNA damage in both BER and NER pathways in the chromatin context has also been reported (Jang et al. 2019). Using a novel chemoptogenetic approach, the dynamic recruitment of UV-DDB to locally induced 8-oxoG sites in telomeric regions was detected in vivo. UV-DDB1 was observed to be recruited to the 8-oxoG lesions induced at telomeres, where it interacts with OGG1 and stimulates its activity. UV-DDB1 facilitates OGG1 and APE1 strand cleavage and promotes Pol β-mediated gap-filling activity (Jang et al. 2019). Single-molecule real-time imaging revealed the dynamic interaction between

Fig. 3 Interplay between BER, NER, and transcription. Representation of the initial steps of BER and NER pathways, i.e., recognition and excision of the damaged DNA. The involvement of transcriptional factors FACT and CSB in both BER and NER is indicated to underline the interplay between these two processes. The NER components UV-DDB and XPC shown to participate in BER are also illustrated

UV-DDB1, OGG1, and APE1, which facilitated the turnover rates of OGG1 and APE1 from DNA, hence increasing BER capacity (Jang et al. 2019). The same group has recently shown that UV-DDB2 also plays an essential role in the initiation of 8-oxoG repair by promoting the decompaction of chromatin (Kumar et al. 2022) (Fig. 3).

BER Is Linked to Transcription via the Mediator Complex and the Cohesin Structure

Upon exposure of cells to oxidative stress, a fraction of OGG1 is recruited to open chromatin regions where it co-localizes with RNA Pol II and histone marks associated with active transcription (Amouroux et al. 2010). Other proteins involved in BER such as NTH1, APE1, and XRCC1 co-localize with OGG1 in these nuclear regions, suggesting that they correspond to BER repair centers (Campalans et al. 2013; Campalans et al. 2015). Interestingly, OGG1 mutants unable to recognize 8-oxoG were shown to re-localize to the chromatin fraction to a similar extent as the WT protein, suggesting that the recognition of the lesion itself was not at the origin of this association and that other factors may be required. In agreement with this idea, an essential role for the Cohesin and Mediator complexes in the recruitment of OGG1 to euchromatin regions and for the efficient excision of 8-oxoG has recently been reported (Lebraud et al. 2020). Co-immunoprecipitation experiments unveiled a dynamic association between OGG1 and both Mediator and Cohesin complexes induced by oxidative stress, as no association could be detected in untreated cells. Cohesin and Mediator complexes play essential roles in the tridimensional organization of the genome and in the regulation of transcription (Kagey et al. 2010; Soutourina 2018). Genetic CRISPR-Cas9 screens combined with Hi-C and structural cryo-electron microscopy (EM) have shown that mammalian Mediator functionally links promoters and enhancers with the participation of Cohesins (El Khattabi et al. 2019). The Mediator complex is at the center of a dense protein-protein interaction network involving many transcription factors, RNA Pol II, and transcription elongation factors.
The abundant literature on the highly conserved Mediator reflects the wide range of key cellular functions carried out by this complex in gene transcription, mRNA processing, chromatin organization, and genome stability. In the following sections, we will briefly describe how the Mediator complex is involved in the regulation of transcription initiation. The connection between OGG1 and transcription through the Mediator complex will also be highlighted.

Structure and Function of the Mediator Complex

The Mediator complex, composed of 30 subunits in humans, is organized into four modules: Head, Middle, Tail, and Kinase (CKM) (Fig. 4). Topologically, the Middle

Fig. 4 Mediator complex is involved in the initiation of transcription and in BER. A very simplified representation of the initiation of transcription is shown. Details are accessible in recent reviews (Jeronimo and Robert 2017; Soutourina 2018). The topological organization of the CKM and core Mediator complexes is shown in the box. The shape of each module is based on the 3D structure (Jeronimo and Robert 2017). The Mediator complex loaded on the enhancer region comprises the core and the CKM. The association of the core Mediator with the RNA pol II to form the Pre-Initiation Complex (PIC) requires the detachment of CKM. The events driving the phosphorylation of the RNA pol II CTD and the start of mRNA synthesis are not shown. The connection between the core Mediator, the CKM, and OGG1 is illustrated (according to Lebraud et al. 2020) to point out the link between transcription and BER

and Head modules are positioned at the center of the complex, while the Tail lies at one end. The Med14 subunit forms a scaffold to hold together the Head, Middle, and Tail modules (Jeronimo and Robert 2017). The sum of in vivo data from yeast and mammalian models shows that most of the essential subunits of the Head and Middle modules are necessary for transcription while the Tail and CKM modules display regulatory functions. An overview of the Mediator modular organization is well documented in recent reviews (Jeronimo and Robert 2017; Soutourina 2018) and schematically presented in Fig. 4. Considering the large size (about 1.5 MDa) and the multi-subunit composition combined with intrinsic flexibility and conformational plasticity, solving the Mediator structure has been a tremendous challenge for structural biologists and still remains an active research field. The function of the Mediator complex in the assembly and the stabilization of the PIC at the promoter region is very well characterized. Structural and biochemical studies revealed the dense interaction network established between the Mediator, RNA Pol II, and the general transcription factors, including TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH (Jeronimo and Robert 2017). To allow the start of synthesis of mRNA, the PIC has to be disorganized, this step being commonly referred to as

“promoter escape.” One main facet of this process is the property of Mediator to stimulate the phosphorylation of the CTD of RNA Pol II by the Cdk7 subunit of TFIIH, which in turn breaks the Mediator-RNA Pol II interaction since the bulky phosphate group on the CTD-Ser5 induces steric hindrance incompatible with the geometry of the binding site at the interface of the Head and the Middle modules (Abdella et al. 2021). The following step in the initiation of transcription is the so-called “pause-release,” a reference to the observation that RNA Pol II often pauses 30–60 nucleotides after the initiation site. This pause is induced by DSIF and NELF and released upon the phosphorylation of these two factors by P-TEFb, the recruitment of which is promoted by the Mediator. A model to reconcile the successive tasks of Mediator in the course of transcription was proposed by Jeronimo and Robert in 2017, from which Fig. 4 is adapted. The CKM is composed of the subunits Med12 or 12-like (Med12/L), Med13 or 13-like (Med13/L), cyclin-dependent kinase 8 or 19 (Cdk8/19), and cyclin C (CycC). It is of note that no paralog of CycC has been identified and that the CycC-Cdk19 pair regulates a different set of transcripts than the CycC-Cdk8 pair. Crystallography, cryo-EM, and biochemical approaches have been widely used to decipher the structures of the individual CKM subunits from yeast and mammalian origins, their relative spatial organization, and the interaction network between the CKM and the core Mediator. The overall CKM structure appears as an elongated particle (Tsai et al. 2013), where Med13 and the pair Cdk8-CycC are located at the opposite tips of the structure and connected through the central Med12 subunit (Fig. 4). The Mediator-CKM interaction topology is conserved in yeast and human models and mainly depends on the contact between Med13 and the Middle module.
Depletion of the CKM subunits Med12 and Med13, the ones that contact the Mediator core, strongly impaired the chromatin recruitment of OGG1 induced by oxidative stress (Lebraud et al. 2020) (Fig. 4). This observation, together with the fact that the central core subunit Med14 was also observed to be essential for OGG1 recruitment, indicates a role of the whole Mediator complex in the initiation of BER (Lebraud et al. 2020). Another important feature is that the Med12 and Cdk8 subunits of the CKM, present in the soluble fraction in untreated cells, were co-recruited to euchromatin regions and co-localized with OGG1 in response to oxidative stress (Lebraud et al. 2020). The similar dynamic behavior of OGG1 and the CKM subunits Med12 and Cdk8 after oxidative stress, as well as their physical proximity in euchromatin regions, constitutes additional evidence for a functional link between BER and transcription through the leading role of Mediator. Detailed structural and biochemical studies of the mammalian Mediator subunits have revealed that CKM and RNA Pol II bind Mediator in a mutually exclusive manner (El Khattabi et al. 2019) since the C-terminal domain (CTD) of RNA Pol II and the CKM share the same binding sites at the Head and Middle modules (El Khattabi et al. 2019; Tsai et al. 2013; Abdella et al. 2021). Studies in yeast confirmed that binding of the Mediator core to CKM sterically blocks the insertion of Mediator within the PIC, resulting in the inhibition of transcription (Fig. 4).

Mediator detected at enhancer regions contains all four modules (El Khattabi et al. 2019), whereas the CKM module is undetectable at core promoters. CKM represses transcription by preventing the binding of RNA Pol II to Mediator, thereby blocking the formation of the PIC-scaffold complex. This role of CKM in blocking basal transcription is independent of its kinase activity, but data supporting a role of CDK8 kinase activity in the repression of transcription should also be considered (Osman et al. 2021). CKM functions as both an activator and a repressor of gene transcription (Friedson and Cooper 2021). For example, the involvement of CDK8 kinase activity in the induction of transcription has been reported upon treatment of cells with the two structurally unrelated CDK8 inhibitors Senexin A or Cortistatin A (Chen et al. 2017). ChIP analysis revealed that CDK8/19 is co-recruited with NFκB to the promoters of NFκB-responsive genes and potentiates NFκB activity, in particular inducing the transcription of pro-inflammatory cytokines. Since basal expression is not affected, CDK8 appears to be a mediator of transcriptional reprogramming (Chen et al. 2017). These observations are strikingly similar to the ones made for OGG1, which led to the hypothesis that the DNA glycosylase is essential for the recruitment of NFκB and the activation of transcription of pro-inflammatory genes (Seifermann et al. 2017). A combined role of both OGG1 and CDK8 in the regulation of transcription of pro-inflammatory genes remains unexplored, but it is an interesting hypothesis, particularly considering the interaction recently identified between both proteins in oxidative stress conditions (Lebraud et al. 2020). A role of the Mediator in the process of NER is also emerging and has been reviewed very recently (André et al. 2020). An interaction between Med17, a subunit of the Head module of Mediator, and the NER factor Rad2/XPG has been reported in yeast and humans.
Julie Soutourina’s laboratory generated in vivo data using genome-wide approaches in yeast and demonstrated an interplay between Mediator, Rad2, and RNA Pol II. They showed that Rad2/XPG is recruited to upstream activating sequences (UASs) by Mediator, involved in the PIC assembly, and transferred to transcribed regions through interaction with RNA Pol II (André et al. 2020). Altogether, these observations highlight the richness and dynamic nature of the interaction network established, both inside the Mediator complex (core and CKM) and outside with the other transcriptional components and DNA repair actors. Considering the huge complexity of protein-protein interactions and post-translational modifications required for the initiation of transcription, which we have only very briefly summarized here, the image described previously of OGG1 or APE1 recruiting transcription factors to the promoters to activate transcription of target genes is probably an oversimplification of what really happens in the cell nucleus. The discovery that OGG1 can indeed associate with Mediator and Cohesin complexes in oxidative stress conditions is probably just the tip of the iceberg, and many studies are still required to further characterize the interplay of these factors in transcription regulation and DNA repair, and most importantly, in the tridimensional nuclear context.

Nuclear Condensates Related to Transcription Initiation: Mediator and Super-Enhancers
Very high concentrations of Mediator subunits have been associated with super-enhancers, a class of enhancers that regulate the transcription of cell identity genes and oncogenes and drive very high levels of transcription. The super-enhancer concept, first described by Whyte et al. in 2013, refers to a region densely occupied by transcription factors, Mediator, histone modifications (H3K27ac), and other factors (Brd4, chromatin modifiers, and remodelers) (Whyte et al. 2013). It has been proposed that phase separation, relying on interactions between transcriptional and chromatin regulators, could mediate the formation of super-enhancers by grouping several enhancers and promoters, allowing coordinated transcription of the genes involved (Sabari et al. 2018) (Fig. 4). Liquid-liquid phase separation causes the formation of so-called biomolecular condensates, which play diverse and major roles in cellular organization and function. Dysfunctions in these processes are associated with many human diseases (Alberti and Hyman 2021). Within the nucleus, condensates participate in the organization of nuclear bodies such as speckles or PML bodies, in the formation of heterochromatin domains, and in the assembly of transcription factories and DNA damage repair compartments. Condensates are emerging as important structures for the maintenance of genomic stability and function (Spegg and Altmeyer 2021). Condensates are defined as membrane-less cellular compartments in which the assembled components are separated from the environment. The compartmentalization of molecules (proteins, lipids, nucleic acids) involved in a specific pathway, and thus isolated from other activities, might facilitate spatiotemporal regulation (by limiting the area) and enhance the efficiency of the process (through high local concentration). 
The association of components within condensates relies on weak, multivalent, and dynamic interactions and most often involves intrinsically disordered regions (IDRs) as well as modular domains. Mediator subunits such as Med1 have an exceptionally large IDR and have been shown to form condensates both in vitro and in vivo (Sabari et al. 2018). Mediator participates in the formation of transcription-dependent condensates by forming clusters with RNA Pol II and many transcription factors. Because this organization is highly dynamic, these clusters must be finely regulated, which involves post-translational modifications. One example is the phosphorylation of the CTD of RNA Pol II, which mediates the switch between transcriptional and splicing condensates (Guo et al. 2019). Live-cell fluorescence microscopy, super-resolution microscopy, and single-particle tracking have made major contributions to this emerging field and shed light on how transcription clusters are organized and regulated in the nucleus of living cells. A model suggesting that CKM might act by itself in transcription regulation at the super-enhancer level is emerging. Interestingly, ChIP-seq analysis has revealed the association of CDK8 with super-enhancers, and it has been demonstrated that Mediator kinases negatively regulate super-enhancer-associated gene expression (Pelish et al. 2015). Furthermore, the Med12 and Med13 subunits of the CKM also contain large intrinsically disordered regions, suggesting that these subunits might participate in the organization of nuclear condensates. Many mutations in Med12 and Med13 IDR

domains have been associated with cancer as well as several neuronal and developmental disorders, indicating that they play crucial roles in cellular homeostasis (Friedson and Cooper 2021). The co-recruitment of the Mediator CKM subunits CDK8 and Med12, together with OGG1, to euchromatin regions after induction of oxidative DNA damage (Lebraud et al. 2020) suggests a link between these processes. Future studies are required to unveil whether and how these CKM subunits contribute to the coordination between transcription and BER. Condensates have been shown to improve the efficiency of particular enzymatic pathways by increasing the local concentration of the proteins involved and could thus play an important role in BER by limiting the exposure of BER intermediates such as abasic sites or SSBs to the cellular milieu. A recent study reported that DNA synthesis is associated with SSB repair within neuronal enhancers, since a very good correlation between H3K4me1 and super-enhancer marks was observed. Even though the origin of SSBs in enhancers has not been completely elucidated, the authors suggest that they could correspond to BER intermediates (Wu et al. 2021). Considering the major contribution of the Cohesin and Mediator complexes to the assembly of enhancers and super-enhancers, this DNA repair activity would agree with observations indicating that those complexes are required for the chromatin assembly of BER complexes (Lebraud et al. 2020). In other words, these observations bring us back to our initial question by suggesting that the nuclear regions where enhancers and super-enhancers are assembled play essential roles in both transcription and DNA repair (Fig. 5). It is of major importance to better characterize the molecular mechanisms that orchestrate both processes in order to avoid collisions and to ensure finely tuned coordination between the very complex, multifactorial protein machineries involved.

Conclusion
The coordination between DNA repair and transcription is extremely important to ensure the stability of our genome and involves many different factors. Although 8-oxoG is not a blocking lesion, abasic sites and single-strand breaks generated as intermediates during its processing by the BER pathway can compromise the progression of the RNA polymerase (Allgayer et al. 2016). Furthermore, the presence of BER factors at the site of repair can also induce collisions with the transcription machinery. However, the link between the repair of 8-oxoG and transcription seems far more complex and goes beyond the coordination required between the BER and transcription machineries to avoid collisions. Indeed, 8-oxoG itself has been proposed to be an epigenetic mark involved in the activation of complex transcriptional programs in response to different stimuli such as hormones or hypoxia (Wang et al. 2018). In all the examples described so far, ROS seem to play a major role as signaling molecules in the induction of 8-oxoG at specific regulatory regions of the genome, mostly at gene promoters. ROS can originate from mitochondrial metabolism or be enzymatically induced by demethylases acting at specific histone marks, although in most cases, the exact molecular

Fig. 5 Transcription and BER localized in nuclear condensates. The microscopy image shows the localization of BER complexes in open, transcriptionally active chromatin regions (from Campalans et al. 2013). Considering the connection between OGG1, the Mediator core, the CKM, and marks of active transcription (Amouroux et al. 2010; Lebraud et al. 2020), it is conceivable that these actors are clustered in transcriptional condensates, together with super-enhancer components such as Brd4 and histone remodelers. The regulation of nuclear spatial organization could contribute to both BER efficiency and transcriptional regulation

mechanisms leading to the generation of ROS and the induction of 8-oxoG have not been clearly established. In any case, the formation of 8-oxoG triggers the recruitment of OGG1 and other BER factors that induce conformational changes at the chromatin level and play a direct role in the recruitment of transcription factors to their targets. Several different molecular mechanisms have been proposed to link the detection and processing of 8-oxoG by the BER pathway to transcriptional regulation. Depending on the model, these mechanisms may or may not involve the excision of 8-oxoG by OGG1, the dynamic folding of G-quadruplex structures as a consequence of OGG1 activity, or the SSBs generated by the enzymatic activity of both OGG1 and APE1, although many more studies are needed to fully understand the mechanistic details (Fig. 1). The different steps involved in the processing of the lesion are highly dynamic, and the precise kinetics and sequence of events are difficult to capture with genome-wide approaches that average over millions of cells. To overcome this

limitation, it is important to develop new experimental approaches that make it possible to follow the precise kinetics at specific loci in single cells. We cannot fully understand the coordination between the processing of 8-oxoG by BER and transcriptional regulation without taking into account that these processes occur in the nuclear environment, where DNA is not naked but highly compacted in chromatin. The folding of DNA around the histone octamer limits the accessibility of the BER enzymes and may require the participation of chromatin remodelers to facilitate the recognition and processing of 8-oxoG (Menoni et al. 2017). Furthermore, the many post-translational histone modifications characteristic of euchromatin and heterochromatin affect the accessibility of both the BER and transcriptional machineries and may also play an important role in the coordination between the two processes. In this context it is not surprising that many factors involved in chromatin remodeling or histone modification, mostly known for their role in transcription or in the transcription-coupled NER pathway, have also been found to play a role in BER (Figs. 2 and 3). Studying the spatiotemporal coordination of the different steps of BER and transcription occurring in the cell nucleus upon induction of 8-oxoG is a major challenge. Results from both microscopy and ChIP-seq experiments have revealed that upon exposure of cells to oxidative stress, OGG1 is enriched in euchromatin and gene regulatory regions, where it colocalizes with marks of active transcription and transcription factors (d’Augustin et al. 2020; Wang et al. 2018). The enrichment of OGG1 in euchromatin regions depends on the Mediator complex, mostly known for its major role in transcription initiation and regulation but also playing a role in NER (Lebraud et al. 2020; André et al. 2020). 
Considering the role of Mediator in transcription and in at least two DNA repair pathways, NER and BER, together with its role in the three-dimensional organization of the genome and in the formation of supramolecular condensates, this complex is emerging as an important player in the coordination of DNA repair and transcription in the nuclear context. The connections between the processing of 8-oxoG by the BER pathway and transcription regulation can be established at multiple levels. Much evidence from biochemical studies, genome-wide approaches, and cell biology points to an important role of OGG1 at the crossroads between the detection and excision of 8-oxoG and the regulation of complex transcriptional programs. The use of 8-oxoG as an epigenetic mark is not without risk, as it may give rise to the fixation of mutations that could challenge the stability of our genomes. Further studies are required to unveil the molecular mechanisms that dynamically coordinate the different steps allowing 8-oxoG to be read both as a mark for transcriptional regulation and as a lesion that needs to be removed, in order to ensure both the expression and the stability of our genome.
Acknowledgments We thank Bernd Epe and Juan Pablo Radicella for the critical reading of the manuscript and for fruitful discussions. We apologize to the authors of many excellent manuscripts that could not be cited in this review because of strict limits on the number of references. A.C. lab is funded by Agence Nationale de la Recherche (project ANR TG-TOX), CEA Radiobiology program, Foncer contre le cancer, and Electricité de France.

References
Abdella R, Talyzina A, Chen S, Inouye CJ, Tjian R, He Y (2021) Structure of the human Mediator-bound transcription preinitiation complex. Science 372:52–56. https://doi.org/10.1126/science.abg3074
Adamowicz M, Hailstone R, Demin AA, Komulainen E, Hanzlikova H, Brazina J, Gautam A, Wells SE, Caldecott KW (2021) XRCC1 protects transcription from toxic PARP1 activity during DNA base excision repair. Nat Cell Biol 23:1287–1298. https://doi.org/10.1038/s41556-021-00792-w
Aguilera-Aguirre L, Hosoki K, Bacsi A, Riadak Z, Sur S, Hedge ML, Tian B, Saavedra-Molina A, Brasier AR, Ba X, Boldogh I (2015) Whole transcriptome analysis reveals a role for OGG1-initiated DNA repair signaling in airway remodeling. Free Radic Biol Med 89:20–33. https://doi.org/10.1016/j.freeradbiomed.2015.07.007
Alberti S, Hyman AA (2021) Biomolecular condensates at the nexus of cellular stress, protein aggregation disease and ageing. Nat Rev Mol Cell Biol 22:196–213. https://doi.org/10.1038/s41580-020-00326-6
Allgayer J, Kitsera N, Bartelt S, Epe B, Khobta A (2016) Widespread transcriptional gene inactivation initiated by a repair intermediate of 8-oxoguanine. Nucleic Acids Res. https://doi.org/10.1093/nar/gkw473
Al-Mehdi AB, Pastukh VM, Swiger BM, Reed DJ, Patel MR, Bardwell GC, Pastukh VV, Alexeyev MF, Gillespie MN (2013) Perinuclear mitochondrial clustering creates an oxidant-rich nuclear domain required for hypoxia-induced transcription. Sci Signal 5:1–20
Amente S, Bertoni A, Morano A, Lania L, Avvedimento EV, Majello B (2010) LSD1-mediated demethylation of histone H3 lysine 4 triggers Myc-induced transcription. Oncogene 29:3691–3702. https://doi.org/10.1038/onc.2010.120
Amouroux R, Campalans A, Epe B, Radicella JP (2010) Oxidative stress triggers the preferential assembly of base excision repair complexes on open chromatin regions. Nucleic Acids Res 38:2878–2890. https://doi.org/10.1093/nar/gkp1247
An J, Yin M, Yin J, Wu S, Selby CP, Yang Y, Sancar A, Xu G-L, Qian M, Hu J (2021) Genome-wide analysis of 8-oxo-7,8-dihydro-2’-deoxyguanosine at single-nucleotide resolution unveils reduced occurrence of oxidative damage at G-quadruplex sites. Nucleic Acids Res 49:12252–12267. https://doi.org/10.1093/nar/gkab1022
André KM, Sipos EH, Soutourina J (2020) Mediator roles going beyond transcription. Trends Genet 37:224–234. https://doi.org/10.1016/j.tig.2020.08.015
Bhakat KK, Mantha AK, Mitra S (2009) Transcriptional regulatory functions of mammalian AP-endonuclease (APE1/Ref-1), an essential multifunctional protein. Antioxid Redox Signal 11:621–637. https://doi.org/10.1089/ars.2008.2198
Bhakat KK, Sengupta S, Mitra S (2020) Fine-tuning of DNA base excision/strand break repair via acetylation. DNA Repair (Amst) 93:102931. https://doi.org/10.1016/j.dnarep.2020.102931
Bohm KA, Hodges AJ, Czaja W, Selvam K, Smerdon MJ, Mao P, Wyrick JJ (2021) Distinct roles for RSC and SWI/SNF chromatin remodelers in genomic excision repair. Genome Res 31:1047–1059. https://doi.org/10.1101/gr.274373.120
Boiteux S, Coste F, Castaing B (2017) Repair of 8-oxo-7,8-dihydroguanine in prokaryotic and eukaryotic cells: properties and biological roles of the Fpg and OGG1 DNA N-glycosylases. Free Radic Biol Med 107:179–201. https://doi.org/10.1016/j.freeradbiomed.2016.11.042
Bravard A, Vacher M, Gouget B, Coutant A, de Boisferon FH, Marsin S, Chevillard S, Radicella JP (2006) Redox regulation of human OGG1 activity in response to cellular oxidative stress. Mol Cell Biol 26:7430–7436. https://doi.org/10.1128/MCB.00624-06
Campalans A, Kortulewski T, Amouroux R, Menoni H, Vermeulen W, Radicella JP (2013) Distinct spatiotemporal patterns and PARP dependence of XRCC1 recruitment to single-strand break and base excision repair. Nucleic Acids Res 41:3115–3129. https://doi.org/10.1093/nar/gkt025
Campalans A, Moritz E, Kortulewski T, Biard D, Epe B, Radicella JP (2015) Interaction with OGG1 is required for efficient recruitment of XRCC1 to base excision repair and maintenance

of genetic stability after exposure to oxidative stress. Mol Cell Biol 35:1648–1658. https://doi.org/10.1128/mcb.00134-15
Chakraborty A, Tapryal N, Islam A, Mitra S, Hazra T (2021a) Transcription coupled base excision repair in mammalian cells: so little is known and so much to uncover. DNA Repair (Amst) 107:103204. https://doi.org/10.1016/j.dnarep.2021.103204
Chakraborty U, Shen ZJ, Tyler J (2021b) Chaperoning histones at the DNA repair dance. DNA Repair (Amst) 108:103240. https://doi.org/10.1016/j.dnarep.2021.103240
Charles Richard JL, Shukla MS, Menoni H, Ouararhni K, Lone IN, Roulland Y, Papin C, Ben Simon E, Kundu T, Hamiche A et al (2016) FACT assists base excision repair by boosting the remodeling activity of RSC. PLoS Genet 12:1–21. https://doi.org/10.1371/journal.pgen.1006221
Chen M, Liang J, Ji H, Yang Z, Altilia S, Hu B, Schronce A, McDermott MSJ, Schools GP, Lim CU et al (2017) CDK8/19 Mediator kinases potentiate induction of transcription by NFκB. Proc Natl Acad Sci U S A 114:10208–10213. https://doi.org/10.1073/pnas.1710467114
Chiolo I, Minoda A, Colmenares SU, Polyzos A, Costes SV, Karpen GH (2011) Double-strand breaks in heterochromatin move outside of a dynamic HP1a domain to complete recombinational repair. Cell 144:732–744. https://doi.org/10.1016/j.cell.2011.02.012
Cho I, Tsai PF, Lake RJ, Basheer A, Fan HY (2013) ATP-dependent chromatin remodeling by Cockayne syndrome protein B and NAP1-like histone chaperones is required for efficient transcription-coupled DNA repair. PLoS Genet 9. https://doi.org/10.1371/journal.pgen.1003407
Clark DW, Phang T, Edwards MG, Geraci MW, Gillespie MN (2012) Promoter G-quadruplex sequences are targets for base oxidation and strand cleavage during hypoxia-induced transcription. Free Radic Biol Med 53:51–59. https://doi.org/10.1016/j.freeradbiomed.2012.04.024
Cogoi S, Ferino A, Miglietta G, Pedersen EB, Xodo LE (2018) The regulatory G4 motif of the Kirsten ras (KRAS) gene is sensitive to guanine oxidation: Implications on transcription. Nucleic Acids Res 46:661–676. https://doi.org/10.1093/nar/gkx1142
Collins AR, Cadet J, Moller L, Poulsen HE, Viña J (2004) Are we sure we know how to measure 8-oxo-7,8-dihydroguanine in DNA from human cells? Arch Biochem Biophys 423:57–65. https://doi.org/10.1016/j.abb.2003.12.022
Czaja W, Mao P, Smerdon MJ (2014) Chromatin remodeling complex RSC promotes base excision repair in chromatin of Saccharomyces cerevisiae. DNA Repair (Amst) 16:35–43. https://doi.org/10.1016/j.dnarep.2014.01.002
D’Augustin O, Huet S, Campalans A, Radicella JP (2020) Lost in the crowd: how does human 8-oxoguanine DNA glycosylase 1 (OGG1) find 8-oxoguanine in the genome? Int J Mol Sci 21:1–18. https://doi.org/10.3390/ijms21218360
D’Errico M, Parlanti E, Teson M, De Jesus BMB, Degan P, Calcagnile A, Jaruga P, Bjørås M, Crescenzi M, Pedrini AM et al (2006) New functions of XPC in the protection of human skin cells from oxidative damage. EMBO J 25:4305–4315. https://doi.org/10.1038/sj.emboj.7601277
D’Errico M, Parlanti E, Teson M, Degan P, Lemma T, Calcagnile A, Iavarone I, Jaruga P, Ropolo M, Pedrini AM et al (2007) The role of CSA in the response to oxidative DNA damage in human cells. Oncogene 26:4336–4343. https://doi.org/10.1038/sj.onc.1210232
Dianov G, Bischoff C, Sunesen M, Bohr VA (1999) Repair of 8-oxoguanine in DNA is deficient in Cockayne syndrome group B cells. Nucleic Acids Res 27:1365–1368. https://doi.org/10.1038/sj.onc.1205994
Ding Y, Fleming AM, Burrows CJ (2017) Sequencing the mouse genome for the oxidatively modified base 8-Oxo-7,8-dihydroguanine by OG-Seq. J Am Chem Soc 139:2569–2572. https://doi.org/10.1021/jacs.6b12604
El Khattabi L, Zhao H, Kalchschmidt J, Young N, Jung S, Van Blerkom P, Kieffer-Kwon P, Kieffer-Kwon KR, Park S, Wang X et al (2019) A pliable mediator acts as a functional rather than an architectural bridge between promoters and enhancers. Cell 178:1145–1158.e20. https://doi.org/10.1016/j.cell.2019.07.011

Fang Y, Zou P (2020) Genome-wide mapping of oxidative DNA damage via engineering of 8-oxoguanine DNA glycosylase. Biochemistry 59:85–89. https://doi.org/10.1021/acs.biochem.9b00782
Ferrand J, Plessier A, Polo SE (2021) Control of the chromatin response to DNA damage: histone proteins pull the strings. Semin Cell Dev Biol 113:75–87. https://doi.org/10.1016/j.semcdb.2020.07.002
Fleming AM, Burrows CJ (2021) Oxidative stress-mediated epigenetic regulation by G-quadruplexes. 3:1–16
Fortuny A, Chansard A, Caron P, Chevallier O, Leroy O, Renaud O, Polo SE (2021) Imaging the response to DNA damage in heterochromatin domains reveals core principles of heterochromatin maintenance. Nat Commun 12:1–16. https://doi.org/10.1038/s41467-021-22575-5
Friedson B, Cooper KF (2021) Cdk8 kinase module: A mediator of life and death decisions in times of stress. Microorganisms 9. https://doi.org/10.3390/microorganisms9102152
Gorini F, Scala G, Di Palo G, Dellino GI, Cocozza S, Pelicci PG, Lania L, Majello B, Amente S (2020) The genomic landscape of 8-oxodG reveals enrichment at specific inherently fragile promoters. Nucleic Acids Res 48:4309–4324. https://doi.org/10.1093/NAR/GKAA175
Guha M, Srinivasan S, Ruthel G, Kashina AK, Carstens RP, Mendoza A, Khanna C, Winkle V, Avadhani NG (2014) Mitochondrial retrograde signaling induces epithelial-mesenchymal transition and generates breast cancer stem cells. Oncogene 33:5238–5250. https://doi.org/10.1038/onc.2013.467
Guo YE, Manteiga JC, Henninger JE, Sabari BR, Agnese AD, Shrinivas K, Abraham BJ, Hannett NM, Spille J, Afeyan LK et al (2019) Pol II phosphorylation regulates a switch between transcriptional and splicing condensates. Nature. https://doi.org/10.1038/s41586-019-1464-0
Hanawalt PC (2015) Historical perspective on the DNA damage response. DNA Repair (Amst) 36:2–7. https://doi.org/10.1016/j.dnarep.2015.10.001
Hao W, Wang J, Zhang Y, Wang C, Xia L, Zhang W, Zafar M, Kang JY, Wang R, Ali Bohio A et al (2020) Enzymatically inactive OGG1 binds to DNA and steers base excision repair toward gene transcription. FASEB J 34:7427–7441. https://doi.org/10.1096/fj.201902243R
Jang S, Kumar N, Beckwitt EC, Kong M, Fouquerel E, Rapić-Otrin V, Prasad R, Watkins SC, Khuu C, Majumdar C et al (2019) Damage sensor role of UV-DDB during base excision repair. Nat Struct Mol Biol 26:695–703. https://doi.org/10.1038/s41594-019-0261-7
Jeronimo C, Robert F (2017) The mediator complex: at the Nexus of RNA polymerase II transcription. Trends Cell Biol 27:765–783. https://doi.org/10.1016/j.tcb.2017.07.001
Ježek J, Cooper KF, Strich R (2021) The impact of mitochondrial fission-stimulated ROS production on pro-apoptotic chemotherapy.
Kagey MH, Newman JJ, Bilodeau S, Zhan Y, Orlando DA, Van Berkum NL, Ebmeier CC, Goossens J, Rahl PB, Levine SS et al (2010) Mediator and cohesin connect gene expression and chromatin architecture. Nature 467:430–435. https://doi.org/10.1038/nature09380
Kim D, Kim KI, Baek SH (2021) Roles of lysine-specific demethylase 1 (LSD1) in homeostasis and diseases. J Biomed Sci 28:1–14. https://doi.org/10.1186/s12929-021-00737-3
Kumar N, Theil AF, Roginskaya V, Ali Y, Calderon M, Watkins SC, Barnes RP, Opresko PL, Pines A, Lans H et al (2022) Global and transcription-coupled repair of 8-oxoG is initiated by nucleotide excision repair proteins. Nat Commun 13:1–16. https://doi.org/10.1038/s41467-022-28642-9
Lebraud E, Pinna G, Siberchicot C, Depagne J, Busso D, Fantini D, Irbah L, Robeska E, Kratassiouk G, Ravanat J-L et al (2020) Chromatin recruitment of OGG1 requires cohesin and mediator and is essential for efficient 8-oxoG removal. Nucleic Acids Res. https://doi.org/10.1093/nar/gkaa611
Lia D, Reyes A, de Melo Campos JTA, Piolot T, Baijer J, Radicella JP, Campalans A (2018) Mitochondrial maintenance under oxidative stress depends on mitochondrially localised α-OGG1. J Cell Sci 131:jcs213538. https://doi.org/10.1242/jcs.213538
Mao P, Brown AJ, Malc EP, Mieczkowski PA, Smerdon MJ, Roberts SA, Wyrick JJ (2017) Genome-wide maps of alkylation damage, repair, and mutagenesis in yeast reveal mechanisms

of mutational heterogeneity. Genome Res 27:1674–1684. https://doi.org/10.1101/gr.225771.117
Menoni H, Hoeijmakers JHJ, Vermeulen W (2012) Nucleotide excision repair-initiating proteins bind to oxidative DNA lesions in vivo. J Cell Biol 199:1037–1046. https://doi.org/10.1083/jcb.201205149
Menoni H, Di Mascio P, Cadet J, Dimitrov S, Angelov D (2017) Chromatin associated mechanisms in base excision repair – nucleosome remodeling and DNA transcription, two key players. Free Radic Biol Med 107:159–169. https://doi.org/10.1016/j.freeradbiomed.2016.12.026
Menoni H, Wienholz F, Theil AF, Janssens RC, Lans H, Campalans A, Radicella JP, Marteijn JA, Vermeulen W (2018) The transcription-coupled DNA repair-initiating protein CSB promotes XRCC1 recruitment to oxidative DNA damage. Nucleic Acids Res 46:7747–7756. https://doi.org/10.1093/nar/gky579
Nilsen H, Lindahl T, Verreault A (2002) DNA base excision repair of uracil residues in reconstituted nucleosome core particles. EMBO J 21:5943–5952. https://doi.org/10.1093/emboj/cdf581
Ohno M, Miura T, Furuichi M, Tominaga Y, Tsuchimoto D, Sakumi K, Nakabeppu Y (2006) A genome-wide distribution of 8-oxoguanine correlates with the preferred regions for recombination and single nucleotide polymorphism in the human genome. Genome Res 16:567–575. https://doi.org/10.1101/gr.4769606
Olmon ED, Delaney S (2017) Differential ability of five DNA glycosylases to recognize and repair damage on nucleosomal DNA. ACS Chem Biol 12:692–701. https://doi.org/10.1021/acschembio.6b00921
Osman S, Mohammad E, Lidschreiber M, Stuetzer A, Bazsó FL, Maier KC, Urlaub H, Cramer P (2021) The Cdk8 kinase module regulates interaction of the mediator complex with RNA polymerase II. J Biol Chem 296:100734. https://doi.org/10.1016/j.jbc.2021.100734
Pastukh V, Roberts JT, Clark DW, Bardwell GC, Patel M, Al-Mehdi AB, Borchert GM, Gillespie MN (2015) An oxidative DNA “damage” and repair mechanism localized in the VEGF promoter is important for hypoxia-induced VEGF mRNA expression. Am J Physiol Lung Cell Mol Physiol 309:L1367–L1375. https://doi.org/10.1152/ajplung.00236.2015
Pelish HE, Liau BB, Nitulescu II, Tangpeerachaikul A, Poss ZC, Da Silva DH, Caruso BT, Arefolov A, Fadeyi O, Christie AL, Du K et al (2015) Mediator kinase inhibition further activates super-enhancer associated genes in AML. Nature 526:273–276. https://doi.org/10.1038/nature14904
Perillo B, Ombra MN, Bertoni A, Cuozzo C, Sacchetti S, Sasso A, Chiariotti L, Malorni A, Abbondanza C, Avvedimento EV (2008) DNA oxidation as triggered by H3K9me2 demethylation drives estrogen-induced gene expression. Science 202–207
Pezone A, Zuchegna C, Tramontano A, Romano A, Russo G, de Rosa M, Vinciguerra M, Porcellini A, Gottesman ME, Avvedimento EV (2019) RNA stabilizes transcription-dependent chromatin loops induced by nuclear hormones. Sci Rep 9:1–12. https://doi.org/10.1038/s41598-019-40123-6
Pezone A, Taddei ML, Tramontano A, Dolcini J, Boffo FL, De Rosa M, Parri M, Stinziani S, Comito G, Porcellini A et al (2020) Targeted DNA oxidation by LSD1-SMAD2/3 primes TGF-β1/EMT genes for activation or repression. Nucleic Acids Res 48:8943–8958. https://doi.org/10.1093/nar/gkaa599
Poetsch AR (2020) The genomics of oxidative DNA damage, repair, and resulting mutagenesis. Comput Struct Biotechnol J 18:207–219. https://doi.org/10.1016/j.csbj.2019.12.013
Rong Z, Tu P, Xu P, Sun Y, Yu F, Tu N, Guo L, Yang Y (2021) The mitochondrial response to DNA damage. Front Cell Dev Biol 9:1–10
Roychoudhury S, Pramanik S, Harris HL, Tarpley M, Sarkar A, Spagnol G, Sorgen PL, Chowdhury D, Band V, Klinkebiel D et al (2020) Endogenous oxidized DNA bases and APE1 regulate the formation of G-quadruplex structures in the genome. Proc Natl Acad Sci U S A 117. https://doi.org/10.1073/pnas.1912355117
Sabari BR, Dall’Agnese A, Boija A, Klein IA, Coffey EL, Shrinivas K, Abraham BJ, Hannett NM, Zamudio AV, Manteiga JC et al (2018) Coactivator condensation at super-enhancers links phase separation and gene control. Science 361:eaar3958. https://doi.org/10.1126/science.aar3958

Schuster-Böckler B, Lehner B (2012) Chromatin organization is a major influence on regional mutation rates in human cancer cells. Nature 488:504–507. https://doi.org/10.1038/nature11273
Seifermann M, Epe B (2017) Oxidatively generated base modifications in DNA: not only carcinogenic risk factor but also regulatory mark? Free Radic Biol Med. https://doi.org/10.1016/j.freeradbiomed.2016.11.018
Seifermann M, Ulges A, Bopp T, Melcea S, Schäfer A, Oka S, Nakabeppu Y, Klungland A, Niehrs C, Epe B (2017) Role of the DNA repair glycosylase OGG1 in the activation of murine splenocytes. DNA Repair (Amst) 58:13–20. https://doi.org/10.1016/j.dnarep.2017.08.005
Shigdel UK, Ovchinnikov V, Lee SJ, Shih JA, Karplus M, Nam K, Verdine GL (2020) The trajectory of intrahelical lesion recognition and extrusion by the human 8-oxoguanine DNA glycosylase. Nat Commun 11:1–8. https://doi.org/10.1038/s41467-020-18290-2
Soutourina J (2018) Transcription regulation by the Mediator complex. Nat Rev Mol Cell Biol 19:262–274. https://doi.org/10.1038/nrm.2017.115
Spegg V, Altmeyer M (2021) Biomolecular condensates at sites of DNA damage: more than just a phase. DNA Repair (Amst) 106:103179. https://doi.org/10.1016/j.dnarep.2021.103179
Trapp C, Reite K, Klungland A, Epe B (2007) Deficiency of the Cockayne syndrome B (CSB) gene aggravates the genomic instability caused by endogenous oxidative DNA base damage in mice. Oncogene 26:4044–4048. https://doi.org/10.1038/sj.onc.1210167
Tsai KL, Sato S, Tomomori-Sato C, Conaway RC, Conaway JW, Asturias FJ (2013) A conserved Mediator-CDK8 kinase module association regulates Mediator-RNA polymerase II interaction. Nat Struct Mol Biol 20:611–619. https://doi.org/10.1038/nsmb.2549
Visnes T, Cázares-Körner A, Hao W, Wallner O, Masuyer G, Loseva O, Mortusewicz O, Wiita E, Sarno A, Manoilov A et al (2018) Small-molecule inhibitor of OGG1 suppresses proinflammatory gene expression and inflammation. Science 362:834–839. https://doi.org/10.1126/science.aar8048
Wang R, Hao W, Pan L, Boldogh I, Ba X (2018) The roles of base excision repair enzyme OGG1 in gene expression. Cell Mol Life Sci 75:3741–3750
Wang K, Maayah M, Sweasy JB, Alnajjar KS (2021) The role of cysteines in the structure and function of OGG1. J Biol Chem 296:100093. https://doi.org/10.1074/jbc.RA120.016126
Whyte WA, Orlando DA, Hnisz D, Abraham BJ, Lin CY, Kagey MH, Rahl PB, Lee TI, Young RA (2013) Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell 153:307–319. https://doi.org/10.1016/j.cell.2013.03.035
Wu J, McKeague M, Sturla SJ (2018) Nucleotide-resolution genome-wide mapping of oxidative DNA damage by Click-Code-Seq. J Am Chem Soc 140:9783–9787. https://doi.org/10.1021/jacs.8b03715
Wu W, Hill SE, Nathan WJ, Paiano J, Callen E, Wang D, Shinoda K, van Wietmarschen N, Colón-Mercado JM, Zong D et al (2021) Neuronal enhancers are hotspots for DNA single-strand break repair. Nature. https://doi.org/10.1038/s41586-021-03468-5
Yoshihara M, Jiang L, Akatsuka S, Suyama M, Toyokuni S (2014) Genome-wide profiling of 8-oxoguanine reveals its association with spatial positioning in nucleus. DNA Res 21:603–612. https://doi.org/10.1093/dnares/dsu023
Zuchegna C, Aceto F, Bertoni A, Romano A, Perillo B, Laccetti P, Gottesman ME, Avvedimento EV, Porcellini A (2014) Mechanism of retinoic acid-induced transcription: histone code, DNA oxidation and formation of chromatin loops. Nucleic Acids Res 42:11040–11055. https://doi.org/10.1093/nar/gku823

Phosphorothioate Nucleic Acids: Artificial Modification Envisaged by Nature

45

Róża Pawłowska and Piotr Guga

Contents
Introduction
Polydiastereomerism of PS-Oligomers
Synthesis of P-Stereodefined Phosphorothioate Oligonucleotides
Stereodefined Phosphorothioate Nucleotides
Interactions of P-Stereodefined PS-Oligomers with DNA, RNA, and Protein Molecules
Formation of the Homoduplexes DNA/DNA and RNA/RNA and Heteroduplexes DNA/RNA
Formation of Higher-Order Structures
Stereodefined PS-Oligomers As Tools in Mechanistic Studies
Interactions with Proteins
Metabolism of PS-Oligomers
Biological Activity of Synthetic Nucleic Acids Containing Phosphorothioate Backbone
Physiological Phosphorothioate Modification of Nucleic Acids
References

Abstract

Internucleotide phosphate diester bonds in unmodified oligonucleotides are rapidly degraded by nucleolytic enzymes in cells or body fluids. This property excludes natural DNA and RNA molecules from potential medical applications and from many structural and mechanistic studies. DNA nucleotides and oligonucleotides in which one of the nonbridging phosphate oxygen atoms is replaced by a sulfur atom (PS-DNA) were among the first DNA analogs to be designed and synthesized. PS-DNA exhibits significantly higher nuclease resistance and also offers important opportunities for detailed studies of interactions with other biomolecules at the molecular level. However, the substitution creates a stereogenic center at the phosphorus atom, so that even short oligomers synthesized by a non-stereocontrolled method exist as mixtures of hundreds or even thousands of P-diastereomers, which usually cannot be separated chromatographically. Stereocontrolled synthesis methods have been developed to overcome this problem. P-stereodefined probes, including isotopomerically labeled species, have been used to elucidate the mode of action of numerous enzymes (nucleases, transferases, and kinases), ribozymes, and DNA-zymes, as well as to study the thermodynamic stability of nucleic acid complexes (duplexes, triplexes, and i-motif) and the mechanism of B-Z-type conformational changes. They are also useful tools for tuning the properties of siRNA duplexes. For many years, phosphorothioate modification was considered purely artificial, having been designed and implemented by chemists. However, in 2007, a phosphorothioate modification of DNA in bacteria was discovered and its functioning was intensively studied. In 2020, a report was published on the presence of a phosphorothioate modification in RNA isolated from prokaryotes and eukaryotes, but this claim has been criticized and seems premature, to say the least. This chapter covers two main areas related to PS-oligonucleotides: first, the synthetic routes to P-stereodefined oligonucleotides and selected examples of their application in structural, biochemical, and biological experiments; and second, the biosynthesis of oligonucleotides with phosphorothioate modification and their physiological functions.

R. Pawłowska · P. Guga (*)
Department of Bioorganic Chemistry, Centre of Molecular and Macromolecular Studies, Polish Academy of Sciences, Łódź, Poland
© Springer Nature Singapore Pte Ltd. 2023
N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_51

Introduction

The potential of synthetic oligonucleotides as a means to study the functions of nucleic acids and other biomolecules has been known for more than 40 years. In 1978, Zamecnik and Stephenson attempted to use unmodified DNA oligonucleotides to reduce the expression of selected genes. To date, several approaches based on sequence-specific targeting of nucleic acids have been investigated. In the so-called antisense strategy, the undesired translation of mRNA into protein is to be stopped by formation of an oligonucleotide-mRNA duplex and activation of RNase H, which hydrolyzes the messenger RNA (Stein and Krieg 1998; Crooke 1998). In the antigene strategy, transcription is to be inhibited by “locking” the double-stranded DNA with an oligonucleotide probe in an “inert” triplex structure stabilized by Hoogsteen interactions. Unfortunately, the results of these two classical methods did not meet expectations. Better results were obtained by using oligonucleotides with alternative and novel mechanisms of action, such as natural and synthetic ribozymes (and their chemically modified analogs) that can cause sequence-specific cleavage of a target RNA (Michienzi and Rossi 2001). Synthetic DNA counterparts with sequence-specific hydrolytic activity against RNA molecules have also been prepared (Breaker 2000). In 1998, the first report on RNA interference (RNAi) was published (Fire et al. 1998). RNA interference is an evolutionarily highly conserved process of posttranscriptional gene silencing, and


Chart 1 P-stereochemistry of the phosphorothioate analogs of dinucleotide (RP and SP diastereomers)

the strongly related phenomenon of synthetic short interfering RNA duplexes (siRNA) led to major advances in molecular biology. The recently discovered CRISPR (clustered regularly interspaced short palindromic repeats) system, which provides adaptive immunity to plasmids and phages in prokaryotes, has been ingeniously used to develop an even more powerful genome-editing tool, namely, the CRISPR/CRISPR-associated nuclease 9 (CRISPR/Cas9) genome-editing system. A special collection of articles on this method is available online at https://www.annualreviews.org/page/crispr. Another area in which DNA or RNA oligomers are used is in mechanistic studies, where these molecules are expected to provide insight into the mechanisms of their interactions with other nucleic acids or proteins. Detailed knowledge of these interactions can be very helpful in the development of specific diagnostic probes, inhibitors, agonists, or drugs. There are a growing number of approved nucleic acid therapeutics that treat diseases by targeting their genetic causes in vivo (Kulkarni et al. 2021). By manipulating a target gene through inhibition, addition, replacement, or editing, nucleic acid-based cures can achieve long-term effects. There is also an important field of small molecules related to nucleic acids. All of the above methods require specific oligonucleotide probes that not only have the right length and sequence, but also must be sufficiently stable in body fluids to perform the desired functions. Unfortunately, internucleotide phosphate diester linkages are rapidly degraded in cells or body fluids by nucleolytic enzymes. To address this problem, numerous chemical and structural modifications have been made to the nucleobases, ribose or deoxyribose, and phosphate units.
The phosphorothioate modification, in which one of the nonbridging phosphate oxygen atoms is replaced with a sulfur atom (Chart 1), was one of the first to be developed and has demonstrated its versatility in many areas of research. Interestingly, in 2007, a mass spectrometry analysis of DNA isolated from Streptomyces lividans in Wang’s lab revealed that the phosphorothioate linkage is not a purely artificial modification, as such phosphorothioate internucleotide bonds were generated by nature during evolution (Wang et al. 2007). In 2019, an antiviral system based on phosphorothioate DNA was discovered in archaea (Xiong et al. 2019). Remarkably, these phosphorothioate linkages contained only phosphorus atoms of the RP absolute configuration (see Chart 1). In 2020, Wu et al. reported the


presence of the phosphorothioate modification in RNA isolated from prokaryotes and eukaryotes (Wu et al. 2020). However, more detailed mass spectrometry analyses revealed that the dinucleotides purported to contain phosphorothioate internucleotide linkages were actually 2′-O-methylated dinucleotides (Kaiser et al. 2021). Chemically synthesized PS-oligonucleotides (PS-oligo) appeared on the scene in 1967, 40 years before Wang’s discovery, when Professor Fritz Eckstein of the Max Planck Institute for Experimental Medicine (Göttingen, Germany) first synthesized PS-DNA and PS-RNA dinucleotides. Chemical synthesis of longer PS-DNA oligomers was first performed in 1984 using the phosphoramidite monomers 1 (Scheme 1, R = Me) in solid-phase synthesis of oligonucleotides, in which oxidation of the internucleotide phosphite moiety in 2 (with an iodine/lutidine/water mixture) was replaced by sulfurization in each synthetic cycle. Nowadays, non-stereocontrolled synthesis of PS-DNA oligomers is relatively easy (the monomers 1 usually have R = 2-cyanoethyl), and phosphorothioate analogs of RNA oligomers (PS-RNA) as well as phosphorothioate forms of various

Scheme 1 The principle of the phosphoramidite method of PS-DNA synthesis. Labels recoverable from the scheme: monomers 1 (RP + SP mixture; B′ = AdeBz, CytBz, GuaiBu, Thy; R = Me or -CH2-CH2-CN) are coupled in the presence of 1H-tetrazole to give 2, which is sulfurized; 1. H+, detritylation; 2. cycles 2–n; 3. end-deprotection. The product contains RP + SP linkages. Solid support: e.g., LCA CPG.


unnatural oligomers (locked nucleic acids, LNA; glycol nucleic acids, GNA, etc.) can also be synthesized quite efficiently. It was found that the PS-analogs, although isoelectronic with natural oligonucleotides, have significantly higher nuclease resistance. Phosphorothioate oligonucleotides have been found to stimulate cellular uptake of naked siRNAs into mammalian cells (Detzer et al. 2008) and to enter mammalian cells much more efficiently than DNA, even in the absence of transfection agents (Crooke et al. 2017). An explanation based on the dynamic covalent exchange of phosphorothioates with cellular thiols and disulfides has recently been published (Laurent et al. 2021).

Polydiastereomerism of PS-Oligomers

A PS-dinucleotide has a stereogenic phosphorus atom (is P-chiral) and exists as an RP- or SP-diastereomer. A longer DNA or RNA oligomer modified in this way, for example, consisting of ten nucleoside units and thus containing nine phosphorothioate internucleotide linkages, exists as a mixture of 2⁹ = 512 diastereomers, with the proportion of any single diastereomer being less than 0.2% (1/512). This polydiastereomerism poses many problems because one is dealing with hundreds or thousands of similar but formally different compounds, each of which may interact with a particular biological target in a slightly different manner. The presence of a sulfur atom affects the properties of PS-nucleotides compared with natural nucleotides, mainly because of the more demanding steric requirements of a sulfur atom (larger atomic radius, 0.88 vs. 0.48 Å) and the altered negative charge distribution in the phosphorothioate anion (Frey and Sammons 1985). Compared with DNA and RNA, the overall geometry of PS-DNA and PS-RNA oligomers differs only slightly or moderately, although important changes in hydration patterns and interactions with metal ions are observed (Eckstein 2000). The chemical synthesis of PS-oligo with stereodefined internucleotide bonds is a very challenging task. In the commonly used phosphoramidite/sulfurization method, the protected nucleoside 3′-O-(O-alkyl-N,N-dialkylphosphoramidites) are P-chiral species (1, Scheme 1), but in routine syntheses their diastereomeric mixtures are used. Early attempts to use the chromatographically separated pure P-diastereoisomers (RP- or SP-isomers) of 5′-O-DMT-deoxycytidine-3′-O-(O-methyl-N,N-diisopropylphosphoramidite) have shown that the large excess of 1H-tetrazole catalyst used for effective coupling leads to an RP + SP mixture of both P-epimers of dicytidine 3′,5′-phosphorothioates.
Reports of stereoselective syntheses of short PS-oligo, published by groups led by Hayakawa, Sekine, Beaucage, and Agrawal, have been covered in an earlier review (Guga and Koziołkiewicz 2011). PS-oligo as short as pentamers prepared by nonstereoselective or partially stereoselective methods can sometimes be separated into P-diastereomers by chromatographic techniques, but this approach is not general.
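The arithmetic behind this polydiastereomerism can be sketched in a few lines of Python. The function names are illustrative and do not come from the chapter. The sketch assumes a stereorandom synthesis in which every PS linkage is an independent stereocenter and, for the fraction estimate, that RP and SP are formed in equal amounts at each linkage, which real couplings only approximate.

```python
from typing import Optional


def count_p_diastereomers(n_nucleosides: int,
                          n_ps_linkages: Optional[int] = None) -> int:
    """Number of P-diastereomers of a linear oligomer.

    An oligomer of n nucleosides has n - 1 internucleotide linkages.
    Each phosphorothioate linkage is either RP or SP, so the count is
    2 ** (number of PS linkages).
    """
    max_linkages = n_nucleosides - 1
    if n_ps_linkages is None:
        n_ps_linkages = max_linkages  # assume a fully PS-modified backbone
    if not 0 <= n_ps_linkages <= max_linkages:
        raise ValueError("PS linkages must be between 0 and n_nucleosides - 1")
    return 2 ** n_ps_linkages


def single_isomer_fraction(n_nucleosides: int) -> float:
    """Share of one given diastereomer in a fully PS-modified oligomer,
    assuming an equimolar RP/SP outcome at every linkage (idealization)."""
    return 1.0 / count_p_diastereomers(n_nucleosides)


if __name__ == "__main__":
    n = 10  # the decamer used as the example in the text
    print(count_p_diastereomers(n), f"{single_isomer_fraction(n):.4%}")
```

For the decamer discussed above the count is 512, and each additional PS linkage doubles it, which is why chromatographic separation quickly becomes impractical.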


Synthesis of P-Stereodefined Phosphorothioate Oligonucleotides

A first method for the stereocontrolled chemical synthesis of PS-oligo, elaborated in the laboratory of Stec, is based on chemistry using not the commonly known phosphoramidite monomers 1 (Scheme 1) but P-diastereomerically pure nucleoside monomers possessing the 2-thio-1,3,2-oxathiaphospholane moiety in the 3′-O position (3, Scheme 2) (Stec et al. 1991). Currently, monomers 3 bearing a pentamethylene ring ((R-)2 = -(CH2)5-) in spiro arrangement at position 4 of the oxathiaphospholane moiety are mainly used (Stec et al. 1998). In the presence of DBU, the monomers 3 react with the 5′-OH group of a nucleoside or a growing oligonucleotide (attached at the 3′-end to a DBU-resistant, sarcosine-containing solid support, LCA CPG SAR) to form a product 4 with an internucleotide phosphorothioate diester bond, as shown in Scheme 2. The process is fully stereoselective and occurs with retention of the configuration at the phosphorus atom. The chemical yield of the condensation process is not as good as that of the phosphoramidite or H-phosphonate methods, but the repetitive yield of 92–94%

Scheme 2 The principle of the oxathiaphospholane (stereocontrolled) method of PS-DNA synthesis. Labels recoverable from the scheme: monomers 3 (R = H or CH3, or (R-)2 = -(CH2)5-) are coupled in the presence of DBU to give 4; 1. H+, detritylation; 2. cycles 2–n; 3. end-deprotection. B′ = AdeBz, CytBz, GuaiBu (DPC), Thy; B = Ade, Cyt, Gua, Thy. Solid support: LCA CPG SAR.


allows the synthesis of medium-size oligomers (up to about 15-mers). Longer oligonucleotides were usually obtained in rather low yield. The oxathiaphospholane monomers of a DNA series are not commercially available, but methods for their preparation as well as detailed protocols for the synthesis of oligonucleotides have been published (Guga and Stec 2003). This methodology has been extended to obtain the analogous monomers of LNA (Jastrzębska et al. 2015; Jastrzębska et al. 2020), RNA, and (2′-OMe)-RNA (Jastrzębska et al. 2022) series. The corresponding P-diastereomerically pure monomers of GNA (Tomaszewska-Antczak et al. 2018) and 3′-deoxy-3′-amino-DNA (Radzikowska et al. 2020) series were also synthesized, but undesirable side reactions and unfavorable conformational factors made the syntheses of oligomers very inefficient. The oxathiaphospholane method suffers from a serious limitation in that it cannot be readily combined with the phosphoramidite method (to obtain chimeric PS/PO-oligonucleotides), since the routine oxidation step (iodine/water/pyridine), when applied to the oligomer containing diester phosphorothioate units, causes its fast and quantitative desulfurization. To some extent, this problem could be solved by synthesizing stereodefined dinucleoside phosphorothioates 5 (Scheme 3) (from the corresponding diastereomerically pure oxathiaphospholane monomers and an appropriately protected nucleoside) and alkylating the anionic sulfur atom with 2-nitrobenzyl bromide (Nawrot et al. 2005). The resulting S-alkylated intermediate

Scheme 3 Synthesis of P-stereodefined chimeric PO/PS-oligonucleotides using transient protection of the anionic sulfur atom in 5 with 2-nitrobenzyl bromide. Labels recoverable from the scheme: 5 is alkylated with Br-CH2-C6H4-(2-NO2) to give 6, which is converted to 7; 1H-tetrazole; 1) elongation, 2) PhS−, 3) final deprotection.


6 was then converted to the corresponding 3′-O-phosphoramidite 7, which could be used in the phosphoramidite method of DNA synthesis. The nitrobenzyl protecting group could be efficiently removed in a stereospecific manner using thiophenolate anion before the oligonucleotide was cleaved from the solid support using aqueous ammonia. Later studies (Radzikowska et al. 2020) showed that the destructive PS → PO exchange in PS-oligo is avoided when the P(III) → P(V) conversion is performed using t-Bu-OOSiMe3, but this method needs to be optimized for the conditions of automated solid-phase synthesis. Since the oxathiaphospholane approach furnishes stereodefined PS-oligo in only moderate yield, some researchers continued to work on modifications of the phosphoramidite approach. Wada and coworkers investigated a method to synthesize stereoregular PS-DNA oligomers using nucleoside 3′-O-(1,3,2-oxazaphospholidine) monomers 8 (synthesis is shown in Scheme 4), which were stereoselectively synthesized from enantiopure 1,2-amino alcohols (Oka et al. 2002). However, with the best monomer 8, in which R1 = Me, R2, R3 = H, and R4 = Ph, a partial loss of diastereopurity (up to 6% in total) was observed during the syntheses of tetramers. For that reason, Wada’s group examined several ring-substituted 3′-O-(1,3,2-oxazaphospholidine) monocyclic and bicyclic monomers to find those that are configurationally stable and do not undergo any measurable epimerization during the condensation on a solid support. As a result, using the bicyclic, phenyl-substituted oxazaphospholidine monomer 8 (Scheme 4) in which (R1 + R2) = -(CH2)3-, R3 = Ph, and R4 = H, the formation of both RP- and SP-phosphorothioate internucleotide linkages proceeded without any epimerization (diastereoselectivity >99:1) (Oka et al. 2008). Several PS-oligo were synthesized (10–12-mers) and 95–99% coupling yields were claimed (based on the DMT+ cation assay), but isolated yields (after RP-HPLC purification) were between 12 and 34%.
A similar approach has been developed for the synthesis of stereodefined PS-RNA decamers (Oka et al. 2009). For model dinucleoside phosphorothioates, the stereoselectivities exceeded 99:1.
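The repetitive (per-coupling) yields quoted in this section compound over the synthetic cycles, which is why the length of accessible oligomers is limited. A minimal Python sketch of this compounding is given below. It assumes, as a simplification not stated in the chapter, that every coupling proceeds with the same yield and that no other losses (deprotection, cleavage, purification) occur, so it gives an upper bound on the crude full-length product. The function name is illustrative.

```python
def overall_coupling_yield(stepwise_yield: float, n_couplings: int) -> float:
    """Fraction of full-length product after n identical coupling steps.

    stepwise_yield is a fraction between 0 and 1 (e.g. 0.92 for 92%).
    Only coupling losses are modeled; purification losses are ignored.
    """
    if not 0.0 <= stepwise_yield <= 1.0:
        raise ValueError("stepwise_yield must be a fraction between 0 and 1")
    if n_couplings < 0:
        raise ValueError("n_couplings must be non-negative")
    return stepwise_yield ** n_couplings


if __name__ == "__main__":
    # A 15-mer built on a support-bound nucleoside needs 14 couplings.
    for y in (0.92, 0.94):
        print(f"{y:.0%} per step -> {overall_coupling_yield(y, 14):.1%} overall")
```

Under these assumptions, a 15-mer assembled at 92–94% per coupling corresponds to roughly 31–42% of full-length product before purification. The gap between high per-step coupling yields and much lower isolated yields, as reported above for the oxazaphospholidine method, is not captured by this simple model.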

Scheme 4 Synthesis of ring-substituted 3′-O-(1,3,2-oxazaphospholidine) monomers 8 (reagent label recoverable from the scheme: NEt3)


Stereodefined Phosphorothioate Nucleotides

Natural nucleotides, such as nucleoside 5′-O-monophosphates (NMP), diphosphates (NDP), and triphosphates (NTP), as well as dinucleoside polyphosphates (e.g., 5′,5′-NPnN′, n = 2–7) or mRNA caps (7-methyl-guanosine (m7G) attached to the first transcribed nucleotide of an mRNA chain via a 5′-5′ triphosphate bridge) play important roles in a living cell by serving as energy sources or as regulatory factors in many metabolic processes. While nucleoside 5′-O-phosphorothioates (NMPS) and symmetrical 5′,5′-NPSN are prochiral, the phosphorothioate analogs of NTP and NDP, i.e., NTPαS and NDPαS, which have a nonbridging sulfur atom at the α-phosphate group, are P-chiral species that may be useful for studying the mechanisms of action of the original, unmodified compounds. For many projects, the nucleoside component N is not limited to the canonical natural nucleosides, and the phosphorothioate moiety can also be selected from a library of compounds. Usually P-diastereomers of such compounds can be separated chromatographically. Stereodefined synthesis of three unnatural analogs was attempted using the oxathiaphospholane approach, and P-diastereomers of 3′-O-camphanoylthymidine-5′-O-(2-thio-4,4-pentamethylene-1,3,2-oxathiaphospholane) were reacted with three anions of phosphorus-containing acids. Thymidine-5′-O-(α-thiodiphosphate) (TDPαS), thymidine-5′-O-(β,γ-methylene-α-thiotriphosphate), and thymidine-5′-O-(benzylphosphono-α-thiophosphate) were obtained in a highly stereoselective manner (Tomaszewska et al. 2010). The 31P NMR spectra showed that the ring opening of the oxathiaphospholane with a phosphate anion yielding TDPαS proceeded with 90% stereoselectivity, and two other compounds were formed with slightly higher stereoselectivity.
This lack of complete stereoselectivity was rather unexpected, although previous experiments showed that reaction of 5′-O-DMT-thymidine-3′-O-(2-thio-4,4-pentamethylene-1,3,2-oxathiaphospholane) with a fluoride anion formed the corresponding phosphorus fluoridate derivative with epimerization at the P atom.

Interactions of P-Stereodefined PS-Oligomers with DNA, RNA, and Protein Molecules

All modifications leading to a P-chiral internucleotide bond (with the exception of an isotopomeric substitution) usually change the size, polarity, charge density, hydrophobicity, and basicity of the resulting oligonucleotides. Thus, different modifications can lead to conformational changes that may be important for interaction with specific enzymes or receptors. For the phosphorothioate modification, the crystal structure of a chimeric hexamer d(GPSCGPSCGPSC) with phosphorothioate bonds of the RP configuration showed its B conformation (Saenger et al. 1986). Further studies in solution (Further Reading: González C et al. (1995) doi:10.1021/bi00015a008; Clark CL et al. (1997) doi:10.1093/nar/25.20.4098; Bachelin M et al. (1998) doi:10.1038/nsb0498-271) and molecular modeling (Further Reading: Hartmann B et al. (1999) doi:10.1093/nar/27.16.3342) showed little effect of


substitution on the overall conformation of the duplexes. However, a single phosphorothioate substitution in the GAAA loop region of RNA hairpins ([RP]-GAPSAA) significantly stabilized the hairpin (ΔTm = 9 °C, ΔH = 18.5 kcal/mol), whereas neither [SP]-GAPSAA nor substitutions elsewhere in the loop affected the stability of the structure (Horton et al. 2000). Studies of NaCl-promoted B → Z conversion of specifically designed stereodefined phosphorothioate octamers showed that [SPRP]-d(CPSGPS)3CPSG converts to the Z form with a transition point at 2 M NaCl, 1 M less than unmodified d(CG)4. Similarly, [RPSP]-d(GPSCPS)3GPSC underwent approximately 50% conversion at 5 M NaCl, whereas unmodified d(GC)4 did not undergo the B → Z transition at all (Boczkowska et al. 2000). This is a rare example of modified oligomers exhibiting “enhanced” properties compared to their natural counterparts. These effects reflect a unique functionality of phosphorothioate units, in which sulfur atoms in the correct spatial orientation confer unexpected properties to phosphorothioate oligomers, most likely due to a specifically altered distribution of negative charges that allows the formation of stronger or new hydrogen bonds or water bridges.

Formation of the Homoduplexes DNA/DNA and RNA/RNA and Heteroduplexes DNA/RNA

Theoretical considerations suggested that duplexes formed by [RP-PS]-oligo with complementary DNA strands should be thermally less stable than those formed by [SP-PS]-counterparts. This prediction was based on two points: first, the higher steric requirements of the sulfur atom (compared to the oxygen atom), which in the B-type double helix is directed “inward,” and second, the negative charges predominantly present on the sulfur atoms (Frey and Sammons 1985), which were expected to repel more strongly the negative charges present in the complementary strands. Similar conclusions were derived from molecular modeling (Jaroszewski et al. 1992). However, contrary to these predictions, the measured melting temperatures (Tm) showed that the relative thermal stability of the [RP-PS]-DNA/DNA and [SP-PS]-DNA/DNA duplexes depended on their sequence composition and not on the absolute configuration of the PS-oligo (Boczkowska et al. 2002). For example, melting temperatures of 14 °C and 18 °C were found for the duplexes [SP-PS]- and [RP-PS]-T12/dA12, respectively, whereas the duplexes formed by [SP-PS]- and [RP-PS]-d(CCTATAATCC) with a complementary strand have the same Tm values. Unlike PS-DNA/DNA duplexes, the stereodifferentiation of the stability of PS-DNA/RNA heteroduplexes does not depend on their sequence, and in all pairs of diastereomers studied, the RP isomers form more stable structures that are presumably stabilized by a particular hydration of the phosphate groups in a duplex present in the A conformation. Long-lived hydration patterns are known to be present in the deep major groove of A-RNA (Auffinger and Westhof 2001; Sundaralingam and Pan 2002) or A-DNA (Egli et al. 1998) and involve sequence-independent water bridges between pro-RP oxygen atoms of adjacent phosphate


groups. There are no such bridges in B-DNA because the corresponding distances are too large (Egli et al. 1998). The DNA/RNA hybrids tend to adopt the A conformation so that the adjacent phosphate groups of the DNA strand can be close enough to form similar bridges. In addition, the sulfur atom in a phosphorothioate group, which carries most of the negative charge of the internucleotide bond, is a strong acceptor for charge-assisted hydrogen bonding (Boczkowska et al. 2000). Consequently, a nonbridging oxygen in the same phosphorothioate group is a much weaker acceptor because its charge is much lower.

Formation of Higher-Order Structures

Typically, PS-DNA/DNA and PS-DNA/RNA complexes exhibit lower thermal stability than unmodified congeners. However, this generally accepted opinion was found not to hold for certain homopurine [RP-PS]-DNA oligomers interacting with complementary (in the Watson-Crick sense) RNA templates. It was found that when homopurine sequences (with at least six nucleotides) are palindromic, thermally stable parallel triplexes RNA/PS-DNA/RNA are formed (Guga et al. 2007a). For [PO]- and [SP-PS]-DNA oligomers mixed with the corresponding RNA templates, only duplexes RNA/DNA are observed. Since only [RP-PS]-DNA oligomers are able to efficiently stabilize the triplex, it was suggested that the third strand associates not only because of Hoogsteen hydrogen bonding but also because of a water bridge between the sulfur atom of an RP phosphorothioate (which carries most of the negative charge in a phosphorothioate anion) and the oxygen atom O2 in the pyrimidine ring. The strength of this effect is surprising, as the melting profiles showed only a single transition, without the commonly observed “premelting” associated with dissociation of the third strand at a temperature well below the Tm of the Watson-Crick duplex. For example, the triplex 5′-CUCUUUUUUCUC-3′ / [RP-PS]-5′-d(GAGAAAAAAGAG)-3′ / 3′-CUCUUUUUUCUC-5′ melts at 54 °C (buffer pH 7.4), while the duplex 5′-d(GAGAAAAAAGAG)-3′ / 3′-CUCUUUUUUCUC-5′ melts at 26 °C. Interestingly, the triplexes formed by phosphorothioate DNA dodecamers containing 4–6 dG residues are thermally stable at pH 7.4 (Tm = 50–80 °C), but their stability is significantly increased at pH 5.3.
If the palindromic tract is too short and the strand complementary in the Watson-Crick sense cannot serve as a complementary Hoogsteen strand (e.g., a pair of [RP-PS]-(5′-d(GAGAGGAAAGAG)-3′) and 3′-CUCUCCUUUCUC-5′), the thermally stable triplex is formed only when the missing RNA strand complementary in the Hoogsteen sense (i.e., 5′-CUCUCCUUUCUC-3′) is added to the mixture. Further studies revealed that nonpalindromic homopurine [RP-PS]-oligo form parallel duplexes with an RNA or 2′-OMe RNA strand with Hoogsteen complementarity (Guga et al. 2007b). The parallel orientation of the strands was confirmed by fluorescence


quenching experiments. Remarkably, these duplexes are thermally more stable than the typical antiparallel Watson-Crick duplexes formed by unmodified homopurine DNA molecules of the same sequence with corresponding RNA templates. The mechanism for this strong stabilization is undoubtedly the same as that proposed for the parallel triplexes, namely, the water bridges between the sulfur and oxygen atoms at position 2 in the pyrimidines.
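The role of sequence palindromy in this section can be made concrete with a short Python sketch. It checks whether a single pyrimidine RNA strand can act both as the antiparallel Watson-Crick partner and as the parallel Hoogsteen partner of a homopurine DNA strand. The function names are illustrative, and the pairing rule used for the Hoogsteen strand (C opposite G, U opposite A, in parallel orientation) is the standard pyrimidine-motif assumption rather than something specified in the chapter. The length requirement of at least six nucleotides mentioned above is not enforced.

```python
PURINE_TO_PYRIMIDINE_RNA = {"G": "C", "A": "U"}


def is_homopurine(dna_5to3: str) -> bool:
    """True if the DNA strand contains only G and A."""
    seq = dna_5to3.upper()
    return len(seq) > 0 and set(seq) <= set(PURINE_TO_PYRIMIDINE_RNA)


def watson_crick_rna(dna_5to3: str) -> str:
    """RNA strand pairing antiparallel to the DNA, written 5'->3'."""
    return "".join(PURINE_TO_PYRIMIDINE_RNA[b] for b in reversed(dna_5to3.upper()))


def hoogsteen_rna(dna_5to3: str) -> str:
    """Pyrimidine RNA strand aligned parallel to the DNA, written 5'->3'."""
    return "".join(PURINE_TO_PYRIMIDINE_RNA[b] for b in dna_5to3.upper())


def one_rna_serves_both_roles(dna_5to3: str) -> bool:
    """True when the Watson-Crick and Hoogsteen RNA strands are identical,
    which happens exactly when the homopurine sequence reads the same in
    both directions (a sequence palindrome)."""
    if not is_homopurine(dna_5to3):
        raise ValueError("expected a homopurine DNA sequence (G and A only)")
    return watson_crick_rna(dna_5to3) == hoogsteen_rna(dna_5to3)


if __name__ == "__main__":
    for dna in ("GAGAAAAAAGAG", "GAGAGGAAAGAG"):
        print(dna, one_rna_serves_both_roles(dna))
```

For the palindromic dodecamer d(GAGAAAAAAGAG) both roles are filled by CUCUUUUUUCUC, whereas for d(GAGAGGAAAGAG) the two RNA strands differ, consistent with the need to add a separate Hoogsteen strand in that case.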

Stereodefined PS-Oligomers As Tools in Mechanistic Studies

Chemically synthesized, stereodefined PS-oligo (both RP- and SP-isomers) were used for mapping the functional phosphate groups in the catalytic core of deoxyribozyme 10–23, which cleaves RNA in a sequence-specific manner (Nawrot et al. 2007). Experiments with stereorandom PS-deoxyribozymes allowed the identification of nonbridging phosphate oxygens (pro-RP or pro-SP) at positions P2, P4, and P9–13 of the core as those potentially involved in coordination with a divalent metal ion. In contrast, phosphorothioates at positions P3, P6, P7, and P14–16 showed no functional significance for deoxyribozyme-mediated catalysis. Interestingly, phosphorothioate modifications at positions P1 or P8 increased the catalytic efficiency of the enzyme. Thio substitution at position P5 had the greatest negative effect on the catalytic rate in the presence of Mg2+, which was reversed in the presence of Mn2+. Further experiments using thio-deoxyribozymes with stereodefined P-chirality suggested direct involvement of the two oxygen atoms of the P5 phosphate and the pro-RP oxygen at P9 in metal ion coordination. The results of systematic site-directed PS substitutions made it possible to propose a model for the metal-binding site in the catalytic core of deoxyribozyme 10–23.

Interactions with Proteins

Phosphate groups in nucleic acids play an important role in their interaction with proteins. In a typical B-helix, one of the nonbridging oxygen moieties points outward into solution and the other points toward the major groove. Therefore, these two types of oxygen are readily accessible to proteins and can both form hydrogen bonds with the protein and participate in electrostatic interactions with positively charged amino acid side chains. Uhlenbeck and coworkers described the effects of individual phosphorothioate substitutions on the binding of RNA to the MS2 coat protein (Dertinger et al. 2000). Twelve analogs of a 15-nucleotide hairpin GCGAGGAUUACCCGC (which has all the sequence elements required for recognition by the MS2 coat protein) modified at successive positions with a phosphorothioate function were chemically synthesized. In each case, the phosphorothioate stereoisomers were chromatographically separated into pure RP and SP forms. The affinities of the oligomers for the MS2 coat protein were then determined. Comparison of these biochemical data with the crystal


structure of the protein-hairpin complex showed that introduction of a phosphorothioate group affected binding only at sites where a protein-phosphate contact was observed in the crystal structure (3–5-fold enhancement or attenuation of the interaction). However, an RNA molecule with a sulfur atom in place of the pro-RP oxygen atom involved in hydrogen bonding with an asparagine residue (Asn55) was found to bind 20-fold more tightly to the MS2 coat protein than the unmodified RNA. This marked stabilization is attributed to a strong and specific interaction between the asparagine side chain and the negatively charged sulfur atom of the phosphorothioate moiety.

One might assume that the interaction of proteins with internucleotide phosphorothioate bonds of a given absolute configuration at the phosphorus atom should be stereodependent. However, PS-T19 and PS-d(TC)9T prepared as mix-, RP-, and SP-diastereomers were found to bind to basic fibroblast growth factor, recombinant soluble CD4, laminin, and fibronectin independently of the sense of P-chirality (Benimetskaya et al. 1995). Most likely, neither the nucleotide sequence nor the sense of P-chirality, but rather the global shape of the oligonucleotide molecule and its location at the protein-oligonucleotide interface, are the most important factors for the interactions between oligonucleotides and these proteins. On the other hand, experiments on the inhibition of vascular smooth muscle cell proliferation by a PS-oligo probe showed that the RP-diastereomer is more active than a random mixture of diastereoisomers, with the SP-form having the least activity (Fearon et al. 1997). In the field of cardiovascular disease, the inhibitory activity of the SP-diastereomer of a phosphorothioate hexadecanucleotide toward the synthesis of human plasminogen activator inhibitor (PAI-1) was higher than that of the isosequential random mixture of diastereoisomers and of the RP-diastereomer (Stec et al. 1997). Inhibition of AMV reverse transcriptase by the SP-phosphorothioate hexadecamer complementary to the viral sequence was twice as effective as that by its RP-counterpart (Krakowiak and Koziołkiewicz 1998).
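
The fold changes in affinity quoted above for the MS2 coat protein can be put on an energy scale. The sketch below is an illustration and not part of the cited analysis: it assumes that each fold change is a ratio of equilibrium dissociation constants measured at 25 °C, so that ΔΔG = −RT ln(fold). The function name `ddg_from_fold` and the temperature are assumptions made for this example.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def ddg_from_fold(fold_tighter, temp_k=298.15):
    """Binding free-energy change (kcal/mol) for a fold increase in affinity.

    Assumes fold_tighter = Kd(unmodified) / Kd(modified); a negative result
    means the modified RNA binds more tightly (hypothetical helper).
    """
    if fold_tighter <= 0:
        raise ValueError("fold change must be positive")
    return -R_KCAL * temp_k * math.log(fold_tighter)

# Fold changes mentioned in the text: 3-5-fold at contact sites, 20-fold at Asn55
for fold in (3, 5, 20):
    print(f"{fold}-fold tighter: ddG = {ddg_from_fold(fold):.2f} kcal/mol")
```

Under these assumptions, a 20-fold gain in affinity corresponds to roughly −1.8 kcal/mol, which is in the range expected for a single strengthened polar contact, whereas the 3–5-fold effects amount to less than 1 kcal/mol.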

Metabolism of PS-Oligomers

Intracellular PS-oligo can either be of natural origin (not common) or acquired as externally supplied synthetic compounds. Nucleases of type I and II, i.e., the enzymes that can hydrolyze PS-oligo to nucleoside 5′- and 3′-O-phosphorothioates (NMPS), respectively, have been identified. Most nucleases (snake venom phosphodiesterase, nucleotide phosphodiesterase/pyrophosphatase, and restriction endonucleases) are RP-selective, i.e., they degrade internucleotide phosphorothioate bonds in which the phosphorus atom has the RP absolute configuration rather than those of the SP configuration. There are also some nucleases that degrade only phosphorothioate bonds of the SP configuration (e.g., nuclease P1). Regarding the metabolism of NMPS in vivo, it has been suggested that extracellular NMPS can be converted to the corresponding nucleoside by ecto-5′-nucleotidase (Koziołkiewicz et al. 2001). Krakowiak et al. found that Hint-1 (histidine triad nucleotide-binding protein 1, a phosphoramidase belonging to the histidine triad superfamily) catalyzes the

R. Pawłowska and P. Guga

conversion of adenosine 5′-O-phosphorothioate (AMPS) to adenosine 5′-O-phosphate (AMP) (Ozga et al. 2010). In addition, not only AMPS but also other ribonucleoside and 2′-deoxyribonucleoside phosphorothioates are desulfurized by Hint-1 at the following relative rates: GMPS > AMPS > dGMPS > CMPS > UMPS > dAMPS >> dCMPS > TMPS. Hydrogen sulfide, which is thought to be a gaseous mediator in mammalian cells along with nitric oxide (NO) and carbon monoxide (CO), was released during the reaction. This process could have remarkable biological consequences, and regulation of H2S production could be of therapeutic value.

Biological Activity of Synthetic Nucleic Acids Containing Phosphorothioate Backbone

As mentioned in the introduction, phosphorothioate linkages in oligonucleotides provide nuclease resistance and facilitate cellular uptake and bioavailability. Therefore, numerous therapeutic approaches based on phosphorothioate-modified nucleic acids, both DNA and RNA, have been developed. Some of the observed effects are striking. For example, Jemielity and coworkers showed that phosphorothioate modification of the poly(A) tail of mRNA prevented deadenylation without affecting protein expression, allowing the rate-limiting step of mRNA decay to be tuned (Strzelecka et al. 2020). Phosphorothioate linkages introduced into mRNA have also been shown to accelerate translation initiation and result in more efficient protein synthesis (Kawaguchi et al. 2020). PS-modified antisense or siRNA oligonucleotides have been intensively studied for many years in the context of their biodistribution (Braasch et al. 2004), cellular uptake (Detzer and Sczakiel 2009), improvement of biophysical properties, and biological activity (Wan et al. 2014), as well as potency and safety (Østergaard et al. 2020) and pharmacodynamic and pharmacokinetic properties (Berk et al. 2021).

However, it is well known that not only the presence of phosphorothioate linkages but also their stereochemistry is important for the pharmacological properties of potential nucleic acid-based drugs, in particular their tissue penetration, in vivo half-life, and efficacy. Single-stranded [SP-PS]-oligomers have been shown to have higher stability (in human plasma) and lipophilicity than their RP-counterparts (Koziołkiewicz et al. 1997; Iwamoto et al. 2017). On the other hand, [RP-PS]-DNA/RNA complexes are more susceptible to RNase H-dependent degradation (Koziolkiewicz et al. 1995). The chirality of the PS linkage also has implications for immune activation. PS-oligodeoxyribonucleotides with a d(CpG) motif bearing phosphorothioate linkages with RP configuration at the phosphorus atoms cause stronger immune stimulation than their SP-counterparts (Krieg et al. 2003).

In addition to these general features observed for P-stereoregular PS-oligo, some advantages of maintaining certain patterns of stereochemistry in oligonucleotide molecules have been noted. For example, control of the chirality of PS-DNA segments of gapmer antisense oligonucleotides allows modulation of RNase H1 activity and is important for the safety and efficacy of such gapmers (Wan et al. 2014; Østergaard et al. 2020). It has also been shown that the efficiency of RNA cleavage by RNase H1 can be altered by using a specific stereochemical configuration pattern on the PS-DNA strand. The presence of the 3′-SPSPRP-5′ configuration triplet (in the antisense strand) seems to be preferred for better antisense activity. This triplet should be located at the position of the DNA-RNA duplex bound directly by RNase H1 (two nucleotides upstream from the RNA cleavage site). This specific stereochemical pattern of the DNA strand, the so-called SSR code, has been shown to increase the efficiency of RNA cleavage in vitro and the durability of the antisense effect in vivo (Iwamoto et al. 2017).

Some in vitro and in vivo results suggest that the stereochemistry of phosphorothioate bonds also affects the potency of siRNA oligomers. Interestingly, in contrast to single-stranded oligomers, duplexes with a higher proportion of linkages with RP configuration exhibit higher serum nuclease resistance and are more effective inhibitors in cells than their SP-counterparts (Jahns et al. 2015). This observation is probably related to the greater thermodynamic stability of duplexes containing the phosphorothioate modification in RP configuration, which was reported previously (Boczkowska et al. 2002). However, it is more advantageous to introduce modifications at selected positions of siRNA than to use per-modified molecules. Fully phosphorothioated siRNAs have been shown to cause off-target effects and to have lower inhibitory activity than unmodified siRNAs.

It has been reported that the introduction of P-stereodefined phosphorothioate bonds at the 5′ and/or 3′ termini of siRNA molecules affects their pharmacological properties in vivo. Two effects are caused by PS units with opposite absolute configuration of the P atoms. The presence of the phosphorothioate modification with SP configuration at the 3′-end of the antisense siRNA strand (in vitro and in vivo) leads to increased Argonaute 2 (Ago2) loading. In addition, the introduction of stereodefined phosphorothioate linkages can lead to increased stability to hydrolytic degradation by certain enzymes. While PS-oligo containing a phosphorothioate bond with SP configuration are known to be resistant to 3′-exonucleases, their RP-isomers in combination with a 2′-F modification have been shown to be resistant to 5′-exonucleases. Therefore, the best pharmacokinetic and pharmacodynamic profiles were obtained when RP-diastereomers at the 5′-end and SP-diastereomers at the 3′-end of the antisense siRNA strand were used, probably due to a combination of both positive effects: better protection against exonuclease degradation and increased Ago2 loading in vivo (Jahns et al. 2022).

Despite the above advantages of introducing PS-oligo into cells, some reports have shown their cytotoxicity, induction of stress granule formation, and other negative biological effects. These nonspecific properties are attributed to intracellular interactions between the oligonucleotides and proteins, which are controlled by the phosphorothioate components. Such binding is much more frequent than for natural oligomers. To date, many proteins with different functions and localizations, including both membrane and intracellular proteins, have been found to bind PS-modified nucleic acids (for a review, see Crooke et al. 2020, https://doi.org/10.1093/nar/gkaa299).

Binding to protein components affects the cellular uptake of PS-oligo and their subcellular distribution (Bailey et al. 2017), and in this way affects the biological activity of PS-oligo in cells and tissues. It should be noted that proteins with both stimulatory and inhibitory effects on the activity of PS-oligo have been described, but the role of individual proteins in these processes requires further investigation. The interactions of PS-oligo with proteins can be phosphorothioate- and sequence-dependent (Liu et al. 2018; Lutz et al. 2020), or occur in a non-sequence-specific manner. PS-dependent restriction endonucleases, which accurately recognize and cleave PS-modified DNA, might differ in their requirement for PS enrichment rate and in their divalent cation dependencies (Lutz et al. 2020). Specific recognition of PS-oligo might be associated with the presence of a sulfur-binding domain (SBD) (Liu et al. 2018). The nonspecific interactions of PS-oligo with certain intracellular proteins may lead to the formation of nuclear inclusions associated with cytotoxicity (Flynn et al. 2022). In addition, 2′-O-methyl PS-oligo have been shown to interfere with rRNA processing, disrupt protein distribution from subnuclear bodies, and impair transcription without RNase H1 involvement. It is thought that these disruptions may occur through enhanced nonspecific interactions with proteins, but the exact mechanism and the role of the 2′-OMe modification in these processes are still unclear.

Initial in vivo studies on the biodistribution of PO/PO and PO/PS siRNA duplexes in mice yielded quite similar results. After the first 4 hours, some modest differences in distribution were observed in spleen, heart, and lung, and after 24 hours, some weak differences in distribution were observed in liver and kidney. However, the differences were too small to claim that the introduction of PS linkages into siRNA could be important for its in vivo distribution (Braasch et al. 2004). Recent studies on siRNAs with highly hydrophobic PS-modified strands have shown that phosphorothioate content may have an impact on siRNA distribution and potency and pointed to the liver and kidney as major sites of PS-oligo accumulation (Miller et al. 2018). However, the efficacy of siRNA has been shown to depend not only on tissue accumulation, but also on endosomal escape and subsequent intracellular distribution (Biscans et al. 2020).

Summarizing the data obtained for PS-nucleic acids in the context of their therapeutic applications, it should be noted that although many pharmacological advantages can be obtained by the thoughtful introduction of phosphorothioate groups at specific positions in the nucleic acid scaffold, their effects on cellular uptake, tissue distribution and transport, intracellular trafficking, efficacy, and toxicity must be assessed on a case-by-case basis.

Physiological Phosphorothioate Modification of Nucleic Acids

The discovery of phosphorothioate modification of nucleic acids in living organisms was a breakthrough event (Wang et al. 2007). Intensive research on the mechanism of sulfur incorporation into DNA strands in living systems provided evidence that this is a postreplicative modification carried out by products of the dnd genes. The name dnd refers to the characteristic DNA degradation profile observed during electrophoresis, known as the Dnd phenotype. This cluster consists of five elements: dndA, dndB, dndC, dndD, and dndE (Zhou et al. 2005; Wang et al. 2007). The dndA gene encodes cysteine desulfurase (DndA) (You et al. 2007), which is very similar to the L-cysteine desulfurase (IscS) from Escherichia coli. Both proteins are capable of forming an iron-sulfur cluster protein and may play a similar role in introducing sulfur modifications (An et al. 2012). DndA transfers a sulfur atom from L-cysteine to the Fe-S cluster of DndC. The DndC protein, which has ATP pyrophosphatase activity, can bind directly to IscS or DndA and to DndD. Thus, DndC is an essential element for the uptake of sulfur from IscS/DndA and its transfer to the next element of the complex, the DndD protein. No direct interaction between IscS and DndD was observed. DndD is the protein responsible for the incorporation of sulfur atoms into the DNA strand. It exhibits ATPase activity and has a strong affinity for the DndE protein (Xiong et al. 2015).

Physiological phosphorothioation of DNA in bacteria is sequence- and stereospecific (Liang et al. 2007). Initially, the 5′-d(GGGCCGCCG)-3′ sequence was identified as the phosphorothioate insertion site in S. lividans, with the central 4-bp core (5′-d(GGCC)-3′) highly conserved. Since then, intensive research has been conducted to identify new sites of physiological insertion of the PS function, as well as their genomic abundances and distributions (Wang et al. 2011). To date, several other modified sequences have been identified, both double- and single-stranded: 5′-d(GPSAAC)-3′/5′-d(GPSTTC)-3′, 5′-d(GPSGCC)-3′, 5′-d(CPSCA)-3′, and 5′-d(CPSC)-3′ (Xiong et al. 2020). Phosphorothioate modifications have been identified in several prokaryotic genomes (Wang et al. 2011), but their distribution is variable.
The dnd genes responsible for this modification are present in archaea (Xiong et al. 2019) and bacteria (Tong et al. 2018). The evolutionary scenario of the Dnd system assumes that Cyanobacteria (probably Nostocales) are the source of the phosphorothioate modification. This sulfur-based metabolism is thought to be ancient, to have evolved in response to rapid oxidation on Earth after the Great Oxygenation Event (GOE), and to have been used originally by Cyanobacteria as an antioxidant system to defend against reactive oxygen species (ROS) (Jian et al. 2021).

Recently, an alternative system has been discovered in which the PS group is incorporated into the 5′-d(CC)-3′ motifs of single-stranded bacterial DNA molecules in Vibrio cyclitrophicus, Escherichia coli, and Streptomyces yokosukanensis (Xiong et al. 2020). This phenomenon, referred to as Ssp, is based on an SspABCD-SspE system. The first element, the SspA protein, has cysteine desulfurase activity, similar to DndA in the dnd system. The SspC protein has ATPase activity and provides energy, similar to DndD, while SspD has ATP pyrophosphatase activity. The similarities between the action of DndA and SspA and of DndC and SspD suggest a resemblance in the sulfur mobilization stage, but other steps, such as DNA target recognition and sulfur incorporation, differ. The critical step in DNA phosphorothioation by the Ssp system appears to be the nicking of the DNA strand by the SspB protein. It has been shown that deficiency at this stage abrogates the incorporation of sulfur into the DNA backbone via the Ssp system (Xiong et al. 2020).

Isotopic labeling and mass spectrometry analysis have shown that the exchange of sulfur atoms in PS-oligo is a natural phenomenon that occurs during bacterial growth. The in vivo dynamics of this process were quantified by isotopic labeling of the culture medium followed by mass spectrometry analysis of bacterial DNA fragments. Isotopes of sulfur [34S] and nitrogen [15N] were delivered in the original bacterial growth medium (as Na2[34S]O4 and [15N]H4Cl, respectively), so that the newly synthesized DNA had nucleobases labeled with [15N] and PS-linkages labeled with [34S]. The medium was then replaced with normal medium containing the natural [14N] and [32S] isotopes, and bacterial growth continued. LC-MS/MS analysis of the isolated PS-dinucleotides revealed the presence of [14N]- and [32S]-labeled dinucleotides generated during the second step of DNA replication, as well as [15N]- and [32S]-labeled dinucleotides, indicating the replacement of sulfur atoms in the original double-labeled [15N, 34S]-DNA strands. In addition, [14N]- and [34S]-labeled dinucleotides were also detected, likely resulting from reuse of [34S] from nutrient pools or other PS modification sites. During bacterial growth under normal conditions, sulfur exchange occurs at a rate of about 2% per hour, whereas the rate rose to 3.8%/h for Escherichia coli and 10%/h for Salmonella enterica after hypochlorous acid (HOCl) treatment. Phosphorothioate modification is thus a highly dynamic process that can be modulated under oxidative stress conditions (Kellner et al. 2017).

Physiological DNA phosphorothioate modification is widespread among bacterial genomes, and its functions are related to viral resistance, antioxidant properties (Jian et al. 2021), and protection against stress conditions (Yang et al. 2017).
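
The sulfur exchange rates quoted above can be translated into an approximate lifetime of an individual sulfur label. The following sketch is an illustration only and is not the analysis used in the cited study: it assumes simple first-order turnover, with the reported per-hour percentage taken directly as the rate constant, and the function names are hypothetical.

```python
import math

def fraction_remaining(rate_per_h, hours):
    """Fraction of the original label left, assuming first-order exchange."""
    return math.exp(-rate_per_h * hours)

def half_life_h(rate_per_h):
    """Time (h) for half of the label to be exchanged, same assumption."""
    return math.log(2) / rate_per_h

# Rates from the text, expressed as fractions per hour
rates = {
    "normal growth": 0.02,
    "E. coli + HOCl": 0.038,
    "S. enterica + HOCl": 0.10,
}
for label, k in rates.items():
    print(f"{label}: half-life ~ {half_life_h(k):.1f} h, "
          f"{fraction_remaining(k, 24):.0%} of label left after 24 h")
```

Under this simplified model, the label half-life shortens from roughly 35 h during normal growth to about 18 h and 7 h after HOCl treatment of E. coli and S. enterica, respectively, which illustrates how strongly oxidative stress accelerates turnover.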
Phosphorothioate modification appears to be an epigenetic system for regulating gene expression that can control a variety of bacterial features (Tong et al. 2018); however, the exact mechanism of regulating transcription of non-dnd genes in vivo is still poorly understood. It is proposed that epigenetic regulation of gene transcription occurs through a change in binding affinity between transcriptional regulators and their cognate operator DNA caused by conformational changes in thio-modified DNA (Jian et al. 2021). A negative correlation between the presence of dnd genes and the occurrence of prophages has been observed in numerous bacteria. In addition, it has been shown that PS modification can affect DNA replication and gene transcription of some phages, such as phage SW1. Activation of phages triggered by PS is likely caused by a change in the binding affinity of their repressor. Such activation has been experimentally confirmed for some filamentous phages, and it has been suggested that this mechanism may be widespread among taxonomically diverse bacteriophages.

The phosphorothioate modification in bacterial DNA is thought to exert a protective function against the invasion of foreign DNA. Similar to DNA methylation, the PS modification functions as part of a bacterial restriction-modification (R-M) system that protects cells from phage infection (Xiong et al. 2020). This type of prokaryotic epigenetic regulation usually involves two elements: sequence-specific DNA modification and a complementary restriction endonuclease. In the most studied methylation-based R-M system, the restriction endonuclease eliminates the unmodified invasive DNA, while the host DNA (marked by methylation) remains unaffected. Based on the nucleolytic stability of phosphorothioate oligonucleotides, a similar mechanism for PS modification has been hypothesized. Initial studies appear to support this hypothesis and suggest that site-specific phosphorothioation can protect the host's own DNA from the host-specific restriction directed against foreign genetic material. A Dnd system organized in this way, responsible for the recognition and digestion of unmarked DNA, in which the DndABCDE component is used to insert sulfur into the DNA backbone and DndFGH functions as the restriction part, has been proposed (Xu et al. 2010; Xiong et al. 2020). Such a system, based on thio-modification of DNA instead of its methylation, may have been created by nature as an additional defense against foreign DNA invasion. It is not clear whether this protective system can be transferred between genomes, as is the case with some other R-M gene complexes commonly found on plasmids and prophages.

There are also some differences between the PS-based and traditional R-M systems. Studies on the PS-based restriction system in Salmonella revealed that although the PS modifications protect host cells from unmodified plasmid DNA (Xu et al. 2010), the presence of restriction genes is not lethal in a mutant lacking the components for phosphorothioate modification. On the other hand, overexpression of restriction genes leads to host cell death despite the PS-modified DNA (Cao et al. 2014). These results illustrate the atypical character of PS-dependent Dnd restriction systems. Further research is needed to elucidate more details of this phenomenon. Both the Dnd and Ssp systems are defense barriers with extensive antiphage activities and provide protection against phages in a PS-dependent manner.
Despite certain similarities with the Dnd system, evidence suggests that the SspABCD-SspE system prevents phage spread in ways other than direct DNA cutting or degradation. SspE presumably acts in the early stages of infection and requires both NTPase and nicking endonuclease activity. SspE appears to impair phage replication by nicking phage DNA rather than inducing the dsDNA degradation typical of other R-M systems. However, antiphage activity depends on sequence-specific phosphorothioation. Interestingly, in a given genome, not all sites containing the target sequences are modified. In Escherichia coli B7A, where PS modifications were detected at double-stranded 5′-GAAC-3′/5′-GTTC-3′ segments, only 12% of these sequences were modified. In the single-stranded DNA of Vibrio cyclitrophicus FF75, the proportion of modified target sites does not exceed 14% (Xiong et al. 2020).

Combined analysis of the epigenome, transcriptome, and metabolome in bacteria has shown that PS modifications can also affect cellular redox state and stress resistance (Tong et al. 2018). The antioxidant role of PS-DNA, however, is not clear-cut. It is believed that PS-DNA plays an antioxidant role in host bacteria by protecting them from oxidative agents such as H2O2 (Xie et al. 2012). However, the modification is highly labile under oxidative stress, and its oxidation has also been reported to lead to strand breaks and lethal genomic instability (Kellner et al. 2017), so the behavior of PS-DNA under oxidative conditions is consistent with both protection of and damage to the genome. Several scenarios for the response of PS-DNA to oxidative stress are possible. The P-S bond can be oxidized by H2O2 or peracetic acid, and when the sulfur atom is replaced by oxygen, the phosphodiester bond is restored. However, when the sulfur atom is removed, the corresponding H-phosphonate group is formed, which is susceptible to hydrolysis, leading to strand breakage (Xie et al. 2012). It has been suggested that the PS linkage may act as a reducing agent in vivo, using the sulfur atom for consumption and neutralization of H2O2 (Xie et al. 2012). The antioxidant properties of PS-DNA have also been described as a direct effect of the binding of the DndCDE-FeS cluster (DndCDE with an iron-sulfur cluster) to PS-DNA, where the DndCDE-FeS cluster acts as a catalase (exerting H2O2 decomposition activity) and in this way protects the bacterial genome from H2O2-induced damage (Pu et al. 2019).

Phosphorothioate DNA affects not only the host bacteria. PS-DNA is present in pathogens and in the gut microbiome and appears to affect other organisms after ingestion or infection. A PS-DNA-rich diet of Caenorhabditis elegans resulted in higher resistance to various stressors, a decrease in ROS levels, and an increase in animal activity. This type of diet reduced age-related changes and resulted in a significant increase in nematode life span. The expression of numerous genes was altered in C. elegans fed a diet containing PS-DNA compared to controls. Overexpression has been observed for numerous genes related to antioxidant functions, such as sod, gpx, and genes for heat shock proteins (hsp), as well as genes for the stress response, such as jnk-1. Interestingly, expression of certain glutathione S-transferase (GST) genes was also increased, suggesting that a long-term PS-DNA-rich diet may have effects on sulfur metabolism. On the other hand, the expression of certain genes related to the promotion of aging, such as the glp-1 gene, was reduced.
In general, analysis of transcriptomic changes showed the most significant differences in the expression of genes whose products are related to antioxidant properties, stress response, and aging. Upregulation was mainly observed in signaling pathways related to worm neuroactivity, calcium signaling, and purine metabolism, while genes related to DNA repair and DNA replication were downregulated. These results shed new light on the possible consequences of long-term feeding with PS-DNA-containing bacteria (Huang et al. 2021).

Bacteria containing PS-modified genes are widely distributed in various parts of the human body, such as the intestine, oral and urogenital systems, and skin (Sun et al. 2020). Six dinucleotides containing a phosphorothioate modification, d(CPSG), d(CPST), d(APSG), d(TPSG), d(GPSC), and d(APST), have been identified in DNA from human feces. A variety of related features suggest the importance of the PS modification in the bacterial and human worlds. Important aspects to be investigated are the possible uptake of PS-containing DNA fragments from dead bacteria by surrounding cells and the long-term consequences of consuming PS-DNA-rich bacteria.

The origin of PS modification in eukaryotic cells and the possible effects of bacterial PS-oligonucleotides on humans are not yet clear, but a human protein, SAMHD1, has recently been identified that could be a target for bacterial PS-DNA (Yu et al. 2021). SAMHD1 is an enzyme that causes depletion of dNTPs in cells by catalyzing the hydrolysis of dNTPs to nucleosides and inorganic triphosphate, thereby preventing reverse transcription of viruses during infection. A process critical for the hydrolytic activity of SAMHD1 is the binding of GTP and dNTP at the two allosteric binding sites, leading to oligomerization of the protein to the catalytically active tetramer. It has been shown that SAMHD1 may bind PS-DNA oligonucleotides with the motif 5′-d(GPSN)-3′ with RP chirality. Short PS oligonucleotides can bind to the SAMHD1 protein with an affinity comparable to that of longer RNA and DNA oligonucleotides. Upon binding of DNA with the 5′-d(GPSN)-3′ motif, enhancement of SAMHD1 antiretroviral activity was observed (Yu et al. 2021). The recognized PS sequence is identical to that produced by bacteria to defend against viral attack. These findings have revealed a possible mechanism of the immunomodulatory effect of PS nucleic acids and may shed new light on the potential impact of acquired bacterial PS-DNA on the human body.

References

An X, Xiong W, Yang Y, Li F, Zhou X, Wang Z, Deng Z, Liang J (2012) A novel target of IscS in Escherichia coli: participating in DNA phosphorothioation. PLoS One 7:e51265. https://doi.org/10.1371/journal.pone.0051265
Auffinger P, Westhof E (2001) RNA solvation: a molecular dynamics simulation perspective. Biopolymers 56:266–274. https://doi.org/10.1002/1097-0282(2000)56:4%3C266::AID-BIP10027%3E3.0.CO;2-3
Bailey JK, Shen W, Liang XH, Crooke ST (2017) Nucleic acid binding proteins affect the subcellular distribution of phosphorothioate antisense oligonucleotides. Nucleic Acids Res 45:10649–10671. https://doi.org/10.1093/nar/gkx709
Benimetskaya L, Tonkinson JL, Koziołkiewicz M, Karwowski B, Guga P, Zelser R, Stec WJ, Stein CA (1995) Binding of phosphorothioate oligonucleotides to basic fibroblast growth factor, recombinant soluble CD4, laminin and fibronectin is P-chirality independent. Nucleic Acids Res 23:4239–4245
Berk C, Civenni G, Wang Y, Steuer C, Catapano CV, Hall J (2021) Pharmacodynamic and pharmacokinetic properties of full phosphorothioate small interfering RNAs for gene silencing in vivo. Nucleic Acid Ther 31:237–244. https://doi.org/10.1089/nat.2020.0852
Biscans A, Caiazzi J, Davis S, McHugh N, Sousa J, Khvorova A (2020) The chemical structure and phosphorothioate content of hydrophobically modified siRNAs impact extrahepatic distribution and efficacy. Nucleic Acids Res 48:7665–7680. https://doi.org/10.1093/nar/gkaa595
Boczkowska M, Guga P, Karwowski B, Maciaszek A (2000) Effect of P-chirality of internucleotide bonds on B-Z conversion of stereodefined self-complementary phosphorothioate oligonucleotides of [PS]-d(CG)4 and [PS]-d(GC)4 series. Biochemistry 39:11057–11064. https://doi.org/10.1021/bi000638n
Boczkowska M, Guga P, Stec WJ (2002) Stereodefined phosphorothioate analogues of DNA: relative thermodynamic stability of model PS-DNA/DNA and PS-DNA/RNA complexes. Biochemistry 41:12483–12487. https://doi.org/10.1021/bi026225z
Braasch DA, Paroo Z, Constantinescu A, Ren G, Oz OK, Mason RP, Corey DR (2004) Biodistribution of phosphodiester and phosphorothioate siRNA. Bioorg Med Chem Lett 14:1139–1143. https://doi.org/10.1016/j.bmcl.2003.12.074
Breaker RR (2000) Making catalytic DNAs. Science 290:2095–2096. https://doi.org/10.1126/science.290.5499.2095
Cao B, Cheng Q, Gu C, Yao F, DeMott MS, Zheng X, Deng Z, Dedon PC, You D (2014) Pathological phenotypes and in vivo DNA cleavage by unrestrained activity of a phosphorothioate-based restriction system in Salmonella. Mol Microbiol 93:776–785. https://doi.org/10.1111/mmi.12692
Crooke ST (ed) (1998) Handbook of experimental pharmacology: antisense research and applications, vol 131. Springer, Berlin/Heidelberg
Crooke ST, Wang S, Vickers TA, Shen W, Liang X (2017) Cellular uptake and trafficking of antisense oligonucleotides. Nat Biotechnol 35:230–237. https://doi.org/10.1038/nbt.3779
Dertinger D, Behlen LS, Uhlenbeck OC (2000) Using phosphorothioate-substituted RNA to investigate the thermodynamic role of phosphates in a sequence specific RNA-protein complex. Biochemistry 39:55–63. https://doi.org/10.1021/bi991769v
Detzer A, Sczakiel G (2009) Phosphorothioate-stimulated uptake of siRNA by mammalian cells: a novel route for delivery. Curr Top Med Chem 9:1109–1116. https://doi.org/10.2174/15680260978963088
Detzer A, Overhoff M, Mescalchin A, Rompf M, Sczakiel G (2008) Phosphorothioate-stimulated cellular uptake of siRNA: a cell culture model for mechanistic studies. Curr Pharm Des 14:3666–3673. https://doi.org/10.2174/138161208786898770
Eckstein F (2000) Phosphorothioate oligodeoxynucleotides: what is their origin and what is unique about them? Antisense Nucleic Acid Drug Dev 10:117–121. https://doi.org/10.1089/oli.1.2000.10.117
Egli M, Tereshko V, Teplova M, Minasov G, Joachimiak A, Sanishvili R, Weeks CM, Miller R, Maier MA, An H, Dan Cook P, Manoharan M (1998) X-ray crystallographic analysis of the hydration of A- and B-form DNA at atomic resolution. Biopolymers 48:234–252. https://doi.org/10.1002/(SICI)1097-0282(1998)48:4%3C234::AID-BIP4%3E3.0.CO;2-H
Fearon KL, Hirschbein BL, Chiu C-Y, Quijano MR, Zon G (1997) Phosphorothioate oligodeoxynucleotides: large scale synthesis and analysis, impurity characterization, and the effects of phosphorus stereochemistry. Ciba Foundation Symp 209:19–31. https://doi.org/10.1002/9780470515396.ch3
Fire A, Xu S, Montgomery MK, Kostas SA, Driver SE, Mello CC (1998) Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature 391:806–811. https://doi.org/10.1038/35888
Flynn LL, Li R, Pitout IL, Aung-Htut MT, Larcher LM, Cooper JAL, Greer KL, Hubbard A, Griffiths L, Bond CS, Wilton SD, Fox AH, Fletcher S (2022) Single stranded fully modified-phosphorothioate oligonucleotides can induce structured nuclear inclusions, alter nuclear protein localization and disturb the transcriptome in vitro. Front Genet 6:791416. https://doi.org/10.3389/fgene.2022.791416
Frey PA, Sammons RD (1985) Bond order and charge localization in nucleoside phosphorothioates. Science 228:541–545. https://doi.org/10.1126/science.2984773
Guga P, Koziołkiewicz M (2011) Phosphorothioate nucleotides and oligonucleotides – recent progress in synthesis and application. Chem Biodivers 8:1642–1681. https://doi.org/10.1002/cbdv.201100130
Guga P, Stec WJ (2003) Synthesis of phosphorothioate oligonucleotides with stereodefined phosphorothioate linkages. In: Beaucage SL, Bergstrom DE, Glick GD, Jones RA (eds) Current protocols in nucleic acid chemistry. Wiley, Hoboken, pp 4.17.1–4.17.28. https://doi.org/10.1002/0471142700.nc0417s14
Guga P, Boczkowska M, Janicka M, Maciaszek A, Kuberski S, Stec WJ (2007a) Unusual thermal stability of RNA/[All-RP-PS]-DNA/RNA triplexes containing a homopurine DNA strand. Biophys J 92:2507–2515. https://doi.org/10.1529/biophysj.106.099283
Guga P, Janicka M, Maciaszek A, Rębowska B, Nowak G (2007b) Hoogsteen paired homopurine [RP-PS]-DNA and homopyrimidine RNA strands form a thermally stable parallel duplex. Biophys J 93:3567–3574. https://doi.org/10.1529/biophysj.107.108183
Horton TE, Maderia M, DeRose VJ (2000) Impact of phosphorothioate substitutions on the thermodynamic stability of an RNA GAAA tetraloop: an unexpected stabilization. Biochemistry 39:8201–8207. https://doi.org/10.1021/bi000141d









Part VI Analytical Methods and Applications of Nucleic Acids

46. Aptamer Molecular Evolution for Liquid Biopsy

Lingling Wu, Qi Niu, and Chaoyong Yang

Contents
Introduction
Molecular Evolution of Aptamers
  Generation of Oligonucleotide Library with Increased Diversity
  Selection of Aptamer Candidates from Library
  Identification and Characterization of Aptamer Candidates
  Synthesis and Modification
Aptamer-Based Detection of CTCs
  CTC Isolation and Enrichment
  Release of CTCs
  Detection of CTCs
Aptamer-Based Detection of EVs
  Isolation-Free Homogeneous Detection
  Detection on Solid-Liquid Interface
Conclusion
References

L. Wu
Institute of Molecular Medicine, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China

Q. Niu
The MOE Key Laboratory of Spectrochemical Analysis & Instrumentation, the Key Laboratory of Chemical Biology of Fujian Province, State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, China

C. Yang (*)
Institute of Molecular Medicine, Shanghai Key Laboratory for Nucleic Acid Chemistry and Nanomedicine, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
The MOE Key Laboratory of Spectrochemical Analysis & Instrumentation, the Key Laboratory of Chemical Biology of Fujian Province, State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, China

© Springer Nature Singapore Pte Ltd. 2023
N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_52

Abstract

Liquid biopsy can provide comprehensive, real-time physiological and pathological information through noninvasive, longitudinal monitoring, facilitating precision medicine and clinical investigation. However, the rarity of circulating targets in extremely complex body fluid matrices poses daunting challenges for liquid biopsy. Aptamers are single-stranded oligonucleotides whose unique tertiary structures allow them to bind their targets with excellent affinity and selectivity. The structural and functional merits of aptamers, including facile evolution, convenient functionalization and assembly, and adjustable affinity, render them ideal recognition ligands for liquid biopsy. This chapter focuses on the molecular evolution of aptamers for liquid biopsy. First, recent progress in molecular evolution strategies for aptamers is summarized, covering each step of the SELEX (systematic evolution of ligands by exponential enrichment) procedure. Then, state-of-the-art approaches to aptamer-based liquid biopsy are introduced, with emphasis on the isolation and detection of circulating tumor cells and extracellular vesicles. Finally, the challenges and future perspectives in this field are discussed.

Keywords

Aptamers · Molecular evolution · Liquid biopsy · Circulating tumor cells · Extracellular vesicles

Introduction

Cancer is a set of disorders characterized by abnormal cell growth, local invasion, and distant metastasis. It is a leading cause of death worldwide and one of the most important barriers to increasing life expectancy and quality of life in the twenty-first century. Over the past century, many efforts have been devoted to improving the clinical management of cancer patients and to investigating the biological mechanisms of tumorigenesis and tumor development. To date, tissue biopsy, which requires tissue extraction, remains the gold standard for cancer diagnosis. However, tissue biopsy requires invasive sampling methods such as surgery and fine-needle aspiration, and depends on medical imaging of observable lesions. These features make tissue biopsy unsuitable for routine disease screening, early diagnosis, and regular monitoring of treatment or recurrence. More importantly, tissue biopsy acquires only local and temporal biological information about tumors, because the extracted tissue samples reflect only partial information at the time of sampling. Thus, accurate diagnosis and therapy guidance for cancers remain an unmet need.

Liquid biopsy, also known as fluid biopsy, has attracted much attention in clinical and scientific research. It assesses individual physiological and pathological states by the detection and analysis of circulating targets from body fluids. Circulating tumor cells (CTCs), tumor-derived extracellular vesicles (T-EVs), and cell-free circulating tumor DNA, which are shed or released from solid tumor tissues, are considered promising liquid biopsy targets for cancer patients (Siravegna et al. 2017). Liquid biopsy surpasses traditional tissue biopsy in cancer clinical management and investigation with two prominent advantages. First, the noninvasive and convenient sampling makes liquid biopsy accessible in resource-limited settings for cancer screening and diagnosis. It also enables repeated sampling and examination over the course of therapy and clinical follow-up for real-time monitoring of treatment and recurrence. Second, circulating targets carry both the phenotypic and genetic compositions of primary and metastatic tumors, so liquid biopsy provides comprehensive information for cancer diagnosis and therapy guidance. In particular, liquid biopsy is suitable for microscopic metastatic lesions that are inaccessible to tissue biopsy.

Although promising, liquid biopsy still faces daunting challenges. First, tumor-derived circulating targets commonly exist in complex body fluid matrices at low concentrations, which requires efficient and selective isolation and gentle release of circulating targets for their sensitive and accurate detection. Second, the rarity and small size of circulating targets call for signal amplification strategies. Last, the inherent heterogeneity of circulating targets makes it necessary to analyze targets at the single-cell or single-particle level.
Affinity recognition has been widely applied in liquid biopsy by harnessing specific interactions between recognition ligands and biomarkers on the membrane surface of circulating targets. For example, affinity isolation enables efficient and selective isolation of circulating targets and is usually followed by immunolabeling for their identification and detection. However, antibodies, the most common recognition ligands, are difficult to regulate in affinity and to assemble into defined structures, which is unfavorable for gentle release and signal amplification.

Nucleic acids make up the genetic material of living systems and encode, store, transmit, and express the information of living cells. Beyond serving as information carriers, single-stranded nucleic acids with tertiary structures, known as aptamers, can act as recognition ligands (Ma et al. 2015). Compared with other recognition ligands, aptamers possess several distinct advantages in liquid biopsy. First, aptamers are evolved by an in vitro molecular evolution procedure termed SELEX (systematic evolution of ligands by exponential enrichment) (Ellington and Szostak 1990; Tuerk and Gold 1990). Theoretically, aptamers against any target can be generated by SELEX given a sufficiently rich library and appropriate selection conditions. In particular, selection efficiency, accuracy, and aptamer properties can be greatly improved by integrating new techniques with SELEX (Gotrik et al. 2016). Thus, by taking tumor biomarkers or circulating targets as selection targets, diverse aptamers can be obtained for liquid biopsy. Second, aptamers can be chemically synthesized in large batches with high reproducibility and stability through facile solid-phase phosphoramidite chemistry (Ma et al. 2015). In particular, functional moieties can be introduced site-specifically during chemical synthesis (Vaught et al. 2010). Modification with active groups (e.g., amine, thiol, and azide) and molecular labels (e.g., biotin and fluorescent dyes) enables diverse interface functionalization of aptamers and signal readout, respectively (Wu et al. 2021). Third, aptamer affinity relies on folded tertiary conformations, which can be disrupted by physical and chemical stimuli, allowing the convenient and gentle release of circulating targets for downstream analysis (Wu et al. 2020b). Last, aptamers, being nucleic acids, follow the principle of Watson-Crick base pairing. They therefore share the merits of programmable structure design and engineering for multivalent assembly and for signal transduction and amplification, allowing efficient capture and sensitive detection of circulating targets (Huang et al. 2019b; Zhao et al. 2012). With these superior properties, aptamers are considered ideal recognition ligands for liquid biopsy.

In light of the outstanding properties of aptamers and their advantages in liquid biopsy applications, this chapter focuses on the molecular evolution of aptamers for liquid biopsy. First, recent progress in molecular evolution for aptamer selection is summarized. Then, aptamer-based liquid biopsy methods are described, mainly including aptamer-based isolation, release, and detection of CTCs and extracellular vesicles (EVs). Finally, future possibilities and challenges in the field of aptamer-based liquid biopsy are discussed.

Molecular Evolution of Aptamers

Aptamers are single-stranded functional oligonucleotides whose unique tertiary structures allow them to bind target molecules with high affinity and selectivity. They are regarded as alternatives to antibodies as recognition ligands, with distinctive properties such as small size, low cost, high stability, and easy synthesis and modification. Aptamers are evolved from a random oligonucleotide library by the in vitro SELEX procedure, a technology first reported in 1990 (Ellington and Szostak 1990; Tuerk and Gold 1990). The iterative SELEX cycle is composed of the following four major steps (Fig. 1).

1. Generation of a random oligonucleotide library. A library of 10¹⁴–10¹⁶ single-stranded oligonucleotides is chemically synthesized in vitro; each member contains a central random sequence of 20–50 bases flanked by fixed primer-binding sequences at each end. The library has a theoretical capacity of 4ⁿ sequences (n is the number of random nucleotides); for n = 25 this is already about 10¹⁵, comparable to the 10¹⁴–10¹⁶ molecules actually synthesized. Such enormous library capacity allows the evolution of diverse aptamer candidates with high performance.
2. Selection and enrichment. The random oligonucleotide library is incubated with targets under defined selection conditions (e.g., temperature, ionic strength, and incubation time). The sequences bound to targets are separated from those lacking binding properties and amplified by PCR or RT-PCR using the fixed primers to produce a new library for the next round of selection.
3. Identification and characterization. After several rounds of selection with increasing stringency, the oligonucleotide pool with the desired binding property is sequenced and analyzed. The properties of aptamer candidates, including affinity and selectivity, are characterized to determine the final aptamer sequences with the best performance.
4. Synthesis and modification. The aptamer sequences are synthesized with chemical modifications that improve aptamer properties and support subsequent applications.

Fig. 1 Schematic diagram of SELEX for aptamer selection

Various aptamers have been evolved by SELEX toward a wide range of targets, from ions and small molecules to proteins, viruses, cells, and even tissues. Some of them have been widely applied as recognition ligands in novel diagnostic methods and as therapeutic agents. For example, Macugen, an anti-vascular endothelial growth factor (VEGF) aptamer, has been clinically approved as an antiangiogenic medicine. Nevertheless, SELEX usually requires several to dozens of selection cycles to obtain potential aptamer sequences, which commonly takes several weeks when performed manually. SELEX therefore suffers from cumbersome and time-consuming operations and low success rates. In addition, some selected aptamers show unsatisfactory binding performance. In light of these limitations, many efforts have been devoted to developing efficient evolution strategies for the selection of high-performance aptamers. This section introduces modified SELEX strategies that address the limitations of each step of the SELEX procedure.
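The iterative selection and amplification cycle described above can be illustrated with a deliberately simple toy model. The sketch below assumes that the pool contains only two classes of sequences (binders and nonbinders), that each class is retained with a fixed probability in the partitioning step, and that PCR re-amplifies the retained pool without bias. The function names and all numerical values (starting binder fraction, retention probabilities, target fraction) are hypothetical choices for illustration, not parameters taken from the chapter or from any published protocol.

```python
def enrich_one_round(binder_fraction, keep_binder, keep_nonbinder):
    """Return the binder fraction after one selection round.

    Toy assumption: partitioning retains binders with probability
    keep_binder and nonbinders with probability keep_nonbinder, and the
    following PCR step amplifies both classes equally.
    """
    retained_binders = binder_fraction * keep_binder
    retained_others = (1.0 - binder_fraction) * keep_nonbinder
    return retained_binders / (retained_binders + retained_others)


def selection_history(start_fraction, keep_binder, keep_nonbinder,
                      target_fraction, max_rounds=50):
    """Iterate rounds until the binder fraction reaches target_fraction.

    Returns the list of binder fractions, starting with the naive library.
    """
    fraction = start_fraction
    history = [fraction]
    for _ in range(max_rounds):
        if fraction >= target_fraction:
            break
        fraction = enrich_one_round(fraction, keep_binder, keep_nonbinder)
        history.append(fraction)
    return history


if __name__ == "__main__":
    # Hypothetical numbers: one binder per 10^10 sequences in the naive
    # library, 50% of binders and 0.5% of nonbinders retained per round.
    history = selection_history(1e-10, 0.5, 0.005, target_fraction=0.9)
    for round_index, fraction in enumerate(history):
        print(f"round {round_index}: binder fraction = {fraction:.3e}")
```

In this model the odds of finding a binder are multiplied by the ratio keep_binder / keep_nonbinder in every round, so a hundredfold partitioning efficiency still needs several rounds to enrich a very rare binder. This is consistent in spirit with the several to dozens of cycles mentioned in the text, although real selections also involve PCR bias and a continuum of affinities that this sketch ignores.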

Generation of Oligonucleotide Library with Increased Diversity

The binding function of aptamers derives from their folded tertiary structures; that is, structure determines function. During in vitro evolution by SELEX, the oligonucleotides in the library adopt various structures, which makes aptamer selection possible. Increasing the diversity of the library, including the chemical and structural diversity of the oligonucleotides, can therefore increase the success rate of SELEX as well as the functionality and stability of the resulting aptamers. Just as polymers are synthesized from a set of monomers as building blocks, nucleic acids are built from four types of nucleotide building blocks: A, T/U, G, and C. Their “structure space” is well defined, with a countable number of possible sequences, which determines the probability of selecting desired aptamers within that space. However, compared with the 20 amino acid building blocks of proteins, aptamers have only four nucleotide building blocks and thus a relatively constrained “structure space.” The limited chemical diversity of the canonical nucleotides therefore limits the success rate of SELEX and the properties of the aptamers obtained. Over the past decades, many efforts have been devoted to expanding the “chemical space” and “structure space” of oligonucleotide libraries by modifying nucleotides and harnessing artificial nucleotides. Modified SELEX methods with increased library diversity are summarized in detail below.
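The comparison between four nucleotide building blocks and 20 amino acid building blocks can be made concrete with a short calculation. The sketch below counts sequences only: an alphabet of k building blocks and a random region of n positions gives kⁿ possible sequences. It does not model folding or chemistry, so it is a rough illustration of the size of sequence space and not of the "structure space" itself. The function names and the assumed library size of 10¹⁵ molecules are illustrative choices.

```python
import math


def log10_sequence_space(alphabet_size: int, length: int) -> float:
    """Return log10 of the number of distinct sequences, alphabet_size ** length."""
    return length * math.log10(alphabet_size)


def log10_coverage(library_molecules: float, alphabet_size: int, length: int) -> float:
    """Return log10 of the largest fraction of sequence space a library can sample.

    Assumes every molecule in the library is a distinct sequence, so the
    result is an upper bound; it is capped at 0 (complete coverage).
    """
    gap = math.log10(library_molecules) - log10_sequence_space(alphabet_size, length)
    return min(0.0, gap)


if __name__ == "__main__":
    library = 1e15  # assumed library size, within the 10^14-10^16 range in the text
    for n in (20, 25, 30, 40, 50):
        print(
            f"n={n:2d}  4-letter space=10^{log10_sequence_space(4, n):.1f}  "
            f"20-letter space=10^{log10_sequence_space(20, n):.1f}  "
            f"max 4-letter coverage=10^{log10_coverage(library, 4, n):.1f}"
        )
```

Working in log10 avoids very large integers and makes the two alphabets easy to compare at the same length. The same functions can be called with a larger alphabet_size to estimate how added building blocks enlarge the space.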

Library Containing Modified Oligonucleotides

Nucleotides, the monomers of nucleic acids, are composed of three components: a 5-carbon sugar, a phosphate group, and a nucleobase. To date, a wealth of synthetic chemistry has been used to modify these three components and thereby expand the chemical space of nucleic acids. Nucleobase modification was the earliest means of expanding libraries. Among the available sites, the 5-position of pyrimidines, the 8-position of purines, the 7-position of 2′-deoxypurines, and the 2′-position of all nucleotides have been explored for attaching hydrophobic, hydrophilic, and charged groups. For example, introducing aromatic groups can promote hydrophobic and π-stacking interactions that strengthen aptamer-protein binding, making it possible to select high-affinity aptamers with dissociation constants in the subnanomolar or even picomolar range. Modified nucleoside triphosphates can be incorporated into nucleic acid sequences by suitable polymerases during PCR. For example, Benner’s group designed an oligonucleotide library containing the modified nucleoside 5-(3′-aminopropynyl)-2′-deoxyuridine (dJ), whose side chain carries a cationic functional group (Battersby et al. 1999). Using Vent DNA polymerase, the nonstandard dJ base could be incorporated into DNA sequences during PCR, and aptamers against adenosine triphosphate (ATP) were successfully evolved from this modified library. Selecting a compatible polymerase is crucial for incorporating modified nucleobases into DNA libraries. Eaton’s group synthesized six 5-position-modified dUTP derivatives and screened six commercially available DNA polymerases for compatibility (Vaught et al. 2010). With the resulting modified libraries, high-affinity DNA aptamers against tumor necrosis factor receptor superfamily member 9 were evolved for the first time.
However, in these modified libraries, the chemical groups are introduced during enzymatic replication, which can suffer from enzymatic incompatibility and requires cumbersome synthesis of the desired nucleotide building blocks. To overcome this issue, Mayer’s group reported a versatile approach to generate a nucleobase-modified nucleic acid

Fig. 2 Generation of aptamer libraries. (a) Click-SELEX with modular expansion of the chemical space of nucleic acid libraries. (Adapted with permission from (Tolle et al. 2015). Copyright 2015 Wiley-VCH). (b) Artificially expanded genetic information system-SELEX with two types of unnatural nucleotides. (Adapted with permission from (Zhang et al. 2015). Copyright 2015 American Chemical Society). (c) Clipped-SELEX based on small-molecule-clipping of intramolecular motifs with diverse high-order structures. (Adapted with permission from (Huang et al. 2021). Copyright 2021 Wiley-VCH)

library by harnessing click chemistry, and the evolution procedure was termed click-SELEX (Fig. 2a) (Tolle et al. 2015). Briefly, C5-ethynyl-2′-deoxyuridine (EdU) was used in place of the canonical thymidine building block to prepare an alkyne-modified DNA library. Various azide-containing compounds could then be attached in a modular fashion by click reaction to generate modified libraries. This strategy was the first to propose in situ nucleobase modification after PCR during SELEX, which ensured high enzymatic compatibility and offered broad access to many chemical modifications. As nucleic acids, aptamers are sensitive to nucleases, which restricts their application in liquid biopsy because nucleases are abundant in body fluids. It is therefore highly desirable to obtain aptamers with high stability. Since the 2′-carbon of ribose is a nuclease-sensitive position, its chemical modification can protect aptamers from nuclease degradation. Ribonucleotides substituted with 2′-fluoro, 2′-amino, or 2′-O-methyl groups enhance resistance to nuclease degradation and increase the in vivo half-life of aptamers to several days, and libraries built from them have been used successfully to screen aptamers with increased stability in a variety of biological fluids (Xiao et al. 2012). Such modified nucleotides require an engineered mutant polymerase during PCR. Beyond the 2′-carbon of ribose, the 4′-oxygen is an alternative position for ribose modification. Aptamers selected from libraries containing 4′-thiocytidine and 4′-thiouridine exhibited better properties, including

enhanced nuclease resistance and base-pair strength. Locked nucleic acids (LNA) are nucleotide analogues containing a bridged bicyclic sugar moiety, which is locked by a methylene bridge between the 2′-O and 4′-C of the furanose ring (Pinheiro et al. 2012). Introducing LNA into libraries has been shown to generate aptamers with increased in vivo biostability and binding affinity against diverse target molecules. Besides ribose modification, phosphodiester backbone modifications such as phosphorothioates, phosphorodithioates, and boranophosphates also facilitate the selection of nuclease-resistant aptamers. For example, with wild-type T7 RNA polymerase and 5′-α-P-borano nucleoside triphosphates, boranophosphate linkages can be introduced into RNA sequences during transcription. On this basis, Lato et al. selected boron-containing aptamers against ATP from libraries of transcripts containing 5′-α-P-borano nucleotides (Lato et al. 2002). Despite the high flexibility and diversity of libraries containing modified oligonucleotides, the modified chemical groups may inhibit hydrogen bonding and interfere with nucleic acid folding. Therefore, it is crucial to systematically evaluate the influence of chemical modification on the binding properties of aptamers.

Expanded Libraries with Artificial Oligonucleotides

Although only four types of canonical nucleotides exist in natural living systems, synthetic biology has expanded the genetic alphabet by increasing the number of independently replicable building blocks (e.g., artificial nucleotides). Modified natural nucleotides still pair with their natural complementary nucleotides, whereas artificial nucleotides pair with each other to form an additional base pair during nucleic acid amplification. Libraries containing artificial nucleotides therefore offer a greater variety of folds and more control over folding, enlarging the “structure space” and bringing their sequence diversity and functionality closer to those of proteins. For example, Hirao’s group described a SELEX procedure for selecting aptamers against IFN-gamma and VEGF-165 in which a fifth nucleotide, Ds, was used (Kimoto et al. 2013). Up to three Ds nucleotides were incorporated into a random-sequence library to generate DNA sequences with increased chemical and structural diversity. Owing to the hydrophobic Ds bases, the selection yielded aptamers with dissociation constant (Kd) values at the picomolar level, an improvement in affinity of more than 100-fold over aptamers containing only natural nucleotides. These results demonstrated the potential of genetic alphabet expansion to create high-performance aptamers. Further, the artificially expanded genetic information system (AEGIS) was combined with SELEX to give AEGIS-SELEX, in which paired artificial nucleotides are incorporated into the library (Zhang et al. 2015). Typically, the library comprises six types of nucleotides, four natural and two artificial (Z and P), allowing the selection of aptamers with nanomolar affinities (Fig. 2b). Although promising, artificial nucleotides are still poorly recognized by polymerases.
Therefore, polymerases that specifically recognize artificial nucleotides need to be developed for further research and applications.

Library of High-Order Structures

The diversity of libraries is commonly improved by enlarging their chemical space with modified or artificial nucleotides, as described above. However, these unnatural oligonucleotides are often incompatible with PCR amplification. To address this dilemma, Huang et al. proposed “clipped aptamer SELEX,” which uses natural oligonucleotide libraries whose sequences contain CGG/CGG sites for small-molecule clipping of intramolecular motifs into diverse high-order structures (Fig. 2c) (Huang et al. 2021). A DNA-mismatch-binding small molecule (Z-NCTS), known as a DNA molecular glue, selectively binds tandem mismatch sites with CGG/CGG in a DNA duplex and clips the two CGG units together. CGG/CGG-containing libraries coupled with Z-NCTS provided sequences with high-order structures beyond traditional hairpin shapes. The resulting structural diversity allowed the successful selection of “clipped aptamers” against epithelial cell adhesion molecule (EpCAM) protein. More importantly, this de novo selection endowed the aptamers with a gain-of-binding function; that is, the Z-NCTS molecular glue triggers an efficient transition of the aptamer structure from an inactive state to an active state with ideal thermodynamics. Because this structurally diverse library avoids both chemical modification of nucleotides and the introduction of artificial nucleotides, it is fully compatible with PCR amplification. The “clipped aptamer SELEX” strategy therefore opens a new avenue for library design, and clipped aptamers have great potential in biochemistry and pharmaceutics owing to their activation mechanism and structural diversity.

Selection of Aptamer Candidates from Library

The efficiency and success rate of selection, as well as the properties of the resulting aptamers, depend largely on the selection procedure. Each round consists of incubating the target with the library under defined selection conditions, enriching the oligonucleotides that bind, and amplifying the selected oligonucleotides by PCR to produce a new library of their descendants for the next round. After several to dozens of rounds of selection with increasing stringency, an oligonucleotide pool with the desired binding properties is obtained, which contains the aptamer candidates. During this process, the selection platform and the target type (e.g., free proteins, whole cells, or tissues) affect selection efficiency and aptamer properties. This section summarizes advanced SELEX methods with improved techniques in the selection process.
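
The incubate, partition, and amplify cycle described above can be sketched as a toy enrichment model. This is a deliberately idealized illustration, not a published SELEX simulation: it assumes simple 1:1 binding with the target in excess, a fixed nonspecific carryover (`background`) at the partitioning step, and perfectly unbiased PCR. All names and numbers are hypothetical.

```python
def selex_round(pool, target_conc, background=1e-3):
    """One idealized round: bind, partition, amplify without bias.

    pool maps a sequence name to (fraction_of_pool, kd); kd and target_conc
    share the same concentration unit.
    """
    recovered = {}
    for name, (fraction, kd) in pool.items():
        bound = target_conc / (target_conc + kd)   # simple 1:1 binding isotherm
        recovered[name] = fraction * (bound + background)
    total = sum(recovered.values())
    # Amplification is modeled as renormalization back to a full library.
    return {name: (recovered[name] / total, pool[name][1]) for name in pool}


def run_selex(pool, target_conc, rounds, background=1e-3):
    """Return the pool composition before selection and after each round."""
    history = [pool]
    for _ in range(rounds):
        pool = selex_round(pool, target_conc, background)
        history.append(pool)
    return history


if __name__ == "__main__":
    # Hypothetical pool: one rare strong binder in a bulk of weak binders (nM).
    start = {"binder": (1e-6, 1.0), "bulk": (1.0 - 1e-6, 1e4)}
    for i, state in enumerate(run_selex(start, target_conc=10.0, rounds=3)):
        print(f"round {i}: binder fraction = {state['binder'][0]:.3g}")
```

In this model, the per-round enrichment is limited by the background term, which is one way to see why partitioning efficiency has such a strong influence on the number of rounds required.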

Efficient Selection Platforms

Partitioning performance is a key factor determining selection efficiency and success rate. Efficient partitioning allows aptamer candidates to be recovered rapidly with barely any residual unbound oligonucleotides, so that oligonucleotide pools are enriched for aptamer evolution. Many novel selection platforms with efficient partitioning strategies have been developed, which fall mainly into two groups: homogeneous systems in which targets are free of solid support, and systems in which targets are immobilized on solid supports.

In homogeneous incubation systems, free targets without supports are incubated directly with oligonucleotide libraries. Nitrocellulose filtration was the first method used to partition free oligonucleotides from protein-bound ones, because proteins are trapped on nitrocellulose filters (Tuerk and Gold 1990). However, this method is applicable only to protein targets, and it suffers from low separation efficiency. Capillary electrophoresis SELEX (CE-SELEX) was later developed based on the difference in electrophoretic mobility between free oligonucleotides and target-oligonucleotide complexes (Fig. 3a) (Mendonsa and Bowser 2004). Owing to the high partitioning efficiency and minimal nonspecific binding of CE, high-performance aptamers can be evolved in relatively few rounds of selection, which saves time. However, CE-SELEX can handle only several nanoliters of library sample, so the library scale must be reduced beforehand. To address this problem, Bowser’s group reported a microscale free-flow electrophoresis (μFFE) technique for continuous sample loading, partitioning, and product collection (Jing and Bowser 2011). As the library is introduced into a planar separation chamber, an electric field applied perpendicular to the flow direction continuously separates analytes according to their mobilities. μFFE-SELEX can handle a

Fig. 3 Efficient SELEX platforms for selecting aptamer candidates. (a) Capillary electrophoresis-SELEX to separate bound oligonucleotides from unbound oligonucleotides. (Adapted with permission from (Mendonsa and Bowser 2004). Copyright 2003 American Chemical Society). (b) Multifunctional screening platform with MBs on the micrometer scale. (Adapted with permission from (Hong et al. 2017). Copyright 2017 American Chemical Society). (c) Sol-gel SELEX for cross-contamination-free selection of multiple target aptamers. (Adapted with permission from (Lee et al. 2013). Copyright 2012 Korean BioChip Society and Springer). (d) Atomic force microscopy-SELEX (AFM-SELEX). (Adapted with permission from (Takenaka et al. 2017). Copyright 2016 Elsevier)

300-fold larger library than CE-SELEX and generate aptamers with nanomolar dissociation constants in only a single round of selection. Alternatively, targets can be immobilized on solid supports for SELEX via covalent reaction, affinity interaction, etc. Immobilization on a solid support enlarges the difference between bound and unbound oligonucleotides, making partitioning more convenient and efficient. Moreover, the physicochemical properties of solid supports allow flexible partitioning modes, such as affinity chromatography (Latulippe et al. 2013), filtration (Huang et al. 2019a, 2020), magnetic isolation (Hong et al. 2017), and microfluidics-assisted partitioning (Lou et al. 2009). For example, recombinant histidine-tagged EpCAM proteins were immobilized on Ni-beads, and unbound oligonucleotides were conveniently removed by filtration after library incubation (Huang et al. 2019a, 2020). Microfluidics, with its merits of precise flow manipulation, miniaturization, automation, and integration, has opened new avenues for efficient partitioning and streamlined SELEX procedures. First, microfluidics-SELEX (μ-SELEX) provides fluid drag force and systematic regulation of selection pressure to efficiently remove unbound or nonspecifically bound oligonucleotides. It thus greatly improves partitioning efficiency, enabling fast convergence of oligonucleotide pools. Second, the miniaturization of microfluidic chips reduces the amount of target, which increases selection stringency and favors high-affinity aptamers. The affinity of a recognition ligand is quantified by its dissociation constant Kd, where a low Kd indicates high affinity. At equilibrium, Kd is the product of the concentrations of unbound oligonucleotide and unbound target divided by the concentration of the complex, so the smaller the amounts of unbound oligonucleotides and unbound targets, the lower the Kd (Gotrik et al. 2016).
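
The link between target amount and stringency can be illustrated with the standard 1:1 binding equilibrium, Kd = [A][T]/[AT]. The sketch below is a minimal illustration under stated assumptions: each oligonucleotide species is treated independently (no competition between species), and all concentrations, given in nM, are hypothetical.

```python
import math


def complex_conc(aptamer_total, target_total, kd):
    """Equilibrium concentration of the 1:1 aptamer-target complex.

    Solves C^2 - (A + T + Kd) C + A T = 0 for the physical root, written in
    a form that avoids subtracting two nearly equal numbers.
    """
    s = aptamer_total + target_total + kd
    root = math.sqrt(s * s - 4.0 * aptamer_total * target_total)
    return 2.0 * aptamer_total * target_total / (s + root)


def fraction_bound(aptamer_total, target_total, kd):
    """Fraction of one oligonucleotide species that is bound at equilibrium."""
    return complex_conc(aptamer_total, target_total, kd) / aptamer_total


def discrimination(target_total, kd_strong, kd_weak, aptamer_total=1e-3):
    """Ratio of bound fractions for a strong versus a weak binder."""
    strong = fraction_bound(aptamer_total, target_total, kd_strong)
    weak = fraction_bound(aptamer_total, target_total, kd_weak)
    return strong / weak


if __name__ == "__main__":
    for target in (1000.0, 10.0, 1.0):   # nM, decreasing target amount
        ratio = discrimination(target, kd_strong=1.0, kd_weak=100.0)
        print(f"target = {target:7.1f} nM -> strong/weak bound ratio = {ratio:.1f}")
```

With a large excess of target, both species are almost fully bound and are hard to tell apart; as the target amount falls, the strong binder is retained preferentially.
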
Therefore, minimizing the amount of target and thoroughly removing unbound oligonucleotides contribute to the selection of high-affinity aptamers. Soh’s group has developed several magnetically controlled μ-SELEX platforms for high-stringency selection, which harness magnetic force and fluid force to rapidly and efficiently separate unbound oligonucleotides from those bound to target-coated magnetic beads (MBs) (Cho et al. 2010; Lou et al. 2009). Third, microfluidics allows flexible integration of counterselection and automated on-chip SELEX, which avoids manual intervention and improves aptamer properties. For example, Hong et al. developed a multifunctional screening platform for on-chip counterselection and positive selection, partitioning, and real-time monitoring (Fig. 3b) (Hong et al. 2017). The synergistic effects of counterselection and positive selection, incubation in a flowing stream, and rigorous washing improved the affinity and specificity of the selected aptamers. Meanwhile, in situ, real-time monitoring of the selection process reduced the blindness of selection while consuming minimal resources. Besides MBs, nonmagnetic microbeads and microchannels can be used to immobilize targets for μ-SELEX. Park et al. proposed an acoustophoresis SELEX using microbeads with immobilized PSA. The continuous-flow mode of acoustophoresis allowed simultaneous washing and separation, efficiently enriching aptamers with a Kd of 0.7 nM from the library (Park et al. 2016). Chemically immobilizing targets on support surfaces is known to affect target conformation and mask potential binding sites (Mendonsa and Bowser 2004). To solve this

problem, sol-gel SELEX was developed, which uses sol-gel composites as supports; the sol-gel transition entraps targets while preserving their native structure and function. On this basis, Kim’s group applied sol-gel SELEX to multitarget aptamer selection by designing a pneumatic valve-based microfluidic network platform (Fig. 3c) (Lee et al. 2013). The microfluidic platform allowed serial incubation of the library with five different target proteins entrapped in sol-gel spots. Moreover, the integration of microvalves and localized heat sources enabled cross-contamination-free elution of the aptamers bound to each target. Nevertheless, this platform supported only single-round multitarget selection, so a more versatile microfluidic design is required for multitarget selection over repeated rounds. Atomic force microscopy (AFM) is widely used for force measurement, topographic imaging, and manipulation; it measures the force between the cantilever and the sample surface, which reflects their separation. This ability to measure force and control cantilever-sample separation makes AFM a promising SELEX platform. Several AFM-SELEX methods have been reported in which oligonucleotides and target molecules are attached to a cantilever and a substrate, respectively, or vice versa (Takenaka et al. 2017). Linker molecules such as streptavidin (SA) or a partially complementary strand are commonly used to attach the oligonucleotides or targets to the cantilever or substrate. When the cantilever approaches the substrate for affinity screening, only a sufficiently high affinity between aptamer and target allows disruption of the linker interaction holding the aptamer or target, which enables aptamer partitioning. For example, DNA-duplex interaction was used to link oligonucleotides to a substrate, and DNA aptamers against human serum albumin were selected in only four rounds (Fig. 3d) (Takenaka et al. 2017).
AFM-SELEX allows direct selection of high-affinity aptamers within a few rounds using a small volume of biological sample.

Various Target Types

Proteins execute diverse functions within organisms and also serve as important disease biomarkers. Among them, membrane proteins are widely used as recognition targets for the detection of circulating targets, and the corresponding aptamers have been selected against different forms of the protein, such as recombinant proteins, proteins on whole cells, or proteins in tissues. Protein-based SELEX, which takes recombinant proteins as targets, is the earliest and most widely adopted technique for selecting aptamers. Recombinant proteins are expressed in a prokaryotic system, purified, and immobilized on selection supports. In this way, multiple aptamers against various tumor-related surface proteins have been selected. Nevertheless, protein-based SELEX has some inherent limitations. First, it is limited to known markers and cannot provide aptamers for tumors without identified markers. Second, recombinant proteins may lose their native conformation and bioactivity because they lack posttranslational modifications. As a result, some selected aptamers fail to recognize their targets on whole cells, which limits their practical application. In addition, the selection matrix directly affects the properties of aptamers because it provides the environment in which the oligonucleotide-target interaction takes place.

Fig. 4 Various forms of selection targets. (a) Molecular crowding SELEX for discovering enthalpy-driven aptamers. (Adapted with permission from (Huang et al. 2019a). Copyright 2019 American Chemical Society). (b) Cell-uptake selection to evolve internalizing aptamers. (Adapted with permission from (Xiao et al. 2012). Copyright 2012 American Chemical Society). (c) In vivo SELEX. (Adapted with permission from (Zhou and Rossi 2017). Copyright 2017 Macmillan Publishers Limited, part of Springer Nature)

Huang et al. proposed molecular crowding SELEX to select enthalpy-driven aptamers by using blood plasma as the selection matrix instead of a traditional buffer (Fig. 4a) (Huang et al. 2019a). The standard Gibbs free energy change (ΔG), which consists of enthalpic and entropic components, is commonly used to evaluate the binding performance of aptamers. Enthalpy-driven aptamers, which form more bonds with their targets, show higher affinity and greater stability in practical applications than entropy-driven aptamers. Plasma, as a molecular crowding matrix, decreases the degrees of freedom of target proteins and thereby suppresses the entropic contribution. Enthalpy-driven aptamers were thus obtained with 6.5-fold higher affinity than those selected in buffer and were successfully applied to detect CTCs. The high flexibility of SELEX also allows whole cells to be used as selection targets, which is known as cell-SELEX. Cell-SELEX offers a reasonable way to overcome the abovementioned limitations of protein-based SELEX. First, it does not require prior knowledge of target proteins on cells, which makes it possible to obtain recognition ligands for tumor cells without known biomarkers. In addition, paired cell lines of interest can be chosen for positive selection and counterselection, so that highly specific

aptamers can be obtained, even ones capable of distinguishing different cell subtypes. For example, Li et al. used cell-SELEX to select aptamers specific for metastatic tumor cells by taking highly metastatic MDA-MB-231 cells as target cells and low-metastatic MCF-7 cells as negative cells (Li et al. 2018a). The resulting aptamers exhibited nanomolar affinity and good specificity for several types of metastatic tumor cells and were successfully used for selective isolation of CTCs with a metastatic phenotype. Second, when cell-SELEX is coupled with identification techniques such as mass spectrometry, the target molecules of the aptamers can be identified, leading to the discovery of new biomarkers for clinical diagnosis and therapy. Third, even for pre-identified protein biomarkers, cell-SELEX ensures that aptamers are obtained against protein targets in their native structures and complexes. It allows the selection of high-performance aptamers that can even bind the original glycosylation pattern of extracellular domains, for clinical and research applications (Zhou and Rossi 2017). Nevertheless, the complex composition and structure of molecules on cell surfaces increase the blindness of selection and reduce the specificity of aptamers. For example, highly abundant molecules can interfere with the selection of aptamers against low-abundance biomarkers. To improve the selectivity of SELEX procedures, combinatorial SELEX strategies have been developed to guide the selection of aptamers against specific markers, including immunoprecipitation-coupled SELEX (Wang et al. 2013), hybrid-SELEX (Zhu et al. 2017), and ligand-guided selection SELEX (LIGS-SELEX) (Zamay et al. 2019). Aside from selecting aptamers against cell surface biomarkers, cell-SELEX can be modified to select internalizing aptamers, which is called cell internalization SELEX. As shown in Fig. 4b, after counterselection, target tumor cells were incubated with the library, washed and trypsinized to remove membrane-bound oligonucleotides, and lysed to collect internalized oligonucleotides (Xiao et al. 2012). To better mimic the natural environment of cells, upgraded forms of cell-SELEX have been proposed, including 3D cell-SELEX, tissue-SELEX, and even in vivo SELEX. Souza et al. developed 3D cell-SELEX to select aptamers against spheroid cells in 3D cell culture (Souza et al. 2016). This strategy mimicked the tissue microenvironment in vitro and yielded aptamers against the PC-3 prostate cancer cell line with low free energy and dissociation constants in the nanomolar range. Beyond 3D cell culture, tissues represent the real microenvironment of target cells in vivo. Shao’s group developed tissue-SELEX using sections of breast carcinoma tissue as targets and adjacent normal tissue as controls (Li et al. 2009). Aptamer BC-15 was selected, and its target was identified as heterogeneous nuclear ribonucleoprotein A1 (hnRNP A1). All of the abovementioned SELEX strategies are performed in vitro, and the selected aptamers may have insufficient stability and half-life for in vivo applications. To overcome this issue, in vivo SELEX was reported, which uses animal models to obtain tissue- and organ-specific aptamers (Fig. 4c) (Zhou and Rossi 2017). The procedure is similar to traditional SELEX with some modifications, and generally includes intravenous injection of the oligonucleotide library into an animal model of the selected disease, extraction of bound oligonucleotides from the tissue or organ of interest, and amplification and preparation of the next

round library for the next selection cycle. With this technique, Clary’s group achieved in vivo selection of nuclease-resistant aptamers targeting hepatic colon cancer metastases (Mi et al. 2010). One of the selected aptamers was identified to bind to p68, an RNA helicase upregulated in colorectal cancer. Overall, in vivo SELEX provides an efficient strategy for selecting high-performance aptamers that are suitable for in vivo applications.

Identification and Characterization of Aptamer Candidates

After multiple rounds of directed evolution, the library pool has converged on sequences that bind the target. Efficient and cost-effective methods are then required to identify and screen aptamer candidates from the enriched library. Normally, the enriched library is cloned into plasmids, which are introduced into bacteria and grown, and picked colonies are analyzed by Sanger sequencing. Alternatively, the enriched library can be sequenced directly with next-generation sequencing technologies. After the information from hundreds to thousands of sequences is analyzed, highly repeated sequences are regarded as aptamer candidates, which are then chemically synthesized and modified so that their binding performance can be screened individually. Among them, the one with the best binding affinity and specificity is identified as the aptamer. This process suffers from poor efficiency, high blindness, and a low success rate for several reasons. First, to reduce the cost of sequencing and the complexity of data analysis, the library must undergo dozens of rounds of directed evolution to converge fully, which makes the process time-consuming and labor-intensive. Second, only high-frequency sequences are screened for aptamer identification, whereas sequences with good binding performance but low frequency are ignored, which lowers the success rate of aptamer selection. Third, the binding performance of each aptamer candidate is characterized individually, which is time-consuming and expensive. To overcome these issues, researchers have developed efficient and cost-effective methods to identify and screen aptamer candidates, which are summarized in this section.

Effective Identification Techniques

Because of its limited sequencing throughput, the Sanger sequencing-based identification method analyzes only the picked colonies (100) of the last round, so only a small fraction of the enriched oligonucleotides is identified. Such a strategy gives access only to the most frequent clones, which do not necessarily represent the best-performing sequences in the pool, because amplification bias allows well-performing sequences with lower amplification efficiency to be outcompeted over many rounds of selection by weaker-performing sequences with higher amplification efficiency. To narrow this gap, high-throughput next-generation sequencing (HTS) can be introduced to read out all sequences in each round. To analyze the immense amounts of sequencing data efficiently, many bioinformatics tools and algorithms have been developed for the identification of aptamer candidates, which have been summarized in other reviews (Hamada 2018). For example, through tracking the

Fig. 5 Efficient identification techniques. (a) Quantitative selection of aptamers through sequencing based on copy number and enrichment-fold. (Adapted with permission from (Gotrik et al. 2016). Copyright 2016 American Chemical Society). (b) Sequential multidimensional analysis algorithm for aptamer identification. (Adapted with permission from (Song et al. 2020). Copyright 2019 American Chemical Society)

dynamic enrichment trajectory of the library, Soh’s group developed quantitative selection of aptamers through sequencing (QSAS), a method that identifies high-affinity aptamers from only three rounds of HTS data (Fig. 5a) (Cho et al. 2010). The copy number and enrichment-fold were analyzed for over 10⁷ individual sequences from each round of the library. The results revealed that the most highly represented sequences were not necessarily the best binders, and vice versa. Instead, the high-affinity aptamers were identified as the sequences with the highest enrichment-fold. These aptamers had ~3–8-fold higher affinity and 2–4-fold higher specificity than those identified by Sanger sequencing. Other work has confirmed that the best binders are usually those enriched most rapidly in the very early rounds of selection rather than those with the highest copy number in the final pool (Hoinka et al. 2014). These studies suggest that HTS-based bioinformatics analysis provides a superior means of identifying aptamers. Besides the enrichment trajectory, further sequence information, such as family size and the structure or substructure of sequence patterns, is increasingly being extracted from HTS data and evaluated comprehensively by advanced bioinformatics algorithms for aptamer identification (Alam et al. 2015; Hamada 2018; Hoinka et al. 2014). For example, Song et al. proposed a multidimensional analysis algorithm, SMART-Aptamer, based on a comprehensive and balanced analysis of motif content, family size, and structural stability (Song et al. 2020). Meanwhile, a fast data filtering process was also proposed to handle billions of sequences within several minutes. This algorithm thus combined high efficiency, accuracy, robustness, and speed, and it successfully identified high-affinity aptamers from three sets of HTS data from SELEX pools.
To date, many bioinformatics tools have been made open source to help researchers analyze sequence information from vast troves of HTS data for aptamer identification (Alam et al. 2015; Gotrik et al. 2016; Hoinka et al. 2014; Song et al. 2020). Moreover, as the cost of HTS declines and bioinformatics techniques advance, HTS-based bioinformatics analysis will become an increasingly common tool for the efficient identification of high-performance aptamers.
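
The distinction between copy number and enrichment-fold can be shown with a few lines of code. The sketch below is not the QSAS pipeline or any of the cited tools; it is a minimal illustration that assumes reads are given as plain sequence strings per round and uses a pseudocount (an assumption of this example) to handle sequences missing from one round.

```python
from collections import Counter


def most_abundant(reads):
    """Sequence with the highest copy number in one round."""
    return Counter(reads).most_common(1)[0][0]


def enrichment_fold(earlier_reads, later_reads, pseudocount=1.0):
    """Fold change in relative frequency of each sequence between two rounds."""
    early, late = Counter(earlier_reads), Counter(later_reads)
    seqs = set(early) | set(late)
    # Pseudocounts are added to every sequence, so totals grow accordingly.
    n_early = sum(early.values()) + pseudocount * len(seqs)
    n_late = sum(late.values()) + pseudocount * len(seqs)
    return {
        s: ((late[s] + pseudocount) / n_late) / ((early[s] + pseudocount) / n_early)
        for s in seqs
    }


if __name__ == "__main__":
    # Toy reads from two consecutive rounds (hypothetical sequences).
    round_2 = ["AAA"] * 90 + ["CCC"] * 9 + ["GGG"] * 1
    round_3 = ["AAA"] * 80 + ["CCC"] * 10 + ["GGG"] * 10
    fold = enrichment_fold(round_2, round_3)
    print("highest copy number:", most_abundant(round_3))
    print("highest enrichment :", max(fold, key=fold.get))
```

In this toy data set the most abundant sequence is slightly depleted between rounds, whereas a rare sequence is enriched several-fold, which mirrors the observation that copy number alone can be misleading.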

Effective Characterization Methods

After identification, candidate sequences are screened to characterize their binding affinity and specificity, and the one with the best binding performance is chosen as the aptamer. As mentioned above, affinity is expressed as Kd, which is determined by titrating targets with aptamers at different concentrations, with the binding events converted into physical or chemical signals for quantification. However, traditional binding characterization is performed serially for each candidate sequence, which is burdensome and time-consuming. To overcome this challenge, Soh’s group developed the “quantitative parallel aptamer selection system (QPASS)” for multiplexed characterization of large numbers of aptamer candidates in parallel (Fig. 6a) (Cho et al. 2013). Aptamer candidates were synthesized in situ in eight identical subarrays of a DNA microarray chip. The subarrays were incubated with a concentration series of fluorescently labeled target, and the fluorescence intensities were measured to determine the Kd of each aptamer candidate directly. This system allowed thousands of aptamer candidates to be screened simultaneously, with the advantages of high throughput and high efficiency. To achieve equipment-free and rapid binding characterization, Afi-Chip was designed to translate binding reactions into distance-based signals by means of a catalase-linked gas-generating reaction that moves dye bars (Song et al. 2016). Furthermore, digital microfluidics coupled with magnetic separation modules can automate the characterization of aptamer affinity (Fig. 6b) (Guo et al. 2020). Combining high-throughput analysis methods with smart microfluidic chips therefore reduces the time and resources needed to characterize the binding performance of aptamer candidates.
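
Extracting Kd from a concentration series can be sketched as a curve fit. The example below assumes a simple 1:1 saturation model, F = Fmax · c / (Kd + c), and fits it by a log-spaced grid search over Kd with a closed-form least-squares Fmax; this is an illustrative choice, not the analysis used in the cited studies. The conversion ΔG° = RT ln Kd (Kd in mol/L) is the standard thermodynamic relation and is included only for convenience. All data values are synthetic.

```python
import math


def fit_kd(concs, signals, kd_min=1e-3, kd_max=1e4, steps=2000):
    """Least-squares fit of F = Fmax * c / (Kd + c); returns (kd, fmax)."""
    best = None
    for i in range(steps + 1):
        kd = kd_min * (kd_max / kd_min) ** (i / steps)   # log-spaced grid
        x = [c / (kd + c) for c in concs]
        # For a fixed Kd the optimal Fmax has a closed-form solution.
        fmax = sum(f * xi for f, xi in zip(signals, x)) / sum(xi * xi for xi in x)
        sse = sum((f - fmax * xi) ** 2 for f, xi in zip(signals, x))
        if best is None or sse < best[0]:
            best = (sse, kd, fmax)
    return best[1], best[2]


def delta_g_kj_per_mol(kd_molar, temperature_k=298.15):
    """Standard binding free energy from Kd (in mol/L), in kJ/mol."""
    gas_constant = 8.314  # J / (mol K)
    return gas_constant * temperature_k * math.log(kd_molar) / 1000.0


if __name__ == "__main__":
    # Eight target concentrations (nM), one per hypothetical subarray.
    concs = [0.1, 0.3, 1.0, 3.0, 10.0, 30.0, 100.0, 300.0]
    signals = [1000.0 * c / (5.0 + c) for c in concs]   # synthetic, Kd = 5 nM
    kd, fmax = fit_kd(concs, signals)
    print(f"fitted Kd = {kd:.2f} nM, Fmax = {fmax:.0f}")
    print(f"dG = {delta_g_kj_per_mol(kd * 1e-9):.1f} kJ/mol")
```

A grid search is used here only to keep the example dependency-free; with real, noisy data a nonlinear least-squares routine with error estimates would normally be preferred.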

Fig. 6 Efficient characterization techniques. (a) DNA microarray for simultaneous and parallel characterization of aptamer affinity. (Adapted with permission from (Gotrik et al. 2016). Copyright 2016 American Chemical Society). (b) Digital microfluidics-based automated evaluation platform of aptamer affinity. (Adapted with permission from (Guo et al. 2020). Copyright 2020 Royal Society of Chemistry)

Synthesis and Modification After the identification of sequences, aptamers can be chemically synthesized in large quantities with high reproducibility by facile solid-phase phosphoramidite chemistry. Moreover, aptamers can be site-specifically modified with chemical groups and functional moieties to improve their properties and extend their functionality (Ma et al. 2015). For example, to improve the stability of aptamers in complex body fluids, the vulnerable 2′-position of ribose, which is nuclease-sensitive, can be substituted with other groups, as mentioned in the section “library containing modified oligonucleotides.” In addition, capping aptamers with inverted nucleotides can prevent degradation by exonucleases. For isolation and detection applications, aptamers serving as recognition ligands can be modified with functional moieties such as biotin, fluorescent dyes, and thiol groups. On the one hand, functional groups enable efficient and facile interface assembly of aptamers, providing diverse isolation platforms for liquid biopsy. On the other hand, aptamers with different functional moieties can act as probes to produce different signal readouts. Finally, because aptamers are nucleic acids, they follow Watson-Crick base-pairing rules and can be directly amplified, allowing the integration of various signal transduction and amplification strategies to improve detection sensitivity for liquid biopsy.

Aptamer-Based Detection of CTCs Cancer metastasis is the leading cause of death for cancer patients. Numerous studies indicate that CTCs, which are tumor cells shed from solid tumor tissues into the bloodstream, play an important role in hematogenous metastasis of cancers. With intact cell structure and function, CTCs are considered ideal liquid biopsy targets. On the one hand, CTC enumeration facilitates cancer diagnosis, treatment monitoring, and prognosis evaluation. On the other hand, in-depth genotypic and phenotypic analysis of CTCs provides comprehensive information in a noninvasive manner for therapy guidance and for the biological study of metastasis and drug resistance. Aptamers, with their many distinct structural and functional merits, afford new opportunities for the isolation, release, and detection of CTCs.

CTC Isolation and Enrichment Despite their great clinical value, CTCs exist at extremely low concentrations in the complex whole-blood matrix, posing great challenges for CTC detection and analysis. One milliliter of blood contains only a few to hundreds of CTCs among billions of blood cells and milligrams of proteins and small molecules. The complex composition of the whole-blood matrix can obscure CTC signals and degrade detection performance. Thus, efficient and selective isolation of CTCs is necessary for accurate and sensitive detection of CTCs from whole blood. Affinity
isolation methods utilize specific binding between recognition ligands on capture media and biomarker molecules on the cell membrane of CTCs, offering high isolation efficiency and selectivity. The performance of affinity isolation methods depends on the interfacial binding reaction, which mainly comprises the mass transfer of CTCs to the capture surface and the interaction between binding ligands and cellular biomarkers. Over the past two decades, micro/nanofabrication technologies have boosted the development of CTC isolation methods by enhancing mass transfer. Meanwhile, aptamers, with their small size, site-specific modification with functional groups, and versatile structural design and assembly, serve as promising recognition ligands for engineering high-affinity multivalent interfaces. In this section, aptamer-based CTC isolation platforms are introduced, including aptamer-based magnetic isolation, aptamer-based microfluidic isolation, and multivalent aptamer interfaces.

Aptamer-Based Magnetic Isolation Affinity magnetic separation methods use recognition ligand-modified MBs to capture targets, and the resulting MB-target complexes can be isolated under a magnetic field. This approach offers easy manipulation and convenient coupling with detection methods, and thus plays an important role in liquid biopsy. One typical example is the CellSearch system, the first and only system approved by the Food and Drug Administration (FDA) for the detection of CTCs in metastatic breast, colorectal, or prostate cancer. The CellSearch system is based on antibody-modified magnetic nanoparticles (MNPs) with slow magnetic responses. It therefore requires two-step amplification reactions to bind sufficient MNPs to the CTC surface and improve the magnetic response enough to guarantee high capture efficiency, which makes the process complicated and time-consuming. Compared with antibodies, aptamers have the advantages of convenient chemical synthesis, programmable structure design, and precise assembly. Zhi et al. developed DNA-templated magnetic nanoparticle-quantum dot (QD)-aptamer copolymers (MQAPs) to amplify magnetic responses for rapid capture and sensitive detection of CTCs (Fig. 7a) (Li et al. 2018b). These MQAPs were fabricated by hybridization chain reaction (HCR) with polymeric DNA templates, in which two building blocks (H1 and H2) were used to conjugate QDs and aptamers. Meanwhile, an additional linker (LH1) was hybridized to a certain percentage of H1 to conjugate MNPs. MQAPs were co-assembled with multiple DNA aptamers, DNA-QDs, and DNA-MNPs to achieve multivalent binding, rapid magnetic responses, and simultaneous phenotypic profiling of CTCs. However, the endocytosis of MNPs during magnetic isolation can induce nonspecific adsorption of blood cells and increase intracellular oxidative stress.
To overcome this challenge, Zhang’s group designed a double-sided-tape (DST)-like DNA structure to connect CTCs with MNPs, which prevented direct interaction between CTC and MNPs (Fig. 7b) (Chen et al. 2019b). The long single-stranded DST-like DNA sequence contained multiple copies of aptamers and polymeric 15 T spacer, which enabled multivalent binding of CTCs and the introduction of MNPs, respectively. This DNA structure minimized MNP endocytosis and allowed CTC detection in 20 patients with breast cancer.

Fig. 7 Aptamer-based magnetic isolation. (a) DNA-templated magnetic nanoparticle (MNP)-quantum dot (QD)-aptamer copolymers for rapid capture and simultaneous detection of CTCs. (Adapted with permission from (Li et al. 2018b). Copyright 2018 Wiley-VCH). (b) Double-sided tape (DST) DNA device with multimeric aptamers as well as the capability of anchoring multiple magnetic nanoparticles for CTC isolation. (Adapted with permission from (Chen et al. 2019b). Copyright 2019 Royal Society of Chemistry). (c) Magnetic microparticles (MMPs) functionalized with multimerized aptamer DNA strands for tumor cell isolation. (Adapted with permission from (Chen et al. 2019a). Copyright 2019 American Chemical Society). (d) Tetrahedral DNA framework-programmed CTC capture. (Adapted with permission from (Li et al. 2019). Copyright 2019 American Chemical Society)

Although magnetic microbeads with rapid magnetic response can reduce the loss of beads and CTCs, they suffer from lower surface-to-volume ratios and greater steric hindrance. To solve these problems, Zhang’s group developed a “NanoOctopus” device with long multimerized aptamer sequences on magnetic microbeads (Fig. 7c) (Chen et al. 2019a). Like the tentacles of an octopus, which carry many suckers for seizing prey, long DNA sequences of >500 bp can bind to target proteins on the CTC surface in a synergistic multivalent manner with reduced steric hindrance, achieving effective separation of tumor cells (88 ± 6%). To topologically control the spatial distribution of multivalent ligands, Zuo’s group developed a tetrahedral DNA framework (TDF) programmed for CTC capture (Fig. 7d) (Li et al. 2019). The TDF was anchored with biotinylated aptamers to recognize EpCAM proteins and was then magnetically captured by SA-labeled magnetic microbeads. The synergistic effects of efficient receptor-ligand interaction and rapid magnetic response guaranteed a high capture efficiency of 97% toward tumor cells and successful detection of CTCs in clinical samples from cancer patients. Current magnetic isolation is performed in homogeneous solutions in a stationary device (e.g., a tube or plate). However, homogeneous magnetic separation suffers from limited mass transfer and high nonspecific adsorption of blood cells, resulting in
unsatisfactory capture efficiency and low purity of CTCs. It is desirable to fabricate high-performance MBs and develop novel isolation platforms.

Aptamer-Based Microfluidic Isolation With the advance of micro/nanomachining technology, significant progress has been achieved in microfluidic technology and its biomedical applications. Microfluidics, with its facile engineering of microstructures and precise manipulation of fluids, opens a new avenue to increase the mass transfer of CTCs toward capture substrates and reduce nonspecific adsorption of blood cells for efficient and selective isolation of CTCs. Moreover, by taking aptamers as recognition ligands, high-affinity interfaces with multivalent binding effects can be engineered. Thus, aptamer-based microfluidic isolation platforms can simultaneously achieve efficient mass transfer and high-affinity binding, the two key factors determining the capture performance of CTCs. Various microfluidic chips have been designed to improve the performance of CTC isolation. They can be mainly divided into three types: (1) microchannel chips, (2) microarray chips, and (3) herringbone chips (HB-Chips). Microchannel chips with one dimension (height or width) comparable to the diameter of CTCs allow size-selective enhancement of the contact between CTCs and aptamer-modified substrates as samples pass through the channels. For example, in 2009, Tan’s group reported the first aptamer-modified microfluidic device for tumor cell isolation (Phillips et al. 2009). The sgc8 aptamers functionalized in microfluidic channels with a height of 25 μm yielded a capture efficiency of >80% and a purity of >97%. Nevertheless, microchannel-based isolation platforms have relatively small channels because one dimension must be of cell-like size, which restricts the throughput of sample processing. When the cross-sectional area is enlarged to improve throughput, laminar flow dominates and cells undergo minimal diffusion across the channel; they follow streamlines and have fewer chances to contact and interact with the affinity interface.
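Capture efficiency and purity figures such as those quoted above are usually computed from cell counts in spike-in experiments. The following is a minimal sketch that assumes the common definitions (efficiency = captured target cells / target cells loaded; purity = captured target cells / all captured cells) and uses purely hypothetical counts. The cited studies may define or measure these quantities differently.

```python
def capture_efficiency(captured_targets, loaded_targets):
    """Fraction of loaded target cells that end up captured on the chip."""
    if loaded_targets <= 0:
        raise ValueError("loaded_targets must be positive")
    return captured_targets / loaded_targets


def purity(captured_targets, captured_nontargets):
    """Fraction of all captured cells that are target cells."""
    total = captured_targets + captured_nontargets
    if total <= 0:
        raise ValueError("no cells were captured")
    return captured_targets / total


# Hypothetical spike-in experiment: 1000 target cells loaded,
# 850 target cells and 20 background cells found on the chip.
eff = capture_efficiency(850, 1000)
pur = purity(850, 20)
print(f"capture efficiency = {eff:.1%}, purity = {pur:.1%}")
```

Reporting both numbers together matters because, as the text notes for HB-Chips, designs that raise efficiency can lower purity and vice versa.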
Microarray chips with micropillars arrayed in a defined geometric arrangement can break up streamlines and enhance cell-surface interactions. Meanwhile, micropillar arrays increase the capture surface available to anchor more aptamers. Fan’s group developed a micropillar-based microfluidic device with >59,000 micropillars functionalized with sgc8 aptamers for CTC isolation (Fig. 8a) (Sheng et al. 2012). The geometric design of the micropillar array was based on the principle of deterministic lateral displacement (DLD), which distorted flow streamlines to enhance cell-micropillar interactions and thereby improved capture efficiency. This microdevice achieved a capture efficiency of 95%, purity of 81%, and cell viability of 93% at the optimum flow rate of 2 mL h⁻¹. However, CTC isolation by micropillar array chips still depends on laminar flow, which limits the interaction between cells and interfaces. HB-Chips, which are fabricated by engineering ridge substrates or herringbone microstructures on the upper surface of channels, generate transverse flows that induce microvortices to disrupt the laminar flow and enhance mass transfer. To further enhance CTC-substrate interaction, three-dimensional (3D) nanostructured substrates which allow enhanced local topographic interactions can be incorporated into HB-Chips. Meanwhile, various aptamer probes or multivalent aptamer complexes have been incorporated into HB-Chips to construct versatile isolation and detection platforms. Tseng’s group developed “NanoVelcro” substrates, silicon nanowire substrates modified with recognition ligands, which were integrated with an HB-based chaotic mixer to form the NanoVelcro Chip. Using aptamers selected against NSCLC cells by cell-SELEX as recognition ligands, the NanoVelcro Chip achieved efficient capture, release, and gene mutation detection of NSCLC CTCs (Fig. 8b) (Shen et al. 2013). The synergistic effects of herringbone micromixing, topographical interaction, and aptamer binding afforded a capture efficiency of ~80% at a flow rate of ~0.9 mL h⁻¹. Despite efficient mass transfer and convenient combination with interface engineering, HB-Chips fail to selectively enhance CTC-interface interaction and therefore suffer from a trade-off between high efficiency and high purity.

Fig. 8 Aptamer-based microfluidics for CTC isolation. (a) Aptamer-functionalized micropillar microfluidic device for efficient CTC isolation. (Adapted with permission from (Sheng et al. 2012). Copyright 2012 American Chemical Society). (b) NanoVelcro chip comprised of aptamer-modified silicon nanowire substrate and herringbone chip. (Adapted with permission from (Shen et al. 2013). Copyright 2013 Wiley-VCH). (c) Integrated chip for on-chip magnetic incubation, capture, and release of tumor cells. (Adapted with permission from (Zhang et al. 2013). Copyright 2012 American Chemical Society). (d) A liquid biopsy-guided drug release system. (Adapted with permission from (Xu et al. 2020). Copyright 2020 Royal Society of Chemistry). (e) Magnetically controlled isolation microfluidic chip for two-dimensional isolation of CTC subpopulations. (Adapted with permission from (Labib et al. 2016). Copyright 2016 American Chemical Society)

Microfluidics-Assisted Magnetic Isolation Microfluidics-assisted magnetic isolation methods combine the advantages of magnetic separation and microfluidics for high-performance isolation of CTCs. They manipulate both the magnetic field and the flow field to enhance the interaction between CTCs and aptamer-functionalized MBs (Apt-MBs) and to control the trapping force, enabling efficient and selective isolation and convenient release of CTCs. As mentioned above, in conventional magnetic isolation, samples containing CTCs are incubated with Apt-MBs in a tube or plate in a diffusion-limited environment. Thus, long incubation times and large numbers of Apt-MBs are required to guarantee efficient binding of Apt-MBs to CTC surfaces, which results in time-consuming operation and nonspecific adsorption of blood cells. To overcome these shortcomings, Zhu’s group developed an integrated chip for magnetic incubation, separation, and release of tumor cells (Fig. 8c) (Zhang et al. 2013). The periodic serpentine channel allowed efficient mixing and contact of cells and Apt-MBs, and magnetically labeled cells could be directly captured in the sorting chamber with an external magnet. After the magnet was removed, the magnetic trapping force disappeared, and cells were released for imaging analysis. With this device, tumor cells could be rapidly captured within 10 min with an efficiency of >98%. In addition to providing better mixing conditions for magnetic labeling, microfluidics can harness Apt-MBs to generate an affinity interface. When the chip substrate is patterned with nickel, which has a large magnetic permeability, a high-gradient magnetic field is generated upon magnetization to attract Apt-MBs with different levels of magnetism. This type of magnetically controlled self-assembled MB array inherits the merits of microarray chips, including enhanced cell-surface interaction and reduced nonspecific adsorption of blood cells.
More importantly, MB arrays as affinity interfaces possess unique advantages: (1) simplified interface biofunctionalization due to the convenience of MB modification and magnetic manipulation and (2) efficient and gentle release of CTCs, because the MB array disassembles reversibly when the external magnet is removed. For example, Zhang’s group proposed a smart liquid biopsy-guided drug release microfluidic system, which consisted of two areas of MB arrays for integrated CTC capture and drug release, respectively (Fig. 8d) (Xu et al. 2020). The MB array of Area I comprised MBs modified with aptamers that were hybridized with cDNA (MNs-apt-cDNA). MNs-apt-cDNA recognized and bound to CTCs via the aptamers, releasing the cDNA, which in turn triggered the release of anticancer drugs from the drug-loaded MBs in Area II. Thus, this system afforded rational release of drugs according to the number of captured CTCs, showing great potential for accurate diagnosis and personalized therapy of tumors. Current separation techniques leverage an external magnetic field to capture all magnetically tagged cells. These techniques fail to distinguish cell subpopulations labeled with different levels of magnetic tags, and they induce strong nonspecific capture of blood cells that are nonspecifically labeled with a small number of magnetic tags. To overcome this dilemma, Kelley’s group developed a magnetically controlled isolation microfluidic chip that balances the magnetic force on magnetically tagged CTC subpopulations against the drag force in discrete spatial bins for spatial sorting (Fig. 8e) (Labib et al. 2016). CTC subpopulations with different expression levels of marker proteins could be labeled with different numbers of aptamer-functionalized MNPs (Apt-MNPs) and thus acquired different magnetic susceptibilities. The chip had four zones whose volumes decreased from the inlet to the outlet and allowed decreased linear velocity of the flowing solution to trap magnetically tagged CTC subpopulations in different zones. Moreover, Apt-MNPs enabled CTC release with antisense sequences, which allowed two-dimensional (2D) sorting of CTC subpopulations according to the expression levels of two types of markers.
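The balance between magnetic force and drag force described above can be illustrated with a simple model. This is a hedged sketch under strong assumptions, not the design calculation of the cited chip: the cell is treated as a sphere experiencing Stokes drag, each bound nanoparticle is assumed to contribute the same fixed magnetic force, and every numerical value (zone velocities, force per nanoparticle, cell radius, water-like viscosity) is hypothetical.

```python
import math


def stokes_drag(radius_m, velocity_m_s, viscosity_pa_s=1.0e-3):
    """Stokes drag on a sphere at low Reynolds number: F = 6 * pi * eta * r * v."""
    return 6.0 * math.pi * viscosity_pa_s * radius_m * velocity_m_s


def capture_zone(n_beads, zone_velocities, force_per_bead_n, radius_m):
    """Return the index of the first zone where magnetic force >= drag, or None if never trapped."""
    magnetic_force = n_beads * force_per_bead_n
    for index, velocity in enumerate(zone_velocities):
        if magnetic_force >= stokes_drag(radius_m, velocity):
            return index
    return None


# All values below are hypothetical and chosen only to show the trend.
ZONE_VELOCITIES = [600e-6, 300e-6, 150e-6, 75e-6]  # m/s, decreasing from inlet to outlet
FORCE_PER_BEAD = 2e-15                             # N per bound nanoparticle (assumed)
CELL_RADIUS = 7.5e-6                               # m (assumed)

zones = {
    n: capture_zone(n, ZONE_VELOCITIES, FORCE_PER_BEAD, CELL_RADIUS)
    for n in (50000, 25000, 12000, 6000, 1000)
}
print(zones)
```

In this toy model, cells carrying many nanoparticles (high marker expression) are trapped in the first, fastest zone, cells with fewer nanoparticles travel further before the drag drops below the magnetic force, and weakly labeled cells pass through untrapped, which mirrors the sorting behavior the text describes.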

Multivalent Aptamer Capture Interface CTC capture is a process in which a CTC contacts, interacts with, and attaches to a surface. Beyond efficient mixing and collision as mentioned above, interface affinity, which affects bond formation and stability, is another key factor determining the capture efficiency of CTCs. Considering that some aptamers have unsatisfactory affinities at micromolar levels and that aptamers commonly suffer from compromised binding affinity and stability in complex body fluids, many efforts have been devoted to improving the affinity of aptamer-functionalized interfaces. The multivalent binding strategy, characterized by simultaneous binding of multiple ligands on one entity to one or multiple receptors on another entity, has been widely adopted. Multivalent interaction has been shown to increase binding affinity by 1 to 9 orders of magnitude. Aptamers, with the prominent merits of small size, convenient modification, and programmable structure design, allow diverse assembly strategies to generate multivalent assemblies as recognition elements. Aptamers can undergo nucleic acid amplification and hybridization reactions to form multivalent assemblies. Zhao et al. harnessed a rolling circle amplification (RCA) reaction to synthesize a 3D DNA network on the microfluidic surface (Fig. 9a) (Zhao et al. 2012). By encoding aptamer sequences in the circular template, the amplified DNA products with repetitive aptamer domains reached hundreds of nanometers to hundreds of microns in length and formed 3D DNA networks. This multivalent interface improved capture efficiency by three- to fivefold compared with monovalent aptamers and antibodies. In addition to direct amplification of aptamer-encoded templates, hybridization of aptamers to periodic sequence units of the RCA products provides an alternative strategy to engineer multivalent aptamer assemblies.
This strategy offers flexible assembly of different aptamers, and it also enables toehold-mediated strand displacement of aptamers for CTC release. Additionally, the hybridization reaction provides 2D long-chain multivalent structures and 3D multivalent nanostructures such as dendrimers, frameworks, and hydrogels (Wu et al. 2021). Despite the merits of precise and programmable assembly, this type of multivalent assembly strategy requires elaborate design of template and complementary sequences and sometimes suffers from time-consuming reaction and purification steps. Aptamers can be site-specifically modified with functional moieties (e.g., thiol, amino, carboxyl, and cholesterol), enabling high-density assembly on nanostructures or nanomaterials with a high surface-area-to-volume ratio. Such multivalent assemblies can provide both multivalent interaction and local topographic interaction. For example, Fang’s group modified an SA-functionalized silicon nanowire substrate in an HB-Chip with a biotinylated aptamer cocktail for enhanced and differential capture of CTCs from lung cancer patients (Fig. 9b) (Zhao et al. 2016). The synergistic effect of aptamer cocktails improved the capture performance of chips toward CTCs, providing comprehensive and accurate information on patients’ prognosis and treatment response. Aside from direct modification of aptamers on etched nanosubstrates, an alternative strategy is to assemble aptamer-functionalized nanomaterials on capture substrates. Gold nanoparticles (AuNPs) are widely utilized as scaffolds to anchor aptamers, generating multivalent spherical nucleic acids. Using aptamers functionalized with biotin and thiol groups, Sheng et al. fabricated aptamer-modified AuNPs (Apt-AuNPs) via gold-thiol interaction, which were further assembled on the SA-modified substrate of an HB-Chip via SA-biotin interaction (Fig. 9c) (Sheng et al. 2013). The affinity of Apt-AuNPs was 39-fold higher than that of monovalent aptamers, achieving a capture efficiency of 95% and purity of 70% toward target tumor cells. To improve isolation selectivity, Song et al. further engineered Apt-AuNPs on substrates of a DLD-patterned microarray microfluidic chip (DLD-Chip) (Song et al. 2019). The geometrically patterned microfluidic chip enabled size-selective enhancement of CTC-micropillar collision. Thus, the synergistic effects of size-selective mass transfer and high-affinity Apt-AuNP interfaces achieved a threefold enhancement of capture efficiency compared with the monovalent aptamer-modified chip. However, the abovementioned affinity nanointerfaces depend on aptamers anchored on the substrate in a relatively static arrangement, without the possibility of substantial ligand mobility or clustering.

Fig. 9 Multivalent aptamer interfaces. (a) Bioinspired multivalent DNA network for tumor cell capture. (Adapted with permission from (Zhao et al. 2012). Copyright 2012 National Academy of Sciences). (b) Aptamer cocktail-functionalized silicon nanowire substrate for enhanced and differential capture of CTCs. (Adapted with permission from (Zhao et al. 2016). Copyright 2016 Wiley-VCH). (c) Multivalent DNA nanospheres to enhance the capture of CTCs in microfluidic devices. (Adapted with permission from (Sheng et al. 2013). Copyright 2013 American Chemical Society). (d) Fluid multivalent membrane nanointerface for efficient, high-selective, and gentle capture of CTCs. (Adapted with permission from (Wu et al. 2020a). Copyright 2020 American Chemical Society)

Inspired by the rearrangement and clustering of ligands and receptors on fluid cell membrane interfaces, Wu et al. engineered a fluid aptamer nanointerface on a DLD-Chip for efficient and selective capture of CTCs with high viability (Wu et al. 2020a). They adopted a top-down approach, assembling nanovesicles from natural leukocyte membranes and anchoring cholesterol-terminated aptamers on them via hydrophobic interaction to fabricate aptamer-functionalized nanovesicles. The fluid nanovesicles enabled recruitment and clustering of aptamers and their corresponding marker proteins at binding sites, and the resulting synergistic multivalent binding enhanced affinity by four orders of magnitude. Thus, the fluid aptamer interface improved capture efficiency sevenfold in blood compared with a monovalent aptamer-functionalized chip. Meanwhile, the soft, anti-adsorption cell membrane could reduce CTC damage and contamination by blood cells. With this chip, CTCs were successfully detected in 17 cancer patients.

Release of CTCs CTCs carry abundant biological information about tumors, including the genome, transcriptome, and proteome, which can advance our understanding of the mechanisms of tumorigenesis, development, and metastasis, as well as guide therapy. Therefore, it is essential to release the captured tumor cells from the capture substrates for downstream analysis. Compared with other recognition ligands, aptamers allow facile regulation of affinity and are considered ideal recognition ligands for CTC isolation and release. Over the last few years, several types of aptamer-based CTC release techniques have been developed, which are briefly summarized in this section.

Release by Disrupting the Conformations of Aptamers The binding capability of aptamers toward their targets originates from their folded 3D conformations. Therefore, disrupting these conformations can alter the affinity and release CTCs captured on substrates. The most straightforward way to disrupt aptamer conformation is heating, which changes the intramolecular interactions of aptamers. As shown by Zhu et al. (2012), after CEM cells were captured with sgc8 aptamers, ~80% of the cells could be released from the chip when it was heated to 48 °C and rinsed with buffer at 5 μL min⁻¹ for 2 min. The released cells had nearly the same growth ability as untreated CEM cells. Because this heating strategy is reversible, the chip retained the ability to capture CEM cells after cooling to room temperature, facilitating multiple rounds of capture and release with a single chip. However, high temperature induces biomolecule degradation, cell death, and loss of cell function to some extent, hampering downstream analysis of CTCs. Alternatively, a much milder way to disrupt aptamer conformation is to hybridize complementary strands to the aptamers. For example, Liu et al. developed a cell-imprinted hydrogel modified with aptamers for CTC capture and release (Fig. 10a) (Liu et al. 2019b). By incubation with strands complementary to the aptamers, 94% of captured cells could be released from the hydrogel for culture, a higher fraction than that of reported antibody-based release techniques.

Fig. 10 CTC release strategies based on aptamer-functionalized capture interfaces. (a) Disrupting aptamer conformation via the hybridization of complementary strands for CTC release. (Adapted with permission from (Liu et al. 2019b). Copyright 2019 American Chemical Society). (b) CTC release by digesting aptamers with nuclease. (Adapted with permission from (Zhang et al. 2020). Copyright 2020 Wiley-VCH). (c) Schematic illustration of CTC release by detaching aptamers from capture substrate. (Adapted with permission from (Song et al. 2019). Copyright 2019 Wiley-VCH)

Release by Digesting Aptamers with Nucleases Aptamers are sensitive to nucleases. Just as CTCs can be released from antibody-modified substrates by digesting the antibodies with proteases such as trypsin, CTCs can be released from aptamer-modified substrates by digesting the DNA aptamers with nucleases. Unlike proteases, which digest both antibodies and cell proteins and thus reduce cell viability, nucleases affect only the aptamers, which is of great benefit for downstream analysis. For example, Zhang et al. modified DLD-Chips with tetrahedral DNA nanostructures bearing a pendant aptamer at the top vertex to efficiently capture and release CTCs (Fig. 10b) (Zhang et al. 2020). With this highly ordered capture interface of tetrahedral DNA, DNase I could easily approach and digest the DNA nanostructures for efficient CTC release. This strategy enabled the release of ~80% of captured CTCs with a viability of 90.8% and downstream KRAS mutation detection. Yu et al. proposed a two-stage platform for the capture and release of CTCs with high purity (Yu et al. 2015). Tumor cells were first captured by Apt-MB-assembled microfluidic chips. By removing the magnet, captured tumor cells could be washed off from the chip. With exonuclease treatment, captured tumor cells would be detached from MBs, while nonspecifically adsorbed background cells would remain on Apt-MBs, improving the CTC purity from 24.3% to 65.7%. However, the nuclease digestion-based release strategy is not applicable to chemically stabilized aptamers.

Release by Detaching Aptamers from Capture Substrates Aside from destroying the affinity of aptamers, some release methods aim at disrupting aptamer-interface interactions to detach aptamers and captured CTCs from capture substrates. As introduced above, functional groups could be accurately
introduced to the specific position of aptamers to engineer aptamer-functionalized substrates. According to different functional groups and interface modification methods, some physicochemical stimuli have been utilized to release CTCs. For example, thiol-aptamer-conjugated AuNPs were utilized to form multivalent capture interfaces on DLD-Chips for CTC isolation and release (Fig. 10c) (Song et al. 2019). The introduction of glutathione could disrupt the Au-S bond between aptamers and AuNPs, enabling efficient release (80%), high viability (95.8%) of released cells, and downstream KRAS mutation analysis by Sanger sequencing.

Detection of CTCs Considerable evidence suggests that the number of CTCs can be used for tumor diagnosis, prognosis, and therapy assessment. CTC enumeration usually requires cumbersome immunocytochemistry staining and professional personnel to identify CTCs among all captured cells. In contrast, owing to their structural merits, aptamer probes provide diverse signal transduction and amplification strategies for highly sensitive detection of analytes, which will be introduced in detail in the section “Aptamer-based EV detection.” These strategies open a new avenue for CTC quantification (Wu et al. 2020b). For example, Abate et al. developed a CTC visual detection chip with single-cell sensitivity based on aptamer-conjugated Pt nanoparticles (Fig. 11a) (Abate et al. 2019). After incubation with aptamer-conjugated Pt nanoparticles, samples were loaded into a volumetric bar chart chip, where O2 was generated from H2O2 under the catalysis of Pt nanoparticles. The number of tumor cells could be read out visually from the concentration-dependent distance that the ink was pushed along the chip channel by the generated O2. With this strategy, even a single tumor cell could be detected after 20 min of incubation in the chip, demonstrating high sensitivity for tumor cell detection. Qu et al. proposed electrochemical (EC) tumor cell detection with a dual-aptamer-modified electrode (Fig. 11b) (Qu et al. 2014). By introducing two aptamers simultaneously on glassy carbon electrodes with single-stranded DNA (ssDNA) linkers, the crowding effect could be minimized to improve the sensitivity. As more cells were captured on the electrode, they imposed greater steric hindrance on electron transfer of the [Fe(CN)6]3−/4− redox pair, which could be measured by differential pulse voltammetry. With this method, highly specific and sensitive (1 tumor cell in 10⁹ white blood cells) detection of tumor cells could be achieved.
Apart from direct enumeration, CTCs with inherent heterogeneity call for subpopulation analysis to reveal their relevance to tumor metastasis and drug resistance, which could be achieved by diverse aptamers against different CTC markers. Kelly’s group developed a magnetically controlled microfluidic chip coupled with Apt-MBs to sort 16 CTC subpopulations (Labib et al. 2016) and profile functional and biochemical phenotypes of CTCs (Poudineh et al. 2017). In order to analyze the phenotype of CTCs at the single-cell level, Zhang et al. proposed on-chip multiple spectrally orthogonal surface-enhanced Raman spectroscopy (SERS) analysis (Zhang et al. 2018). With microfluidic filters, CTCs could be individually trapped

46

Aptamer Molecular Evolution for Liquid Biopsy

1481

Fig. 11 Aptamer-based CTC detection. (a) Aptamer-conjugated Pt nanoparticles to catalyze gas generation reaction for CTC detection. (Adapted with permission from ref. (Abate et al. 2019). Copyright 2019 Wiley-VCH). (b) Dual-aptamer-modified electrode for CTC detection. (Adapted with permission from (Qu et al. 2014). Copyright 2014 American Chemical Society)

on gap array structures based on size differences. Three types of SERS aptamer nanovectors then labeled three important biomarkers simultaneously, and the resulting composite spectral signatures of single cells allowed the proteomic phenotypes of individual CTCs to be profiled.
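
A composite spectral signature of this kind can be thought of as a weighted sum of the individual reporter spectra. The following Python sketch illustrates one simple way such a signature could be decomposed, assuming (hypothetically) that the three reference spectra do not overlap, so that each weight is a plain projection. The spectra, the names `reporter_A` to `reporter_C`, and the `unmix` helper are illustrative only and are not taken from Zhang et al. (2018), whose actual demultiplexing procedure may differ.

```python
def dot(u, v):
    """Inner product of two equal-length intensity vectors."""
    return sum(a * b for a, b in zip(u, v))


def unmix(composite, references):
    """Estimate each reporter's contribution to a composite spectrum.

    Assumes the reference spectra are mutually orthogonal (non-overlapping
    peaks); overlapping reporters would need a full least-squares fit.
    """
    return {
        name: dot(composite, ref) / dot(ref, ref)
        for name, ref in references.items()
    }


# Hypothetical reference spectra on a shared 6-point Raman-shift grid.
references = {
    "reporter_A": [1.0, 0.5, 0.0, 0.0, 0.0, 0.0],
    "reporter_B": [0.0, 0.0, 1.0, 0.5, 0.0, 0.0],
    "reporter_C": [0.0, 0.0, 0.0, 0.0, 1.0, 0.5],
}

# Hypothetical single-cell spectrum built as 2.0*A + 0.5*B + 1.0*C.
composite = [2.0, 1.0, 0.5, 0.25, 1.0, 0.5]

weights = unmix(composite, references)
for name, weight in weights.items():
    print(f"{name}: relative marker signal {weight:.2f}")
```

The recovered weights serve as relative readouts of the three labeled markers on one cell. With real, partially overlapping Raman bands, the projection would have to be replaced by a least-squares fit against measured reference spectra.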

Aptamer-Based Detection of EVs

EVs are nano/microscale lipid bilayer vesicles that are secreted by most cell types and circulate in body fluids. Based on their biogenesis mechanisms, EVs are generally classified into three subtypes: exosomes, microvesicles, and apoptotic bodies. EVs carry diverse molecular cargoes, including proteins, nucleic acids, lipids, and other bioactive molecules, from their parent cells and transfer them into recipient cells (Kalluri and LeBleu 2020). They participate in cell-to-cell communication and

1482

L. Wu et al.

play a significant role in physiological and pathological processes. Among them, tumor-derived EVs reflect the biological status of their parent tumor cells and participate in many important biological processes of cancers, such as tumorigenesis, metastasis, and immune response. Thus, they are emerging as promising liquid biopsy targets for the clinical management and study of cancers. Compared with other liquid biopsy targets, EVs offer high stability and relatively high abundance in various body fluids. Nevertheless, their nano/microscale size poses many technical challenges to efficient isolation and sensitive detection. Traditional EV isolation methods, such as density-gradient centrifugation, size exclusion chromatography, and ultrafiltration, rely on differences in size and density among EVs, other vesicles, and cell debris. These methods suffer from complicated and time-consuming operation, low isolation efficiency, and contamination from free proteins or aggregates. Aptamers, which feature convenient affinity regulation, signal transduction, and amplification, have yielded various biosensors (termed aptasensors) for efficient isolation and sensitive detection of EVs. This section summarizes aptamer-based EV detection strategies: (1) homogeneous detection by direct transformation of the molecular recognition events of aptamer-EV binding into signals and (2) detection on a solid-liquid interface via affinity isolation coupled with signal transduction and amplification.

Isolation-Free Homogeneous Detection

The complex matrices of body fluids seriously interfere with EV detection. For example, serum and plasma samples contain large amounts of high/low-density lipoproteins, serum albumin, etc. To achieve sensitive detection of EVs in a homogeneous system, recognition ligands are expected to possess high affinity, high specificity, and fast binding kinetics. Compared with antibodies, aptamers offer smaller size, faster binding kinetics, and less steric hindrance during binding, facilitating highly sensitive detection of EVs in a separation-free homogeneous manner (Wu et al. 2021).

Thermophoresis is the directed motion of molecules and micro/nanoparticles induced by temperature gradients. An infrared laser is focused onto a thin capillary containing the sample to generate a heated spot of micron-scale diameter and a surrounding temperature gradient. Molecules and small particles then move out of the heated spot, and this depletion can be quantified by the Soret coefficient, which is sensitive to their size, charge, and solvation energy. Owing to their small size and negative charge, fluorescent aptamers are considered ideal tags for thermophoresis-based biosensors. Thermophoretic aptasensors (TAS) have been developed for sensitive profiling of EV proteins; they combine the efficient protein labeling of aptamers with the contact-free and isolation-free detection of thermophoresis. Huang et al. developed a TAS for homogeneous, low-volume, efficient, and sensitive quantitative detection of exosomal PD-L1 (HOLMES-ExoPD-L1) (Fig. 12a) (Huang et al. 2020). The PD-L1 aptamer MJ5C was selected, which had a smaller size than the PD-L1 antibody and

Fig. 12 Aptamer-based homogeneous detection platforms of EVs. (a) TAS for the detection of exosomal PD-L1. (Adapted with permission from (Huang et al. 2020). Copyright 2020 Wiley-VCH). (b) Microchamber-based thermophoretic accumulation of aptamer-bound EVs to profile surface proteins of EVs by a panel of aptamers. (Adapted with permission from (Liu et al. 2019a). Copyright 2019 Springer Nature Limited). (c) Fluorescence polarization aptasensor enables separation-free quantification of exosomes. (Adapted with permission from (Zhang et al. 2019a). Copyright 2019 Royal Society of Chemistry). (d) Dual-aptamer-activated proximity ligation assay coupled with ddPCR for sensitive quantification of tumor-derived exosomal PD-L1. (Adapted with permission from (Lin et al. 2021). Copyright 2020 Wiley-VCH)

could efficiently bind to exosomal PD-L1 with less hindrance from PD-L1 glycosylation. This aptasensor achieved sensitive detection of exosomal PD-L1 with a limit of detection (LOD) of 17.6 pg/mL and revealed a strong positive correlation between exosomal PD-L1 concentration and the degree of malignancy in adenocarcinoma patients. With a panel of fluorescent aptamers, Sun’s group developed an EV protein profiling method based on thermophoresis (Liu et al. 2019a). They set up a microchamber for thermophoretic accumulation of aptamer-bound EVs toward the center of the chamber bottom, generating an amplified fluorescence signal whose intensity correlated with the expression level of EV proteins (Fig. 12b). Multiplex protein detection enabled by this TAS, together with linear discriminant analysis, achieved early detection with 95% sensitivity and 100% specificity and classification of cancers with an overall accuracy of 68% (Liu et al. 2019a).

Aside from thermophoresis, fluorescence polarization assays also provide separation-free and amplification-free homogeneous detection of EVs (Fig. 12c) (Zhang et al. 2019a). When dye-labeled aptamers bind to exosomes, the large mass and volume of the exosomes act as a fluorescence polarization amplifier for the dye-labeled aptamers, and the resulting change in fluorescence polarization can be used for exosome detection. This method yielded a detection range of 5 × 10² to 5 × 10⁵ particles μL⁻¹ with an LOD of 500 particles μL⁻¹ without any sample pretreatment or isolation.

Tumor-derived EVs (T-EVs) exist in body fluids against a background of more abundant non-tumor-derived EVs, which have similar sizes and partially overlapping protein marker expression. It thus remains a technical challenge to achieve accurate molecular profiling of T-EVs and to explore their biological functions. DNA computation technology has opened a new avenue for analyzing the molecular signatures of targets with high sensitivity and specificity. Aptamer probes with rationally designed sequences provide DNA computation aptasensors for molecular identification of T-EVs. Li et al. reported a thermophoresis-mediated DNA computation method to detect T-EVs for diagnosis and molecular phenotyping of breast cancers (Li et al. 2021). Taking EpCAM and human epidermal growth factor receptor 2 (HER2) as inputs for logic gate activation and aptamer-containing sequences for AND gate operation, this platform distinguished breast cancer patients from healthy donors with an identification accuracy of 97% (n = 30).

Droplet digital PCR (ddPCR) can also be coupled with aptamer-based DNA computation for sensitive detection of T-EV proteins. Lin et al. developed a dual-target-specific aptamer recognition activated in situ connection system on the exosome membrane combined with ddPCR (TRACER) for quantitation of tumor-derived exosomal PD-L1 (Fig. 12d) (Lin et al. 2021). Two extended aptamers against EpCAM and PD-L1 labeled both markers simultaneously, and the extended ends of the aptamers initiated a proximity ligation assay with connector probes and ligase. The ligation products could then be absolutely quantified by ddPCR. Owing to the excellent selectivity of dual-aptamer recognition and the high sensitivity of ddPCR, the TRACER system identified, for the first time, tumor-derived exosomal PD-L1 as a more reliable tumor diagnostic marker than total exosomal PD-L1.

Aptamer-based homogeneous detection methods are performed in solution with low sample consumption and fast binding kinetics. Moreover, they are isolation-free, washing-free, and contact-free, greatly reducing operation steps and contamination. Nevertheless, they depend on sensitive signal readout instruments and interference-tolerant analysis methods.
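
The absolute quantification by ddPCR mentioned for TRACER rests, in general, on Poisson statistics: because a droplet can hold more than one template, the mean number of templates per droplet is inferred from the fraction of empty droplets rather than counted directly. The Python sketch below shows this standard correction. The function name, the droplet counts, and the droplet volume of 0.85 nL are illustrative assumptions and are not values reported by Lin et al. (2021).

```python
import math


def copies_per_microliter(positive, total, droplet_volume_nl):
    """Poisson-corrected template concentration in the partitioned reaction.

    positive / total is the fraction of droplets holding at least one
    template molecule (here, one ligation product).
    """
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total droplets")
    # Mean templates per droplet: P(empty) = exp(-lam) = 1 - positive/total.
    lam = -math.log(1.0 - positive / total)
    # Convert the droplet volume from nL to uL (1 uL = 1000 nL).
    return lam / (droplet_volume_nl / 1000.0)


# Hypothetical run: 2,000 positive droplets out of 20,000 accepted droplets,
# with an assumed droplet volume of 0.85 nL.
conc = copies_per_microliter(2000, 20000, droplet_volume_nl=0.85)
print(f"{conc:.1f} copies per uL of reaction")
```

The result refers to the partitioned reaction mix; relating it to the original sample would additionally require the dilution and input volumes of the specific protocol.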

Detection on Solid-Liquid Interface

Many aptasensors use affinity solid substrates to recognize or capture EVs, where the recognition reaction between aptamer probes and EV biomarkers induces signal transduction and amplification for EV quantification and analysis. Solid substrates can act as both isolation and signal media, enriching EVs away from interferents to achieve sensitive detection. Because their structures are easy to design and modify, aptamers can be synthesized with extended functional sequences (e.g., primer and trigger sequences) and molecular labels (e.g., dye, biotin) for direct or indirect signal amplification strategies. In direct amplification assays, aptamer-EV binding alters the conformation of the DNA sequence and exposes functional sequences that initiate HCR or RCA for signal amplification. In indirect amplification assays, aptamer-EV binding causes the release of aptamers or trigger sequences from their carriers, which in turn induces a signal amplification reaction. Alternatively, aptamers can be attached to functional nanostructures and nanomaterials and act solely as recognition ligands that introduce these functional moieties for signal transduction and amplification. In particular, with the ongoing development of nanotechnology, nanostructures and nanomaterials with superior physicochemical properties offer various signal readout modes and detection methods, including EC, fluorescence, and colorimetric detection.

Electrochemical Detection

EC aptasensors, with their high sensitivity, low cost, fast response, and small sample volume requirement, have attracted increasing attention in EV detection. The solid electrode interface produces quantitative EC signals in response to EV-aptamer binding. These EC aptasensors fall mainly into three types.

The first type is the aptamer-modified electrode, on which captured EVs block electron transport or cause electroactive molecules to detach from the electrode surface. The resulting decrease in EC signal offers direct quantitative detection of EVs. For example, Zhou et al. developed an EC aptasensor based on CD63 aptamer-functionalized Au electrode arrays in microchannels, on which redox moiety-terminated probe sequences were hybridized with the aptamers (Fig. 13a) (Zhou et al. 2016a). In the presence of target exosomes, binding between the aptamers and CD63 proteins on EVs induced the release of the redox reporters, resulting in decreased EC signals with an LOD of 1 × 10⁶ particles mL⁻¹. To avoid signal fluctuation caused by unavoidable intrinsic or extrinsic factors (e.g., instrument errors, background signals), Gui’s group developed a dual-signal self-calibration EC aptasensor for sensitive and specific detection of tumor-derived exosomes (Fig. 13b) (Sun et al. 2020). The aptasensor was constructed by assembling black phosphorus nanosheets (BPNSs) and ferrocene (Fc)-doped metal-organic frameworks (ZIF-67) on an indium tin oxide (ITO) slice and then adsorbing methylene blue-labeled aptamers. It exhibited dual redox responses, with methylene blue as the response signal and Fc as the reference signal. Exosome binding released the aptamers and thus reduced the redox current of methylene blue in a regular manner, but not that of Fc, affording an LOD of ~100 particles mL⁻¹ via self-calibration. In addition, to avoid aggregation and entanglement of aptamers immobilized on the electrode surface, Wang et al. developed a nanotetrahedron-assisted sensor with directional immobilization of aptamers for direct detection of hepatocellular exosomes (Fig. 13c) (Wang et al. 2017a). This EC aptasensor exhibited 100-fold higher sensitivity than the single-stranded DNA-functionalized aptasensor. Despite their simple operation and direct detection, EC aptasensors of this type rely only on the binding of EVs to the electrode surface; they therefore suffer from false-positive signals caused by nonspecific adsorption of interferents and cannot provide signal amplification or multiplex analysis.

The second type of EC aptasensor uses sandwich assays on affinity electrodes to improve the sensitivity and accuracy of EV detection. Commonly, EVs are captured in a capture probe-EV-signal probe structure, which allows the introduction of aptamer signal probes containing catalytic moieties (e.g., enzymes, nanozymes, electroactive tags) that trigger an electrocatalytic reaction with signal amplification for ultrasensitive detection of EVs. For example, Huang et al.

Fig. 13 Aptamer-functionalized electrode for direct EC detection of EVs. (a) Au electrode arrays functionalized with CD63 aptamers which were hybridized with redox moiety-terminated probe sequences for exosome detection. (Adapted with permission from (Zhou et al. 2016a). Copyright 2016 Elsevier). (b) A dual-signal and intrinsic self-calibration aptasensor for exosome detection. (Adapted with permission from (Sun et al. 2020). Copyright 2020 American Chemical Society). (c) A nanotetrahedron (NTH)-assisted aptasensor for direct capture and detection of hepatocellular exosomes. (Adapted with permission from (Wang et al. 2017a). Copyright 2017 American Chemical Society)

developed an EC aptasensor with a primer-containing MUC1 aptamer probe to detect gastric cancer exosomes (Fig. 14a) (Huang et al. 2019b). The primer sequence was complementary to a G-quadruplex circular template. Binding of the aptamer probes to target exosomes could initiate RCA to produce multiple G-quadruplex units that acted as an HRP-mimicking DNAzyme to generate EC signals, achieving an LOD of 9.54 × 10² particles mL⁻¹. Besides signal amplification, the sandwich format enables multiplex analysis by using aptamer probes modified with distinct redox-active reporters to label different targets. Kelley’s group developed an EC microfluidic aptasensor for simultaneous analysis of EpCAM and EGFR expression on EVs (Fig. 14b) (Zhou et al. 2016b). When EVs were captured on EpCAM aptamer-functionalized gold electrodes, their surface EpCAM and EGFR proteins were labeled by the corresponding aptamer-modified metal nanoparticles, which have different oxidation peaks, for multiplex detection.

The two types of EC aptasensors described above rely on affinity electrodes to capture EVs and suffer from drawbacks including complicated electrode modification, inefficient EV-aptamer binding due to steric hindrance, and electrode fouling by complex matrices. To address these challenges, a third type of EC aptasensor incorporates affinity magnetic isolation into EC detection. On the one hand, with affinity magnetic isolation, the purified and enriched MB-EV complexes can be captured on a magnetically controlled electrode for EC detection, avoiding electrode fouling and background interference. For example, Ye’s group proposed a magnetically controlled microfluidic EC device for on-chip isolation and in situ EC detection of tumor-derived exosomes (Fig. 14c) (Xu et al. 2018). The chip was composed of a Y-shaped micropillar array and a cascading ITO electrode. The former enhanced collisions between affinity MBs and exosomes, and the latter, coupled with an external magnet, allowed MB-exosome complexes to be magnetically enriched for EC signal transduction. The probe, termed LGCD, was a hairpin-structured ssDNA consisting of aptamer and HRP-mimicking DNAzyme sequences. After recognizing and binding CD63-positive exosomes, the ssDNA hairpin was opened to form a G-quadruplex, which could act as a DNAzyme with the aid of hemin. With catalytic amplification, this EC aptasensor achieved an LOD of 4.39 × 10³ exosomes mL⁻¹. On the other hand, MBs can act as carriers of trigger molecules that induce an amplification reaction for EC detection of EVs after EV-MB binding. Shen’s group proposed an aptamer recognition-induced DNA release and cyclic enzymatic amplification strategy for exosome detection (Fig. 14d) (Dong et al. 2018). The signal was amplified by three kinds of messenger DNA (mDNA) that were released upon exosome capture and recycled with the assistance of Exo III, giving an LOD of 70 particles μL⁻¹.

Fig. 14 EC aptasensors coupled with signal amplification strategies for highly sensitive detection of EVs. (a) Aptamer-triggered RCA to generate multiple G-quadruplexes for EC detection of gastric cancer exosomes. (Adapted with permission from (Huang et al. 2019b). Copyright 2019 Wiley-VCH). (b) Simultaneous analysis of two types of protein markers on EVs based on aptamer-modified metal nanoparticles. (Adapted with permission from (Zhou et al. 2016b). Copyright 2016 Wiley-VCH). (c) Magnetically controlled electrode combined with aptamer-based signal amplification for EC detection. (Adapted with permission from (Xu et al. 2018). Copyright 2018 American Chemical Society). (d) Aptamer recognition induced multi-DNA release for cyclic enzymatic amplification detection of exosomes. (Adapted with permission from (Dong et al. 2018). Copyright 2018 American Chemical Society)
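
The dual-signal self-calibration idea described earlier in this section, with methylene blue as the response signal and Fc as the reference signal, can be illustrated numerically. The Python sketch below assumes, as a simplification, that electrode-to-electrode or instrument fluctuations scale both redox currents by the same factor, so that the ratio of the two currents is unaffected. The current values, the gain factors, and the `ratiometric_signal` helper are hypothetical and are not data from Sun et al. (2020).

```python
def ratiometric_signal(i_response, i_reference):
    """Normalize the response current to the reference current."""
    if i_reference <= 0:
        raise ValueError("reference current must be positive")
    return i_response / i_reference


# Hypothetical peak currents (arbitrary units) for the same sample.
i_mb = 3.0  # methylene blue, the exosome-dependent response signal
i_fc = 6.0  # ferrocene, the exosome-independent reference signal

# Two electrodes whose overall signal gain differs by 20% (assumed to
# scale both redox peaks equally, e.g. through effective electrode area).
gain_a, gain_b = 1.0, 1.2

raw_a = gain_a * i_mb
raw_b = gain_b * i_mb
ratio_a = ratiometric_signal(gain_a * i_mb, gain_a * i_fc)
ratio_b = ratiometric_signal(gain_b * i_mb, gain_b * i_fc)

print(f"raw MB current: {raw_a:.2f} vs {raw_b:.2f}")
print(f"MB/Fc ratio:    {ratio_a:.2f} vs {ratio_b:.2f}")
```

Under this assumption, the raw methylene blue currents differ between the two electrodes while the ratio stays the same. Fluctuations that affect only one of the two redox labels would not cancel in this way.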

Fluorescence Detection

Aptamer-based fluorescence biosensors use fluorescent aptamer probes to label EVs, or use the binding between aptamers and EVs to induce fluorescence quenching, recovery, or amplification. They share the merits of simple operation, high sensitivity, convenient combination with separation methods, and multiplex detection capability, which have greatly advanced EV detection. When EVs are captured on isolation carriers, such as MBs, microfluidic chips, and glass slides, fluorescence signals from the aptamer probes can be measured by fluorescence imaging, imaging spectroscopy, flow cytometry, etc. Moreover, some novel fluorophores have been explored to improve analytical performance.

Fluorescence microscopic imaging offers visual characterization of EVs labeled by fluorescent aptamer probes for qualitative and quantitative analysis. Moreover, total-internal-reflection-fluorescence (TIRF) microscopy allows EVs to be visualized and quantified at the single-particle level. For example, Li’s group developed a TIRF-based single-vesicle counting method by designing activatable aptamer probes (Fig. 15a) (He et al. 2019). The probes recognized and bound to tumor exosomes captured on the immunoCoverslip and then initiated HCR for fluorescence amplification. This assay featured a broad linear range (10³–10⁷ particles μL⁻¹) and showed potential for diagnosis and for monitoring tumor progression and early response to therapy.

In contrast, fluorescence spectrometric analysis offers more rapid and convenient quantification by collecting spectral signals, in which the types and concentrations of targets are indicated by peak positions and intensities. Moreover, various amplification strategies can be incorporated for ultrasensitive detection. For example, Jin et al. adopted an enzyme-mediated signal amplification strategy for profiling surface proteins and detecting exosomes (Fig. 15b) (Jin et al. 2018). Exosome-targeting single-stranded fluorescent aptamers were adsorbed on GO, which quenched their fluorescence, to form aptamer-confined nanoprobes whose fluorescence could be recovered in the presence of target exosomes. The aptamers preferentially bound to specific exosomal proteins, which switched them into a rigid conformation that detached from GO. The aptamers on exosome surfaces, no longer protected by GO, were then digested by deoxyribonuclease I, which exposed the protein markers for the next round of aptamer binding and amplified the signal through a 1:N molecular recognition mechanism. This method allowed monitoring of the epithelial-mesenchymal transition of tumors with an LOD of 1.6 × 10⁵ particles mL⁻¹.

Fig. 15 Fluorescence-based aptasensor for sensitive detection of EVs. (a) TIRF microscopy coupled with aptamer-based fluorescent DNA nanodevices for quantification of exosomes. (Adapted with permission from (He et al. 2019). Copyright 2019 American Chemical Society). (b) Aptamer-confined nanoprobes coupled with enzyme-mediated signal amplification for surface marker profiling of EVs. (Adapted with permission from (Jin et al. 2018). Copyright 2018 American Chemical Society). (c) NIR afterglow aptasensor for multiplex differentiation of cancer exosomes. (Adapted with permission from (Lyu et al. 2019). Copyright 2019 Wiley-VCH)

Fluorescent aptasensors commonly suffer from fluorescence interference (e.g., autofluorescence, scattering), which reduces detection sensitivity. To overcome this, novel fluorophores, such as quantum dots, upconversion nanoparticles, and near-infrared (NIR) fluorophores, have been developed and incorporated into fluorescent probes. For example, Pu’s group developed an NIR afterglow semiconducting polyelectrolyte nanosensor for orthogonal analysis of multiple exosomes (Fig. 15c) (Lyu et al. 2019). Semiconducting polyelectrolyte nanocomplexes were assembled with quencher-tagged aptamers via electrostatic interaction, which quenched the afterglow signals of the nanocomplexes through efficient electron transfer from the nanocomplexes to the quenchers. Binding of the aptamers to target exosomes increased the distance between the nanocomplexes and the quenchers and disrupted electron transfer, turning the afterglow signals on. Because afterglow signals were collected after light excitation had ceased, background signals were minimized, and the LOD was two orders of magnitude lower than that of traditional fluorescence detection methods. Moreover, by choosing the exosomal proteins of interest and changing the corresponding aptamers, this strategy enabled multiplex differentiation of cancer exosomes.

Visual Detection

Visual detection of EVs enables point-of-care testing for disease screening and monitoring. Visual aptasensors translate the molecular recognition events of aptamer-EV binding into visual signals that can be read directly by the naked eye, such as changes in solution color. The signals can also be quantified with portable and handheld instruments. Aptamers, as ssDNA, have been reported to enhance the peroxidase activity of nanozymes. Wang et al. designed hybrid nanozymes of aptamer-adsorbed graphitic carbon nitride nanosheets (NSs) as visual aptasensors for EV detection (Fig. 16a) (Wang et al. 2017b). The hybrid nanozymes showed enhanced catalytic activity for the oxidation of 3,3′,5,5′-tetramethylbenzidine (TMB), generating colorimetric signals faster. When the aptamers bound to exosomes in a sample and changed conformation, they detached from the nanosheets, which then reverted to their original catalytic activity. The resulting color change depended on the exosome concentration, enabling visual analysis and quantitative detection of exosomes with a UV-vis spectrometer.

The localized surface plasmon resonance of noble metal nanomaterials offers an alternative strategy for colorimetric detection. With aptamer-functionalized AuNPs as tags, the binding of aptamers to target EVs can cause aggregation or disaggregation of the AuNPs, leading to color changes for visual detection. Compared with a single color change, multicolor variation provides a more vivid and more distinguishable signal for the naked eye. Gold nanorods (Au NRs), with their high colorimetric sensitivity, show distinct color variations during growth or etching. Zhang et al. developed a dual-signal amplification strategy for multicolor visual detection of exosomes with the assistance of affinity magnetic isolation (Fig. 16b) (Zhang et al. 2019b). After exosomes were captured with aptamer-modified MBs, cholesterol-modified DNA was used to label the exosomes and trigger HCR, which increased enzyme loading to catalyze the metallization of Au NRs. The dual-signal amplification by HCR and enzyme-catalyzed metallization yielded an LOD of 1.6 × 10² particles μL⁻¹.
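
LOD values such as those quoted throughout this section are often estimated from a calibration curve with the 3σ convention, LOD = 3 × (standard deviation of blank signals) / (calibration slope). The Python sketch below shows that calculation for a response that is linear in concentration. It is a generic illustration only: the data are hypothetical, the helper names are made up for this example, and the cited studies may define or compute their LODs differently (for instance, from a fit against log concentration).

```python
from statistics import mean, stdev


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + b."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def limit_of_detection(blank_signals, slope, k=3.0):
    """LOD = k * (sample standard deviation of blanks) / slope."""
    return k * stdev(blank_signals) / slope


# Hypothetical calibration: concentration in particles per uL versus
# signal in arbitrary units (for example, an absorbance change).
concentration = [0, 100, 200, 400]
signal = [1.0, 3.0, 5.0, 9.0]

# Hypothetical replicate measurements of a blank sample.
blanks = [0.9, 1.0, 1.1]

slope, intercept = fit_line(concentration, signal)
lod = limit_of_detection(blanks, slope)
print(f"slope = {slope:.3f} a.u. per (particle/uL), LOD = {lod:.1f} particles/uL")
```

Because the estimate scales with the blank noise and inversely with the slope, amplification steps such as HCR or enzymatic metallization lower the LOD only insofar as they raise the slope more than they raise the blank variability.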

Fig. 16 Aptasensor-based visual detection for EVs. (a) Aptamer adsorbed nanosheets with enhanced peroxidase activity for exosomes detection. (Adapted with permission from (Wang et al. 2017b). Copyright 2017 American Chemical Society). (b) Multicolor visual detection of exosomes by combining HCR with enzyme-catalyzed metallization of Au nanorods. (Adapted with permission from (Zhang et al. 2019b). Copyright 2019 American Chemical Society)

Conclusion

Liquid biopsy is attracting increasing attention in clinical research and practice. Through noninvasive sampling and longitudinal analysis, it can provide real-time and comprehensive physiological and pathological information for disease diagnosis, therapy guidance, and monitoring of health and disease recurrence. Liquid biopsy is thus considered a valuable supplement to traditional tissue biopsy and is anticipated to boost the development of precision medicine. Molecular recognition of circulating targets is the foundation of liquid biopsy and calls for high-performance recognition ligands. Aptamers, with their unique merits of convenient in vitro molecular evolution and synthesis, controllable modification, diverse structure design, and programmable assembly, have yielded numerous efficient isolation platforms, gentle-release methods, and sensitive biosensors for liquid biopsy. In this chapter, we have summarized recent advances in aptamer molecular evolution for liquid biopsy, focusing mainly on modified SELEX techniques; aptamer-based CTC isolation, release, and detection; and EV detection.

Although aptamers have been considered ideal recognition ligands for liquid biopsy, they are not perfect and have not yet had the expected impact on cancer diagnosis and therapy. The main reason is the unsatisfactory performance of aptamers in the complex matrices of body fluids. Because aptamers are commonly evolved in a relatively simple matrix, such as buffer, the complex composition of body fluids can affect their folding into tertiary structures and thus their affinity. Aptamers are also susceptible to nuclease digestion in body fluids. The compromised affinity and stability of aptamers largely restrict isolation efficiency and detection sensitivity.
Existing strategies mainly rely on multivalent assembly to improve the affinity and stability of aptamers, which requires complicated operation and raises the cost of aptamer applications. Therefore, evolving aptamers directly in body fluid matrices, or even in vivo, should provide high-performance and robust aptamers for liquid biopsy. In addition, aptamer selection still remains at the laboratory level, whereas many established companies produce antibodies against various targets. Even for the same target, diverse antibodies with different clone numbers, host species, and functional conjugations are commercially available for a wide range of applications. Moreover, the features of antibodies (e.g., species reactivity) and their suitable applications (e.g., Western blot, immunohistochemistry, flow cytometry) have been investigated and clearly described in recommended protocols. For aptamers, more research and commercial effort is therefore greatly needed to characterize their properties and suitable applications and to improve their properties for broader use.

Although various aptamer-based liquid biopsy methods have been developed, only a few have been brought into routine clinical practice so far, which can be attributed to the following reasons. First, most liquid biopsy methods fail to provide comprehensive and accurate biological information about tumors, which limits their clinical value. On the one hand, aptamer-based isolation methods targeting one type of biomarker lose CTC or EV subpopulations with low or no expression of that biomarker and thus provide quantitative analysis of only certain subpopulations. On the other hand, although CTCs and T-EVs carry rich biological information about tumors, current analysis methods focus on enumeration, leaving their genotypes and phenotypes without in-depth analysis. Second, circulating targets are inherently heterogeneous, and traditional bulk analysis yields only average results, masking important information about individual cells or EVs. Third, the verification and evaluation of clinical applications of aptamer-based liquid biopsy face several challenges, including the establishment of standard kits, procedures, and instruments; access to sufficient clinical samples; close collaboration between basic research and practical application; and sound clinical follow-up systems.

With the development of SELEX, diverse high-performance aptamers targeting various biomarkers can be obtained. Aptamer cocktails can therefore be used for efficient capture of different subpopulations of CTCs or T-EVs. Coupled with delicate isolation platforms and smart-release methods, high-purity CTCs or EVs can be captured and released for downstream analysis. With the integration of microfluidic platforms for single-cell manipulation, single-cell analysis of CTC genotypes and phenotypes could be achieved with little cell loss. Moreover, the introduction of DNA barcoding techniques will allow high-throughput sequence analysis of single CTCs and EVs. In addition, considering the distinct differences in size and concentration among different types of circulating targets, novel aptamer-based isolation approaches and detection strategies are needed for simultaneous analysis of CTCs and EVs, which would provide multidimensional and complementary information for liquid biopsy. Through these concerted efforts, aptamer-based liquid biopsy will offer comprehensive and accurate biological information for precision medicine. Furthermore, the development of integrated and automated instruments, as well as standard reagent kits and protocols, will greatly facilitate the clinical verification of aptamer-based liquid biopsy methods. Finally, with the collaboration and participation of researchers, clinicians, and patients, aptamer-based liquid biopsy is expected to become applicable to clinical practice, opening new avenues for precision medicine.


Single-Molecule DNA Visualization

47

Xuelin Jin and Kyubong Jo

Contents
Introduction
Optical Mapping Based on Single-Molecule DNA Visualization
DNA Visualization Using DNA-Binding Fluorescent Proteins
Damage Visualization on Single DNA Molecules
DNA Modification Labeling
DNA Recombination
In Vitro Observation of DNA Replication
Observation of DNA Replication in Cells
Conclusion
References

Abstract

Direct visualization of individual DNA molecules is a powerful platform for understanding and managing genomic and biochemical phenomena in the context of DNA sequences. Although sequencing technology at the single nucleotide level has advanced dramatically, there are still numerous unsolved biological problems that are limited by DNA fragmentation during the sequencing procedure. The ultimate goal of DNA analysis is to search for a target gene by its sequence and to control its expression on chromosomal DNA without fragmentation or amplification. Given these concerns, single-molecule DNA visualization is the first step in identifying and controlling target genes of large DNA molecules. Advancements in this field include developments in single-molecule methods, advanced microscopy techniques, molecular labeling modalities, and nano/microfluidics. This chapter describes these developments in terms of single-molecule DNA visualization.

X. Jin
College of Agriculture, Yanbian University, Yanji, China
K. Jo (*)
Department of Chemistry, Sogang University, Seoul, South Korea
e-mail: [email protected]
© Springer Nature Singapore Pte Ltd. 2023
N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_53

Keywords

Single DNA molecule · Genome mapping · DNA-protein interaction · Fluorescence labeling · DNA manipulation

Introduction

Optical microscopy dates back to the sixteenth century, but it was only in the early 1980s that researchers first imaged individual fluorochrome-stained DNA molecules (Morikawa and Yanagida 1981). This pivotal discovery paved the way for new insights and fostered the development of single-molecule approaches to identify the locations of genes in the genome and to visualize gene control through protein-DNA interactions (Lee et al. 2015). Although electron microscopy offers higher resolution and has played a revolutionary role in molecular biology by enabling the detailed analysis of individual nucleic acids and their complexes, it has many technical issues that limit its practical use in biological and biochemical investigations (Kleinschmidt and Zahn 1959; Williams and Wyckoff 1945). In contrast, optical microscopy only requires staining DNA molecules with an appropriate reagent.

The first DNA staining method, Giemsa staining, was developed by Gustav Giemsa in 1904 (Giemsa 1904). Giemsa staining is often used to stain the G-bands of chromosomes to obtain a karyogram. Until 1972, radioactive materials were the primary tool for visualizing DNA bands separated by gel electrophoresis or chromatography. In 1972, ethidium bromide became the first fluorescent material used for DNA staining after gel electrophoresis (Aaij and Borst 1972). However, a band in a gel is composed of many same-sized DNA molecules; therefore, gel electrophoresis is not a single-molecule observation method. The first dye used for single DNA molecule observation was 4′,6-diamidino-2-phenylindole (DAPI), which was developed in 1975 (Russell et al. 1975). In 1981, Yanagida and his colleagues succeeded in visualizing DAPI-stained large elongated DNA molecules (bacteriophage T4 DNA, 169 kb) in a gel matrix (Fig. 1a) (Seiji Matsumoto and Yanagida 1981).

DAPI staining is still the primary tool used by biologists to stain DNA in cell nuclei. However, the fluorescence intensity of DAPI-stained DNA is generally too weak to be readily observed under a microscope. In 1992, the Glazer group introduced the TOTO series of dyes. Of these, YOYO-1 became a primary tool for single-molecule DNA visualization because YOYO-1-stained DNA readily generates clear, high-contrast images (Fig. 1b) (Rye et al. 1992; Perkins et al. 1994). YOYO-1 intercalates between DNA base pairs. Notably, it is non-fluorescent in the unbound state and has a high affinity for double-stranded DNA molecules. After the dye binds to double-stranded DNA, its fluorescence intensity increases 1000-fold compared with the unbound state, which raises the signal-to-background ratio when single DNA molecules are observed. These dyes are popular in single-molecule imaging; however, the intercalation of YOYO-1 into double-stranded DNA changes the persistence and contour lengths of DNA molecules.

Fig. 1 Visualization of single DNA molecules. (a) The first fluorescence microscopic images of elongated large DNA molecules stained with a synthesized organic dye, DAPI, in 1981. The DNA molecules were T4 DNA (169 kb). Reproduced from ref. (Seiji Matsumoto and Yanagida 1981) with permission from Academic Press Inc. (London) Ltd., copyright 1981. (b) YOYO-1-stained λ DNA dimers showing DNA relaxation after stretching with optical tweezers in 1994. Reproduced from ref. (Perkins et al. 1994) with permission from American Association for the Advancement of Science, copyright 1994. (c) Visualization of λ DNA stained with KWKWKKA-eGFP-AKKWKWK. Reproduced from ref. (Lee et al. 2016) with permission from Oxford University Press, copyright 2015. [Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/)] (d) λ DNA staining with tTALE-eGFP in 1X TE buffer containing 70 mM NaCl. Reproduced from ref. (Shin et al. 2019) with permission from Springer Nature Limited, copyright 2019. [Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/)]

Although organic dyes have been successfully used for DNA visualization, fluorescent proteins (FP) have emerged as a new DNA staining material since the late 1990s (Shelby et al. 1996). Green fluorescent protein (GFP) has gained tremendous popularity as a marker for gene expression and as a basis for probes that detect biomolecules (Davidson and Campbell 2009; Shaner et al. 2007). Protein engineering of GFP has produced various fluorescent proteins with improved wavelength coverage, brightness, and stability (Shaner et al. 2007; Shu et al. 2006). Newly developed mutant variants and expression strategies have reduced the aggregation and oligomerization of fluorescent proteins (Ratz et al. 2015; Stewart-Ornstein and Lahav 2016; Shaner et al. 2007). Genetic engineering offers tools for combining FP with DNA-binding proteins/peptides (FP-DBP) to stain DNA molecules. Figure 1c shows homogeneous DNA staining with an FP-DBP, KWKWKKA-eGFP-AKKWKWK (Lee et al. 2016). The repeated Trp-Lys motif gives this FP-DBP its DNA-binding capability (Kd = 15 μM). Alternatively, Fig. 1d shows λ DNA molecules whose AT-rich portions are stained with tTALE-eGFP (Shin et al. 2019).

FP-DBPs have various advantages in the visualization of single DNA molecules. First, there are many types of DNA-binding peptides and proteins; therefore, FP-DBPs can be built with a range of affinities and with specificity for particular DNA sequences. Second, FP-DBPs can be expressed in a cellular system. Hence, through genetic manipulation, an FP-DBP can readily translocate to the nucleus for DNA staining (Giepmans et al. 2006). Third, many types of fluorescent proteins with a wide range of emission spectra are available for FP-DBP development and multi-color staining (Jin et al. 2020).

FP-DBPs can also help to overcome the disadvantages of using organic dyes for DNA staining: (i) FP-DBPs have low cytotoxicity (Shemiakina et al. 2012; Shen et al. 2017), whereas organic dyes are cytotoxic and potentially mutagenic and therefore require careful handling (Saeidnia and Abdollahi 2013). (ii) FP-DBPs do not cause photocleavage of stained DNA molecules because their fluorophores are buried inside the β-barrels (Lee et al. 2016), whereas intercalating organic dyes often cause DNA photodamage by generating radical intermediates that create DNA breaks under laser illumination (Tycon et al. 2012). (iii) Most FP-DBPs stain DNA molecules without significantly changing the DNA structure (Lee et al. 2016), whereas intercalating organic dyes extend DNA by inserting between DNA bases (Seonghyun Lee et al. 2016). (iv) FP-DBPs can reversibly stain DNA molecules by changing the salt concentration or pH (Lee et al. 2016), whereas it is difficult to reversibly stain single DNA molecules with organic dyes.

In recent studies, FP-DBPs have been used to study DNA-related events both in vivo and in vitro. In this chapter, we discuss genomic and biochemical applications of single-molecule DNA visualization based on staining with YOYO-1 and FP-DBP. We begin with the optical mapping system, which provides large-scale sequence information.
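To give a feel for what the Kd of 15 μM quoted for KWKWKKA-eGFP-AKKWKWK implies for staining, the following is a minimal sketch that assumes a simple 1:1 binding model with the protein in large excess over binding sites. Both the model and the function name `fraction_bound` are illustrative assumptions, not taken from the cited work.

```python
def fraction_bound(protein_conc_um: float, kd_um: float = 15.0) -> float:
    """Fraction of DNA binding sites occupied at equilibrium.

    Assumes simple 1:1 binding and free protein concentration
    approximately equal to total protein (protein in excess).
    Concentrations are in micromolar; 15 uM is the Kd quoted in the text.
    """
    if protein_conc_um < 0 or kd_um <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return protein_conc_um / (protein_conc_um + kd_um)


# Occupancy at one tenth of Kd, at Kd, and at ten times Kd.
for conc in (1.5, 15.0, 150.0):
    print(f"{conc:6.1f} uM protein -> {fraction_bound(conc):.2f} of sites bound")
```

Under this model, half of the sites are occupied when the protein concentration equals Kd. A micromolar Kd is also consistent with staining that can be reversed by changing salt or pH, although the sketch does not model those effects.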


Optical Mapping Based on Single-Molecule DNA Visualization

Optical mapping (OM) is a high-throughput single-molecule system used for the construction of genome-wide DNA physical maps. The OM system incorporates multidisciplinary approaches to obtain long-range single-molecule genomic information and provides indispensable complementary information for DNA sequencing. In 1993, Schwartz and his colleagues invented an OM system based on sequence-specific digestion of large elongated DNA. Initially, elongated DNA was immobilized in an agarose gel matrix (Schwartz et al. 1993). Since then, several glass surface-derivatizing reagents have been tested to optimize the conditions for DNA immobilization, including polylysine (Meng et al. 1995), APTES (3-aminopropyltriethoxysilane) (Cai et al. 1995), and APDEMS (3-aminopropyldiethoxy-methylsilane) (Jing et al. 1999). Since 2002, trimethyl silane (N-trimethoxysilyl-propyl-N,N,N-trimethylammonium chloride, Gelest Corp.) has been the standard surface-derivatizing reagent for OM surfaces. Trimethyl silane contains a quaternary ammonium group, which carries a permanent positive charge, unlike the reversibly ionizable amino groups of APTES and APDEMS (Zhou et al. 2002).

In OM, very large genomic DNA molecules are presented as stretched chains on positively charged glass surfaces, where they are restriction-digested, stained, and imaged to generate ordered restriction maps (Dimalanta et al. 2004). Figure 2a illustrates the digestion of a single DNA molecule by SwaI (Dimalanta et al. 2004). Because SwaI recognizes ATTTAAAT, the gaps between fragments mark the positions of its cognate sequence. Since 2001, this system has been commercialized as the OpGen Argus system.

In 2007, Jo et al. introduced a new OM system for nanochannel-confined DNA, named "Nanocode" (Jo et al. 2007). The incorporation of fluorochrome-labeled nucleotides at cognate nick sites places fluorescent tags on genomic DNA, which can be imaged and analyzed once the DNA is presented and elongated in nanochannels. The original aim of this development was to integrate mapping and sequencing in the nanochannel. In other words, a DNA molecule is first loaded into a nanochannel, then mapped by imaging, and later sequenced using a nano-sequencing tool attached to the exit of the nanochannel. For this purpose, DNA should be confined in the nanochannel rather than immobilized on the surface. However, it was not simple to add a nano-sequencing tool at the exit of the nanochannel. Even today there is no technology that brings nanochannels and nano-sequencing tools together. Unlike in the first-generation OM system, restriction enzyme digestion could not be applied to nanochannel-confined DNA because the DNA fragments are not fixed on a surface and would lose their positions. As a solution, nick translation-based labeling was developed using a nicking enzyme (Nb.BbvCI: GCTGAGG) that was newly developed at that time. Figure 2b illustrates sequence-specifically labeled DNA confined in a nanochannel (Jo et al. 2007). This method became the standard method for the BioNano Genomics Saphyr system. However, the nick translation-based method has an issue in that the DNA can break if two polymerase reactions meet. Moreover, nicking enzyme reactions can make the DNA fragile.

Fig. 2 Optical mapping. (a) Rmap: restriction enzyme (SwaI: ATTTAAAT) digestion map (Dimalanta et al. 2004). (b) Nanocode: nicking enzyme (Nb.BbvCI: GCTGAGG) (Jo et al. 2007). (c) DNA fluorocode: methyltransferase (M.HhaI: GCGC). Reprinted from ref. (Neely et al. 2010) with permission from The Royal Society of Chemistry, copyright 2010. (d) A/T profile: A/T-specific DNA-binding pyrrole octamer. Reprinted from ref. (Lee et al. 2018a) with permission from Oxford University Press, copyright 2018. (e) Fiber-FISH: fluorescent in situ hybridization for Y chromosome PAR2 region. Reprinted from ref. (Skaletsky et al. 2003) with permission from Nature Publishing Group, copyright 2003. Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/)

In 2010, Neely et al. introduced a new system called DNA fluorocode, in which a methyltransferase adds a fluorochrome to DNA in a sequence-specific manner. Figure 2c shows an example of fluorocode-labeled DNA (Neely et al. 2010). Since M.HhaI recognizes the 4-mer sequence GCGC, which occurs frequently, the DNA image resembles an intensity profile rather than a barcode. Recently, BioNano Genomics adopted the DLE-1 (direct labeling enzyme) method, which is based on a methyltransferase (M.AflII: CTTAAG). Figure 2d shows an alternative approach, A/T-specific labeling of DNA molecules that yields an intensity profile, using fluorochrome-tagged pyrrole-imidazole polyamide (Lee et al. 2018a).

The OM systems are useful for whole-genome DNA analysis because they enable sequence-specific labeling of long chromosomal DNA molecules. OM data have been used for the analysis of genomic alterations (Ray et al. 2013), DNA fragment identification (Lee et al. 2018b), and gene analysis (Cheeseman et al. 2014). OM offers long-read data with single-molecule sensitivity and provides various types of information, such as epigenetic modifications, DNA replication sites, and DNA damage sites. Fluorescent labeling of genetic or epigenetic patterns with OM provides valuable information about the epigenetic modifications of genomic DNA (Matsuoka et al. 2012).
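The three labeling strategies discussed in this section (restriction digestion, nick-site labeling, and A/T intensity profiling) can each be predicted from a reference sequence for comparison with imaged molecules. The following is a minimal in silico sketch, not the software used by any of the cited systems. The function names, the toy sequence, the sliding-window size, and the assumption that SwaI cuts bluntly in the middle of its site (ATTT^AAAT) are illustrative choices.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, motif: str) -> List[int]:
    """0-based start positions of motif on the given strand (overlaps allowed)."""
    seq, motif = seq.upper(), motif.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def restriction_fragments(seq: str, motif: str = "ATTTAAAT",
                          cut_offset: int = 4) -> List[int]:
    """Ordered fragment lengths (bp) of a linear molecule cut at every site.

    cut_offset=4 assumes a blunt cut in the middle of the SwaI site.
    The motif is palindromic, so scanning one strand is sufficient.
    """
    cuts = [p + cut_offset for p in find_sites(seq, motif)]
    edges = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(edges, edges[1:])]


def nick_label_sites(seq: str, motif: str = "GCTGAGG") -> List[int]:
    """Start positions (top-strand coordinates) of the nicking motif on either strand."""
    top = find_sites(seq, motif)
    bottom = find_sites(seq, reverse_complement(motif))
    return sorted(set(top + bottom))


def at_profile(seq: str, window: int = 10) -> List[float]:
    """A/T fraction in each sliding window, a crude proxy for an A/T intensity profile."""
    seq = seq.upper()
    return [sum(base in "AT" for base in seq[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


# Hypothetical toy molecule with two SwaI sites and one nick motif on each strand.
toy = "GGCCATTTAAATCCGCTGAGGCCCCTCAGCGGATTTAAATCG"
print(restriction_fragments(toy))   # ordered restriction map (fragment sizes)
print(nick_label_sites(toy))        # predicted label positions
print(at_profile(toy, window=10))   # A/T fraction along the molecule
```

In practice, the predicted fragment sizes or label positions would be converted from base pairs to measured lengths and aligned to the imaged molecules; that alignment step is beyond the scope of this sketch.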


Another important method for visualizing genomic DNA with gene mapping is fluorescence in situ hybridization (FISH), which allows the detection of specific DNA sequences, gene mapping, identification of foreign chromatin in interspecific hybrids, and diagnosis of genetic diseases (Ratan et al. 2017; Weise et al. 2009). FISH probes carry fluorescent materials or reporter molecules and target specific sequences of DNA molecules. FISH is capable of whole-genome screening using multicolor whole-chromosome probes, as in multiplex FISH. Several probe labeling approaches enable the detection of more than two sequences in a cell using fluorochromes of different colors. The combination of digoxigenin, biotin, and fluorescein labeling enables the detection of multiple probes and sequence mapping.

The basic principle of FISH is the hybridization of probe DNA strands to nuclear DNA, either in interphase cells or in metaphase chromosomes attached to coverslips. The probes are labeled either directly, via the incorporation of fluorophores, or indirectly, with a hapten. The labeled FISH probes and target DNA molecules are mixed after denaturation so that the probes anneal to their complementary sequences in the genomic DNA. Indirectly labeled FISH probes require an extra immunological or enzymatic detection step to visualize the non-fluorescent hapten. The reporter molecules used in the indirect detection method are biotin, dinitrophenol, and digoxigenin. For directly labeled FISH probes, the most popular reporter molecules are FITC, Texas Red, rhodamine, AMCA, Cy2, Cy3, and Cy5.

For FISH, the choice of probe is the most important factor. A wide range of FISH probes is available, from small cloned probes to whole-genome probes. There are three types of probes: repetitive sequence, locus-specific, and whole-chromosome painting probes. Repetitive sequence probes hybridize with specific chromosomal regions that contain short repetitive sequences. For example, centromeric, satellite DNA, and pan-telomeric probes are repetitive sequence probes. Locus-specific probes are genomic clones whose sizes depend on the nature of the cloning vector, such as plasmids, PAC, YAC, and BAC vectors. Finally, whole-chromosome painting probes enable the imaging of individual chromosomes in cells at metaphase or interphase and the observation of chromosomal aberrations. These probes are generated from a chromosome, can be amplified by PCR, and label the entire chromosome homogeneously. With FISH probes, it is possible to observe chromosomal rearrangements in metaphase cells. In the early 1990s, the development of fiber-FISH improved the resolution of conventional metaphase FISH by extending chromatin fibers after the hybridization of probes to genomic DNA (Heng et al. 1992). Figure 2e shows an example of fiber-FISH with repeated sequences in the human Y chromosome (Skaletsky et al. 2003).
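The annealing step that underlies FISH can be illustrated with a short sketch that locates where a single-stranded probe would hybridize on a denatured target strand. It assumes perfect Watson-Crick complementarity only, whereas real hybridization tolerates some mismatches and depends on temperature and salt. The function names and the toy sequences are hypothetical.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def probe_binding_sites(target_strand: str, probe: str) -> List[int]:
    """0-based positions on target_strand where the probe can anneal.

    A probe anneals where the target strand contains the probe's reverse
    complement; only perfect matches are reported (an assumption).
    """
    target = target_strand.upper()
    site = reverse_complement(probe)
    positions = []
    start = target.find(site)
    while start != -1:
        positions.append(start)
        start = target.find(site, start + 1)
    return positions


# Toy target strand containing the same 7-nt site twice, as in a repetitive region.
target = "AAGGCTTACGTTTGGCTTAC"
probe = "GTAAGCC"  # reverse complement of GGCTTAC
print(probe_binding_sites(target, probe))
```

A repetitive sequence probe corresponds to the case in which many such positions cluster in one chromosomal region, whereas a locus-specific probe is expected to return a single site.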

DNA Visualization Using DNA-Binding Fluorescent Proteins Besides organic dyes, DNA-binding fluorescent proteins are a useful tool for visualizing single DNA molecules. Figure 3 illustrates fluorescent protein (FP)-DNAbinding peptide/protein (DBP) for the visualization of elongated large DNA


X. Jin and K. Jo

Fig. 3 Visualization of single DNA molecules with homogeneous staining and sequence-specific staining. (a) Scheme of DNA staining with FP-DBP (made in © BioRender – biorender.com). (b) λ DNA stained with KWKWKKA-eGFP-AKKWKWK, KWKWKKA-mCherry-AKKWKWK, and YOYO-1. Reprinted from ref. (Lee et al. 2016) with permission from Oxford University Press, copyright 2015 [Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/)]. (c) Comparison of photocleavage of YOYO-1-stained T4 DNA and FP-DBP-stained T4 DNA molecules in 40 nm nanoslits. Scale bar = 5 μm. Reprinted from ref. (Lee et al. 2016) with permission from Oxford University Press, copyright 2015 [Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/)]. (d) Reversible staining of λ DNA with FP-DBP in microfluidic devices. The flow of 1X TE buffer at pH 11 destained λ DNA, and FP-DBP in 1X TE buffer at pH 8 restained λ DNA. Reprinted from ref. (Lee et al. 2016) with permission from Oxford University Press, copyright 2015 [Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/)]. (e) Combination of H-NS-mCherry, an AT-specific staining FP-DBP, and BRCA1-eGFP, a non-specific staining FP-DBP, for λ DNA staining. AT-specifically stained λ DNA facilitates the orientation analysis of DNA and the analysis of AT distribution on DNA backbones. Reprinted from ref. (Park et al. 2019) with permission from the Royal Society of Chemistry, copyright 2019. (f) Tethered λ DNA stained with tTALE-eGFP in 1X TE buffer containing 60 mM NaCl after XbaI digestion. Reprinted from ref. (Shin et al. 2019) with permission from Springer Nature Limited, copyright 2019. [Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/)]

molecules (Lee et al. 2016). Figure 3b shows that FP-DBP-stained DNA is comparable to DNA stained with YOYO-1; here, single DNA molecules were immobilized on positively charged coverslips through microfluidic channels. FP-DBPs do not cause photocleavage or structural deformation of DNA. Figure 3c illustrates that DNA molecules stained with YOYO-1 were photocleaved in 400 nm nanoslits within 10 s of 488 nm laser illumination. YOYO-1, an intercalating dye, is known to unwind, deform, and photocleave DNA molecules under illumination with excitation light (Tycon et al. 2012; Murade et al. 2010). In contrast, DNA molecules stained with FP-DBP were not photocleaved because the fluorophores of fluorescent proteins are buried in β-barrels and do not directly interact with DNA molecules. Therefore, DNA molecules stained with FP-DBP did not break during 5 min of exposure to the excitation light. Instead, the visualized

47

Single-Molecule DNA Visualization


DNA images disappeared due to fluorescent protein bleaching. This bleaching problem can be overcome by the reversible staining of DNA molecules via pH shifting or salt concentration adjustment. Figure 3d shows the destaining of DNA molecules using 1X TE buffer at pH 11 and restaining with FP-DBP in 1X TE buffer at pH 8, using DNA molecules tethered on microfluidic surfaces. In particular, DNA staining with FP-DBP is compatible with microfluidic surface tethering because microfluidic approaches enable changing buffers and reaction environments as well as types of FP-DBPs. Figure 3e and f show AT-specific staining of DNA molecules with FP-DBPs, which were developed for the analysis of sequence information from AT-specific staining patterns (Park et al. 2019; Shin et al. 2019). As shown in Fig. 3e, H-NS-mCherry (histone-like nucleoid-structuring protein) binds to AT-rich regions, whereas BRCA1-eGFP (breast cancer gene 1) works as a co-staining reagent for the remaining DNA regions that are not stained by H-NS-mCherry (Park et al. 2019). BRCA1-eGFP can homogeneously stain DNA backbones. Consistent image patterns, namely three distinct red regions on a green λ DNA backbone, enable the alignment of λ DNA molecules. The sequence-specific staining patterns thus provide the orientation of DNA molecules and identify a fragment in the context of genomic maps. As shown in Fig. 3f, a truncated transcription activator-like effector fused to eGFP (tTALE-eGFP) also enabled AT-specific DNA staining (Shin et al. 2019). Instead of staining its target sequence, TGTCTGT, tTALE-eGFP specifically stained AT-rich regions in salt buffers containing 40–100 mM NaCl. These staining characteristics of tTALE-eGFP allow for the monitoring of DNA molecules during biochemical reactions. As shown in Fig. 3f, λ DNA was treated with a restriction enzyme (XbaI) and stained with tTALE-eGFP in 1X TE buffer containing 60 mM NaCl. 
This image shows the digestion sites on the DNA optical map. Sequence-specific staining with FP-DBP enables the observation of genome-specific staining patterns that reflect the AT distribution in genomic DNA molecules. Consequently, both the FP-DBPs used for two-color staining and tTALE-eGFP produce sequence-specific staining patterns for the optical identification of DNA molecules at the single-molecule level. Single DNA staining approaches using FP-DBP are usually combined with microfluidic surface tethering, as shown in Fig. 3c and f, because microfluidic devices allow the reaction solutions to be exchanged (Park et al. 2019; Shin et al. 2019). Microfluidic devices utilize liquid flow to fully stretch single DNA molecules tethered to a microfluidic surface. Fully elongated DNA molecules are important for the accurate analysis of DNA staining patterns (Fig. 3f) (Shin et al. 2019). Microfluidic devices facilitate and enhance single-molecule studies owing to their small-volume manipulation, high-throughput capabilities, and precise liquid handling. They are capable of precise manipulation of single molecules and single cells by handling nanoliter-scale volumes of liquid. Therefore, the control, reproducibility, and precision of microfluidic devices allow for biological detection at the single-molecule level. In Fig. 4, λ DNA molecules tethered on the microfluidic devices were stretched by the buffer flow (Lee and Jo


[Fig. 4 graphic. Panel (a) labels: aminosilane-PEG-biotin (+ aminosilane-mPEG), avidin, biotinylated and elongated DNA, flows, glass surface. Panel (b) labels: λ DNA (48.5 kbp), flow direction.]

Fig. 4 DNA tethering in microfluidic devices. (a) Biotinylated DNA tethered on avidin proteins that are linked to a PEG-biotin covalently coated on the glass surfaces. (b) Elongated DNA molecules stained by DNA-binding fluorescent proteins in microfluidic devices. λ DNA tethered on microfluidic surfaces followed by visualization with DNA-binding fluorescent proteins. Reprinted from ref. (Lee and Jo 2016) with permission from Journal of Visualized Experiments, copyright 2016

2016). As shown in Fig. 4a, the avidin-biotin interaction enables the tethering of biotinylated DNA molecules on avidin-coated microfluidic surfaces. Figure 4b shows λ DNA tethered on surfaces and visualized using DNA-binding fluorescent proteins. Moreover, the tethered DNA molecules were fully stretched by the buffer flow. Microfluidic channels with a low Reynolds number environment, minimal


convection, and parallel streamlines provide predictable flow profiles; therefore, these characteristics of the microfluidic devices facilitate sample handling at the single-molecule level. The sophisticated control systems improve the precision, throughput, and automated biomolecule handling at the single-molecule level in microfluidic devices.
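The low Reynolds number regime mentioned above can be made concrete with a rough estimate, Re = ρvD_h/μ, for a rectangular microchannel. The following is a minimal sketch only: the channel dimensions and flow speed are illustrative assumptions rather than values from the cited studies, and the buffer is assumed to have water-like density and viscosity at room temperature.

```python
def hydraulic_diameter(width_m, height_m):
    """Hydraulic diameter of a rectangular channel: 4 * area / wetted perimeter."""
    return 2.0 * width_m * height_m / (width_m + height_m)


def reynolds_number(density, velocity, length, viscosity):
    """Dimensionless ratio of inertial to viscous forces (SI units in, no units out)."""
    return density * velocity * length / viscosity


# Assumed, water-like buffer properties at room temperature.
RHO = 1000.0   # density, kg/m^3
MU = 1.0e-3    # dynamic viscosity, Pa*s

# Hypothetical channel geometry and mean flow speed (illustration only).
width = 100e-6    # 100 um
height = 10e-6    # 10 um
velocity = 1e-3   # 1 mm/s

d_h = hydraulic_diameter(width, height)
re = reynolds_number(RHO, velocity, d_h, MU)
print(f"Hydraulic diameter: {d_h * 1e6:.1f} um, Re = {re:.3g}")
```

Under these assumptions Re is on the order of 0.01, several orders of magnitude below the values (on the order of 10^3) at which pipe flow becomes turbulent, which is why the streamlines stay parallel and the flow profile is predictable.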

Damage Visualization on Single DNA Molecules

The detection of DNA damage is important for solving health-related issues. For example, UV exposure, reactive oxygen species, environmental pollution, and endogenous metabolites can cause DNA damage. To date, the observation of gel smearing patterns has been extensively used in the analysis of DNA damage. However, gel smearing patterns do not provide detailed information, such as a sensitive quantitative analysis of DNA damage sites. Moreover, DNA damage is a rare cellular event; therefore, it is difficult to detect with electrophoresis-based approaches. Single-molecule approaches overcome these disadvantages with high sensitivity and accuracy for the quantitative analysis of DNA damage. Nick translation has been used for the single-molecule analysis of DNA damage caused by UV (Lee et al. 2013), reactive oxygen species (ROS) (Jinyong Lee et al. 2016), and alcohol (Kang et al. 2016). Lee et al. introduced nick translation to visualize UV-induced DNA damage in the genomic DNA of bacteriophage λ (Lee et al. 2013). Later, it was also used for the visualization and quantification of ROS-induced and alcohol-induced DNA damage (Jinyong Lee et al. 2016; Kang et al. 2016). Figure 5a and b show a comparison of DNA damage in E. coli genomic DNA molecules from bacteria treated with alcoholic beverages with different concentrations of alcohol (Kang et al. 2016). In Fig. 5a, the scheme of the experimental design shows that E. coli-embedded agarose plugs were treated with alcoholic beverages for 30 min, followed by washing steps (Kang et al. 2016). Thereafter, the E. coli in the plugs were lysed with proteinase K, followed by the removal of oxidized base adducts using a mixture of Fpg, Nfo, and Nei. Finally, DNA polymerase I synthesized DNA using a dNTP mixture and AlexaFluor-647-labeled dUTP, thereby visualizing DNA damage. As shown in Fig. 5b, elongated E. coli genomic DNA molecules with DNA lesions labeled with AlexaFluor-647 had different numbers of DNA damage sites for different concentrations of alcohol (Kang et al. 2016). Figure 5c shows a scheme for labeling damaged DNA molecules in two steps (Singh et al. 2021). In the first step, bacterial repair enzyme cocktails were used for the excision of DNA damage to create gaps. The second step was the nick translation of the DNA damage sites with the fluorescently labeled nucleotide aminoallyl-dUTP-ATTO-647N, dNTP mixtures, and DNA polymerase I, thereby labeling the DNA damage sites with fluorescent spots. Figure 5d shows a comparison of the damage labeling of DNA molecules. The nick translation method has the advantage


Fig. 5 Labeling of DNA damage. (a) Scheme of DNA damage labeling. Cell-embedded gel plugs were treated with alcoholic beverages, and the damaged genomic DNA molecules were labeled by nick translation. Reprinted from ref. (Kang et al. 2016) with permission from The Royal Society of Chemistry, copyright 2016. (b) Visualization of DNA damage. Red spots indicated by arrows represent DNA damage sites labeled with AlexaFluor-647. Green: E. coli genomic DNA stained with YOYO-1. Scale bar: 20 μm. Reprinted from ref. (Kang et al. 2016) with permission from The Royal Society of Chemistry, copyright 2016. (c) Schematic of PBMC (peripheral blood mononuclear cell) isolation and labeling of DNA damage induced by BLM (bleomycin). The DNA damage labeling consisted of two steps: first, the removal of the DNA damage; second, the labeling of the resulting gaps at the damage sites using DNA polymerase I, dNTPs, and aminoallyl-dUTP-ATTO-647N. Reprinted from ref. (Singh et al. 2021) with permission from Elsevier B.V, copyright 2021 [Attribution 4.0 International (https://creativecommons.org/licenses/by/4.0/)]. (d) Images of DNA damage labeling. i) Untreated PBMC DNA molecules. ii) Damage-labeled DNA molecules from PBMCs treated with BLM:Fe(II) (3 μM). Blue: DNA stained with YOYO-1. Magenta: DNA damage sites labeled with aminoallyl-dUTP-ATTO-647N. Scale bar = 10 μm. Reprinted from ref. (Singh et al. 2021) with permission from Elsevier B.V, copyright 2021 [Attribution 4.0 International (https://creativecommons.org/licenses/by/4.0/)]

of the sensitive quantification of DNA damage. Therefore, nick translation is an efficient method for the detection of DNA damage. Figure 6 illustrates DNA damage labeling in human cells (Zirkin et al. 2014). Figure 6b shows an example of DNA damage induced by UV. Clearly, DNA damage sites are non-homogeneously distributed on genomic DNA backbones, and adequate coverage of genomic DNA is required for a reliable representation of global genomic damage. Figure 6c and d show the OM of human genomic DNA with competitive binding (OM by YOYO-1 and netropsin) and DNA damage labeling (Muller et al. 2019). This approach allows for the detection of different types of hazardous environments or damaging agents that cause DNA damage. Moreover, performing the damage assay in nanochannels facilitated the homogeneous stretching of DNA molecules and the quantitative analysis of DNA damage sites. Finally, the combination of competitive binding mapping and DNA damage labeling with nick translation increases the capacity to analyze DNA damage by enhancing the information from DNA maps, especially for the analysis of sparsely damaged DNA samples.
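The need for adequate coverage when damage sites are sparse can be illustrated with a simple counting estimate. The sketch below pools labeled spots over all imaged molecules and reports a damage density with a counting uncertainty. It assumes that spot counts can be treated as Poisson-distributed, which is a simplification, and the per-molecule data and the function name `damage_density` are hypothetical and not taken from the cited studies.

```python
import math


def damage_density(molecules, per_bp=1_000_000):
    """Estimate damage sites per `per_bp` base pairs from pooled single-molecule data.

    molecules: iterable of (length_bp, n_spots) tuples, one per imaged molecule.
    Returns (density, standard_error), with the error taken as sqrt(N) for
    N pooled spots (Poisson counting assumption).
    """
    molecules = list(molecules)
    total_bp = sum(length for length, _ in molecules)
    total_spots = sum(spots for _, spots in molecules)
    if total_bp <= 0:
        raise ValueError("total imaged length must be positive")
    scale = per_bp / total_bp
    return total_spots * scale, math.sqrt(total_spots) * scale


# Hypothetical data set: (molecule length in bp, number of fluorescent spots).
molecules = [(150_000, 2), (200_000, 0), (100_000, 1), (50_000, 1)]

density, stderr = damage_density(molecules)
print(f"{density:.1f} +/- {stderr:.1f} damage sites per Mbp")
```

With only four spots over 0.5 Mbp of imaged DNA, the relative uncertainty in this toy data set is 50%, which shows why long total imaged lengths are needed before sparse damage levels can be compared between samples.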


Fig. 6 Visualization of DNA damage using human cells. (a) Schematic of DNA damage labeling. i) DNA damage induced by various damage agents. ii) DNA damage excision by bacterial repair enzyme cocktails. iii) Damage site labeling. iv) Damage analysis. Reprinted from ref. (Zirkin et al. 2014) with permission from American Chemical Society, copyright 2014. (b) Labeling of DNA damage induced by UV. Reprinted from ref. (Zirkin et al. 2014) with permission from American Chemical Society, copyright 2014. (c) In silico evaluation of the parts of the human genome. Gray: human genome regions to explicitly map with 333 kbp DNA molecules. Dark gray: non-mappable with 333 kbp DNA molecules. Reprinted from ref. (Muller et al. 2019) with permission from Oxford University Press, copyright 2019 [Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/)]. (d) Visualization of DNA damage of PBMC DNA molecules using the chemotherapeutic agent, etoposide. Experimental barcodes (blue) were compared with theoretical barcodes (black). Green: DNA damage sites labeled by ATTO-647, Blue: YOYO-1-stained DNA. Reprinted from ref. (Muller et al. 2019) with permission from Oxford University Press, copyright 2019 [Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/)]

DNA Modification Labeling

Single-molecule DNA visualization is a useful platform for the detection of epigenetic modifications. DNA methylation and hydroxymethylation are examples of epigenetic modification. Such modified regions can block the binding of proteins to target DNA sequences, thereby turning genes off. Figure 7a shows the detection of 5-hydroxymethylcytosine in single DNA molecules from healthy colon and colon cancer samples (Gilat et al. 2017). 5-Hydroxymethylcytosine is formed by the oxidation of 5-methylcytosine. This epigenetic marker is a potential biomarker for several types of cancers. Therefore, the detection of 5-hydroxymethylcytosine is important for disease diagnosis. Figure 7a shows the two-step process for labeling 5-hydroxymethylcytosine. The first step is the glucosylation of 5-hydroxymethylcytosine with UDP-6-N3-Glu using the T4 β-GT enzyme. The second step is a click reaction between the fluorescently


Fig. 7 Single-molecule DNA visualization of epigenetic modifications. (a) Detection of 5hmC on single DNA molecules. (Upper panel) Schematic illustration of the two-step visualization of 5hmC. First, the T4 β-GT enzyme catalyzes glucosylation of 5hmC using UDP-6-N3-Glu. Second, the click reaction between fluorescently labeled DBCO-Cy5 and the N3 groups on the DNA molecules enables the visualization of 5hmC. (Lower panel) DNA molecules extracted from healthy colon and colon cancer patients are the sample DNA for 5hmC labeling and comparison of the 5hmC levels in each sample. Red, DNA molecules; green/yellow, 5hmC labeling. Reprinted from ref. (Gilat et al. 2017) with permission from the authors, copyright 2017. [Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/)]. (b) Detection of methylated sites on genomic DNA. (Upper panel) Labeling scheme of non-methylated CpGs on single DNA molecules. M.TaqI catalyzes the transfer of a fluorophore to the adenine in the M.TaqI recognition sequence. If there are methylated CpGs, the reaction does not occur. (Lower panel) Dual labeling of genomic DNA molecules in a nanochannel array chip. The genomic DNA of human lymphocyte cells is shown in blue; the genetic labels generated using Nt.BspQI are shown in red; the methylation-sensitive labels are shown in green. Reprinted from ref. (Assaf Grunwald et al. 2017) with permission from the authors, copyright 2017. [Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)]. (c) Visualization of epigenetic modifications of histones in linearized chromatin extracted from HeLa cells. (Upper panel) Scheme of labeling epigenetic modifications, such as H4Ac and H3Me (made in © BioRender – biorender.com). (Lower panel) Methylated histone visualized with anti-H3K9me3 antibody (green) and acetylated histone visualized with anti-H4Ac antibody linked with CF™594 (red). Reprinted from ref. (Matsuoka et al. 2012) with permission from the American Chemical Society, copyright 2012

labeled alkyne DBCO-Cy5 and the N3 group for labeling 5-hydroxymethylcytosine. In Fig. 7a, a comparison of single DNA molecules from healthy colon and colon cancer samples shows a significant difference. The high sensitivity of single-molecule methods reveals that single DNA molecules from colon cancer cells have extremely low levels of 5-hydroxymethylcytosine. Figure 7b demonstrates DNA methylation imaging (Assaf Grunwald et al. 2017). Approximately 80% of the CpG dinucleotides are methylated. On the other hand, unmodified CpGs can be labeled with alkyne or amine groups by methyltransferase-directed transfer of activated groups. After labeling with fluorophores, fluorescence microscopy allows the visualization of unmethylated CpG sites, and the visualization of unmodified CpG sites can be used to observe large structural aberrations. Super-resolution mapping enables the localization of epigenetic modifications; therefore, it can resolve the methylation of CpGs, as shown in Fig. 7b. It is difficult to directly and covalently label 5mC; however, direct covalent labeling of 5mC can be bypassed by changing 5mC to 5hmC. Selective staining of 5hmC and 5mC can be


achieved by blocking 5hmC using β-GT-mediated glucose transfer. The DNA methyltransferase M.TaqI methylates TCGA sites, as shown in Fig. 7b. With synthetic cofactor analogs, the enzyme incorporates fluorophores instead of methyl groups. M.TaqI recognition sites contain CpG dinucleotides, and methylation of the CpG within a TCGA site blocks the M.TaqI reaction. Methylation labeling combined with OM is a good platform for the sensitive detection of methylated sites. First, the genomic DNA of human lymphocytes was treated with Nt.BspQI, a nicking enzyme, to create nicks along the DNA molecules. Second, non-methylated CpGs were labeled with green fluorophores using M.TaqI. For dual-color OM, single DNA molecules are loaded into the nanochannels, as shown in Fig. 7b. Using the M.TaqI labeling approach, it was possible to quantitatively measure methylation levels in various samples. Moreover, epigenetic modifications occur not only on DNA backbones but also on histones in chromatin. Figure 7c shows histone methylation (H3Me) and histone acetylation (H4Ac) in chromatin extracted from HeLa cells, visualized using monoclonal antibodies tagged with fluorophores (Matsuoka et al. 2012).
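The methylation sensitivity of M.TaqI labeling described above can be illustrated in silico. The following minimal sketch locates TCGA sites in a sequence and reports which of them would be expected to carry a fluorophore, given a set of methylated CpG positions. Only the recognition sequence (TCGA) and the blocking effect of CpG methylation come from the text; the example sequence, the methylated positions, and the function names are hypothetical, and representing a methylated CpG by the position of its top-strand cytosine is a simplification.

```python
def find_sites(sequence, motif="TCGA"):
    """Return 0-based start positions of every occurrence of `motif`.

    TCGA is palindromic (it equals its own reverse complement), so scanning
    one strand is sufficient to find all sites on the duplex.
    """
    seq = sequence.upper()
    motif = motif.upper()
    sites = []
    start = seq.find(motif)
    while start != -1:
        sites.append(start)
        start = seq.find(motif, start + 1)
    return sites


def labeled_sites(sequence, methylated_c_positions):
    """Return TCGA sites expected to be labeled, i.e. whose CpG is unmethylated.

    methylated_c_positions: 0-based positions of methylated cytosines on the
    scanned strand. The cytosine of the CpG inside a site sits at start + 1.
    """
    methylated = set(methylated_c_positions)
    return [s for s in find_sites(sequence) if (s + 1) not in methylated]


# Hypothetical example sequence with three TCGA sites.
seq = "AATCGATTGCATCGAGGTCGACC"
all_sites = find_sites(seq)
unmethylated = labeled_sites(seq, methylated_c_positions={12})

print("TCGA sites:", all_sites)
print("Expected labeled sites:", unmethylated)
```

In this toy example, the site whose CpG cytosine is marked as methylated drops out of the expected label pattern, which is the principle that turns a fluorescent barcode into a readout of methylation.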

DNA Recombination

The observation of single DNA molecules enables the study of DNA recombination (Galletto et al. 2006; Forget and Kowalczykowski 2012; Xue et al. 2021; Pokhrel et al. 2017). Typical methods include the use of optical tweezers and DNA curtains. These approaches have been used to study DNA recombination via RecA (Galletto et al. 2006; Forget and Kowalczykowski 2012) and RAD51 (Xue et al. 2021; Pokhrel et al. 2017), as shown in Fig. 8. RecA plays an important role in the repair of DNA double-strand breaks by homologous recombination. RecA interacts with DNA molecules to form RecA nucleoprotein filaments. Therefore, the direct observation of RecA nucleoprotein filament assembly on individual dsDNA molecules with fluorescently modified RecA is useful for understanding recombinational DNA repair. In recombinational DNA repair, RecA nucleoprotein filaments typically form on ssDNA, for which RecA has a higher preference than for dsDNA; however, RecA also forms filaments on dsDNA that are active in inverse DNA strand exchange with complementary ssDNA. In Fig. 8a, panels 1 and 2 show λ DNA molecules stained with YOYO-1 and after destaining, respectively (Galletto et al. 2006). The destained DNA then interacts with fluorescently modified RecA in the presence of ATP-γS. After RecA filament formation, the RecA nucleoprotein filament was visualized using the bound fluorescent RecA. Fig. 8a, panel 3, shows the visualized RecA nucleoprotein filaments. The formation of RecA nucleoprotein filaments extends the length of the λ DNA molecules. However, the length of the filament did not change with a continuous flow of NaCl or ATP for 20 min; therefore, there was only negligible dissociation of the RecA nucleoprotein filament, the ATP-γS-RecA-dsDNA complex, as shown in Fig. 8a, panel 4. Short-term nonhomologous interactions between RecA and single-stranded DNA regions are shown in Fig. 8b (Forget and Kowalczykowski 2012). These interactions were


Fig. 8 Detection of DNA recombination. (a) Observation of RecA filament. Panel 1: Bead-DNA complex stained with YOYO-1. Panel 2: Dissociation of YOYO-1 from the stained bead-DNA complexes. Panel 3: Fluorescent filament after 5 min of treatment with RecA solution. Panel 4: Observation of RecA filament with a continuous flow of buffers for 20 min. Reprinted from ref. (Galletto et al. 2006) with permission from Nature Publishing Group, copyright 2006. (b) Kymograph of a DNA dumbbell undergoing bead separation. The distance scale is at the top of the kymograph; green represents the bead positions; red represents the nucleoprotein filaments. The schematic illustration shows the dissociation of the heterologously bound filaments. Reprinted from ref. (Forget and Kowalczykowski 2012) with permission from Macmillan Publishers Limited, copyright 2012. (c) Observation of RPA dynamics with mCherry-tagged RPA (RPAf). i) Schematic illustration of RPAf-coated ssDNA with double-tethered ssDNA curtains. ii) Observation of RPAf-bound ssDNA shown in magenta. iii) Schematic illustration of the experimental design for the observation of RPA dynamics. RPAf on the coated ssDNA was replaced with Rad51 to form a presynaptic filament that prevents rebinding of RPAf on ssDNA. Then, RPAf solution without ATP causes Rad51 dissociation and RPAf rebinding. iv) Kymograph of an RPAf-bound ssDNA molecule over time. First, ssDNA already coated with RPAf is subjected to a continuous flow of 100 pM RPAf at a rate of 0.2 ml/min. Introduction of 2 μM Rad51 results in Rad51 coating the ssDNA by replacing RPAf. 100 pM RPAf buffer containing 2 mM ATP cannot replace Rad51 with RPAf. Finally, switching the buffer to 100 pM RPAf without ATP causes disassembly of Rad51 and rebinding of RPAf. Reprinted from ref. (Pokhrel et al. 2017) with permission from Oxford University Press, copyright 2017. (d) RECQ5 displacing RAD51 from ssDNA. Left panel: Schematic illustration of the RECQ5-mediated replacement of RAD51 on ssDNA by RPA-GFP. Right panel: Kymograph of the dissociation of the RAD51 filament by RECQ5 followed by binding of RPA-GFP to ssDNA. Reprinted from ref. (Xue et al. 2021) with permission from Oxford University Press, copyright 2020. [Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/)]

relatively unstable compared with homologously paired complexes during λ DNA extension or bead separation. These nonhomologous interactions lasted for less than a few tens of seconds. During the recombinational repair of dsDNA breaks, RAD51 recombinase interacts with ssDNA to form a nucleoprotein filament, which is capable of homology recognition as well as catalyzing strand exchange. The broken dsDNA ends form ssDNA overhangs that are quickly coated with replication protein A (RPA). RAD51 then


competes with RPA to form nucleoprotein filaments around the ssDNA overhangs. The filaments facilitate homology recognition in homologous duplexes and catalyze strand exchange to form joint molecule intermediates. Following the disassembly of RAD51 from DNA heteroduplexes, the invading strand primes DNA synthesis to recover the genetic information. Figure 8c shows RPA dynamics in ssDNA curtains (Pokhrel et al. 2017). The ssDNA curtain was used to visualize RPAf-ssDNA complexes, thereby monitoring the assembly of Rad51 presynaptic complexes. ssDNA substrates of ~50 knt were tethered on the lipid bilayer via a biotin-streptavidin interaction, with the bilayer preventing non-specific protein binding on the DNA curtain surface. Then, 100 pM RPAf was loaded into the ssDNA curtain system to form ssDNA-RPAf complexes for observation by TIRF microscopy. Injection of 2 μM Rad51 buffer containing 2 mM ATP resulted in the rapid disappearance of the fluorescence signal as Rad51 replaced RPAf. These Rad51 filaments were stable on ssDNA molecules after unbound Rad51 was removed by loading 100 pM RPAf buffer containing 2 mM ATP for 30 min. However, after switching to 100 pM RPAf buffer without ATP, the RPAf fluorescence signal returned to the ssDNA through spontaneous dissociation of Rad51 and rebinding of RPAf. In the left panel of Fig. 8d, RECQ5 and RPA-GFP were added to RAD51-ssDNA curtains (Xue et al. 2021). The right panel of Fig. 8d shows the dissociation of RAD51 by RECQ5, leading to the association of RPA-GFP in the 3′ to 5′ direction, similar to the translocation direction of RECQ5. At 100 and 400 nM RECQ5, RPA-GFP appeared at multiple locations on the ssDNA molecules. In contrast, at 10 nM or 25 nM RECQ5, RPA-GFP appeared at single sites and spread in the 3′ to 5′ direction on the ssDNA molecules.

Optical tweezers and DNA curtains are usually used to observe the interaction between FP-DBP and single DNA molecules, as shown in Fig. 8 (Galletto et al. 2006; Forget and Kowalczykowski 2012; Xue et al. 2021; Pokhrel et al. 2017). Optical tweezers allow the mechanical manipulation of single DNA molecules; therefore, optical tweezers are used in the study of DNA-protein interactions, as shown in Fig. 8a, b (Galletto et al. 2006; Forget and Kowalczykowski 2012). DNA is a semiflexible polymer with an intrinsic stiffness that resists sharp bending. The elasticity of DNA influences its dynamics and various cellular processes, such as DNA looping, organization, twisting, and bending by proteins. In an optical tweezer, two optical traps are used for force measurement or tracking. The two optical traps can suspend a single DNA molecule, with a functionalized polystyrene bead at each end, in multichannel flow cells. Tension can be applied to single DNA molecules by controlling the distance between the two beads. Optical tweezers with force-sensing instruments can be used to accurately measure the applied forces. Optical tweezers also allow for the study of force-induced melting. However, optical tweezers have certain drawbacks. Because the trap stiffness depends on the gradient of the optical field, optical perturbations affecting the intensity distribution have adverse effects on optical tweezers. Therefore, highly purified samples and optically homogeneous preparations are required for high-resolution optical trapping. Optical interference leads to the generation of false position signals and ghost traps. Optical tweezers lack selectivity and exclusivity. Any dielectric particle near the focal point of the laser beam is trapped; therefore, it is


possible to trap a number of particles simultaneously. Therefore, the samples used in optical tweezers should have low concentrations of particles to prevent the trapping of unwanted particles. Moreover, for complex samples with many impurities, such as cell extracts, the trapping of impurities distorts the position signal. The high intensity at the laser focal point leads to local heating, which affects enzymatic activity and alters the viscosity of the local environment. A recent trend in optical trapping is the use of ultrashort femtosecond lasers instead of continuous-wave lasers. Ultrashort femtosecond lasers deliver short light pulses with a few nanojoules of energy; therefore, these light sources can reduce damage to biomolecules. Optical fibers whose tips are shaped by chemical etching, polishing, or photopolymerization produce a strong intensity gradient around the tips, which has led to increasingly complex configurations for fiber-based optical trapping. Nowadays, the development of nano-optical tweezers allows for the manipulation of nanoscale objects. Plasmonic optical tweezers and photonic crystal optical tweezers are available for force spectroscopy measurements, and fixed-position traps are available for single-molecule studies.

DNA curtains are also typically used for the observation of DNA-protein interactions, as shown in Fig. 8c and d (Xue et al. 2021; Pokhrel et al. 2017). DNA curtains are a combination of fluid lipid bilayers, hydrodynamic force, and nanofabricated surface patterns that align single DNA molecules and stretch them with buffer flows. One of the important components of DNA curtains is the lipid bilayer coated on the supporting surface of flow cells, which mimics phospholipid membranes. One end of each single DNA molecule is anchored on the lipid bilayer and pushed by buffer flow toward the nanofabricated chromium barriers, which block lipid diffusion; therefore, the single DNA molecules align along the barriers. There are several different types of DNA curtains, such as single-tethered curtains, double-tethered curtains, parallel arrays of double-tethered isolated patterns (PARDI), crisscrossed DNA curtains, and ssDNA curtains. In single-tethered curtains, only one end of a single DNA molecule is anchored to the lipid bilayer; therefore, continuous buffer flow is needed to stretch the single DNA molecules while TIRFM is used to visualize the aligned molecules. Transient pauses of the buffer flow allow the verification of single DNA molecules and of the interaction of proteins with DNA. The barriers can be linear or zigzag-shaped. Linear barriers do not prevent the lateral distribution of single DNA molecules, resulting in the overlapping of DNA molecules. Zigzag-shaped barriers separate individual DNA molecules within the DNA curtains. In double-tethered curtains, it is possible to observe single DNA molecules without buffer flow, since both the 3′ and 5′ ends of the DNA molecules are anchored to zigzag or linear barriers, which keeps the molecules stretched. Therefore, double-tethered curtains allow the observation of DNA-protein interactions without perturbation by the hydrodynamic force. The first ends of the single DNA molecules are tethered by biotin-streptavidin linkages, whereas the second ends are anchored by the interaction between digoxigenin and anti-digoxigenin. In PARDI, the aligned single DNA molecules have sufficient distances between adjacent DNA molecules to prevent potential interference of DNA-protein interactions by nearest neighboring

47

Single-Molecule DNA Visualization

single DNA molecules; therefore, PARDI curtains are particularly useful for studying protein binding kinetics at low concentrations of DNA molecules. In contrast, crisscrossed curtains allow the observation of locally high concentrations of single DNA molecules and of proteins interacting with them. ssDNA curtains are a good choice for studying DNA replication and repair because they allow the visualization of the long ssDNA molecules involved in these processes. Because ssDNA has a shorter persistence length than dsDNA and forms secondary structures, it is more difficult to stretch. In ssDNA curtains, ssDNA can be tethered via biotin-streptavidin interactions, and DNA-binding fluorescent proteins can be used to visualize the ssDNA molecules. DNA curtains are used to observe DNA-protein interactions. Biological processes related to DNA typically require DNA-binding proteins, which have specific target sequences; therefore, DNA curtains allow studies of target search mechanisms (such as protein sliding along DNA molecules, protein hopping on DNA molecules, intersegmental transfer of proteins, and three-dimensional diffusion), protein binding to target sequences, and protein-protein colocalization on DNA molecules (Lee 2021).
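
Of the target search mechanisms listed above, sliding is often described as one-dimensional diffusion along the DNA, for which the mean squared displacement grows as MSD(τ) = 2Dτ. The following minimal Python sketch illustrates how a diffusion coefficient could be estimated from a tracked protein position. It is not the analysis used in the cited studies: the function names, the diffusion coefficient (0.05 μm²/s), and the frame interval (0.1 s) are hypothetical, and the trajectory is simulated rather than measured.

```python
import random


def simulate_sliding(diffusion_um2_s, dt_s, n_steps, seed=0):
    """Simulate 1D Brownian positions (um) of a protein sliding along DNA.

    All parameter values passed to this function are illustrative assumptions.
    """
    rng = random.Random(seed)
    step_sd = (2.0 * diffusion_um2_s * dt_s) ** 0.5  # 1D diffusion: <dx^2> = 2 D dt
    positions = [0.0]
    for _ in range(n_steps):
        positions.append(positions[-1] + rng.gauss(0.0, step_sd))
    return positions


def mean_squared_displacement(positions, lag):
    """Time-averaged MSD (um^2) at a lag given in frames."""
    diffs = [positions[i + lag] - positions[i] for i in range(len(positions) - lag)]
    return sum(d * d for d in diffs) / len(diffs)


def estimate_diffusion(positions, dt_s, lag=1):
    """Estimate D from MSD(lag) = 2 D (lag * dt) for unconstrained 1D diffusion."""
    return mean_squared_displacement(positions, lag) / (2.0 * lag * dt_s)


trajectory = simulate_sliding(diffusion_um2_s=0.05, dt_s=0.1, n_steps=20000)
print(f"estimated D = {estimate_diffusion(trajectory, dt_s=0.1):.3f} um^2/s")
```

The sketch ignores localization error and motion blur, both of which bias the MSD at short lags in real kymograph data.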

In Vitro Observation of DNA Replication

Single-molecule approaches enable direct visualization of eukaryotic DNA replication. Cell-free extracts from Xenopus laevis have been used to study DNA replication in higher eukaryotes. Figure 9a shows DNA replication in Xenopus laevis egg extract (Yardimci et al. 2010). As shown in Fig. 9a, dig-dUTP was used to visualize the replicated DNA regions. The tethered DNA molecules contain replication bubbles, and DNA replication is bidirectional, with the origin sites in the middle. Replication protein A (RPA), a major eukaryotic protein that binds to single-stranded DNA, is an essential factor in the DNA replication of Simian Virus 40. RPA not only binds to ssDNA but also interacts with various proteins, such as DNA damage recognition proteins, DNA polymerases, transcription activators, and recombination factors. Bloom helicase (BLM) is a robust helicase that rapidly unwinds dsDNA (70–80 bp/s) during DNA replication and repair. Once BLM unwinds the dsDNA, RPA binds to the resulting ssDNA. As shown in Fig. 9b, RPA-mCherry labels the ssDNA produced by BLM-mediated dsDNA unwinding in DNA curtains (Xue et al. 2019). Figure 9b shows the formation of extended RPA-mCherry fluorescence signals on ssDNA after incubation of RPA-mCherry and unlabeled BLM with dsDNA curtains, consistent with RPA binding to the ssDNA exposed by BLM. RPA-mCherry appeared at random locations on the dsDNA and spread from these sites without a preferred direction. In Fig. 9c, to observe the interaction between the origin recognition complex (ORC) and DNA molecules, λ DNA containing autonomously replicating sequences (ARS) was tethered to the coverslips through the interaction between biotin and streptavidin (Xue et al. 2017). Following the incubation of ORC-Qdot705 with the DNA molecules, ORC-Qdot705 not only labeled the ARS sites but was also

X. Jin and K. Jo

Fig. 9 Observation of DNA replication at a single molecular level. (a) Visualization of the replication bubbles. (i) Single-tethered λ DNA with a replication bubble. (ii) Doubly tethered λ DNA with a replication bubble. The replication bubble was detected using dig-dUTP. Reprinted from ref. (Yardimci et al. 2010) with permission from Elsevier Inc., copyright 2012. (b) Observation of DNA unwinding by BLM. Left panel: Schematic illustration of DNA curtain assay. Right panel: Schematic illustration and a kymograph of detection of dsDNA unwinding by BLM followed by RPA-mCherry binding to the ssDNA for staining the ssDNA regions (magenta). Reprinted from ref. (Xue et al. 2019) with permission from Oxford University Press, copyright 2019. [Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/)]. (c) ORC binding to DNA molecules. After the design of the λ DNA containing ARS1 and ARS609 sites, ORC-Qdot705 binds to λ-ARS1/λ-ARS1-ARS609. Thereafter, the DNA stained with Sytox Orange and ORC-Qdot705 was observed. Red arrows represent ARS1 positions, and black arrows represent ARS609 positions. Reprinted from ref. (Xue et al. 2017) with permission from the authors, copyright 2017. [Attribution 4.0 International (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)]

bound to other AT-rich sequences. ARS sites also contain AT-rich sequences. ORC-Qdot705 binds preferentially to ARS sites and to the free ends of DNA molecules, compared with other AT-rich sequences. In the observation of protein-DNA interactions, such as DNA replication, as shown in Fig. 9b and c (Xue et al. 2019; Xue et al. 2017), total internal reflection fluorescence microscopy (TIRF) is usually used for the sensitive observation of single fluorophores from fluorescent proteins linked to DNA-binding proteins. TIRF enables the observation of cargo transport by single molecular motors and of real-time DNA replication. Typical microscopes have a resolution of approximately 0.2 μm and therefore cannot distinguish individual protein complexes interacting with single DNA molecules. One of the major disadvantages of standard fluorescence microscopy is the background noise resulting from out-of-focus fluorescence. TIRF effectively removes the out-of-focus background fluorescence because the light excites only the fluorescent dyes near the coverslip, using an evanescent field instead of direct illumination. Therefore, TIRF enables the observation of single fluorophores in solution with an enhanced signal-to-noise ratio. In addition, this selective excitation reduces the photobleaching of fluorophores. Therefore, TIRF is useful in the observation of DNA-protein

interactions. In a typical TIRF setup, light passes through two adjacent materials with different refractive indices. Typically, the light travels through a coverslip and then reaches an aqueous DNA solution. At the interface between the two media, the light can be partially or totally reflected, depending on the incident angle. At the critical angle, the refracted light travels parallel to the interface between the two media. Beyond the critical angle, total internal reflection occurs. Although the light is no longer transmitted through the second medium during total internal reflection, the reflected light generates an electromagnetic field, the evanescent field, which penetrates the second medium. The evanescent field decays exponentially with the penetration depth. Usually, the evanescent field is approximately 100 nm thick and excites only the fluorophores near the boundary of the two media, close to the coverslip. Therefore, TIRF does not excite the fluorophores away from the surface of the coverslip, thereby reducing out-of-focus fluorescence. The recent development of commercially available TIRF microscopes and new objectives for TIRF microscopy has greatly increased the use of TIRF microscopy in the observation of single DNA molecules.
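
The critical angle and the thickness of the evanescent field described above follow from standard optics: the critical angle is θc = arcsin(n2/n1), and the 1/e decay depth of the evanescent intensity is d = λ / (4π √(n1² sin²θ − n2²)). The short Python sketch below evaluates both. The refractive indices (1.518 for the coverslip glass, 1.33 for the aqueous solution), the 488 nm excitation wavelength, and the incidence angles are illustrative assumptions, not values taken from this chapter.

```python
import math


def critical_angle_deg(n1, n2):
    """Critical angle (degrees) for light travelling from medium n1 into medium n2."""
    if n1 <= n2:
        raise ValueError("total internal reflection requires n1 > n2")
    return math.degrees(math.asin(n2 / n1))


def penetration_depth_nm(wavelength_nm, n1, n2, incidence_deg):
    """1/e decay depth (nm) of the evanescent intensity beyond the critical angle."""
    term = (n1 * math.sin(math.radians(incidence_deg))) ** 2 - n2 ** 2
    if term <= 0:
        raise ValueError("incidence angle must exceed the critical angle")
    return wavelength_nm / (4.0 * math.pi * math.sqrt(term))


def relative_intensity(z_nm, depth_nm):
    """Evanescent intensity at distance z from the interface, relative to z = 0."""
    return math.exp(-z_nm / depth_nm)


# Assumed values: glass coverslip, aqueous buffer, 488 nm laser.
N_GLASS, N_WATER, WAVELENGTH_NM = 1.518, 1.33, 488.0

theta_c = critical_angle_deg(N_GLASS, N_WATER)
print(f"critical angle: {theta_c:.1f} deg")
for angle in (63.0, 65.0, 70.0):
    d = penetration_depth_nm(WAVELENGTH_NM, N_GLASS, N_WATER, angle)
    print(f"incidence {angle:.0f} deg: depth {d:.0f} nm, "
          f"intensity at 500 nm = {relative_intensity(500.0, d):.4f}")
```

Under these assumptions the decay depth at a few degrees beyond the critical angle is on the order of 100 nm, and it shrinks as the incidence angle increases, which is why fluorophores far from the coverslip contribute little signal.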

Observation of DNA Replication in Cells

FP-DBPs are used for the visualization of single DNA molecules not only in vitro but also in vivo. Figure 10 shows examples of the visualization of the DNA replication process with FP-DBPs. DNA replication plays an important role in the cell cycle.

Fig. 10 Visualization of DNA replication with FP-DBPs. (a) Localization of FP-DBPs related to DNA replication. Time-lapse localization of LacI-CFP (representing oriC location), YFP-DnaA, DnaX-mCherry (representing the location of replication machinery), and YFP-YabA (representing the reinitiation inhibitor) revealed the DNA replication process. Reprinted from ref. (Schenk et al. 2017) with permission from PLOS, copyright 2017 [Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/)]. Scale bar: 2 μm. (b) Overexpression of SeqA-PAmCherry in E. coli. Super-resolution image by PALM revealed scattered hemimethylated GATC sites more accurately, compared with the conventional epi-fluorescence microscopy. Reprinted from ref. (Mika et al. 2015) with permission from The Royal Society of Chemistry, copyright 2015. (c) NIH3T3 co-transfected with RFP-PCNA, and EGFP-HOXC13 in early S, mid S, and late S. NIH3T3 expressing RFP-PCNA and EGFP-HOXC13 showed different staining patterns of replication foci in different phases, early S, mid S, and late S. Reproduced from ref. (Marchetti et al. 2010) with permission from Oxford University Press, copyright 2010

Figure 10a shows the positions of the replication components during DNA replication (Schenk et al. 2017). LacI-CFP visualized oriC, and YFP-DnaA visualized the replication initiation complex because DnaA is essential for regulating the initiation of replication in bacteria by opening the AT-rich strands (Richardson et al. 2016). The 20 min image indicates that YFP-DnaA colocalized with LacI-CFP bound to oriC for the initiation of DNA replication. During the separation of the duplicated origins, some YFP-DnaA foci localized between the origins and colocalized with the replication machinery, whereas other YFP-DnaA foci remained in the oriC regions for some time. The 140 min image shows two separate oriC sites with YFP-DnaA between them. Therefore, after the initiation of replication, the YFP-DnaA foci were localized at both the replication forks and the origin regions. Accumulation of DnaA at oriC regions after the initiation of replication can result in unwanted reinitiation; thus, YabA and Soj function as negative DnaA regulators to inhibit reinitiation. In the right-most image, YFP-YabA marks the regulator that inhibits reinitiation during DNA replication in bacterial cells, and DnaX-mCherry marks the replication machinery. Thus, observing the positions of FP-DBPs on the DNA backbone reveals the DNA replication process.

Figure 10b shows the SeqA-PAmCherry distribution in E. coli cells imaged by photoactivated localization microscopy (PALM), a super-resolution technique (Mika et al. 2015). SeqA, a DNA-binding protein, regulates the initiation of replication in E. coli (Slater et al. 1995). In E. coli, GATC sites are typically methylated by Dam methylase. Newly replicated GATC sites remain in an inactive hemimethylated state until Dam methylase methylates them. SeqA binds to hemimethylated GATC sites and thereby prevents DnaA from initiating DNA replication at the newly replicated oriC (Lu et al. 1994). 
The localization of SeqA has been observed by immunofluorescence microscopy and with fluorescent protein-tagged SeqA, which showed that SeqA localizes to the bacterial nucleoid (Hiraga et al. 2000). In conventional fluorescence microscopy, fluorescent protein-tagged SeqA forms foci that appear as single dots. However, because of the diffraction limit, it is difficult to obtain additional details of these complex foci. Owing to the small size of bacterial cells, observation with conventional light microscopy has limitations. PALM enables the visualization of chemotactic protein networks, the organization of the nucleoid and nucleoid-related proteins, and the cytoskeletal structures of bacteria (Wang et al. 2011). PALM is capable of visualizing a single protein, observing organizational features of clusters, and measuring the size of structures on scales of tens of nanometers (Gahlmann and Moerner 2014). Because of its high localization precision, PALM was used to observe SeqA localization in E. coli cells with high spatial resolution. As shown in Fig. 10b, SeqA localization in E. coli overexpressing SeqA-PAmCherry was observed using PALM (Mika et al. 2015). Most of the SeqA-PAmCherry proteins are localized in the central regions of the E. coli cells, indicating that SeqA proteins are distributed in the nucleoids. As shown in Fig. 10b, the epi-fluorescence image of the SeqA-PAmCherry distribution in E. coli suggests that SeqA is bound throughout the genome, whereas the super-resolution PALM image shows scattered hemimethylated GATC sites throughout the nucleoids (Mika et al. 2015).

Figure 10c shows the location of gene regulators and the DNA replication machinery in the eukaryotic nucleus (Marchetti et al. 2010). Replication complexes at several origin regions contain homeotic proteins, such as HOXA13, HOXC10, and HOXC13 (de Stanchina et al. 2000). Fluorescently tagged HOXC13 derivatives colocalize with replication foci only in the early stages of replication (Comelli et al. 2009). The distribution of GFP-HOXC13 proteins shows a speckled pattern, similar to the distribution pattern of early replicating chromatin. As shown in Fig. 10c, the expression of RFP-PCNA (sliding clamp), a marker of replication foci, and GFP-HOXC13, a homeotic regulator, in NIH3T3 cells showed changes in the distribution pattern of replication foci during the S phase. GFP-HOXC13 colocalizes with replication foci in the early S phase. Colocalization becomes less evident during the mid S phase and does not occur during the late S phase.

Recently, super-resolution microscopy has become a powerful tool for high-resolution fluorescence imaging, as shown in Fig. 10b (Mika et al. 2015). Single DNA molecule imaging with confocal, widefield, or multiphoton fluorescence microscopes is limited in spatial resolution by the diffraction barrier of light. Therefore, conventional light microscopy enables the observation of microscale structures rather than nanostructures. The resolution of light microscopy is limited to approximately 250 nm because light from a point source diffracts along the optical path and is imaged as a blurred spot. However, super-resolution microscopy allows the observation of nanoscale structures and the acquisition of nanoscale information by overcoming the diffraction barrier. Stochastic optical reconstruction microscopy (STORM), a type of super-resolution single-molecule localization microscopy, has a resolution of tens of nanometers. 
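
The diffraction barrier mentioned above is commonly estimated with the Abbe formula, d = λ / (2 NA), where λ is the wavelength and NA is the numerical aperture of the objective. The Python sketch below evaluates it for two illustrative cases. The wavelengths and numerical apertures are assumptions chosen for the example, not values reported in this chapter; they show that figures of roughly 200 to 250 nm follow from typical visible-light imaging conditions.

```python
def abbe_limit_nm(wavelength_nm, numerical_aperture):
    """Abbe lateral resolution limit (nm): d = wavelength / (2 * NA)."""
    if wavelength_nm <= 0 or numerical_aperture <= 0:
        raise ValueError("wavelength and numerical aperture must be positive")
    return wavelength_nm / (2.0 * numerical_aperture)


# Hypothetical imaging conditions (emission wavelength in nm, objective NA).
conditions = [
    ("oil-immersion objective, 560 nm emission", 560.0, 1.4),
    ("water-immersion objective, 600 nm emission", 600.0, 1.2),
]
for label, wavelength_nm, na in conditions:
    print(f"{label}: {abbe_limit_nm(wavelength_nm, na):.0f} nm")
```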
Other super-resolution microscopies include photoactivated localization microscopy (PALM) and ground state depletion individual molecule return (GSDIM). These single-molecule super-resolution techniques share a similar principle. Fluorophores are first driven into a weakly emissive or non-emissive state. From this state, a very small fraction of the fluorophores is switched to the emissive state by intense ultraviolet laser pulses for excitation, emission, and detection. The short activation pulses lead to the stochastic activation of a small number of molecules to the emissive state, which is detected at longer wavelengths. Consequently, the emission profiles of the activated molecules overlap only minimally. The centroid of each identified molecule is determined by fitting a Gaussian function. By imaging molecules and fitting them within sub-diffraction-limited areas over thousands of images, a composite reconstruction of the stained molecules is generated. Therefore, optically switching different subpopulations of molecules on and off at different times temporally separates spatially overlapping molecules and yields spatial resolutions of 10–50 nm. Most super-resolution single-molecule localization microscopies use this general concept. The fluorophores used in STORM are organic dyes, fluorescent proteins, and quantum dots. Technically, it is challenging to perform two-color measurements because different fluorophores require different chemical conditions, including different oxygen levels. Currently, the development of effective antioxidants that work well with differently colored fluorophores enables multicolor STORM. Another

issue is aligning the beams to generate the same illumination profile during super-resolution microscopy. In STED microscopy, fluorophores are excited by one beam and de-excited by stimulated emission using a beam with a donut-shaped intensity profile. This reduces the fluorescent area; therefore, the detected photons originate from a small area at the donut center. Reversible saturable optical linear fluorescence transition microscopy (RESOLFT) utilizes a longer-lived state change to de-excite fluorophores outside the donut center. PALM and STORM are far-field imaging techniques that detect fluorescence from a photoconversion process of fluorophores: photoactivation for PALM and photo-switching for STORM. The difference between PALM and STORM is that in PALM, fluorophores are stochastically activated and then imaged, whereas in STORM, fluorophores are activated and imaged simultaneously. Therefore, STORM significantly increases the data collection rate. PALM is useful for single-particle tracking. Only one or a very small number of fluorophores is excited at a time; therefore, the diffraction-limited areas of the fluorophores do not overlap. Repeated excitation cycles allow the detection of the locations of all molecules, and these data are assembled to generate the final image. With further development, super-resolution single-molecule localization microscopy is becoming capable of acquiring multidimensional data. The 3D information of biomolecules, such as chromosomes, provides more important insight into the structure and organization of large biomolecules.
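
The localization principle shared by PALM and STORM, as described above, can be illustrated with a small simulation. The Python sketch below is a simplified model under stated assumptions, not the processing pipeline of any cited study: the point spread function is taken as a Gaussian with a standard deviation of 100 nm, each frame contains exactly one active emitter with no background, and the emitter position is estimated as the centroid of its photons instead of by a full Gaussian fit. All names and numerical values are hypothetical. Under these assumptions the localization precision is approximately σ/√N for N detected photons.

```python
import math
import random

PSF_SIGMA_NM = 100.0  # assumed Gaussian PSF standard deviation (diffraction-limited spot)


def detect_photons(x_nm, y_nm, n_photons, rng):
    """Simulate photon positions from one emitter blurred by the Gaussian PSF."""
    return [(rng.gauss(x_nm, PSF_SIGMA_NM), rng.gauss(y_nm, PSF_SIGMA_NM))
            for _ in range(n_photons)]


def localize(photons):
    """Estimate the emitter position as the centroid of its photons.

    This stands in for the Gaussian fit used in practice (a simplification).
    """
    n = len(photons)
    return (sum(p[0] for p in photons) / n, sum(p[1] for p in photons) / n)


def expected_precision_nm(n_photons):
    """Approximate localization precision per axis: sigma / sqrt(N)."""
    return PSF_SIGMA_NM / math.sqrt(n_photons)


def reconstruct(emitters, n_frames, n_photons, seed=0):
    """Activate one random emitter per frame, localize it, and collect the results.

    The emitter index is returned only because this is a simulation; in real
    data the identity of the active molecule is not known.
    """
    rng = random.Random(seed)
    localizations = []
    for _ in range(n_frames):
        index = rng.randrange(len(emitters))
        photons = detect_photons(*emitters[index], n_photons, rng)
        localizations.append((index, localize(photons)))
    return localizations


# Two emitters 50 nm apart: unresolved in a single diffraction-limited image,
# but separable when they emit in different frames.
emitters = [(0.0, 0.0), (50.0, 0.0)]
results = reconstruct(emitters, n_frames=200, n_photons=1000)
for index in range(len(emitters)):
    xs = [pos[0] for i, pos in results if i == index]
    print(f"emitter {index}: {len(xs)} localizations, mean x = {sum(xs) / len(xs):.1f} nm")
print(f"expected precision: {expected_precision_nm(1000):.1f} nm")
```

Because each frame contains a single emitter, the two sets of localizations cluster around positions 50 nm apart, well below the width of the simulated spot.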

Conclusion

Recent developments in microscopic tools, single DNA manipulation approaches, and new staining materials provide more sensitive observations of DNA-related cellular processes and more detailed information than other approaches, such as gel electrophoresis. Therefore, single DNA visualization approaches have revolutionized the study of many DNA-involved cellular phenomena, detection methods using single DNA molecules, and DNA physical characteristics, as described in this chapter. However, many aspects remain to be studied using single DNA visualization approaches. Although single-molecule DNA manipulation approaches provide precise measurement platforms for the study of conformational changes, mechanical dynamics, and in vitro reconstitution of biochemical interaction systems, it is still difficult to mimic cellular conditions well enough to form exactly the same protein complexes that interact with DNA molecules in cells. Many protein complexes have complicated structures with different protein components; therefore, it is difficult to study sophisticated biochemical interaction systems in vitro. The examples provided in this chapter are simple DNA-protein interactions, in which only one or a few proteins interact with single DNA molecules. Therefore, methods that mimic complicated protein complex-DNA interactions should be developed to further understand cellular processes. This would make it possible to extend the application of single DNA visualization to genetic disease research.

The combination of single-molecule DNA manipulation approaches, such as optical tweezers and microfluidics, with fluorescence microscopy enables precise measurement. However, the recently developed super-resolution microscopies with 10 nm spatial resolution are still not sufficient to directly observe the detailed sites on the protein surface that interact with DNA molecules. Although electron microscopy has a higher resolution than super-resolution microscopy, it is still difficult to observe the precise sites on protein surfaces that are involved in interactions with other biomolecules. The higher the microscopic resolution, the more detailed the information. Even when a small-molecule anti-cancer drug that inhibits protein-DNA interactions to kill cancer cells is available, direct observation of the inhibition mechanism with recently developed microscopic approaches remains difficult because of the limited microscopic resolution. Therefore, the resolutions of fluorescence microscopy, electron microscopy, and atomic force microscopy should be improved. Improving the resolution of microscopic approaches will revolutionize studies in biophysics, biochemistry, toxicology, drug discovery, and other fields that require detection with high sensitivity. From the perspective of DNA-binding proteins, the fluorescent proteins used for FP-DBPs are relatively bulky; therefore, in some cases, tagging DNA-binding proteins with fluorescent proteins changes the characteristics, structure, function, and behavior of the DNA-binding proteins. Therefore, it is desirable to decrease the size of the fluorescent proteins. To observe changes in conformational states during protein-DNA interactions, methods are needed for the simultaneous staining of DNA-binding proteins and of their post-translational modifications. Small fluorescent proteins are essential for labeling multiple sites in a DNA-binding protein. 
Although the development of single-molecule DNA manipulation approaches enables the efficient elongation of single DNA molecules, it is still challenging to elongate large genomic DNA molecules from various sources, such as mammalian, plant, and bacterial cells, in one frame or one long microchannel. To study DNA-protein interactions efficiently, it is necessary to elongate large genomic DNA molecules so that the DNA-protein interaction sites can be mapped. To map the epigenetic modification sites that affect the epigenetic characteristics of genomic DNA molecules from different tissues in humans, it is preferable to elongate entire genomic DNA molecules. Therefore, the elongation of large genomic DNA molecules will improve studies of DNA-protein interactions and epigenetics. There are many possible combinations of DNA-binding proteins and fluorescent proteins; therefore, many types of DNA-protein interactions can be studied. Moreover, many DNA-binding proteins and fluorescent proteins have never been used to construct DNA-binding fluorescent proteins. There are 2600 proteins with DNA-binding domains in human cells and 300 transcription factors in S. cerevisiae. The database for fluorescent proteins, FPbase.org, provides characteristic data for hundreds of fluorescent proteins. These proteins enable the development of new types of DNA-binding fluorescent proteins. Therefore, more protein-DNA interaction mechanisms can be revealed using single DNA molecule visualization approaches.

In summary, the development of DNA staining materials, advanced DNA manipulation approaches, and microscopic approaches has revolutionized the field of DNA research. However, as mentioned above, there is still much room for improvement in various aspects, such as the application of single DNA molecule observation, the mimicking of physiological conditions in vitro, and the development of new FP-DBPs, DNA manipulation approaches, and microscopies with improved resolution. These developments will provide new insights into molecular biology.

References

Aaij C, Borst P (1972) The gel electrophoresis of DNA. Biochimica et Biophysica Acta (BBA) 269(2):192–200. https://doi.org/10.1016/0005-2787(72)90426-1
Assaf Grunwald HS, Gabrieli T, Michaeli Y, Torchinsky D, Arieli R, Juhasz M, Wagner KR, Pevsner J, Reifenberger J, Hastie AR, Cao H, Weinhold E, Ebenstein Y (2017) Reduced representation optical methylation mapping (R2OM2). bioRxiv 10(11):1–21
Cai W, Aburatani H, Stanton Jr VP, Housman DE, Wang YK, Schwartz DC (1995) Ordered restriction endonuclease maps of yeast artificial chromosomes created by optical mapping on surfaces. Proc Natl Acad Sci U S A 92(11):5164–5168
Cheeseman K, Ropars J, Renault P, Dupont J, Gouzy J, Branca A, Abraham AL, Ceppi M, Conseiller E, Debuchy R, Malagnac F, Goarin A, Silar P, Lacoste S, Sallet E, Bensimon A, Giraud T, Brygoo Y (2014) Multiple recent horizontal transfers of a large genomic region in cheese making fungi. Nat Commun 5:2876. https://doi.org/10.1038/ncomms3876
Comelli L, Marchetti L, Arosio D, Riva S, Abdurashidova G, Beltram F, Falaschi A (2009) The homeotic protein HOXC13 is a member of human DNA replication complexes. Cell Cycle 8(3):454–459
Davidson MW, Campbell RE (2009) Engineered fluorescent proteins: innovations and applications. Nat Methods 6(10):713–717
de Stanchina E, Gabellini D, Norio P, Giacca M, Peverali FA, Riva S, Falaschi A, Biamonti G (2000) Selection of homeotic proteins for binding to a human DNA replication origin. J Mol Biol 299(3):667–680
Dimalanta ET, Lim A, Runnheim R, Lamers C, Churas C, Forrest DK, de Pablo JJ, Graham MD, Coppersmith SN, Goldstein S, Schwartz DC (2004) A microfluidic system for large DNA molecule arrays. Anal Chem 76(18):5293–5301. https://doi.org/10.1021/ac0496401
Forget AL, Kowalczykowski SC (2012) Single-molecule imaging of DNA pairing by RecA reveals a three-dimensional homology search. Nature 482(7385):423–427. https://doi.org/10.1038/nature10782
Gahlmann A, Moerner WE (2014) Exploring bacterial cell biology with single-molecule tracking and super-resolution imaging. Nat Rev Microbiol 12(1):9–22
Galletto R, Amitani I, Baskin RJ, Kowalczykowski SC (2006) Direct observation of individual RecA filaments assembling on single DNA molecules. Nature 443(7113):875–878. https://doi.org/10.1038/nature05197
Giemsa G (1904) Eine Vereinfachung und Vervollkommnung meiner Methylenblau-Eosin-Färbemethode zur Erzielung der Romanowsky-Nocht’schen Chromatinfärbung. Centralblatt für Bakteriologie I Abteilung 32:307–313
Giepmans BNG, Adams SR, Ellisman MH, Tsien RY (2006) Review - the fluorescent toolbox for assessing protein location and function. Science 312(5771):217–224
Gilat N, Tabachnik T, Shwartz A, Shahal T, Torchinsky D, Michaeli Y, Nifker G, Zirkin S, Ebenstein Y (2017) Single-molecule quantification of 5-hydroxymethylcytosine for diagnosis of blood and colon cancers. Clin Epigenetics 9:70. https://doi.org/10.1186/s13148-017-0368-9

Heng HH, Squire J, Tsui LC (1992) High-resolution mapping of mammalian genes by in situ hybridization to free chromatin. Proc Natl Acad Sci U S A 89(20):9509–9513. https://doi.org/10.1073/pnas.89.20.9509
Hiraga S, Ichinose C, Onogi T, Niki H, Yamazoe M (2000) Bidirectional migration of SeqA-bound hemimethylated DNA clusters and pairing of oriC copies in Escherichia coli. Genes Cells 5(5):327–341
Jin X, Hapsari ND, Lee S, Jo K (2020) DNA binding fluorescent proteins as single-molecule probes. Analyst 145(12):4079–4095. https://doi.org/10.1039/d0an00218f
Jing J, Lai Z, Aston C, Lin J, Carucci DJ, Gardner MJ, Mishra B, Anantharaman TS, Tettelin H, Cummings LM, Hoffman SL, Venter JC, Schwartz DC (1999) Optical mapping of Plasmodium falciparum chromosome 2. Genome Res 9(2):175–181
Jinyong Lee YK, Lim S, Jo K (2016) Single-molecule visualization of ROS-induced DNA damage in large DNA molecules. Analyst 141:847–852
Jo K, Dhingra DM, Odijk T, de Pablo JJ, Graham MD, Runnheim R, Forrest D, Schwartz DC (2007) A single-molecule barcoding system using nanoslits for DNA analysis. Proc Natl Acad Sci U S A 104(8):2673–2678. https://doi.org/10.1073/pnas.0611151104
Kang Y, Lee J, Kim J, Oh Y, Kim D, Lee J, Lim S, Jo K (2016) Analysis of alcohol-induced DNA damage in Escherichia coli by visualizing single genomic DNA molecules. Analyst 141(14):4326–4331. https://doi.org/10.1039/c6an00616g
Kleinschmidt AK, Zahn RK (1959) Über Desoxyribonucleinsäure-Molekeln in Protein-Mischfilmen. Z Naturforsch 14b:770
Lee JCJY (2021) A novel high-throughput single-molecule technique: DNA curtain. J Korean Phys Soc 78:442–448
Lee S, Jo K (2016) Visualization of surface-tethered large DNA molecules with a fluorescent protein DNA binding peptide. J Vis Exp 112. https://doi.org/10.3791/54141
Lee J, Park HS, Lim S, Jo K (2013) Visualization of UV-induced damage on single DNA molecules. Chem Commun (Camb) 49(42):4740–4742. https://doi.org/10.1039/c3cc38884k
Lee J, Kim Y, Lee S, Jo K (2015) Visualization of large elongated DNA molecules. Electrophoresis 36(17):2057–2071. https://doi.org/10.1002/elps.201400479
Lee S, Oh Y, Lee J, Choe S, Lim S, Lee HS, Jo K, Schwartz DC (2016) DNA binding fluorescent proteins for the direct visualization of large DNA molecules. Nucleic Acids Res 44(1):e6. https://doi.org/10.1093/nar/gkv834
Lee S, Kawamoto Y, Vaijayanthi T, Park J, Bae J, Kim-Ha J, Sugiyama H, Jo K (2018a) TAMRA-polypyrrole for A/T sequence visualization on DNA molecules. Nucleic Acids Res 46(18). https://doi.org/10.1093/nar/gky531
Lee S, Kawamoto Y, Vaijayanthi T, Park J, Bae J, Kim-Ha J, Sugiyama H, Jo K (2018b) TAMRA-polypyrrole for A/T sequence visualization on DNA molecules. Nucleic Acids Res 46(18):e108. https://doi.org/10.1093/nar/gky531
Lu M, Campbell JL, Boye E, Kleckner N (1994) SeqA: a negative modulator of replication initiation in E. coli. Cell 77(3):413–426. https://doi.org/10.1016/0092-8674(94)90156-2
Marchetti L, Comelli L, D'Innocenzo B, Puzzi L, Luin S, Arosio D, Calvello M, Mendoza-Maldonado R, Peverali F, Trovato F, Riva S, Biamonti G, Abdurashidova G, Beltram F, Falaschi A (2010) Homeotic proteins participate in the function of human-DNA replication origins. Nucleic Acids Res 38(22):8105–8119
Matsuoka T, Kim BC, Huang J, Douville NJ, Thouless MD, Takayama S (2012) Nanoscale squeezing in elastomeric nanochannels for single chromatin linearization. Nano Lett 12(12):6480–6484. https://doi.org/10.1021/nl304063f
Meng X, Benson K, Chada K, Huff EJ, Schwartz DC (1995) Optical mapping of lambda bacteriophage clones using restriction endonucleases. Nat Genet 9(4):432–438
Mika JT, Vanhecke A, Dedecker P, Swings T, Vangindertael J, Van den Bergh B, Michiels J, Hofkens J (2015) A study of SeqA subcellular localization in Escherichia coli using photoactivated localization microscopy. Faraday Discuss 184:425–450. https://doi.org/10.1039/c5fd00058k

Morikawa K, Yanagida M (1981) Visualization of individual DNA-molecules in solution by light-microscopy - Dapi staining method. J Biochem 89(2):693–696
Muller V, Dvirnas A, Andersson J, Singh V, Kk S, Johansson P, Ebenstein Y, Ambjornsson T, Westerlund F (2019) Enzyme-free optical DNA mapping of the human genome using competitive binding. Nucleic Acids Res 47(15):e89. https://doi.org/10.1093/nar/gkz489
Murade CU, Subramaniam V, Otto C, Bennink ML (2010) Force spectroscopy and fluorescence microscopy of dsDNA-YOYO-1 complexes: implications for the structure of dsDNA in the overstretching region. Nucleic Acids Res 38(10):3423–3431. https://doi.org/10.1093/nar/gkq034
Neely RK, Dedecker P, Hotta JI, Urbanaviciute G, Klimasauskas S, Hofkens J (2010) DNA fluorocode: a single molecule, optical map of DNA with nanometre resolution. Chem Sci 1(4):453–460. https://doi.org/10.1039/c0sc00277a
Park J, Lee S, Won N, Shin E, Kim SH, Chun MY, Gu J, Jung GY, Lim KI, Jo K (2019) Single-molecule DNA visualization using AT-specific red and non-specific green DNA-binding fluorescent proteins. Analyst 144(3):921–927. https://doi.org/10.1039/c8an01426d
Perkins TT, Quake SR, Smith DE, Chu S (1994) Relaxation of a single DNA molecule observed by optical microscopy. Science 264(5160):822–826. https://doi.org/10.1126/science.8171336
Pokhrel N, Origanti S, Davenport EP, Gandhi D, Kaniecki K, Mehl RA, Greene EC, Dockendorff C, Antony E (2017) Monitoring replication protein A (RPA) dynamics in homologous recombination through site-specific incorporation of non-canonical amino acids. Nucleic Acids Res 45(16):9413–9426. https://doi.org/10.1093/nar/gkx598
Ratan ZA, Zaman SB, Mehta V, Haidere MF, Runa NJ, Akter N (2017) Application of fluorescence in situ hybridization (FISH) technique for the detection of genetic aberration in medical science. Cureus 9(6):e1325. https://doi.org/10.7759/cureus.1325
Ratz M, Testa I, Hell SW, Jakobs S (2015) CRISPR/Cas9-mediated endogenous protein tagging for RESOLFT super-resolution microscopy of living human cells. Sci Rep-UK:5
Ray M, Goldstein S, Zhou S, Potamousis K, Sarkar D, Newton MA, Esterberg E, Kendziorski C, Bogler O, Schwartz DC (2013) Discovery of structural alterations in solid tumor oligodendroglioma by single molecule analysis. BMC Genomics 14:505. https://doi.org/10.1186/1471-2164-14-505
Richardson TT, Harran O, Murray H (2016) The bacterial DnaA-trio replication origin element specifies single-stranded DNA initiator binding. Nature 534(7607):412–416. https://doi.org/10.1038/nature17962
Russell WC, Newman C, Williamson DH (1975) A simple cytochemical technique for demonstration of DNA in cells infected with mycoplasmas and viruses. Nature 253(5491):461–462. https://doi.org/10.1038/253461a0
Rye HS, Yue S, Wemmer DE, Quesada MA, Haugland RP, Mathies RA, Glazer AN (1992) Stable fluorescent complexes of double-stranded DNA with bis-intercalating asymmetric cyanine dyes: properties and applications. Nucleic Acids Res 20(11):2803–2812. https://doi.org/10.1093/nar/20.11.2803
Saeidnia S, Abdollahi M (2013) Are other fluorescent tags used instead of ethidium bromide safer? Daru 21(1):71. https://doi.org/10.1186/2008-2231-21-71
Schenk K, Hervas AB, Rosch TC, Eisemann M, Schmitt BA, Dahlke S, Kleine-Borgmann L, Murray SM, Graumann PL (2017) Rapid turnover of DnaA at replication origin regions contributes to initiation control of DNA replication. PLoS Genet 13(2):e1006561. https://doi.org/10.1371/journal.pgen.1006561
Schwartz DC, Li X, Hernandez LI, Ramnarain SP, Huff EJ, Wang YK (1993) Ordered restriction maps of Saccharomyces cerevisiae chromosomes constructed by optical mapping. Science 262(5130):110–114
Seiji Matsumoto KM, Yanagida M (1981) Light microscopic structure of DNA in solution studied by the 4′,6-diamidino-2-phenylindole staining method. J Mol Biol 152(2):501–516. https://doi.org/10.1016/0022-2836(81)90255-2

47

Single-Molecule DNA Visualization

1525

Seonghyun Lee CW, Song J, Kim D-g, Yeeun O, Ko W, Lee J, Park J, Lee HS, Jo K (2016) Investigation of various fluorescent protein–DNA binding peptides for effectively visualizing large DNA molecules. RSC Adv 6(52):46291–46298 Shaner NC, Patterson GH, Davidson MW (2007) Advances in fluorescent protein technology. J Cell Sci 120(24):4247–4260 Shelby RD, Hahn KM, Sullivan KF (1996) Dynamic elastic behavior of alpha-satellite DNA domains visualized in situ in living human cells. J Cell Biol 135(3):545–557 Shemiakina II, Ermakova GV, Cranfill PJ, Baird MA, Evans RA, Souslova EA, Staroverov DB, Gorokhovatsky AY, Putintseva EV, Gorodnicheva TV, Chepurnykh TV, Strukova L, Lukyanov S, Zaraisky AG, Davidson MW, Chudakov DM, Shcherbo D (2012) A monomeric red fluorescent protein with low cytotoxicity. Nat Commun 3:1204. https://doi.org/10.1038/ ncomms2208 Shen Y, Chen Y, Wu J, Shaner NC, Campbell RE (2017) Engineering of mCherry variants with long stokes shift, red-shifted fluorescence, and low cytotoxicity. PLoS One 12(2):e0171257. https:// doi.org/10.1371/journal.pone.0171257 Shin E, Kim W, Lee S, Bae J, Kim S, Ko W, Seo HS, Lim S, Lee HS, Jo K (2019) Truncated TALEFP as DNA staining dye in a high-salt buffer. Sci Rep 9(1):17197. https://doi.org/10.1038/ s41598-019-53722-0 Shu XK, Shaner NC, Yarbrough CA, Tsien RY, Remington SJ (2006) Novel chromophores and buried charges control color in mFruits. Biochemistry-US 45(32):9639–9647 Singh V, Johansson P, Lin YL, Hammarsten O, Westerlund F (2021) Shining light on single-strand lesions caused by the chemotherapy drug bleomycin. DNA Repair (Amst) 105:103153. 
https:// doi.org/10.1016/j.dnarep.2021.103153 Skaletsky H, Kuroda-Kawaguchi T, Minx PJ, Cordum HS, Hillier L, Brown LG, Repping S, Pyntikova T, Ali J, Bieri T, Chinwalla A, Delehaunty A, Delehaunty K, Du H, Fewell G, Fulton L, Fulton R, Graves T, Hou SF, Latrielle P, Leonard S, Mardis E, Maupin R, McPherson J, Miner T, Nash W, Nguyen C, Ozersky P, Pepin K, Rock S, Rohlfing T, Scott K, Schultz B, Strong C, Tin-Wollam A, Yang SP, Waterston RH, Wilson RK, Rozen S, Page DC (2003) The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature 423(6942):825–U822 Slater S, Wold S, Lu M, Boye E, Skarstad K, Kleckner N (1995) E. coli SeqA protein binds oriC in two different methyl-modulated reactions appropriate to its roles in DNA replication initiation and origin sequestration. Cell 82(6):927–936. https://doi.org/10.1016/0092-8674(95)90272-4 Stewart-Ornstein J, Lahav G (2016) Dynamics of CDKN1A in single cells defined by an endogenous fluorescent tagging toolkit. Cell Rep 14(7):1800–1811 Tycon MA, Dial CF, Faison K, Melvin W, Fecko CJ (2012) Quantification of dye-mediated photodamage during single-molecule DNA imaging. Anal Biochem 426(1):13–21. https://doi. org/10.1016/j.ab.2012.03.021 Wang WQ, Li GW, Chen CY, Xie XS, Zhuang XW (2011) Chromosome organization by a nucleoid-associated protein in live bacteria. Science 333(6048):1445–1449 Weise A, Mrasek K, Ewers E, Mkrtchyan H, Kosyakova N, Liehr T (2009) Diagnostic applications of fluorescence in situ hybridization. Expert Opin Med Diagn 3(4):453–460. https://doi.org/10. 1517/17530050902841948 Williams R, Wyckoff R (1945) Electron shadow-micrography of virus particles. Proc Soc Exptl Biol Med 58:265–270 Xue H, Bei Y, Zhan Z, Chen X, Xu X, Fu YV (2017) Utilizing biotinylated proteins expressed in yeast to visualize DNA-protein interactions at the single-molecule level. Front Microbiol 8: 2062. 
https://doi.org/10.3389/fmicb.2017.02062 Xue C, Daley JM, Xue X, Steinfeld J, Kwon Y, Sung P, Greene EC (2019) Single-molecule visualization of human BLM helicase as it acts upon double- and single-stranded DNA substrates. Nucleic Acids Res 47(21):11225–11237. https://doi.org/10.1093/nar/gkz810

1526

X. Jin and K. Jo

Xue C, Molnarova L, Steinfeld JB, Zhao W, Ma C, Spirek M, Kaniecki K, Kwon Y, Belan O, Krejci K, Boulton SJ, Sung P, Greene EC, Krejci L (2021) Single-molecule visualization of human RECQ5 interactions with single-stranded DNA recombination intermediates. Nucleic Acids Res 49(1):285–305. https://doi.org/10.1093/nar/gkaa1184 Yardimci H, Loveland AB, Habuchi S, van Oijen AM, Walter JC (2010) Uncoupling of sister replisomes during eukaryotic DNA replication. Mol Cell 40(5):834–840. https://doi.org/10. 1016/j.molcel.2010.11.027 Zhou S, Deng W, Anantharaman TS, Lim A, Dimalanta ET, Wang J, Wu T, Chunhong T, Creighton R, Kile A, Kvikstad E, Bechner M, Yen G, Garic-Stankovic A, Severin J, Forrest D, Runnheim R, Churas C, Lamers C, Perna NT, Burland V, Blattner FR, Mishra B, Schwartz DC (2002) A whole-genome shotgun optical map of Yersinia pestis strain KIM. Appl Environ Microbiol 68(12):6321–6331 Zirkin S, Fishman S, Sharim H, Michaeli Y, Don J, Ebenstein Y (2014) Lighting up individual DNA damage sites by in vitro repair synthesis. J Am Chem Soc 136(21):7771–7776. https://doi.org/ 10.1021/ja503677n

Tissue-Specific Drug Delivery Platforms Based on DNA Nanoparticles

48

Kyoung-Ran Kim, Junghyun Kim, and Dae-Ro Ahn

Contents
Introduction
Four Classes of NANPs
Drug Loading in NANPs
In Vivo Stability of NANPs for Targeted Delivery
NANPs for Passive Tumor-Targeting
NANPs for Active Tumor-Targeting
Discovery of Tumor-Specific NANPs Using Library Approach
NANPs for Lung-Targeted Delivery
NANPs for Kidney-Targeted Delivery
NANPs for Liver-Targeted Delivery
NANPs for Brain-Targeted Delivery
NANPs for Spleen-Targeted Delivery
Challenges and Outlook
References

Abstract

While nucleic acids naturally serve as carriers of genetic information through base pairing, the same base complementarity makes them programmable materials for preparing nanoparticles. Nucleic acid nanoparticles (NANPs) are highly biocompatible, and their size and shape are easily controlled through the design of base-pairing sequences. These properties distinguish NANPs from other nanomaterials and make them promising candidates as drug delivery carriers. In particular, the structural features of NANPs can be fine-tuned at subnanometer resolution for the specific conditions required for biodistribution to a target tissue, thereby providing opportunities to overcome biological hurdles that hinder tissue-specific drug delivery. In this chapter, we describe major strategies to fabricate NANPs suitable for in vivo drug delivery and recent efforts to develop tissue-specific NANPs targeting important disease-related tissues such as tumors, liver, lungs, kidney, and brain, together with their applications for targeted systemic drug delivery.

Keywords

DNA nanoparticles · Drug delivery · Tissue-specificity · Systemic administration · Biodistribution

K.-R. Kim · J. Kim
Chemical and Biological Integrative Research Center, Biomedical Research Division, Korea Institute of Science and Technology (KIST), Seoul, Republic of Korea

D.-R. Ahn (*)
Chemical and Biological Integrative Research Center, Biomedical Research Division, Korea Institute of Science and Technology (KIST), Seoul, Republic of Korea
Division of Bio-Medical Science and Technology, KIST School, University of Science and Technology (UST), Seoul, Republic of Korea

© Springer Nature Singapore Pte Ltd. 2023
N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_54

Introduction

Drug delivery to the target tissue is required to maximize the efficacy of the drug while minimizing undesired side effects caused by off-target distribution in vivo. Local administration routes to accessible tissues have been employed to effectively treat tissue-specific diseases. For example, therapeutics can be delivered directly into the eye through intravitreal injection for the treatment of retinal diseases, including age-related macular degeneration. Intrathecal injection is a route to the spinal canal leading to cerebrospinal fluid to treat diseases of the central nervous system, including spinal muscular atrophy. Intratracheal injection delivers drugs into the lungs for the treatment of respiratory diseases, including chronic obstructive pulmonary disease (COPD). However, despite their accuracy, local administration routes are often invasive and not patient friendly. In addition, other tissues are hardly accessible by local injection and therefore require tissue-specific systemic administration.

Representative systemic administration routes, such as intravenous (i.v.) injection and intraperitoneal (i.p.) injection, rely on vascular circulation to deliver therapeutics to the target tissues. However, systemically injected therapeutics need to overcome several barriers to reach the target area and achieve therapeutic efficacy comparable to that of local administration routes. On entering the vascular circulation, the therapeutics encounter hydrolytic enzymes in plasma, including proteases and nucleases, which have the potential to degrade peptide- and nucleotide-based therapeutics, respectively (Fig. 1). Therapeutics may also be opsonized by plasma proteins such as immunoglobulins and complement factors, and the opsonized therapeutics are recognized for clearance by the mononuclear phagocyte system. In addition, the therapeutics need to penetrate the endothelial layer for extravasation around the target tissue and then pass through the extracellular matrix to reach the target cells in the tissue. Finally, cellular internalization is required for therapeutics that interact with intracellular targets. If cellular uptake is based on endocytosis, entrapment of the therapeutics in endosomes and lysosomes, where acidic pH and hydrolytic enzymes can inactivate them, is another barrier to be overcome before the therapeutics interact with their cytoplasmic targets.

Fig. 1 Drugs administered via systemic routes, such as oral administration and intraperitoneal (i.p.) and intravenous (i.v.) injections, can be degraded by hydrolytic enzymes and cleared by phagocytes before extravasation. After extravasation, drugs need to pass through the extracellular matrix (ECM) and the plasma membrane. Furthermore, endosomal escape is required for drugs to reach their cytoplasmic targets

Nanoparticle-based drug delivery technology has been developed to improve the in vivo properties of therapeutics, including pharmacokinetics, by changing the physicochemical properties of the nanoparticles containing the active ingredients, thereby overcoming the in vivo barriers that hamper tissue-specific delivery. For example, nanoparticles with sizes suitable for adherence to endothelial cells and extravasation through the leaky vascular structure around the tumor have been employed as carriers for tumor-targeted drug delivery. In addition to size, the effects of the shape and charge of nanoparticles on their tumor targetability have also been extensively investigated, illustrating that controlling the structural and chemical features of nanoparticles is critical to achieving tissue-specific drug delivery (Mitchell et al. 2021).

Nucleic acid nanoparticles (NANPs), constructed by the self-assembly of DNA and RNA strands, are highly programmable nanoconstructs distinguished from other nanomaterials. They are emerging drug delivery platforms with excellent potential for clinical translation because of the innate biocompatibility of nucleic acids. The size and shape of NANPs can be defined at subnanometer resolution by designing base-pairing sequences. Tissue-specific NANPs can be developed using strategies similar to those employed for other nanoparticle-based drug carriers, utilizing structures and ligands preferred by the target tissues. However, after systemic injection, the cargo and/or carrier may be unexpectedly recognized by nontarget cells or adsorbed onto plasma proteins before reaching the target site. To address these in situ-generated distortions of targeting properties, a library of various NANP structures can be screened to develop tissue-specific carriers. Because NANP structure can be controlled more precisely than that of other nanomaterials, researchers can generate libraries with structural diversity, with the potential to unearth nanostructures satisfying the subtle conditions required for target tissue-specific delivery.

In this chapter, we describe recently reported approaches, including conventional methods and library screening, to develop tissue-specific NANPs targeting tumors, liver, lungs, kidney, and brain, together with their applications for targeted systemic drug delivery. Because we focus on the tissue-specific properties of NANPs upon systemic administration (i.v. and i.p. injections), applications of NANPs to targeted delivery demonstrated only at the in vitro cellular level are not covered in this chapter. Readers interested in the broader range of biomedical applications of NANPs are referred to other relevant reviews (Hu et al. 2019). In addition, overviews of nanoparticles other than NANPs for tissue-specific drug delivery can be found in recent reviews and accounts (Ulbrich et al. 2016).

Four Classes of NANPs

Four main types of NANPs are used in drug delivery technology: (1) wireframe structures, (2) origami structures, (3) nanoflowers, and (4) spherical nucleic acids (SNAs) (Fig. 2).

Fig. 2 Four major classes of NANPs based on their structure and mode of construction

Wireframe nucleic acid nanostructures are self-assembled structures of oligonucleotide strands. They include spiky constructs with three- and four-way junctions (3WJ and 4WJ) (Lilley and Clegg 1993), dendrimers (Liu et al. 2021b), and polyhedral structures with multiple fenestrations and a central hollow space (Goodman et al. 2005). The size of wireframe nanostructures is in the range 5–50 nm and can be controlled by changing the number of base pairs to adjust the length of the duplex or the double-crossover wireframe. NANPs of various polyhedral shapes can be sculpted based on wireframe nucleic acid nanostructures with diverse aspect ratios. In particular, the DNA tetrahedron has a shape that minimizes electrostatic repulsion when approaching the plasma membrane for cellular internalization (Ding et al. 2018), making it the most popular NANP-based drug delivery platform. The promising potential of the DNA tetrahedron as a drug delivery carrier has recently been reviewed (Hu et al. 2017).

DNA and RNA origami structures are obtained by folding a long scaffold strand via hybridization of stapling oligonucleotide strands (Rothemund 2006). DNA origami structures of 50–200 nm size have been obtained using M13 phage genomic DNA as the scaffold. Virtually any nanostructure shape can be built using origami technology. The origami strategy can be used to fabricate condensed structures without a central hollow cavity, as well as wireframe nanostructures (Benson et al. 2015). Although they require Mg²⁺ concentrations significantly higher than the physiological Mg²⁺ concentration, DNA origami structures have been extensively employed as drug delivery carriers in recent studies (Jiang et al. 2018b).

Nanoflowers are NANPs formed via entanglement of long polynucleotide strands produced by polymerase-mediated rolling circle amplification (RCA) (Mohsen and Kool 2016). As the structures of the nanoflowers are generated in situ upon enzymatic reaction, the size of the nanoflowers is less controllable than that of the wireframe or origami structures and easily exceeds 1 μm. Several size-controlling methods for RCA products, including varying substrate concentrations, embedding chemical modifications, or incorporating secondary structure-forming sequences, have been suggested to constrain the size of the nanoflowers to less than 500 nm for drug delivery (Kim et al. 2018b). Nanoflowers are mostly shaped as ball-like flowers with many petals assembled on the surface. Although it is difficult to control the overall shape, the surface pattern of nanoflowers can be varied by adjusting the type and concentration of divalent metal ions required for RCA (Lee et al. 2017b). Nanoflowers have been employed for the delivery of both small-molecule drugs and macromolecular therapeutics, demonstrating their potential as a promising delivery platform (Sun et al. 2015).

SNAs are core-shell nanoconstructs displaying densely packed oligonucleotides on the surfaces of inorganic and organic spherical nanoparticles (Cutler et al. 2012). SNAs are spherical, with a typical size of 10–100 nm that can be easily controlled by adjusting the length of the oligonucleotides on the surface and the size of the core nanoparticles. Oligonucleotides in the shell can be replaced or hybridized with oligonucleotide therapeutics (OTs) such as small interfering RNAs (siRNAs) and antisense oligonucleotides (ASOs), illustrating the potential of SNAs as carriers, mainly for OTs, in drug delivery applications.
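
The statement above that the size of wireframe structures is set by the number of base pairs in each duplex can be illustrated with a small arithmetic sketch. It assumes an axial rise of about 0.34 nm per base pair for B-form DNA, a textbook value that is not stated in this chapter, and it ignores junction geometry, duplex flexibility, and hydration, so the results are rough estimates only. The function names are hypothetical and chosen for illustration.

```python
import math

# Assumption: approximate axial rise per base pair of a B-form DNA duplex (nm).
RISE_NM_PER_BP = 0.34


def duplex_length_nm(n_bp: int, rise_nm: float = RISE_NM_PER_BP) -> float:
    """Approximate length (nm) of a straight duplex containing n_bp base pairs."""
    if n_bp < 0:
        raise ValueError("n_bp must be non-negative")
    return n_bp * rise_nm


def bp_for_length(target_nm: float, rise_nm: float = RISE_NM_PER_BP) -> int:
    """Smallest number of base pairs whose duplex length reaches target_nm."""
    if target_nm < 0:
        raise ValueError("target_nm must be non-negative")
    return math.ceil(target_nm / rise_nm)


if __name__ == "__main__":
    # Illustrative edge lengths in base pairs.
    for n_bp in (17, 30, 96):
        print(f"{n_bp} bp -> about {duplex_length_nm(n_bp):.1f} nm")
    # Base pairs needed for edges spanning the quoted 5-50 nm size range.
    for target in (5.0, 50.0):
        print(f"{target:g} nm -> at least {bp_for_length(target)} bp")
```

Under this assumption, edges of about 15 and 148 base pairs correspond to roughly 5 and 50 nm, which is of the same order as the size range quoted for wireframe nanostructures. The overall dimensions of a real polyhedron also depend on its geometry, so edge length alone is only a first estimate.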

Drug Loading in NANPs

The drug loading capacity of NANPs can vary depending on the platform type, the size of the drug, and the loading method employed. However, the number of drug molecules loaded per carrier should be carefully controlled, as too many payloads per carrier may change the carrier properties of NANPs, including target tissue specificity.

Small-molecule drugs that interact strongly with the major or minor grooves of duplexes can easily be loaded into NANPs via spontaneous intercalation into the duplexes when mixed with NANPs (Fig. 3). The intercalated drugs are usually released as a result of relatively weakened drug-carrier interactions at endosomal pH (5–6). Drugs that exhibit weak or no interactions with the grooves can be chemically conjugated at precise positions, such as internal nucleotides and the ends of the oligonucleotides that compose NANPs.

Peptide and protein drugs are also loaded via either noncovalent interaction or chemical conjugation. Cationic peptides and proteins can be easily complexed with polyanionic NANPs (Kim et al. 2020a). Biotin-streptavidin interaction can be employed as a cross-linker between protein cargos and NANPs when a precise drug-to-carrier ratio is desired (Kim et al. 2018a). Alternatively, chemical conjugation of peptides and proteins can be enabled by cross-linking the cargo and premodified nucleotide residues incorporated in NANPs (Stephanopoulos 2020). For chemically conjugated drugs that must be released from their carriers at the target site to be potent, cross-linkers that are cleaved in response to the environment of the target site should be used for conjugation.

OTs are the most suitable cargos for NANP carriers, as they can be either seamlessly inserted within the sequences of NANPs or hybridized with a part of NANPs. The number of OTs can be precisely controlled by adjusting the number of cargo sequences to be inserted or hybridized.
Loading small-molecule drugs and OTs together in one carrier for combination therapy is also possible, provided that the targeting ability of the NANPs is maintained after loading (Liu et al. 2018a). Unlike most other nanomaterial-based carriers, NANPs can be loaded with an accurate ratio of the two different types of drugs, offering opportunities to optimize the therapeutic efficacy of combination therapy.

Fig. 3 Various types of drugs can be loaded in NANPs via either covalent bonds or noncovalent interactions

In Vivo Stability of NANPs for Targeted Delivery

Nanomaterials intended for biomedical applications are required to exhibit a certain level of biocompatibility. Polyanionic NANPs exhibit lower cytotoxicity than positively charged nanoparticles, as their uptake results in milder disruption of the negatively charged cell membrane. Although the charge of polyanionic NANPs would be expected to impede their penetration through the negatively charged cell membrane, the cellular internalization of these NANPs is enhanced as a result of their interactions with various receptors on the cell surface, including scavenger receptors (Patel et al. 2010). NANPs administered in vivo are susceptible to nucleases and are therefore expected to eventually degrade and exhibit low long-term toxicity.

Although the biocompatibility of NANPs based on their nuclease susceptibility is certainly advantageous for their application as drug delivery carriers, it also has disadvantages. For example, degradation of NANPs before they arrive at the target site would abolish all the aforementioned desirable properties of NANPs as drug carriers. When the structural feature defining the tissue specificity of NANPs is lost on degradation, the drug loaded on NANPs may be released at off-target sites, leading to low efficacy and increased side effects. Thus, NANPs designed as drug carriers should exhibit moderate nuclease stability, so that most of the carriers reach the target tissue intact and are then eventually degraded and cleared after drug delivery.
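
The trade-off described above (carriers must survive long enough to reach the target tissue, yet eventually degrade) can be made concrete with a simple decay model. The sketch below assumes first-order degradation with a single half-life, which is a simplification not taken from this chapter. The half-lives and the 2 h transit time are hypothetical values chosen only for illustration, and the function name is likewise hypothetical.

```python
def fraction_intact(t_hours: float, half_life_hours: float) -> float:
    """Fraction of carriers still intact after t_hours, assuming first-order decay."""
    if t_hours < 0 or half_life_hours <= 0:
        raise ValueError("t_hours must be >= 0 and half_life_hours must be > 0")
    return 0.5 ** (t_hours / half_life_hours)


if __name__ == "__main__":
    transit_h = 2.0  # hypothetical time needed to reach the target tissue
    # Hypothetical serum half-lives (hours) for three illustrative carriers.
    carriers = (("fragile", 0.5), ("moderate", 6.0), ("very stable", 72.0))
    for label, half_life_h in carriers:
        intact = fraction_intact(transit_h, half_life_h)
        print(f"{label:>11}: {intact:.0%} intact after {transit_h:g} h")
```

With these hypothetical inputs, the fragile carrier retains only about 6% of intact particles on arrival, whereas the moderate carrier retains roughly 79% and still degrades over the following days. Real degradation in serum may not follow a single exponential, so the model is only a qualitative guide.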

NANPs for Passive Tumor-Targeting

NANP-based drug carriers have mostly been employed to target tumors because of the clinical impact of targeted anticancer therapy and the precedent of cancer nanomedicines based on other materials. Passive tumor targeting is based on the extravasation of nanoparticles through the leaky vascular structure around solid tumor tissue, known as the enhanced permeability and retention (EPR) effect (Fig. 4) (Matsumura and Maeda 1986). According to the traditional understanding, the passive targeting mechanism relies mainly on the physical size and shape of nanoparticles for extravasation, whereas recent studies have suggested that dynamic processes in tumor vessels may play an important role in the tumor accumulation of nanoparticles (Sindhwani et al. 2020).

Fig. 4 Intravenously injected NANPs extravasate through fenestrae of tumor vessels during circulation, leading to tumor accumulation of NANPs

Similar to traditional nanomedicines used for passive tumor targeting, the size and shape of DNA nanostructures can be designed to enhance their tumor accumulation after prolonged circulation. However, the in vivo environment is enriched with nucleases that may promptly degrade the injected DNA materials, thereby preventing their prolonged circulation and tumor accumulation. This intrinsic nuclease susceptibility may also compromise the programmability of DNA nanostructures, even though they are more nuclease-stable than linear single- and double-stranded structures. To circumvent this issue, DNA nanostructures can be made more nuclease-resistant by increasing their structural complexity. DNA origamis enriched with three- and four-way junctions exhibit higher nuclease stability than simple wireframe DNA polyhedrons (Chandrasekaran 2021). The recently reported influence of double-crossover topology on nuclease stability can also be harnessed for designing nanostructures with improved in vivo stability (Chandrasekaran et al. 2020), thereby providing an opportunity to preserve the passive tumor-targeting ability of DNA nanostructures.

Anticancer agents loaded in NANPs with passive tumor-targeting potential based on enhanced nuclease resistance can be delivered to tumors with higher targeting specificity. Mou et al. prepared polyhedral DNA nanostructures incorporating floxuridine, an anticancer agent, based on DNA tiles assembled using double crossovers (Mou et al. 2017).
The anticancer drug within the DNA nanostructures functioned as a prodrug, exhibiting anticancer potency after in vivo hydrolysis of the polyhedral DNA nanostructures. Floxuridine-incorporated DNA nanoflowers have also been prepared with an anticancer aptamer sequence (AS1411) using RCA (Tran et al. 2020). The enzymatically generated DNA polynucleotides were entangled to form nanoparticles that showed enhanced nuclease stability and passive-targeted tumor distribution in a xenograft mouse model upon i.v. injection, thereby enabling the co-delivery of floxuridine and AS1411 into the tumor.

A triangular DNA origami was found to exhibit a tumor-specific distribution upon i.v. injection (Zhang et al. 2014). However, rectangular and long tubular DNA origami structures were unable to reach the tumor with comparable efficiency, indicating that the tumor-targeting properties of origami structures depend on their shape. Doxorubicin was loaded into the triangular DNA origami via intercalation and delivered to the tumor in a targeted manner, leading to significant therapeutic efficacy in a xenograft tumor mouse model. In another study, a 6-helix-wide and 96-bp-long rectangular DNA origami (6Hx96BP-Rect), prepared using the modular DNA brick method (Ke et al. 2012), was loaded with siBcl2 and intravenously injected into a DMS53 tumor-bearing xenograft mouse model. Although the tumor-specific distribution of 6Hx96BP-Rect was not investigated, tumor growth was significantly inhibited on treatment with the siBcl2-loaded DNA origami, suggesting that a substantial amount of siRNA could be delivered to the tumor (Rahman et al. 2017).

Erlotinib, a potent EGFR inhibitor with low permeability and solubility, was covalently conjugated to DNA strands and self-assembled into a 6 × 6 × 64 nt DNA framework for the effective treatment of non-small cell lung cancer (Wang et al. 2020). Compared to the erlotinib-conjugated single strand, the DNA nanostructure led to increased tumor accumulation due to the EPR effect. A DNA nanotube based on an origami structure was designed to undergo a glutathione (GSH)-responsive conformational change to open the tube and release the encapsulated cargo. When the DNA nanotubes loaded with doxorubicin, siBcl2, and siP-gp were intravenously injected into an MCF-7R tumor-bearing xenograft mouse model, tumor-specific distribution of the nanotubes was observed, enabling co-delivery of the anticancer chemo/siRNA therapeutics to inhibit tumor growth (Wang et al. 2021).

Oligonucleotides grafted on the surface of SNAs are also considerably stable against nucleases because of the high local salt concentrations (Seferos et al. 2009), consequently facilitating the passive accumulation of SNAs in tumors via the EPR effect. Conjugates of 19-mer DNA oligonucleotides and dodecane units were demonstrated to form 12 nm micellar SNAs, which exhibited long circulation times and tumor accumulation after full-body distribution (Bousmail et al. 2017).

Coating NANPs with a protective membrane is another method to improve their in vivo stability for passive tumor accumulation. An octahedral DNA origami encapsulated in a PEGylated lipid bilayer membrane showed reduced interaction with immune cells and a longer circulation time in vivo compared to the naked structure (Perrault and Shih 2014). PEGylated oligolysines were also able to protect DNA origami structures from nuclease digestion, resulting in a modest increase in pharmacokinetic bioavailability (Ponnuswamy et al. 2017).
Cross-linking of the PEGylated oligolysines further increased the nuclease resistance of the DNA origami (Anastassacos et al. 2020). Although several methods have been suggested for the surface coating of NANPs, only a few applications of this strategy for tumor targeting have been reported. A wireframe DNA bipyramid coated with PEGoligolysine copolymer showed considerable tumor distribution when injected intravenously into MDA-MB-231-tumor bearing mice (Song et al. 2020). The DNA bipyramid was labeled with a magnetic resonance imaging (MRI) contrast agent, such as Gd-DOTA, and used for MRI imaging of triple-negative breast tumors in mice. The nuclease stability of NANPs can also be improved by employing sugar backbone-modified oligonucleotides for the assembly of nucleic acid nanostructures. Modification of the phosphodiester linkage between nucleotides with a phosphorothioate group has been conventionally used to stabilize nucleic acids. In addition to phosphorothioate, modified sugar backbones such as 20 -methoxyribo nucleic acid (2’-OMe-RNA), 20 -fluororibo nucleic acid (2’-F-RNA), 20 -fluoroarabino nucleic acid (FANA), hexitol nucleic acid (HNA), locked nucleic acid (LNA), and cytohexene nucleic acid (CeNA) have also been employed in the construction of nucleic acid nanostructures to improve nuclease stability (Taylor et al. 2016 and Kim et al. 2019). The serum stability of the modified sugar backbone-based nucleic acid nanostructures was considerably higher than that of natural backbones, enabling sufficient circulation time for passive tumor targeting (Kim et al. 2019). When


replacing DNA with modified nucleotides, the thermodynamic properties of the nanostructures, such as melting temperature, may also change, potentially requiring additional adjustment of the sequence of oligonucleotides participating in the assembly of the nanostructures. To avoid such additional efforts in using modified backbone-based NANPs, L-DNA, an enantiomeric DNA based on L-deoxyribose instead of natural D-deoxyribose, could be considered as the backbone for NANPs because L-DNA nanostructures have thermodynamic properties identical to those of D-DNA nanostructures. Indeed, a recent study showed that an L-DNA tetrahedron (L-Td) with 17-mer per side could be simply prepared with an identical sequence of oligonucleotides to the reported D-DNA tetrahedron (D-Td) and that L-Td is significantly more stable in serum than its natural counterpart (Kim et al. 2014). No degradation product was observed on incubation of L-Tds in 10% mouse serum for at least 3 days, whereas D-Tds were almost completely degraded after 12 h of incubation (Kim et al. 2014). Owing to such extremely high serum stability, L-Tds showed extended circulation time as well as increased plasma concentrations compared to D-Tds (Kim et al. 2016). In addition, L-Tds unexpectedly showed enhanced cellular uptake efficiency compared to D-Tds, possibly because more receptors are involved in the endocytosis of L-Tds (Kim et al. 2014). The tumor-homing property and enhanced cellular uptake efficiency of L-Tds were successfully utilized for tumor-targeted delivery of various bioactive agents, including drugs such as doxorubicin and enzymes such as caspase-3, the core enzyme that induces apoptotic cell death (Kim et al. 2016, 2018a).

NANPs for Active Tumor-Targeting

NANPs for active tumor-targeting can be designed based on the ligand exhibiting the preferred interaction with the tumor and tumor-associated environment (Fig. 5). Active and passive targeting can occur concurrently or consecutively. Active tumor-targeting DNA nanostructures usually employ cancer cell-specific ligands that have been employed in many previous studies on tumor-targeting nanoparticles (Muhamed et al. 2018).

Fig. 5 Receptors abundantly expressed in cancer cells such as folate receptor and nucleolin are targeted by conjugation of ligands (folate) and aptamers (AS1411) with NANPs, respectively, for tumor tissue-specific delivery


Tumor-specific distribution of a Td with 30-mer per side, which was originally distributed to the kidney upon i.v. injection into mice, was achieved via its conjugation with folates and harnessed for tumor-targeted siRNA delivery (Lee et al. 2012). Liu et al. designed an NANP-based nanoplatform using the co-assembly of branched antisense and siRNA for tumor therapy (Liu et al. 2021a). Seven-armed antisense oligonucleotides were conjugated to β-cyclodextrin, facilitating the attachment of adamantane-modified folate molecules and hemagglutinin peptides to the formulation via host-guest chemistry for tumor targeting and endosomal escape, respectively. The nanoformulation with folic acid showed high tumor accumulation and remarkable antitumor effects. RNA nanoparticles assembled with the 3WJ domain of phi29 pRNA (3WJ-pRNA) with folate conjugation were specifically distributed to the tumor when intravenously injected into a xenograft mouse model. The high tumor accumulation was potentially enabled by the cancer cell-specific ligand as well as by nuclease resistance gained as a result of the 2’-F-modification in U and C (Shu et al. 2011). Designed with the same ligand and modifications, pRNA-X, an X-shaped RNA nanoparticle, also showed a high tumor-specific distribution (Shu et al. 2013). Affibody, a small engineered protein for targeting, was conjugated to Td and demonstrated HER2-targeting ability in BT474-bearing mice (Zhang et al. 2020). Floxuridine was also loaded onto the Td via covalent bonding with a definite drug-loading ratio (19.6%), and the resultant affibody-floxuridine-Td showed improved antitumor efficacy. DNA nanostructures decorated with cancer cell-specific aptamers can also target tumor tissues. 
A tubular DNA origami structure decorated with AS1411 aptamers was able to target tumor-associated blood vessels and switch to the planar structure to release the encapsulated cargo upon recognition of nucleolin in tumor-associated endothelial cells by the aptamers (Li et al. 2018). Thrombin conjugated with a DNA linker was loaded as the cargo in the DNA tube via hybridization, and this DNA tube was administered to tumor-bearing mice by i.v. injection to induce intravascular thrombosis, resulting in inhibition of tumor growth. In another study, the MUC1 aptamer binding to mucin 1, abundantly expressed on epithelial cancer cells, was conjugated to a triangular DNA origami, fortifying the tumor-homing property of the origami structure. Co-delivery of chemo and RNAi therapeutics was successfully achieved with the MUC1 aptamer-conjugated triangular DNA origami as a tumor-targeted carrier for antitumor efficacy (Liu et al. 2018b). This strategy was also demonstrated using the Td system, where three MUC1 aptamers were conjugated per Td for active targeting (Han et al. 2019). Aptamer-conjugated Td exhibited higher tumor accumulation and consequently induced a stronger antitumor effect than the control (free doxorubicin). Self-assembled linear DNA nanostructures called nanotrains capped with sgc8, an aptamer targeting human T-cell acute lymphocytic leukemia (CEM) cells, have been used to deliver doxorubicin to CEM tumor xenografts in mice, showing a moderate anticancer effect. However, the tumor-specific distribution of the nanotrains was not investigated in this study (Zhu et al. 2013). SNAs formed by tiling rigidified DNA bricks on gold nanoparticles also showed tumor accumulation when conjugated with AS1411 aptamers, whereas unconjugated SNAs were not able to reach the tumor (Chang et al. 2021). A DNA nanocomplex


prepared by RCA of AS1411 aptamer and DNAzyme exhibited significantly enhanced tumor accumulation as a result of the active targeting effect of the aptamer (Yao et al. 2022). This nanocomplex also incorporated promoter-like Zn-Mn-ferrite (ZMF) to achieve combination therapy (gene/chemo-dynamic treatment). Using a similar strategy, Zhang et al. adopted the complement of the sgc8 aptamer sequence (targeting protein tyrosine kinase 7; PTK7) in the RCA template for their DNA nanoflower, which exerted an active targeting effect in HCT-116 tumor-bearing mice (Zhang et al. 2019). They also added a ferrocene moiety to induce self-degradation of the nanoflower via Fenton’s reaction in cancer cells to facilitate the efficient release of anticancer drugs from the formulation. RNA 4WJ nanoparticles conjugated with EGFR aptamers were used to enhance the solubility of paclitaxel, the hydrophobic anticancer drug cargo, and tumor-specific delivery of the cargo in a KB tumor-bearing xenograft mouse model (Guo et al. 2020). The intravenously injected drug-loaded RNA nanoparticles accumulated preferentially in the tumor, leading to an antitumor effect.

Discovery of Tumor-Specific NANPs Using Library Approach

NANPs fabricated for tumor accumulation may be distorted in vivo and fail to reach the target tissue after injection. Despite good tumor accumulation, many NANPs also exhibit a considerable liver distribution, resulting in limited tumor specificity with a tumor-to-liver ratio often lower than 1. To address this issue, a library screening-based approach is proposed (Fig. 6). A library of 16 wireframe nucleic acid nanostructures containing an 18-mer duplex per side and formed by a combination of four sugar backbone types, namely, D-DNA (D), L-DNA (L), 2’-OMe-RNA (M), and 2’-F-RNA (F), with four shapes, namely, pyramid (Py), triangular prism (Tp), cube (Cb), and rugby ball (Rb), has been prepared and screened for tumor accumulation. Nanostructures based on modified, serum-stable backbones generally showed a high tumor accumulation level. L-Py showed the highest tumor specificity among the 16 nanostructures, with a relative tumor-to-liver distribution ratio of approximately 3. The high tumor-to-liver ratio was associated with enhanced uptake by cancer cells and low uptake by macrophages. As Py

Fig. 6 A library of NANPs can be constructed by combining the backbone diversity and the shape diversity of NANPs and screened in vivo for tumor targeting ability


structurally resembles Td, doxorubicin and caspase-3 tumor-specifically delivered by L-Py provided anticancer potency nearly identical to that obtained with L-Td. These results suggest that the tetrahedral shape and L-DNA backbone provide important structural and chemical characteristics, respectively, for tumor-targeted nucleic acid nanostructures. Jasinski et al. prepared a similar library of RNA polygons composed of six constructs of various sizes (5–25 nm) and shapes (triangles, squares, and pentagons) to investigate the effect of the structural features of RNA nanoparticles on their biodistribution. 2’-F-modified C and U were used for nuclease stability. Highly tumor-specific accumulation was observed with 5 and 10 nm RNA squares at 12 and 24 h post i.v. injection, respectively (Jasinski et al. 2018) (Table 1).
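
The combinatorial design of the 16-member library and the tumor-to-liver ratio used to rank its members can be sketched in a few lines of Python. This is only an illustrative sketch: the backbone and shape codes are those given in the text, whereas the function names and the signal values in `example` are hypothetical and are not measured data.

```python
from itertools import product

# Backbone and shape codes as given in the text
BACKBONES = {"D": "D-DNA", "L": "L-DNA", "M": "2'-OMe-RNA", "F": "2'-F-RNA"}
SHAPES = {"Py": "pyramid", "Tp": "triangular prism", "Cb": "cube", "Rb": "rugby ball"}


def build_library():
    """Return the names of all backbone-shape combinations, e.g. 'L-Py'."""
    return [f"{b}-{s}" for b, s in product(BACKBONES, SHAPES)]


def tumor_to_liver_ratio(tumor_signal, liver_signal):
    """Ratio of tumor to liver signal; values above 1 indicate tumor preference."""
    if liver_signal <= 0:
        raise ValueError("liver_signal must be positive")
    return tumor_signal / liver_signal


def rank_by_specificity(signals):
    """Sort constructs by tumor-to-liver ratio, highest first.

    `signals` maps a construct name to a (tumor, liver) pair of signal values.
    """
    ratios = {name: tumor_to_liver_ratio(t, l) for name, (t, l) in signals.items()}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


library = build_library()
print(len(library))  # 16

# Hypothetical signal values, for illustration only (not measured data)
example = {"L-Py": (3.0, 1.0), "D-Py": (0.5, 1.0), "M-Cb": (1.2, 1.0)}
print(rank_by_specificity(example)[0][0])  # L-Py
```

In an actual screen, the tumor and liver signals would come from ex vivo organ imaging or a comparable quantification, and a ratio below 1 would flag constructs dominated by liver uptake.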

NANPs for Lung-Targeted Delivery

Lung delivery is conventionally achieved via local routes such as intranasal and intratracheal administration. However, intratracheal injection is an invasive approach that is not highly acceptable for practical applications. Pulmonary delivery based on intranasal administration and/or inhalation of therapeutic solid powder and aerosol is a widely used method for lung delivery; however, pulmonary surfactants such as mucus and alveolar fluid severely reduce cellular penetration and transfection of the therapeutic particles. Pulmonary delivery via local routes is usually more effective for diseases occurring in the airway, such as asthma and chronic obstructive pulmonary disease (COPD). Other lung diseases, including idiopathic pulmonary fibrosis (IPF), occur in the lower region of the lung, which is hardly accessible by formulations used for local delivery. These drawbacks of local delivery methods can be addressed by employing a systemic delivery route, such as intravenous injection, because the lower region of the lung is directly accessible through capillary blood vessels surrounding the alveoli, circumventing the barriers from the trachea to the bronchiole (Fig. 7). For systemically injected materials to reach the epithelial layer of lung tissue, transcytosis across the endothelial layer is required. The lung-targeting ability of nanoparticles is dependent on their size. Microparticles are known to accumulate in the lung owing to the capillary width of lung tissue, which is several micrometers (Kutscher et al. 2010). The lower blood flow rate in narrow capillaries allows the particles to be retained longer in the capillaries for transcellular uptake through endothelial cells to reach the lung tissue. Moreover, the pulmonary vasculature area contains 20–30% of the total endothelial surface in the body, is the first pass surface for some types of i.v. 
administration, and receives 50% of blood output, becoming a privileged locus for circulating drug carriers with a certain level of affinity for endothelial cells (Muzykantov and Muro 2011). Thrombosis is a risk factor for this lung-targeting principle based on the size of the carriers. Hard inorganic materials can cause thrombosis as a major adverse effect in vivo (Urban et al. 2019); therefore, soft materials are preferred for lung carriers. Although nucleic acids are typically soft materials, micrometer-sized constructs are challenging to

Table 1 Tumor-targeted NANPs
[Table body flattened in extraction; column values are listed without row alignment]
Columns: NANP; Backbone; Shape; Size (nm); Targeting strategy; Tumor; Cargo; Reference
NANP: Wireframe; Origami; Nanoflower; SNA
Backbone: DNA; L-DNA; 2’-F-RNA; 2’-OMe-RNA; RNA partially with 2’-F-U, 2’-F-C; RNA
Shape: Tetrahedron; Dodecahedron; Bucky ball; Pyramid; Triangular prism; Cube; Rugby ball; Square; X-shape; Triangle; Tubule; Box; Sphere
Size (nm): values from 8.5 to 240
Targeting strategy: Passive; Active
Tumor: HeLa; KB; BT474; SCC7; MDA-MB-231; MCF-7R; A549; MCF-7; SKOV3; PC3; HCT116
Cargo: Floxuridine; siRNA; Doxorubicin; Protein; Erlotinib; DNAzyme; none (–)
Reference: Mou et al. 2017; Lee et al. 2012; Zhang et al. 2020; Kim et al. 2016, 2018; Kim et al. 2019; Jasinski et al. 2018; Shu et al. 2013; Zhang et al. 2014; Li et al. 2018; Wang et al. 2021; Wang et al. 2020; Yao et al. 2022; Zhang et al. 2019; Jang et al. 2015; Lee et al. 2017a; Bousmail et al. 2017; Chang et al. 2021; Liu et al. 2021a


Fig. 7 Schematic presentation of penetration of systemically injected large NANPs through lung capillary endothelial cells around alveoli to reach lung tissue

prepare using self-assembled wireframe nucleic acid nanostructures and origami, making them unsuitable for lung-targeted platforms. In contrast, RCA provides several hundred nanometers to millimeter-sized particles depending on the reaction conditions. We recently prepared approximately 500-nm-sized DNA nanoparticles composed of polymeric antisense oligonucleotides (pASO) targeting TGF-β1 using RCA (Kim et al. 2020a). However, the relatively large size of these nanoparticles impedes cellular uptake efficiency and penetration into pulmonary cells. To enhance cellular uptake efficiency, we complexed the pASO particles with the biocompatible cationic peptide dimeric human beta-defensin 23. The size of the pASO/peptide complexes was similar to that of pASO alone. NANPs may form microparticles via serum protein adsorption after systemic injection. The size of the DNA nanoparticles composed of the pASO/peptide complexes increased to approximately 1 μm with serum protein adsorption, which led to a high lung distribution upon intravenous injection. I.v. administration of the pASO/peptide with the TGF-β1 antisense sequence into a bleomycin-induced lung fibrosis mouse model alleviated fibrosis, demonstrating the potency of antisense oligonucleotides delivered with the lung-targeting platform.

NANPs for Kidney-Targeted Delivery

The kidney is a filtering organ that acts as a gatekeeper to regulate materials in the body for renal clearance. The glomerular filtration barrier, composed of the endothelial layer, glomerular basement membrane, and podocytes, has a solute filtration cutoff size of approximately 5–7 nm (Huang et al. 2021). Nanoparticles smaller than 100 nm can extravasate through the endothelial layer in the glomeruli of the kidney. In principle, after extravasation, only much smaller particles can be filtered through the glomerular basement membrane (GBM) because the pore size of the GBM is 2–8 nm (Fig. 8). Nanoparticles smaller than the cutoff size can be reabsorbed into the proximal tubules after GBM penetration, resulting in the accumulation of nanoparticles in the kidney. Nanoparticles with a size smaller than 2 nm exhibit a low glomerular filtration rate due to enhanced interaction with the glomerular endothelial


Fig. 8 Intravenously injected small NANPs are filtered through the GBM and Bowman’s space to enter tubules where the NANPs are taken up into tubular cells

glycocalyx and GBM (Du et al. 2017). Nanoparticles larger than the cutoff size of the glomerular filtration barrier (ca. 7 nm) generally have a reduced tendency to be distributed in the kidney. Even when smaller than the cutoff size, the nanoparticles may increase in size through serum protein adsorption after in vivo administration, which prevents GBM penetration. Nevertheless, nanoparticles larger than the cut-off size may still accumulate in nephron tissue, particularly in kidney diseases such as acute kidney injury (AKI), as vascular permeability is increased in AKI associated with tumors (Kamaly et al. 2016). While the size of NANPs can be precisely controlled, their polyanionic nature hinders transglomerular transport owing to the negative charge of the GBM matrix and endothelial glycocalyx (Comper and Glasgow 1995). On the other hand, they have a lower chance of being opsonized for hepatic clearance than positively charged nanoparticles. In such situations with contradicting size and charge effects, the library approach can be useful for the optimized construction of NANPs. In a recent study, small tetrahedral nucleic acid nanostructures (6 nm) with four different sugar backbones (sTds) were prepared and screened for kidney distribution in mice. L-DNA sTd (L-sTd) showed high kidney accumulation, which was different from that of other backbone-based sTds. Kidney accumulation of intact structures was assessed by quantifying an oligonucleotide from L-sTd using mass spectrometry. L-sTd mainly differed from other sTds in the level of serum protein adsorption, which possibly made the structures larger than the GBM pores, suggesting that opsonization of nucleic acid nanostructures significantly affects their biodistribution patterns. The kidney-targeting property was further harnessed for the delivery of p53 siRNA to the kidney to treat AKI in a mouse model. 
Even if the nanoparticles are larger than the GBM pore size, kidney distribution is possible depending on the deformability of the materials used. Recently, Jiang et al. found that three differently shaped DNA origami nanostructures (DONs), rectangles, rods, and triangles with sizes over 100 nm, were preferentially distributed in the kidney (Jiang et al. 2018a).


Among these, the rectangular DON showed the highest renal uptake. However, the mechanism underlying the accumulation of DONs in the kidney was not investigated in this study. As DONs are deformable, they might pass through the glomeruli for tubular cell internalization or be distributed to the renal parenchyma via the basolateral uptake pathway.
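
The size thresholds discussed in this section can be collected into a coarse, size-only classification, sketched below in Python. The numerical thresholds (2 nm, ca. 7 nm, and 100 nm) are the ones quoted in the text; the function names, the category labels, and the uniform-shell model of protein adsorption are assumptions made for illustration. The sketch deliberately ignores charge, deformability (as with the DONs), and disease-related changes in vascular permeability such as those in AKI.

```python
def effective_diameter_nm(core_nm, corona_thickness_nm=0.0):
    """Size after serum protein adsorption, assuming a uniform shell (hypothetical model)."""
    return core_nm + 2.0 * corona_thickness_nm


def renal_fate(diameter_nm, low_filtration_below_nm=2.0,
               filtration_cutoff_nm=7.0, extravasation_limit_nm=100.0):
    """Coarse size-only classification using the thresholds quoted in the text."""
    if diameter_nm <= 0:
        raise ValueError("diameter_nm must be positive")
    if diameter_nm < low_filtration_below_nm:
        return "low filtration (glycocalyx/GBM interaction)"
    if diameter_nm <= filtration_cutoff_nm:
        return "filtered; possible tubular reabsorption"
    if diameter_nm < extravasation_limit_nm:
        return "extravasates but retained by GBM"
    return "excluded at the glomerular endothelium"


# A 6 nm structure falls below the filtration cutoff
print(renal_fate(6.0))  # filtered; possible tubular reabsorption

# The same structure with a hypothetical 2 nm protein shell (10 nm overall)
print(renal_fate(effective_diameter_nm(6.0, corona_thickness_nm=2.0)))
# extravasates but retained by GBM
```

The second call illustrates why the level of serum protein adsorption can change the predicted fate of an otherwise filterable nanostructure.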

NANPs for Liver-Targeted Delivery

The liver is a vital organ responsible for a wide range of biological functions, including the metabolism of various biochemicals and protein secretion. Nanoparticle-based drug delivery to the liver is more achievable than to other tissues, as the liver accumulates most nanoparticles circulating in the bloodstream. However, mononuclear phagocytes, such as Kupffer cells in the liver, capture nanoparticles in the bloodstream and play a critical role in the hepatic clearance of nanoparticles. Parenchymal cells (such as hepatocytes) comprise 70–80% of the total liver cell population and are the key target cells for hepatic therapy (Witzigmann et al. 2020); for nanoparticles to evade entrapment by macrophages and reach these cells, their suborgan distribution should be considered. Even after evading Kupffer cells, nanoparticles need to pass through the sinusoidal fenestrae, which are 100–150 nm wide, to reach hepatocytes (Wisse et al. 2008). Therefore, for access to hepatocytes, nanoparticles should be smaller than 150 nm or, if larger, deformable enough to squeeze through the fenestrae. Most NANPs can be easily fabricated in a size suitable for hepatocyte delivery in the liver. Recently, a wireframe Td of approximately 9 nm was found to be distributed to liver hepatocytes (Kim et al. 2020b). The siRNA targeting apoB1 mRNA (siApoB1) was loaded into Td and successfully delivered to the hepatocytes. The siApoB1 delivered by Td downregulated the expression of apoB1 involved in the transport and biosynthesis of cholesterol, which subsequently lowered the cholesterol levels in plasma, thereby demonstrating its potential as a treatment for hypercholesterolemia. Traditionally, lipid nanoparticles have been widely employed as liver-targeted drug delivery platforms because lipids interact with liver-homing endogenous lipoprotein particles, such as LDL and HDL particles. 
Receptor-mediated hepatocyte uptake of lipoprotein particles is the underlying mechanism of the liver distribution of lipid particles (Akin et al. 2010). Thus, NANPs with enhanced affinity for lipoprotein particles in serum may have the potential for liver distribution, particularly to hepatocytes in the liver, via in situ hijacking of the liver transport mechanisms for lipoprotein particles (Fig. 9). Recently, Td conjugated with three cholesteryl groups (Chol3-Td) was found to have high liver distribution properties (Kim et al. 2022). Compared to Td, Chol3-Td showed a substantially higher affinity for lipoprotein particles as well as other serum proteins, forming a protein corona in situ upon intraperitoneal injection, and resulting in a high distribution in the liver and hepatocytes. TGF-β1 siRNA was loaded into Chol3-Td and delivered into the liver of a CCl4-induced liver fibrosis mouse model to alleviate liver fibrosis. The efficacy of siRNA delivered with Chol3-Td was similar to that delivered with trivalent N-acetylgalactosamine (GalNAc3), a well-known liver-targeting ligand.
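
The size criterion for hepatocyte access stated earlier in this section (smaller than the 150 nm upper width of the sinusoidal fenestrae, or deformable if larger) can be written as a one-line check. The following Python sketch uses a hypothetical function name and considers only size and deformability; capture by Kupffer cells and lipoprotein-mediated uptake are not modeled.

```python
def can_reach_hepatocytes(diameter_nm, deformable=False, fenestra_max_nm=150.0):
    """Size-only check against the upper fenestra width quoted in the text."""
    if diameter_nm <= 0:
        raise ValueError("diameter_nm must be positive")
    return diameter_nm < fenestra_max_nm or deformable


print(can_reach_hepatocytes(9.0))                      # True, e.g. a ~9 nm Td
print(can_reach_hepatocytes(200.0))                    # False, rigid and too large
print(can_reach_hepatocytes(200.0, deformable=True))   # True, deformable carrier
```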


Fig. 9 Liver- and hepatocyte-targeted delivery can be achieved using the lipoprotein-associated protein corona formed in situ on systemically injected NANPs

Fig. 10 Schematic presentation of receptor-mediated transcytosis of intravenously injected NANPs to penetrate the blood-brain barrier

NANPs for Brain-Targeted Delivery

The central nervous system is protected from pathogens and toxic molecules by a series of barriers. The blood-brain barrier (BBB) is the most rigorous gatekeeper, mainly composed of brain endothelial cells that are cohesively bound by tight junctions composed of transmembrane and cytoplasmic proteins. As passive diffusion through the BBB is very limited, brain delivery of nanoparticles across the healthy BBB is usually mediated by receptors responsible for supplying essential molecules for brain vitality, such as glucose, insulin, and transferrin (Fig. 10). In addition, caveolae-mediated transcytosis can be used for the systemic brain delivery of therapeutics. The BBB in some brain pathologies, such as neurodegenerative disorders, stroke, and glioblastoma multiforme (GB), is disrupted and thus more permeable, allowing brain delivery of therapeutics through endothelial fenestrae.


siRNA micelles, a type of SNA, fabricated with siRNA-disulfide-poly(N-isopropylacrylamide) (siRNA-SS-PNIPAM) diblock copolymers, could be delivered to the brain. In a recent study, the brain distribution level of the siRNA micelles in GB-bearing nude mice was observed to be approximately 15% ID/g at 10 h post-intravenous injection, higher than that of free siRNA (4% ID/g). A first-line clinical drug for GB and signal transducer and activator of transcription 3 (STAT3) siRNA were loaded in the core and on the surface of the micelles, respectively, and were used to achieve combinational therapy for drug-resistant GB in mice (Jiang et al. 2021). Similarly, SNAs comprising gold nanoparticles covalently functionalized with siRNAs targeting the mRNA of Bcl2L12, an oncogene in GB, were able to penetrate the BBB and could be systemically delivered to the brain in GB-bearing mice (Jensen et al. 2013). Fluorescence imaging-based analysis showed that the brain distribution level of these SNAs in GB mice was approximately 1.8-fold higher than that in normal mice. In addition, SNA accumulation in the brain tissue adjacent to the tumor in GB-bearing mice was significantly higher than the corresponding level in normal mouse brain, suggesting that the enhanced permeability of the BBB in GB is a major driver of SNA accumulation in GB-bearing mice. A Td with 20-mer D-DNA duplex per side was delivered to the brain by conjugation with angiopep-2 (ANG), a 19-mer peptide and a ligand of the lipoprotein receptor-related protein-1 (LRP-1), facilitating BBB transcytosis (Tian et al. 2018). The brain distribution of ANG-Td was approximately 1.7 and 5 times higher than that of Td in healthy and glioma-bearing mice, respectively, upon i.v. injection. This indicates that the brain distribution of Td may not be based on the intrinsic property of the tetrahedral structure, but solely on peptide conjugation. 
In contrast, a DNA triangular prism (Tp) with 10-mer D-DNA duplex on each side showed a moderate level of brain distribution in normal and GB-bearing mice, even without conjugation with BBB-targeting ligands (Tam et al. 2020). The mechanism underlying the brain distribution of Tp remains to be investigated. Overall, while conjugation of ligands for receptors overexpressed on brain capillary endothelial cells is the main strategy for brain targeting, the enhanced BBB permeability in brain pathologies can also be exploited for NANP distribution to the brain.
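
The brain distribution values quoted in this section for the siRNA micelles can be compared as a simple fold enrichment. The short Python sketch below uses the two values given in the text (approximately 15% ID/g for the micelles and 4% ID/g for free siRNA); the function name is hypothetical, and the result is only as precise as those approximate inputs.

```python
def fold_enrichment(sample_id_per_g, reference_id_per_g):
    """Ratio of two biodistribution values expressed in % ID/g."""
    if reference_id_per_g <= 0:
        raise ValueError("reference_id_per_g must be positive")
    return sample_id_per_g / reference_id_per_g


micelle = 15.0     # % ID/g, siRNA micelles at 10 h post-injection (from the text)
free_sirna = 4.0   # % ID/g, free siRNA (from the text)
print(fold_enrichment(micelle, free_sirna))  # 3.75
```

By this measure, the micelle formulation reaches the brain at roughly 3.75 times the level of free siRNA, which is larger than the approximately 1.8-fold and 1.7-fold differences reported for the other comparisons in this section.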

NANPs for Spleen-Targeted Delivery

The spleen is a reticulo-lymphoid organ involved in various immunological events. The main functions of the spleen include removing nanoparticles, micropathogens, and abnormal red blood cells from systemic circulation and producing components of the immune system. The spleen-specific delivery of therapeutics can provide clinical intervention for splenic disorders, including splenomegaly, hypersplenism, and splenic rupture. Together with the liver, the spleen is a major organ of the reticuloendothelial system, the destination of intravenously injected nanoparticles after phagocytosis by circulating macrophages. Therefore, the conventional strategy for designing a spleen-targeted nanocarrier is to avoid its uptake by Kupffer cells in the liver. Nanocarriers with physicochemical properties for bypassing hepatic


Table 2 NANPs for targeting lung, kidney, liver, brain, and spleen
Target tissue: Lung; Kidney; Liver; Brain; Spleen
Backbone: DNA/peptide; L-DNA; DNA; DNA; DNA; DNA; 2’-F-RNA; DNA/cholesterol; RNA/polymer; RNA/GNP; DNA; DNA; DNA/LNP
Shape: Sphere; Tetrahedron; Rectangle; Tubule; Triangle; Tetrahedron; Tetrahedron; Tetrahedron; Sphere; Sphere; Tetrahedron; Triangular prism; Sphere
Size (nm): 500; 6; 90; 120; 400; 9; 6; 11; 40; 30; 9; 50
Mol Biotechnol 14:205–214 Mead DA, McClary JA, Luckey JA, Kostichka AJ, Witney FR, Smith LM (1991) Bst DNA polymerase permits rapid sequence analysis from nanogram amounts of template. Biotechniques 11:76–78, 80, 82–87 Milligan JF, Groebe DR, Witherell GW, Uhlenbeck OC (1987) Oligoribonucleotide synthesis using T7 RNA polymerase and synthetic DNA templates. Nucleic Acids Res 15:8783–8798 Mok E, Wee E, Wang Y, Trau M (2016) Comprehensive evaluation of molecular enhancers of the isothermal exponential amplification reaction. Sci Rep 6:37837 Muhuri S, Mimura K, Miyoshi D, Sugimoto N (2009) Stabilization of three-way junctions of DNA under molecular crowding conditions. J Am Chem Soc 131:9268–9280 Murakami T, Sumaoka J, Komiyama M (2009) Sensitive isothermal detection of nucleic-acid sequence by primer generation-rolling circle amplification. Nucleic Acids Res 37:e19 Murakami T, Sumaoka J, Komiyama M (2012) Sensitive RNA detection by combining three-way junction formation and primer generation-rolling circle amplification. Nucleic Acids Res 40:e22 Nargessi RD, Khabbaz NF, Xu XM, Zamroud M, Kolberg J, Collins ML (1998a) Quantitation of estrogen receptor mRNA in breast carcinoma by branched DNA assay. Breast Cancer Res Treat 50:47–55 Nargessi RD, Shimizu RM, Xu XM, Connolly J, Zamroud M, Collins ML, Kolberg J (1998b) Quantitation of progesterone receptor mRNA in breast carcinoma by branched DNA assay. Breast Cancer Res Treat 50:57–62 Paige JS, Wu KY, Jaffrey SR (2011) RNA mimics of green fluorescent protein. Science 333: 642–646 Pawlotsky JM, Martinot-Peignoux M, Poveda JD, Bastie A, Le Breton V, Darthuy F, Rémiré J, Erlinger S, Dhumeaux D, Marcellin P (1999) Quantification of hepatitis C virus RNA in serum by branched DNA-based signal amplification assays. J Virol Methods 79:227–235

51

Detection Systems Using the Ternary Complex Formation of Nucleic Acids

1621

Pellegrin I, Garrigue I, Ekouevi D, Couzi L, Merville P, Merel P, Chene G, Schrive MH, Trimoulet P, Lafon ME, Fleury H (2000) New molecular assays to predict occurrence of cytomegalovirus disease in renal transplant recipients. J Infect Dis 182:36–42 Player AN, Shen LP, Kenny D, Antao VP, Kolberg JA (2001) Single-copy gene detection using branched DNA (bDNA) in situ hybridization. J Histochem Cytochem 49:603–612 Qian C, Wang R, Wu H, Ji F, Wu J (2019) Nicking enzyme-assisted amplification (NEAA) technology and its applications: a review. Anal Chim Acta 1050:1–15 Richter MM (2004) Electrochemiluminescence (ECL). Chem Rev 104:3003–3036 Schenborn ET, Mierendorf Jr RC (1985) A novel transcription property of SP6 and T7 RNA polymerases: dependence on template structure. Nucleic Acids Res 13:6223–6236 Shen LP, Sheridan P, Cao WW, Dailey PJ, Salazar-Gonzalez JF, Breen EC, Fahey JL, Urdea MS, Kolberg JA (1998) Quantification of cytokine mRNA in peripheral blood mononuclear cells using branched DNA (bDNA) technology. J Immunol Methods 215:123–134 Shyamala V, Khoja H, Anderson ML, Wang JX, Cen H, Kavanaugh WM (1999) High-throughput screening for ligand-induced c-fos mRNA expression by branched DNA assay in Chinese hamster ovary cells. Anal Biochem 266:140–147 Sodora DL, Lee F, Dailey PJ, Marx PA (1998) A genetic and viral load analysis of the simian immunodeficiency virus during the acute phase in macaques inoculated by the vaginal route. AIDS Res Hum Retrovir 14:171–181 Takahashi S, Kotar A, Tateishi-Karimata H, Bhowmik S, Wang ZF, Chang TC, Sato S, Takenaka S, Plavec J, Sugimoto N (2021) Chemical modulation of DNA replication along G-Quadruplex based on topology-dependent ligand binding. J Am Chem Soc 143:16458–16469 Tang S, Tong P, Li H, Gu F, Zhang L (2013) The three-way junction DNAzyme based probe for label-free colorimetric detection of DNA. 
Biosens Bioelectron 41:397–402 Urdea MS, Wilber JC, Yeghiazarian T, Todd JA, Kern DG, Fong SJ, Besemer D, Hoo B, Sheridan PJ, Kokka R (1993) Direct and quantitative detection of HIV-1 RNA in human plasma with a branched DNA signal amplification assay. AIDS Suppl 2:S11–S14 Wang J, Shen L, Najafi H, Kolberg J, Matschinsky FM, Urdea M, German M (1997) Regulation of insulin preRNA splicing by glucose. Proc Natl Acad Sci U S A 94:4360–4365 Wang R, Wang L, Zhao H, Jiang W (2016) A split recognition mode combined with cascade signal amplification strategy for highly specific, sensitive detection of microRNA. Biosens Bioelectron 86:834–839 Wang X, Wang H, Liu C, Wang H, Li Z (2017a) A three-way junction structure-based isothermal exponential amplification strategy for sensitive detection of 30 -terminal 2'-O-methylated plant microRNA. Chem Commun (Camb) 53:1124–1127 Wang X, Huang SC, Huang TX, Su HS, Zhong JH, Zeng ZC, Li MH, Ren B (2017b) Tip-enhanced Raman spectroscopy for surfaces and interfaces. Chem Soc Rev 46:4020–4041 Wharam SD, Marsh P, Lloyd JS, Ray TD, Mock GA, Assenberg R, McPhee JE, Brown P, Weston A, Cardy DL (2001) Specific detection of DNA and RNA targets using a novel isothermal nucleic acid amplification assay based on the formation of a three-way junction structure. Nucleic Acids Res 29:e54–e54 Wharam SD, Hall MJ, Wilson WH (2007) Detection of virus mRNA within infected host cells using an isothermal nucleic acid amplification assay: marine cyanophage gene expression within Synechococcus sp. Virol J 4:52 Wu J, Lv J, Zheng X, Wu ZS (2021) Hybridization chain reaction and its applications in biosensing. Talanta 234:122637 Xu SY (2015) Sequence-specific DNA nicking endonucleases. Biomol Concepts 6:253–267 Xu Y, Bian X, Sang Y, Li Y, Li D, Cheng W, Yin Y, Ju H, Ding S (2016a) Bis-three-way junction nanostructure and DNA machineries for ultrasensitive and specific detection of BCR/ABL fusion gene by chemiluminescence imaging. 
Sci Rep 6:32370 Xu Y, Wang Y, Liu S, Yu J, Wang H, Guo Y, Huang J (2016b) Ultrasensitive and rapid detection of miRNA with three-way junction structure-based trigger-assisted exponential enzymatic amplification. Biosens Bioelectron 81:236–241

1622

H. Fujita and M. Kuwahara

Ye LP, Hu J, Liang L, Zhang CY (2014) Surface-enhanced Raman spectroscopy for simultaneous sensitive detection of multiple microRNAs in lung cancer cells. Chem Commun (Camb) 50: 11883–11886 Zhao Y, Qi L, Chen F, Zhao Y, Fan C (2013) Highly sensitive detection of telomerase activity in tumor cells by cascade isothermal signal amplification based on three-way junction and basestacking hybridization. Biosens Bioelectron 41:764–770 Zhao Y, Chen F, Li Q, Wang L, Fan C (2015) Isothermal amplification of nucleic acids. Chem Rev 115:12491–12545 Zhou L, Cryan EV, Minor LK, Gunnet JW, Demarest KT (2000) A branched DNA signal amplification assay to quantitate messenger RNA of human uncoupling proteins 1, 2, and 3. Anal Biochem 282:46–53

Nucleic Acid Conjugates for Biosensing: Design, Preparation, and Application

52

Toshihiro Ihara, Yusuke Kitamura, and Yousuke Katsuda

Contents
Introduction
Reactions Involved in DNA/RNA Conjugate Formation
  Functional Group-Specific Coupling Reactions
  Bioorthogonal Reactions
Unique Binding and Functional Switching
  Kinetic and Thermodynamic Stabilization by Conjugation with Click Chemistry
  Switching
Nucleic Acid Conjugates for Sensing and as a Research Tool
  Complementary DNA Probes Modified with Reporter Molecules
  Nucleic Acid Aptamer Conjugates
  Lipid/Cholesterol-Modified DNAs
  Antibody/Enzyme-Modified DNAs
  Caged Nucleic Acids
Conclusions and Perspectives
References

Abstract

A nucleic acid conjugate is a hybrid molecule consisting of a nucleic acid and a linked molecule that confers a specific function. The first nucleic acid conjugates were DNA probes, which are short sequences complementary to target DNA or RNA that can be chemically modified with fluorescent dyes or antigens for signal amplification. Since the development of early nucleic acid conjugates, conjugate diversity has expanded along with the development of new synthesis methods, the discovery of new target biological phenomena, and the availability of novel functional nucleic acids including ribozymes, DNAzymes, and aptamers. At present, a wide range of research has been conducted using nucleic acid conjugates, including studies in molecular engineering, analytical science, pharmacology, and therapeutics. Of this range of scientific fields, this review focuses primarily on the analytical applications and the use of nucleic acid conjugates as research tools for the targeting of biomolecules and relevant biophenomena. Nevertheless, the functions of nucleic acid conjugates are not as simple as those of conventional DNA probes, but are instead quite diverse, and can reflect the unique ideas of researchers with backgrounds in synthetic chemistry, chemical biology, pharmacy, and/or medicine. Here, we show how nucleic acid conjugates are being designed for specific target molecules and biophenomena and how they provide the solutions for various specific challenges in biosciences.

T. Ihara (*) · Y. Kitamura · Y. Katsuda
Division of Materials Science and Chemistry, Faculty of Advanced Science and Technology, Kumamoto University, Kumamoto, Japan
© Springer Nature Singapore Pte Ltd. 2023
N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_58

Introduction

Bioconjugates are artificial molecules consisting of two or more different molecules, at least one of which is a biomolecule (i.e., a nucleic acid, oligonucleotide, protein, peptide, polysaccharide, sugar, and/or lipid). The component molecules are covalently linked to each other by chemical or biological means to form a bioconjugate that combines the properties of its individual components. Various biophenomena have been elucidated using carefully engineered bioconjugates that possess unique characteristics; examples include specific chemical probes, agonists/antagonists, stimuli-responsive reaction triggers or modulators, drug carriers, and drugs. Thus, without the development of bioconjugate chemistry, much of the research in contemporary life science would not be possible. Bioconjugates are, quite literally, the (synergistic) conjunction of chemistry and biology and represent a powerful tool of chemical biology.

After the frenzy surrounding the completion of the Human Genome Project, we are again in the midst of a new upsurge in research on nucleic acids, driven by the emergence of new fields such as epigenetics, RNA interference, noncoding RNA, mRNA vaccines, iPS cells, and nucleic acid medicine. Chemical biologists and even chemists can be involved in each of these research fields because their foundations are beginning to be understood at the molecular level. Therefore, a fundamental knowledge of these research areas enables the molecular engineering and design of nucleic acid conjugates optimized for individual research targets.

When selecting a nucleic acid and an auxiliary functional molecule to link covalently, there is a very wide range of choices not only for the functional molecule but also for the nucleic acid. Instead of a nucleic acid with a natural structure, DNA analogs (or XNAs, e.g., PNA, LNA, SNA, and L-aTNA) can also be chosen as complementary sequences. Alternatively, besides nucleic acid sequences complementary to the target, functional nucleic acids such as ribozymes, DNAzymes, and aptamers can be used as the nucleic acid part of the conjugate. This greatly expands our options when designing nucleic acid conjugates. Coupled with the recent development of advanced experimental techniques such as fluorescent proteins, super-resolution fluorescence microscopy, directed evolution, and CRISPR/Cas9 gene editing, DNA conjugates are expected to play an increasingly important role, both within nucleic acid science and throughout the life sciences.

Nucleic acid conjugates have evolved by combining members of the wide variety of usable molecules and ideas. They can be classified in many different ways, and it has therefore been difficult to develop a single classification scheme along a single axis. Thus, in this chapter, nucleic acid conjugates are organized according to their structure, function, and application, with priority given to readability. We first review the preparation of DNA conjugates and then discuss recent reports of their application, mainly in bioanalytical science.

Reactions Involved in DNA/RNA Conjugate Formation

Many common bioconjugation reagents developed for protein conjugation cannot be used directly to prepare nucleic acid conjugates, because nucleic acids do not contain the key reactive groups that these reagents target, such as aliphatic primary amines, sulfhydryls, carboxylates, or phenolates. To attach a functional molecule to a nucleic acid, it is therefore generally necessary first to introduce into the DNA a reactive group that specifically accepts the subsequent modification reaction. Such reactive groups can be introduced at a discrete site during chemical synthesis or by incorporating an unnatural nucleoside triphosphate monomer during enzymatic synthesis of DNA/RNA.

Functional Group-Specific Coupling Reactions

Figure 1 shows typical coupling reactions between functional molecules (R) and the reactive chemical groups introduced into nucleic acids. All of these reactions proceed in a structurally selective manner with high yields under mild conditions in aqueous solution. Aliphatic primary amines are the most commonly used functional groups for the chemical modification of nucleic acids. An activated ester of the functional molecule couples with the amino group to give a stable amide linkage (Fig. 1a). Sulfhydryl groups also give the desired conjugates via specific coupling reactions with maleimide, iodoacetyl, and pyridyl disulfide groups, which provide thioether (maleimide and iodoacetyl) and disulfide (pyridyl disulfide) linkages (Fig. 1b). CPG (controlled-pore glass) resins, which serve as the solid support for DNA/RNA synthesis, are commercially available for introducing amino and sulfhydryl groups at the 3′-end of nucleic acids, as are amidite reagents for introducing these groups in the middle of the chain or at the 5′-end of synthetic oligonucleotides. Moreover, a variety of homo- and hetero-bifunctional linker reagents that specifically connect amino or thiol groups on the biomolecule to the functional molecule are also commercially available.

Fig. 1 Typical functional group-specific chemical modification reactions used in DNA conjugate synthesis. The chemical reactions for (a) aliphatic amino and (b) sulfhydryl groups. End-specific reactions for (c) ribonucleoside structure and (d) terminal hydroxyl group

Several end-specific reactions for nucleic acid modification are also known. The 3′-end of RNA molecules, or a ribonucleoside introduced at the 3′-end of a DNA molecule, can be oxidized to form a dialdehyde, which can then be condensed with an aliphatic amino group on the functional molecule and subsequently reduced to form a morpholino ring linkage (Fig. 1c) (Yamamoto et al. 2017). In addition, a hydroxyl group at the 5′-end can be coupled with an amino derivative of the functional molecule using the condensation reagent CDI (N,N′-carbonyldiimidazole). To prevent side reactions at the 3′-end hydroxyl group, this reaction should be performed on the CPG support prior to cleavage (Fig. 1d). The resulting carbamate linkage shows excellent stability under the cleavage conditions.

Bioorthogonal Reactions

Bioorthogonal reactions proceed efficiently in the presence of the myriad functional groups found in living systems, including nucleophiles, electrophiles, reductants, oxidants, and, of course, water, the solvent of life. These reactions are very useful for preparing bioconjugates in vitro and, more importantly, can also proceed specifically in cells or even in tissues. Bioorthogonal reactions can be used for specific applications in chemical biology research, such as the specific labeling of targets, chemical sorting to targeted sites in cells, and in situ prodrug activation (uncaging). These reactions are called click chemistry because they proceed exclusively and reliably in the expected combinations. In this section, we focus mainly on the three most important bioorthogonal reactions: azide-alkyne cycloaddition (AAC), the Staudinger reaction, and inverse-electron-demand Diels–Alder cycloaddition (IEDAC).

Azide-Alkyne Cycloaddition

The [3 + 2] dipolar cycloaddition (Huisgen cyclization) of azides and alkynes (AAC) is a typical example of click chemistry. This reaction is remembered as having revolutionized chemical biology research and is characterized by extremely high yields and high functional group selectivity. The reaction proceeds irreversibly, with the participation of only the alkyne-azide combination, regardless of which other functional groups are present. Moreover, the atom economy is 100%. Initially, the Cu(I) ion was commonly used as a catalyst in AAC (CuAAC, Cu(I)-catalyzed AAC) along with a Cu(I)-stabilizing ligand such as tris(benzyltriazolylmethyl)amine (TBTA) or tris(3-hydroxypropyltriazolylmethyl)amine (THPTA) (Fig. 2a). Later, it was reported that cyclooctynes, eight-membered ring alkynes such as dibenzocyclooctyne (DBCO), have a low activation energy because of their distorted structure and react readily with azides without Cu(I) ion catalysis (SPAAC, strain-promoted AAC) (Fig. 2b) (Debets et al. 2011). This makes it possible to label various biomolecules easily and in good yields under mild conditions. In addition, since this process does not use toxic copper ions, it can be used in cells and is therefore attractive for various biotechnological applications. At present, amidite reagents carrying simple alkynyl groups and DBCO are commercially available.

Fig. 2 DNA conjugation with azide-alkyne cycloaddition. General reactions of (a) CuAAC with azide and (b) SPAAC with DBCO. R denotes any molecular structure with the desired function to be conjugated
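The statement that the atom economy of AAC is 100% can be checked against the usual definition of atom economy. The short derivation below is only a sketch: the symbols M (molar mass) and the generic substituents R and R′ are introduced here for illustration, and the catalyst and ligand are, as is conventional, not counted as reactants.

```latex
\[
\text{atom economy} \;=\; \frac{M_{\text{product}}}{\sum_i M_{\text{reactant},\,i}} \times 100\%
\]
% AAC is a pure addition: every atom of the azide and the alkyne ends up in the triazole.
\[
\mathrm{R{-}N_3} \;+\; \mathrm{R'{-}C{\equiv}CH} \;\longrightarrow\; \text{1,2,3-triazole},
\qquad
M_{\text{triazole}} = M_{\text{azide}} + M_{\text{alkyne}}
\;\;\Rightarrow\;\; \text{atom economy} = 100\%
\]
```

Because no leaving group or by-product is released, the result holds for any choice of R and R′, for both the Cu(I)-catalyzed and the strain-promoted variants.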

Staudinger Ligation

The Staudinger reaction is a classic reaction in which organic azides are converted to primary amines by treatment with phosphines. In the presence of water, the aza-ylide intermediate is hydrolyzed spontaneously to yield a primary amine. To take advantage of the exclusive, high-yielding reactivity between azides and phosphines in water, a phosphine reagent was designed so that the aza-ylide intermediate is trapped by an intramolecular electrophilic carbonyl group, which upon hydrolysis forms a stable amide linkage. This process is referred to as Staudinger ligation because of its versatility in intermolecular linking (Fig. 3a). Shortly after the original report, the reaction was modified into the "traceless" Staudinger ligation, in which the phosphine oxide motif is eliminated from the structure of the product. Using this version, it is possible to obtain conjugates connected by simple amide bonds that do not leave a reaction footprint (Fig. 3b) (Saxon et al. 2000). DNA with azide groups can be easily prepared by amine-to-azide conversion with fluorosulfuryl azide (FSO2N3) (Krasheninina et al. 2021), by coupling primary amino groups previously introduced during DNA synthesis with active esters carrying alkyl azide structures, or by treating pre-introduced bromoalkyl structures with sodium azide.

Fig. 3 DNA conjugation with Staudinger ligations. (a) Chemical ligation through intramolecular nucleophilic reaction. (b) Traceless Staudinger ligation giving simple linking structure

Inverse Electron-Demand Diels–Alder Cycloaddition

This method is based on an earlier report of an unusually fast bioorthogonal reaction: the IEDAC between tetrazine and trans-cyclooctene (TCO), which proceeds without any catalyst (Fig. 4a). In general, tetrazine also shows high reactivity toward other strained C–C multiple-bond compounds such as norbornene (Fig. 4b) and cyclooctyne (Fig. 4c) (Dommerholt et al. 2014). Therefore, these reactions are collectively referred to as strain-promoted IEDAC (SPIEDAC) reactions. SPIEDAC reactions are characterized by extremely fast rates, with k = 1–10⁶ M⁻¹ s⁻¹, which is significantly faster than SPAAC reactions (approximately k = 10⁻²–10⁻¹ M⁻¹ s⁻¹). That is, SPIEDAC can be two or more orders of magnitude faster than other bioorthogonal reactions. The general trend is that the more electron-deficient and less sterically hindered the tetrazine is, and the greater the strain on the alkene/alkyne, the greater the reaction rate (Debets et al. 2011). Because of its fast kinetics, the tetrazine ligation may be particularly useful where rapid reactions are essential, such as for tracking fast biological events or for labeling low-abundance biomolecules. At present, several bifunctional linkers for click chemistries, such as DBCO and TCO, are commercially available.

Fig. 4 DNA conjugation with IEDAC. General reactions using (a) trans-cyclooctene, (b) norbornene, and (c) cyclooctyne
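To give a feel for what the rate constants quoted above mean in practice, the following minimal Python sketch converts a second-order rate constant into a reaction half-life. It assumes a simple bimolecular reaction A + B → P with equal starting concentrations of both partners; the 10 µM concentration and the function names are illustrative assumptions, not values or code from the cited studies.

```python
def half_life_s(k: float, c0: float) -> float:
    """Half-life in seconds of A + B -> P when [A]0 = [B]0 = c0 (in M).

    k is the second-order rate constant in M^-1 s^-1.
    For equal starting concentrations, t_1/2 = 1 / (k * c0).
    """
    if k <= 0 or c0 <= 0:
        raise ValueError("k and c0 must be positive")
    return 1.0 / (k * c0)


def fraction_reacted(k: float, c0: float, t: float) -> float:
    """Fraction of A consumed after t seconds under the same assumptions.

    Integrated rate law: 1/[A] - 1/c0 = k*t, so [A]/c0 = 1 / (1 + k*c0*t).
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    x = k * c0 * t
    return x / (1.0 + x)


if __name__ == "__main__":
    c0 = 10e-6  # assumed: 10 uM of each reaction partner (illustrative only)
    cases = [
        ("SPAAC, lower bound", 1e-2),
        ("SPAAC, upper bound", 1e-1),
        ("SPIEDAC, lower bound", 1.0),
        ("SPIEDAC, upper bound", 1e6),
    ]
    for label, k in cases:
        print(f"{label:<22} k = {k:g} M^-1 s^-1   t1/2 = {half_life_s(k, c0):.3g} s")
```

Under these assumptions, the half-life at 10 µM ranges from about 10⁷ s (months) at the slow end of SPAAC to about 0.1 s at the fast end of SPIEDAC, which illustrates why the tetrazine ligation is attractive for low-abundance targets. Real labeling experiments usually use one partner in excess, so these figures are order-of-magnitude guides only.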

Unique Binding and Functional Switching

Because of their versatility, nucleic acid conjugates have almost unlimited possibilities. Their flexibility has allowed them to adapt to the rapid expansion of nucleic acid science. Here, we summarize recent studies of unique molecular recognition by nucleic acid conjugates that exploits the conjugation reaction itself, as well as studies in which the function of nucleic acid conjugates is switched in response to stimuli. Some of these studies may have applications in biosensing and therapeutics, as described in the sections that follow.

Kinetic and Thermodynamic Stabilization by Conjugation with Click Chemistry

The first report we discuss concerns a pair of DNA conjugates that form a kinetically stable duplex with the target sequence. One conjugate contains a DBCO group at an internal position and a phosphorothioate group at the 3′-end, while the other contains an azide group at an internal position and a chloroacetyl group at the 5′-end. The SN2-type chemical ligation between the terminal phosphorothioate and chloroacetyl groups proceeds only when the two conjugates hybridize with target DNA or RNA to form a tandem duplex. This structure facilitates the subsequent click reaction between DBCO and azide across a helical turn. These two chemical reactions are induced by hybridization with the target and should lead to the formation of a pseudorotaxane architecture (Fig. 5) (Onizuka et al. 2014).

Fig. 5 Schematic representation of topologically stabilized pseudorotaxane formation

Thermodynamically stable complexes found in biological systems are, in most cases, the result of multivalency, that is, many weak bonds between cognate molecules. In one study, the multivalent presentation of sugars was optimized by directed evolution combined with click chemistry (CuAAC), yielding RNA-supported glycan clusters that bound tightly to an HIV antibody. The strategy adopted for the selection of glycan clusters was Capture SELMA (selection of modified aptamers), in which each glyco-RNA was captured at all times by a unique DNA architecture through a linker chain tethering it to the DNA that encoded it. That is, the genotype and its cognate phenotype always remained in the same complex (Fig. 6). This method was validated by the selection, from a pool of 10¹³ sequences, of oligomannose glycan clusters that bound to the broadly neutralizing HIV antibody 2G12 with affinities of 13 to 36 nM (Redman and Krauss 2021).

Fig. 6 Selection cycle of Capture SELMA for the evolution of glyco-RNA aptamers. Translucent and solid colors denote RNA and DNA, respectively. b → c: Synthesized RNA was captured by the intra-complex linker chain while RNA synthesis was stalled at isodC. c → a: Man9 glycomannose glycan was selectively attached to the ethynyl groups by CuAAC. The DNA/RNA architectures, a, were subjected to selection against the protein of interest (POI; antibody 2G12)

cis-Platinum(II) complexes (cisPtII) are typical metal complexes that are often used in chemotherapy to treat a number of cancers. The limitations of clinical platinum(II) therapeutics, however, include systemic toxicity and inherent resistance. Therefore, modern approaches seek new ways to deliver active cisPtII to discrete nucleic acid targets. Recently, a click chemistry (CuAAC and SPAAC)-based approach was reported that used a combination of an alkyne-modified triplex-forming oligonucleotide (TFO) and azide-bearing cisPtII. The constructs were able to assemble modularly and enabled directed cisPtII cross-linking to purine nucleobases on the target sequence under TFO guidance (Hennessy et al. 2022).
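As a rough way to read the 13 to 36 nM affinities quoted above for the selected glycan clusters, the sketch below evaluates the simplest 1:1 binding isotherm, θ = [L] / (Kd + [L]). This is a deliberate simplification: the actual glycan cluster and antibody interaction is multivalent, the ligand concentrations are hypothetical, and free ligand is assumed to be approximately equal to total ligand (ligand in excess over target).

```python
def fraction_bound(ligand_conc: float, kd: float) -> float:
    """Fraction of target occupied for simple 1:1 binding at equilibrium.

    ligand_conc and kd must be in the same units (here M).
    Assumes free ligand ~ total ligand, i.e., ligand in excess over target.
    """
    if kd <= 0:
        raise ValueError("kd must be positive")
    if ligand_conc < 0:
        raise ValueError("ligand_conc must be non-negative")
    return ligand_conc / (kd + ligand_conc)


if __name__ == "__main__":
    kds = [13e-9, 36e-9]                    # affinity range quoted in the text
    concs = [1e-9, 10e-9, 100e-9, 1000e-9]  # hypothetical ligand concentrations
    for kd in kds:
        for c in concs:
            theta = fraction_bound(c, kd)
            print(f"Kd = {kd * 1e9:.0f} nM, [L] = {c * 1e9:.0f} nM -> fraction bound = {theta:.2f}")
```

In this model half of the target is occupied when the ligand concentration equals Kd, so a binder in the 13 to 36 nM range is mostly bound at ligand concentrations of a few hundred nanomolar.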

Switching

Stimuli-responsive auxiliary functional groups give nucleic acid conjugates the ability to switch. Although many allosteric nucleic acid conjugates have been reported, here we focus on nucleic acid conjugates that respond to chemical stimuli such as pH and metal ions, as well as to light as a physical stimulus.

pH

Peptide–DNA conjugates can be rationally designed as smart ligands that allow pH-dependent control of antibody activity. This strategy uses a pH-responsive parallel-motif DNA triple helix to control switching from a tight-binding bivalent peptide–DNA lock into a weaker-binding monovalent ligand. The design of this system is very flexible, as it allows antibody activation at both basic and acidic pH. The response to pH can be tuned by the length and sequence of the triple-stranded region (Fig. 7). It is also possible to redesign the structure so that it responds to a complementary short single-stranded DNA or RNA. The peptide–DNA lock allows pH-dependent antibody targeting of tumor cells both in bulk and for single cells confined in water-in-oil microdroplets. The latter approach enables high-throughput antibody-mediated detection of single tumor cells based on their metabolic activity (Engelen et al. 2020).

Fig. 7 Antibody activation at the lower pH of tumor cells. The pH of tumor cells is lower than that of normal cells. A DNA complex consisting of two components (blue and red) carrying two epitopes binds tightly as a bivalent ligand to the target antibody, preventing the antibody from binding to the cell. At lower pH, an intramolecular parallel triplex structure forms and breaks up the complex into weakly binding monovalent ligands. The antibody is then able to bind to the tumor cell

Metal Ions

Several metal ion chelator–DNA conjugates have been designed for various research purposes, including sensing and the metalloregulation of binding. One recent report involved a conjugate in which two terpyridines were inserted at distal sites of its internal DNA backbone. The conjugate formed an intramolecular 1:2 complex with divalent transition metal ions (Fe²⁺, Ni²⁺, Cu²⁺, Zn²⁺) and adopted an Ω-shaped structure, in which the two distal sequences located outside the terpyridines connect with each other to form a continuous segment with a specific structure or sequence (Fig. 8). In such cases, the DNA structure is globally controlled by local metal complexation events, and this can be rationally designed based on general coordination chemistry. This method is regarded as metal ion-directed dynamic sequence edition or DNA splicing. The activity of a split DNAzyme can be regulated by several transition metal ions through sequence edition based on the Ω-motif (Ihara et al. 2015). A DNA aptamer that binds to thrombin, one of the blood coagulation factors, has a G-quadruplex (G4) structure. In another case, four pyridine units were inserted into the putative loop regions of the G4 structure of the thrombin aptamer. After the addition of Cu²⁺ to this DNA conjugate, the four pyridines coordinated convergently with the ion and folded the DNA backbone, promoting the formation of a G4 structure that facilitates effective thrombin binding. Moreover, the addition of Cu²⁺ also resulted in delayed blood coagulation (Engelhard et al. 2017).

Fig. 8 Metal ion-directed reversible sequence editing. a → b: Two terpyridines built into the backbone of the DNA conjugate formed an intramolecular complex, and the conjugate adopted an Ω-shaped structure, providing a new sequence shown in red. c: The new sequence reconstructs an active form of the DNAzyme with peroxidase-like activity. d: One of the possible structures of the Ω-shaped DNA duplex with a complementary DNA sequence

T. Ihara et al.

Light There are many examples of photoresponsive DNA conjugates. Many photoresponse mechanisms involve light-induced isomerization of aromatic conjugated olefins introduced into the nucleic acid backbone or side chain, or light-induced cleavage of the main chain. In one study, multiple azobenzenes were introduced into the side chains (instead of the nucleobases) of DNA analogs, and hybridization was reversibly controlled with two different wavelengths of light (Kamiya and Asanuma 2014). G-rich DNA modified with azobenzene has been used as a smart ion channel: the trans-form stabilized the G4 structure, which acted as a transmembrane K+ ion channel that can be reversibly controlled by light (Li et al. 2021a). Interestingly, many DNA aptamers take the G4 structure as their active form. The G4 structure is, however, kinetically very stable, and the dissociation of target molecules bound to that structure is slow. The photoisomerization of azobenzene has been reported to enhance the reversibility of G-rich DNA-based sensors, enabling reversible regeneration of electrochemical thrombin sensors by photoirradiation (Zhang et al. 2020). Photoswitching of DNA hybridization has also been achieved using a molecular motor as a modulator. In one study, large geometrical changes were generated by rotation of a molecular motor based on its cis–trans isomerization, and as a consequence, the duplex stability of an oligonucleotide was regulated by light irradiation through the reversible motion of the molecular motor built into the backbone (Lubbe et al. 2018). In addition, the photoregulation of transcription in vitro has been reported using 7-deaza-adenosine-based diarylethenes (Büllmann et al. 2021). In another study, a guanine base was modified to 8-pyrenylvinyl deoxyguanosine to photocontrol the G4 structure; this also enabled reversible control of gene expression in zebrafish (Ogasawara 2018).
Conventional photocleaving strategies rely on UV light generated by high-power laser irradiation. These conditions are not appropriate for in vivo applications, as the penetration depth of UV light is very limited. A Cy7-based structure was introduced by CuAAC into a DNA backbone as a red-light photocleavable linker connecting two specific sites of a thrombin aptamer. The modified aptamer was deactivated with light in the phototherapeutic window (660 nm) even in human blood, where natural coagulation capability was restored (Müller et al. 2021).

Nucleic Acid Conjugates for Sensing and as a Research Tool Nucleic acid conjugates have been developed for a wide variety of bioanalyses. Papers describing biosensing and research tools involving bioconjugates can be classified by the types of analytes, modified molecules, and analytical techniques used. Despite some risk to the clarity of the structure of this review, here, we summarize these applications based on the keywords that best characterize the analytical methods.

Complementary DNA Probes Modified with Reporter Molecules The first nucleic acid conjugates produced were DNA probes, which are short oligonucleotides complementary to the target DNA or RNA sequences that are modified with fluorescent dyes or antigens for signal amplification. Additional examples of DNA conjugates with unique reporter functions include the following:

Spectrophotometry Since radiolabeled DNA probes carry significant risks and are now rarely used, many short DNAs modified with a fluorescent dye have been developed as DNA probes. For live-cell RNA imaging, where probes cannot be washed out to reduce the background – as is done for in situ hybridization experiments – turn-on fluorescent probes have been developed to increase the signal-to-noise ratio. Exciton-controlled hybridization-sensitive fluorescent oligonucleotide (ECHO) probes are one of the most promising examples of this group. The background signal of ECHO probes in the unbound state is very low owing to the excitonic interaction provided by H-aggregation of the bis-thiazole orange moiety. In contrast, strong emission is shown after hybridization with target RNAs via disruption of H-aggregation and intercalation of thiazole orange dye into the duplex structure (Fig. 9). Bulky fluorescent ECHO probes that possess streptavidin or gold nanoparticles at the end of oligonucleotides have also been prepared. These bulky probes permit nucleus- and/or cytoplasm-selective monitoring of endogenous mRNAs through nuclear and cytoplasmic microinjection, respectively. Moreover, the simultaneous use of bulky and non-bulky probes conjugated with different fluorescent dyes enables dual-color imaging of mRNAs present in the nucleus and cytoplasm (Hayashi et al. 2015).

Fig. 9 Light-up probe controlled by exciton coupling between thiazole orange dyes. (a) Splitting of the excited states of the thiazole orange H-aggregate. The arrows shown next to the energy levels represent the transition dipoles of the dye molecules. (b) Structure of a doubly dye-modified nucleotide. (c) Schematic illustration of sequence-specific detection of the target nucleic acid by an ECHO probe

Nonnatural nucleoside analogs such as adenosine-1,3-diazaphenoxazine (Adap) derivatives have been used for selective detection of 8-oxo-2′-deoxyguanosine (8-oxo-dG), a damaged nucleoside. Since 8-oxo-dG forms a base pair not only with dC but also with dA, it induces transversion mutations from GC to TA during DNA replication. Therefore, it is of significant importance to develop methods for the selective detection of 8-oxo-dG. Adap has a highly selective stabilizing effect on the duplex containing the Adap–8-oxo-dG base pair. Furthermore, the fluorescence of Adap is useful for the selective detection of 8-oxo-dG in duplex DNA (Taniguchi et al. 2011). G4 structures regulate many biological functions and are considered potential molecular targets for cancer therapeutics. However, due to the lack of analytical methods, the mechanisms regulating monogenic G4s remain unclear. As one G4 research technique, a module-assembled multifunctional probes assay (MAMPA) has been proposed as a way to visualize endogenous G4s for individual genes in single cells. In this method, two modular probes separately recognize G4 structures and adjacent RNA sequences, and module assembly by SPAAC then enables imaging of G4s for an individual RNA with high specificity (Dai et al. 2022). Furthermore, DNA probes carrying multiple fluorescent bases and base-like molecules have been designed and synthesized. Several methods have been proposed that use changes in the clustering state of these fluorescent bases upon hybridization to analyze DNA and RNA.
One proposed method of fluorescence analysis of miRNAs (miR-21) is based on the formation of pyrene excimers by DNA probes carrying two pyrene-modified adenines that form three-way junctions with the complementary strand (Ro et al. 2018). Moreover, it is also possible to modify each of the successive nucleotide bases of DNA with one of several chromophores. Hybridization with the complementary DNA then results in a helical self-assembled chromophore cluster, with consequential changes in optical properties such as fluorescence. This technique would be potentially useful for optical nanodevices and as nucleic acid sensors. In recent studies, a DNA probe with adjacent bases substituted with pyrenes was used to measure the activity of ALKBH2, one of the enzymes that repairs DNA damage. The pyrene excimer was quenched by photoinduced charge transfer to the damaged base. Following repair of the positively charged lesion, the quenching was lost, resulting in fluorescence emission (Wilson et al. 2018). In addition, highly sensitive analysis of mRNA in cells has been reported using DNA analogs with multiple perylenes and anthraquinones in appropriate arrangements. The signal-to-background ratio was as high as 1600 in vitro, making this one of the most sensitive linear probes reported (Asanuma et al. 2015). Luminescent lanthanide metal ion complexes have several advantageous properties for analytical use: long lifetime, large Stokes shift, and characteristic emission spectra. A pair of split probes carrying EDTA and phenanthroline, both of which are essential for luminescence, has been described for sensitive and multicolored gene analysis. Colorimetric allele typing has also been successfully performed using Eu3+ and Tb3+, which emit red and green light, respectively (Kitamura et al. 2008). Various other types of turn-on probes have been reported. Fluorescence polarization using fluorescent dye-labeled DNA has been used as a universal technique for genotyping and for sensing other molecule types such as small molecules and proteins. Cationic conjugated polymers (Li et al. 2018b) and β-cyclodextrin (βCyD) polymers (Liu et al. 2015) have also been used as signal transducers for DNA analyses. In addition, βCyD-modified DNAs have been designed and synthesized for DNA analysis. In one study, βCyD–DNA conjugates were used cooperatively with nucleobase-specific fluorescent ligands for single-base detection. This system was shown to be able to detect any base at any position by lighting up in a desired, specific color (Ihara et al. 2009; Futamura et al. 2013) (Fig. 10).

Fig. 10 Single-base designated fluorometric analysis using βCyD–DNA conjugate and nucleobase-specific fluorescent ligand (MNDS for G). (a) Schematic illustration of cooperative nucleobase recognition at specific position. (b) G recognition by MNDS. (c) One of the possible structures of the DNA complex (AMBER* force field with GB/SA (generalized Born/surface area))

Electrochemistry Electrochemical techniques are cost-effective, potentially sensitive, and versatile. Since the first reports of electrochemical DNA sensors using ferrocene (Fn)-modified DNA (Ihara et al. 1997), hundreds of bioanalytical platforms based on electrochemical responses have been proposed for the study of specific interactions involving nucleic acids and/or proteins. Most of these analytical systems are designed to amplify the signal itself or to generate structures that emit their own signals to increase the sensitivity. Enzymatic reactions are one of the primary approaches for signal amplification. For example, Φ29 DNA polymerase synthesizes daughter DNA while unwinding the hybridized complementary strand of its template, owing to its helicase-like strand-displacement activity. Therefore, the use of cyclic DNA as a template yields a DNA product with numerous tandemly repeated complementary sequences under isothermal conditions. This technique is known as rolling circle amplification (RCA) and is widely used as a universal subroutine for signal amplification. Since a short DNA or RNA connector that is complementary to both ends of the linear DNA is required to circularize the DNA by ligase, RCA can be used to detect either the short DNA or RNA itself or its relevant bioprocesses with enhanced sensitivity. Recently, a multiplex RCA assay has been reported. Three electrochemically active DNA probes, short DNAs labeled with methylene blue, acridine orange, or [Ru(bpy)3]2+, were simultaneously used to detect and discriminate between three common enteropathogens in a single reaction, with femtomolar sensitivity (Yeap et al. 2021). Nicking endonucleases are a unique type of enzyme that can also be used in bioanalytical techniques. Like restriction endonucleases, they recognize short specific DNA sequences and cleave DNA at a fixed position relative to the recognition sequence.
However, unlike restriction endonucleases, nicking endonucleases cleave only one predetermined strand. Recently, an ultrasensitive analysis of miRNAs was performed by the combined use of Nt.BstNBI, a nicking endonuclease, and a DNA walker, a typical dynamic DNA device. In this study, the tripedal DNA walker activated by the target miRNA hybridized in turn to the Fn-modified DNAs fixed on the substrate, and each time Nt.BstNBI cut off the Fn strand, resulting in a significant change in the electrochemical signal (Chang et al. 2021) (Fig. 11). In another study, the combined use of strand displacement polymerization from the nicking site and reductant-mediated signal amplification enabled electrochemical sensing of attomolar miRNA (Miao et al. 2018a). T7 exonuclease (T7 exo) hydrolyzes duplex DNA or RNA/DNA heteroduplexes in the 5′ → 3′ direction to liberate oligonucleotides. This property of T7 exo has been used for signal amplification in many bioanalytical systems since it allows for target recycling (Miao et al. 2018b). Finally, another study reported how DNA tetrahedron structural units accumulated on an electrode when triggered by the target miRNA, enabling its sensitive analysis. In this experiment, streptavidin-conjugated horseradish peroxidases (Av-HRP) were then bound to the biotins carried by each structural unit in the assembly. An amplified electric signal was detected by catalytic electron transfer to HRP mediated by 3,3′,5,5′-tetramethylbenzidine (Wan et al. 2020).
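To make the role of the connector in RCA concrete, the following minimal Python sketch builds a linear padlock-type probe whose two ends anneal side by side on a short target, so that a ligase could close the circle, and then writes out the tandem-repeat product expected from a strand-displacing polymerase. The function names, the poly-T backbone, and the example target sequence are hypothetical choices made for illustration and are not taken from the studies cited above; practical probe design would also have to consider melting temperatures and secondary structure.

```python
# Illustrative sketch only: names, backbone, and sequences are hypothetical.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    """Return the DNA reverse complement (5'->3') of a DNA or RNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_padlock(target: str, backbone: str) -> str:
    """Return a linear probe (5'->3') whose two arms anneal adjacently on `target`."""
    site = reverse_complement(target)  # strand that pairs with the whole target
    half = len(site) // 2
    five_prime_arm = site[half:]       # begins at the ligation junction
    three_prime_arm = site[:half]      # ends at the ligation junction
    return five_prime_arm + backbone.upper() + three_prime_arm


def rca_product(circle: str, repeats: int) -> str:
    """Return `repeats` tandem copies of the strand complementary to the circle."""
    return reverse_complement(circle) * repeats


target = "UAGCUUAUCAGACUGAUGUUGA"          # hypothetical 22-nt RNA connector
probe = design_padlock(target, "T" * 10)  # hypothetical poly-T backbone
product = rca_product(probe, 3)
```

In this sketch the full target-binding site exists only once the probe ends are joined, which mirrors why circularization, and hence amplification, depends on the presence of the connector.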

Fig. 11 Schematic illustration of the as-prepared electrochemical biosensor for the detection of miRNA. An annular DNA walker assembles from a three-way junction (TWJ) structure by hybridization with the target miRNA. The DNA walker forms duplexes with ferrocene-modified DNA probes immobilized on the electrode. Repeated cleavage of the probe by the nicking endonuclease Nt.BstNBI amplifies the difference in electrochemical response
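The strand selectivity of a nicking endonuclease can be illustrated with a short Python sketch that locates the nick on a single strand. It assumes the recognition site and cut offset commonly listed for Nt.BstNBI (5′-GAGTC-3′, with the nick four bases downstream on the same strand); this value is an assumption that should be checked against the supplier's data sheet, and the example sequences and function names are hypothetical rather than taken from the cited work.

```python
import re

# Assumption: site and offset as commonly listed for Nt.BstNBI
# (5'-GAGTC(N)4^-3'); verify against the supplier's data sheet.
RECOGNITION_SITE = "GAGTC"
NICK_OFFSET = 4  # bases between the end of the site and the nick


def nick_positions(strand: str) -> list[int]:
    """Return indices i such that the nick lies between strand[i-1] and strand[i]."""
    strand = strand.upper()
    cuts = []
    # A lookahead is used so that overlapping sites would also be reported.
    for match in re.finditer(f"(?={RECOGNITION_SITE})", strand):
        cut = match.start() + len(RECOGNITION_SITE) + NICK_OFFSET
        if cut < len(strand):
            cuts.append(cut)
    return cuts


def nick(strand: str) -> list[str]:
    """Split one strand (5'->3') into the fragments left after nicking."""
    strand = strand.upper()
    bounds = [0] + nick_positions(strand) + [len(strand)]
    return [strand[a:b] for a, b in zip(bounds, bounds[1:])]


top = "AAGAGTCTTTTGGGCC"     # hypothetical strand carrying the site
bottom = "GGCCCAAAAGACTCTT"  # its reverse complement, which lacks GAGTC
fragments_top = nick(top)
fragments_bottom = nick(bottom)
```

Only the strand that carries the recognition sequence is split in this model; the complementary strand is returned intact, which is the property that lets an intact strand be reused over repeated cleavage cycles.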

Nucleic acid chemistry is a practical field of supramolecular chemistry. In this field, a variety of autonomous reactions and enzyme-independent catalytic reactions (or sometimes a combination of both) can be designed to amplify or transduce signals. Ultrasensitive miRNA detection can be performed using sophisticated molecular systems involving superparamagnetic nanostructures coupled with an entropy-driven DNA circuit (Li et al. 2020) and nanomachine-mediated catalytic hairpin assembly (Zhang et al. 2021b). Moreover, circulating tumor DNA (ctDNA), a new class of cancer biomarker, has been successfully detected from whole blood by calibration-free amplified electrochemical measurements based on a ctDNA-triggered hybridization chain reaction (HCR) on an electrode (Li et al. 2021b). Electrochemical genosensing amplified with target-initiated multipedal DNA walkers has also been reported. In one example, DNAzymes on each leg of the DNA walker (DNAzyme walker) cut the substrate sequences immobilized on an electrode one after another, thereby inducing changes in silver stripping peaks, which are well-defined, sharp peaks and can therefore be used as an excellent electrochemical signal source (Chai and Miao 2019). Similarly, label-free sensitive electrochemical miRNA sensors have been reported using tripedal DNAzyme walkers assembled by target miRNA-initiated toehold-mediated strand displacement reactions (TSDR) (Xue et al. 2021). Proximity effects are a central and important idea in supramolecular chemistry and ultimately in all biological reactions. Electrochemically active DNA complexes, which stably form on electrode surfaces by the proximity effect, have been found to be destabilized by TSDRs stimulated by the target miRNA, permitting ultrasensitive miRNA detection (Zhang et al. 2018). Most electrochemical biosensors require one of the participants of the interaction to be fixed on the electrode, since the signal contrast is caused by the change in the distance between the electrochemically active probe (or its redox center) and the electrode surface, which results in a change in the local concentration of the probe on the electrode, due to the target molecule or associated biological event. It is known that βCyD binds tightly with Fn to form an inclusion complex (βCyD–Fn), and the electrochemical activity of Fn accommodated in the βCyD cavity is suppressed since it is shielded from the bulk solution. Here, βCyD can, therefore, be regarded as a “quencher” of electrochemical signals, similar to an azo-quencher for FRET-based fluorimetry. Split probes consisting of Fn–DNA and βCyD–DNA have been developed for electrochemical gene sensing in homogeneous solutions. This technique is specific enough to detect single-base substitutions in the target sequences, which cause large differences in the electric currents of Fn–DNA (Ihara et al. 2011). Furthermore, the signal from such split probes has been shown to be amplified by entropy-driven DNA circuits (Kitamura et al. 2021) (Fig. 12a). Moreover, βCyD and Fn were attached to the two ends of a DNA probe to form an electrochemical molecular beacon (ECMB). In the absence of a target, the electrochemical signal of the ECMB is suppressed in its hairpin structure, in which Fn is shielded in the inclusion complex formed at the end of the structure. The target sequence opened the structure of the ECMB to form a duplex and restore its electrochemical signal (Kitamura et al. 2020) (Fig. 12b).

Fig. 12 Electrochemical DNA/RNA sensing in solution using the β-CyD conjugated on DNA as an electrochemical quencher of ferrocene. (a) Gene sensing with split electrochemical probe amplified with entropy-driven DNA circuit. (b) Mechanism of action of ECMB

Nucleic Acid Aptamer Conjugates As mentioned in the previous section, traditional DNA probes involve synthetic oligonucleotides complementary to the target sequence labeled with isotopes or fluorescent molecules, among others. At present, DNA probes are primitive but indispensable bioconjugates used in nucleic acid science research. On the other hand, simply replacing the nucleic acid part of the conjugate with aptamers permits the targeting of molecules other than nucleic acids. That is, aptamers are an interface between two different worlds: that of nucleic acids and that of non-nucleic acid molecules.

Spectrophotometry A protease-activatable aptamer system has been designed for molecular sensing and imaging. It has both enabled tumor cell-selective ATP imaging in vitro and produced a fluorescent signal in vivo with improved tumor specificity. This system also works as a redox-activatable DNA nanodevice for spatially selective, AND-gated imaging of ATP and glutathione in the mitochondria (Chai et al. 2021). Forced intercalation (FIT)–aptamer conjugates that fluoresce in response to steroids have also been reported. Here, aptamers were selected by screening from a library of green-fluorescent FIT–aptamers (TO, thiazole orange, as a viscosity-sensitive fluorescent dye) whose design was guided by computational modeling. These aptamers sense steroids like dehydroepiandrosterone sulfate (DHEAS) down to 1.3 μM with no loss in binding affinity compared to the unmodified aptamer (Ebrahimi et al. 2021). Trigging isothermal amplification of fluorescence signals has also been reported by subjecting the assembly of aptamer-type molecular beacons (MBs) to enzymatic reactions. In one study, the fluorescence of a multivalent aptamer probe (multi-VAP) showed a very high signal-to-noise ratio. It was strongly quenched by proximity effects (a dual effect) in the assembly in the absence of the target. However, when the target was added, the fluorescent groups were released from the assembly by successive enzymatic reactions, and the intrinsic fluorescence was fully restored. Using this system, Salmonella was successfully detected with high sensitivity in a sample-to-result time of only 30 minutes (Xu et al. 2022) (Fig. 13).

Fig. 13 Schematic illustration of multi-VAP-based trigging isothermal circular amplification (TICA) for Salmonella determination. Molecular beacons modified with biotin are assembled on streptavidin (SAV) to form multi-VAP and subjected to TICA. Restriction endonuclease (BamHI) cuts off the short duplex with fluorophore resulting in complete restoration of its fluorescence

Electrochemistry Electrochemical sensors for non-nucleotide targets have also been developed using aptamer-based MBs. One research group showed rapid (seconds) and convenient (single-step and calibration-free) measurement of plasma vancomycin in finger-prick-scale samples of whole blood and real-time measurements of vancomycin in situ in the veins of living animals (Dauphin-Ducharme et al. 2019). More recently, a rapid and efficient electrochemical sensor for SARS-CoV-2 was reported using the aptamer for the spike protein (Idili et al. 2021).

Lipid/Cholesterol-Modified DNAs Lipid–oligonucleotide conjugates (LONs) are powerful molecular engineering materials for various applications ranging from biosensors to biomedicine. Their unique amphiphilic structures enable self-assembly and anchoring to the cell surface. These features have been used for drug delivery, sensing, cell imaging, and various other cell engineering applications.

Liposome/Micelle Manipulation A general method for the modification of liposomes for encoded assembly on supported bilayers has been reported using LONs (Yoshina-Ishii et al. 2005). The control of the morphology and assembly of LON-embedded liposomes was studied using specific hybridization with a DNA target. In one study, stability-tunable DNA micelles were prepared using LONs. The researchers then inserted a photocleavable linker (PC linker) into the DNA moiety of the LON and succeeded in using light to destabilize the micelles, which had been stabilized by an intermolecular G4 structure formed between LON molecules. The same research group also succeeded in using photoinduced polymerization to stabilize micelles consisting of LONs bearing the DNA aptamer for the target cell. Covalently linked aptamer micelles showed an improved targeting ability (Li et al. 2018a). In addition, studies have reported the formation of liposomes using DNA nanostructures as templates. Basically, multiple single-stranded DNA handles that were displayed on a DNA-origami ring or cage were hybridized with the complementary DNA sequences of LONs. The lipid molecules nucleated liposome formation with the addition of extra lipids. As a result, liposomes with different complex membrane structures were generated based on defined DNA templates. If generalizable, this could be a platform for future studies of membrane mechanics and/or drug delivery across cell barriers and artificial organelles (Zhang et al. 2017).

Sensing on the Cell Surface Many sensing systems that use DNA attached to the cell membrane, either through covalent bonding or by anchoring lipid/cholesterol–DNA conjugates, have been reported. The ability to explore cell signaling and cell-to-cell communication is essential for understanding cell biology and developing effective therapeutics. In one study, PDGF (platelet-derived growth factor) aptamers were covalently attached to stem cells. The resulting sensor cells can quantitatively detect – with high spatial and temporal resolution – PDGF added to cell culture medium or secreted by neighboring cells (Zhao et al. 2011). In addition, another study reported how DNA nanomachines that change fluorescence in response to pH were anchored to the cells via cholesterol-modified DNA. These DNA nanomachines showed quantitative ability for extracellular or apoplastic pH detection and imaging (Zeng et al. 2018). Membrane order and dynamic interactions have also been imaged in living cells using DNA zipper probes. A strong correlation was observed between membrane order and the activation of T-cell receptor signaling (Bagheri et al. 2022). The simultaneous detection of K+ and Na+ levels is important because they have a synergistic effect on many biological processes in cells. In one study, researchers anchored DNA ion sensors to cells and succeeded in the simultaneous monitoring of K+ and Na+ in extracellular microenvironments. Responses to K+ and Na+ by the sensor were based on K+-directed G4 structure formation and Na+-dependent DNAzyme activity, respectively. In addition, the sensor enabled real-time, on-site imaging of the two ions on the cell surface during regulatory biological processes (Deng et al. 2021). Glycosylation is one of the most ubiquitous and complicated modifications of proteins and lipids.
In one study, a fluorescent molecular beacon (glycan probe) and a DNA duplex carrying a toehold (raft probe) were attached to N-acetylneuraminic acid on glycan chains and to a lipid raft-specific protein, respectively. By repetitive digestion with a nicking endonuclease, Nt.BbvCI, the raft probe can be cyclically utilized to turn on the fluorescence of the glycan probe, which only resides in rafts, via proximity DNA hybridization. This permits the observation of the spatial heterogeneity of cell surface glycans, for example, distinguishing glycans displayed in lipid raft or nonraft domains (Shi et al. 2021). In another study, a cholesterol-tethered IFNγ-specific aptamer MB was anchored onto immune cells (T cells). Secretion of IFNγ was successfully monitored at the single-cell level in combination with droplet microfluidics (Qiu et al. 2017). Cell-surface-anchored self-phosphorylating DNAzyme sensors have also been used for fluorescence imaging of both stress-induced endogenous ATP release in astrocytes and mechanical stimulation-evoked ATP release at the single-cell level (Zhao et al. 2021a). In one study, a DNA–cholesterol–DNA triblock conjugate was prepared. This conjugate penetrated the bilayer of a giant unilamellar vesicle (GUV). The conjugate dimerized upon stimulation (primary messenger) outside the vesicle, and the DNA strands that were brought closer together inside the vesicle activated the internal DNAzyme, yielding an amplified fluorescent signal (secondary messenger) (Liu et al. 2021).

Liposome Fusion LONs can anchor onto the liposome membrane to prepare two pools of liposomes carrying single-stranded DNAs that are complementary to each other. The kinetics of the fusion of these two liposomes has been investigated with respect to its dependence on linker length, DNA direction, and sequence (Chan et al. 2009). In one case, the Ca2+-triggered fusion of lipid vesicles to supported lipid bilayers (SLBs) was reported. Ca2+ seems to bridge the lipid head groups and facilitate docking of the vesicles to the SLB by zipper-like hybridization, giving rise to liposome fusion (Simonsson et al. 2010). Purification-free miRNA detection was performed by using magnetically immobilized liposome membranes with nanopores. The target miRNAs specifically hybridized with the oligo DNA bridge, which tethered magnetic beads to the liposome containing α-hemolysin (nanopore) through a cholesterol anchor, to form a heteroduplex. A duplex-specific nuclease cleaved only the DNA strand of the heteroduplex and thereby released the liposome, which fused to the bilayer channel on the chip and emitted an electrical signal derived from the nanopore. This signal was amplified in an isothermal reaction because the miRNA can be recycled after enzymatic digestion (Fujii et al. 2018) (Fig. 14).

Antibody/Enzyme-Modified DNAs Nucleic acids and proteins are the twin mainstays of life. Many bioconjugates connecting these two components have been reported so far. The rapid growth of click chemistries has greatly facilitated the synthesis of nucleic acid-protein conjugates, and in recent years, they have been studied for a wide variety of applications.

Fig. 14 Schematic illustration of microRNA sensing. (a) Magnetic beads gather on the inner wall of the well, and (b) the target microRNA hybridizes to the cholesterol-modified linker oligo-DNA, which is then cleaved by DSN. (c) The nanopore (α-hemolysin)–liposome complexes are released from the magnetic bead and fused to the lipid bilayer. (d) The bilayer generates a stepped signal corresponding to the number of nanopores on it

Preparation of Antibody–DNA Conjugates Many antibody–DNA conjugates have been reported. The synergy between the exceptional target specificity of antibodies and the structural programmability and signal amplification ability of DNA allows the conjugates to perform a variety of outstanding functions. Initially, classical bifunctional linker reagents such as those consisting of succinimide esters and maleimides were used to link nucleic acids and proteins, but recent studies have chiefly used click chemistry. Affinity-based conjugation methods have also been used to produce conjugates with typical affinity pairs such as streptavidin–biotin and protein A/G–Fc monoclonal antibodies.

The conjugation between DNA and antibodies using DNA aptamers has also been studied. For this method, antibody-selective aptamers were used to bind to the Fc domain of human IgG1 antibodies and worked both as templates for the subsequent formation of covalent bonds and for direct conjugation to the antibody. Obviously, these principles can also be applied to the preparation of protein–DNA conjugates other than antibodies (Skovsgaard et al. 2019).

Refined iPCR After the immunostaining by the antibody part of a conjugates, the DNA part is used for signal amplification. Immuno-PCR (iPCR), which combines the advantages of flexible and robust immunoassays with the exponential signal amplification power of PCR, is already an established technique. In the PCR amplification process, the DNA part of the conjugates is used as the template or as one of the primers for iPCR. In one study, the detection of Her2+ cells with great sensitivity and specificity was performed using site-specific antibody–DNA conjugates. Antibodies used in this work contained genetically encoded unnatural amino acids such as p-acetylphenylalanine and p-azidophenylalanine, which enable highly specific bioorthogonal conjugation with oligonucleotides (Kazane et al. 2012). In another study, a covalent and cleavable antibody–DNA conjugation strategy was proposed for sensitive protein detection via iPCR. This employed click chemistries (SPAAC and SPIEDAC) to link antibodies to DNA via disulfide linkers. The DNA, which has a barcode sequence corresponding to its connected antibody, can be released by reductive treatment after immunostaining. By reading these DNA barcodes, multiplex protein analysis is possible using iPCR and immuno-sequencing methodologies (van Buggenum et al. 2016). Proximity Assays Techniques using the proximity effect are anticipated to provide highly specific molecular recognition. These techniques include proximity ligation assays (PLAs), proximity extension assays (PEAs), and electrochemically proximity assays (ECPAs). Label-free protein detection has been successfully demonstrated using the formation of light-up RNA aptamers by transcription after PLA (i.e., by PITA, a proximity-induced transcription assay). 
That is, the fluorescence of the fluorophore DFHBI-1 T (3,5-difluoro-4-hydroxybenzylidene-1-trifluoroethyl-imidazolinone) was activated by a generated cognate Broccoli aptamer, resulting in the quantitative detection of PSA (prostate-specific antigen) (Ying et al. 2018) (Fig. 15). Highly sensitive detection of SARS CoV-2 protein by chemiluminescence assay was reported using an RCA-produced G4/hemin DNAzyme (Zhang et al. 2021a). Activity-dependent protein detection has also been proposed using PLA amplified with RCA. In this method, live cells are first pulsed with integrated chemical probes consisting of a substrate of POI (protein of interest) and a tag. The cells are then labeled with POI-specific and tag-specific antibodies, and subsequently, secondary antibodies conjugated to single-stranded barcode sequences. Subsequent incubation with bridging oligonucleotides allows for ligation and RCA. Finally, the signal is


Nucleic Acid Conjugates for Biosensing: Design, Preparation, and Application


Fig. 15 General principle of light-up RNA aptamer enabled label-free protein detection by PITA. Solid and translucent colors denote DNA and RNA, respectively. Two antibody–DNA conjugates are prepared. The two antibodies bind to different sites of the POI. The two DNAs carry the promoter and the anti-DFHBI-1T aptamer coding sequences, respectively. Following proximity ligation, the RNA aptamers are produced and generate intense fluorescence

detected by incubating with a complementary fluorescent DNA probe oligonucleotide. In practice, this technique enabled the interrogation of protein activity rather than abundance (Li et al. 2017) (Fig. 16). Multiplex protein assays in a single cell have also been performed using PEA. These assays employed a pair of antibodies that recognized two different parts of a POI. Each antibody was conjugated with its own oligonucleotide and prepared for a specific POI. The two oligonucleotides contained unique six-base complementary regions at their 3′ ends to allow annealing and extension by DNA synthesis enzymes to form a DNA template. The DNA template is then detected by qPCR. Distinct oligonucleotide sequences are assigned to different antibody binders to enable multiplex protein detection (Gong et al. 2016).
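The annealing-and-extension step of PEA described above can be illustrated with a minimal Python sketch. The function names and the example sequences are hypothetical and chosen only for illustration; they are not taken from Gong et al. (2016). The sketch assumes ideal Watson–Crick pairing of the last six bases of each oligonucleotide and ignores thermodynamics.

```python
# Minimal sketch (hypothetical names and sequences): check that two PEA
# oligonucleotides can anneal through their 3' ends and build the
# extended top strand that would serve as the qPCR template.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def three_prime_ends_anneal(oligo_a: str, oligo_b: str, overlap: int = 6) -> bool:
    """True if the last `overlap` bases of both oligos are fully complementary."""
    if len(oligo_a) < overlap or len(oligo_b) < overlap:
        return False
    return oligo_a[-overlap:].upper() == reverse_complement(oligo_b[-overlap:])


def extended_template(oligo_a: str, oligo_b: str, overlap: int = 6) -> str:
    """Top strand obtained when oligo_a is extended along oligo_b."""
    if not three_prime_ends_anneal(oligo_a, oligo_b, overlap):
        raise ValueError("3' ends are not complementary; no extension expected")
    # The polymerase copies the part of oligo_b upstream of the overlap.
    return oligo_a.upper() + reverse_complement(oligo_b[:-overlap])


if __name__ == "__main__":
    a = "TTGACCGTAAGCTGA"  # made-up oligo on antibody 1
    b = "CCATGGTCATCAGCT"  # made-up oligo on antibody 2
    print(three_prime_ends_anneal(a, b))
    print(extended_template(a, b))
```

In a multiplex design, each antibody pair would be given its own six-base overlap so that only cognate pairs produce a template; the check above could be run across all pairs to flag unintended cross-annealing.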

Autonomous DNA Assembly

One of the unique features of DNA is its structural programmability. Repetitive DNA structures, which are autonomously formed from encoded DNA fragments, enable multivalency and multilabeling for improving binding and signaling, respectively. Antibody-modified wireframe DNA cubes have been reported for the targeted delivery of multiple copies of monomethyl auristatin E (MMAE), a tubulin inhibitor. MMAE was loaded onto an antibody by CuAAC and SPAAC, and the obtained antibody–drug conjugate was used for targeted drug delivery in vitro (Märcher et al. 2021). Immunostaining with signal amplification by exchange reaction (ImmunoSABER) was reported to achieve highly multiplexed signal amplification via DNA-barcoded antibodies and orthogonal DNA concatemers generated by primer exchange reaction. SABER offers independently programmable signal amplification


T. Ihara et al.

Fig. 16 Scheme of activity-dependent proximity ligation. a: Live cells are pulsed with a chemical probe, which labels only active POI. b: Probe-labeled POI is detected by incubating the cells with two primary antibodies directed to the POI and the tag. Two secondary antibody–oligonucleotide conjugates are then bound to the corresponding primary antibodies, which enables hybridization and ligation of two bridging complementary oligonucleotides only when the probe and POI are in proximity. c: After ligation, the signal is amplified by RCA and subsequent labeling with a fluorescent DNA probe. d: Subcellular and intercellular enzyme activity is visualized and quantified by fluorescence microscopy

without in situ enzymatic reactions. In one study, it demonstrated 5- to 180-fold signal amplification in diverse samples as well as simultaneous signal amplification for ten different proteins. This technique was also combined with expansion microscopy to enable rapid, multiplexed super-resolution tissue imaging (Saka et al. 2019). A multiplexed magnetofluorescent bioplatform has been used for the sensitive detection of SARS-CoV-2 viral RNA in the total RNA extracted from nasopharyngeal swabs of COVID-19-positive patients without nucleic acid amplification (Zayani et al. 2021).

Caged Nucleic Acids

The method of temporarily turning off a function of a particular molecule so that it can be restored by a specific stimulus is called caging, and the target molecule is called a caged molecule. Researchers can then restore the original function (i.e., uncage) of the caged biomolecule spatiotemporally, even in a living cell or


tissue, by providing the appropriate stimulus. This is now an indispensable technique in chemical biology. The following is an overview of recent reports on caged nucleic acids, classified by caged site.

DNA/RNA Backbone

Toehold-mediated strand exchange is the fundamental reaction underlying the formation of various DNA nanostructures as well as the DNA circuit, an autonomous chain reaction. Photocontrolled toehold formation has been achieved based on the photocleavage of 2-nitrobenzyl linker (PC linker)-embedded DNA hairpin precursor structures. UV light irradiation (λ = 365 nm) of solutions containing these DNA hairpin structures causes the cleavage of the PC linker, and pure 1:1 DNA duplexes with toehold structures are then easily formed. The resulting toehold structures have been used for subsequent toehold-mediated DNA branch migration reactions, e.g., DNA hybridization chain reactions (Huang et al. 2013) (Fig. 17). The amplified fluorescence imaging of miRNA and the fluorescent/electrochemical dual sensing of mRNA have been performed in living cells using uncaged toehold-mediated strand exchange reactions. Near-infrared (NIR) light-activated multiplex miRNA analyses have also been reported using upconversion nanoparticles (Zhao et al. 2021b). These strategies use strand exchange reactions triggered by PC linker cleavage. In one study, a modifying reagent for internucleotide phosphates was prepared. This reagent enabled affinity-based purification of caged linear DNA for light-controlled gene expression in mammalian cells (Teraoka et al. 2014) (Fig. 18). Small molecule control of morpholino antisense oligonucleotide function has been proposed by taking advantage of Staudinger reduction. Cyclic oligonucleotides are conformationally gated and do not block relevant gene expression until they are linearized by the application of an external trigger. Phosphine-triggered knockdown of gene expression has been demonstrated in zebrafish embryos (Darrah et al. 2021). A caged siRNA was prepared by attaching a single cholesterol to the 5′ terminus of the antisense strand RNA via a PC linker. Native siRNA with a phosphate at the 5′ terminus was released by light irradiation and was found to regulate both exogenous and endogenous gene expression in cells in a spatiotemporally specific manner (Yang et al. 2018). In another study, by the addition of input nucleic acids, a pair of the

Fig. 17 Principle of photocontrolled hidden-toehold activation. A 2-nitrobenzyl linker built into the backbone is cleaved by photoirradiation to expose a single-stranded toehold. The strand exchange reaction proceeds from the toehold


Fig. 18 Caging reaction and affinity purification of dsDNA with Bio-Bhc-diazo. (a) Caging of phosphate groups in the backbone of dsDNA with Bio-Bhc-diazo. (b) Affinity purification of Bio-Bhc-caged dsDNA and its photo-uncaging

hairpin DNAs carrying the Ru complex (a sensitizer) were opened and formed into a duplex by a catalytic hairpin assembly. Ru complexes on both sticky ends of the duplex functioned as photocatalysts, causing the uncaging of an anticancer drug (5-fluorouracil, 5-FU) or fluorescent molecules tethered to the end of complementary PNAs via a PC linker. That is, the nucleic acid input was amplified quadratically through two layers of catalytic processes, leading to the release of functional molecules (Kim et al. 2019). An athermal approach to mRNA enrichment from total RNA was reported using self-immolative thioester-linked nucleic acids (TENA). OligoT TENA has a six-atom backbone spacing, which is the same as that of natural RNA. This allowed TENA to hybridize with the polyA sequence on mRNAs. The neutral backbone of TENA and the hydrophobicity of the octanethiol end group made TENA insoluble in water and enabled efficient pull-down of EGFP mRNA. Moreover, self-immolative degradation of TENA upon exposure to nucleophilic buffer components (e.g., Tris, DTT) allowed recovery of mRNAs at ambient temperature (Mavila et al. 2022) (Fig. 19).
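Returning to the two-layer catalytic release scheme of Kim et al. (2019) described above, whose title refers to quadratic amplification, the following Python sketch shows why stacking two catalytic layers can give a signal that grows with the square of time. It is a toy kinetic model under stated assumptions, not the authors' analysis: the input is assumed to generate photocatalytic duplexes at a constant rate (`k_assembly`), each duplex is assumed to release molecules at a constant turnover rate (`k_turnover`), and substrate depletion is ignored. Both rate names and all numerical values are hypothetical.

```python
# Toy model (assumed constant rates, no substrate depletion):
#   d[catalyst]/dt = k_assembly              (layer 1: hairpin assembly)
#   d[product]/dt  = k_turnover * [catalyst] (layer 2: photocatalytic release)
# Integrating twice gives product(t) = 0.5 * k_assembly * k_turnover * t**2.


def released_closed_form(t: float, k_assembly: float, k_turnover: float) -> float:
    """Closed-form amount of released molecules at time t."""
    return 0.5 * k_assembly * k_turnover * t ** 2


def released_simulated(t_end: float, k_assembly: float, k_turnover: float,
                       steps: int = 10_000) -> float:
    """Forward-Euler integration of the same two-layer scheme."""
    dt = t_end / steps
    catalyst = 0.0
    product = 0.0
    for _ in range(steps):
        product += k_turnover * catalyst * dt  # layer 2 uses current catalyst
        catalyst += k_assembly * dt            # layer 1 keeps making catalyst
    return product


if __name__ == "__main__":
    for t in (10.0, 20.0, 40.0):
        print(t, released_closed_form(t, 0.2, 3.0), released_simulated(t, 0.2, 3.0))
```

Under these assumptions, doubling the reaction time quadruples the released amount, whereas a single catalytic layer would only double it.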

Nucleobase

Since the hybridization of nucleic acids is dominated by the formation of specific hydrogen bonds between complementary nucleobases, caging of nucleobases allows for direct control of the structure of DNA and RNA. Therefore, gene regulation using DNA fragments with caged nucleobases has been the subject of considerable research. Stimuli-responsive dimerization of transmembrane proteins, receptor


Fig. 19 Athermal, chemically triggered release of mRNA from TENA. (a) The chemical structure of TENA and its self-immolative degradation reaction in the presence of a nucleophilic species. (b) Traditional thermal (left) and TENA-based athermal (right) approaches for mRNA enrichment/release. The black circles represent the polyA tail of mRNA

tyrosine kinases (RTKs), has been controlled using a DNA conjugate. This conjugate consisted of two parts: an aptamer that binds to the extracellular domain of RTKs and a self-complementary linker sequence with cytosine caged by a PC group. Uncaging the conjugate by photoirradiation from outside the cell enabled hybridization of the conjugates bound to different RTKs, induced the RTKs to dimerize, and initiated phosphorylation cascades of intracellular kinases (Ueki et al. 2021). Exon skipping is a technique used to restore the reading frame within a gene, and it relies on a mutation-specific antisense oligonucleotide (ASO). The photoactivation of target genes was reported using modulation of the spliceosome machinery with short RNAs carrying photocaged uridines (off/on-type ASO). The direction of regulation has also been reversed by using another type of short RNA in which the PC group was introduced into the RNA backbone (on/off-type ASO) (Hemphill et al. 2015). The


RNA chaperone Hfq accelerates antisense pairing between noncoding RNAs (antisense) and their mRNA targets by a mechanism that is still unknown. Light-triggered RNA annealing catalyzed by Hfq has been studied using an RNA oligonucleotide with photocaged guanosine to elucidate the mechanism of Hfq as a chaperone. The results showed that the Hfq chaperone directly stabilizes the initiation of RNA base pairs (Panja et al. 2015).

Ribose

The 2′-OH groups of RNAs exhibit relatively high nucleophilicity due to their low pKa. Since they are present in every nucleotide unit of RNA, they are an effective RNA-specific target for caging. Indeed, the 2′-OH groups of RNA can be selectively acylated in aqueous buffers with activated acyl compounds in structure mapping experiments; this established technique is known as selective 2′-hydroxyl acylation and profiling experiment (SHAPE) (Spitale et al. 2015). External photocontrol over RNA function has also been achieved by postsynthetic acylation of 2′-OHs with photoprotecting groups. One-step introduction of these groups efficiently blocked hybridization, which was restored after light exposure. Polyacylation (termed cloaking) enables control over a hammerhead ribozyme, resulting in optical control of RNA catalytic function (Velema et al. 2018). Recently, the same research group has taken a DNA-tiling approach to mRNA research by using different acylation reagents. Their method, tiled RNA acylation at induced loops (TRAIL), exploits a pool of “protector” oligodeoxynucleotides to hybridize and block mRNA, combined with an “inducer” DNA that extrudes a predetermined site of RNA as a loop to be activated for acylation. Using TRAIL, an azidoacylimidazole reagent, which can be removed by Staudinger reduction, was employed for labeling and controlling RNA for multiple applications, both in vitro and in cells. These experiments included analysis of RNA-binding proteins, imaging mRNA in cells, and analysis and control of translation (Xiao et al. 2021) (Fig. 20).
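The tiling logic of TRAIL described above (protector DNAs covering the mRNA, plus an inducer DNA that pairs with both flanks of the target site so that the site bulges out as a loop) can be sketched in Python. This is an illustrative sketch only: the function names, the flank and tile lengths, and the example sequence are hypothetical and are not design rules from Xiao et al. (2021). It assumes simple Watson–Crick complementarity and ignores melting temperature and secondary structure.

```python
# Illustrative sketch (hypothetical parameters): design protector DNAs that
# tile an mRNA outside a chosen site, and an inducer DNA that pairs with the
# two flanks of that site so the site itself is left unpaired as a loop.

DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def dna_reverse_complement(rna: str) -> str:
    """DNA strand (5'->3') complementary to an RNA segment written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(rna.upper()))


def design_trail_oligos(mrna: str, loop_start: int, loop_len: int,
                        flank: int = 10, tile: int = 20):
    """Return (inducer, protectors) for a loop at mrna[loop_start:loop_start+loop_len]."""
    loop_end = loop_start + loop_len
    left_start = loop_start - flank
    right_end = loop_end + flank
    if left_start < 0 or right_end > len(mrna):
        raise ValueError("loop plus flanks must lie inside the mRNA")
    # Read 5'->3', the inducer first pairs with the right flank, then the
    # left flank; the loop bases between them have no partner.
    inducer = (dna_reverse_complement(mrna[loop_end:right_end])
               + dna_reverse_complement(mrna[left_start:loop_start]))
    protectors = []
    for seg_start, seg_end in ((0, left_start), (right_end, len(mrna))):
        for i in range(seg_start, seg_end, tile):
            # The last tile of a segment may be short; a real design would
            # likely merge or resize such tiles.
            protectors.append(dna_reverse_complement(mrna[i:min(i + tile, seg_end)]))
    return inducer, protectors


if __name__ == "__main__":
    mrna = "AUGC" * 15  # made-up 60-nt sequence
    inducer, protectors = design_trail_oligos(mrna, loop_start=25, loop_len=6,
                                              flank=8, tile=10)
    print(inducer)
    print(protectors)
```

Every nucleotide of the mRNA is then either covered by a protector, paired with the inducer, or part of the exposed loop, which is the only region left accessible to the acylating reagent in this simplified picture.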

Conclusions and Perspectives

There are three main reasons for the recent upsurge in research related to nucleic acid conjugates. Firstly, basic biological phenomena and techniques to be targeted have emerged, including RNAi, epigenetics, and nucleic acid medicine, among others. Secondly, functional nucleic acids such as ribozymes, DNAzymes, and aptamers have been added as new options for the nucleic acid part of conjugates, in addition to the conventionally used complementary strands that simply bind to the target sequences. Thirdly, the availability of useful bioorthogonal reactions such as AAC, Staudinger reactions, and IEDAC has increased. Each of these three factors is an independent variable when designing new nucleic acid conjugates, so the number of candidate conjugates has grown multiplicatively. This situation has brought about great diversity in the structure and mechanism of action of the conjugates and has synergistically accelerated research in this field. Although they are not the main focus of this review, nucleic acid conjugates have also


Fig. 20 Outline of the TRAIL technique for site-specific mRNA acylation and applications in mRNA labeling and regulation of its expression. mRNA (blue) is tiled with protector DNAs (black) to cover most of the mRNA, except for a reactive loop extruded by an inducer DNA (red). Exposed 2′-OH groups in the loop can then selectively react with an acylating reagent, such as NAI-N3, yielding a site-localized mRNA conjugate after one-step DNA digestion and purification. The azide group on the mRNA adduct can be further used for the addition of a fluorophore or biotin label by SPAAC, or the adduct can be removed by Staudinger reduction to switch on biological activity

been used in a number of applications related to theragnostics and therapeutics. In these cases, the conjugates can play a wide range of roles, including drug carrier, cell recognition and delivery agent, immune adjuvant, RNAi regulator, and reactive oxygen species generator, among others. With the spread of modern research techniques such as various fluorescent proteins, ultrahigh-performance LC-MSn, advanced microscopies (including fluorescence microscopy and cryo-electron microscopy) with single-molecule resolution, SELEX, and CRISPR/Cas, nucleic acid conjugates are poised to become involved in an even greater number of applications in the future.

References

Asanuma H, Akahane M, Niwa R, Kashida H, Kamiya Y (2015) Highly sensitive and robust linear probe for detection of mRNA in cells. Angew Chem Int Ed 54:4315–4319
Bagheri Y, Ali AA, Keshri P, Chambers J, Gershenson A, You M (2022) Imaging membrane order and dynamic interactions in living cells with a DNA zipper probe. Angew Chem Int Ed 61:e202112033


Büllmann SM, Kolmar T, Slawetzky P, Wald S, Jäschke A (2021) Optochemical control of transcription by the use of 7-deaza-adenosine-based diarylethenes. Chem Commun 57:6596–6599
Chai H, Miao P (2019) Bipedal DNA walker based electrochemical genosensing strategy. Anal Chem 91:4953–4957
Chai X, Fan Z, Yu M-M, Zhao J, Li L (2021) A redox-activatable DNA nanodevice for spatially-selective, AND-gated imaging of ATP and glutathione in mitochondria. Nano Lett 21:10047–10053
Chan Y-HM, van Lengerich B, Boxer SG (2009) Effects of linker sequences on vesicle fusion mediated by lipid-anchored DNA oligonucleotides. Proc Natl Acad Sci U S A 106:979–984
Chang Y, Xu S, Li Y, Hu W, Li H, Yuan R, Chai Y (2021) DNA three-way junction with multiple recognition regions mediated an unconfined DNA walker for electrochemical ultrasensitive detection of miRNA-182-5p. Anal Chem 93:12981–12986
Dai Y, Teng X, Li J (2022) Single-cell visualization of monogenic RNA G-quadruplex and occupied G-quadruplex ratio through a module-assembled multifunctional probes assay (MAMPA). Angew Chem Int Ed 61:e202111132
Darrah K, Wesalo J, Lukasak B, Tsang M, Chen JK, Deiters A (2021) Small molecule control of morpholino antisense oligonucleotide function through Staudinger reduction. J Am Chem Soc 143:18665–18671
Dauphin-Ducharme P, Yang K, Arroyo-Currás N, Ploense KL, Zhang Y, Gerson J, Kurnik M, Kippin TE, Stojanovic MN, Plaxco KW (2019) Electrochemical aptamer-based sensors for improved therapeutics drug monitoring and high-precision, feedback-controlled drug delivery. ACS Sens 4:2832–2837
Debets MF, van Berkel SS, Dommerholt J, Dirks AJ, Rutjes FPJ, van Delft FL (2011) Bioconjugation with strained alkenes and alkynes. Acc Chem Res 44:805–811
Deng Z, Gao P, Liu H, He Y, Zhong S, Yang Y (2021) Cell-surface-anchored DNA sensors for simultaneously monitoring extracellular sodium and potassium levels. Anal Chem 93:16432–16438
Dommerholt J, van Rooijen O, Borrmann A, Guerra CF, Bickelhaupt FM, van Delft FL (2014) Highly accelerated inverse electron-demand cycloaddition of electron-deficient azides with aliphatic cyclooctynes. Nat Commun 5:5378
Ebrahimi SB, Samanta D, Partridge BE, Kusmierz CD, Chen HF, Grigorescu AA, Chávez JL, Mirau PA, Mirkin CA (2021) Programming fluorogenic DNA probes for rapid detection of steroids. Angew Chem Int Ed 60:15260–15265
Engelen W, Zhu K, Subedi N, Idili A, Ricci F, Tel J, Merkx M (2020) Programmable bivalent peptide–DNA locks for pH-based control of antibody activity. ACS Cent Sci 6:22–31
Engelhard DM, Nowack J, Clever GH (2017) Copper-induced topology switching and thrombin inhibition with telomeric DNA G-quadruplexes. Angew Chem Int Ed 56:11640–11644
Fujii S, Kamiya K, Osaki T, Misawa N, Hayakawa M, Takeuchi S (2018) Purification-free microRNA detection by using magnetically immobilized nanopores on liposome membrane. Anal Chem 90:10217–10222
Futamura A, Uemura A, Imoto T, Kitamura Y, Matsuura H, Wang C-X, Ichihashi T, Sato Y, Teramae N, Nishizawa S, Ihara T (2013) Rational design for cooperative recognition of specific nucleobases using β-cyclodextrin-modified DNAs and fluorescent ligands on DNA and RNA scaffolds. Chem Eur J 19:10526–10535
Gong H, Holcomb I, Ooi A, Wang X, Majonis D, Unger MA, Ramakrishnan R (2016) Simple method to prepare oligonucleotide-conjugated antibodies and its application in multiplex protein detection in single cells. Bioconjug Chem 27:217–225
Hayashi G, Yanase M, Takeda K, Sakakibara D, Sakamoto R, Wang DO, Okamoto A (2015) Hybridization-sensitive fluorescent oligonucleotide probe conjugated with a bulky module for compartment-specific mRNA monitoring in a living cell. Bioconjug Chem 26:412–417


Hemphill J, Liu Q, Uprety R, Samanta S, Tsang M, Juliano RL, Deiters A (2015) Conditional control of alternative splicing through light-triggered splice-switching oligonucleotides. J Am Chem Soc 137:3656–3662
Hennessy J, McGorman B, Molphy Z, Farrell NP, Singleton D, Brown T, Kellett A (2022) A click chemistry approach to targeted DNA crosslinking with cis-platinum(II)-modified triplex-forming oligonucleotides. Angew Chem Int Ed 61:e202110455
Huang F, You M, Han D, Xiong X, Liang H, Tan W (2013) DNA branch migration reactions through photocontrollable toehold formation. J Am Chem Soc 135:7963–7973
Idili A, Parolo C, Alvarez-Diduk R, Merkoçi A (2021) Rapid and efficient detection of the SARS-CoV-2 spike protein using an electrochemical aptamer-based sensor. ACS Sens 6:3093–3101
Ihara T, Nakayama M, Murata M, Nakano K, Maeda M (1997) Gene sensor using ferrocenyl oligonucleotide. Chem Commun:1609–1610
Ihara T, Uemura A, Futamura A, Shimizu M, Baba N, Nishizawa S, Teramae N, Jyo A (2009) Cooperative DNA probing using a β-cyclodextrin–DNA conjugate and a nucleobase-specific fluorescent ligand. J Am Chem Soc 131:1386–1387
Ihara T, Wasano T, Nakatake R, Arslan P, Futamura A, Jyo A (2011) Electrochemical signal modulation in homogeneous solutions using the formation of an inclusion complex between ferrocene and β-cyclodextrin on a DNA scaffold. Chem Commun 47:12388–12390
Ihara T, Ohura H, Shirahama C, Furuzono T, Shimada H, Matsuura H, Kitamura Y (2015) Metal ion-directed dynamic splicing of DNA through global conformational change by intramolecular complexation. Nat Commun 6:6640
Kamiya Y, Asanuma H (2014) Light-driven DNA nanomachine with a photoresponsive molecular engine. Acc Chem Res 47:1663–1672
Kazane SA, Sok D, Cho EH, Uson ML, Kuhn P, Schultz PG, Smider VV (2012) Site-specific DNA-antibody conjugates for specific and sensitive immuno-PCR. Proc Natl Acad Sci U S A 109:3731–3736
Kim KT, Angerani S, Chang D, Winssinger N (2019) Coupling of DNA circuit and templated reactions for quadratic amplification and release of functional molecules. J Am Chem Soc 141:16288–16295
Kitamura Y, Ihara T, Tsujimura Y, Osawa Y, Sasahara D, Yamamoto M, Okada K, Tazaki M, Jyo A (2008) Template-directed formation of luminescent lanthanide complexes: Versatile tools for colorimetric identification of single nucleotide polymorphism. J Inorg Biochem 102:1921–1931
Kitamura Y, Mishio K, Arslan P, Ikeda B, Imoto C, Katsuda Y, Ihara T (2020) Electrochemical molecular beacon for nucleic acid sensing in a homogeneous solution. Anal Sci 36:959–964
Kitamura Y, Yoshimura K, Kuramoto R, Katsuda Y, Ihara T (2021) Catalytic amplification of electrochemical signal in homogeneous solution using an entropy-driven DNA circuit. Anal Sci 37:533–537
Krasheninina OA, Thaler J, Erlacher MD, Micura R (2021) Amine-to-azide conversion on native RNA via metal-free diazotransfer opens new avenues for RNA manipulations. Angew Chem Int Ed 60:6970–6974
Li G, Montgomery JE, Eckert MA, Chang JW, Tienda SM, Lengyel E, Moellering RE (2017) An activity-dependent proximity ligation platform for spatially resolved quantification of active enzymes in single cells. Nat Commun 8:1775
Li X, Figg AC, Wang R, Jiang Y, Lyu Y, Sun H, Liu Y, Wang Y, Teng I-T, Hou W, Cai R, Cui C, Li L, Pan X, Sumerlin BS, Tan W (2018a) Cross-linked aptamer–lipid micelles for excellent stability and specificity in target-cell recognition. Angew Chem Int Ed 57:11589–11593
Li Z, Zhou X, Li L, Liu S, Wang C, Li L, Yu C, Su X (2018b) Probing DNA hybridization equilibrium by cationic conjugated polymer for highly selective detection and imaging of single-nucleotide mutation. Anal Chem 90:6804–6810
Li J, Weng X, Mo F, Han M, Li H (2020) Superparamagnetic nanostructures coupled with an entropy-driven DNA circuit for elegant and robust photoelectrochemical biosensing. Anal Chem 92:15145–15151


Li C, Chen H, Yang X, Wang K, Liu J (2021a) An ion transport switch based on light-responsive conformation-dependent G-quadruplex transmembrane channels. 57:8214–8217
Li S, Li H, Li X, Zhu M, Li H, Xia F (2021b) Hybridization chain reaction-amplified electrochemical DNA-based sensors enable calibration-free measurements of nucleic acids directly in whole blood. Anal Chem 93:8354–8361
Liu P, Sun S, Guo X, Yang X, Huang J, Wang K, Wang Q, Liu J, He L (2015) Competitive host–guest interaction between β-cyclodextrin polymer and pyrene-labeled probes for fluorescence analyses. Anal Chem 87:2665–2671
Liu G, Huang S, Liu X, Chen W, Ma X, Cao S, Wang L, Chen L, Yang H (2021) DNA-based artificial signaling system mimicking the dimerization of receptors for signal transduction and amplification. Anal Chem 93:13807–13814
Lubbe AS, Liu Q, Smith SJ, de Vries JW, Kistemaker JCM, de Vries AH, Faustino I, Meng Z, Szymanski W, Herrmann A, Feringa BL (2018) Photoswitching of DNA hybridization using a molecular motor. J Am Chem Soc 140:5069–5076
Märcher A, Nijenhuis MAD, Gothelf KV (2021) A wireframe DNA cube: Antibody conjugate for targeted delivery of multiple copies of monomethyl auristatin E. Angew Chem Int Ed 60:21691–21696
Mavila S, Culver HR, Anderson AJ, Prieto TR, Bowman CN (2022) Athermal, chemically triggered release of RNA from thioester nucleic acids. Angew Chem Int Ed 61:e202110741
Miao P, Jiang Y, Zhang T, Huang Y, Tang Y (2018a) Electrochemical sensing of attomolar miRNA combining cascade strand displacement polymerization and reductant-mediated amplification. Chem Commun 54:7366–7369
Miao P, Zhang T, Xu J, Tang Y (2018b) Electrochemical detection of miRNA combining T7 exonuclease-assisted cascade signal amplification and DNA-templated copper nanoparticles. Anal Chem 90:11154–11160
Müller P, Sahlbach M, Gasper S, Mayer G, Müller J, Pötzsch B, Heckel A (2021) Controlling coagulation in blood with red light. Angew Chem Int Ed 60:22441–22446
Ogasawara S (2018) Transcription driven by reversible photocontrol of hyperstable G-quadruplexes. ACS Synth Biol 7:2507–2513
Onizuka K, Nagatsugi F, Ito Y, Abe H (2014) Automatic pseudorotaxane formation targeting on nucleic acids using a pair of reactive oligodeoxynucleotides. J Am Chem Soc 136:7201–7204
Panja S, Paul R, Greenberg MM, Woodson SA (2015) Light-triggered RNA annealing by RNA chaperone. Angew Chem Int Ed 54:7281–7284
Qiu L, Wimmers F, Weiden J, Heus HA, Tel J, Figdor CG (2017) A membrane-anchored aptamer sensor for probing IFNγ secretion by single cells. Chem Commun 53:8066–8069
Redman RL, Krauss IJ (2021) Directed evolution of 2′-fluoro-modified, RNA-supported carbohydrate clusters that bind tightly to HIV antibody 2G12. J Am Chem Soc 143:8565–8571
Ro JJ, Lee HJ, Kim BH (2018) PyA-cluster system for the detection and imaging of miRNAs in living cells through double-three-way junction formation. Chem Commun 54:7471–7474
Saka SK, Wang Y, Kishi JY, Zhu A, Zeng Y, Xie W, Kirli K, Yapp C, Cicconet M, Beliveau BJ, Lapan SW, Yin S, Lin M, Boyden ES, Kaeser PS, Pihan G, Church GM, Yin P (2019) ImmunoSABER enables highly multiplexed and amplified protein imaging in tissues. Nat Biotechnol 37:1080–1090
Saxon E, Armstrong JI, Bertozzi CR (2000) A “traceless” Staudinger ligation for the chemoselective synthesis of amide bonds. Org Lett 2:2141–2143
Shi H, Chen Y, Li Y, Chen L, Wang H, Yang C, Ding L, Ju H (2021) Hierarchical fluorescence imaging strategy for assessment of the sialylation level of lipid raft on the cell membrane. Anal Chem 93:14643–14650
Simonsson L, Jönsson P, Stengel G, Höök F (2010) Site-specific DNA-controlled fusion of single lipid vesicles to supported lipid bilayers. ChemPhysChem 11:1011–1017
Skovsgaard MB, Mortensen MR, Palmfeldt J, Gothelf KV (2019) Aptamer-directed conjugation of DNA to therapeutic antibodies. Bioconjug Chem 30:2127–2135


Spitale RC, Flynn RA, Zhan QC, Crisalli P, Lee B, Jung J-W, Kuchelmeister HY, Batista PJ, Torre EA, Kool ET, Chang HY (2015) Structural imprints in vivo decode RNA regulatory mechanisms. Nature 519:486–490
Taniguchi Y, Kawaguchi R, Sasaki S (2011) Adenosine-1,3-diazaphenoxazine derivative for selective base pair formation with 8-oxo-2′-deoxyguanosine in DNA. J Am Chem Soc 133:7272–7275
Teraoka A, Murakoshi K, Fukamauchi K, Suzuki AZ, Watanabe S, Furuta T (2014) Preparation and affinity-based purification of caged linear DNA for light-controlled gene expression in mammalian cells. Chem Commun 50:664–666
Ueki R, Hayashi S, Tsunoda M, Akiyama M, Liu H, Ueno T, Urano Y, Sando S (2021) Nongenetic control of receptor signaling dynamics using a DNA-based optochemical tool. Chem Commun 57:5969–5972
van Buggenum JAGL, Gerlach JP, Eising S, Schoonen L, van Eiji RAPM, Tanis SEJ, Hogeweg M, Hubner NC, van Hest JC, Bonger KM, Mulder KW (2016) A covalent and cleavable antibody–DNA conjugation strategy for sensitive protein detection via immuno-PCR. Sci Rep 6:22675
Velema WA, Kietrys AM, Kool ET (2018) RNA control by photoreversible acylation. J Am Chem Soc 140:3491–3496
Wan Y, Wang H, Ji J, Kang K, Yang M, Huang Y, Su Y, Ma K, Zhu L, Deng S (2020) Zipping DNA tetrahedral hyperlink for ultrasensitive electrochemical microRNA detection. Anal Chem 92:15137–15144
Wilson DL, Beharry AA, Srivastava A, O’Connor TR, Kool ET (2018) Fluorescence probes for ALKBH2 allow the measurement of DNA alkylation repair and drug resistance responses. Angew Chem Int Ed 57:12896–12900
Xiao L, Jun YW, Kool ET (2021) DNA tiling enables precise acylation-based labeling and control of mRNA. Angew Chem Int Ed 60:26798–26805
Xu J, Zhang X, Yan C, Qin P, Yao L, Wang Q, Chen W (2022) Trigging isothermal circular amplification-based tuning of rigorous fluorescence quenching into complete restoration on a multivalent aptamer probe enables ultrasensitive detection of Salmonella. Anal Chem 94:1357–1364
Xue Y, Wang Y, Feng S, Yan M, Huang J, Yang X (2021) Label-free and sensitive electrochemical biosensor for amplification detection of target nucleic acids based on transduction hairpins and three-leg DNAzyme walkers. Anal Chem 93:8962–8970
Yamamoto J, Ebisuda S, Kong L, Yamago H, Iwai S (2017) Post-synthetic modification of 3′ terminus of RNA with propargylamine: A versatile scaffold for RNA labeling through copper-catalyzed azide-alkyne cycloaddition. Chem Lett 46:767–770
Yang J, Chen C, Tang X (2018) Cholesterol-modified caged siRNAs for photoregulating exogeneous and endogenous gene expression. Bioconjug Chem 29:1010–1015
Yeap CSY, Chaibun T, Lee SY, Zhao B, Jan Y, La-o-vorakiat C, Surareungchai W, Song S, Lertanantawong B (2021) Ultrasensitive pathogen detection with a rolling circle amplification-empowered multiplex electrochemical DNA sensor. Chem Commun 57:12155–12158
Ying Z-M, Xiao H-Y, Tang H, Yu R-Q, Jiang J-H (2018) Light-up RNA aptamer enabled label-free protein detection via a proximity induced transcription assay. Chem Commun 54:8877–8880
Yoshina-Ishii C, Miller GP, Kraft ML, Kool ET, Boxer SG (2005) General method for modification of liposomes for encoded assembly on supported bilayers. J Am Chem Soc 127:1356–1357
Zayani R, Rezig D, Fares W, Marrakchi M, Essafi M, Raouafi N (2021) Multiplexed magnetofluorescent bioplatform for the sensitive detection of SARS-CoV-2 viral RNA without nucleic acid amplification. Anal Chem 93:11225–11232
Zeng S, Liu D, Li C, Yu F, Fan L, Lei C, Huang Y, Nie Z, Yao S (2018) Cell-surface-anchored ratiometric DNA tweezer for real-time monitoring of extracellular and apoplastic pH. Anal Chem 90:13459–13466


Zhang Z, Yang Y, Pincet F, Llaguno MC, Lin C (2017) Placing and shaping liposomes with reconfigurable DNA nanocages. Nat Chem 9:653–659
Zhang X, Yang Z, Chang Y, Quin M, Yuan R, Chai Y (2018) Novel 2D-DNA-nanoprobe-mediated enzyme-free-target-recycling amplification for the ultrasensitive electrochemical detection of microRNA. Anal Chem 90:9538–9544
Zhang L, Zhang X, Feng P, Han Q, Liu W, Lu Y, Song C, Li F (2020) Photodriven regeneration of G-quadruplex aptasensor for sensitively detecting thrombin. Anal Chem 92:7419–7424
Zhang R, Wu J, Ao H, Fu J, Qiao B, Wu Q, Ju H (2021a) A rolling circle-amplified G-quadruplex/hemin DNAzyme for chemiluminescence immunoassay of the SARS-CoV-2 protein. Anal Chem 93:9933–9938
Zhang X-L, Yin Y, Du S-M, Kong L-Q, Chai Y-Q, Li Z-H, Yuan R (2021b) Dual 3D DNA nanomachine-mediated catalytic hairpin assembly for ultrasensitive detection of microRNA. Anal Chem 93:13952–13959
Zhao W, Schafer S, Choi J, Yamanaka YJ, Lombardi ML, Bose S, Carlson AL, Phillips JA, Teo W, Droujinine IA, Cui CH, Jain RK, Lammerding J, Love JL, Lin CP, Sarkar D, Karnik R, Karp JM (2011) Cell-surface sensors for real-time probing of cellular environments. Nat Nanotechnol 6:524–531
Zhao D, Chang D, Zhang Q, Chang Y, Liu B, Sun C, Li Z, Dong C, Liu M, Li Y (2021a) Rapid and specific imaging of extracellular signaling molecule adenosine triphosphate with a self-phosphorylating DNAzyme. J Am Chem Soc 143:15084–15090
Zhao T, Gao Y, Wang J, Cui Y, Niu S, Xu S, Lou X (2021b) From passive signal output to intelligent response: “On-demand” precise imaging controlled by near-infrared light. Anal Chem 93:12329–12336

Molecular Beacons With and Without Quenchers

53

SueJin Lee and Byeang Hyean Kim

Contents

Introduction
Molecular Beacons
  Structure of Molecular Beacons
  Mechanism and Principles of Molecular Beacons
  Advantages and Limitations of Molecular Beacons
  Modifications of Molecular Beacons
  Applications of Molecular Beacons
Quencher-Free Molecular Beacons
  Mono-labeled Quencher-Free Molecular Beacons
  Dual-Labeled Quencher-Free Molecular Beacons
  Applications of Quencher-Free Molecular Beacons
Conclusion
References

Abstract

Modified nucleic acids have a wide range of applications in many areas of biochemistry. In particular, fluorescence-based nucleic acid systems have been studied extensively for their implementation in molecular biology as platforms for disease diagnosis. A hybridization probe is a fluorescent oligonucleotide used in DNA analysis, operating through sequence-specific complementary binding of a short synthetic oligonucleotide containing a fluorescent tag. Such fluorescent oligonucleotides play important roles in single-nucleotide polymorphism (SNP) typing, allowing both quantitative and qualitative analyses. Among them, stem–loop oligonucleotide probes have been developed to improve the specificity and selectivity toward target DNA. A molecular beacon (MB), a representative oligonucleotide probe having a stem–loop structure containing fluorescent and quencher units, is a probe used in biomolecular recognition. MB-based assays are fast, simple, and inexpensive, and they enable real-time monitoring of nucleic acid responses in vivo and in vitro. Modifications of the structures and functions of MBs can lead to improved performance. For example, quencher-free molecular beacons (QF-MBs) are MBs in which the quenching agent has been removed. Despite the absence of a quencher, QF-MBs can also identify specific target DNAs with high selectivity and sensitivity. MB and QF-MB probes have been applied widely in various fields, including SNP typing, monitoring of polymerase chain reactions (PCR), real-time detection of DNA–RNA hybridization in living cells, DNA mutation analysis, and disease diagnosis, including point-of-care (POC) testing.

S. Lee · B. H. Kim (*), Bioneer, Daejeon, Republic of Korea
© Springer Nature Singapore Pte Ltd. 2023. N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_59

Keywords

Molecular beacons · Quencher-free molecular beacons · Fluorescent oligonucleotides · Hybridization probes

Introduction

Nucleic acids are the most important biopolymers used by living organisms to transfer genetic information; they also play significant roles in controlling various biological functions in the life cycle (Podder et al. 2021). Deoxyribonucleic acid (DNA) transmits genetic information for the development, functioning, growth, and reproduction of organisms and viruses; ribonucleic acid (RNA) is responsible for coding, decoding, regulation, and expression of genes. These are extremely important components of life. The complete instructions for generating a human are encoded in the DNA present in our cells; the human genome comprises roughly three billion base pairs (BPs) of DNA (Jackson et al. 2018). A mutation is any heritable change to the DNA sequence, where heritable refers to both somatic cell division and germline inheritance. Many human diseases result from genetic mutations, including DNA damage in the nucleus, errors during DNA replication, and some defects caused by chemical and environmental factors. The ability to analyze and diagnose the causes of these diseases at the genetic level would enable diagnoses to be made before or during the early stages of onset, thereby helping to prevent or treat disease progression. Among the many possible gene mutations, single-nucleotide polymorphisms (SNPs) are the most common; in humans, they account for 80–90% of the differences between the genomes of two individuals (Gerasimova et al. 2010). SNPs can change both the structures and functions of their encoded proteins, and thus, they are related directly to the physiological properties of living systems. SNPs are associated with many common genetic disorders, including sickle cell anemia and muscular dystrophy; they may also serve as genetic markers. These nucleotide variations within the genome are, in some cases, linked to the variable responses that patients can have toward drugs. Reliable methods for the rapid screening of SNPs would facilitate the study and identification of disease-causing genes (Ryu et al. 2007).

The modification of nucleic acids and oligonucleotides is a major aspect of biochemical and chemical research. In particular, modified nucleic acid-based biosensors are receiving a great deal of attention for use as molecular probes because of their ease of chemical synthesis and functional modification, the specificity of their base pairing, and the predictability of their intermolecular and/or intramolecular interactions (Huang et al. 2015). Diagnoses of SNPs have traditionally been based on various forms of enzymatic reactions, including sequencing, the polymerase chain reaction (PCR), and microarray techniques (Chen et al. 2020). The ideal way to detect an SNP would be to mix a DNA or RNA target with a probe and obtain a measurable signal directly. Watson–Crick hybridization of complementary sequences is the process underlying molecular recognition of nucleotides in vivo, and it forms the basis for using probes to identify and detect specific genes. This hybridization technique uses a specifically labeled probe, a small single-stranded (ss) RNA or DNA fragment capable of recognizing the complementary sequence of a target RNA or DNA (Goel et al. 2005). Conventional methods for the labeling of fluorescent probes can necessitate the removal of unbound probes, requiring additional washing steps; they can also disrupt equilibria during nucleic acid hybridization. In addition, radiolabeled probes can be toxic because of their radioactive components. Accordingly, conventional fluorescent probes and radiolabeled probes cannot be used to monitor the real-time amplification of DNA during the PCR or during in vivo DNA synthesis. Hybridization techniques can, however, compensate for these shortcomings, allowing oligonucleotide probes to be used in the dynamic real-time detection of nucleic acid amplification both in vivo and in vitro.
Fluorescent probes for biological use require high reactivity and chemoselectivity, and they must function in a way that provides noticeable images against a dark background. Most fluorescent nucleic acid probes have been synthesized from oligonucleotides by attaching both a fluorophore and a quencher, or through covalent attachment of multiple fluorophores. In some cases, the modified nucleic acid itself acts as a fluorophore in the oligonucleotide (Podder et al. 2021). Improved performance can result when the probes feature a variety of fluorophores. A good fluorophore should possess high photostability, bright fluorescence, and a high quantum yield, and it should perform well over a range of excitation and emission wavelengths. Fluorophores emit their signal through several mechanisms, including photoinduced electron transfer (PET), internal charge transfer (ICT), fluorescence resonance energy transfer (FRET), and aggregation-induced emission (AIE). In particular, molecular beacons (MBs), the subject of this chapter, are single-stranded oligonucleotide probes having a stem–loop hairpin structure, and their fluorophore emissions occur through FRET (Tyagi and Kramer 1996). Although conventional single-stranded oligonucleotide probes can find their complementary strands in the presence of an excess of other nucleic acids, they have the disadvantage that hybridization causes only small measurable changes in the physical properties of the nucleic acids. Measuring the degree of hybridization therefore requires labeling the oligonucleotide probe, immobilizing the target molecule on a solid surface, and then removing the unbound probe, and the resulting sensitivity is low. In contrast, MB probes, and especially quencher-free molecular beacons (QF-MBs), not only exhibit high sensitivity and selectivity but also can be used as efficient probes for real-time measurements, including in living cells. In this chapter, we discuss the characteristics, functions, and mechanisms of MB probes, including (i) the various types of modifications that can improve the MB performance, (ii) the properties of QF-MBs, which feature only a fluorophore (i.e., no quenching agent) and operate through the quenching effects of neighboring bases, and (iii) the applications of MB probes.

Molecular Beacons

MBs are oligonucleotide hybridization probes that are generally used in the fluorimetric analysis of nucleic acids. Since they were first reported by Tyagi and Kramer (1996), MBs have become very popular probes for analysis and for related fields of molecular sensing, including the detection of PCR products, mutational analysis, and medical diagnostics (Sokol et al. 1998; Song et al. 2009; Tan et al. 2014). As presented in Fig. 1a, an MB is a single-stranded oligonucleotide probe possessing a special hairpin (stem-and-loop) structure that is doubly labeled with a fluorophore and a quencher at its 5′- and 3′-ends, respectively (Huang et al. 2015; Tyagi et al. 1998). In the absence of a target, the fluorescence of the fluorophore at the 5′-end is quenched by the quencher; that is, when the fluorophore and quencher are positioned close together in the hairpin structure, efficient intramolecular energy transfer occurs between them. In contrast, a target sequence complementary to the loop of the MB will induce hybridization, with the MB undergoing a conformational transformation from a closed to an open structure, leading to separation of the fluorophore and quencher units; consequently, a fluorescent signal will appear (Fig. 1b) (Tyagi and Kramer 1996; Venkatesan et al. 2008). The temperature range over which MBs can distinguish mismatches is wider than that of unstructured probes, because the stem–loop structure stabilizes the probe when dissociated from the analyte. Furthermore, MBs can be simple to operate, with high sensitivity in their specific sequence-selective recognition of oligonucleotides.

Fig. 1 (a) Chemical structure of a conventional MB. (b) Mechanism of operation of a conventional MB. (Reprinted from Li et al. (2008), copyright 2008, with permission from Elsevier)

Structure of Molecular Beacons

Figure 1 shows that an MB consists of a loop, a stem, and a signal reporter. The loop portion is the probe sequence and the real determinant of the MB’s specificity. The length of the probe sequence should be designed in consideration of the annealing temperature used in PCR (Broude 2002). The melting temperature of the probe–target hybrid is affected by the guanine and cytosine (GC) content, the ionic concentration, and the probe concentration. The melting temperature can be predicted using a commercial software package (e.g., Oligo 6.0) (Zadeh et al. 2011; Zuker 2003). The length of the probe sequence generally ranges from 15 to 30 nucleotides; it should not form any secondary structure (Tyagi et al. 1998; Kostrikis et al. 1998; Zheng et al. 2015). Increasing the probe length can improve the affinity but, at the same time, decrease the specificity. The stem portion is formed by the annealing of two complementary arm sequences positioned on either side of the probe sequence; these arm sequences are not related to the target sequence (Tyagi and Kramer 1996). While the MB maintains its closed structure in the absence of a target, its stem portion usually features the Watson–Crick hydrogen bonding of natural DNA base pairs (Zheng et al. 2015). The stem sequence should have a melting temperature that is 7–10 °C higher than the detection temperature. Increasing the GC content can increase the stability of the stem. An MB has maximum stability when it consists of a 15-to-25-base sequence with 5 to 7 base pairs in the stem. If the MB has a relatively short stem, it can undergo faster hybridization, but it gives a lower signal-to-background ratio. Furthermore, the length of the stem of an MB can affect the difference in melting temperatures between perfectly complementary and mismatched duplexes (Goel et al. 2005; Drake and Tan 2004). The signal reporter generally consists of two elements: a fluorophore and a quencher.
Typically, the fluorophore is attached at the 5′-end and the quencher at the 3′-end. Many kinds of fluorophore dyes have been tested for their efficiency in MBs, including 5-((2-aminoethyl)amino)-naphthalene-1-sulfonic acid (EDANS), fluorescein (FAM), tetrachloro-6-carboxyfluorescein (Tet), hexachloro-6-carboxyfluorescein (Hex), tetramethyl rhodamine (TAMRA), and 5-carboxyrhodamine-X (ROX). Because of the flexibility in the use of reporter dyes, the applications of MBs can be expanded to multiplex detection reactions, where multiple targets can be distinguished in the same solution. The proper selection of the fluorophore is one of the most important aspects affecting the signal-to-background ratio (Goel et al. 2005). The capture and transfer of energy from an excited fluorophore is known as quenching; the substances involved in this process are termed quenchers. When an MB exists in its closed hairpin form in the absence of the target, the quencher prevents the fluorophore from emitting light. 4-(4′-Dimethylaminophenylazo)benzoic acid (DABCYL) is a nonfluorescent chromophore commonly used as a quencher. Because of its neutral and hydrophobic structure, DABCYL can be used as a universal quencher for a variety of fluorophores. Although it quenches fluorescein optimally, its quenching efficiency decreases for dyes emitting at longer wavelengths. The use of metal ions has made it possible to employ various other types of highly efficient quenchers, including gold nanoparticles (AuNPs), single-walled carbon nanotubes (SWNTs), and graphene oxide (GO) (Zheng et al. 2015; Tsourkas et al. 2001; Dubertret et al. 2001; Yang et al. 2008; Lu et al. 2010; Yeh et al. 2010; Tan et al. 2000).

Normally, MBs can be synthesized as soluble or glass-bound probes. Although the two generally have similar properties, glass-bound probes have the advantage that they can be harnessed in nucleic acid screening in the form of controlled pore glass (CPG)-bound beacons, where hybridization with the target DNA or RNA generates fluorescent beads that can be isolated and analyzed (Tyagi et al. 1998; Brown et al. 2000). Typically, an MB probe is synthesized from a modified oligonucleotide having a primary amino group at the 3′-end and a trityl-protected sulfhydryl group at the 5′-end. The sulfhydryl group is covalently attached to the 5′-phosphate through a (CH2)6 spacer; the amino group is linked to the 3′-hydroxyl moiety through a (CH2)7 spacer. Using two consecutive coupling reactions, the quencher and fluorophore are attached to their respective ends of the stem. The static quenching efficiencies of different pairs of fluorophores and quenchers should be considered in the quest for improved sensitivity. After first coupling the quencher, the MB probe is typically purified through high-performance liquid chromatography (HPLC) to remove any unreacted quencher units. The protective trityl group is then removed from the sulfhydryl group of the purified oligonucleotide, and the fluorophore is coupled to the thiol. The oligonucleotide is then purified again, typically through gel exclusion chromatography and HPLC (Goel et al. 2005; Fang et al. 1999). The synthesized MB can be characterized using UV spectroscopy and mass spectrometry. Such MB probes can also be prepared in bound form, as biotinylated or CPG-bound probes.
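The sequence-level design guidelines quoted in this section (arm sequences that are complementary to each other, 5 to 7 base pairs in the stem, and a probe sequence of 15 to 30 nucleotides) can be checked mechanically. The following is a minimal sketch in Python; the function names and the example sequence are hypothetical and are not taken from the cited work. It does not predict melting temperatures or test the loop for secondary structure, for which dedicated thermodynamic software is still required.

```python
# Illustrative check of the stem/loop design ranges quoted in the text.
# All names and the example sequence are hypothetical.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def check_beacon(arm5: str, loop: str, arm3: str) -> dict:
    """Check a 5'-arm / loop / 3'-arm layout against the quoted ranges."""
    return {
        # The two arms must anneal to each other to form the stem.
        "arms_complementary": reverse_complement(arm5) == arm3.upper(),
        # 5 to 7 base pairs in the stem.
        "stem_length_ok": 5 <= len(arm5) <= 7,
        # Probe (loop) sequence of 15 to 30 nucleotides.
        "loop_length_ok": 15 <= len(loop) <= 30,
        # GC content influences stem and probe-target stability.
        "stem_gc": gc_fraction(arm5),
        "loop_gc": gc_fraction(loop),
    }


# Hypothetical beacon with a 6-bp stem and a 20-nt loop.
report = check_beacon("GCGAGC", "TTACCGATTGCATGGCTAAC", "GCTCGC")
print(report)
```

Such a check is only a first filter; whether the stem melts 7–10 °C above the detection temperature depends on the buffer and must be evaluated separately.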

Mechanism and Principles of Molecular Beacons

The leading mechanism of operation of an MB probe is static (contact) quenching, which occurs when the distance between the dyes is so short that electronic excitation energy is transferred nonradiatively from the fluorophore to the quencher. In the absence of a complementary target, the MB forms the hairpin structure. Because of the proximity of the fluorophore and quencher in this state, they share electrons and form a nonfluorescent complex that absorbs light energy and converts it into heat. In other words, most of the absorbed energy is dissipated as heat and only a small amount is emitted as light. One of the attractive features of contact quenching is that all fluorophores are quenched equally well, irrespective of whether their emission spectra overlap with the absorption spectrum of the quencher, which is one of the key conditions for FRET (Goel et al. 2005; Marras et al. 2002). FRET is a mechanism of energy transfer between two chromophores: a donor chromophore in an electronically excited state transfers energy to an acceptor molecule through nonradiative dipole–dipole interactions, that is, without the appearance of a photon. FRET requires two conditions: (i) the fluorophore and quencher should interact within a distance of 20–100 Å and (ii) there should be significant overlap between the emission spectrum of the fluorophore and the absorption spectrum of the quencher. In general, the excited-state energy is transferred from the initially excited donor to the acceptor, with the donor emitting at wavelengths that overlap with the absorption spectrum of the acceptor. The rate of energy transfer depends on several factors, including the degree of spectral overlap of the emission spectrum of the donor with the absorption spectrum of the acceptor, the relative orientation of the donor and acceptor molecules, and the quantum yield of the donor (Han et al. 2013). Moreover, the distance between the donor and acceptor is an important factor, because the rate of energy transfer is inversely proportional to the sixth power of this distance. When the hairpin-structured probe anneals with the complementary sequence, the higher stability of the probe–target duplex disrupts the unimolecular stem–loop conformation, separating the fluorophore and the quencher. Therefore, hybridization with the target nucleic acid leads to emission of a fluorescence signal of a characteristic wavelength. The specificity of an MB is much higher than that of a linear probe, due to the stem–loop hairpin structure of the MB (Tyagi et al. 1998).
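The sixth-power distance dependence can be made concrete with the standard Förster expression for the transfer efficiency, E = 1 / (1 + (r/R0)^6), where R0 is the Förster radius (the distance at which half of the excitation energy is transferred). The sketch below is illustrative only: the expression is taken from general Förster theory rather than from this chapter, and the value of R0 is a hypothetical placeholder, because the actual radius depends on the particular donor–acceptor pair.

```python
# Illustrative Förster efficiency as a function of donor-acceptor distance.
# R0_NM is a hypothetical Förster radius, not a value from the chapter.


def fret_efficiency(r_nm: float, r0_nm: float) -> float:
    """Transfer efficiency E = 1 / (1 + (r / R0)**6)."""
    if r_nm < 0 or r0_nm <= 0:
        raise ValueError("r must be non-negative and R0 must be positive")
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


R0_NM = 5.0  # assumed value for illustration

# 2 nm and 10 nm correspond to the 20-100 Å range quoted in the text.
for r in (2.0, 5.0, 10.0):
    print(f"r = {r:4.1f} nm   E = {fret_efficiency(r, R0_NM):.3f}")
```

With this assumed R0, the efficiency falls from nearly complete transfer at 2 nm to about half at 5 nm and to a few percent at 10 nm, which illustrates why opening of the hairpin switches the signal so sharply.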
In an ideal MB probe, no fluorescence occurs in the presence of a single-base mismatched oligonucleotide (Tyagi and Kramer 1996), but when the MB is hybridized with its target sequence, the probe and target will form a rigid double-stranded DNA. The degree of discrimination of the MB hybridization between the perfectly matched and single-base mismatched DNA relies on the stability of the newly formed DNA duplex. The thermal stability of the DNA duplex (i.e., the melting temperature of the MB probe) is affected by the chain length of the hybridization sequence, the GC content ratio, the ionic concentration of the buffer, and the location of the mismatched bases in the sequence. In particular, adjusting the GC content ratio can be used to optimize the specificity and sensitivity (Tyagi and Kramer 1996; Tyagi et al. 1998). From a thermodynamic analysis of MB transitions, Kramer and coworkers found that enhanced specificity is a general feature of conformationally constrained probes (Bonnet et al. 1999). Because the efficiency of an MB is influenced by many factors, false-positive and false-negative signals can appear. Depending on the distance between the fluorophore and quencher, the fluorescence signal can generally be increased by 10- to 100-fold upon hybridization. The temperature and environmental pH can also significantly influence the efficiency of the probe. The efficiency of an MB can be improved by studying its changes in fluorescence with respect to temperature in both the presence and absence of the target sequence. At low temperatures, the MB will maintain its closed state; the fluorophore and quencher at the ends of the stem will be located in close proximity in the hairpin structure, and no fluorescence will appear. In contrast, at high temperatures, the helical order of the stem will give way to a random-coil configuration, separating the fluorophore from the quencher and restoring the fluorescence; that is, when the temperature rises beyond the melting temperature of the stem, the closed MB probe melts to give a fluorescent random coil. If the temperature of the MB solution is below the melting temperature of the stem, the MB binds to the target sequence spontaneously; the result is dissociation of the stem and turning on of fluorescence. The fluorescence intensity can therefore be used to track the state of the probe–target hybrid with respect to temperature. At low temperatures, the probe–target hybrid has bright fluorescence; increasing the temperature will cause the fluorescence to diminish, a result of dissociation from the target and a return to the hairpin state (Marras et al. 2002; Vet and Marras 2005). In addition, the environmental pH will affect the MB’s function (Tyagi and Kramer 1996); too high a pH will disrupt the stem portion, causing the MB to denature and leading to a false-positive result. MBs are normally functionalized with a single fluorescence unit. A change in fluorescence signifies that the probe has detected its sole target sequence in a homogeneous system. Accordingly, a range of other fluorophores can be introduced into different MB sequences to allow the analysis of various target sequences at the same time. For example, Tyagi and coworkers (Tyagi et al. 1998) used four types of MBs for allele detection. They found that the hairpin structure enabled the use of differently colored fluorophores.
Also, when using diverse MBs designed to recognize different target sequences, they demonstrated that multiple targets could be discriminated in the same solution, even if they differed from one another by as little as a single nucleotide. Because the perfectly matched duplex was thermally more stable than the single-base-mismatched duplexes, each MB could bind specifically to its fully matched target sequence and emit fluorescence at the corresponding maximum wavelength. As a result, they demonstrated that MBs can emit fluorescence with very different colors, allowing these MBs to be used for the detection of several different targets in the same solution (Lee and Kim 2011). Thus, the specificity of identifying a single-base-mismatched DNA can be improved by using carefully designed MB probes and optimized hybridization conditions (Tan et al. 2000).
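The temperature dependence of a beacon in the absence of its target, described earlier in this section, can be illustrated with a simple two-state (closed hairpin versus open random coil) model. This is a minimal sketch under stated assumptions, not the thermodynamic treatment of the cited studies: it assumes a temperature-independent enthalpy of opening, and the melting temperature, enthalpy, and fluorescence levels used below are hypothetical values chosen only for illustration.

```python
import math

R = 8.314  # gas constant in J mol^-1 K^-1


def fraction_open(temp_c: float, tm_c: float, dh_j_per_mol: float) -> float:
    """Fraction of beacons in the open (random-coil) state at temp_c.

    Two-state model: dG(T) = dH - T*dS with dS = dH / Tm,
    so dG(T) = dH * (1 - T / Tm) and K = exp(-dG / (R*T)).
    """
    t = temp_c + 273.15
    tm = tm_c + 273.15
    dg = dh_j_per_mol * (1.0 - t / tm)
    k = math.exp(-dg / (R * t))
    return k / (1.0 + k)


def fluorescence(temp_c, tm_c, dh_j_per_mol, f_closed=1.0, f_open=100.0):
    """Population-weighted signal; f_closed and f_open are assumed levels."""
    x = fraction_open(temp_c, tm_c, dh_j_per_mol)
    return f_closed + (f_open - f_closed) * x


TM_STEM_C = 60.0      # hypothetical stem melting temperature
DH_OPEN = 200_000.0   # hypothetical enthalpy of stem opening, J/mol

for temp in (25, 50, 60, 70, 95):
    print(temp, round(fluorescence(temp, TM_STEM_C, DH_OPEN), 1))
```

In this sketch the signal stays near the closed-state level well below the assumed stem melting temperature and rises steeply around it, mirroring the qualitative behavior described in the text. Modeling the probe–target hybrid would require a third state and is not attempted here.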

Advantages and Limitations of Molecular Beacons

MBs can be used in a variety of applications. One of the main attractions of MBs is that their inherent mechanism of fluorescence signal transduction allows them to function as sensitive probes for real-time monitoring. In addition, MBs function with a low background signal, making them more suitable than other fluorescent probes for ultrasensitive analyses because they can exhibit fluorescence enhancements of greater than 200-fold upon target hybridization under optimal conditions (Li et al. 2008). Because MBs can be detected without isolation, it is possible to use them in situations where it would be difficult or impossible to separate the probe–target hybrids from an excess of the nonhybridized probe. Other advantages are specificity and selectivity, arising from the unique stem–loop structure. The target specificity, which extends to discriminating nucleic acid target sequences differing by a single nucleotide, can be applied in various biological environments, further expanding the application of MBs (Tan et al. 2005). Nevertheless, the applications of MBs also have some limitations. One is that the MBs can be degraded by endogenous nucleases, causing the stem to open and resulting in a false-positive signal. Similarly, nonspecific interactions of MBs with intracellular proteins can also disrupt the hairpin structure, creating false-positive signals. Furthermore, low sensitivity can result from the low brightness of the fluorophore-labeled MB, the high background signal from the closed structure, and autofluorescence from the cell. Finally, to detect mRNA in vivo, highly efficient MBs must be deliverable into living cells (Marras et al. 2002); this process is not trivial, and various MB modification strategies are being actively pursued.
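The fold enhancement quoted above can be related to how completely the closed beacon is quenched. The sketch below assumes the common definition of quenching efficiency, Q = 1 − F_closed/F_open, and further assumes that the open, target-bound beacon is fully unquenched; both are simplifying assumptions made for illustration, and the function names are hypothetical.

```python
# Relation between quenching efficiency and maximum fold enhancement,
# assuming Q = 1 - F_closed / F_open and a fully unquenched open state.


def max_fold_enhancement(quenching_efficiency: float) -> float:
    """Upper bound on F_open / F_closed for a given quenching efficiency."""
    if not 0.0 <= quenching_efficiency < 1.0:
        raise ValueError("quenching efficiency must be in [0, 1)")
    return 1.0 / (1.0 - quenching_efficiency)


def required_quenching_efficiency(fold: float) -> float:
    """Quenching efficiency needed to reach a given fold enhancement."""
    if fold < 1.0:
        raise ValueError("fold enhancement must be at least 1")
    return 1.0 - 1.0 / fold


# A 200-fold enhancement requires that at least 99.5% of the emission
# is suppressed in the closed state.
print(required_quenching_efficiency(200.0))
print(max_fold_enhancement(0.99))
```

Under these assumptions, small improvements in quenching efficiency near 100% translate into large gains in signal-to-background ratio, which is why residual fluorescence of the closed form matters so much.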

Modifications of Molecular Beacons

Since the first conventional MB was introduced in 1996, many research groups have attempted to improve its functions, resulting in a range of modified MBs featuring various designs, fluorophores, and quenchers. The main challenges have been to eliminate false signals, increase the specificity of MBs toward their targets, and improve the detection limits (Venkatesan et al. 2008). Combining MBs with other technologies, as in molecular aptamer beacons (MABs), can also improve their properties (Moutsiopoulou et al. 2019). MABs are molecular imaging and detection tools that can be used to visualize molecules by combining the selectivity and sensitivity of MBs with aptamer technologies. They can be designed using a variety of donor–quencher pairs, including fluorescent dyes, quantum dots (QDs), carbon-based materials, and metallic nanoparticles. The reporters can also be modified with, for example, bioluminescent proteins, to give higher sensitivity and better signal-to-noise ratios than those of the fluorescent dyes of conventional MBs. Changing the specific stem–loop structure of an MB can also improve its properties. Stemless beacons (Fig. 2a) might have sufficient signal-to-noise ratios for some in vivo applications. They are similar to conventional MBs in that the quenching of the fluorophore in the stemless beacon occurs through FRET (Venkatesan et al. 2008; Johansson et al. 2002). For this reason, to improve the activity, the fluorophore and quencher should be selected such that their spectra overlap as much as possible. A duplex comprising two single-stranded linear oligonucleotides (Fig. 2b) might also be useful for the detection of a specific nucleic acid in a homogeneous solution (Li et al. 2002; Kong et al. 2003). Because each single strand contains a fluorophore or a quencher, the structure is similar to that of an MB, but without a loop, such that the probe can function in a similar manner.
This duplex probe can discriminate between perfectly matched and single-nucleotide-mismatched targets and can be more target-specific than linear single-stranded probes. In addition, such probes are less expensive, easily synthesized and purified, and can be used for the simultaneous detection of multiple targets in homogeneous solutions.

Fig. 2 Modification of conventional MBs. (a) A stemless MB. (b) An MB lacking a loop structure. (c) An MB modified with super quenchers – in this case, a three-quencher molecular array

It is also possible to improve the function of MBs through changes in the nature of the fluorophore and quencher units at the ends of the structure. Using super quenchers (SQs), instead of normal quenchers, can decrease the background fluorescence intensity. Figure 2c depicts a super quencher structure in which a three-quencher molecular array is assembled at the terminus. Using this approach, a super quencher has delivered a quenching efficiency of 99.7% (Venkatesan et al. 2008; Yang et al. 2005). Here, hydrophobic interactions between the fluorophore and quenchers improved the thermostability of the probe; furthermore, the signal-to-background ratio was significantly improved as a result of a decrease in background fluorescence intensity. Tyagi and coworkers reported wavelength-shifting MBs having a structure containing two fluorophore groups (Fig. 3); the first (harvester) fluorophore absorbed strongly in the wavelength range of the monochromatic light source, while the second (emitter) fluorophore emitted at a desired wavelength, the result of FRET from the harvester fluorophore to the emitter fluorophore (Han et al. 2013; Tyagi et al. 2000). In the hairpin state, the rate of energy transfer from the internal harvesting fluorescence unit to the quenching group was much faster than that to the final fluorescence emission group. Because the wavelength-shifting MB had a large Stokes shift, it was significantly brighter than conventional beacons, which contain a fluorophore that cannot efficiently absorb energy from the available monochromatic light source. Accordingly, the enhanced signal intensity of the
fluorophore led to improved sensitivity. Moreover, this approach could be used to detect a variety of targets simultaneously in a single mixture, because differently colored MB probes could be introduced and excited at the same wavelength.

Fig. 3 Mechanism of a wavelength-shifting MB featuring two fluorophores (harvester and emitter) and one quencher. In the hairpin state, the rate of energy transfer from the internal harvesting fluorescence unit to the quenching group was much faster than that to the terminal fluorescence emission group. (Reprinted from Li et al. (2008), copyright 2008, with permission from Elsevier)

Fig. 4 Mechanism of dual-FRET MB. When the two MBs bound to a target molecule, FRET occurred through activation of the donor fluorophore and detection of the fluorescence emission at a wavelength characteristic of the acceptor fluorophore

With conventional MBs, false-positive signals can arise in living cells through degradation induced by nucleases or through opening of the hairpin structures upon nonspecific protein binding. To solve this problem, a dual-FRET MB has been prepared (Fig. 4) (Han et al. 2013; Mao et al. 2020; Santangelo et al. 2004). It comprises a pair of MBs, each containing a fluorescence quencher and a fluorescence reporter. One of these fluorophores acted as a donor and the other as an acceptor. The two beacons were designed to be complementary to adjacent regions of the same mRNA target, such that FRET would occur only when both beacons were hybridized to the target. Upon probe–target binding of the two
1670

S. Lee and B. H. Kim

Fig. 5 Principal scheme of indirect binding of MB probe to the analyte by using a binary approach. The probe consists of an MB probe and the two synthetic oligodeoxyribonucleotides m and f, which coexist in dissociated state in the absence of a DNA analyte (left). Addition of the specific nucleic acid analyte results in the formation of a quadripartite associate, in which MB probe adopts the open conformation (right). The complex is unstable if there is a mismatch base-pairing in the hybrid of analyte and strand m. (Reprinted from Gerasimova et al. (2010), copyright 2010 Wiley-VCH Verlag)

MBs, both fluorophores were located within the FRET range (ca. 6 nm). When the two MBs bound to a target molecule, FRET occurred through activation of the donor fluorophore and detection of the fluorescence emission at a wavelength characteristic of the acceptor fluorophore. Even if degradation of the probe or nonspecific opening of the probe were to occur, these processes were readily distinguishable from non-FRET false-positive signals, thereby decreasing the proportion of false-positive results and maintaining a low background signal and high specificity. In addition to modifying MBs, there are other methods for increasing the specificity of the MB. Because a conventional MB probe has a loop fragment that is complementary to the analyte, a unique probe is required for the analysis of each new analyte sequence. To avoid this limitation, Kolpashchikov and coworkers (Gerasimova et al. 2010) introduced a binding arm part to allow the ready detection of any sequence by a single MB probe. As depicted in Fig. 5, oligonucleotide strands m and f each contains a segment complementary to the MB probe (MB-binding arms) and a segment complementary to a nucleic acid analyte (analyte binding arms). Strands m and f are inexpensive synthetic oligonucleotides that can be used without purification. In the absence of the target, the MB probe exists in the hairpin state, because strands m and f are not bound to the MB probe. In the presence of the target, a quadripartite complex is formed, with the fluorophore separated from the quencher, resulting in a highly fluorescent signal. A 20 -O-methyl MB has displayed improved thermal stability and nuclease resistance, leading to decreased false-negative signals (Gao et al. 2020). When an MB having long-stem structures binds to its target, the stem part is converted into stickyends. 
For this reason, the fluorescence donor and the acceptor of neighboring probes can approach each other and undergo FRET, so that a false-negative signal appears, due to an unexpected decrease in fluorescence (Fig. 6). One way to prevent this behavior is to use nucleases (e.g., DNase I) that can remove the sticky-end pairing. In addition, a 2′-O-methyl MB, which is resistant to enzymatic digestion mediated by DNase I, can

Molecular Beacons With and Without Quenchers

Fig. 6 Mechanism of sticky-end pairing, which is an unexpected problem of MBs. (Reprinted from Gao et al. (2020), copyright 2020, with permission from the Royal Society of Chemistry)

be used to eliminate unexpected cleavage of the MB, thereby improving the detection performance. Furthermore, fluorescent hairpin oligonucleotides that can function as MBs, even in the absence of the quencher moiety, have been proposed; such probing systems are known as QF-MBs.

Applications of Molecular Beacons

Tyagi and Kramer developed the first conventional MB probe to detect a specific nucleic acid. Since then, the use of MB probes for genetic identification has led to advances in a variety of applications, including sensitive monitoring of PCRs, real-time detection of DNA–RNA hybridization in living cells, analysis of DNA mutation, and disease diagnosis. Recently, they have also been used in the design of biosensors for diagnostic and monitoring devices for point-of-care (POC) testing. MicroRNA (miRNA), a representative biomarker used in POC testing, can be used to diagnose numerous diseases in readily available biological fluids (e.g., urine and blood). Various technologies for miRNA detection, including light-assisted molecular immobilization (LAMI) and electrochemiluminescence (ECL), are being developed (Gonçalves et al. 2019; Kerr et al. 2021). Furthermore, it is possible to use nanotechnology to improve the performance of functionalized biosensors, making them more powerful tools for various applications related to human health. Monitoring DNA/RNA amplification using real-time PCR is the most representative application of MBs (Tyagi and Kramer 1996). When the temperature is lowered to a condition that allows annealing of the primers, free probe molecules do not fluoresce, because of their stem–loop structure, whereas probe molecules that bind to the target amplicon exhibit fluorescence. Hybridization of the MB to the target occurs only in the annealing step of every cycle, and the resulting fluorescence directly indicates the extent of amplification at each annealing step in the closed tube; that is, it reflects the concentration of the amplicon. As the temperature increases, the MB dissociates from the amplicon; therefore, PCR monitoring is possible only at low temperatures. This approach has several advantages: the ability to rapidly

S. Lee and B. H. Kim

process many samples at a time; sensitivity; and the ability to check the amplified product in the same sealed tube with minimal risk of contamination. The use of a peptide nucleic acid (PNA)–DNA hybrid probe enables rapid PCR detection. This surface-immobilized probe detects PCR amplicons when a PCR mixture is added to a micrometer-sized well containing the immobilized probe and the resulting fluorescence is read. MB probes can be used to demonstrate hybridization of antisense oligodeoxynucleotides (ODNs) to complementary strands of mRNA and to monitor changes in mRNA in real time in living cells. A combination of rational sequence design, efficient probe insertion, and selection of target mRNA sequences is very important for in vivo studies using MBs. This approach can also be used to detect early-stage cancer, by detecting mRNA originating from mutated genes in living cells. One of the most critical aspects of using synthetic probes to measure the intracellular levels of RNA molecules is the ability to deliver these probes into cells through the plasma membrane, which is quite lipophilic and restricts the transport of large or charged molecules. Indeed, the plasma membrane acts as a very robust barrier toward polyanionic molecules (e.g., hairpin oligonucleotides). Using MB technology, the metabolism of biological macromolecules in vivo (e.g., mRNA in cells) can be analyzed dynamically to investigate the process of transcription and other changes. This ability can provide important information concerning the temporal and spatial processing, localization, and transport of specific mRNAs under various conditions. Matsuo (1998) used MBs to visualize basic fibroblast growth factor (bFGF) mRNA in human trabecular cells. In addition, MBs have been microinjected into K562 human leukemia cells for the real-time visualization and detection of mRNA hybridization within living cells (Giesendorf et al. 1998).
This result suggests the potential for real-time visualization and localization of oligonucleotide–mRNA interactions. Furthermore, Chen and coworkers visualized single RNA transcripts in living cells by using MBs with minimal target engineering (Chen et al. 2017). In that study, the MBs were composed of 2′-O-methyl RNAs having a fully phosphorothioate-modified loop domain (2Me/PSLOOP MBs). An MB complementary to the repeat target sequence could detect single mRNAs engineered with 32 tandem repeats of the MB target sequence, with approximately 90% accuracy. Chen and coworkers quantified the impact of appending an MB-tag on the mobility of the target RNA, by using 2Me/PSLOOP MBs to compare the intracellular dynamics of engineered enhanced green fluorescent protein (EGFP) mRNA transcripts harboring MB-tags of different sizes in cells. The mean diffusion rate of the transcript containing 32 repeats (pEGFP-N1-32x RNA) was nearly 10-fold lower than that of the transcript containing 8 repeats (pEGFP-N1-8x RNA) in both the nucleus and the cytoplasm, suggesting that the intracellular activities of target RNAs are less impeded by smaller engineered insertions. MB probes have displayed high selectivities in discriminating single-base mismatches, with high stabilities of complementary probe–target hybrids over a wide range of temperatures. Thus, when combined with DNA/RNA amplification technologies, MBs can become promising probes for genetic analysis. Mismatched hairpin probe duplexes are less stable than mismatched linear probe duplexes, but

fully matched duplexes are more stable, making them ideal probes for investigating SNPs. For example, spectral genotyping is a gene mutation detection method that uses two MB probes labeled with different fluorophores (Kostrikis et al. 1998). Alleles can be identified by the color of the fluorescence generated in a sealed amplification tube. In addition, an MB probe has been used to detect point mutations in the methylenetetrahydrofolate reductase gene, associated with cardiovascular disease and neural tube defects, and analyses of an 81-bp region of the rpoB gene of Mycobacterium tuberculosis could reveal mutations conferring resistance to the antibiotic rifampin (Piatek et al. 1998). MBs are also suitable for use in determining the absence of a specific causative agent; therefore, they can be used to screen for effective antibodies against a specific strain of a causative organism. Some pathogens cannot grow on many selective media, requiring molecular approaches for identification. Although the heterogeneous method using amplified nucleic acid products is the technique most widely used (Marras et al. 2002), the homogeneous method using MBs is suitable for detecting common pathogens with high sensitivity and specificity.
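The two-color readout used in spectral genotyping can be illustrated with a minimal sketch. It assumes that each allele-specific MB reports in its own fluorescence channel and that a single threshold separates signal from background; the function name, the threshold, and the readings are hypothetical and are not taken from the cited studies.

```python
def call_genotype(wild_type_signal, mutant_signal, threshold):
    """Classify a sample from two background-corrected end-point readings.

    Each reading is assumed to come from one allele-specific MB channel;
    the threshold is a hypothetical cutoff chosen by the experimenter.
    """
    wild_type_present = wild_type_signal >= threshold
    mutant_present = mutant_signal >= threshold
    if wild_type_present and mutant_present:
        return "heterozygous"
    if wild_type_present:
        return "homozygous wild-type"
    if mutant_present:
        return "homozygous mutant"
    return "no call"


# Hypothetical readings (arbitrary units): (wild-type channel, mutant channel).
samples = {
    "sample 1": (950.0, 40.0),
    "sample 2": (35.0, 880.0),
    "sample 3": (510.0, 470.0),
}
for name, (wild_type, mutant) in samples.items():
    print(name, call_genotype(wild_type, mutant, threshold=200.0))
```

In practice the cutoff would be set from no-template and known-genotype controls rather than fixed in advance.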

Quencher-Free Molecular Beacons

A QF-MB is a hairpin-structured fluorescent oligonucleotide containing one or multiple fluorophores attached at any part of the sequence, without any additional quencher. The fluorophores can be modified fluorescent nucleobases or fluorescent dyes. A QF-MB is another modification of the conventional MB; although it does not feature a quencher, the hairpin structure and the fluorescent unit are essential building blocks. In almost all QF-MBs, the entire loop, or part of it, is complementary to the target sequence. QF-MBs can be classified into two different types, according to their structure (Venkatesan et al. 2008; Lee and Kim 2011) (Fig. 7): (i) QF-MBs containing only one fluorophore at the middle or end

Fig. 7 Different types of QF-MBs. (a) Mono-labeled QF-MB containing a fluorophore at the middle of the loop. (b) Mono-labeled QF-MB containing a fluorophore at the end of the stem. (c) Dual-labeled QF-MB containing two different fluorophores. (d) Dual-labeled QF-MB containing two fluorophores of the same type

position of the oligonucleotide and (ii) dual-labeled QF-MBs containing two fluorophores at the stem or ends of the hairpin structure. In the case of mono-labeled QF-MBs, the fluorescence is controlled by neighboring DNA bases; thus, any structural change of the oligonucleotide will affect the fluorescence (Fig. 7a, b). In dual-labeled QF-MBs, two different kinds of fluorophores are attached to the hairpin oligonucleotides: one fluorophore is linked at the 5′-end and the other at the 3′-end (Fig. 7c). Figure 7d displays a system in which two fluorophores of the same kind are attached at the ends of the MB. None of these MBs feature a quencher moiety.

Mono-labeled Quencher-Free Molecular Beacons

Natural purine and pyrimidine nucleobases cannot serve as reporters in sequence-probing systems because they have poor fluorescent characteristics. To overcome this limitation, many efforts have been made to design modified nucleosides containing fluorescent nucleobase analogues (Venkatesan et al. 2008) (e.g., extended nucleobases and conjugated nucleobase analogues). As mentioned above, QF-MBs can be largely divided into mono-labeled QF-MBs and dual-labeled QF-MBs, according to the number of fluorophores. They can also be classified into various types of QF-MBs that exhibit different characteristics in terms of the types and locations of the fluorophores.

Quencher-Free Molecular Beacons with Fluorophore at the Loop or Middle of the Oligonucleotide

Fluorene-labeled deoxyuridine (FlU, Fig. 8a) has a good quantum yield (absolute quantum yield in EtOH: 0.54) and is less bulky than pyrene, fluorescein, rhodamine, and coumarin derivatives. Kim and coworkers prepared QF-MBs containing the fluorescent nucleobase FlU in the middle of the loop of the hairpin DNA (Hwang et al. 2004a). They developed an oligonucleotide model system as a new type of MB, consisting of a six-base-pair stem and a seven-base loop sequence containing a FlU moiety (Fig. 8b). The fluorene unit was attached covalently to the C-5 position of deoxyuridine through a rigid conjugated ethynyl linker, using Sonogashira coupling; the FlU unit was subsequently introduced into the ODNs using the phosphoramidite method. Similar to a conventional MB, these QF-MBs were fluorescent hairpin DNAs, but their fluorophores were anchored at the loop, rather than at the stem end. Thus, another functional group could be introduced at the end position of the MBs. In general, substitution at the C-5 position of the pyrimidine ring and the C-8 position of the purine ring of natural nucleobases, or at the C-7 position of 7-deazapurines, does not disturb their ability to form nucleobase pairs in DNA duplexes. The QF-MB of Kim and coworkers underwent DNA hybridization similar to that of its unmodified counterparts; thus, substitution with fluorene did not disturb the hybridization process. As target single strands, Kim and coworkers synthesized the fully matched sequence and a one-base-mismatched sequence (mutated from A to C). Hybridization of the MB in solution with its complementary and single-base-mismatched strands occurred with a 2.2-fold enhancement and a decrease to 0.15-fold, respectively, in the emission intensity. Thus, the total discrimination factor for the recognition of a single (A/C) base mismatch was 14.7 (i.e., 2.2/0.15). The mechanism of action of this system involved uracil nucleobases being efficient quenchers for fluorophore emission in nucleosides and single-stranded oligonucleotides (Fig. 8b). Upon hybridization with the fully matched target sequence, hydrogen bonding and π-stacking of the matched base pair (FlU with A) resulted in enhanced fluorescence of the fluorene unit. The decrease in fluorescence intensity upon hybridization with single-base mismatches was due to more interactions with one or more neighboring nucleobases, as well as the quenching phenomenon of the uracil base (Podder et al. 2021; Hwang et al. 2004a).

Fig. 8 (a) Chemical structure of FlU. (b) Schematic representation of a QF-MB. (c) Schematic representation of a QF-MB operated in the presence of GO. (Reprinted from Yi et al. (2011), copyright 2013, with permission from Elsevier)

Various methods can improve the signal-to-background ratio, including the use of FRET, excimer signals, multivalent fluorophores and quenchers, fluorophore-labeled pyrrole G–C base-pairing, H-dimerization of phthalocyanine, and in-stem MBs containing threonine nucleotides. In their study, Kim and coworkers also used graphene oxide (GO) as an external quencher. In the absence of GO, the signal-to-background ratio was 2.2; thus, the incompletely quenched hairpin already emitted nearly half (1/2.2) of the intensity of the fully matched duplex. Because the incomplete quenching of QF-MBs might result in false-positive signals, their practical use requires enhanced signal-to-background ratios. As displayed in Fig. 8c, an optimized concentration of GO in the buffer solution led to a completely quenched state. The addition of the fully matched sequence to the QF-MB/GO solution led to regeneration of the fluorescence emission intensity and an increase in the signal-to-background ratio to 31.0 (Yi et al. 2011). This high signal-to-background ratio meant that the closed and open states could be clearly discriminated by the naked eye. Kim and coworkers also studied the photophysical properties of QF-MBs containing different combinations of triad-nucleobases (X FlU Y) or neighboring bases (NBs). The triad nucleobases were positioned at the loop of the oligonucleotide, where the fluorescent nucleotide was surrounded by different combinations of flanking nucleobases. In a study of the photophysical properties of the linear probe, performed prior to investigating the MB probe, the fluorescence intensity (λmax = 425 nm) of the fully matched duplex was 3.4-fold higher than that of the single-stranded probe, and the fully matched duplex was easy to distinguish from the single-base-mismatched duplex. In addition, linear probes containing pyrene-labeled deoxyuridine (PyU) also have excellent ability to discriminate between perfectly matched duplexes and single-base-mismatched duplexes. Furthermore, HyBeacon and base-discriminating fluorescent (BDF) probes also function in a manner similar to single-stranded linear probes: they undergo hybridization-induced changes in fluorescence intensity with the target oligonucleotides and can distinguish fully matched DNA sequences from mismatched sequences (Venkatesan et al. 2008). The hairpin strands and the linear single strands are weakly fluorescent prior to hybridization, but they become highly fluorescent when annealed to perfectly matched sequences. Moreover, the fluorescence intensities of the duplexes with single-base-mismatched DNAs are always lower than those of the perfectly matched duplexes (Hwang et al. 2004b). A study of probes containing different combinations of nucleobase (X and Y) and triad-nucleobase units (X FlU Y) revealed that the nature of the X and Y moieties significantly affected the emission properties (Ryu et al. 2007).
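The discrimination factor and signal-to-background ratios quoted above for the FlU-based QF-MB follow from simple ratios, which can be checked with a short sketch. The fold changes (2.2, 0.15, and 31.0) are the values reported in the text; the function names are illustrative only.

```python
def discrimination_factor(matched_fold, mismatched_fold):
    """Ratio of the fold changes in emission for matched and mismatched targets."""
    return matched_fold / mismatched_fold


def residual_background_fraction(signal_to_background):
    """Emission of the closed hairpin as a fraction of the open-state emission."""
    return 1.0 / signal_to_background


# Fold changes relative to the free hairpin, as reported in the text.
matched_fold = 2.2      # enhancement with the fully matched target
mismatched_fold = 0.15  # decrease with the single-base (A/C) mismatch

factor = discrimination_factor(matched_fold, mismatched_fold)
print(f"discrimination factor: {factor:.1f}")

# Residual emission of the closed state without and with GO.
for label, ratio in (("without GO", 2.2), ("with GO", 31.0)):
    fraction = residual_background_fraction(ratio)
    print(f"{label}: closed state emits {fraction:.1%} of the open-state signal")
```

The second calculation makes the benefit of the external quencher explicit: raising the signal-to-background ratio from 2.2 to 31.0 lowers the residual emission of the closed hairpin from roughly half of the open-state signal to a few percent.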
When the fluorescence intensity of the probe with X = Y = cytosine (C) was set to 1.00, the fluorescence intensity for X = Y = thymine (T) was the highest (1.01), with the lowest (0.24) occurring for X = Y = guanine (G). Thus, the interactions of the fluorene unit with the surrounding nucleobases in the probe and probe–target duplex significantly affected the emission properties of the probe, with G quenching (G-effect) playing an important role in the function. In addition, flanking abasic sites had no or negligible effect, whereas the presence of an abasic site in the complementary strand had a profound effect. Based on these studies, it was concluded that the fluorophore was quenched when its NBs were both G units. The BDF probes reported by Saito and coworkers contained different BDF nucleosides (Okamoto et al. 2004), including PyU, PyC, and PerU (see Fig. 9a for their structures). These probes usually exhibit an intense fluorescence signal that reports the presence of a complementary nucleobase in the target DNA strand. The BDF probes incorporating PyU and PyC could distinguish A and G units, respectively, at the opposite position, with sharp changes in fluorescence, because they exhibited fluorescence enhancement only when A and G residues, respectively, were present in the opposite strands. Fluorophores such as pyrene, anthracene, and perylene display strong fluorescence when exposed to aqueous environments. Thus, the polarity and hydrophobicity around the fluorophore play important roles in BDF probes. For example, the BDF probe incorporating PyU displayed A-selective fluorescence because it

Fig. 9 (a) Structures of the BDF probes PyU, PyA, PyC, and PerU. (b) Operation of BDF probes in SNP typing in target DNA. The PyA residue experienced strong intercalation of its pyrene chromophore into the duplex

exhibited a strong emission at 397 nm, with a quantum yield of 0.203, when excited at 327 nm. In addition, the probe incorporating PyC displayed G-selective fluorescence when excited at 329 nm, with strong fluorescence observed at 393 nm with a quantum yield of 0.147. The pyrene fluorophores of PyU and PyC acted as antennas that were sensitive to the solvent polarity. Only when the PyU and PyC units formed Watson–Crick nucleobase pairs via hydrogen bonding with A and G residues, respectively, did the pyrene fluorophore, located on the outside of the duplex (a highly polar aqueous phase), exhibit strong base-selective fluorescence. On the other hand, when the PyU and PyC moieties did not form their complementary base pairs, the pyrene fluorophore folded into the hydrophobic interior of the duplex, thereby quenching the fluorescence effectively (Okamoto et al. 2006). Thus, the pyrenecarboxamide chromophore was exposed to a more aqueous phase in the corresponding matched probe–target duplexes (Fig. 9b). In the case where a PyA unit was located in the middle of the loop structure, the fluorescence of the perfectly matched duplex, with the BDF probe containing the nucleoside PyA and a T residue at the opposite site, was completely quenched. This drastic change in fluorescence was applied to the detection of a T base located at a specific site on a target DNA. The pyrene unit in PyA was quenched through intercalation into the hydrophobic interior of the duplex. As depicted in Fig. 9b, the PyA residue experienced strong intercalation of its pyrene chromophore into the duplex. In addition, the fluorescence quantum yield of pyrenecarboxamide decreased upon decreasing the solvent polarity (Saito et al. 2004).

Overall, nucleobase (mostly G) quenching has been the main factor affecting successful functioning of mono-labeled QF-MBs. Nucleobase quenching occurs in a fluorescent oligonucleotide through effective photoinduced electron transfer (PET) between the nucleobase and the fluorophore (Piestert et al. 2003; Heinlein et al. 2003). Among the four natural nucleobases (A, C, G, and T), G has the lowest oxidation potential (1.49 V vs. NHE) and A has the second lowest (1.96 V vs. NHE). The pyrimidine nucleobases T and C have a high oxidation potential of 2.10 V (vs. NHE). Thus, G, with its lowest oxidation potential, can be oxidized most readily, and PET from G to the fluorophore is more efficient than that from the other nucleobases. Because guanosine has excellent electron-donating properties, it can efficiently suppress the emission of fluorescent dyes, contributing to the improvement in QF-MB function. As a result, the fluorescence intensity of hairpin or linear single strands containing G close to the fluorophore is lowered by the guanine quenching effect. Nucleobase quenching and changes in the microenvironment of the fluorophore have been the main factors affecting the optical properties of the fluorophores. Varying several factors – including the solvent polarity, hydrophobicity, pH of the medium, intercalation of the fluorophore, and nucleobase-pair degeneracy – can cause changes in the emission characteristics of the fluorophore.
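The ranking of the nucleobases as electron donors can be made explicit with a minimal sketch that uses only the oxidation potentials quoted above (V vs. NHE). The helper names are hypothetical, and treating the oxidation potential as the sole predictor of PET quenching is a simplifying assumption: stacking geometry and the reduction potential of the fluorophore also matter.

```python
# Oxidation potentials (V vs. NHE) as quoted in the text.
OXIDATION_POTENTIAL_V = {"G": 1.49, "A": 1.96, "T": 2.10, "C": 2.10}


def rank_electron_donors(potentials):
    """Order nucleobases from most to least readily oxidized."""
    return sorted(potentials, key=potentials.get)


def strongest_donor_in(sequence, potentials=OXIDATION_POTENTIAL_V):
    """Return the most readily oxidized base found in a short sequence."""
    return min(sequence, key=lambda base: potentials[base])


print(rank_electron_donors(OXIDATION_POTENTIAL_V))
# Hypothetical neighboring sequences next to a fluorophore.
for neighbors in ("ATGC", "TTAC"):
    print(neighbors, "->", strongest_donor_in(neighbors))
```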

Quencher-Free Molecular Beacons with Fluorophore at the Stem or Strand-end

Sauer and coworkers prepared smart probes having the structures depicted in Fig. 10a (Heinlein et al. 2003). They attached various fluorophores (e.g., oxazine [MR121] and rhodamine [R6G] derivatives) at the 5′-end of the guanosine-rich

Fig. 10 (a) Structure of smart probes and a T-FAM probe. (b) Structure of T-FAM. (c) Structures of the appended fluorophores

hairpin ODNs, through C6 alkyl linkers, using classical N-hydroxysuccinimidyl (NHS) ester chemistry. The whole or major part of the loop was complementary to the target DNA. Instead of taking advantage of interactions between two extrinsic probe units, here the interactions of a single fluorophore moiety with the DNA bases or amino acids allowed the specific detection of DNA or RNA sequences and antibodies at the single-molecule level. In these probes, fluorophores (e.g., oxazine and rhodamine derivatives) covalently linked to the hairpin ODN were quenched through PET from neighboring guanosine residues of G–C nucleobase pairs present in the stem. This approach took advantage of the low oxidation potential of the DNA base G and the tendency of the dye to aggregate with it in aqueous environments, thereby reducing the area accessible to water. As a result, it was possible for efficient intramolecular PET to occur in the excited state upon contact formation with guanosine, depending on the reduction potential of the fluorophore. The electron transfer efficiency depended on several factors: the structure of the fluorophore, the location of the guanosine residue in the complementary stem, the attachment of additional overhanging single-stranded nucleotides in the complementary stem, and the replacement of the guanosine residue with more potent electron donors (Heinlein et al. 2003). Controlling the number of guanosine units can improve the quenching efficiency. A stem containing four guanosine residues was the optimum structure required for efficient quenching in the probe. The quenching efficiency also varied depending on the type of fluorophore. In the case of the ODN end-capped with an MR121 unit, the fluorophore was almost coplanar with the last G–C nucleobase pair, and the system preferred quenching; in contrast, the fluorescence was not quenched for the ODNs modified with R6G and JA242 fluorophores.
Moreover, the C6 linker was an essential structural element, providing a suitable geometry to balance structural flexibility and efficient quenching. The relative fluorescence quantum yield of MR121 decreased to 0.2 with respect to that of the free dye. The double-stranded stem of the DNA hairpin facilitated efficient electron transfer from the guanosine residue to the MR121 unit with a shallow distance dependence. Moreover, overhanging single-stranded purine or pyrimidine nucleotides at the 3′-end also affected the quenching efficiency by influencing the geometry of the interaction between the fluorophore and the guanosine residue. The degree of quenching increased when the overhanging strand stabilized the stacking interactions. Substitution of the guanosine residue by a more strongly electron-donating nucleotide (e.g., a 7-deazaguanosine residue, which has a lower oxidation potential [0.9–1.0 V vs. SCE] than that of guanosine) enhanced the quenching. The self-quenched probe has also been prepared with labeling of a single fluorophore on a base close to the 3′-end, with no quencher required. Rashtchian and coworkers reported that the fluorophore could be attached directly using a dye-phosphoramidite derivative (Nazarenko et al. 2002a). In a few other ODNs, a nucleoside analogue tagged with a dye (e.g., C5-fluorescein-dT) was incorporated, at any position of the blunt-ended stem or as an overhang unit, through the phosphoramidite method. Figure 10a, b displays a blunt-ended hairpin ODN featuring a fluorescein-modified thymidine derivative (T-FAM) incorporated at the 3′-end

as the penultimate nucleoside unit. In T-FAM, the fluorophore is bound to the T unit at its C5-position through a C6 spacer. Linear and hairpin ODNs containing a fluorophore (T-FAM) residue near the 3′-end have been studied to examine the effects of neighboring nucleobases (or nucleobase pairs) and secondary structures on the optical properties of the ODNs (Nazarenko et al. 2002a; Nazarenko et al. 2002b). The fluorescence of the labeled linear ODN featuring a 3′-terminal C residue was quenched by 87% upon duplex formation. In contrast, addition of an A–T nucleobase pair at the end of the same duplex completely eliminated the quenching, whereas replacing that A–T nucleobase pair with another G–C nucleobase pair restored the quenching. The presence of purine nucleobases in the nucleobase pairs decreased the fluorescence significantly. The linear ODNs featuring the dye attached close to the 3′-end and nucleobases G or C at the 3′-ends experienced up to 10-fold quenching of fluorescence upon hybridization. Notably, the quenching of hairpin ODNs featuring a blunt-end G–C or C–G nucleobase pair was more efficient than that of the linear ODNs. Figure 11a depicts the structures of hairpin ODNs containing a pyrene-modified adenosine (PyA) or uridine (PyU) unit at the 5′-end overhang position. The pyrene units were covalently linked to the nucleobases through rigid ethynyl linkers, using Sonogashira coupling. In these structures (Seo et al. 2005), the fluorophore-containing nucleosides lacked any corresponding nucleosides in the opposite strand to form nucleobase pairs. These probes functioned based on fluorescence quenching through PET of nonpolar aromatic fluorophores that stack at the termini of the hairpin stem. Fluorene and pyrene moieties were used as the nonpolar fluorophores because of their high quantum yields and efficient planar aromatic stacking; their terminal π-stacking enhanced the thermodynamic stability of the duplexes.
As a result, the fluorescence in these hairpin ODNs could be quenched through PET with neighboring G, C, and T bases, but not with an A moiety (Fig. 11b). The main factors influencing the operation of these novel fluorescent oligonucleotides were (i) π-stacking between the pyrene-labeled 2′-deoxynucleotide units and neighboring bases and (ii) the stability of the hairpin stems that experienced PET. To study the NB effect, sequences were prepared featuring PyA and PyU fluorophores at the 5′- and 3′-ends. Various neighboring base pairs (NBPs) were

Fig. 11 (a) Structures of hairpin ODNs modified at the stem-end. (b) Schematic representation of a PET-induced QF-MB displaying fluorescence in the presence of its target RNA. (Reprinted with permission from Seo et al. (2005). Copyright 2003 American Chemical Society)

stacked with the fluorescent nucleobases on the hairpin stems of these sequences (Nazarenko et al. 2002a). When the probes featured the PyA and PyU fluorophores at the 5′-end, the two sequences provided very different absorption and emission spectra. For the sequence with PyA, the change in fluorescence intensity at 420 nm was dramatic for the fully matched duplex and the single-base-mismatched duplex. In contrast, the sequence with PyU did not undergo a hybridization-induced change in fluorescence intensity. This behavior was the result of the different electronic properties of the nucleobases linked to the ethynylpyrene fluorophore: A is an electron donor and U is an electron acceptor. In the case of the PyU-labeled sequence, the fluorescence of the pyrene fluorophore was quenched by the covalently linked uracil residue, in both the bound and unbound states. For the nonhybridized hairpin sequence labeled with a PyA residue, the fluorescence was quenched through electronic interaction between the pyrene unit and the NBs. The stem was stabilized through π-stacking of the pyrene unit with terminal nucleobases. The fluorescence quenching was due to two factors: stable end stacking and strong electronic interaction between the π-stacking units. In the case of the probe without the terminal PyA unit, the stability of the hairpin decreased sharply to a level similar to that of a mismatched NBP hairpin; its melting temperature (Tm) was 47 °C. Thus, the pyrene unit stabilized the hairpin through π-stacking with its NBP at the terminus of the stem. The formation of a stable nucleobase pair was a key factor in ensuring effective SNP discrimination. If the NBPs were mismatched nucleobases, the degree of fluorescence quenching decreased, even when the NB was G. From studies of hairpin probes having various sequences, the ODN featuring a neighboring G–C nucleobase pair had the highest matched/mismatched discrimination factor.
Nucleobase pairs at the 5′-terminus played a greater role in the quenching process than those at the 3′-terminus. In addition, the degree of quenching by the 5′-terminus nucleobase followed the order C > G > T > A, regardless of the presence of matched or mismatched NBPs in the hairpin state. Furthermore, quenching did not occur when the NB was A. Finally, when the pyrene unit was replaced by a fluorene unit (i.e., using the nucleosides FlA and FlU), the ODNs were inefficient probes because fluorene is smaller than pyrene and its π-stacking ability is weaker than that of pyrene. Thus, π-stacking between the pyrene-labeled 2′-deoxynucleotide unit and the NB and the stability of the hairpin stem experiencing PET were the main factors affecting the operation of these mono-labeled QF-MBs. There are several processes that can lead to a change in emission, including quenching from nucleobases, FRET, excimer/exciplex formation, and self-quenching of multiple fluorophores. It is also possible that quenching of QF-MBs can occur upon protonation. Asanuma and coworkers synthesized a simple QF-MB (Kashida et al. 2012), featuring a 7-hydroxycoumarin unit tethered onto a D-threoninol unit, that functioned based on changes in its value of pKa. 7-Hydroxycoumarin has a quantum yield as high as 0.63 under basic conditions, but its fluorescence is quenched upon protonation, allowing the design of more sensitive probes. The 7-hydroxycoumarin unit was positioned near the phosphate anion of the complementary strand, between the base pairs. This hydrophobic environment favored

protonation of the dye, resulting in an increase in the value of pKa of the dye upon hybridization. The signal-to-background ratio of this QF-MB, in which the 7-hydroxycoumarin unit was inserted in the middle of the stem, was 10.0. In the absence of the target, the stem part formed a duplex and the fluorescence of the MB was quenched because the dye was protonated. In the presence of the target, however, the MB opened and became single-stranded in the region of the 7-hydroxycoumarin unit, causing the dye to deprotonate and, thereby, inducing its emission. Thus, the fluorescence of the 7-hydroxycoumarin unit was quenched upon protonation, and the emission intensity of the MB in the presence of its target was 10-fold higher than that in the absence of its target.
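The link between a pKa shift and the fluorescence switching of the 7-hydroxycoumarin probe can be sketched with the Henderson–Hasselbalch relation. The pH and both pKa values below are hypothetical placeholders chosen only to illustrate the trend; they are not values reported for this probe, and the sketch assumes that only the deprotonated dye emits.

```python
def fraction_deprotonated(ph, pka):
    """Henderson-Hasselbalch: fraction of the dye in its deprotonated form."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Hypothetical values for illustration only (not taken from the cited study).
buffer_ph = 7.0
pka_open = 6.8    # dye in the single-stranded (target-bound, stem-open) region
pka_closed = 8.0  # dye held within the stem duplex, where its pKa is raised

emissive_open = fraction_deprotonated(buffer_ph, pka_open)
emissive_closed = fraction_deprotonated(buffer_ph, pka_closed)

print(f"emissive fraction, stem open:   {emissive_open:.3f}")
print(f"emissive fraction, stem closed: {emissive_closed:.3f}")
print(f"idealized signal ratio:         {emissive_open / emissive_closed:.1f}")
```

Under these assumptions, a pKa increase of about one unit on duplex formation is already enough to change the emissive fraction several-fold at neutral pH, which is the qualitative behavior described for this QF-MB.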

Dual-Labeled Quencher-Free Molecular Beacons

The main problems encountered with conventional MBs are poor sensitivity and residual fluorescence. To overcome them, several dual-labeled QF-MBs have been developed, including FRET-MBs, dimer–monomer switching MBs (DMS-MBs), and excimer–monomer switching MBs (EMS-MBs). These systems all feature two fluorophores, but no quencher moieties.

FRET-MBs feature two fluorophores, a fluorescence acceptor (FA) and a fluorescence donor (FD), attached to the two ends of the stem (Zhang et al. 2001). To ensure effective FRET from FD to FA, the emission spectrum of FD should overlap the absorption spectrum of FA. In the absence of the target oligonucleotide, the MB exists in the stem-closed form; when it is excited at the absorption band of FD, FRET occurs, with efficient transfer of energy from FD to FA, so that the fluorescence of FD is quenched by FA while that of FA is observed. When the MB forms a hybrid with the target, the energy transfer is decreased significantly (or eliminated) and the fluorescence of FD increases, while that of FA diminishes or disappears. The changes in the two fluorescence intensities can be monitored, allowing the duplex to be distinguished from the stem-closed probe. Figure 12 displays a representative FRET-MB featuring two fluorescent dyes: a 6-carboxyfluorescein (6-FAM) unit attached at the 5′-end as FA and a coumarin unit attached at the 3′-end as FD. To determine the sensitivity of the MB, the ratio of the fluorescence signals from the donor and acceptor, IFD/IFA, is measured before and after binding the target. According to the mechanism of action of the FRET-MB, the fluorescence intensity of FD is increased upon hybridization, while that for FA is decreased. Thus, the IFD/IFA ratio of the stem-open state after hybridization is always greater than that of the stem-closed state. The coumarin/FAM MB could be used to detect the target accurately over a wide concentration range (from 1 nM to 1 μM).
For comparison, a conventional coumarin/DABCYL MB having the same stem and loop sequence failed to yield a linear response when the concentration was greater than 10 nM. Thus, the sensitivity and dynamic range of the FRET-MB were greater than those of the conventional MB. It should be stressed, however, that spectral overlap should exist between the emission spectrum of the donor and the absorption spectrum of the acceptor when determining a suitable pair of fluorophores for such FRET probes.
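The distance dependence that underlies the FRET-MB readout can be illustrated with the standard Förster expression, E = 1 / (1 + (r/R0)^6), where r is the donor–acceptor separation and R0 is the Förster radius of the dye pair. The following is a minimal sketch only: the Förster radius and both separations are hypothetical values, not parameters of the coumarin/6-FAM beacon of Zhang et al. (2001).

```python
def fret_efficiency(distance_nm: float, forster_radius_nm: float) -> float:
    """Foerster energy-transfer efficiency for a single donor-acceptor pair."""
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


# Hypothetical values for illustration only.
R0_NM = 5.0       # assumed Foerster radius of the donor/acceptor pair
CLOSED_NM = 2.0   # assumed dye separation in the stem-closed hairpin
OPEN_NM = 10.0    # assumed dye separation in the rigid probe-target duplex

e_closed = fret_efficiency(CLOSED_NM, R0_NM)
e_open = fret_efficiency(OPEN_NM, R0_NM)

# The donor keeps the fraction (1 - E) of its unquenched emission.
donor_closed = 1.0 - e_closed
donor_open = 1.0 - e_open

print(f"stem closed: E = {e_closed:.3f}, relative donor emission = {donor_closed:.3f}")
print(f"stem open:   E = {e_open:.3f}, relative donor emission = {donor_open:.3f}")
```

Because the efficiency falls off with the sixth power of distance, even a modest change in separation around R0 switches the system between nearly complete and nearly absent energy transfer, which is why opening of the stem produces such a pronounced change in the donor and acceptor signals.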

53

Molecular Beacons With and Without Quenchers

Fig. 12 Mechanism of FRET-MBs. When the MB forms a hybrid with the target, the energy transfer is decreased significantly (or eliminated) and the fluorescence of FD will increase, while that of FA will diminish or disappear. (Reprinted from Zhang et al. (2001), copyright 2001 Wiley-VCH Verlag)

In addition to the FRET-MB containing two fluorophores, FRET-MBs have also been prepared containing more than two fluorophores; for example, a hairpin ODN containing three fluorophores: FAM, N,N,N′,N′-tetramethyl-6-carboxyrhodamine (TMR), and cyanine-5 (Cy5) (Li et al. 2006). In this structure, the FAM unit was located at one end and the TMR and Cy5 fluorophores at the other. In this triple-FRET system, one fluorophore acted as the primary energy donor, while the other two acted as energy acceptors. For example, in the FRET-MB containing FAM, TMR, and Cy5 at the stem, the FAM unit acted as the primary light absorber and energy donor, the TMR unit acted as the primary energy acceptor and secondary energy donor, and the Cy5 unit acted as the secondary acceptor. When the MB existed in the stem-closed configuration, excitation of the FAM unit initiated an energy transfer cascade from FAM to TMR and then to Cy5, with the energy released as the fluorescence emission of Cy5. In the presence of the complementary DNA, the MB opened and hybridized with the target. The FAM unit and the TMR/Cy5 pair were now separated by a large distance in the stem-open form, thereby blocking energy transfer from FAM to TMR. As a result of opening of the MB, the fluorescence signal was now that of the selectively excited FAM donor.

DMS-MBs feature fluorophores that form nonradiative dimers when located in close proximity. One such probe based on dye dimerization consisted of oligonucleotides labeled with MR121 fluorophores at both the 5′- and 3′-ends (Fig. 13a). Similar to a smart probe, this probe was self-quenched (Knemeyer et al. 2005). Because the MR121 dye has a tendency to form nonfluorescent H-type dimers, the fluorescence intensity of the hairpin ODN was many times lower than that of the respective free dyes.
In the presence of the target, the stem of the DMS-MB opened and the nonradiative H-dimer switched to two monomer units, thereby exhibiting a strong fluorescence signal. The rigid double helix that formed upon hybridization separated the two dye units, thereby leading to a 10-fold increase in fluorescence intensity. Operating in a manner similar to that probe, a self-quenched intramolecular dimer (SQuID) MB has also been prepared as a type of DMS-MB (Conley et al. 2007).

Fig. 13 (a) Types of DMS-MBs. (b) Chemical structure of DCDHF

Fig. 14 Mechanism of DMS-MBs and EMS-MBs. (Reprinted with permission from Fujimoto et al. (2004), copyright 2004 American Chemical Society)

This system was homo-dual-labeled with the NHS ester-functionalized dicyanomethylenedihydrofuran (DCDHF) fluorophore (Fig. 13b). Here, quenching occurred in the closed hairpin conformation, due to an excitonic interaction between the two fluorophores, which formed a nonemissive H-dimer. This probe exhibited 97% quenching; addition of the complementary ODN resulted in a large increase in fluorescence intensity at 650 nm. Compared with conventional MBs, DMS-MBs have several attractive features: almost no residual fluorescence, greater signal-to-noise ratios, single-pot labeling, and visible colorimetric detection of targets with at least two-fold signal enhancement.

EMS-MBs feature two fluorophores of the same kind, attached at the 5′- and 3′-ends of hairpin ODNs. Fujimoto and Inouye prepared EMS probes (Fujimoto et al. 2004) in the form of hairpin oligonucleotides containing two pyrene units, one at each end. Pyrene is a simple aromatic hydrocarbon that can undergo various chemical modifications; it displays monomer and excimer emissions depending on its concentration. In this probe, the pyrene units were positioned parallel to each other in close proximity. Because of this spatial arrangement, the pyrene units emitted only their excimer fluorescence. The design of EMS-MBs is similar to that of FRET-MBs, but the fluorescence signal is generated by a different mechanism. In

the absence of the target oligonucleotide, the EMS-MBs existed in the stem-closed form (Fig. 14). The fluorophores were positioned close enough to form an excimer, resulting in the maximum excimer emission. In the presence of the target oligonucleotide, the EMS-MB hybridized with the target and existed in linear form. As a result of this conformational change, the excimer emission decreased while that of the monomers increased. Thus, the monomer-to-excimer intensity ratio, which was close to zero prior to hybridization, underwent a manifold increase upon binding to the target. In the EMS-MB of the Inouye group, the pyrene excimer emission (IEXIM) appeared at 498 nm and the monomer emission (IMONO) appeared at 382 nm. Initially, the IMONO/IEXIM ratio was 0.2; after addition of target, it was 20. The spectral changes were clearly evident even at a concentration of 1 nM. The sensitivity of this EMS-MB was much higher than that of the MB featuring a 5′-FAM fluorophore and a 3′-DABCYL quencher.

Kim and coworkers synthesized dual-labeled quencher-free hairpin ODNs containing pyrene-appended adenosine (PyA) and uridine (PyU) units in three different combinations: PyU–PyA, PyU–PyU, and PyA–PyA (Seo et al. 2007a). As mentioned above, the fluorescent nucleosides PyA and PyU provide a diverse range of emission intensities when they form duplexes together. Unlike other dual-labeled QF-MBs, those of Kim and coworkers incorporated the PyA and PyU residues into oligonucleotides such that the residues were positioned in complementary locations on opposite strands in the middle positions of the hairpin stem. Accordingly, it was possible to modify the structures at the ends of the sequence. This practical feature of the free ends in EMS-MBs was demonstrated by attaching a cholesterol moiety at the 5′-end of the ODN. The cellular permeability of the cholesterol-linked ODN was enhanced significantly relative to that of the free ODN (Seo et al. 2006). As depicted in Fig. 15, these QF-MBs displayed various photophysical properties. The two probes featuring PyU–PyA and PyA–PyA pairs underwent aromatic stacking of their opposing pyrene units in the hairpin stem. Their fluorescence spectra featured strongly red-shifted emission bands. These two probes functioned in a manner similar to EMS-MBs having pyrene units at their 5′- and 3′-ends, but they displayed clearly distinct emission intensities upon hybridization with matched

Fig. 15 Mechanism of QF-MB operation with a significant color change upon interacting with its target sequence. (Reprinted from Seo et al. (2007a), copyright 2006, with permission from Elsevier)

and mismatched ODNs. In the case of the PyU–PyA and PyA–PyA probes, the color of their aqueous solutions was green, due to excimer emission; the color changed to blue, due to monomer emission, upon duplex formation with the fully matched sequence. Furthermore, the PyA–PyA probe could discriminate the corresponding one-base-mismatched duplex (green color), so this system could be used as an effective color-changing MB. In contrast, the spectrum of the PyU–PyU probe exhibited quenching, with no change in color or fluorescence wavelength even in the presence of the one-base-mismatched ODN. It became highly fluorescent, however, upon addition of the fully matched target strand. Therefore, this PyU–PyU-based system could be used as an “on/off” MB. Thus, in the PyU–PyU probe, the orientation of the two pyrene units did not allow them to form a pyrene excimer. In contrast, single-stranded linear oligodeoxyadenylates incorporating two PyA units could be used as probes to study their self-duplex formation, using time-resolved fluorescence spectroscopy (Seo et al. 2007b).

Wengel and coworkers studied QF-MBs based on pyrene excimer fluorescence in which two pyrene units (pyrene-unlocked nucleic acid [UNA] monomers, Fig. 16a), one being in an excited state and one in the ground state, form a complex (Karlsen et al. 2013). The relaxation of this complex was accompanied by a fluorescence emission (λmax) near 480 nm and a relatively long fluorescence lifetime (30–60 ns) when compared with the autofluorescence of cellular extracts (ca. 7 ns). This QF-MB exploited the pyrene excimer emission, but provided a positive signal readout based on pyrene monomer fluorescence. Figure 16b reveals that when the system existed in the closed state, excimer formation was prevented because the pyrene moieties in the MB were held apart by the double-stranded stem region.
In the open state formed in the presence of the target sequence, however, the ends of the MB allowed the formation of pyrene excimers in the 5′-single-stranded region. Because the excimer fluorescence of this MB was not quenched by nucleobases or other moieties in the native state, but rather it was hindered structurally by separating the pyrene units in the stem–loop structure, this MB was indeed a quencher-free system.
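The practical value of the long excimer lifetime can be illustrated with a simple time-gating estimate. The sketch below assumes single-exponential decays (a simplification) and uses the lifetimes quoted above: ca. 7 ns for cellular autofluorescence and 30 ns, the lower bound of the 30–60 ns range, for the pyrene excimer. The gate delay is a hypothetical choice and the function name is an assumption.

```python
import math


def surviving_fraction(delay_ns: float, lifetime_ns: float) -> float:
    """Fraction of a single-exponential emission remaining after a gate delay."""
    return math.exp(-delay_ns / lifetime_ns)


AUTOFLUORESCENCE_NS = 7.0  # approximate autofluorescence lifetime quoted in the text
EXCIMER_NS = 30.0          # lower bound of the quoted 30-60 ns excimer lifetime
GATE_DELAY_NS = 30.0       # hypothetical delay before detection begins

excimer_left = surviving_fraction(GATE_DELAY_NS, EXCIMER_NS)
background_left = surviving_fraction(GATE_DELAY_NS, AUTOFLUORESCENCE_NS)

# Relative enrichment of excimer signal over autofluorescence gained by gating.
enrichment = excimer_left / background_left

print(f"excimer emission remaining:  {excimer_left:.3f}")
print(f"autofluorescence remaining:  {background_left:.4f}")
print(f"enrichment from time gating: {enrichment:.1f}x")
```

Under these assumptions, delaying detection by a few autofluorescence lifetimes removes almost all of the background while retaining a substantial share of the excimer emission; a longer excimer lifetime within the quoted range would improve the enrichment further.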

Fig. 16 (a) Molecular structure of RNA and UNA. (b) Design and proposed mechanism of the UNA-MB. (Reprinted from Karlsen et al. (2013), copyright 2013, with permission from Elsevier)

Applications of Quencher-Free Molecular Beacons

Similar to the applications of MBs in, for example, real-time PCR monitoring, genetic analysis, in vivo RNA detection, DNA mutation analysis, POC testing, and disease diagnosis, the superior performance of QF-MBs also leads to their application in various fields. In real-time PCR assays, QF-MBs can be used to monitor the accumulation of specific DNA amplicons. For example, Hybeacons are hybridization probes consisting of a single-stranded oligonucleotide containing a fluorophore attached to a nucleoside within the DNA sequence (Dobson 2003). When the probe anneals to a complementary target and forms a duplex, the fluorescence increases. Because Hybeacons do not require enzymatic activation, they can be used in conjunction with rapid PCR cycling conditions. Nevertheless, SNP typing using Hybeacons relies on differences in the values of Tm of matched and mismatched Hybeacon probe duplexes. Hence, two alleles having the same value of ΔTm cannot be differentiated. Rashtchian and coworkers used a fluorogenic hairpin ODN incorporating a T-FAM residue as a primer for multiplex quantitative real-time PCR experiments (Nazarenko et al. 2002b). The primer maintained a blunt-end hairpin structure that exhibited relatively low fluorescence at temperatures below the value of Tm. The signal increased when the primer was linear, reaching its maximum when the primer was incorporated into the double-stranded DNA. For this hairpin ODN, the primer fluorescence increased up to eightfold when forming a PCR product; the hairpin structure also provided specificity for the desired target in the PCR by suppressing primer-dimer formation and mispriming. Thus, fluorogenic mono-labeled primers can be efficient and cost-effective alternatives to conventional FRET-labeled oligonucleotides. QF-MBs also display high selectivity in discriminating single-base mismatches from fully matched ODNs.
Because many of the above-mentioned QF-MBs exhibit sharp increases in fluorescence upon hybridization, they can discriminate fully matched DNA from one-base-mismatched DNA; accordingly, they can be used as fluorescent probes in nonenzymatic SNP typing. These probes can be used to visually discriminate perfectly matched from mismatched DNAs, with distinct colors appearing for the probes and their duplexes with target DNA stems. Smart probes presenting terminal fluorophore units have also been used for target identification and matched/mismatched discrimination. In Sauer’s study of a 3′-biotinylated hairpin ODN immobilized on a streptavidin-coated silica surface, the availability of the 3′-end made it possible to immobilize the smart probe on the solid surface (Piestert et al. 2003). Although the fluorescence intensity of the immobilized smart probe was very low, in the presence of micromolar concentrations of complementary oligonucleotides, the fluorescence intensity increased fourfold within a short period of time (ca. 5 min). Thus, the system facilitated the identification of specific DNA or RNA sequences when using very low concentrations of the targets. Smart probes have also been employed in the specific and sensitive detection of pathogenic DNA sequences in microsphere-based heterogeneous assays (Stöhr et al. 2005). Measuring the fluorescence intensity of a short biotinylated target DNA of Mycobacterium xenopi immobilized on

streptavidin-coated silica microspheres at different concentrations, the target DNA was readily identified, even when its concentration was as low as 10⁻¹¹ M, through comparison with the intensity of the probe/nonspecific target DNA mixture.

The pyrene excimer fluorescence obtained from pyrene-labeled UNA probes has been used in the detection of RNA in living cells (Karlsen et al. 2013). Pyrene excimer fluorescence is a useful tool in diagnostic applications because it can be detected selectively in biological assays when fluorescence is measured after cellular autofluorescence has decayed. The probe was designed to bind a stretch on the endogenously generated circular RNA target ciRS-7, which regulates a series of important proteins, including α-synuclein (involved in the development of Parkinson’s disease). To test the UNA-MB design in a cell culture, the interaction was examined between the UNA-MB hybridization sites (and as many as 27 sites if three mismatches are allowed). A confocal laser scanning microscope equipped with a 405-nm laser was used to excite the MB. Cells expressing ciRS-7 appeared as dot-shaped fluorescent signals in the cytoplasm; such signals were not apparent in the mock-treated cells. A low level of autofluorescence appeared in the cytoplasm and in the mock-treated cells, presumably originating from protein. Thus, it was possible to activate a UNA-MB in cells expressing the circular RNA target.

To develop an in vivo biosensor from a QF-MB system, Kim and coworkers introduced a cholesterol unit at the 5′-terminus of a probe featuring a PyA–PyA pair, to enhance the cellular delivery of the modified MB relative to conventional transfection methods (Seo et al. 2006). MBs can be useful probe systems for detecting molecular dynamics, but their bulky and polar anionic backbones make transfection into cells difficult. The cholesterol-linked MB, however, could efficiently enter living cells without requiring other transfection factors.
Comparing the behavior of the probe with the two PyA units attached and that of the probe with the cholesterol moiety attached to the 5′-end PyA unit revealed that the photophysical signal of the system shifted from 509 to 445 nm. The strong red-shifting of this band originated from the stacking of the two PyA units at the 5′-end of the hairpin. In addition, in vivo experiments using Huh7 cells revealed that, during the first 12 h, the cholesterol-linked ODN exhibited a very strong signal in the cytoplasm; the transfection efficiency was much higher than that of the cholesterol-free ODN. The fluorescence generated when using the transfection reagent Lipofectamine and the cholesterol-free ODN was similar to that of the cholesterol-linked ODN. Because the cholesterol-linked transfection could avoid the endocytic pathway, the number of false-positive signals resulting from nuclease degradation decreased.

In addition, Kim and coworkers also constructed a simple and efficient quencher-free molecular aptamer beacon (QF-MAB) for probing adenosine triphosphate (ATP) (Park et al. 2015). ATP is one of the most important compounds in living organisms, supplying energy for various life activities (e.g., muscle contraction, conduction of excitation, and material synthesis) and serving as an indicator of cell damage. The ATP levels in cells are related to many diseases, including angiocardiopathy and Parkinson’s disease. Thus, its detection can facilitate biochemical studies and clinical diagnoses (Ma et al. 2016). In this ATP-probing system, the QF-MAB exploited the structural transition that occurs when the ATP aptamer binds to

Fig. 17 Schematic representation of a QF-MAB interacting with ATP. (Reprinted from Park et al. (2015), copyright 2015, with permission from Elsevier)

its target. Kim and coworkers used fluorescently modified versions of the ATP-binding DNA aptamer sequence as loops of the MBs to recognize ATP; they elongated the stems with three G–C base pairs to increase the stability of the hairpin structure. As their name implies, QF-MABs do not contain any quencher, but they do feature one or more fluorophore units somewhere in the hairpin sequence (Park et al. 2018). In the absence of ATP, the fluorescence of the fluorophore unit, incorporated on the ATP-binding aptamer sequence and residing in the loop of the QF-MAB, was quenched by the neighboring nucleobases; upon binding with ATP, however, the hairpin of the QF-MAB underwent a conformational change to a folded structure, such that the fluorophore unit was exposed to the solvent, thereby increasing its fluorescence (Fig. 17).

Conclusion

Since they were first reported in the literature in the 1990s, MBs have assumed an increasingly prominent role in many important bioanalytical fields. The use of MB probes for genetic identification has led to many advances in research and in a variety of applications, including sensitive monitoring of PCRs, real-time detection of DNA–RNA hybridization in living cells, DNA mutation analysis, POC testing, and disease diagnosis. To improve the performance of MB systems, many research groups have developed a range of modified MBs with varied designs, fluorophores, and quenchers. The functions considered for improvement include eliminating false signals, increasing the specificity of MBs toward their targets, and enhancing detection limits. Moreover, biochip technologies based on MBs can be used for high-throughput screening (Marras et al. 2002). Although MBs have been adopted in other diverse fields by applying nanotechnology, confocal laser technology, and other advanced technologies, they still find limited applicability in the field of disease diagnosis. MBs can function as sensitive probes for real-time monitoring

through their unique mechanism of fluorescence signal transduction; their low background signals make it easy to use them in ultra-sensitive analysis. In addition, high selectivity and target specificity make it easy to distinguish different nucleic acid target sequences from those with a single base mismatch, allowing MBs to be applied in various biological environments. Limitations of MB systems include false-positive signals generated by endogenous nuclease degradation (or interactions with intracellular proteins), high background signals of the closed structures, and low sensitivity resulting from cell autofluorescence. To improve their performance, various modified MB systems have been developed, including wavelength-shifting MBs, dual-FRET MBs, and QF-MBs. In particular, QF-MBs have played important roles in nucleic acid analysis. They can sensitively identify perfectly matched target sequences, even when mismatched DNA is present in excess (Venkatesan et al. 2008). Mono-labeled QF-MBs, where a fluorophore unit is attached at only one end of a single strand, can readily be immobilized onto solid surfaces through the free end, allowing them to be used as primers for real-time quantification of PCR products with high sensitivity. Unfortunately, QF-MBs also have several disadvantages: the similarity of the fluorescence signals of nonhybridized QF-MBs and their mismatched duplexes, nonspecific guanine quenching, and the limited number of fluorophores whose fluorescence can be quenched by nucleobases. Therefore, more efforts will be needed to minimize these shortcomings and, thereby, boost the applicability of QF-MBs. The in vivo applications of QF-MBs should also be explored further. New fluorophores should be developed providing superior quantum yields, emission maxima, and interactions with nucleobases. 
Through continued improvements in performance and applicability, we expect that MBs and QF-MBs will soon function as highly specific and sensitive recognition and signaling elements in biological detection strategies.

References

Bonnet G et al (1999) Thermodynamic basis of the enhanced specificity of structured DNA probes. Proc Natl Acad Sci U S A 96:6171–6176
Broude NE (2002) Stem-loop oligonucleotides: a robust tool for molecular biology and biotechnology. Trends Biotechnol 20:249–256
Brown LJ et al (2000) Molecular beacons attached to glass beads fluoresce upon hybridisation to target DNA. J Chem Soc Chem Commun:621–622. https://doi.org/10.1039/B000389L
Chen M et al (2017) A molecular beacon-based approach for live-cell imaging of RNA transcripts with minimal target engineering at the single-molecule level. Sci Rep 7:1550
Chen J et al (2020) Recent advances in fluorescence resonance energy transfer-based probes in nucleic acid diagnosis. Anal Methods 12:884–893
Conley NR et al (2007) Bulk and single-molecule characterization of an improved molecular beacon utilizing H-dimer excitonic behavior. J Phys Chem B 111:7929–7931
Dobson N (2003) Synthesis of HyBeacons and dual-labelled probes containing 2′-fluorescent groups for use in genetic analysis. Chem Commun:1234–1235. https://doi.org/10.1039/B302855K

Drake TJ, Tan W (2004) Molecular beacon DNA probes and their bioanalytical applications. Appl Spectrosc 25:269–280
Dubertret B et al (2001) Single-mismatch detection using gold-quenched fluorescent oligonucleotides. Nat Biotechnol 19:365–370
Fang X et al (1999) Designing a novel molecular beacon for surface-immobilized DNA hybridization studies. J Am Chem Soc 121:2921–2922
Fujimoto K et al (2004) Unambiguous detection of target DNAs by excimer–monomer switching molecular beacons. J Org Chem 69:3271–3275
Gao J et al (2020) 2′-O-methyl molecular beacon: a promising molecular tool that permits elimination of sticky-end pairing and improvement of detection sensitivity. RSC Adv 10:41618–41624
Gerasimova YV et al (2010) A single molecular beacon probe is sufficient for the analysis of multiple nucleic acid sequences. Chembiochem 11:1762–1768
Giesendorf BAJ et al (1998) Molecular beacons: a new approach for semiautomated mutation analysis. Clin Chem 44:482–486
Goel G et al (2005) Molecular beacon: a multitask probe. J Appl Microbiol 99:435–442
Gonçalves OSL et al (2019) Detection of miRNA cancer biomarkers using light activated molecular beacons. RSC Adv 9:12766–12783
Han SX et al (2013) Molecular beacons: a novel optical diagnostic tool. Arch Immunol Ther Exp 61:139–148
Heinlein T et al (2003) Photoinduced electron transfer between fluorescent dyes and guanosine residues in DNA-hairpins. J Phys Chem B 107:7957–7964
Huang J et al (2015) Biosensing using hairpin DNA probes. Rev Anal Chem 34:1–27
Hwang GT et al (2004a) A highly discriminating quencher-free molecular beacon for probing DNA. J Am Chem Soc 126:6528–6529
Hwang GT et al (2004b) Fluorescent oligonucleotide incorporating 5-(1-ethynylpyrenyl)-2′-deoxyuridine: sequence-specific fluorescence changes upon duplex formation. Tetrahedron Lett 45:3543–3546
Jackson M et al (2018) The genetic basis of disease. Essays Biochem 62:643–723
Johansson MK et al (2002) Intramolecular dimers: a new strategy to fluorescence quenching in dual-labeled oligonucleotide probes. J Am Chem Soc 124:6950–6956
Karlsen KK et al (2013) A quencher-free molecular beacon design based on pyrene excimer fluorescence using pyrene-labeled UNA (unlocked nucleic acid). Bioorg Med Chem 21:6186–6190
Kashida H et al (2012) Quencher-free molecular beacon tethering 7-hydroxycoumarin detects targets through protonation/deprotonation. Bioorg Med Chem 20:4310–4315
Kerr E et al (2021) Amplification-free electrochemiluminescence molecular beacon-based microRNA sensing using a mobile phone for detection. Sensors Actuators B Chem 330:129261
Knemeyer JP et al (2005) Self-quenching DNA probes based on dye dimerization for identification of mycobacteria. Int J Environ Anal Chem 85:625–637
Kong DM et al (2003) Duplex probes: a new approach for the detection of specific nucleic acids in homogeneous assays. Anal Chim Acta 491:135–143
Kostrikis LG et al (1998) Spectral genotyping of human alleles. Science 279:1228–1229
Lee IJ, Kim BH (2011) Labeling oligonucleotides toward the biomedical probe. In: Zhang LH, Chattopadhyaya J (eds) Medicinal chemistry of nucleic acids. Wiley, Hoboken, pp 292–334
Li Q et al (2002) A new class of homogeneous nucleic acid probes based on specific displacement hybridization. Nucleic Acids Res 30:e5
Li X et al (2006) Combinatorial fluorescence energy transfer molecular beacons for probing nucleic acid sequences. Photochem Photobiol Sci 5:896–902
Li Y et al (2008) Molecular beacons: an optimal multifunctional biological probe. Biophys Res Commun 373:457–461
Lu CH et al (2010) Increasing the sensitivity and single-base mismatch selectivity of the molecular beacon using graphene oxide as the “nanoquencher”. Chem Eur J 16:4889–4894

Ma D et al (2016) DNA-based ATP sensing. Trends Anal Chem 77:226–241
Mao S et al (2020) Recent advances in the molecular beacon technology for live-cell single-molecule imaging. iScience 23:101801
Marras SAE et al (2002) Efficiencies of fluorescence resonance energy transfer and contact-mediated quenching in oligonucleotide probes. Nucleic Acids Res 30:E122
Matsuo T (1998) In situ visualization of messenger RNA for basic fibroblast growth factor in living cells. Biochim Biophys Acta 1379:178–184
Moutsiopoulou A et al (2019) Molecular aptamer beacons and their applications in sensing, imaging, and diagnostics. Small 35:1902248
Nazarenko I et al (2002a) Effect of primary and secondary structure of oligodeoxyribonucleotides on the fluorescent properties of conjugated dyes. Nucleic Acids Res 30:2089–2095
Nazarenko I et al (2002b) Multiplex quantitative PCR using self-quenched primers labeled with a single fluorophore. Nucleic Acids Res 30:e37
Okamoto A et al (2004) Pyrene-labeled base-discriminating fluorescent DNA probes for homogeneous SNP typing. J Am Chem Soc 126:4820–4827
Okamoto A et al (2006) Simple SNP typing assay using a base-discriminating fluorescent probe. Mol BioSyst 2:122–127
Park JW et al (2015) Quencher-free molecular aptamer beacons (QF-MABs) for detection of ATP. Bioorg Med Chem Lett 25:4597–4600
Park Y et al (2018) Facile conversion of ATP-binding RNA aptamer to quencher-free molecular aptamer beacon. Bioorg Med Chem Lett 28:77–80
Piatek AS et al (1998) Molecular beacon sequence analysis for detecting drug resistance in Mycobacterium tuberculosis. Nat Biotechnol 16:359–363
Piestert O et al (2003) A single-molecule sensitive DNA hairpin system based on intramolecular electron transfer. Nano Lett 3:979–982
Podder A et al (2021) Fluorescent nucleic acid systems for biosensors. Bull Chem Soc Jpn 94:1010–1035
Ryu JH et al (2007) Triad base pairs containing fluorene unit for quencher-free SNP typing. Tetrahedron 63:3538–3547
Saito Y et al (2004) Base-discriminating fluorescent (BDF) nucleoside: distinction of thymine by fluorescence quenching. Chem Commun:1704–1705. https://doi.org/10.1039/B405832A
Santangelo PJ et al (2004) Dual FRET molecular beacons for mRNA detection in living cells. Nucleic Acids Res 32:e57
Seo YJ et al (2005) Quencher-free, end-stacking oligonucleotides for probing single-base mismatches in DNA. Org Lett 7:4931–4933
Seo YJ et al (2006) Cholesterol-linked fluorescent molecular beacons with enhanced cell permeability. Bioconjug Chem 17:1151–1155
Seo YJ et al (2007a) Quencher-free molecular beacon systems with two pyrene units in the stem region. Tetrahedron Lett 47:4037–4039
Seo YJ et al (2007b) Self-duplex formation of an APy substituted oligodeoxyadenylate and its unique fluorescence. J Am Chem Soc 129:5244–5247
Sokol DL et al (1998) Real time detection of DNA∙RNA hybridization in living cells. Proc Natl Acad Sci U S A 95:11538–11543
Song S et al (2009) Gold-nanoparticle-based multicolor nanobeacons for sequence-specific DNA analysis. Angew Chem Int Ed 48:8670–8674
Stöhr K et al (2005) Species-specific identification of mycobacterial 16S rRNA PCR amplicons using smart probes. Anal Chem 77:7195–7203
Tan W et al (2000) Molecular beacons: a novel DNA probe for nucleic acid and protein studies. Chem Eur J 6:1107–1111
Tan L et al (2005) Molecular beacons for bioanalytical applications. Analyst 130:1002–1005
Tan X et al (2014) Label-free molecular beacons for biomolecular detection. Anal Chem 86:10864–10869

Tsourkas A et al (2001) Structure–function relationships of shared-stem and conventional molecular beacons. Nucleic Acids Res 30:4208–4215
Tyagi S, Kramer FR (1996) Molecular beacons: probes that fluoresce upon hybridization. Nat Biotechnol 14:303–308
Tyagi S et al (1998) Multicolor molecular beacons for allele discrimination. Nat Biotechnol 16:49–53
Tyagi S et al (2000) Wavelength-shifting molecular beacons. Nat Biotechnol 18:1191–1196
Venkatesan N et al (2008) Quencher-free molecular beacons: a new strategy in fluorescence based nucleic acid analysis. Chem Soc Rev 37:648–663
Vet JAM, Marras SAE (2005) Design and optimization of molecular beacon real-time polymerase chain reaction assays. In: Herdewijn P (ed) Methods in molecular biology. Oligonucleotide synthesis: methods and applications, vol 288. Humana Press, Totowa, pp 273–290
Yang CJ et al (2005) Molecular assembly of superquenchers in signaling molecular interactions. J Am Chem Soc 127:12772–12773
Yang R et al (2008) Carbon nanotube-quenched fluorescent oligonucleotides: probes that fluoresce upon hybridization. J Am Chem Soc 130:8351–8358
Yeh HY et al (2010) Molecular beacon–quantum dot–Au nanoparticle hybrid nanoprobes for visualizing virus replication in living cells. Chem Commun 16:3914–3916
Yi JW et al (2011) Quencher-free molecular beacon: enhancement of the signal-to-background ratio with graphene oxide. Bioorg Med Chem Lett 21:704–706
Zadeh JN et al (2011) Software news and updates NUPACK: analysis and design of nucleic acid systems. J Comput Chem 32:170–173
Zhang P et al (2001) Design of a molecular beacon DNA probe with two fluorophores. Angew Chem Int Ed 40:402–405
Zheng J et al (2015) Rationally designed molecular beacons for bioanalytical and biomedical applications. Chem Soc Rev 44:3036–3055
Zuker M (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31:3406–3415

Part VII Nanotechnology and Nanomaterial Biology of Nucleic Acids

54. Gene Nanovector for Genome Therapy

Dejun Ma and Zhen Xi

Contents
Introduction
Multiplex Gene Regulation at Different Levels
The Gene Regulation Toolbox for Genome Therapy
Gene Rescue
Gene Silencing
Gene Editing
Gene Activation
RNA Editing
Gene Read-Through
Exon Skipping
Aptamer and Riboswitch
Catalytic Nucleic Acids
The Gene Vectors for Delivery
The Delivery Approaches
The Form of Gene Vectors
The Presumptive Model of Archimedes Solid-like Nanostructures Assembled from Branch-PCR
The Design of ASN-TO Gene Nanovector
Size-Tunability of ASN-TO Gene Nanovector
The Applications of ASN-TO Gene Nanovector in Genome Therapy
The Gene Overexpression of ASN-TO Gene Nanovector
The Gene Silencing of ASN-TO Gene Nanovector
The Genome Editing of ASN-TO Gene Nanovector
Multiplex Gene Regulation of ASN-TO Gene Nanovector for Cancer Therapy
Prospects of ASN-TO Gene Nanovector
Systemic and Targeted Delivery
Stimulus-Responsive DNA Release
Network Target-Based Genome Therapy with Co-Branch PCR Perspective
Conclusion
References

D. Ma · Z. Xi (*)
State Key Laboratory of Elemento-Organic Chemistry and Department of Chemical Biology, National Engineering Research Center of Pesticide (Tianjin), College of Chemistry, Nankai University, Tianjin, China
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2023
N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_60

Abstract

Gene therapy is a promising approach to complex diseases that are currently untreatable. A critical step in effective gene therapy is the successful delivery of therapeutic genes into target cells. The rapid development of gene regulation toolboxes and diverse gene vectors greatly increases the likelihood of delivering functional gene modules to multiple organs and expands the therapeutic scope for complex diseases. In designing optimal gene vectors for therapeutic applications, gene loading capacity, shape, size, and safety are critical factors. This chapter describes the design, construction, function, applications, and future prospects of a new type of nonviral gene nanovector with Archimedes solid-like truncated octahedron nanostructures (ASN-TO), constructed by polymerase chain reaction with branched primers (branch-PCR). The ASN-TO gene nanovector offers high DNA loading capacity, serum stability, controllable particle size, long-lasting bioactivity, and, in particular, multiplex gene regulation. It opens a new opportunity to develop genome therapy by integrating various gene regulation toolboxes (gene overexpression, gene silencing, gene editing, etc.) into an all-in-one vector that fine-tunes gene expression at multiple levels and synergistically adjusts network targets in multiple signaling pathways. This differs from single-gene therapy and could expand therapeutic applications. For complex diseases involving multiple genes, the ASN-TO gene nanovector could act as a chromosome-like payload that carries diverse artificially designed gene regulation toolboxes to synergistically regulate network targets for genome therapy.

Keywords

Plasmid DNA · Linear DNA · Nano DNA · Viral vector · Nonviral vector · Gene therapy · Gene nanovector · Branch-PCR · Targeted delivery · Gene capacity · Serum stability · Size-tunability · Multiplex gene regulation · Cancer therapy · Archimedes solid · Tumor suppressor genes (TSR) · Oncogenes · Co-branch PCR · Genome therapy · Network target

Introduction

The central dogma describes the sequential flow of genetic information in living cells: DNA is transcribed into RNA, which is then translated into protein. In a biological process, the information stored in chromatin DNA is unfolded and decoded into messenger RNA, which is translated into an active protein that executes the biological function. The central dogma tells us that the decoding of chromatin into functional proteins is well coordinated through stringent genetic regulation at multiple levels: chromatin unfolding, DNA transcription, posttranscription, mRNA translation, and posttranslation. Dysfunctional decoding at any level may turn physiological balance into pathological disorder, causing a myriad of hereditary diseases and complex diseases such as metabolic diseases and cancers. The concept of gene therapy arose to address diseases caused by genetic disorders, and more kinds of gene regulation tools were subsequently exploited to treat a wide range of genetic disorders. With hundreds of potential gene candidates, gene therapy offers promising opportunities to cure complex diseases such as rare genetic diseases, blood disorders, cardiovascular diseases, neurodegenerative diseases, and cancers. In the early stages of gene therapy, the replacement, deletion, and repair of a single disease-causing gene were developed and validated as effective therapeutic approaches to restoring gene function. To date, 11 gene therapy products, 5 siRNA drugs, 9 antisense oligonucleotides, 1 aptamer, and 2 mRNA vaccines against SARS-CoV-2 have been approved by the US Food and Drug Administration (FDA), the European Medicines Agency (EMA), or the National Medical Products Administration (NMPA) in China (Fig. 1). Furthermore, fast-emerging companies are now competing to develop gene therapy products for different complex diseases and are gradually moving them into clinical trials.

Fig. 1 The approved medicine categories used in nucleic acid therapy


There are two basic approaches: ex vivo gene therapy and in vivo gene therapy. Ex vivo gene therapy modifies cells extracted from patients and returns them to the patients as a cell therapy; it has mostly been limited to blood diseases, as with chimeric antigen receptor T cells (CAR-T). In vivo gene therapy delivers new or modified genes directly into the body with the help of gene delivery systems such as viral vectors and lipids. Despite great progress, many hurdles must still be overcome to bring gene therapy into clinical use.

The original strategy of gene therapy, which is still used today, was to deliver a functional gene into disease-associated cells that have a missing or dysfunctional gene. In 1990, two young girls with severe immunodeficiency caused by the lack of an essential enzyme (adenosine deaminase) were treated with an engineered virus that delivered the normal gene to express the missing enzyme (Das et al. 2015). The reduced symptoms demonstrated the efficacy of this treatment and stimulated a flurry of gene therapy trials. Unexpectedly, in 1999, 18-year-old Jesse Gelsinger died after treatment with a modified virus, which evoked a strong inflammatory response that led to a dangerous blood-clotting disorder and subsequent organ failure (Somia and Verma 2000). In 2003, patients with immunodeficiency also developed leukemia after treatment. These severe risks forced researchers to rethink the safety of gene delivery vectors, especially viral vectors. In contrast to the previously used viral vectors, numerous kinds of viral vectors, such as adeno-associated virus (AAV), retrovirus, lentivirus, adenovirus, and oncolytic virus, have since been extensively engineered to overcome this safety issue in gene delivery.
All these viral vectors have advantages and limitations in different clinical settings. Repeated modification of the AAV capsid protein has enabled broad-spectrum, organ-targeted delivery. Simplified lentivirus vectors have a larger gene capacity (8 kb) than AAV vectors (4.7 kb) and retain long-lasting gene expression with a reportedly minimized carcinogenic risk (Kubo et al. 2010). After a decade of rethinking, gene therapy rekindled hope when cancer immunotherapy with T cells carrying chimeric antigen receptor (CAR) genes delivered by retroviral vectors selectively killed tumor cells in leukemia and lymphoma (Lipowska-Bhalla et al. 2012). Although this strategy does not deliver a functional gene to replace a dysfunctional gene in affected cells, it improves the tumor recognition and tumor-killing capacity of T cells by delivering additional genes. With the approval of several CAR-T cell therapy products, the definition of gene therapy was extended beyond the traditional delivery of functional genes to substitute for dysfunctional ones.

After decades of development, the combination of emerging gene delivery methods and gene-operation technologies has taken gene therapy far beyond single-gene illnesses toward disorders associated with multiple genes. This shows considerable potential for future gene therapy that is more personalized, precise, safe, and efficient, especially when all the gene regulation tools targeting DNA and RNA are rationally integrated into a comprehensive toolbox and loaded into chromatin-like payloads that mimic the chromosome-mediated gene decoding process for disease therapy. We therefore term this artificial chromosome-like regulation of gene networks, applied simultaneously at multiple levels of the central dogma with different tools, genome therapy. We believe that the era of genome therapy will come.

Multiplex Gene Regulation at Different Levels

The Gene Regulation Toolbox for Genome Therapy

According to the central dogma, gene expression can be stringently regulated at multiple levels (chromatin unfolding, DNA transcription, posttranscription, mRNA translation, and posttranslation), and each level has been exploited to build gene regulation technologies. For genome therapy, these technologies could be combined into an integrative toolbox for multiplex gene regulation at different levels (DNA, RNA, and protein). To develop the genome therapy concept, we need to understand the available regulation tools with potential therapeutic application. The gene regulation technologies most relevant to genome therapy are briefly reviewed below.

Gene Rescue

The most widely used method in gene therapy is to rescue the function of a damaged gene by overexpressing the functional gene from plasmid DNA or viral vectors. In 2003, the p53 gene therapy Gendicine was approved by the National Medical Products Administration (NMPA) for the treatment of head and neck cancer (Zhang et al. 2018). Gendicine is a recombinant human adenovirus that expresses wild-type p53 protein as a tumor suppressor to induce cancer cell apoptosis. Encouraged by p53 gene therapy, more tumor suppressor genes, such as PTEN, BRCA1/2, and retinoblastoma (RB), have been exploited to combat different cancers (Liu et al. 2015a, b). For short-term gene expression, messenger RNA (mRNA) can also be delivered directly into cells to express the target protein, as in mRNA vaccines for the prevention of infections and cancers (Zhang et al. 2021).

Gene Silencing

RNA interference (RNAi) is a natural process of gene silencing in eukaryotes. Double-stranded RNA is processed into short interfering RNA (siRNA) by the Dicer ribonuclease, and Argonaute proteins (AGO1, AGO2, AGO3, or AGO4) and other auxiliary proteins are recruited to form the RNA-induced silencing complex (RISC), which brings about translational or transcriptional repression (Sioud 2021). Small functional RNAs are usually expressed as siRNA, microRNA (miRNA), and piRNA. Since the discovery of RNAi in higher eukaryotes, RNAi-based gene silencing with siRNA has been exploited to downregulate abnormally hyperactive, disease-causing mRNA expression (Fire et al. 1998). To date, lipid nanoparticles (LNP) and N-acetylgalactosamine (GalNAc) siRNA conjugates have been the major approaches to siRNA delivery. The approval of Patisiran and Givosiran by the US FDA strongly stimulated the development of siRNA drugs for many diseases other than rare genetic diseases. Similarly, natural microRNA backbones, such as pre-miR-155, pre-miR-30a, and pre-miR-21, have been exploited as artificial microRNAs (amiRNA) to mediate specific mRNA degradation (Lam et al. 2015). Unlike RNAi, antisense oligonucleotides (ASO) knock down an mRNA by hybridizing to it, which induces RNase H-mediated mRNA cleavage and thereby blocks protein translation (Orr and Dorr 2005). Following the discovery that Cas13 recognizes RNA targets, Cas13-based RNA cleavage has been used to induce specific mRNA degradation and gene silencing. At the transcriptional level, a fusion of Cas9 with a transcriptional repressor such as the Krüppel-associated box (KRAB) or histone deacetylase 1 (HDAC1) can be directed by guide RNAs to the target promoter, where it blocks the recruitment of transcription factors or performs epigenetic editing to shut down transcription (Liu et al. 2021).
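The hybridization step that antisense oligonucleotides rely on can be illustrated with a short sketch. The Python snippet below derives the DNA sequence complementary to an mRNA target by Watson-Crick pairing. The function name `antisense_dna` and the nine-nucleotide target are hypothetical and chosen only for illustration; practical ASO design also depends on chemical modification, length, and off-target screening, none of which is modeled here.

```python
# Watson-Crick partners for an RNA target read by a DNA oligonucleotide.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def antisense_dna(target_mrna: str) -> str:
    """Return the DNA strand (5'->3') that pairs with target_mrna (5'->3')."""
    target = target_mrna.upper()
    unknown = set(target) - set(RNA_TO_DNA_COMPLEMENT)
    if unknown:
        raise ValueError(f"unexpected bases: {sorted(unknown)}")
    # The two strands pair antiparallel, so read the target from its 3' end.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(target))


# Hypothetical target site within an mRNA.
print(antisense_dna("AUGGCUUGA"))  # TCAAGCCAT
```

Reading the target from its 3′ end before complementing keeps the returned sequence in the conventional 5′→3′ orientation.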

Gene Editing

With the rapid development of site-specific nucleases that cleave DNA, nuclease-based genome editing, including ZFN, TALEN, and CRISPR/Cas9, has been widely developed to introduce gene insertion, gene deletion, base substitution, base transversion, and gene recombination. In addition to sequence alteration, epigenetic editing of DNA can be achieved by fusing the CRISPR/Cas9 nuclease to chemically modifying enzymes, which regulates gene expression by introducing different epigenetic modifications. Clinical trials of in vivo genome editing as a novel gene therapy approach are under way with great enthusiasm. Among these technologies, the discovery and use of the CRISPR-Cas system has fueled enthusiasm for editing genes precisely, without substituting functional genes for dysfunctional ones. As an immune defense system in bacteria, the CRISPR/Cas9 system specifically cleaves foreign DNA sequences from invading viruses and thereby destroys the viral genome (Weinberger et al. 2012). Successful gene editing with the CRISPR/Cas9 system in mammalian cells prompted a variety of new gene intervention technologies based on CRISPR/Cas9, which have also moved from the laboratory to clinical trials for gene therapy (Jinek et al. 2012; Cong et al. 2013). In 2021, Intellia Therapeutics and Regeneron Pharmaceuticals announced the first clinical trial of in vivo editing of the human TTR gene to treat patients with transthyretin amyloidosis (ATTR amyloidosis). In this trial, a lipid nanoparticle delivers to the liver a single guide RNA targeting human TTR and a human codon-optimized mRNA encoding Streptococcus pyogenes Cas9 protein (Gillmore et al. 2021). With the discovery and directed evolution of more CRISPR-associated nucleases, Cas9 base editors and prime editors have made DSB-free (free of double-strand breaks) and template-free gene editing possible and more applicable to unsolved genetic diseases.

Gene Activation

In contrast to RNAi, RNA activation (RNAa) stimulates the expression of a therapeutic mRNA to a low-to-moderate level through transcriptional activation. Small activating RNAs (saRNA), which are the same length as siRNA, are designed to target gene promoters or other regulatory regions to enhance gene transcription. The majority of saRNAs have been discovered in cancer cell types such as breast cancer, prostate cancer, and liver cancer. For prostate cancer, dsEcad-215 targeting E-cadherin and dsP21–322 targeting p21 were shown to enhance gene expression in PC-3 cells (Place et al. 2010). An saRNA that induces the expression of CEBPα (MTL-CEBPA) has been tested in a phase I clinical trial for advanced hepatocellular carcinoma (HCC), which greatly encourages the pursuit of gene activation for cancer therapy (Sarker et al. 2020). Alternatively, a fusion of Cas9 with an activator such as VP16 or VP64 can be directed by guide RNAs to the target promoter, where it recruits transcription factors to initiate transcription (Polstein et al. 2015).

RNA Editing

RNA editing directly alters bases in mRNA to correct point mutations or to introduce epigenetic modifications. Direct base editing relies on the deamination of an A or C base. The conversion of cytidine to uridine, or of adenosine to inosine (decoded as guanosine), rewrites the mRNA codon for protein translation. Based on different endogenous deaminases, several site-directed RNA editing technologies have been developed with ADAR-recruiting RNAs, such as GluR2-ADAR, RESTORE (Recruiting Endogenous ADAR to Specific Transcripts for Oligonucleotide-mediated RNA Editing), CIRTS (CRISPR-Cas-Inspired RNA Targeting System), and LEAPER (Leveraging Endogenous ADAR for Programmable Editing of RNA) (Khosravi and Jantsch 2021). Fusing a deaminase domain to dCas13 has also established several CRISPR-based RNA editing methods, including REPAIR (RNA Editing for Programmable A to I Replacement), RESCUE (RNA Editing for Specific C to U Exchange), and CURE (C to U RNA Editor) (Lo et al. 2022). In addition, mRNA modifications such as m6A play an important role in regulating mRNA stability, export, translation, and splicing at the posttranscriptional level. Site-specific epigenetic modification of mRNA can be accomplished by fusing a methyltransferase writer or an eraser (FTO or ALKBH5) with dCas9. Programmable m6A editing further extends the scope of RNA editing.
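How a single deamination rewrites a codon can be made concrete with a small sketch. The Python snippet below applies an A-to-I edit (inosine is read as G) or a C-to-U edit at one codon position and looks up the amino acid before and after in the standard genetic code. The function name `edit_codon` and the chosen codons are illustrative assumptions; the sketch models decoding only, not how an editing enzyme is targeted to a site.

```python
# Standard genetic code as RNA codons, bases ordered U, C, A, G ("*" = stop).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Each edit maps the substrate base to the base the ribosome reads afterwards.
EDITS = {"A-to-I": ("A", "G"), "C-to-U": ("C", "U")}


def edit_codon(codon: str, position: int, edit: str):
    """Return (decoded codon, amino acid before, amino acid after)."""
    substrate, read_as = EDITS[edit]
    if codon[position] != substrate:
        raise ValueError(f"{edit} needs {substrate} at position {position}")
    decoded = codon[:position] + read_as + codon[position + 1:]
    return decoded, CODON_TABLE[codon], CODON_TABLE[decoded]


# A-to-I at the middle base turns the UAG stop codon into a Trp codon.
print(edit_codon("UAG", 1, "A-to-I"))  # ('UGG', '*', 'W')
# C-to-U at the first base turns the Gln codon CAA into a stop codon.
print(edit_codon("CAA", 0, "C-to-U"))  # ('UAA', 'Q', '*')
```

The two printed cases show that the same class of edit can either remove or create a stop codon, depending on the sequence context.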


Gene Read-Through

In 11% of rare genetic diseases, a stop codon introduced by a single-nucleotide mutation causes premature termination of protein synthesis. Technologies that promote gene read-through by inserting an amino acid at the stop codon have been developed. Genetic code expansion technology makes it possible to engineer an aminoacyl-tRNA synthetase and transfer RNA (tRNA) to install unnatural amino acids at the stop codon, which yields the full-length protein and restores gene function (Shi et al. 2022). Suppressor tRNA has also attracted wide attention for its small size and its reliance on endogenous aminoacyl-tRNA synthetases. A suppressor tRNA carries a mutation in the anticodon but is still charged normally with its specific amino acid. Direct delivery of suppressor tRNA provides a simple and efficient therapy; suppressor tRNA has been successfully delivered by recombinant adeno-associated virus (rAAV) to read through premature termination codons during protein synthesis (Wang et al. 2022).
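The read-through idea can be sketched as a toy translation routine. In the Python snippet below, a premature UGA codon truncates the protein unless a suppressor tRNA that decodes UGA is supplied. The function name `translate`, the short mRNA, and the choice of an arginine-carrying suppressor are hypothetical; the model also assumes that suppression is fully efficient, which is a simplification.

```python
# Standard genetic code as RNA codons, bases ordered U, C, A, G ("*" = stop).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna: str, suppressor=None) -> str:
    """Translate from the first base; stop at the first unsuppressed stop codon.

    suppressor maps a stop codon to the amino acid carried by a
    suppressor tRNA, e.g. {"UGA": "R"}.
    """
    suppressor = suppressor or {}
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        codon = mrna[start:start + 3]
        residue = suppressor.get(codon, CODON_TABLE[codon])
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Hypothetical mutant mRNA: AUG GCU UGA GAA AAA UAA (premature UGA stop).
mutant = "AUGGCUUGAGAAAAAUAA"
print(translate(mutant))                # MA
print(translate(mutant, {"UGA": "R"}))  # MAREK
```

In this example the natural stop codon (UAA) differs from the suppressed one (UGA), so translation still terminates normally at the end of the reading frame.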

Exon Skipping

Exon skipping can relieve the symptoms of diseases such as Duchenne muscular dystrophy (DMD) by interfering with mRNA splicing so that less toxic proteins are produced. Antisense oligonucleotides have been developed to target different regions, such as exon 51, to rescue partial dystrophin function (Dzierlega and Yokota 2020). The CRISPR/Cas9 system has also been shown to induce exon skipping by introducing mutations at the exon-intron junction (Mou et al. 2017).

Aptamer and Riboswitch

Aptamers are functionally similar to antibodies. They are usually short, single-stranded DNA or RNA (ssDNA or ssRNA) molecules whose single strand folds into a variety of shapes made of helices and loops. These versatile conformations confer selective and specific binding to targets, which can be peptides, proteins, carbohydrates, small molecules, metabolites, or living cells. Target recognition and binding depend strictly on the three-dimensional structure of the aptamer rather than its primary sequence, and they are affected by environmental conditions including temperature, pH, salt, and interfering surfactants. Owing to this binding specificity, aptamers have been exploited for diagnostic and therapeutic applications. For example, Pegaptanib, a 27 nt RNA aptamer, was the first therapeutic aptamer approved by the US Food and Drug Administration (FDA), in 2004, for the treatment of neovascular (wet) age-related macular degeneration (AMD). Aptamers have also been conjugated to small-molecule drugs, proteins, nucleic acids, or nanoparticles for cancer therapy; the conjugates bind specifically to receptors on the cell membrane and enter cells by endocytosis.


A riboswitch is also a kind of aptamer, found naturally in the untranslated regions of mRNAs. The riboswitch structure contains two mutually exclusive domains (an aptamer domain and an expression platform) partitioned by a switch sequence; depending on whether the ligand is bound to the aptamer domain, the switch sequence binds to an element in the mRNA to regulate transcription or translation (Barrick and Breaker 2007). Classified by ligand, the common riboswitches include the purine, S-adenosylmethionine (SAM), thiamine pyrophosphate (TPP), lysine, flavin mononucleotide (FMN), magnesium, glmS (ribozyme), pre-Q1, and cyclic-di-GMP riboswitches, among others (Garst et al. 2011). In the presence of a small-molecule ligand, a riboswitch can fine-tune the transcription and translation of its mRNA in cis in a dose-dependent manner.

Catalytic Nucleic Acids

Catalytic nucleic acids are subdivided into three categories: ribozymes, DNAzymes, and aptazymes. Unlike aptamers, which only bind their targets, catalytic nucleic acids can mediate sequence-specific catalytic reactions, including cleavage and ligation. Ribozymes are naturally occurring RNAs that are widely involved in RNA processing and mRNA translation; examples include hammerhead ribozymes, the VS ribozyme, the leadzyme, the hairpin ribozyme, group I introns, RNase P RNA, and the hepatitis delta virus (HDV) ribozyme. DNAzymes can be chemically synthesized after in vitro selection and can bind and cleave RNA targets to reduce the abundance of a target mRNA (Liu et al. 2017). Besides downregulating genes by mRNA cleavage, ribozyme-based RNA circularization provides an efficient route to more stable circular RNA aptamers, circular ADAR-recruiting RNAs, and circular mRNA vaccines for RNA-based therapy. An aptazyme is a special ribozyme whose catalytic activity is controlled by ligand concentration. It is composed of an aptamer domain and a ribozyme or DNAzyme, linked by a communication module. An aptazyme thus resembles an allosteric ribozyme regulated by the aptamer-ligand interaction. On this principle, aptazymes have been used as gene switches to control gene expression. Fusing an aptazyme to guide RNAs has also enabled ligand-controlled CRISPR/Cas9-mediated genome editing, base editing, and transcriptional activation in mammalian cells (Tang et al. 2017), which expands the application scope of ribozymes.

The nucleic acid tools discussed above could constitute a toolbox for combined applications if they can be placed in one vector. Although all of them show substantial therapeutic potential, delivery remains the greatest barrier to their efficacy. Efficient genome therapy therefore requires a proper means of delivery. The current delivery tools for nucleic acids are briefly summarized in the following part to support the genome therapy concept.


The Gene Vectors for Delivery

The Delivery Approaches

The successful delivery of therapeutic nucleic acids into target cells is a critical step in executing their biological function and curing disease. Delivery strategies include physical delivery (microinjection, electroporation, particle bombardment, etc.), viral delivery (adeno-associated virus, adenovirus, lentivirus, etc.), and nonviral delivery (liposomes, polyplexes, polymers, gold particles, lipid nanoparticles, etc.). Each has its own advantages for different applications, as the delivery of CRISPR/Cas9 components illustrates. Among physical methods, microinjection is suitable for any form of CRISPR/Cas9 cargo, including DNA, mRNA, or ribonucleoprotein (RNP), and delivers it into the cells of interest with great success. However, it is difficult and time-consuming and is usually used for in vitro single-cell injection. AAV delivery is highly efficient and can target multiple tissues, but the gene capacity of the AAV genome is less than 5 kb, so a single AAV vector cannot carry the full CRISPR/Cas9 cargo unless the much smaller SaCas9 is used (Ran et al. 2015). For nonviral delivery, lipid nanoparticles, liposomes, and lipoplexes are the main tools for loading any form of CRISPR/Cas9 cargo, including DNA, mRNA, or RNP (Gillmore et al. 2021). Although various delivery methods are well established, they all need nucleic acids (DNA or RNA) as gene vectors, which remain the major form of gene cargo for gene therapy.
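The capacity argument above amounts to simple arithmetic, sketched below in Python. The vector capacities (4.7 kb for AAV, 8 kb for lentivirus) are the figures quoted in this chapter. The nuclease coding lengths (about 4.1 kb for SpCas9 and 3.2 kb for SaCas9) and the 1 kb allowance for promoter, polyadenylation signal, and guide RNA cassette are rounded, illustrative assumptions, as are the names `fits` and `OTHER_ELEMENTS_BP`.

```python
# Packaging limits in base pairs, as quoted in the chapter text.
CAPACITY_BP = {"AAV": 4700, "lentivirus": 8000}

# Approximate coding lengths (illustrative assumptions, rounded).
CAS9_ORF_BP = {"SpCas9": 4100, "SaCas9": 3200}

# Assumed allowance for promoter, polyA signal, and sgRNA cassette.
OTHER_ELEMENTS_BP = 1000


def fits(vector: str, nuclease: str, extra_bp: int = OTHER_ELEMENTS_BP) -> bool:
    """True if the nuclease plus the other elements fits in one vector."""
    return CAS9_ORF_BP[nuclease] + extra_bp <= CAPACITY_BP[vector]


for vector in CAPACITY_BP:
    for nuclease in CAS9_ORF_BP:
        print(f"{vector:10s} {nuclease}: {fits(vector, nuclease)}")
```

Under these assumed sizes, only the SaCas9 cassette fits into a single AAV vector, whereas both cassettes fit into a lentiviral vector; changing the allowance shifts the outcome accordingly.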

The Form of Gene Vectors

In gene delivery, gene regulation tools rely on gene vectors to carry the genetic information that executes their function in living cells. To identify the characteristics that gene vectors need for future genome therapy, we discuss the current status of the available gene vectors and the directions in which they can be improved, with the aim of finding a feasible way to construct a suitable gene vector for genome therapy. The current gene vectors in gene therapy can be broadly divided into plasmid DNA, linear DNA, viral vectors, and nano DNA.

Plasmid DNA

The genetic information of a gain-of-function gene can be stored as DNA or mRNA, either of which can be used to express the functional protein. Because mRNA is less stable than DNA, DNA vectors and RNA vectors are designed differently. A customizable DNA vector requires three constitutive elements (promoter, open reading frame, and terminator) to ensure basic initiation and termination of mRNA transcription. In addition to these three indispensable elements, regulatory elements such as the TATA box, CAAT box, GC box, initiator, transcription start site, Kozak sequence in the 5′-untranslated region, and polyadenylation signal in the 3′-untranslated region should be added to maximize the efficiency of mRNA transcription and protein translation. Sometimes an insulator sequence, an enhancer sequence, scaffold/matrix attachment regions, and an intron sequence are also introduced to improve the stability of DNA vectors in the cell nucleus.

Generally, DNA vectors are derived from plasmid DNA, which self-amplifies in bacteria and can be purified on a large scale. Since the early clinical trials of plasmids for exogenous protein expression in 1990, plasmids for gene therapy have been widely exploited and optimized for several decades. The greatest advantages of plasmids for gene therapy are large-scale production, a longer storage life, and lower immunogenic risk than viral vectors and RNA vectors. Furthermore, genes can be easily manipulated in the plasmid backbone for plasmid redesign. Because plasmids take the form of covalently closed circular DNA (cccDNA), they avoid the risk of chromosomal integration associated with viral vectors, and they are compatible with other payloads for in vivo delivery (Hardee et al. 2017). To date, approximately 200 cases have been tested in clinical trials for the therapy of rare genetic diseases, cancers, metabolic diseases, and infectious diseases.

Despite these significant advantages, some aspects of plasmid DNA remain to be optimized. Plasmids extracted from bacteria are not a single pure product: they contain bacterial endotoxin contaminants and diverse forms of DNA, such as supercoiled DNA, open relaxed DNA in different topological states, linear DNA, and nicked DNA. Biological studies have shown that supercoiled DNA is the most efficient form of plasmid (Schleef and Schmidt 2004).
More sophisticated purification and encapsulation of supercoiled DNA are therefore required to keep the plasmid structure stable across batches and to avoid unwanted single-strand nicks or double-strand breaks. Otherwise, the efficiency of gene transfer and gene expression would be greatly impaired, and the cleaved DNA fragments would add to the risk of genome integration. Plasmids require bacterial elements such as an origin of replication (pOri) and antibiotic resistance genes (ampicillin, kanamycin, etc.) for positive selection and large-scale propagation in bacterial cells, and these prokaryotic sequences may significantly influence eukaryotic gene expression and the innate immune response in mammalian cells. Notably, long extragenic DNA flanking the expression cassette contributes to plasmid-mediated transgene silencing, while extragenic DNA including pOri and antibiotic resistance genes also carries the potential risk of transferring these genes to gut bacteria in the human body, given the frequent use of antibiotics (Williams et al. 2009). Furthermore, the abundant unmethylated cytosine-phosphate-guanine (CpG) dinucleotide motifs in plasmids can readily activate Toll-like receptor 9 (TLR9) and increase immunostimulatory activity (Cornelie et al. 2004). Owing to this immunostimulatory characteristic, unmethylated CpG motifs have been intentionally added to plasmids for the preparation of DNA vaccines. Given these modifiable aspects, the precise redesign of more efficient and safer plasmids requires multiple sequence modifications for
future clinical applications: directly pruning unnecessary sequences, adding stabilizing sequences, and shortening the plasmid. Previous studies demonstrated that plasmid transfection efficiency could be improved when the total plasmid length was shortened (Florian et al. 2021). To increase the likelihood of clinical use, the removal of antibiotic resistance genes should be prioritized over other sequence modifications; because this removes antibiotic selection, an antibiotic-free selection method must be added. In pORT plasmids, for example, operator repressor titration is used: as the plasmid propagates in bacterial cells, it competes for the repressor with the repressor binding site upstream of an essential gene in the chromosomal DNA (Mignon et al. 2015). Another strategy for constructing antibiotic-free plasmids is to establish survival complementation between the plasmid and the chromosomal DNA in bacterial cells. pCOR plasmids harbor a conditional origin of replication (ori-γ) that supports replication in producer cells expressing the π initiator protein. These plasmids had decreased immunostimulatory side effects and also exhibited higher expression activity than commonly used plasmids (Soubrier et al. 1999). To completely remove the extra bacterial sequence flanking the target gene expression cassette, site-specific recombinases (phage λ integrase, Flp recombinase, Cre recombinase, etc.) were expressed in bacterial cells to divide the plasmid into two parts: minicircle DNA containing the therapeutic gene expression module, and the remaining part carrying the origin of replication and antibiotic resistance genes (Kay et al. 2010). However, the preparation of minicircle DNA is complicated and requires repeated stepwise purification to improve purity, which increases processing time and cost.
In terms of cell transfection efficiency, minicircle DNA showed higher transfection and expressed more protein than conventional plasmids. With the bacterial antibiotic resistance genes removed, safety in clinical applications is improved. Compared with conventional plasmids, the shorter minicircles carry more negative supercoils, which further compress the minicircle DNA into a compact, smaller structure; this has been demonstrated to protect against serum and to increase the probability of entry into the cell nucleus.
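The CpG content discussed above is straightforward to quantify when a vector sequence is redesigned. The following is a minimal sketch, not a method from the cited studies: the function name `cpg_stats` and the toy fragment are hypothetical, and sequence alone says nothing about methylation status, so the count only indicates how many CpG sites a design contains.

```python
def cpg_stats(seq):
    """Count CpG dinucleotides on one strand and report their density per kb."""
    s = seq.upper()
    # A CpG site is a C immediately followed by a G (read 5' to 3').
    n_cpg = sum(1 for i in range(len(s) - 1) if s[i:i + 2] == "CG")
    per_kb = 1000.0 * n_cpg / len(s) if s else 0.0
    return {"length": len(s), "cpg_count": n_cpg, "cpg_per_kb": per_kb}


# Hypothetical toy fragment for illustration only, not a real plasmid sequence.
toy_fragment = "ACGTCGATCGGGCCGTTA"
stats = cpg_stats(toy_fragment)
print(stats["length"], stats["cpg_count"], round(stats["cpg_per_kb"], 1))
```

Comparing this count before and after pruning the bacterial backbone gives a rough view of how much CpG content a redesign removes.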

Linear DNA

Although plasmids have obvious advantages, the potential risks from incomplete removal of endotoxin and from the prevalence of unmethylated CpG motifs and bacterial sequences add to the cost and time of purifying a safe plasmid product. As the cost and time of in vitro DNA synthesis and DNA assembly have gradually fallen, non-plasmid DNA for gene therapy has attracted more attention. The most frequently used non-plasmid DNA is linear DNA. The term “linear” denotes the physical form of DNA amplicons produced by polymerase chain reaction (PCR). The direct use of linear DNA for gene therapy provides an alternative to the traditional use of bacterially prepared plasmids. Linear DNA offers several unique advantages for clinical applications: 1) easier synthesis and simpler purification than viral vectors, 2) lower risk of unwanted immunogenicity than plasmids and viral vectors, 3) less
preexisting immunity and allergic response to synthetic DNA than to viral vectors, 4) feasible incorporation of longer DNA sequences in multigene cassettes, 5) no bacterial contaminants and no bacterial sequences such as the replication origin and antibiotic resistance genes, and 6) more options for chemical modification. Linear DNA has the potential to serve as a viable alternative to plasmids and viral vectors for therapeutic gene delivery, but it still has lower serum stability and lower transfection efficiency than plasmids and viral vectors, and therefore needs to be combined with DNA nanotechnology for highly ordered assembly (Munoz-Ubeda et al. 2011).

Viral Vector

For hard-to-transfect cell types, viral vectors are often employed to induce transient or permanent foreign gene expression, with transfection efficiencies above 90%. In clinical applications, viral vectors have been widely engineered to mediate in vitro, ex vivo, and in vivo gene delivery. Unlike plasmids and linear DNA, a viral vector is a complete system comprising the viral gene sequence and its accompanying delivery vehicle, with a uniform diameter and highly specific cell-targeting capacity. Adenoviruses, adeno-associated viruses, and lentiviruses/retroviruses have become the main viral vectors for gene therapy after decades of directed-evolution work on viral genome engineering and capsid modification. A wide range of diseases, including infectious diseases, different types of cancer, genetic disorders, and other organ-associated disorders, have been tested with viral vectors in preclinical studies. The choice of viral delivery system and therapeutic gene depends on the disease type, but all share common design principles: structural stability, high transduction efficiency, cell-type specificity, and low health risk. First, the recombinant virus must remain stable in genome organization and particle structure after foreign genes are inserted. Genome instability caused by unwanted recombination would severely affect the therapeutic reproducibility of viral vectors (Nyberg-Hoffman and Aguilar-Cordova 1999). Second, high transduction efficiency is critical for maintaining a high level of gene expression in cells; interference between regulatory elements and promoters in the viral genome should be avoided as far as possible. Third, the capsid protein can be artificially designed to confer either cell type-specific or nonspecific binding, yielding a pseudotyped virus vector (Tabebordbar et al. 2021).
Finally, the potential health risks of viral vectors are a major concern in their design. Viral vectors are generally modified by deleting the key elements for viral replication, so that the pseudotyped virus vector retains only the capacity to infect cells and express foreign genes and cannot replicate normally or form new virions. However, there is still a potential risk of random insertional mutagenesis in the chromosome, which might inactivate tumor suppressor genes or stimulate hyperactive expression of oncogenes. The resulting imbalance between tumor suppressor genes and oncogenes would carry a risk of carcinogenicity. Furthermore, repeated high-dose injections might cause an unwanted allergic response that damages organs in the body.

Adenoviruses are non-enveloped (naked) icosahedral viruses 70–90 nm in diameter that carry a 36 kb double-stranded DNA genome. The adenovirus genome comprises an inverted terminal repeat (ITR) at each end, a packaging sequence (ψ), early transcription units (E1A, E1B, E2A, E2B, E3, E4), and late transcription units (L1, L2, L3, L4, L5). Relying on the coxsackievirus and adenovirus receptor (CAR) on the surface of host cells, adenoviruses can transiently transduce nearly all mammalian cell types via integrin-mediated endocytosis and nuclear entry (McDonald et al. 1999). Among all adenovirus serotypes, serotypes 2 and 5 are widely used to construct adenoviral vectors, which can be either replication competent or replication deficient. To date, there have been three generations of adenoviral vectors. First-generation adenoviral vectors have the E1 or E3 region deleted and can be propagated in helper cells expressing E1. However, unwanted expression of the remaining adenoviral genes activated cytotoxic T lymphocytes, which eliminated the transduced cells and caused gene expression to fail in the target cells. To reduce this strong immune response, further deletion of E2 and E4 generated the second-generation adenoviral vectors, which increased the foreign gene insertion capacity to 14 kb. These vectors elicited a weaker immune response but had reduced packaging efficiency. To improve packaging efficiency and virus stability, third-generation adenoviral vectors are constructed by removing all viral coding regions except the ITRs and the ψ sequence, which raises the gene insertion capacity to as much as 36 kb. Adenoviral vectors have so far shown great promise in cancer treatment and as vaccines.
Adeno-associated viruses (AAV) are non-enveloped (naked) icosahedral viruses with an average diameter of 25 nm that carry a 4.7 kb single-stranded DNA genome. More than 100 AAV serotypes exist, of which AAV2 has been well studied and engineered as a viral vector for gene therapy. The AAV genome comprises an inverted terminal repeat (ITR) at each end, containing cis-regulatory elements for replication and packaging, the rep gene for replication, and the cap gene for capsid assembly. Recombinant AAV (rAAV) vectors are designed by completely removing the rep and cap genes to provide the maximum capacity (4.7 kb) for foreign genes. However, there is a potential risk that rAAV genomic sequences integrate into the specific AAVS1 site on chromosome 19 when a helper virus is present (Huser and Heilbronn 2003). A great advantage of rAAV vectors is the availability of numerous serotypes with different cell-type tropisms, which permits tissue-specific targeted delivery to the liver, brain, colon, lymph nodes, bone marrow, heart, kidney, and spleen, among others (Gigout et al. 2005). Although the gene capacity is severely limited, dual rAAV vectors and self-complementary AAV (scAAV) vectors have achieved larger gene products through in situ recombination (Duan et al. 2003).
Lentiviruses, a subgroup of the retrovirus family, are enveloped spherical viruses 80–120 nm in diameter that carry two copies of a 9 kb positive-sense single-stranded RNA. Most lentiviral vectors (LVs) are based on the human immunodeficiency virus (HIV) and allow stable and long-lasting
gene expression through integration of the viral genome sequence into the chromosome. LVs have the advantage of infecting nondividing, terminally differentiated hematopoietic and neuronal cells. The intact HIV-1 genome consists of a long terminal repeat (LTR) at each end, the packaging signal Psi (Ψ), three essential genes (Gag, Pol, Env), two regulatory genes (Tat, Rev), and four accessory genes (Vif, Vpr, Vpu, Nef). Natively, LVs have a narrow tropism for cell types such as human immune cells, entering through the interaction of gp120 with the CD4 receptor and the co-receptors CCR5 or CXCR4 (Lee et al. 2011). To extend LVs to cell types that do not express CD4, pseudotyped LVs were engineered with a heterologous envelope from another virus, such as the vesicular stomatitis virus glycoprotein G (VSV-G). Substitution with VSV-G not only enhanced the stability and titer of LVs but also expanded their tropism to a wide range of dividing and nondividing cells (Park and Choi 2007). Engineering the envelope proteins also improved the safety of LVs in clinical applications (Persons 2010). As RNA viruses, lentiviruses are prone to high-frequency mutation and RNA recombination. The construction of LVs has faced many challenges, especially in biosafety, which have mainly been addressed by splitting the viral genome across separate vectors. For the HIV-1 genome, the three essential genes (Gag, Pol, Env) are transferred to separate plasmids, and the four accessory genes (Vif, Vpr, Vpu, Nef) are deleted. The remaining sequences retain the LTRs and the packaging signal Psi (Ψ) as the lentiviral vector backbone for transfer of the gene of interest. LV development has passed through three generations.
Preparation of first-generation LVs requires three plasmids: a transfer plasmid; a helper plasmid carrying Gag, Pol, Tat, Rev, and the four viral accessory genes; and a heterologous envelope plasmid. Second-generation LVs also use three plasmids: a transfer plasmid; a helper plasmid carrying only Gag, Pol, Tat, and Rev; and a heterologous envelope plasmid. Third-generation LVs are a substantial improvement. In the transfer plasmid, the 3′ LTR sequence is modified and the 5′ LTR is replaced with a heterologous enhancer/promoter (CAG, CBH, EF-1α, etc.) that initiates transcription, so the Tat gene can be eliminated. The other change is the separation of Gag, Pol, and Rev onto two plasmids. These successive refinements improved the biosafety of LVs and the stability of gene expression by minimizing the recombination events that could form a replication-competent lentivirus (RCL) during preparation and application.
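The payload limits quoted above for the rAAV and adenoviral vectors (4.7 kb, 14 kb, and 36 kb) can be turned into a simple screening step when choosing a vector for a given cassette. This is only an illustrative sketch: the dictionary labels, the function name `vectors_that_fit`, and the example cassette lengths are assumptions, and a real choice would also weigh tropism, immunogenicity, and integration risk.

```python
# Payload limits in base pairs as quoted in the text above; the labels are ours.
CAPACITY_BP = {
    "rAAV": 4_700,
    "adenoviral, 2nd generation": 14_000,
    "adenoviral, 3rd generation": 36_000,
}


def vectors_that_fit(cassette_bp):
    """Return the vectors whose quoted capacity can hold a cassette of this length."""
    return [name for name, cap in CAPACITY_BP.items() if cassette_bp <= cap]


# Hypothetical cassette lengths, chosen only to exercise each capacity tier.
for cassette_bp in (4_000, 7_000, 40_000):
    print(cassette_bp, vectors_that_fit(cassette_bp))
```

A cassette that exceeds every entry returns an empty list, which is the situation that motivates higher-capacity vectors.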

Nano DNA

Comparing the gene vectors above shows that gene capacity and size control are two fundamental determinants in gene vector design. A vector suitable for genome therapy must balance high DNA capacity against a cell-penetrable nanoscale size, especially when various gene regulation tools need to be loaded into one all-in-one vector. On this basis, more attention has been paid to designing artificial DNA nanostructures (nano DNA) that mimic chromosomal DNA, with a DNA loading capacity high enough to carry various gene regulation tools in the same vector without altering its nanoscale size. So far, DNA has been redesigned and fabricated as the basic module to
assemble into highly ordered nanomaterials for a wide range of applications such as diagnostics, protein assembly, drug delivery, and synthetic biology (Seeman and Sleiman 2018). The core principle of DNA nanotechnology is to combine DNA strands so that they assemble into predesigned shapes in one-, two-, and three-dimensional (1D, 2D, 3D) geometries through specific interactions (Watson-Crick base pairing). DNA nanotechnology can be classified into structural, dynamic, and functional DNA nanotechnology. Structural DNA nanotechnology uses a bottom-up strategy to combine strands into nanostructures of different sizes and shapes (Pinheiro et al. 2011). Dynamic DNA nanotechnology uses strand displacement to alter the shape or size of a nanostructure in response to a stimulus (Zhang and Seelig 2011). In functional DNA nanotechnology, an independent molecular device (an enzyme, a functional nucleic acid, etc.) is installed in the nanostructure and endows it with a new function for diagnosis or drug delivery. In 1983, Nadrian C. Seeman used Holliday junctions as branched junctions with unique arms for connection (Kallenbach et al. 1983), which inspired wide interest in developing various junction types, such as three-arm junctions and double-crossover (DX) motifs, for the assembly of two- and three-dimensional geometries (Chen and Seeman 1991; Fu and Seeman 1993). Because linear DNA is simple and inexpensive to synthesize, it has been widely used as the basic module of nanostructures and has found extensive applications in drug delivery, imaging, and sensing. The use of circular DNA as the scaffold promoted the development of DNA origami.
In 2006, Paul Rothemund used a long single-stranded DNA as the scaffold and added short staple strands to fold it into many shapes, such as rectangles, stars, and faces (Rothemund 2006). Computer-based design of DNA origami further simplified the synthesis of DNA nanomaterials with uniform sizes and predesigned shapes. Because DNA origami offers accurate spatial addressability, functional tags such as ligands, aptamers, antibodies, antisense RNA, siRNA, and genes can be conjugated at any chosen position in a site-controlled manner (Ji et al. 2021; Weizenmann et al. 2021). The branched junction has been developed to direct the assembly of DNA nanostructures. Previous studies focused extensively on constructing such DNA nanostructures as scaffolds onto which the functional modules mentioned above are loaded through bio-orthogonal reactions. In principle, it would be more appealing if the DNA sequence of the nanostructure itself carried genetic information to be expressed in cells. However, the DNA sequences in most DNA nanostructures are still too short to be used for protein expression, and although a long sequence is used in DNA origami, it is single-stranded and may therefore have lost its capacity for gene expression. The development of polymerase chain reaction (PCR) with branched DNA as the primer has made it possible to use long double-stranded DNA as the unit that assembles into a DNA nanostructure for gene expression. In this way, the basic unit of the DNA nanostructure is itself the gene module, in contrast to previous DNA nanotechnology. In 2008, we envisioned that branched RNA/DNA could lead to defined nanoparticles through base-pairing recognition using defined chemical modification and linkage (Xi et al. 2008, 2012). Unlike traditional PCR with linear DNA primers, PCR mediated by branched DNA primers, called branch-PCR, can produce the nanostructure directly (Xi et al. 2012; Liu et al. 2015a, b, 2016; Cheng et al. 2018). The branched primer is the critical component of branch-PCR: its extension by DNA polymerase generates long branched strands, which then denature and anneal to form the nanostructure. Keller et al. used covalently connected branched DNA (bDNA) with Y-motifs as branched primers to build a three-dimensional DNA network through PCR (Keller et al. 2008). Finke et al. further used click chemistry with the cyclic peptide c(RGDfK) to construct branched DNA primers and make a functional DNA-based network as an extracellular matrix for cell culture (Finke et al. 2016). In addition, a thermostable branched primer containing a Y-shaped region and a single-stranded region was constructed by chemical cross-linking and was extended in PCR to build thermostable DNA nanostructures (Hartman et al. 2013). Guo et al. used this kind of branched primer to construct gene nanostructures and validated that the nanostructure improved gene expression efficiency in a cell-free translation system (Guo et al. 2019). Unlike the DNA network structure built with the abovementioned thermostable branched primer, the DNA nanostructure constructed from this kind of branched primer was highly compact and retained a spherical geometry suitable for cell uptake.
Because a highly compressed structure can be obtained, the nanostructure produced by branch-PCR could enter the cell nucleus and express the designed DNA functions in living cells. With their high DNA loading capacity, DNA nanovectors constructed by branch-PCR with this kind of branched primer are appealing because they could combine various gene regulation tools in the same vector while retaining the designed nanoscale size. In this way, branch-PCR with this kind of branched primer could produce a chromosome-like gene nanovector that leverages high DNA capacity, size controllability, and multiplex biological functions for genome therapy.

The Presumptive Model of Archimedes Solid-like Nanostructures Assembled from Branch-PCR

To investigate the structure of DNA nanovectors constructed by branch-PCR, the Zhen Xi group proposed a mathematical model of the possible categories of DNA nanostructures formed from this kind of branched primer (Xi et al. 2012; Zhang 2010; Cheng et al. 2018). The model assumes that two kinds of branched strands self-assemble through Watson-Crick base pairing into a nanostructure of defined size and shape resembling an Archimedean solid, characterized by its numbers of double-stranded edges (E), vertices (V), and faces (F). Nanostructures with this geometry are called Archimedes solid-like nanostructures (ASN). To verify the model, the Zhen Xi group first used branched sense RNA (25 mer) and branched antisense RNA (25 mer) to construct an siRNA nanoparticle, which simplifies the assembly process because no PCR extension is needed. Three restrictive geometric conditions apply to ASN: (i) the structure is constructed only from edges of the same length; (ii) each face of the structure is a polygon with an even number of edges; and (iii) the structure is constructed from the two branched RNA components in a precise 1:1 ratio. Under these restrictions, only four topologically closed structures are possible, each with a calculated diameter: the cube with 12 edges and three Archimedean solids (the truncated octahedron, TO, with 36 edges; the truncated cuboctahedron, TC, with 72 edges; and the truncated icosidodecahedron, T-IC, with 180 edges). To validate the proposed model, the Zhen Xi group constructed tri-branched sense RNA and tri-branched antisense RNA (S3 and A3) by Michael addition between a thiol group at the 5′-end of the RNA and the three maleimide groups of the 3M molecule (Xi et al. 2012; Zhang 2010). The nanostructure was assembled directly by annealing tri-branched S3 and A3 at equimolar concentrations (Fig. 2). Physical characterization by transmission electron microscopy (TEM), atomic force microscopy (AFM), and dynamic light scattering (DLS) consistently showed that the nanoparticle assembled from branched sense RNA (25 mer) and branched antisense RNA (25 mer) had a diameter of 30.3 nm, very close to the calculated diameter (29.8 nm) of the Archimedes solid-like truncated octahedron nanostructure (ASN-TO). Based on the truncated icosidodecahedron model, ASN-TO had siRNA edges, which made up 87% of the components of the nanostructure. The ASN-TO construction not only improved siRNA loading capacity but also showed high resistance to RNase and serum. A dual-luciferase reporter assay in HEK293A cells further confirmed that ASN-TO mediated 87% knockdown of firefly luciferase mRNA with an IC50 of 1.06 nM at 48 h post-transfection, comparable to liposome-encapsulated siRNA (98% knockdown), indicating that this highly ordered nanostructure retains high gene-silencing activity (Wei et al. 2013). An in vitro Dicer cleavage assay also showed that ASN-TO was effectively cleaved by Dicer to yield siRNA at 20 h, with gradual siRNA release. Together, the theoretical model and the experimental evidence indicate that nanostructures assembled from tri-branched oligonucleotides most probably follow the hierarchical assembly process of the truncated icosidodecahedron.

Fig. 2 The presumptive model of the four possible nanostructures assembled from tri-branched sense RNA and tri-branched antisense RNA (S3 and A3)
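The edge counts quoted for the four candidate solids can be checked against standard polyhedral geometry. The sketch below is not the group's mathematical model: it takes the textbook face compositions of each solid, assumes three-arm vertices (as with tri-branched strands), and derives E, V, and F together with Euler's relation V − E + F = 2. One reading of condition (ii), offered here as an assumption, is that sense and antisense vertices must alternate around every face, which is only possible when each face has an even number of edges; the truncated tetrahedron is included as a counterexample with triangular faces.

```python
# Face composition per solid: polygon size -> number of such faces (textbook values).
SOLIDS = {
    "cube": {4: 6},
    "truncated octahedron (TO)": {4: 6, 6: 8},
    "truncated cuboctahedron (TC)": {4: 12, 6: 8, 8: 6},
    "truncated icosidodecahedron (T-IC)": {4: 30, 6: 20, 10: 12},
    "truncated tetrahedron": {3: 4, 6: 4},  # counterexample: has odd-sided faces
}


def summarize(faces, arms=3):
    """Derive edge, vertex, and face counts for a solid with `arms` edges per vertex."""
    side_total = sum(size * count for size, count in faces.items())
    edges = side_total // 2          # every edge is shared by two faces
    vertices = 2 * edges // arms     # every edge joins two vertices
    n_faces = sum(faces.values())
    return {
        "E": edges,
        "V": vertices,
        "F": n_faces,
        "euler": vertices - edges + n_faces,
        "all_faces_even": all(size % 2 == 0 for size in faces),
    }


for name, faces in SOLIDS.items():
    print(name, summarize(faces))
```

With these inputs the four solids named in the text give 12, 36, 72, and 180 edges, each with an even vertex count that can be split 1:1 between the two branched components.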

The Design of ASN-TO Gene Nanovector

If the short RNA sequences in the branched strands are replaced with DNA primers (F3 and R3), extension of the short branched primers to equal length by DNA polymerase in a PCR reaction yields long branched single-stranded DNA strands that carry the information for gene expression and can assemble into Archimedean nanoparticles (Fig. 3, Fig. 5a, b). Guided by the truncated octahedron model with its 36 edges, it becomes possible to build from these long branched strands a gene nanovector with high gene capacity and controllable size for cell delivery. Taking a 7000 bp gene expression cassette as an example, an ideal gene nanovector with the ASN-TO DNA nanostructure would in theory carry 252 kbp of gene content (7000 bp × 36 edges), a significant expansion of gene capacity compared with currently used gene delivery vectors. Similar to nucleosome-like folding in chromatin or superhelical folding in plasmid DNA, highly ordered folding and twisting in the nanostructure might further compress the DNA into a more condensed nanoparticle approaching 100 nm in size when the modular gene sequence is much longer. The size-tunability of the ASN-TO gene nanovector is discussed in the following part. DNA condensability through multidimensional folding may further increase the gene capacity of this kind of ASN-TO DNA nanostructure, which would allow various therapeutic genes to be placed in one all-in-one chromatin-like nanostructure.
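The capacity estimate above is a single multiplication, and writing it out makes the underlying assumption explicit: every one of the 36 edges is taken to carry one full-length copy of the module. The helper name `theoretical_capacity_bp` is hypothetical, and the 25 bp case simply reuses the 25-mer edge length of the siRNA nanoparticle described earlier.

```python
TO_EDGES = 36  # edges of the truncated octahedron in the ASN-TO model


def theoretical_capacity_bp(edge_length_bp, edges=TO_EDGES):
    """Idealized payload: one full-length duplex module on every edge."""
    return edge_length_bp * edges


for edge_bp in (25, 7000):
    total = theoretical_capacity_bp(edge_bp)
    print(f"{edge_bp} bp per edge -> {total} bp ({total / 1000:g} kbp)")
```

This is an upper bound under the stated assumption; it says nothing about assembly yield or incomplete particles.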

Fig. 3 The branch-PCR assembly of ASN-TO gene nanovector with branched primers

Size-Tunability of ASN-TO Gene Nanovector

Nanovectors constructed by branch-PCR from gene modules of different lengths exhibit various sizes. To characterize these dynamic changes of the ASN-TO gene nanovector, Lu et al. established a model relating gene length (100–7000 bp) to the size of the ASN-TO gene nanovector (Lu et al. 2022). The diameter of nanovectors assembled from linear DNAs of different gene lengths fluctuated widely. From 100 bp to 700 bp, the average size rose linearly with gene length from 123.9 nm to 762.8 nm. From 700 bp to 2000 bp, the average size dropped from 762.8 nm to 245.6 nm. Between 2000 bp and 6000 bp, the average size rose to 596.9 nm at 3000 bp, fell to 246.7 nm at 4000 bp, rose again to 385.6 nm at 5000 bp, and then fell to 127.5 nm at 6000 bp. At a similar size (123.9 nm versus 127.5 nm), gene capacity thus expanded about 60-fold, from 100 bp to 6000 bp. When the gene length increased from 6000 bp to 7000 bp, the average particle size rose only slightly, from 127.5 nm to 161.9 nm. The particle size of nanovectors assembled from 6000–7000 bp linear DNA agreed with that of gene nanovectors encoding the full CRISPR/Cas9 components (6945 bp). These dynamic changes show that the particle size is tunable through the gene length of the linear DNA, which might be attributed to a transition from double helix-based folding to nucleosome helix-based folding when the gene sequence is extended to a critical point of maximum torsional strain (Fig. 4). This sequence length-dependent size-tunability would help in designing biocompatible sizes for extended applications in genome editing (Lu et al. 2022). However, the correlation was established with a single sequence; the effect of sequence composition on the particle size of gene nanovectors of the same gene length is still unknown and should be considered in the future.
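The non-monotonic trend described above is easier to follow when the quoted data points are tabulated and scanned for reversals. The sketch below uses only the length and size pairs stated in the text; values at intermediate lengths are not given there, so the turning points it reports are those of the quoted samples, not of the full measured curve. The function name `turning_points` is hypothetical.

```python
# (gene length in bp, average diameter in nm) as quoted in the text from Lu et al. 2022.
REPORTED = [
    (100, 123.9), (700, 762.8), (2000, 245.6), (3000, 596.9),
    (4000, 246.7), (5000, 385.6), (6000, 127.5), (7000, 161.9),
]


def turning_points(points):
    """Return (length, kind) wherever the size trend reverses between samples."""
    found = []
    for (_, y0), (x1, y1), (_, y2) in zip(points, points[1:], points[2:]):
        if (y1 - y0) * (y2 - y1) < 0:  # slope changes sign at the middle sample
            found.append((x1, "peak" if y1 > y0 else "trough"))
    return found


for length_bp, kind in turning_points(REPORTED):
    print(length_bp, kind)

# Capacity gain at near-identical particle size (100 bp vs 6000 bp modules).
sizes = dict(REPORTED)
print("fold capacity:", 6000 / 100, "size ratio:", round(sizes[6000] / sizes[100], 3))
```

The troughs mark the lengths at which the quoted samples are most compact, which is where a designer would look for a cell-compatible size at high payload.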

Fig. 4 The possible DNA folding mode of gene nanovector

The Applications of ASN-TO Gene Nanovector in Genome Therapy

Multiple tools now exist for manipulating gene expression at different levels (chromatin unfolding, DNA transcription, posttranscription, mRNA translation, posttranslation, etc.). Even so, complex diseases such as cancers and rare genetic diseases involving multiple genes are hard to treat effectively with a single gene-operation tool or with gene regulation at a single level, an approach that has been shown to correlate with side effects, drug resistance, and disease relapse. Strategies that combine multiple gene-operation tools or regulate multiple genes at different levels have therefore been widely recognized as a way to combat complex diseases. However, application of the combination strategy has progressed slowly, mainly because of the limits of gene vectors, in which it is hard to balance gene loading content against vector size. Developing suitable vectors to deliver multiple gene-operation tools together is therefore critical to the performance of the combination strategy. In contrast to traditionally used nonviral vectors (linear DNA, plasmids) and viral vectors, the ASN-TO gene nanovector is assembled with linear DNA as the edges of the nanostructure. The ASN-TO gene nanovector has several unique advantages:

1. Structurally compact, with nanoscale size and a defined spherical shape.
2. Size controllable according to the gene length and its condensation.
3. High percentage of gene content assembled in the nanostructure.
4. Well-defined chemical structure, allowing easier druggability assessment.
5. Simple and low-cost procedures for synthesis and purification.
6. High stability against nuclease and serum.
7. Low cytotoxicity and immunogenicity owing to the absence of bacterial sequences, unmethylated CpG motifs, and bacterial endotoxin contaminants.
8. Chemically modifiable for ligand conjugation to improve targeting and release.
9. Biocompatible with diverse gene regulation toolboxes.
10. Spatial and temporal control of multiplex gene regulation at multiple levels, mediating network regulation in one all-in-one nanovector, which is applicable not only to gene therapy against single gene-related diseases but also to genome therapy against multiple gene-associated diseases.

Based on these unique characteristics, it is promising to integrate all the necessary gene-operation tools mentioned above, as an open-source toolkit, into the ASN-TO gene nanovector to form all-in-one gene nanovectors. In a sense, the ASN-TO gene nanovector resembles an artificial miniature chromosome in which diverse tools for multiplex gene regulation at different levels are installed. Beyond traditional gene therapy, genome therapy with the ASN-TO gene nanovector expands the scope of disease therapy and has three unique characteristics: 1) structural similarity to a chromosome, 2) functional similarity to a chromosome, and 3) network target-based multiplex gene regulation to fine-tune the phenotype. In principle, a gene vector suitable for genome therapy should provide enough space, while keeping a compact nanoscale size, to accommodate multiple gene regulation tools (gene overexpression, gene silencing, gene editing, etc.) in rational positions as an integrated all-in-one toolkit. All the gene cassettes in the vector could be expressed sequentially in a spatiotemporal mode and fine-tuned at multiple levels (chromatin unfolding, DNA transcription, posttranscription, mRNA translation, posttranslation, etc.). Furthermore, the intended cell phenotype should be tightly controlled by the expression of multiple genes from the network gene targets.
Network target-based multiplex gene regulation will enable the fine-tuning of the cell fate throughout the whole cell cycle through the synergistic regulation of different genes in multiple signaling pathways of gene network in the genome landscape. Based on the hypothesis of genome therapy, the applications of ASN-TO gene nanovector have been experimentally validated and will be described as follows.

The Gene Overexpression of ASN-TO Gene Nanovector

In 2015, Liu et al. performed the first proof-of-concept validation with a pair of tri-branched primers (F3 and R3). F3 and R3 were synthesized by thiol-Michael addition between the 5′-terminal thiol of the oligonucleotides and the three Michael acceptors of a branching molecule (3 M) (Liu et al. 2015a, b). The reaction products should be purified by 15% denaturing PAGE to ensure high purity, because impurities would severely interfere with branch-PCR-mediated assembly. For branch-PCR, a linear reporter gene expression module 809 bp in length was constructed, composed of the following elements: F1 box, SP6 promoter, Kozak sequence, EGFP open reading frame (ORF), 20 nt polyA, and R1 box. Branch-PCR requires two rounds of PCR: 1) linear DNA amplification with linear primers (F1 and R1) that include the sequences of F3 and R3 at the 5′ end and 2) use of the linear DNA as the template for PCR assembly with the tri-branched primers (F3 and R3). These nanovectors are characterized by markedly slower mobility in agarose gel than linear DNA and by a spherical structure in atomic force microscopy images. Cell-free protein expression was first carried out with an SP6 high-yield wheat germ protein expression system and showed that this kind of gene nanovector supports gene transcription and protein translation. After substitution of the promoter and transcription terminator, another linear reporter gene expression module, 1679 bp in length, was constructed to validate its function in living cells. It was composed of the following elements: F1 box, CMV promoter, Kozak sequence, EGFP open reading frame (ORF), SV40 polyA, and R1 box. The fluorescence emitted by the expressed EGFP protein showed that this kind of gene nanovector could efficiently enter the cell nucleus to initiate transcription and subsequent translation. This was the first evidence that branch-PCR-constructed gene nanovectors support gene expression in living cells. The successful expression of reporter genes with these gene nanovectors pointed to their great potential in gene therapy against various diseases. With branch-PCR-constructed gene nanovectors, Cheng et al. went on to treat cancer cells by introducing TP53 gene overexpression (Cheng et al. 2018). A wide range of cancers have been reported to be tightly associated with the loss of function of tumor suppressor genes such as TP53 caused by gene mutation. Rescuing p53 function by using the nanovector to deliver the TP53 gene into cancer cells is an alternative to small-molecule targeted therapy. The high gene capacity, high serum stability, and low immunological risk of this gene nanovector are distinct advantages in cancer gene therapy.
The TP53 gene expression cassette (2222 bp) was composed of the following elements (Fig. 5d): F1 box, CMV promoter, Kozak sequence, TP53 open reading frame (ORF), BGH polyA, and R1 box. The TP53 gene nanovector was constructed by branch-PCR with the same tri-branched primers (F3 and R3). Although the sequence was longer than 2000 bp, the branch-PCR product still appeared as the main band at the top of a 1% agarose gel. Structural characterization by AFM, TEM, and DLS also verified that the branch-PCR product retained a spherical structure with a diameter of 150 ± 25 nm, a suitable nanoparticle size for cell delivery. A serum stability assay in 30% fetal bovine serum (FBS) revealed that the branch-PCR product was much more stable than linear DNA and plasmids. Cell transfection tests further confirmed that the branch-PCR product could be efficiently delivered into cancer cells, where it drove p53 protein expression and promoted cell apoptosis. The apoptosis ratio reached up to 86% when 4 μg of gene nanovectors were transfected into HepG2 cells. In addition to the in vitro anticancer efficacy, the in vivo efficacy of this gene nanovector was evaluated. Tumor growth was significantly inhibited when H22 tumor-bearing BALB/c mice were injected with the TP53 gene nanovectors. Intriguingly, the gene nanovector was able to enter tumor cells autonomously and showed stronger antitumor activity than plasmids when no transfection agent was used. The successful trials of TP53 gene nanovectors in cancer therapy demonstrated their unique advantages in both expanding the gene capacity and retaining the nanoscale structure for improved serum stability.

Fig. 5 The sequence design of ASN-TO gene nanovector for different applications. (a) The tri-branched primers. (b) The illustration of the construction of ASN-TO gene nanovectors with tri-branched primers. (c) The ASN-TO gene nanovector for the transcription of shRNA array for gene silencing. (d) The ASN-TO gene nanovector for the transcription of TP53 gene for gene overexpression. (e) The ASN-TO gene nanovector for the co-transcription of SpCas9 gene and gRNA for genome editing

The Gene Silencing of ASN-TO Gene Nanovector

Since the establishment of RNA interference (RNAi) technology with short interfering double-stranded RNA (siRNA) in 1998, siRNAs have been gradually exploited to silence hyperactive oncogenes for cancer therapy. Besides chemically synthesized siRNA, short hairpin RNA (shRNA) was found to be as effective as siRNA and can be delivered with viral or nonviral vectors. Unlike synthetic siRNA, which is delivered directly into the cytosol, an expression cassette encoding shRNA is first delivered into the cell nucleus and transcribed into shRNA, which is then exported into the cytoplasm for Dicer processing and AGO2 loading to assemble the RISC complex. siRNAs offer many advantages in large-scale production and chemical modification but are costly. In contrast, vector-encoded shRNAs, especially lentivirus-based shRNA libraries, can be prepared in parallel for high-throughput gene screening and also exhibit long-lasting activity owing to integration of the shRNA expression cassette into the chromosome. For clinical application of shRNA-expressing vectors, promoter activity, vector stability, and high-load shRNA copies are three limitations that current vectors have yet to overcome. The invention of the ASN-TO gene nanovector provided a new avenue for accommodating the shRNA-expressing cassette for RNAi-based cancer gene therapy. The first trial of incorporating an shRNA-expressing cassette into the ASN-TO gene nanovector was performed by Liu et al. in 2016. The shRNA-expressing cassette was composed of the following elements: F1 box, U6 promoter, sense and antisense sequences joined by a 9 nt loop sequence (TTCAAGAGA), terminator sequence (TTTTTT), and R1 box. With the same branched primers, the gene cassettes expressing shRNAs targeting EGFP or firefly luciferase were first amplified in a first-round PCR using the linear primers (F1 and R1), and the products were then used as the linear template for branch-PCR to construct the ASN-TO gene nanovector carrying the shRNA-expressing gene cassette. AFM and DLS characterization showed that the gene nanovector was a spherical structure with a diameter of 555 ± 50 nm that remained stable in solution. This gene nanovector mediated efficient and long-lasting gene silencing under 4 pM, indicating that shRNA can be effectively expressed from this kind of nanovector (Liu et al. 2016). However, the gene silencing efficiency (20,000 bp from TSS. It also showed that 42.7% of gene promoters contained at least one PQS and that the density of these motifs in the first 100 bp upstream of the TSS is 12-fold higher than the average PQS density of the genome (Huppert and Balasubramanian 2007). These bioinformatic studies supported the hypothesis that G4 DNA may play a regulatory role in transcription. In fact, the biological role of G4 DNA in the cell is still hotly debated.
In vitro studies have shown that G4 DNA in the transcribed region of a gene can interfere with the movement of the RNA polymerase complex and affect gene expression (Nayun 2019). For example, a G4 formed in the template or transcribed DNA strand can act

H. Choudhary and L. E. Xodo

as a physical blockade for RNA Pol II (Belotserkovskii et al. 2010). Moreover, a hybrid DNA:RNA G4 formed by the non-template strand and the nascent RNA could also impede the processivity of the incoming RNA Pol II complex (Zheng et al. 2013). Another mechanism would be that an intramolecular G4 formed in the non-template strand promotes annealing of the nascent RNA to the template strand and forms an R-loop that impedes the incoming RNA Pol II complex (Belotserkovskii et al. 2017). Although these mechanisms of transcriptional inhibition have been characterized, the function of G4 DNA in the region upstream of the TSS, where PQS density is at its peak, is probably more important in the cellular context. One probable function in gene promoters is that G4 DNA provides a platform for high-affinity binding of transcription factors (Spiegel et al. 2021). One of the first examples supporting this hypothesis is the G4 formed by the 32R sequence located upstream of the TSS in the human and mouse KRAS promoters (Cogoi and Xodo 2006). By combining pull-down experiments, performed with biotinylated 32R and Panc-1 extract, with mass spectrometry, it was found that the folded structure of KRAS is bound by several proteins including PARP-1, Ku70, and hnRNPA1 (Cogoi and Xodo 2006). Subsequently, it was discovered that the zinc finger protein MAZ, whose consensus binding sequence is GGGAGGG, also binds to the KRAS G4 structure (Cogoi et al. 2013). Another example of a protein that binds to G4 DNA is SP1, which has been reported to recognize G4 structures in the c-KIT and HRAS promoters (Cogoi et al. 2014; Todd and Neidle 2008). A genome-wide analysis showed that approximately 36% of SP1 binding sites detected by ChIP-seq did not contain the 5′-GGGCGG-3′ SP1 consensus binding sequence, but that the majority of these sites contained one or more G4 motifs (Todd and Neidle 2008). This suggests that the transactivation of SP1 is also dependent on DNA conformation.
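To make the sequence-level ideas above concrete, the following minimal Python sketch scans a DNA string for the SP1 consensus (GGGCGG) and the MAZ consensus (GGGAGGG) named in the text, and for putative quadruplex sequences (PQS). The PQS pattern used here, four runs of at least three guanines separated by loops of 1 to 7 nucleotides, is the commonly used Quadparser-style definition and is an assumption, since the chapter does not spell out the pattern. The example sequence is invented for illustration and is not the KRAS 32R sequence.

```python
import re

# Consensus motifs quoted in the text (5'->3').
SP1_CONSENSUS = "GGGCGG"
MAZ_CONSENSUS = "GGGAGGG"

# Assumed PQS definition: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+
PQS_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,7}?G{3,}){3}")


def find_motif(seq: str, motif: str) -> list[int]:
    """Return 0-based start positions of every (overlapping) motif occurrence."""
    return [m.start() for m in re.finditer(f"(?={motif})", seq.upper())]


def find_pqs(seq: str) -> list[tuple[int, str]]:
    """Return (start, matched sequence) for non-overlapping PQS hits."""
    return [(m.start(), m.group()) for m in PQS_PATTERN.finditer(seq.upper())]


# Hypothetical G-rich test sequence, invented for this example.
example = "TTAGGGAGGGTTGGGCGGTAGGGATT"
maz_sites = find_motif(example, MAZ_CONSENSUS)
sp1_sites = find_motif(example, SP1_CONSENSUS)
pqs_hits = find_pqs(example)
```

A scan of this kind only flags sequences with the potential to fold; whether a G4 actually forms in chromatin has to be established experimentally, as the studies cited here do.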
A recent study supports the hypothesis that G4 DNA may serve as docking sites for transcription factors. The authors reported that several transcription factors are recruited to sites containing endogenous G4s in human chromatin. In particular, the G4 promoters of highly expressed genes are recognized by a large number of transcription factors (Spiegel et al. 2021). Furthermore, a recent study using G4 ChIP-seq/RNA-seq analysis in liposarcoma cells showed that G4s in promoters are associated with high levels of transcription in open chromatin (Lago et al. 2021). The authors compared the transcription levels in liposarcoma cells with available data on keratinocytes and discovered that the promoter sequences of the same genes in the two cell lines had a different G4 folding state, with high transcription levels consistently associated with G4 folding. The transcription factors AP-1 and SP1, whose binding sites were most abundant in the G4-folded sequences, were coimmunoprecipitated with their G4-folded promoters. Of particular interest is the interaction between PARP-1 and certain G4 structures, which is functional in nature, in the sense that the interaction stimulates the enzymatic activity of the protein. PARP-1 catalyzes poly(ADP-ribosyl)ation (PARylation) of proteins including itself (auto-PARylation) (Alemasova and Lavrik 2019). PARP-1 consists of six domains connected by a flexible linker (Alemasova and Lavrik 2019). When PARP-1 binds via its two zinc-finger domains to DNA bearing a lesion, it undergoes a structural change that triggers the synthesis of

66

Targeted Cancer Therapy: KRAS-Specific Treatments for Pancreatic Cancer

ADP-ribose units using NAD+ as a source of ADP-ribose (up to 200 ADP-ribose units at the target protein). PARP-1 can also transfer a few ADP-ribose units or even only a single one [mono(ADP-ribosyl)ation] (Eustermann et al. 2011; Gupte et al. 2017). In the PARylation process, dsDNA and G4 DNA behave like positive allosteric effectors by upregulating the basal catalytic activity of PARP-1. The allosteric function of G4 DNA towards PARP-1 was first documented by Soldatenkov et al. (2008), who discovered that PARP-1 undergoes auto-PARylation upon binding to the c-KIT quadruplex. A recent study showed that although PARP-1 binds to multiple G4 structures, only certain G4s promote PARP-1 activity (Edwards et al. 2021). This study supports the notion that the sequence and size of the G4 loops regulate PARP-1 activity. For example, c-KIT forms a G4 characterized by two 1-nt loops, a 5-nt loop, and three cytosines at the 5′-end, which may mediate the formation of an alternative stem-loop structure. Binding of this G4 to PARP-1 activates auto-PARylation of the protein, albeit to a fourfold lesser extent than with an 18-bp dsDNA. In contrast, PARP-1 is only weakly stimulated by T15, hTEL, or c-MYC. Shortening the cytosine tail at the 5′ end or the pentanucleotide loop, or both, reduced PARP-1 activation twofold, suggesting that both stem-loop and G4-DNA loop features are required for PARP-1 activation by c-KIT. Recently, the role of PARP-1 in the activation of the KRAS gene in response to oxidative stress was investigated, and the results provide a detailed example of how G4 DNA serves as a docking site for the assembly of a multiprotein complex (Cinque et al. 2020). The G4s formed by the 32R motif of KRAS induced auto-PARylation of PARP-1 after binding to the protein, increasing the molecular weight from 113 to approximately 250 kDa, indicating extensive auto-PARylation.
This was observed in vitro by incubating KRAS G4 with increasing amounts of PARP-1 in the presence of NAD+. After incubation, the mixture was run on a gel, blotted, and analyzed with an anti-poly/mono ADP-ribose antibody. The G4s from the promoter sequences 32R and G4-mid of KRAS as well as the oxidized G4s from 32R activated the auto-PARylation of PARP-1 (Cinque et al. 2020; Cogoi et al. 2018). As expected, auto-PARylation was not observed when veliparib, an inhibitor of PARP-1, was added to the reaction mixture. To observe PARylation in a cellular context, Panc-1 cells were treated with H2O2 to induce guanine oxidation and recruitment of PARP-1, MAZ, and hnRNP A1 to the KRAS promoter (Cogoi et al. 2018). The nuclear extracts obtained from H2O2-treated cells were used to perform pull-down assays with biotinylated 32R G4 (b-32R) as well as its oxidized forms b-92 (containing one 8-oxoguanine) and b-96 (containing two 8-oxoguanines). The pull-down samples analyzed by Western blot with a poly/mono ADP-ribose antibody showed that all three G4 baits pulled down PARylated proteins. Compared to untreated cells, PARylation was found to increase with cell exposure to H2O2, resulting in PARylated proteins in the input (extract) ranging between ~130 and 250 kDa, while PARylated protein captured by the G4 baits yielded only a sharp band corresponding to a protein the size of PARP-1. This suggests that PARP-1 captured by G4 is characterized by limited auto-PARylation, if not mono(ADP-ribosyl)ation. To confirm that PARP-1 is PARylated in Panc-1 cells treated with H2O2, nuclear extracts from untreated or H2O2-treated Panc-1 cells were used for pull-down assays with an anti-poly/mono-ADP ribose antibody. The recovered PARylated proteins were run on an SDS gel and blotted onto nitrocellulose. The blotted membrane was probed with antibodies specific for PARP-1, MAZ, and hnRNPA1. Only PARP-1 was found to be PARylated, whereas MAZ and hnRNPA1 were not. Taken together, the data show that PARP-1 undergoes PARylation after binding to the KRAS G4s. As suggested by the genome-wide analysis, the G4s of highly transcribed genes may be the platform for transcription factor recruitment (Spiegel et al. 2021). The KRAS G4 is a well-documented example of a protein-docking G4 structure.
The mechanism illustrated in Fig. 3 is supported by the observation that the expression of KRAS is significantly increased when pancreatic cancer cells are treated with H2O2, i.e., when the oxidative stress in the cell is increased (Fig. 3a; Cogoi et al. 2018). The rise in cellular ROS results in an increase of 8-oxoguanine (8OG) in the promoter region of KRAS containing the G4 motifs and in the recruitment of PARP-1, MAZ, and hnRNP A1. The co-localization of G4 and 8OG in the same promoter region (at 0.2 kb resolution) was confirmed by pull-down experiments of genomic DNA with a biotinylated G4 ligand (b-6438) followed by ChIP with an anti-8OG antibody (Cinque et al. 2020). Assuming that 8OG is present in the G4 motif due to its high guanine content, H2O2 increases the transcription factor occupancy of the G4 promoter motif. This is indeed consistent with the pull-down and Western blot assays showing that the 32R motif forms a multiprotein complex with PARP-1, MAZ, and hnRNP A1, which is most likely the transcription pre-initiation complex. It is possible that 8OG acts as an epigenetic mark for the recruitment of transcription factors. The 32R motif of the KRAS promoter is in equilibrium with its folded form. Folding probably occurs spontaneously, as suggested by the polymerase stop assay performed with 32R inserted into a plasmid (Cogoi and Xodo 2006). According to NMR data, 32R should fold into a G4 that is in equilibrium between two different forms, both of which are recognized by transcription factors essential for KRAS. Under increased oxidative stress, as occurs in cancer cells, including PDAC cells, certain guanines are oxidized because guanine has the lowest oxidation potential among the DNA nucleobases (Saito et al. 1995). Since the guanines most susceptible to oxidation are those in G clusters, it is likely that 32R is oxidized. 8-Oxoguanine in the oxidized 32R G4 behaves like an epigenetic marker that attracts PARP-1 to the KRAS promoter. When the protein binds to the G4, the folded DNA acts as a positive allosteric effector that increases the catalytic activity of PARP-1, which undergoes auto-PARylation. This was confirmed in vitro, while cell-based experiments indicated that only a few ADP-ribose units are present on PARylated PARP-1. Since each ADP-ribose unit contains a negative charge, PARylated PARP-1 becomes anionic and serves as a platform for the recruitment of cationic transcription factors. MAZ and hnRNP A1, which have isoelectric points (pI) of 8.1 and 9.2, respectively, are indeed cationic under physiological conditions. Thus, the cationic transcription factors accumulate in the promoter region around 32R, where the transcription initiation complex should form.
According to this model, KRAS transcription is expected to be inhibited when MAZ or hnRNP A1 are repressed. This has been demonstrated in several studies (Cogoi et al. 2013; Paramasivam et al. 2009). Given the important role attributed to PARP-1, suppression or inhibition of its catalytic activity represents another interesting strategy to inhibit KRAS in PDAC cells: some recent data obtained with olaparib and veliparib support this hypothesis.

Fig. 3 (a) The expression of KRAS in Panc-1 cells is stimulated by ROS (6 h of treatment with H2O2) (Adapted from Cinque et al. (2020)). (b) Mechanism of transcription activation promoted by oxidative stress. Under enhanced oxidative stress, certain guanines of G4 32R can undergo oxidation to 8-oxoguanine. The oxidized guanines recruit PARP-1 to the promoter, where, upon interaction with the G4, it undergoes auto-PARylation. Auto-PARylated PARP-1 is negatively charged and attracts, by electrostatic interaction, cationic transcription factors such as MAZ and hnRNPA1. (c) Structure of PARP-1 from Swiss-Prot
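The electrostatic recruitment model above rests on two simple quantities: how many ADP-ribose units are added to PARP-1, and whether a given protein is net cationic at physiological pH. The sketch below estimates the number of units from the in vitro mass shift quoted earlier (113 to about 250 kDa) and compares the quoted pI values with an assumed physiological pH of 7.4. The mass of about 0.54 kDa per ADP-ribose unit within a chain is an assumption not given in the chapter, and comparing pI with pH only gives the sign of the net charge, not its magnitude.

```python
ADP_RIBOSE_UNIT_KDA = 0.541  # assumed mass of one ADP-ribose unit within a PAR chain
PHYSIOLOGICAL_PH = 7.4       # assumed value for "physiological conditions"


def estimate_adp_ribose_units(unmodified_kda: float, modified_kda: float) -> int:
    """Rough number of ADP-ribose units implied by an apparent mass shift."""
    return round((modified_kda - unmodified_kda) / ADP_RIBOSE_UNIT_KDA)


def net_charge_sign(pi: float, ph: float = PHYSIOLOGICAL_PH) -> str:
    """A protein is net positive below its pI and net negative above it."""
    if ph < pi:
        return "cationic"
    if ph > pi:
        return "anionic"
    return "neutral"


# Mass shift reported for in vitro auto-PARylation of PARP-1.
units = estimate_adp_ribose_units(113, 250)

# pI values quoted in the text for the recruited transcription factors.
charges = {name: net_charge_sign(pi) for name, pi in {"MAZ": 8.1, "hnRNP A1": 9.2}.items()}
```

With these inputs the estimate is a few hundred units, the same order of magnitude as the "up to 200" quoted in the text. Apparent masses of branched polymers read from a gel are only approximate, so the count should be taken as a rough guide.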

Antigene Strategies Based on G4-Binding Small Molecules

A growing body of evidence suggests that G4 structures in the promoters of cancer-related genes serve as hubs for TFs (Cogoi et al. 2008; Spiegel et al. 2021; Ferino et al. 2021). This encouraged researchers to hypothesize that small molecules with high affinity for G4 DNA could disrupt, by competition, the interaction between TFs and the target G4 structures, thus representing a class of compounds with anticancer activity. One of the first G4-binding molecules shown to inhibit transcription is TMPyP4 (Cogoi and Xodo 2006; Siddiqui-Jain et al. 2002). This cationic porphyrin was found to strongly reduce the expression of CAT directed by the murine KRAS promoter, to 20% of control (Cogoi and Xodo 2006), while a luciferase assay showed that binding of TMPyP4 to the Pu27-G4 motif of c-MYC reduced transcription by 50% (Siddiqui-Jain et al. 2002). More recently, Paulo and co-workers developed a library of 5-methyl-indolo[3,2-c]quinoline (IQc) derivatives with a series of alkyldiamine side chains targeting DNA and RNA G4s located in the promoter and 5′-UTR mRNA of the KRAS gene (Lavrado et al. 2015a, b; Fig. 4a). The monosubstituted alkyldiamine IQc compounds 2b–g stabilize the 21R G4 structure of KRAS and increase its TM by 10–15 °C. The disubstituted alkyldiamine derivatives 3f–j caused a similar G4 stabilization, while the disubstituted alkyldiamine derivatives 3d and 3e caused strong stabilization, with ΔTM values between 12 and 22 °C (Lavrado et al. 2015b). Interestingly, IQc derivatives without alkyldiamine side chains (1, 2a, 3c) showed a lower ability to stabilize KRAS 21R G4. Moreover, the derivatives with a bulky benzyl N5 substituent (3c and 3j) showed lower G4 stabilization compared to their monosubstituted counterparts 2a and 3e. All IQcs showed low affinity for ds-DNA (1.5 < ΔTM < 9.1 °C for 2b–g and 1.6 < ΔTM < 6.5 °C for 3d–j), suggesting good selectivity of the IQcs for the 21R G4 as compared to ds-DNA. Compounds 3e and 2d were also tested for their ability to stabilize the G4s formed by the entire 32R motif of KRAS. The former increased the TM of the G4s by 15 °C, the latter by 10.7 °C, suggesting that the IQcs can interact with both folded G4 structures of 32R. The designed IQc molecules 2a, 2d, 3d, and 3e were found to inhibit the metabolic activity of cells harboring mutant KRAS. The compounds showed IC50 values of 0.4–1.45 μM in A549 lung cancer cells; 1.98–2.20 μM in MiaPaCa2 pancreatic cancer cells; 0.22–4.80 μM in Panc-1 pancreatic cancer cells; 0.14–3.46 μM in HCT116 colon cancer cells; and 0.2–4.74 μM in SW620 cells.

Fig. 4 (a) Structures of 5-methyl-indolo[3,2-c]quinoline derivatives (IQc) with a range of alkyldiamine side chains from Lavrado et al. (2015b). (b) Structures of 7-carboxylate indolo[3,2-b]quinoline tri-alkylamine derivatives from Brito et al. (2015)
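The ΔTM values above are differences between the melting temperature of a structure with and without ligand. As an illustration of how such a value can be read from melting curves, the sketch below simulates two-state unfolding curves and locates, by linear interpolation, the temperature at which half of the structure is unfolded. The two-state van't Hoff model, the enthalpy value, and the TM values of 50 and 65 °C are assumptions chosen for illustration; they are not data from the cited studies.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fraction_unfolded(temp_c: float, tm_c: float, dh_kcal: float = 40.0) -> float:
    """Two-state model: K = exp(-(dH/R) * (1/T - 1/Tm)), fraction = K / (1 + K)."""
    t, tm = temp_c + 273.15, tm_c + 273.15
    k = math.exp(-(dh_kcal / R_KCAL) * (1.0 / t - 1.0 / tm))
    return k / (1.0 + k)


def estimate_tm(temps: list[float], fractions: list[float]) -> float:
    """Interpolate the temperature at which the unfolded fraction crosses 0.5."""
    points = list(zip(temps, fractions))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 <= 0.5 <= f1 and f1 > f0:
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("melting transition not covered by the temperature range")


temps = [20.0 + 0.5 * i for i in range(141)]  # 20 to 90 °C in 0.5 °C steps
free = [fraction_unfolded(t, tm_c=50.0) for t in temps]   # hypothetical G4 alone
bound = [fraction_unfolded(t, tm_c=65.0) for t in temps]  # hypothetical G4 + ligand
delta_tm = estimate_tm(temps, bound) - estimate_tm(temps, free)
```

The same subtraction applied to a duplex control gives the ds-DNA ΔTM, so comparing the two numbers is what underlies the selectivity statements in the text.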

The effect of the compounds on transcription was first investigated using a luciferase reporter assay. Two KRAS promoter constructs of different sizes containing the G4 motif were cloned into the Firefly luciferase pGL3 plasmid (pGL-Ras0.5 and pGL-Ras2.0). These plasmids were co-transfected into HEK293T cells with the Renilla luciferase plasmid pRL-TK, and compounds 2d, 3d, and 3e were found to reduce Firefly luciferase activity by ~25–50% relative to Renilla luciferase. The same decrease in promoter activity was observed with plasmids of different sizes (500 and 2000 bp), suggesting that the target region of the designed compounds lies within 500 bp upstream of the beginning of the coding region, which overlaps with the region where the G4 sequence is located. The effects of the IQc compounds on KRAS transcription in colon cancer cells (HCT116 and SW620) were examined by real-time RT-PCR. The results showed that the IQc compounds significantly reduced KRAS transcription in colon cancer cells to 20% (2d), 80% (3d), and 60% (3e) of control (DMSO-treated cells). The results were confirmed by immunoblotting, which showed that the compounds reduced the KRAS protein to 40–70% of the control in both cancer cell lines, and the relative efficacy of the compounds followed the trend 3e < 3d < 2d (Brito et al. 2015). Subsequently, 7-carboxylate indolo[3,2-b]quinoline tri-alkylamine derivatives were found to be effective stabilizers of KRAS 21R G4 and potent anti-KRAS agents capable of inhibiting gene expression and inducing cell death by apoptosis in colon cancer cell lines (Lavrado et al. 2013; Fig. 4b). Calabrese et al. (2018) used a small molecule microarray (SMM) approach to identify a preferential interaction between chlorhexidine and KRAS 21R G4 (Fig. 5a). Chlorhexidine showed a specific, low-micromolar binding interaction with the KRAS G4. NMR and docking experiments suggest a binding mode determined by both aromatic stacking and groove-binding interactions.
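In a dual-luciferase assay such as the one described above, the Firefly signal is divided by the Renilla signal from the co-transfected control plasmid, and the ratio for treated cells is expressed relative to the vehicle control. A minimal sketch of that normalization follows; the luminescence readings are invented numbers used only to show the arithmetic.

```python
def relative_promoter_activity(firefly: float, renilla: float,
                               firefly_ctrl: float, renilla_ctrl: float) -> float:
    """Firefly/Renilla ratio of a treated sample as a percentage of the control ratio."""
    if renilla <= 0 or renilla_ctrl <= 0 or firefly_ctrl <= 0:
        raise ValueError("control and Renilla readings must be positive")
    return 100.0 * (firefly / renilla) / (firefly_ctrl / renilla_ctrl)


# Hypothetical raw luminescence counts (arbitrary units).
control = {"firefly": 80_000.0, "renilla": 20_000.0}
treated = {"firefly": 45_000.0, "renilla": 18_000.0}

activity = relative_promoter_activity(
    treated["firefly"], treated["renilla"], control["firefly"], control["renilla"]
)
reduction = 100.0 - activity
```

With these made-up readings the treated sample retains 62.5% of control activity, a 37.5% reduction, which falls within the ~25–50% range reported for compounds 2d, 3d, and 3e. Dividing by Renilla corrects for differences in transfection efficiency and cell number between wells.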
Cancer cells with oncogenic mutations in the KRAS gene show increased sensitivity to chlorhexidine. Treatment of breast cancer cells with chlorhexidine leads to a downregulation of the KRAS protein level, whereas KRAS transiently expressed from a promoter lacking G4 is not affected. Taken together, these studies provided strong evidence that G4 ligands can be promising anticancer drugs. Recently, a trisubstituted naphthalenediimide quadruplex-binding compound, 2,7-bis(3-morpholinopropyl)-4-((2-(pyrrolidin-1-yl)ethyl)amino)benzo[lmn][3,8]phenanthroline-1,3,6,8(2H,7H)-tetraone (CM03), was developed by computer modelling as an inhibitor of cell growth in PDAC cell lines (Fig. 5b; Marchetti et al. 2018). In vitro studies showed that CM03 stabilizes both KRAS 21R and 32R G4s, by 11 and 9.6 °C, respectively. In contrast, it does not stabilize duplex DNA. The antiproliferative effect of CM03 was tested in lung adenocarcinoma (A549), breast adenocarcinoma (MCF7), and PDAC cell lines (MIA PaCa-2 and PANC-1), as well as in the fetal lung fibroblast-like non-oncogenic control cell line (WI-38). The compound showed strong growth inhibitory activity, particularly in the lung and pancreatic cancer cell lines, with IC50 values of 24, 159, 7, 18, and 1190 nM, respectively. In a mouse MIA PaCa-2 xenograft model for PDAC, CM03 showed a dose-dependent antitumor effect. The effect of CM03 was also tested in a genetically modified mouse model for PDAC. The KPC (Pdx1-Cre; LSL-KrasG12D/+; LSL-Trp53R172H/+) mouse model develops tumors with genetic and pathological features of human PDAC. KPC mice treated twice a week with 15 mg/kg CM03 survived longer than the untreated animals, with two mice surviving more than twice as long. The effects of this quadruplex-binding small molecule on global gene expression were analyzed by RNA-Seq. The experiment showed that a large number of genes rich in G4 elements were downregulated; these genes are involved in essential signaling pathways for PDAC survival, metastasis, and drug resistance.

Fig. 5 (a) Structures of chlorhexidine, alexidine, and proguanil (Adapted from Calabrese et al. (2018)). (b) Trisubstituted naphthalene diimide quadruplex-binding compound 2,7-bis(3-morpholinopropyl)-4-((2-(pyrrolidin-1-yl)ethyl)amino)benzo[lmn][3,8]phenanthroline-1,3,6,8(2H,7H)-tetraone (CM03). Molecular model of CM03 bound to the native parallel human telomeric G4 structure from Marchetti et al. (2018)
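The IC50 values reported above for CM03 can be compared across cell lines by dividing the value for the non-oncogenic WI-38 control by the value for each cancer line. The sketch below does this with the numbers quoted in the text, assuming they map to the cell lines in the order listed. The term "selectivity index" and this particular ratio are an illustrative choice, not a metric reported in the chapter.

```python
# IC50 values (nM) as quoted in the text, assumed to follow the listed order of cell lines.
IC50_NM = {
    "A549": 24.0,
    "MCF7": 159.0,
    "MIA PaCa-2": 7.0,
    "PANC-1": 18.0,
    "WI-38": 1190.0,  # non-oncogenic control line
}


def selectivity_index(ic50: dict[str, float], control: str = "WI-38") -> dict[str, float]:
    """Ratio IC50(control) / IC50(cancer line); a larger ratio means more selective."""
    reference = ic50[control]
    return {line: reference / value for line, value in ic50.items() if line != control}


indices = selectivity_index(IC50_NM)
most_selective = max(indices, key=indices.get)
```

On these numbers the ratio is largest for MIA PaCa-2 (1190/7 = 170), in line with the statement that growth inhibition is strongest in the lung and pancreatic cancer lines.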

G4-Binding Compounds Binding to the 5′-UTR Region of the KRAS Gene

Recently, the authors' laboratory focused on the G4 motifs formed in the 5′-UTR of KRAS mRNA and used two types of G4-binding compounds to inhibit KRAS translation in PDAC cells. The KRAS 5′-UTR forms three nonoverlapping RNA G4 structures (RG4s) stabilized by two G-tetrads, namely G4 utr-1, G4 utr-z, and G4 utr-c, which can serve as a platform for G4-binding compounds (Fig. 2b). UV-melting experiments showed that the RG4s melt cooperatively with TM of 53 ± 0.5 (utr-1), 52 ± 0.5 (utr-c), and 64 ± 0.5 °C (utr-z), and ΔG values between 6.1 and 7.3 kcal/mol in 100 mM KCl (Miglietta et al. 2017). For comparison, the 254 nt 5′-UTR of NRAS mRNA is characterized by a G4 motif between 240 and 222 from ATG forming an RG4 with three G-tetrads and a TM of 74 °C in 20 mM KCl. The first compounds used in the authors' laboratory to target the KRAS RG4s are anthrafurandiones (1a, 2a) and anthrathiophenediones (1b, 2b) (Miglietta et al. 2017; Fig. 6). At equimolar ratios, 2a stabilizes the RG4s by 10–15 °C, and 2b by 4–10 °C. The compounds showed excellent ability to penetrate the cell membrane: 2a and 2b, with aminoethyl side chains, are taken up 20- and 4-fold more than the corresponding guanidino analogues 1a and 1b, respectively. This is because the positive charge of the guanidine group reduces the transport of the compounds through the lipid bilayer. Although 2a differs from 2b only by one atom in the five-membered ring (oxygen versus sulfur), the ability of the former to penetrate the cell membrane is fivefold greater than that of the latter. This can possibly be explained by the higher polarizability of sulfur compared to oxygen. Luciferase experiments with a plasmid carrying Renilla driven by the KRAS promoter with the 5′-UTR element showed that 2a and, to a lesser extent, 2b reduce luciferase expression in a dose-dependent manner.
The ability of the compounds to also inhibit genomic KRAS was tested in Panc-1 cells. Consistent with the luciferase assay, anthrafuranedione 2a, but not 2b, was found to downregulate KRAS by 50%, a result that confirms the efficacy of targeting the 5′-UTR RG4s to suppress the KRAS gene in PDAC cells. However, the designed compounds could have a complex behavior: besides binding to cytoplasmic RG4s, they could also target genomic G4 structures in gene promoters and cause undesirable side effects. Therefore, the authors pursued a new strategy using cationic porphyrins, which have the following useful properties: (i) binding to RG4s with high affinity; (ii) producing reactive oxygen species and singlet oxygen (¹O₂) when illuminated with light; and (iii) accumulating mainly in the cytoplasm. Porphyrins are naturally occurring molecules that perform important functions in the human body, as they are involved in oxygen transport and cellular respiration. They consist of four pyrrole rings connected to each other via methine bridges. The porphyrins that were used to target KRAS are synthetic and have been designed with the tetrapyrrole macrocycle substituted at the meso- or β-position (Fig. 7a–c). The π-electron system of the porphyrin absorbs strongly at ~400 nm (Soret band) and weakly at >500 nm (Q bands). When light is absorbed, an electron is promoted from the ground state (S0) to an excited, short-lived state (S1 or S2).

Fig. 6 Structures of 4,11-bis(2-aminoethylamino)anthra[2,3-b]furan-5,10-dione (2a), 4,11-bis(2-aminoethylamino)anthra[2,3-b]thiophene-5,10-dione (2b), and corresponding guanidino-modified derivatives 1a and 1b

Fig. 7 (a) The porphyrin macrocycle. (b) Jablonski diagram of cationic porphyrins showing type-I and type-II processes from Xodo et al. (2016). (c) Absorption UV-visible spectrum of a porphyrin with typical Soret and Q-bands
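The absorption bands mentioned above can be related to photon energies through E = hc/λ. The following sketch converts wavelengths to energies in electronvolts. The constants are the standard SI values, and the two wavelengths are the approximate band positions quoted in the text (~400 nm for the Soret band and 500 nm as the short-wavelength edge of the Q bands).

```python
PLANCK_J_S = 6.62607015e-34        # Planck constant (J*s)
LIGHT_SPEED_M_S = 2.99792458e8     # speed of light in vacuum (m/s)
ELECTRON_VOLT_J = 1.602176634e-19  # joules per electronvolt


def photon_energy_ev(wavelength_nm: float) -> float:
    """Photon energy in eV for a wavelength given in nanometres."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    return PLANCK_J_S * LIGHT_SPEED_M_S / (wavelength_nm * 1e-9) / ELECTRON_VOLT_J


soret_ev = photon_energy_ev(400.0)   # Soret band, ~400 nm
q_band_ev = photon_energy_ev(500.0)  # short-wavelength edge of the Q bands
```

A Soret photon (about 3.1 eV) carries more energy than a Q-band photon (about 2.5 eV or less), which is why the Soret band can populate a higher excited singlet state than the Q bands.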

66

Targeted Cancer Therapy: KRAS-Specific Treatments for Pancreatic Cancer


Fig. 8 (D) Cationic alkyl-modified porphyrins tri-meso-(N-methyl-4-pyridyl)-meso-(N-(dodecyl, tetradecyl, hexadecyl, or octadecyl)-4-pyridyl)porphine (2a–d)

The Soret band at ~420 nm is due to the S0 → S2 transition, while the Q bands between 550 and 700 nm are due to the S0 → S1 transition (Xodo et al. 2016). While some of the excited-state molecules return to the ground state by fluorescence emission, the majority undergo spin inversion (intersystem crossing, ISC) and populate triplet (T) states, from which the excited porphyrins can interact with surrounding molecules (proteins, phospholipids, and nucleic acids) (type I process) or with molecular oxygen to generate singlet oxygen ¹O₂ (type II process). Contrary to common belief, ¹O₂ has a relatively long lifetime (Skovsen et al. 2005), allowing it to diffuse over considerable distances within the cell. The reactive oxygen species generated by the porphyrins can oxidize DNA/RNA, especially the guanines, as they have the lowest redox potential among the nucleobases (Steenken and Jovanovic 1997). The quantum yield of singlet oxygen generation (ΦΔ) of the designed cationic porphyrins is relatively high, ranging from 0.50 to 0.77, suggesting that the main photochemical reaction that takes place when they are in the excited triplet state is a type II process (Xodo et al. 2016). To increase the ability to penetrate cell membranes, four analogues of porphyrin TMPyP4 bearing an alkyl side chain with 12, 14, 16, or 18 carbons, namely 2a, 2b, 2c, and 2d, were developed (Fig. 8; Ferino et al. 2020). Although they carry four positive charges, the designed porphyrins are highly competent in penetrating the


Fig. 9 (a, b) FACS analyses of Panc-1 cells treated with 5 μM 2a–d, TMPyP2 (P2), or TMPyP4 (P4) for 6 h; (b) Effect of dynasore on the uptake of P4, 2b, and 2d, from Ferino et al. (2020); (c, d) Confocal microscopy images of Panc-1 cells treated with 5 μM 2d for 6 h. The nuclei of the cells were stained with Hoechst. The images show that the alkyl porphyrins co-localize with the lysosomes, from Di Giorgio et al. (2022); (e, f) Typical fluorescence titration of 1.0 μM 2b with utr-z RG4 in 50 mM Tris-HCl, pH 7.4, and 100 mM KCl. The right panel shows the fraction of bound porphyrin versus RG4 binding site concentration. The binding curve was best fitted with a standard binding equation (SigmaPlot 11). The Job plot for the binding of 2b to utr-z RG4 in 50 mM Tris-HCl, pH 7.4, 100 mM KCl is also shown. The ordinate reports the absorbance difference at 420 nm. The plots gave a stoichiometry of six 2b molecules per RG4, from Ferino et al. (2020)

cell membrane as determined by cell cytometry and confocal microscopy. Porphyrins 2a, 2b, 2c, and 2d showed 8-, 14-, 40-, and 60-fold higher uptake than TMPyP4,


respectively (Fig. 9a, b). The mechanism of uptake was investigated using dynasore, a noncompetitive inhibitor of dynamin GTPase activity that blocks dynamin-dependent endocytosis in the cell. Dynasore was found to decrease the uptake of 2b and 2d, but not of TMPyP4, suggesting that the transport of alkyl porphyrins 2b and 2d also occurs via endocytosis. The confocal micrographs shown in Fig. 9c demonstrate that the endocytic vesicles in living cells occur in the cytoplasm with a punctate distribution. The vesicles co-localize with the lysosomes, as LysoTracker fluorescence (green) is strongly quenched by the porphyrin. Moreover, the fluorescence of the cells stained with LAMP-1 (green), an antibody specific for lysosomal-associated membrane protein 1, overlaps with the porphyrin fluorescence (red), forming yellow foci that indicate co-localization of 2d with lysosomes. In addition to transport by endocytosis, cationic porphyrins fuse with the cell membrane, from which they are then released into the cytoplasm: a mechanism of internalization based on passive diffusion (see Fig. 9d). The interaction between the KRAS RG4s and the cationic alkyl porphyrins was analyzed by UV-vis and fluorescence titration (Ferino et al. 2020). A typical fluorescence titration obtained by adding increasing amounts of KRAS utr-z RG4 to a solution of porphyrin 2b is shown in Fig. 9e. After excitation at 420 nm (Soret band), the porphyrin emits between 660 and 730 nm. When the porphyrin binds to RG4, the fluorescence is quenched because the stacking interactions between the porphyrin and RG4 favor electron transfer from the excited singlet state of the porphyrin to guanines. Plotting fluorescence at 680 nm as a function of increasing amounts of RG4 yielded binding curves from which KD values
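As a rough illustration of how a fraction-bound curve of this kind can be derived from a quenching titration, the sketch below converts fluorescence readings into a bound fraction and evaluates a one-site binding isotherm. The source says only that a "standard binding equation" was used, so the quadratic one-site model, the function names, and every numerical value (including the KD) are assumptions made for illustration and are not the authors' data or fitting procedure.

```python
import math


def fraction_bound_from_fluorescence(f_obs, f_free, f_bound):
    """Bound fraction, assuming quenching scales linearly with the bound fraction."""
    return (f_free - f_obs) / (f_free - f_bound)


def one_site_fraction_bound(ligand_total, sites_total, kd):
    """Exact one-site isotherm (quadratic form); all concentrations in the same unit."""
    b = ligand_total + sites_total + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * ligand_total * sites_total)) / 2.0
    return complex_conc / ligand_total


if __name__ == "__main__":
    porphyrin = 1.0  # uM, the porphyrin concentration quoted for the titration
    kd = 0.5         # uM, hypothetical value chosen only to draw a curve
    for sites in (0.0, 0.5, 1.0, 2.0, 5.0, 10.0):
        theta = one_site_fraction_bound(porphyrin, sites, kd)
        print(f"[sites] = {sites:5.1f} uM -> fraction bound = {theta:.3f}")
```

In a real analysis the KD would be obtained by least-squares fitting of this function to the measured fractions rather than fixed in advance.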

Oligonucleotides were transfected using Lipofectamine® 2000

TSK1 oligonucleotide contained in the assay kit. Levels of telomerase activity were obtained from a minimum of three assays of at least two independently prepared extracts from each sample. The methodology utilized in the TRAPeze® Gel-Based Telomerase Detection kit is based on an improved version of the original method described by Kim et al. (1994). This technique is a highly sensitive in vitro assay system utilizing the polymerase chain reaction. In the first step of the reaction, telomerase adds a number of telomeric repeats (TTAGGG) onto the 3′-end of a substrate oligonucleotide (TS). In the second step, the extended products are amplified by PCR using the TS and RP (reverse) primers, generating a ladder of products with 6-base increments starting at 50 nucleotides: 50, 56, 62, 68, etc. This kit reduces amplification artifacts and permits better estimation of telomerase processivity.
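The expected ladder follows directly from this description: the shortest product is 50 nucleotides and each additional telomeric repeat adds 6 bases. The minimal sketch below enumerates the band sizes; the function name and the choice of eight bands are illustrative and not part of the kit protocol.

```python
TELOMERIC_REPEAT = "TTAGGG"  # repeat added by telomerase, as stated in the text


def trap_ladder_sizes(n_bands, first_band=50, repeat_len=len(TELOMERIC_REPEAT)):
    """Return the expected TRAP product sizes (in nucleotides) for the first n_bands bands."""
    return [first_band + repeat_len * i for i in range(n_bands)]


if __name__ == "__main__":
    print(trap_ladder_sizes(8))
```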


M. Fujii et al.

Antisense inhibition effects of the synthesized sDNA-peptide conjugates targeting hTERC were evaluated in Jurkat cells by the telomeric repeat amplification protocol (TRAP) assay as described above. The conjugates were transfected into Jurkat cells without the use of cationic lipid as transfecting agent at concentrations of 5 μM and 1 μM. These concentrations were chosen because, according to a previous report (Chen et al. 2002), 5 μM was the concentration that produced maximal inhibition of telomerase by an anti-hTERC oligonucleotide, while several papers reported IC50 values for inhibition of telomerase in various cell lines as low as 0.5 μM. Cells were harvested in two batches, one after 24 h and the other after 48 h. The results clearly showed that antisense inhibition of telomere elongation was dramatically affected by the localization of the oligonucleotides. Oligonucleotide-NLS conjugates C11, C12, and C13 showed a higher inhibitory effect (43%, 38%, and 70% inhibition, respectively, after 48 h) than native antisense oligonucleotide N2 (0% inhibition) and oligonucleotide-NES conjugate C14 (24% inhibition). It should be noted that phosphorothioate oligonucleotide-NLS conjugate C18 almost completely suppressed telomerase activity (99.6% at 24 h and 95.3% at 48 h), while phosphorothioate oligonucleotide S1 inhibited telomerase by 87% at 24 h and 78% at 48 h. It should also be pointed out that the inhibitory effects of N2, C11, C12, C13, and C14 at 48 h were higher than those at 24 h both in the presence and in the absence of Lipofectamine® 2000, whereas the inhibitory effects of S1 and C18 at 24 h were higher than those at 48 h in the absence of Lipofectamine® 2000 and lower than those at 48 h in the presence of Lipofectamine® 2000.
It is plausible that permeation through the cell membrane or escape from the endosome delayed the onset of the inhibitory effect of N2, C11, C12, C13, and C14, and that S1 and C18 could permeate the cell membrane quickly or by a mechanism different from endocytosis, as shown above. Since the results above indicated that the antisense inhibition by the normal oligonucleotide-NLS conjugates C11, C12, and C13 was not as high as that by the phosphorothioate oligonucleotide-NLS conjugate C18, we evaluated the antisense inhibition by phosphorothioate oligonucleotide-NLS conjugates against the template RNA in human TERC. Four phosphorothioate oligonucleotide (sDNA)-peptide conjugates, C18–C21, were prepared by SPFC, and their inhibitory effects targeting hTERC were evaluated by TRAP assay in the human T lymphocyte cell line Jurkat.

S1: 5′-s(CAGTTAGGGTTAG)-3′
R-5′-s(CAGTTAGGGTTAG)-3′
C18: R = Ac-GPKKKRKVK*G-OH (SV40LT-ant NLS)
C19: R = Ac-GRKKRRQRRRPPGK*G-OH (HIV-1 tat NLS)
C20: R = Ac-LPPLERLTLK*G-OH (HIV-1 rev NES)
C21: R = Ac-LRALLRALLRALLRALK*G-OH (designed)
*ε-Amino group of K was deprotected after the treatment with TFA and linked to the oligonucleotide via the linker.

68

Controlled Intracellular Trafficking and Gene Silencing. . .


A representative PAGE image of the TRAP assay using 5 μM sDNA-peptide conjugate in Jurkat cells is shown in Fig. 7a (24 h) and Fig. 7b (48 h). The relative telomerase activity for each sample is presented as percentage of the telomeric product generated (TPG) (average of 3 assays) in Fig. 8. Decreased telomerase activity was apparent in Jurkat cells treated with 5 μM of each of the conjugates as compared to the intensity of bands seen in the positive control and the TSR8 standard. However, the sDNA-NLS conjugates C18 and C19 showed a much higher inhibitory effect (approximately 96% and 80%, respectively) than the sDNA-NES conjugates C20 and C21 (approximately 20% and 50%, respectively) after 24 h

Fig. 7 Inhibition of telomerase activity by ASO in Jurkat cells. [ASO] = 5 μM; (a) 24 h, (b) 48 h after transfection. Lane 1, M: 50 bp molecular weight marker; Lane 2, C+: positive control (untreated cells); Lane 3, TSR8 (quantitation standard); Lane 4, C21; Lane 5, C20; Lane 6, C19; Lane 7, C18; Lane 8, C-: negative control (internal control)

Fig. 8 Telomerase activities as telomerase product generated (TPG) in Jurkat cells. Control, the value for amplified products in untreated cells as a positive control. The results are expressed as mean ± SD from three determinations


of transfection, as evidenced by the reduced number of bands in the telomeric ladder pattern. After 48 h the inhibitory effect of the conjugates was slightly reduced. In Jurkat cells treated with 1 μM of each of the four conjugates, only a slight reduction of telomerase activity was observed. The results of the TRAP assay in TPG units indicate that sDNA-NLS conjugates repressed telomerase activity very efficiently at 5 μM in Jurkat cells. These results clearly indicated the importance of intracellular trafficking of ASOs for antisense activity. As demonstrated in previous studies (Kubo et al. 2005), DNA-NLS conjugates C18 and C19 were expected to be recognized by importin α and transported into the cellular nucleus, and DNA-NES conjugates C20 and C21 were expected to be recognized by exportin and transported out of the nucleus. The target hTERC is located in the cellular nucleus, so DNA-NLS conjugates C18 and C19 could bind the target hTERC more effectively than DNA-NES conjugates C20 and C21.
In summary, oligonucleotide-peptide conjugates were shown to be taken up effectively into cells without using any transfection reagents. Controlled nuclear localization was achieved by oligonucleotide-NLS conjugates, and cytoplasmic localization was achieved by oligonucleotide-NES conjugates. Antisense oligonucleotide-NLS conjugates suppressed human telomerase in leukemia cells highly effectively. These findings strongly suggested that oligonucleotide-peptide conjugates can be promising candidates for the genetic medicines of the next generation and that intelligent oligonucleotides can be created by linking oligonucleotides to natural and unnatural molecules that would never come together in nature. The results clearly showed that antisense inhibition of telomere elongation was dramatically affected by the localization of the oligonucleotides.
sASO-NLS conjugates C18 and C19 showed a high inhibitory effect, while the normal oligonucleotide and the sASO-NES conjugate C20 showed no inhibitory effect at all. In particular, sASO-NLS conjugate C18 almost completely suppressed telomerase activity (99.6% at 24 h and 95.3% at 48 h).
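The percentages quoted in this section reduce to simple arithmetic on TPG values: relative activity is the sample TPG expressed as a percentage of the untreated control, and inhibition is its complement. The sketch below shows that calculation with a mean and standard deviation over three assays. The TPG readings in the usage example are invented placeholders, not data from the study, and the function names are hypothetical.

```python
from statistics import mean, stdev


def relative_activity(sample_tpg, control_tpg):
    """Telomerase activity of a sample as a percentage of the untreated control."""
    return 100.0 * sample_tpg / control_tpg


def percent_inhibition(sample_tpg, control_tpg):
    """Inhibition is the complement of the relative activity."""
    return 100.0 - relative_activity(sample_tpg, control_tpg)


def summarize(sample_replicates, control_replicates):
    """Mean and sample SD of percent inhibition across replicate assays."""
    control = mean(control_replicates)
    inhibitions = [percent_inhibition(s, control) for s in sample_replicates]
    return mean(inhibitions), stdev(inhibitions)


if __name__ == "__main__":
    # Placeholder TPG readings from three hypothetical assays
    avg, sd = summarize([4.0, 5.0, 6.0], [100.0, 98.0, 102.0])
    print(f"inhibition = {avg:.1f} +/- {sd:.1f} %")
```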

Silencing of BCR/ABL Chimeric Gene by siRNA-NES Conjugates

Chronic myelogenous leukemia (CML) originates in a pluripotent hematopoietic stem cell of the bone marrow and is characterized by greatly increased numbers of granulocytes in the blood (Ross et al. 2014). On the cellular level, CML is associated with a specific chromosome abnormality, the t(9;22) reciprocal translocation that forms the Philadelphia (Ph) chromosome. The Ph chromosome is the result of a molecular rearrangement between the c-ABL proto-oncogene on chromosome 9 and the BCR (breakpoint cluster region) gene on chromosome 22. Most of ABL is linked with a truncated BCR. The BCR/ABL fusion gene codes for an 8-kb mRNA and a novel 210-kDa protein that has higher and aberrant tyrosine kinase activity compared with the normal c-ABL-coded counterpart. Phosphorylation of a number of substrates such as GAP, GRB-2, SHC, FES, CRKL, and paxillin is considered a decisive step in


transformation. An etiological connection between BCR/ABL and leukemia is indicated by the observation that transgenic mice bearing a BCR/ABL DNA construct develop leukemia of B, T, and myeloid cell origin. CML cells proliferate and expand in an almost unlimited manner (Pasternak et al. 1998).

Silencing of BCR/ABL Chimeric Gene by siRNAs Bearing 5′-Amino Modifier 5

As a preliminary study to identify a suitable site for conjugation and to assess the effect of the linker on the silencing activity of siRNA prior to the synthesis of siRNA-NES conjugates, the silencing efficiencies of siRNA1, siRNA2, siRNA3, and siRNA4 bearing 5′-Amino Modifier 5 (Glen Research) at the 5′-end of the sense and/or antisense strand were evaluated against the BCR/ABL chimeric gene (Shinkai et al. 2017).

Target sequence of BCR/ABL mRNA (355–390): 5′-ggauuuaagcagaguucaa/aagcccuucagcggcca-3′
siRNAs (anti BCR/ABL mRNA 361–381)
control siRNA (scramble):
  sense 5′-UCUCGCUUGGGCGAGAGUAATT-3′
  antisense 3′-TTAGAGCGAACCCGCUCUCAUU-5′
siRNA1:
  sense 5′-GCAGAGUUCAAAAGCCCUUTT-3′
  antisense 3′-TTCGUCUCAAGUUUUCGGGAA-5′
siRNA2:
  sense 5′-XGCAGAGUUCAAAAGCCCUUTT-3′
  antisense 3′-TTCGUCUCAAGUUUUCGGGAA-5′
siRNA3:
  sense 5′-GCAGAGUUCAAAAGCCCUUTT-3′
  antisense 3′-TTCGUCUCAAGUUUUCGGGAAX-5′
siRNA4:
  sense 5′-XGCAGAGUUCAAAAGCCCUUTT-3′
  antisense 3′-TTCGUCUCAAGUUUUCGGGAAX-5′
X = -PO4-CH2CH2OCH2CH2NH2

Silencing efficiencies of the negative control (scramble) siRNA, siRNA1, siRNA2, siRNA3, and siRNA4 were evaluated against the BCR/ABL chimeric gene in the human chronic myelogenous leukemia cell line K562 using quantitative RT-PCR normalized to GAPDH expression (Fig. 9). The negative control siRNA with the scramble sequence showed no effect, and siRNA1 as a positive control suppressed 59.8% of BCR/ABL gene expression at 200 nM and 53.7% at 50 nM. siRNA2 with X in the sense strand showed silencing efficiency comparable to that of unmodified siRNA1, while siRNA3 and siRNA4 with X in the antisense strand showed reduced silencing efficiencies. These results indicated that X in the sense strand did not influence the series of processes leading to RISC formation, namely contact with DICER,


Fig. 9 Silencing of BCR-ABL mRNA in K562 by 5′-modified siRNA. Normalized BCR/ABL/GAPDH mRNA expression in K562 was measured 24 h after transfection. siRNAs were transfected by Lipofectamine® 2000

formation of RISC loading complex (RLC), loading to RISC, unwinding to matured RISC, and RISC activity. On the other hand, X in the antisense strand hindered RISC formation and/or RISC activity. It can be interpreted that X in the sense strand neither decreased nor increased the proportion of the antisense strand incorporated into RISC as the guide strand, and that X in the antisense strand, acting as the guide strand, destabilized RISC after the unwinding process because X would be protonated in water and electrostatically repel the cationic residues in the MID domain of human Argonaute2 (Fig. 10) (Schirle et al. 2012; Shiohama et al. 2022). These results strongly suggested that siRNA could be conjugated to a peptide at the 5′-end of the sense strand without any serious loss of silencing efficiency.
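The siRNA1 sequences listed in this section can be checked for internal consistency with a few lines of code. The sketch below confirms that the 19-nt cores of the sense and antisense strands are Watson-Crick complementary once the two-nucleotide TT overhangs are set aside, and locates the sense core within the quoted target fragment. It assumes the quoted fragment begins at position 355, as the text states; the helper names are hypothetical. Under that assumption the 19-nt core maps to positions 363–381, which matches the 361–381 window given in the text if the two preceding adenosines are counted, although that reading is an inference and not something the source states.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

TARGET_START = 355  # position of the first nucleotide of the quoted fragment
# BCR/ABL mRNA 355-390; the "/" marking the junction in the text is dropped here
TARGET = "ggauuuaagcagaguucaa" + "aagcccuucagcggcca"

SENSE_5TO3 = "GCAGAGUUCAAAAGCCCUUTT"      # siRNA1 sense strand, 5'->3'
ANTISENSE_3TO5 = "TTCGUCUCAAGUUUUCGGGAA"  # siRNA1 antisense strand, written 3'->5'


def complement(seq):
    """Base-by-base RNA complement (no reversal, strands are written antiparallel)."""
    return "".join(PAIR[base] for base in seq)


def duplex_core_is_complementary(sense_5to3, antisense_3to5, overhang=2):
    """Compare the paired cores after removing each strand's 3' overhang."""
    sense_core = sense_5to3[:-overhang]         # overhang sits at the right end
    antisense_core = antisense_3to5[overhang:]  # overhang sits at the left end as written
    return complement(sense_core) == antisense_core


def core_position_in_target(sense_5to3, overhang=2):
    """Return (start, end) mRNA positions of the sense core, or None if absent."""
    core = sense_5to3[:-overhang].lower()
    idx = TARGET.find(core)
    if idx < 0:
        return None
    start = TARGET_START + idx
    return start, start + len(core) - 1


if __name__ == "__main__":
    print(duplex_core_is_complementary(SENSE_5TO3, ANTISENSE_3TO5))
    print(core_position_in_target(SENSE_5TO3))
```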

Silencing of BCR/ABL Chimeric Gene by siRNA-NES Conjugates

The results described above indicated that NES peptides should be covalently attached to the 5′-end of the sense strand of siRNA by SPFC. Shinkai et al. prepared two types of siRNA-NES conjugates by their original SPFC method and evaluated the silencing efficiencies of the siRNA-NES conjugates against the BCR/ABL chimeric gene in the human chronic myelogenous leukemia cell line K562 (Shinkai et al. 2017).


Fig. 10 (a) 5′-Terminus modified with X and (b) phosphorylated 5′ nucleotides of the guide strand recognized by the MID and PIWI domains of human Argonaute2 [44]. (b) From Schirle NT and MacRae IJ (2012). The crystal structure of human Argonaute2. Science, 336(6084), 1037–1040. Reprinted with permission from AAAS

The TFIIIA NES peptide fragment (NH2CH2CH2CH2CO-LPVLENLTL-OH) derived from Xenopus laevis (Fridell et al. 1996) used in the present synthesis has a γ-aminobutyric acid (GABA) moiety with a free reactive amino group at its N-terminus, and the TFIIIA NES peptide fragment was linked to the 5′-end of the RNA via the GABA amino group to give the RNA (sense strand)-NES conjugate C22. The HIV-1 REV NES peptide fragment (Ac-LPPLERLTL-KG-OH) (Malim et al. 1991) has an acetyl cap on the N-terminal leucine and an extra unprotected lysine and glycine at its C-terminus. The HIV-1 REV NES peptide fragment was linked to the RNA via the ε-amino group of lysine, which has adequate nucleophilic reactivity, to give the RNA (sense strand)-NES conjugate C23.

C22: R-5′-GCAGAGUUCAAAAGCCCUUTT-3′
  R = -HNCH2CH2CH2CO-LPVLENLTL-OH (TFIIIA NES)
C23: R-5′-GCAGAGUUCAAAAGCCCUUTT-3′
  R = Ac-LPPLERLTLK*G-OH (HIV-1 REV NES)
*ε-Amino group of K was linked to the oligonucleotide via the linker.

Silencing efficiencies of siRNA5 and siRNA6, having C22 and C23 as the sense strand, respectively, were evaluated targeting 21 nt (361–381) in the junction region of the b3a2 transcript from the BCR/ABL chimeric gene in the human chronic myelogenous leukemia cell line K562.

siRNA5:
  sense: R-5′-GCAGAGUUCAAAAGCCCUUTT-3′
  antisense: 3′-TTCGUCUCAAGUUUUCGGGAA-5′
  R = -HNCH2CH2CH2CO-LPVLENLTL-OH (TFIIIA NES)


siRNA6:
  sense: R-5′-GCAGAGUUCAAAAGCCCUUTT-3′
  antisense: 3′-TTCGUCUCAAGUUUUCGGGAA-5′
  R = Ac-LPPLERLTLK*G-OH (HIV-1 REV NES)
*ε-Amino group of K was linked to the oligonucleotide via the linker.

As shown in Fig. 11, siRNA5 bearing C22 (TFIIIA NES) suppressed 91.7% of BCR/ABL gene expression at 200 nM and 88.4% at 50 nM, and siRNA6 bearing C23 (HIV-1 REV NES) suppressed 96.0% at 200 nM and 93.9% at 50 nM. The silencing efficiency of siRNA6 at lower concentrations was clearly concentration dependent, while that of siRNA1 reached around 60% at 10 nM and did not increase at higher concentrations, as shown in Fig. 12. The marked increase in silencing efficiency can evidently be attributed to the presence of the NES peptides. Previously we verified that an oligonucleotide-HIV-1 REV NES conjugate was taken up into the human T lymphocyte cell line Jurkat and localized in the cytoplasm, not in the nucleus (Kubo et al. 2005). It can be reasonably presumed that the remarkable enhancement of silencing efficiency by siRNA-NES conjugates was due to the cytoplasmic localization of the conjugates and efficient degradation of the target mRNA. It should be pointed out that both NES peptides, one linked to the siRNA sense strand through the amino group of the N-terminal GABA and the other through the ε-amino group of a lysine close to the C-terminus, gave similar effects on RNA interference. It is also speculated that modification of the 5′-end of the sense strand with an NES peptide increased the chance of selection of the unmodified antisense strand as the guide strand in the RNA-induced silencing complex (RISC).
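The source reports BCR/ABL expression measured by quantitative RT-PCR and normalized to GAPDH, but it does not spell out the calculation. One common way to obtain such values is the 2^−ΔΔCt method, sketched below as an assumption and not as the authors' stated procedure. The Ct values in the usage example are invented for illustration, and the function names are hypothetical.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Target expression in treated cells relative to control, normalized to a reference gene."""
    delta_ct_treated = ct_target_treated - ct_ref_treated
    delta_ct_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_ct_treated - delta_ct_control)


def percent_suppression(rel_expr):
    """Suppression of expression, in percent, from a relative expression value."""
    return 100.0 * (1.0 - rel_expr)


if __name__ == "__main__":
    # Hypothetical Ct values: target (BCR/ABL) and reference (GAPDH)
    rel = relative_expression(26.0, 18.0, 24.0, 18.0)
    print(f"relative expression = {rel:.2f}, suppression = {percent_suppression(rel):.1f} %")
```

The method assumes roughly equal amplification efficiencies for target and reference; when that does not hold, an efficiency-corrected calculation is normally used instead.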

Fig. 11 Silencing of BCR-ABL mRNA by siRNA5 in K562. Normalized BCR/ABL/GAPDH mRNA expression in K562 was measured 24 h after transfection. siRNAs were transfected by Lipofectamine® 2000


Fig. 12 Silencing of BCR-ABL mRNA by siRNA6 in K562. Normalized BCR/ABL/GAPDH mRNA expression in K562 was measured 24 h after transfection. siRNAs were transfected by Lipofectamine® 2000

The siRNA-NES conjugates were not taken up into cells on their own, whereas oligonucleotide-NES conjugates could penetrate into cells without the assistance of transfection reagents (Kubo et al. 2005). The siRNA-NES conjugates required the aid of a commercial transfection reagent to penetrate into cells, and Lipofectamine® 2000 was used for the transfection. The reason why siRNA-NES conjugates could not penetrate into cells, while single-stranded oligonucleotide-NES conjugates could do so without any transfection reagents, might be the double-stranded structure of siRNA, which has a rigid rodlike shape with a more condensed negative charge than a single-stranded oligonucleotide.

Nontoxic Transfection of siRNA-NES Conjugates by Designed Peptides

Nontoxic transfection of siRNAs into cells is clearly one of the most crucial technologies to be established for in vivo use and medical applications. Transfection of siRNA-NES conjugates by designed peptides that can bind to RNA and internalize it into cells was therefore explored. In previous studies, it was found that amphiphilic designed peptides can bind to double-stranded DNA and RNA (Yokoyama et al. 2001). Various types of designed peptides were investigated for the transfection of siRNA, and two designed peptides were found that efficiently internalize the siRNA-NES conjugates into cells. One was named peptideα6, with the sequence LRALLRALLRALLRALLRALLRAL, which was known to form an α-helical structure in the presence of double-stranded DNA or RNA (Fujii et al. 1999), and the other was named peptideβ7, with the sequence RLRLRLRLRLRLRL, which


was known to form an antiparallel β-sheet structure in the presence of double-stranded DNA or RNA (Kubo et al. 2001). Transfection of siRNA-NES conjugates with 10–20 equivalents of peptideα6 and peptideβ7 was carried out and found to be successful. Silencing of BCR/ABL in K562 by the non-covalent complex of peptideα6 or peptideβ7 with siRNA6 was evaluated, and the results were

Fig. 13 Silencing of BCR-ABL mRNA by siRNA6 transfected by peptideα6 in K562. Normalized BCR/ABL/GAPDH mRNA expression in K562 was measured 24 h after transfection. siRNAs were transfected by peptideα6

Fig. 14 Silencing of BCR-ABL mRNA by siRNA6 transfected by peptideβ7 in K562. Normalized BCR-ABL/GAPDH mRNA expression in K562 was measured 24 h after transfection. siRNAs were transfected by peptideβ7

Table 5 Resistance of siRNAs against nucleases in 10% FBS

siRNA                           Half-life time (t1/2, h)
siRNA1                          1
siRNA1 + Lipofectamine® 2000    2.5
siRNA1 + peptideβ7              7.0
siRNA6                          9.5
siRNA6 + Lipofectamine® 2000    32.0
siRNA6 + peptideβ7              >>48

summarized in Figs. 13 and 14, respectively. The peptideα6-siRNA6 complex suppressed BCR/ABL expression by up to 74.0% at 100 nM, while the peptideα6-siRNA1 complex suppressed BCR/ABL expression by only 19.8% at the same concentration. The peptideβ7-siRNA6 complex suppressed BCR/ABL expression by up to 95.2% at 100 nM, as much as the siRNA6-Lipofectamine® 2000 complex, while the peptideβ7-siRNA1 complex suppressed BCR/ABL expression by only 30.2% at the same concentration. We observed that the peptideα6- and peptideβ7-siRNA complexes were almost nontoxic to the Jurkat and K562 cell lines and that both complexes were so resistant to nuclease digestion in 10% FBS that the half-life of siRNA in the complex was extended beyond 48 h, while that of naked siRNA was shorter than 1 h (Table 5, unpublished results).
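To put the half-lives in Table 5 in perspective, the sketch below estimates the fraction of intact siRNA remaining after a given incubation time, assuming simple first-order (single-exponential) degradation. That kinetic model is an assumption made here for illustration, since the source reports only the half-lives. The entry reported as ">>48" is left out because it has no finite value, and the dictionary keys are plain-text labels chosen for this example.

```python
# Half-lives (h) as listed in Table 5; the ">>48" entry is omitted (no finite value)
HALF_LIVES_H = {
    "siRNA1": 1.0,
    "siRNA1 + Lipofectamine 2000": 2.5,
    "siRNA1 + peptide beta7": 7.0,
    "siRNA6": 9.5,
    "siRNA6 + Lipofectamine 2000": 32.0,
}


def fraction_remaining(t_hours, half_life_hours):
    """Fraction of intact siRNA after t_hours, assuming first-order decay."""
    return 0.5 ** (t_hours / half_life_hours)


if __name__ == "__main__":
    for name, t_half in HALF_LIVES_H.items():
        print(f"{name:30s} {100.0 * fraction_remaining(24.0, t_half):6.2f} % intact after 24 h")
```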

Conclusions

These studies have shown that oligonucleotide-peptide conjugates exhibit improved properties for therapeutic applications. Oligonucleotide-NLS conjugates were taken up into cells without any transfection reagents and localized in the cellular nucleus, and phosphorothioate oligonucleotide-NLS conjugates against hTERC inhibited 99% of telomerase activity in the human T lymphocyte cell line Jurkat. Moreover, oligonucleotide-NES conjugates were taken up into cells without any transfection reagents and localized in the cytoplasm, siRNA-NES conjugates strongly suppressed the BCR/ABL chimeric gene in the human chronic myelogenous leukemia cell line K562, and a combination of siRNA-NES conjugate and peptideβ7 enabled nontoxic cellular uptake and excellent silencing efficiency. These results strongly suggested that controlled intracellular localization of therapeutic oligonucleotides dramatically increases their silencing efficiencies and that oligonucleotide-signal peptide conjugates, and their combination with DDS nanotechnologies, are very promising candidates for the next generation of nucleic acid therapeutics.

References

Antopolsky M et al (1999) Peptide-oligonucleotide phosphorothioate conjugates with membrane translocation and nuclear localization properties. Bioconjug Chem 10:598–606
Antopolsky et al (2002) Towards a general method for the stepwise solid-phase synthesis of peptide–oligonucleotide conjugates. Tetrahedron Lett 43:527–530


Aubert G et al (2008) Telomeres and aging. Physiol Rev 88(2):557–579
Chen Z et al (2002) Telomerase inhibition, telomere shortening and decreased cell proliferation by cell permeable 2′-O-methoxyethyl oligonucleotides. J Med Chem 45:5423–5425
Chernikov IV et al (2019) Current development of siRNA bioconjugates: from research to the clinic. Front Pharmacol 10:444
Church GM et al (1977) Secondary structural complementarity between DNA and proteins. Proc Natl Acad Sci U S A 74(4):1458–1462
Crooke ST et al (2018) RNA-targeted therapeutics. Cell Metab 27(4):714–739
D’Souza Y et al (2013) Regulation of telomere length and homeostasis by telomerase enzyme processivity. J Cell Sci 126:676–687
Dar SA et al (2016) siRNAmod: A database of experimentally validated chemically modified siRNAs. Sci Rep 6:20031
De Lange T et al (2004) T-loops and the origin of telomeres. Nat Rev Mol Cell Biol 5:323–329
De Lange T et al (2005) Shelterin: the protein complex that shapes and safeguards human telomeres. Genes Dev 19:2100–2110
Diala I et al (2020) Telomerase inhibition, telomere attrition and proliferation arrest of cancer cells induced by phosphorothioate ASO-NLS conjugates targeting hTERC and siRNAs targeting hTERT. Nucleosides Nucleotides Nucleic Acids 39(1–3):407–425
Dingwell C et al (1986) Protein import into the cell nucleus. Annu Rev Cell Biol 2:367–390
Fridell RA et al (1996) Amphibian transcription factor IIIA proteins contain a sequence element functionally equivalent to the nuclear export signal of human immunodeficiency virus type 1 Rev. Proc Natl Acad Sci U S A 93:2936–2940
Fujii M et al (1999) Enhancement of stability of double and triple stranded DNA by cationic amphiphilic α-helix peptide. Nucleosides Nucleotides Nucleic Acids 6:1623–1624
GoldFarb DS et al (1986) Synthetic peptides as nuclear localization signals. Nature 322:641–642
Jafri MA et al (2016) Roles of telomeres and telomerase in cancer, and advances in telomerase-targeted therapies. Genome Med 8(1):69
Jaskelioff M et al (2011) Telomerase reactivation reverses tissue degeneration in aged telomerase-deficient mice. Nature 469(7328):102–106
Kaszubowska L et al (2008) Telomere shortening and ageing of the immune system. J Physiol Pharmacol 59(Suppl 9):169–186
Kim NW et al (1994) Specific association of human telomerase activity with immortal cells and cancer. Science 266:2011–2015
Kubo T et al (2001) Design, synthesis and characterization of DNA binding peptides. Pept Sci 2000:109–112
Kubo T et al (2003) Synthesis of DNA-peptide conjugates by solid-phase fragment condensation. Org Lett 5:2623–2626
Kubo T et al (2005) Controlled intracellular localization and enhanced antisense effect of oligonucleotides by chemical conjugation. Organ Biomol Chem 3:3257–3259
Kulkarni JA et al (2019) Lipid nanoparticle technology for clinical translation of siRNA therapeutics. Acc Chem Res 52(9):2435–2444
Kulkarni JA et al (2021) The current landscape of nucleic acid therapeutics. Nat Nanotechnol 16(6):630–643
Lönnberg H (2009) Solid-phase synthesis of oligonucleotide conjugates useful for delivery and targeting of potential nucleic acid therapeutics. Bioconjug Chem 20:1065–1094
Malim MH et al (1991) Mutational definition of the human immunodeficiency virus type 1 Rev activation domain. J Virol 65:4248–4254
Murao S et al (2009) Organic synthesis and antisense effects of oligonucleotide-peptide conjugates. Curr Org Chem 13(14):1366–1377
Naito Y et al (2009) siDirect 2.0: updated software for designing functional siRNA with reduced seed-dependent off-target effect. BMC Bioinformatics 10(392)
Nakamura M et al (2014) Mitochondrial defects trigger proliferation of neighboring cells via a senescence-associated secretory phenotype in Drosophila. Nat Commun 5(5264):1–11


Newmeyer DD et al (1986) Assembly in vitro of nuclei active in nuclear protein transport: ATP is required for nucleoplasmin accumulation. EMBO J 5:5001–5010
Nikam RR et al (2018) Journey of siRNA: clinical developments and targeted delivery. Nucleic Acid Ther 28(4):209–224
Pasternak G et al (1998) Chronic myelogenous leukemia: molecular and cellular aspects. J Cancer Res Clin Oncol 124(12):643–660
Patil NA et al (2021) Conjugation approaches for peptide-mediated delivery of oligonucleotides therapeutics. Aust J Chem. https://doi.org/10.1071/CH21131
Ross TS et al (2014) Re-evaluating the role of BCR/ABL in chronic myelogenous leukemia. Mol Cell Oncol 1(3):e963450
Schirle NT et al (2012) The crystal structure of human argonaute2. Science 336:1037–1040
Schrank Z et al (2018) Oligonucleotides targeting telomeres and telomerase in cancer. Molecules 23(9):pii: E2267
Selvam C et al (2017) Therapeutic potential of chemically modified siRNA. Recent trends. Chem Biol Drug Des 90(5):665–678
Shawi M et al (2008) Telomerase, senescence and ageing. Mech Ageing Dev 129(1–2):3–10
Shinkai Y et al (2017) Silencing of BCR/ABL chimeric gene in human chronic myelogenous leukemia cell line K562 by siRNA-nuclear export signal peptide conjugates. Nucl Acid Ther 27(3):168–175
Shiohama Y et al (2022) Elimination of off-target effect by chemical modification of 5′-end of siRNA. Nucl Acid Ther 32(5):438–447
Soukchareun et al (1995) Preparation and characterization of antisense oligonucleotide-peptide hybrids containing viral fusion peptides. Bioconjug Chem 6:43–53
Stewart JA et al (2012) Maintaining the end: roles of telomere proteins in end-protection, telomere replication and length regulation. Mutat Res 730:12–19
Tai W et al (2019) Current aspects of siRNA bioconjugate for in vitro and in vivo delivery. Molecules 24(12):2211
Taskova M et al (2017) Synthetic nucleic acid analogues in gene therapy: an update for peptide-oligonucleotide conjugates. Chembiochem 18(17):1671–1682
Turner KJ et al (2019) Telomere biology and human phenotype. Cells 8:73
Wanyi T et al (2017) Functional peptides for siRNA delivery. Adv Drug Deliv Rev 110–111:157–168
Yokoyama K et al (2001) Amphiphilic β-sheet peptides can bind to double and triple stranded DNA. Nucleosides Nucleotides Nucleic Acids 20:1317–1320
Yonezawa S et al (2020) Recent advances in siRNA delivery mediated by lipid-based nanoparticles. Adv Drug Deliv Rev 154:64–78

69. First- and Second-Generation Nucleoside Triphosphate Prodrugs: TriPPPro-Compounds for Antiviral Chemotherapy

Xiao Jia, Chenglong Zhao, and Chris Meier

Contents
Introduction
Earlier Nucleoside Triphosphate Prodrugs Bearing One Masking Group
Nucleoside Triphosphate Prodrugs Bearing Two Biodegradable Masking Groups: First Generation Compounds
Application of the TriPPPro-Concept to Various Nucleoside Analogues
γ-Nonsymmetrically Modified TriPPPro-Prodrugs Bearing One Biodegradable Group (Second Generation Triphosphate Delivery Systems)
Primer Extension Assays
Summary and Conclusion
References

Abstract

Currently, a number of biologically active nucleoside analogues are extensively used as antiviral, anticancer, antiparasitic, and antibacterial therapeutic agents. However, when considering viruses, their antiviral efficacy is strongly dependent on the intracellular conversion by virus-encoded or, in most cases, host cellular kinases to give the corresponding bioactive nucleoside analogue triphosphates. In this minireview, the recent work on the development of nucleoside triphosphate prodrugs, the so-called TriPPPro-approach, is described. First generation TriPPPro-compounds bearing two biodegradable masking units attached to the γ-phosphate group were prepared using the phosphoramidite and H-phosphonate routes, respectively. These TriPPPro-compounds enter cells and deliver the nucleoside triphosphate analogues, and therefore they bypass all steps of the intracellular phosphorylation, in contrast to their parent nucleoside analogues. Second generation TriPPPro-compounds comprising a non-cleavable γ-alkyl moiety in addition to a biodegradable prodrug moiety at the γ-phosphate or γ-phosphonate units, respectively, and d4T as a nucleoside analogue will be summarized as well. Such compounds formed γ-alkylated nucleoside triphosphate analogues by chemical hydrolysis or in cell extracts with high selectivity. These γ-alkylated nucleoside triphosphate derivatives proved to be highly resistant toward dephosphorylation and showed a superior selectivity to act as substrates for the viral HIV-RT as compared to three cellular DNA polymerases. The synthesis, the chemical and biological hydrolysis, and the antiviral activity of these compounds will be discussed.

Keywords

Nucleoside analogue · Nucleoside triphosphates · TriPPPro concept · Pronucleotides · Antivirals · Biodegradable · Triphosphate prodrugs

X. Jia · C. Meier (*)
Organic Chemistry, Department of Chemistry, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, Hamburg, Germany
e-mail: [email protected]; [email protected]

C. Zhao
Organic Chemistry, Department of Chemistry, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, Hamburg, Germany
R&D Center for Nucleic Acid Drug, CSPC Pharmaceutical Group Limited, Shanghai, China

© Springer Nature Singapore Pte Ltd. 2023
N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_72

Introduction

In recent decades, the emergence of serious diseases such as infections caused by HIV, hepatitis B and C viruses, herpesviruses (HSV, VZV, and CMV), Ebola virus, and the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has resulted in significant morbidity and mortality worldwide (Juliano et al. 2018; Kaner and Schaack 2016; Xia et al. 2018). Among them, the current coronavirus disease 2019 (COVID-19) (Wu et al. 2020; Zhu et al. 2020) is a global pandemic that has caused more than 5.4 million deaths since early December 2019, despite the availability of vaccines on the market (WHO 2021). In contrast to ordinary coronaviruses (CoVs) (e.g., HCoV-229E, HCoV-OC43, HCoV-NL63, and HCoV-HKU1), which lead to the mild seasonal symptoms of the common cold (Gaunt et al. 2010; Woldemeskel et al. 2020), SARS-CoV-2 and two other human coronavirus (HCoV) species (SARS-CoV-1 and MERS-CoV) are responsible for the onset of life-threatening respiratory syndromes (Cevik et al. 2020; Andersen et al. 2020; Rabaan et al. 2020). However, the World Health Organization has not approved any effective antiviral treatment for COVID-19. Nevertheless, the use of antiviral drugs, including the family of nucleoside analogues (Fig. 1), has attracted significant attention in the development of promising therapeutic measures against this life-threatening disease (Jordan et al. 2018; Yates and Seley-Radtke 2018; Seley-Radtke and Yates 2018).

Fig. 1 A compilation of antiviral nucleoside drugs and further antiviral nucleoside analogues

For more than 50 years, nucleoside analogues have been considered the cornerstones for the treatment of patients with different virus infections (El Safadi et al. 2007; Jordheim et al. 2013; Cihlar and Ray 2010; Burton and Everson 2009; Deval 2009; Pastuch-Gawolek et al. 2019). Nucleoside analogues that are active against viruses need to be phosphorylated sequentially to form the 5′-monophosphate (NMP), the 5′-diphosphate (NDP), and ultimately the 5′-triphosphate (NTP) after they have entered cells (El Safadi et al. 2007; Balzarini et al. 1989, 1988; Bazzoli et al. 2010; Deville-Bonne et al. 2010). Phosphorylation of nucleoside analogues is generally catalyzed by cellular kinases (e.g., for thymine-comprising nucleoside analogues: thymidine kinase (TK), thymidylate kinase (TMPK), and nucleoside diphosphate kinase (NDPK)), as illustrated in Fig. 2. Sometimes viral kinases are also involved, e.g., in the monophosphorylation of acyclovir (ACV). However, the stepwise transformation is often inefficient because of the strong specificity of kinases for their substrates. For example, the first phosphorylation step, catalyzed by the salvage pathway enzyme TK, is rate-limiting for the anti-HIV drug 3′-deoxy-2′,3′-didehydrothymidine (d4T) (Balzarini et al. 1988; Ho and Hitchcock 1989; Zhu et al. 1990), whereas for 3′-deoxy-3′-azidothymidine (AZT) (Balzarini et al. 1989; Furman et al. 1986), the metabolism-limiting step is the second phosphorylation from AZT-5′-monophosphate (AZTMP) to the corresponding AZT-5′-diphosphate metabolite (AZTDP), which is mediated by the host cell enzyme TMPK (Fig. 2). For FTC, in contrast to d4T and AZT, the formation of FTC-triphosphate (FTCTP) by NDPK is the bottleneck (Paff et al. 1994; Zhu et al. 1998). Thus, the metabolism is clearly nucleoside-dependent. Moreover, poor oral bioavailability, short biological half-lives, mutations of nucleoside transporters, and the development of drug resistance are further limitations that may also hamper the clinical efficacy of nucleoside analogues (Van Rompay et al. 2000; Asahchop et al. 2012; Boswell-Casteel and Hays 2017). Additionally, nucleotides exhibit poor cellular permeability because of the presence of negative charges at physiological pH. Therefore, these phosphorylated metabolites cannot be used as drug candidates (Roll et al. 1956).

Fig. 2 Host-cell-mediated sequential enzymatic phosphorylation steps and several mechanisms of nucleoside analogues to form the corresponding 5′-triphosphate. AZT, zidovudine; d4T, stavudine; BVdU, brivudine; FTC, emtricitabine; 3TC, lamivudine; ddI, didanosine; PMPA, tenofovir (PMPA diphosphate is a triphosphate analogue); ABC, abacavir

To overcome these limitations and to provide the active nucleoside 5′-triphosphates after in vivo administration, a number of successful prodrug strategies have been explored over the past decades (Fig. 2) (Pradere et al. 2014). Various masking groups have been introduced to shield or neutralize the negative charges and to increase the lipophilicity of these nucleotides, enabling the prodrugs to enter cells and efficiently release the charged nucleotides (NMP, NDP, or NTP) there. This is known as the principle of pro-nucleotides. Among these approaches, the cycloSal- (Meier et al. 1998; Vukadinovic et al. 2005; Meier and Balzarini 2006), DTE- (Puech et al. 1993), SATE- (Peyrottes et al. 2004), HepDirect (Erion et al. 2004), bis(AB)-nucleotide (Thomson et al. 1993), and ProTide (Mehellou et al. 2018) technologies have been introduced to bypass the first phosphorylation step in nucleoside analogue activation (Fig. 3). The phosphoramidate pro-nucleotides (ProTides) have revolutionized the field of antiviral nucleoside therapy, overcoming the major hurdles and improving clinical success, for example, in the case of the anti-hepatitis C drug sofosbuvir (Fig. 1) (Sulkowski et al. 2014). Moreover, two approaches developed by Hostetler (Hostetler et al. 1990, 1992, 1993; van Wijk et al. 1991) and Huynh-Dinh (Bonnaffé et al. 1995a, b, 1996) have been reported with the intention to act as nucleoside diphosphate prodrugs.

Fig. 3 Various reported nucleoside monophosphate prodrug technologies

Meier et al. developed a nucleoside monophosphate approach (cycloSal-technology) (Meier et al. 1998; Vukadinovic et al. 2005; Meier and Balzarini 2006) and, more recently, a nucleoside diphosphate prodrug system (bis(4-acyloxybenzyl)-nucleoside diphosphates, DiPPro-NDPs) (Pertenbreiter et al. 2015; Schulz et al. 2014; Weinschenk et al. 2015; Meier et al. 2015) (Fig. 4). The cycloSal- and DiPPro-prodrugs efficiently penetrate the cell membrane and deliver NMPs and NDPs intracellularly, respectively. However, as with other pronucleotide approaches, the cycloSal-technology was less efficient in the case of the first approved anti-HIV nucleoside drug AZT (cycloSal-AZTDPs; Fig. 4) (Jessen et al. 2008). Moreover, in both approaches the released nucleotides still need further phosphorylation by cellular kinases to their ultimately active triphosphate forms in order to interact with or block viral polymerases. In this minireview, the development of the TriPPPro-prodrug approach for intracellular delivery of nucleoside triphosphates will be described.

Earlier Nucleoside Triphosphate Prodrugs Bearing One Masking Group

Until recently, very few approaches had been published for the development of nucleoside triphosphate prodrugs (Bonnaffé et al. 1995a, 1996; Van Wijk et al. 1994; Kreimeyer et al. 1996, 1998), most probably because of their low chemical stability, poor deliverability, and high enzymatic sensitivity for dephosphorylation (Tan et al. 1999). In 1994, van den Bosch et al. reported on the synthesis, characterization, and biological activity of AZTTP distearoylglycerol (DSG-AZTTP 1) (Fig. 5) (Van Wijk et al. 1994). Compound 1 was prepared from distearoylphosphatidic acid morpholidate and AZTDP. Unfortunately, incubation of this compound in a rat liver mitochondrial enzyme preparation showed that only AZT and AZTMP were released from DSG-AZTTP 1.

Fig. 4 The development of our nucleotide prodrug approaches

Fig. 5 DSG-AZTTP 1 and acyl-NTP prodrugs as examples

Later, Bonnaffé et al. developed γ-acyl nucleoside triphosphate prodrugs (γ-acyl-d4TTPs 2 and γ-acyl-AZTTPs 3) (Bonnaffé et al. 1995a, 1996; Kreimeyer et al. 1998). Initial studies on the metabolism of γ-acyl-NTPs 2 and 3 revealed that the corresponding NTPs were formed from these prodrugs, and thus it seemed that these compounds were good candidates to be further developed as lipophilic NTP prodrugs. However, prodrugs 2 and 3 showed antiviral activity similar to that of the parent nucleotides, although the prodrugs were more lipophilic (Bonnaffé et al. 1996). It was speculated that a rapid aminolysis of acyl nucleotide 2 (t1/2 = 1.4 h) in the RPMI culture medium limits its transmembrane diffusion, which could explain the small differences in antiretroviral activity between the nucleotide and the acylated prodrugs 2 and 3. Later, these results guided the authors to develop cholesteryloxycarbonyl-ATP (Chol-ATP, 5), and they investigated the transmembrane transport of such liponucleotide conjugates via 31P-NMR spectroscopy (Kreimeyer et al. 1998). It was demonstrated that ATP bearing a γ-cholesteryl moiety can indeed cross biological barriers.

Nucleoside Triphosphate Prodrugs Bearing Two Biodegradable Masking Groups: First Generation Compounds

Taking previous reports into account, nucleoside triphosphate prodrugs can, in principle, bypass all steps of the intracellularly needed phosphorylation by delivering the active nucleoside triphosphates inside cells (Kreimeyer et al. 1998). The development of nucleoside triphosphate prodrugs has received tremendous attention because, in principle, they would maximize the intracellular concentration of the ultimately bioactive NTPs. However, one reason for the significant difficulties in the development of nucleoside triphosphate prodrugs is the chemical instability of the energy-rich bonds between the phosphate moieties: a triphosphate contains two reactive phosphate anhydride bonds and, under physiological conditions, carries up to four negative charges. Additionally, the triphosphate moiety can be cleaved quickly by enzymes, leading to a rapid dephosphorylation in cell media. Therefore, until recently, the common opinion in the nucleoside community was that the development of lipophilic nucleoside triphosphate prodrugs would be practically impossible: “Direct delivery of triphosphate or diphosphate forms of nucleoside analogues would be desirable but is impractical because of their instability during synthesis.” (Tan et al. 1999)

In 2015, Meier et al. published the first report on a successful delivery system for nucleoside triphosphates by a prodrug approach, which was named the TriPPPro-concept (Gollnest et al. 2015). The first generation TriPPPro-prodrugs 9 and 10, bearing two identical lipophilic masking groups (alkyl and alkoxy chains) at the γ-phosphate to obtain membrane permeability and d4T as a nucleoside analogue, were prepared using the phosphoramidite chemistry route (Scheme 1) (Gollnest et al. 2015). This method was based on a coupling reaction of bis(4-acyloxybenzyl)phosphoramidites 8 and d4TDP.
In the first step, d4TDP was prepared in 55% yield using our previously reported cycloSal-approach (Warnecke and Meier 2009). d4T was converted into the corresponding 5-chloro-substituted cycloSal-phosphate triester as a mixture of two diastereomers in high yields using 5-chlorosaligenylchlorophosphite and the oxidation reagent tert-butyl hydroperoxide. Subsequent phosphorylation with tetra-n-butylammonium phosphate yielded d4TDP. The bis(4-acyloxybenzyl)phosphoramidites 8 were prepared by reacting benzyl alcohols 7 with (N,N-diisopropyl)dichlorophosphine 6 (P(III) reagent) in the presence of triethylamine at low temperatures. Finally, bis(4-acyloxybenzyl)phosphoramidites 8 were coupled with d4TDP (n-Bu4N+ form) to form a P(III)–P(V) intermediate in a very fast dicyanoimidazole (DCI)-mediated coupling reaction in CH3CN, followed by oxidation with tert-butyl hydroperoxide to give the crude products 9 (n-Bu4N+ form). TriPPPro-prodrugs 9 (NH4+ form) were obtained as white solids after reverse-phase (rp) column chromatography and a Dowex 50WX8 (NH4+) ion exchange column, followed by a second rp18-column chromatography (H2O/CH3CN) and subsequent freeze-drying. It should be noted that the solubility of TriPPPro-prodrugs 9 in CH3CN decreased with long acyl residues (R > C10H21). As a consequence, THF was added to dissolve the reagents completely, and TriPPPro-prodrugs 9 were then quickly purified by rp-column chromatography with H2O/THF as the eluent. The consumption of d4TDP could be detected by rp-HPLC (Gollnest et al. 2015). Thus, TriPPPro-prodrugs 9 and 10 were prepared in modest to good chemical yields (29–70%).

Scheme 1 Synthesis of TriPPPro-prodrugs 9, 10, 14, 15, 18, and 19 using the phosphoramidite and H-phosphonate routes. Reagents and conditions: (i) triethylamine, THF, 0 °C to rt, 20 h; (ii) (1) DCI, CH3CN, rt, 1 min and (2) t-BuOOH in n-decane, 0 °C to rt, 15 min; (iii) pyridine, DPP, 38 °C, 2–12 h; (iv) (1) NCS, CH3CN, rt, 2 h, (2) (H2PO4)Bu4N, CH3CN, rt, 1 h; (v) (1) TFAA, Et3N, CH3CN, 0 °C, 10 min, (2) 1-methylimidazole, Et3N, CH3CN, 0 °C to rt, 10 min, (3) d4TMP, rt, 1–3 h

In 2020, Meier et al. reported the first synthesis of nonsymmetric TriPPPro-prodrugs 14 and 15 bearing two different biodegradable masking groups attached to the γ-phosphate unit (Jia et al. 2020a). Subsequently, a series of TriPPPro-prodrugs 18 and 19 bearing two different AB moieties was also reported by our group (Zhao et al. 2020a). TriPPPro-compounds 14, 15, 18, and 19 were all prepared using the previously described H-phosphonate route in modest to high yields (23–78%) (Jia et al. 2020a; Zhao et al. 2020a). Nonsymmetric H-phosphonates 12 (AB;ACB) were easily prepared from 4-acyloxybenzyl alcohols 7, 4-alkoxycarbonyloxybenzyl alcohols 11, and diphenyl hydrogen phosphonate (DPP). Similarly, nonsymmetric H-phosphonates 12 (ACB;ACB) (Jia et al. 2020a) and 16 (AB;AB or AB;AB-PEG) (Zhao et al. 2020a) were synthesized using the same method. Among them, PEG-bearing benzyl alcohols were synthesized starting from 2-(2-(2-methoxyethoxy)ethoxy)ethan-1-ol (MEEE) (Zhao et al. 2020a). Next, H-phosphonates 12 and 16 were reacted with N-chlorosuccinimide (NCS) to give the phosphorochloridates, which were then treated with tetra-n-butylammonium phosphate (3.0 eq.) to generate the corresponding pyrophosphates such as 13. Pyrophosphates 13 were purified by extraction (CH2Cl2/H2O), activated with trifluoroacetic acid anhydride (TFAA) and N-methylimidazole (Mohamady and Jakeman 2005; Mohamady and Taylor 2011), and then coupled with d4TMP to form the corresponding nonsymmetric TriPPPro-d4TTPs 14 (n-Bu4N+ form). Subsequently, TriPPPro-d4TTPs 14 (NH4+ form) were obtained using the abovementioned purification method in low to high yields (between 7% and 71%). As compared to the phosphoramidite route, the H-phosphonate procedure has the advantage that d4TMP (Sowa and Ouchi 1975) is easier to prepare than d4TDP (Warnecke and Meier 2009) and that no oxidation is needed as the final reaction step. Thus, a more efficient conversion of the parent nucleoside analogue to the TriPPPro-compounds was obtained, and, more importantly, nucleoside analogues and/or masking groups that are sensitive to oxidation could also be used.
Moreover, H-phosphonates 12 and 16 were found to be more stable than phosphoramidites 8. The mono-masked nucleoside triphosphate derivatives were synthesized as well and were used as reference compounds to study the hydrolysis properties of TriPPPro-prodrugs 9 and their delivery mechanism. Several synthesis methods based on DCC-activated coupling have been reported by Bonnaffé and van den Bosch for potential triphosphate prodrugs (Van Wijk et al. 1994; Kreimeyer et al. 1996, 1998). In a first attempt to prepare such mono-masked acyloxybenzyl-nucleoside triphosphate derivatives 22, a TriPPPro-compound was hydrolyzed and intermediate 22, the product of the first demasking reaction, was isolated. However, the yields were very poor (Schulz et al. 2014). Next, Meier and co-workers disclosed a new method using cycloSal chemistry (Gollnest et al. 2015). Benzyl-(5-nitro-cycloSal)-phosphate triesters 21 were prepared from 5-nitrosaligenylchlorophosphite 20 and 4-acyloxybenzyl alcohols 7, followed by addition of d4TDP to yield the mono-masked triphosphates 22 in yields of 26–30%. To synthesize γ-alkoxycarbonyloxybenzyl-d4TTPs 26 (γ-ACB-d4TTPs), the β-cyanoethyl group was used for protection of the γ-phosphate group (Scheme 2). It was expected that the β-cyanoethyl moiety would be cleaved by β-elimination, thus leading to γ-(ACB)-d4TTPs 26. First, nonsymmetric H-phosphonates 23 (ACB;β-cyanoethyl) were prepared by the same chemistry as used for the preparation of H-phosphonates 12 and 16. Then, γ-(ACB;β-cyanoethyl)-d4TTPs 25 (n-Bu4N+ form) were synthesized using the H-phosphonate procedure. Surprisingly, γ-(ACB;β-cyanoethyl)-d4TTPs 25 (n-Bu4N+ form) were already partially deprotected during the ion-exchange step, giving a mixture of γ-(ACB)-d4TTPs 26 (NH4+ form) (10–23% yield) and γ-(ACB;β-cyanoethyl)-d4TTPs 25 (NH4+ form) (52–63% yield).

Scheme 2 Synthesis of mono-alkylated triphosphates

Chemical hydrolysis studies. To investigate the chemical or biological stability and the product distribution of first generation TriPPPro-prodrugs 9, 10, 14, 15, 18, and 19, the compounds were incubated in phosphate buffered saline (PBS, 25 mM, pH 7.3), pig liver esterase (PLE) in PBS, and human CD4+ T-lymphocyte (CEM) cell extracts. All products were analyzed by means of analytical rp18-HPLC. The half-lives of TriPPPro-prodrugs are summarized in Table 1; t1/2 reflects the removal of the first bioreversible group (AB or ACB) to yield intermediates 22 or 26, respectively. The delivery mechanism of these TriPPPro-compounds is summarized in Fig. 6 (lower section).

Table 1 Half-lives of TriPPPro-compounds 9, 10, 14, 15, 18, and 26 in different media and their biological evaluation

Comp. | R1 (AB or ACB) | R2 (AB or ACB) | PBS pH = 7.3 t1/2 [h] | CEM/0 extracts t1/2 [h] | PLE t1/2 [h] | HIV-2 (ROD) EC50ᵃ [μM] | HIV-1 (HE) EC50ᵃ [μM] | CEM/TK− HIV-2 (ROD) EC50ᵃ [μM] | Cell toxicity EC50ᵇ [μM]
9a⁷⁸ | C2H5 | C2H5 | 17 | 0.12 | 0.42 | 0.72 ± 0.16 | 0.43 ± 0.25 | >10 | 57 ± 6
9b⁷⁸ | C4H9 | C4H9 | 22 | 0.43 | 0.063 | 1.05 ± 0.30 | 0.40 ± 0.00 | >10 | 58 ± 3
9c⁷⁸ | C8H17 | C8H17 | 52 | 0.98 | 0.013 | 0.62 ± 0.30 | 0.31 ± 0.01 | 2.26 ± 1.02 | 52 ± 1
9d⁷⁸ | C9H19 | C9H19 | 44 | 2.8 | 0.082 | 0.33 ± 0.03 | 0.25 ± 0.07 | 0.50 ± 0.14 | 34 ± 5
10a⁷⁸ | OC8H17 | OC8H17 | 82 | 2.6 | 0.12 | 0.47 ± 0.10 | 0.36 ± 0.06 | 1.26 ± 0.00 | 51 ± 5
14a⁸⁰ | C2H5 | OC16H33 | 83 | 1.9 | 4.7 | 0.0048 ± 0.0065 | 0.027 ± 0.0092 | 0.11 ± 0.0071 | 34 ± 9.3
14b⁸⁰ | C4H9 | OC12H25 | 87 | 1.2 | 0.17 | 0.017 ± 0.015 | 0.040 ± 0.029 | 0.073 ± 0.036 | 27 ± 4.9
14c⁸⁰ | C4H9 | OC16H33 | 74 | 3.7 | 1.6 | 0.014 ± 0.015 | 0.032 ± 0.017 | 0.12 ± 0.048 | 21 ± 17
15a⁸⁰ | OC4H9 | OC12H25 | 107 | n.d.ᶜ | 0.28 | 0.17 ± 0.014 | 0.73 ± 0.53 | 1.12 ± 0.21 | 54 ± 13
18a⁸¹ | C2H5 | C14H29 | 59 | 0.8 | n.d.ᶜ | 0.39 ± 0.22 | 0.24 ± 0.17 | 1.1 ± 0.82 | 20 ± 0
18b⁸¹ | C4H9 | C17H35 | 50 | 3.3 | n.d.ᶜ | 0.10 ± 0.03 | 0.12 ± 0.05 | 0.54 ± 0.41 | 33 ± 7
26a⁸⁰ | / | OC12H25 | 625 | n.d.ᶜ | n.d.ᶜ | 0.25 ± 0.06 | 0.33 ± 0.13 | 1.98 ± 1.67 | >100
26b⁸⁰ | / | OC16H33 | >1600 | n.d.ᶜ | n.d.ᶜ | 0.29 ± 0.06 | 0.50 ± 0.29 | 1.46 ± 1.34 | 61 ± 36
d4T⁷⁸ | / | | | | | 0.89 ± 0.00 | 0.33 ± 0.11 | 150 ± 9 | 79 ± 3

n.d., not determined. The hydrolysis experiments of first generation TriPPPro-compounds 9, 10, 14, 15, 18, and 26 were conducted in aqueous 25 mM phosphate buffer (PBS, pH = 7.3), PLE, and CEM/0 cell extracts. The hydrolysis products were detected by analytical rp18-HPLC.
ᵃ Antiviral activity determined in CD4+ T-lymphocytes: 50% effective concentration; values are the mean ± SD of n = 2–3 independent experiments.
ᵇ Cytotoxicity: 50% cytostatic concentration or compound concentration required to inhibit CD4+ T-cell (CEM) proliferation by 50%; values are the mean ± SD of n = 2–3 independent experiments.
ᶜ n.m., not measurable

In many cases, the half-lives in PBS determined for TriPPPro-prodrugs 9 (Gollnest et al. 2015) as well as TriPPPro-prodrugs 14 (Jia et al. 2020a) increased with increasing lipophilicity of the masking group. However, the half-lives of the most lipophilic TriPPPro-compounds 9 (R: C15–C17) (Gollnest et al. 2015) surprisingly decreased, probably due to altered solubility behavior or micelle formation. Chemical stabilities of some TriPPPro-prodrugs 14 (AB-C4; ACB: C14–C18) were in the same range (t1/2 = 69–74 h) (Jia et al. 2020a). The half-lives of TriPPPro-compounds 18 and 19 bearing two different AB-masking groups were found to be between 45 h and 64 h without showing a clear trend with respect to the length of the alkyl chains in the AB groups (Zhao et al. 2020a). Interestingly, the cleavage of the symmetric TriPPPro-prodrugs 9 proceeded faster than that of the nonsymmetric γ-(AB;ACB)-d4TTPs 14 and γ-(AB;AB)-d4TTPs 18 (R1 ≠ R2), respectively. For example, the half-life of γ-(AB-C2;AB-C2)-d4TTP 9a (t1/2 = 17 h) was found to be significantly lower, by almost five- or two-fold, than those of γ-(AB-C2;ACB-C16)-d4TTP 14a (t1/2 = 83 h) and γ-(AB-C2;AB-C14)-d4TTP 18a (t1/2 = 59 h) (Table 1), respectively.

It was also found that TriPPPro-prodrugs 10a (ACB-C8;ACB-C8) and 15a (ACB-C4;ACB-C12), comprising two alkoxycarbonyloxybenzyl (ACB) groups (identical or different), were more stable than TriPPPro-compounds 9c (AB-C8;AB-C8) and 14b (AB-C4;ACB-C12), respectively (Table 1), due to the altered chemical stability of the two ACB residues. As compared to the double-masked prodrugs, the half-lives of intermediates 22 and 26 were found to be significantly higher, likely because the additional negative charge of the intermediate caused additional electronic repulsion for the approaching nucleophile. Again, the half-lives of the ester intermediates (γ-(AB)-d4TTPs) 22 (Gollnest et al. 2015) were lower than those of the corresponding carbonate intermediates (γ-(ACB)-d4TTPs) 26 (Jia et al. 2020a). For example, as compared to γ-(ACB-C16)-d4TTP 26b (t1/2 > 1600 h) (Jia et al. 2020a), the chemical stability of γ-(AB-C17)-d4TTP 22 (t1/2 = 583 h) (Gollnest et al. 2015) was found to be significantly lower, by almost a factor of 3.

As expected, the initial step of the delivery process follows a similar mechanism for all these first generation TriPPPro-compounds. In that regard, the three possible hydrolysis pathways of TriPPPro-nucleotide prodrugs are summarized in Fig. 6 (lower section) (Gollnest et al. 2015; Jia et al. 2020a; Zhao et al. 2020a).
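The two-step release described above (prodrug to mono-masked intermediate to d4TTP) can be illustrated with a minimal kinetic sketch. It assumes simple consecutive first-order kinetics, exclusive cleavage of the AB group in the first step, and no side pathways (B or C); none of these assumptions is stated as a model in the source. The function names are hypothetical, and the half-lives are the PBS values listed in Table 1 for 14b (87 h) and for γ-(ACB-C12)-d4TTP 26a (625 h).

```python
import math

def rate_constant(half_life_h: float) -> float:
    """First-order rate constant (1/h) from a half-life in hours."""
    return math.log(2) / half_life_h

def consecutive_fractions(t_h: float, k1: float, k2: float):
    """Mole fractions of prodrug, mono-masked intermediate, and NTP at time t_h
    for the simplified chain prodrug -> intermediate -> NTP (requires k1 != k2)."""
    prodrug = math.exp(-k1 * t_h)
    intermediate = k1 / (k1 - k2) * (math.exp(-k2 * t_h) - math.exp(-k1 * t_h))
    ntp = 1.0 - prodrug - intermediate
    return prodrug, intermediate, ntp

# Assumed inputs: PBS half-lives from Table 1 (14b and intermediate 26a)
k1 = rate_constant(87.0)   # removal of the first masking group
k2 = rate_constant(625.0)  # removal of the second masking group

for t in (0, 24, 87, 240, 625):
    a, b, c = consecutive_fractions(t, k1, k2)
    print(f"t = {t:4d} h  prodrug {a:.2f}  intermediate {b:.2f}  d4TTP {c:.2f}")
```

Because the second step is much slower than the first in this sketch, the intermediate accumulates before d4TTP appears in quantity, which matches the qualitative picture given in the text for chemical hydrolysis in PBS.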
The hydrolysis studies showed that (i) as the starting material disappeared, the expected intermediates (γ-(AB)-d4TTPs 22 and γ-(ACB)-d4TTPs 26) were formed, indicating that the hydrolysis of TriPPPro-d4TTPs 14 mainly followed pathways A1 and A2, and (ii) both intermediates were subsequently hydrolyzed to release d4TTP. Therefore, an increase in d4TTP concentration was detected before the complete consumption of the starting material. At the same time, a small, increasing amount of d4TDP was observed at the beginning, while almost no further increase of d4TDP was observed after the complete consumption of the TriPPPro-d4TTPs (Fig. 6, A and B, upper section). Therefore, it was concluded that d4TDP was released from the starting TriPPPro-d4TTPs 14 either by a nucleophilic attack at the γ-phosphate unit following pathway B or by a nucleophilic attack at the benzyl carbon atom. Furthermore, almost no formation of d4TMP (pathway C) was detected during the chemical hydrolysis experiments (Gollnest et al. 2015; Jia et al. 2020a; Zhao et al. 2020a).

Remarkably, in the case of the hydrolysis of γ-(AB-C2;ACB-C16)-d4TTP 14a in PBS, the formation of intermediate γ-(ACB-C16)-d4TTP 26b proceeded faster than the cleavage to give γ-(AB-C2)-d4TTP 22 (Fig. 6, A, upper section) (Jia et al. 2020a), indicating a highly selective cleavage of one of the two biodegradable moieties (AB) of TriPPPro-compounds 14. In contrast, in the hydrolysis of γ-(ACB-C4;ACB-C12)-d4TTP 15a (Jia et al. 2020a), both intermediates γ-(ACB-C4)-d4TTP and γ-(ACB-C12)-d4TTP were detected in almost identical amounts (Fig. 6, B, upper section), which was in agreement with the results obtained from the chemical hydrolysis studies of nonsymmetric TriPPPro-d4TTPs 18 (AB;AB) in PBS (Zhao et al. 2020a). Thus, in contrast to the nonsymmetric TriPPPro-d4TTPs 18 (AB;AB), the TriPPPro-compounds 14 (AB;ACB) were converted highly selectively into γ-(ACB)-d4TTPs 26.

Fig. 6 Chemical hydrolysis of TriPPPro-d4TTPs 14a and 15a in PBS (pH 7.3) and the hydrolysis pathways of TriPPPro-d4TTPs (Gollnest et al. 2015; Jia et al. 2020a; Zhao et al. 2020a)

Hydrolysis in biological media. The cleavage of the symmetric TriPPPro-d4TTPs 9 and 10 and the nonsymmetric TriPPPro-d4TTPs 14, 15, 18, and 19 was also studied with PLE (PBS, pH = 7.3) as a model esterase and in CEM cell extracts, in order to determine their stability and to identify the hydrolysis products and the chemoselectivity. As expected, the half-lives of TriPPPro-d4TTPs with PLE (t1/2 = 0.013–44 h) and in cell extracts (t1/2 = 0.05–13 h) were dramatically lower (up to 4000-fold) than the corresponding half-lives of the TriPPPro-compounds in PBS, indicating a significant contribution of enzymatic cleavage.
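As a rough check of the size of the enzymatic contribution, the ratio of the PBS half-life to the PLE half-life can be computed per compound. The sketch below uses only the compounds in Table 1 for which both values are listed; the assignment of values to columns follows our reading of the flattened table, and the variable and function names are hypothetical.

```python
# Half-lives in hours, transcribed from Table 1: compound -> (PBS, PLE).
# Only compounds with both values reported are included.
half_lives_h = {
    "9a": (17, 0.42), "9b": (22, 0.063), "9c": (52, 0.013), "9d": (44, 0.082),
    "10a": (82, 0.12), "14a": (83, 4.7), "14b": (87, 0.17), "14c": (74, 1.6),
    "15a": (107, 0.28),
}

def acceleration(pbs_h: float, ple_h: float) -> float:
    """Fold decrease in half-life on going from buffer (PBS) to esterase (PLE)."""
    return pbs_h / ple_h

ratios = {name: acceleration(*values) for name, values in half_lives_h.items()}

# Print the compounds from the largest to the smallest esterase effect
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>4}: {ratio:7.0f}-fold")
```

Within this subset the largest ratio is obtained for 9c (52 h versus 0.013 h), which is consistent with the "up to 4000-fold" figure quoted in the text.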


The studies have shown that both intermediates 22 and 26 were readily formed due to the cleavage by PLE; thus, an increase of d4TTP concentration and just a very small amount of d4TMP and d4TDP were observed due to the cleavage of the AB or the ACB moiety in TriPPPro-compounds (Fig. 7). Interestingly, intermediate γ-(ACB-C16)-d4TTP 26b bearing a long lipophilic carbonate moiety was cleaved much more slowly than intermediate γ-(AB-C2)-d4TTP 22 bearing a short alkyl moiety. In contrast to γ-(AB-C2;ACB-C16)-d4TTP 14a (Jia et al. 2020a) and γ-(ACB-C4;ACB-C12)-d4TTP 15a (Jia et al. 2020a), γ-(AB-C4;AB-C15)-d4TTP 18 was cleaved to form intermediates γ-(AB-C2)-d4TTP and γ-(AB-C15)-d4TTP in almost identical amounts (Zhao et al. 2020a).

Fig. 7 HPLC profile of γ-(AB-C2;ACB-C16)-d4TTP 14a after incubation with PLE (PBS) (Jia et al. 2020a). Chromatograms are plotted against retention time t (min) for incubation times from 2 min to 3180 min; labeled peaks: 4-hydroxybenzyl alcohol, d4TTP, d4TDP, and d4TMP

Similarly, TriPPPro-compounds were also readily cleaved to give intermediates 22 and 26 in CEM/0 cell extracts and then hydrolyzed further to form d4TTP. In contrast to the hydrolysis studies in PBS and with PLE, a large amount of d4TDP and a very low concentration of d4TTP were observed due to the fast dephosphorylation of d4TTP (t1/2 = 38 min) (Gollnest et al. 2015) by phosphorylases/kinases present in the cell extracts as well as the much higher biological stability of d4TDP (t1/2 = 59 h) (Gollnest et al. 2015). As was observed in the chemical hydrolysis, the stabilities of the carbonate-containing γ-(ACB)-d4TTPs 26 were found to be higher than those of the corresponding ester-comprising γ-(AB)-d4TTPs 22. As an example, in the case of γ-(AB-C11;ACB-C6)-d4TTP, after 8 h incubation in CEM/0 cell extracts, the ratio of γ-(ACB-C6)-d4TTP (t1/2 = 540 h) (Jia et al. 2020a) and γ-(AB-C11)-d4TTP (t1/2 = 460 h) (Gollnest et al. 2015) was 10:1, demonstrating a highly selective cleavage of the AB moiety that led to the formation of γ-(ACB)-d4TTPs 26. This was different from the studies performed with nonsymmetric TriPPPro-d4TTPs 18 and 19.

Antiviral evaluation. First generation TriPPPro-prodrugs and intermediates 26 were tested for their ability to inhibit HIV replication in HIV-1- and HIV-2-infected wild-type CEM/0 cell cultures. For comparison, HIV-2-infected mutant thymidine kinase-deficient (CEM/TK−) cell cultures were treated with TriPPPro-compounds as well. As can be seen in Table 1, the parent d4T showed very low, if any, antiviral activity (EC50 = 150 μM) (Gollnest et al. 2015) in the mutant CEM/TK− cells, because the conversion into d4TMP catalyzed by TK has been identified as the limiting step for the antiviral activity (Balzarini et al. 1988; Ho and Hitchcock 1989; Zhu et al. 1990). All TriPPPro-compounds showed similar or slightly better activities against HIV-1 and HIV-2 than the parent in wild-type (CEM/0) cells. Importantly, the inhibition of the replication of HIV-2 by TriPPPro-compounds 9 was higher than that by d4T in CEM/TK− cell cultures. For symmetric TriPPPro-d4TTPs 9 and 10, a somewhat increased antiviral activity was observed with increasing alkyl chain lengths, probably due to their advantageous permeability. The shorter alkyl chain lengths (R ≤ C6H13) led to lower stability of the prodrugs and no antiviral activity in the TK-deficient cells. In contrast, all TriPPPro-d4TTPs 9 and 10 with R ≥ C8H17 showed higher antiviral activity than the parent d4T.

As can be seen in Table 1, nonsymmetric TriPPPro-d4TTPs 14 showed higher activity than the corresponding symmetric TriPPPro-d4TTPs 9. For example, the antiviral activity determined for the nonsymmetric TriPPPro-d4TTP 14a (AB-C2;ACB-C16) (EC50 = 0.0048 μM/HIV-2, Table 1) (Jia et al. 2020a) and the symmetric TriPPPro-d4TTP 9a (AB-C2;AB-C2) (EC50 = 0.72 μM/HIV-2, Table 1) (Gollnest et al. 2015) was 65-fold and 1.2-fold higher, respectively, as compared to the parent d4T in wild-type CEM/0 cells. More importantly, γ-(AB-C2;ACB-C16)-d4TTP 14a (EC50 = 0.11 μM/HIV-2) (Jia et al. 2020a) also showed better activity against HIV-2 than γ-(AB-C2;AB-C2)-d4TTP 9a (EC50 > 10 μM/HIV-2) (Gollnest et al. 2015) in CEM/TK− cells. It was concluded that TriPPPro-d4TTPs 14 and 15 can permeate the cell membrane and deliver the nucleoside triphosphate d4TTP intracellularly, resulting in the antiviral activity. Interestingly and somewhat surprisingly, the intermediate γ-(ACB-C16)-d4TTP 26b (EC50 = 1.46 μM/HIV-2) (Jia et al. 2020a) also showed moderate antiviral activity against HIV-2 in CEM/TK− cells. Consequently, it appears that the lipophilic carbonate moiety of compound 26 enables, at least in part, the cellular uptake of the compound, followed by the release of d4TTP. Similarly, the nonsymmetrically masked TriPPPro-compounds 18 and 19 also proved highly potent in CEM/TK− cells. In addition, it was confirmed that one of the two masking groups, comprising a long, lipophilic aliphatic chain attached to the γ-phosphonate moiety, ensures that the symmetric TriPPPro-d4TTPs are active, which is in agreement with the results for nonsymmetric TriPPPro-d4TTPs 14 and 15 (Jia et al. 2020a).


X. Jia et al.

Application of the TriPPPro-Concept to Various Nucleoside Analogues

Guided by the results from TriPPPro-compounds 9 and 14, the TriPPPro-approach was applied to a variety of approved as well as inactive nucleoside analogues (Fig. 1). It was expected that, as observed for TriPPPro-compounds 9 and 14, a selective conversion of these symmetric and nonsymmetric analogues into their corresponding nucleoside triphosphates would be achieved in CEM cell extracts (Gollnest et al. 2016; Jia et al. 2020b). A series of symmetric TriPPPro-NTPs 9 and nonsymmetric TriPPPro-NTPs 14 bearing different nucleoside analogues was prepared by using the H-phosphonate protocol (Scheme 3, top). The method was conducted similarly to the previously reported procedures for nonsymmetric TriPPPro-d4TTPs 14 and 15. First, nucleoside monophosphates, e.g., AZTMP, were prepared in good overall yields using a known procedure (Kore et al. 2012a, b). Next, H-phosphonates 12 and 27 were prepared and then converted into the corresponding pyrophosphates 13 and 28 in almost quantitative yields. Finally, pyrophosphates 13 and 28 were reacted with different nucleoside monophosphates to form the corresponding TriPPPro-NTPs 9 and 14. Taking previous results into account, Meier and co-workers applied the TriPPPro-technology to the nucleoside analogue T-1106 (Huchting et al. 2018). The T-1106-TriPPPro prodrug 32a (AB-C9;AB-C9) and the nonsymmetric T-1106-TriPPPro prodrug 32b (AB-C4;AB-C14) were synthesized according to the phosphoramidite protocol (Scheme 3, bottom) (Weinschenk et al. 2015; Jessen et al. 2008). For this, T-1106-DP was first synthesized starting from T-1106-MP using phosphoramidite chemistry. In the case of the protected T-1106-diphosphate (DP) 29 (Fm;Fm), one Fm group was rapidly cleaved under basic conditions (triethylamine). As compared with the doubly protected T-1106-diphosphate 29, the chemical stability of the mono-protected T-1106-diphosphate 30 proved to be higher.
The mono-masked T-1106-diphosphate 30 was isolated using rp-chromatography, followed by the second deprotection to give T-1106-DP in high purity (>98%). Finally, the synthesis of the T-1106-TriPPPro-compound 32a was achieved following the same phosphoramidite protocol as for TriPPPro-d4TTPs 9 and 10 discussed above (Gollnest et al. 2015). Moreover, T-1106-TP was also prepared using this strategy, starting from T-1106-DP (Huchting et al. 2018). As before, TriPPPro-NTPs 9 and 14 were also evaluated in different media. For TriPPPro-NTPs 9 and 14, the following conclusions can be drawn: (1) an increase of NTP concentrations was observed in PBS (pH 7.3) (slow process), and small amounts of NTPs were detected in CEM/0 cell extracts (fast process); (2) as before, the stability of the intermediates γ-(ACB)-NTPs 26 was found to be higher than that of the corresponding intermediates γ-(AB)-NTPs 22; (3) NDPs were also formed from the starting TriPPPro-compounds by a nucleophilic reaction at the γ-phosphate moiety or the benzyl-carbon atom (in PBS); (4) no formation of NMPs was detected in any of the chemical hydrolysis experiments; and (5) with PLE, all TriPPPro-compounds were rapidly cleaved enzymatically and formed the corresponding NTPs.


Scheme 3 Synthesis of TriPPPro-NTPs 9, 14, and 32. Reagents and conditions: (i) (1) NCS, CH3CN, rt, 1–2 h, (2) (H2PO4)Bu4N, CH3CN, rt, 1 h; (ii) (1) TFAA, Et3N, CH3CN, 0 °C, 10 min, (2) 1-methylimidazole, Et3N, CH3CN, 0 °C–rt, 10 min, (3) NMPs, rt, 1–5 h; (iii) (1) bis(9H-fluoren-9-ylmethyl)-diisopropylaminophosphite (in CH2Cl2), DCI, DMF, 10–20 min, rt; (2) TBHP, DMF, 5 min, rt; (iv) TEA, CH3CN, 10–20 min, rt; (v) CH3OH, H2O, TEA, 24–48 h, rt

In addition, when T-1106-TriPPPro-prodrugs 32a and b were evaluated in crude enzyme preparations (i.e., MDCK or MDCK-TGres cell extracts), the mono-masked triphosphate intermediates 33, T-1105-RDP, and T-1105-RMP were detected, but no T-1105-RTP was observed, probably due to rapid dephosphorylation after the initial formation (Huchting et al. 2018). Interestingly, most of the enzyme-cleavable TriPPPro-NTPs 9 and 14 were found to be highly active against HIV-1 and HIV-2 in wild-type CEM/0 cell cultures


(Gollnest et al. 2016; Jia et al. 2020b). The advantage of the TriPPPro-approach for intracellular delivery of NTPs was demonstrated by these prodrugs, which showed higher antiviral activity against HIV-2 than their parent nucleosides in CEM/TK cells. Some entirely inactive nucleoside analogues were also converted into moderately or even highly potent TriPPPro-compounds. As an example, the inhibition of the replication of HIV-1 and HIV-2 by γ-(AB-C8;AB-C8)-Fdd(Cl)UTP (EC50 = 3.4 μM/HIV-1; EC50 > 10 μM/HIV-2) (Gollnest et al. 2016) and γ-(AB-C2;ACB-C16)-Fdd(Cl)UTP (EC50 = 1.3 μM/HIV-1; EC50 = 2.4 μM/HIV-2) (Jia et al. 2020b) was much higher than that by their parent nucleoside Fdd(Cl)U (EC50 > 250 μM/HIV-1; EC50 > 250 μM/HIV-2) in CEM/0 cell cultures. More importantly, γ-(AB-C8;AB-C8)-Fdd(Cl)UTP (EC50 > 10 μM/HIV-2) (Gollnest et al. 2016) and γ-(AB-C2;ACB-C16)-Fdd(Cl)UTP (EC50 = 28 μM/HIV-2) (Jia et al. 2020b) were also potent in CEM/TK cells, whereas Fdd(Cl)U lacked any relevant anti-HIV activity (EC50 > 250 μM/HIV-2). This proved that these prodrugs were taken up by the cells and delivered the NTPs intracellularly. Thus, the TriPPPro-prodrug system is indeed able to convert inactive nucleoside analogues into antivirally active nucleoside triphosphates. In further experiments, the uptake into CEM cells and the delivery of nucleoside triphosphates (here: ddBCNATP) was also proven by a study using a fluorescent nucleoside analogue (ddBCNA) (Gollnest et al. 2016). It is worth noting that T-1106-TriPPPro-prodrugs 32a and b also showed markedly better anti-influenza activities than their parent nucleoside T-1106 in MDCK and MDCK-TGres cells (Huchting et al. 2018). The T-1106-TriPPPro-prodrugs 32a and b bearing two biodegradable masking groups were able to permeate across cell membranes and to directly deliver T-1106-TP with high selectivity by an enzyme-triggered mechanism, thus bypassing all steps of the intracellular phosphorylation catalyzed by HGPRT and kinases in cells.
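
Several of the EC50 values quoted above are reported only as lower bounds (for example, EC50 > 250 μM for the parent Fdd(Cl)U), so any fold-improvement derived from them is itself only a bound. The following Python sketch illustrates one way to keep track of this; it is not part of the cited studies. The `EC50` class and the `fold_improvement` function are hypothetical names introduced here for illustration, and the numbers are the CEM/0 and CEM/TK values quoted in the text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EC50:
    """EC50 in micromolar; censored=True means the value was reported as '> value'."""
    value_um: float
    censored: bool = False


def fold_improvement(parent: EC50, prodrug: EC50) -> tuple[float, str]:
    """Return (parent EC50 / prodrug EC50, qualifier).

    Qualifier: '=' exact ratio, '>' ratio is a lower bound (parent censored),
    '<' ratio is an upper bound (prodrug censored), '?' both censored, so the
    ratio carries no information.
    """
    ratio = parent.value_um / prodrug.value_um
    if parent.censored and prodrug.censored:
        return ratio, "?"
    if parent.censored:
        return ratio, ">"
    if prodrug.censored:
        return ratio, "<"
    return ratio, "="


# Values quoted in the text (illustrative labels, not compound numbers).
PARENT = EC50(250.0, censored=True)           # Fdd(Cl)U, EC50 > 250 uM
CASES = {
    "AB-C8;AB-C8, HIV-1, CEM/0": EC50(3.4),
    "AB-C2;ACB-C16, HIV-1, CEM/0": EC50(1.3),
    "AB-C2;ACB-C16, HIV-2, CEM/0": EC50(2.4),
    "AB-C2;ACB-C16, HIV-2, CEM/TK": EC50(28.0),
    "AB-C8;AB-C8, HIV-2, CEM/TK": EC50(10.0, censored=True),
}

if __name__ == "__main__":
    for label, prodrug in CASES.items():
        ratio, qualifier = fold_improvement(PARENT, prodrug)
        if qualifier == "?":
            print(f"{label}: undetermined (both values are lower bounds)")
        else:
            print(f"{label}: fold improvement {qualifier} {ratio:.1f}")
```

Under these assumptions, the comparison with a censored parent value only yields statements of the form "at least N-fold", and a comparison between two censored values cannot be interpreted at all.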

γ-Nonsymmetrically Modified TriPPPro-Prodrugs Bearing One Biodegradable Group (Second Generation Triphosphate Delivery Systems)

From the results summarized above, the doubly, bioreversibly modified TriPPPro-compounds enable the intracellular delivery of nucleoside triphosphates, and the technology could also be used to convert inactive nucleoside analogues into powerful biologically active metabolites (Jia et al. 2020a, b; Zhao et al. 2020a). The introduction of the two different masking groups (AB and ACB) led to the selective formation of γ-(ACB)-NTPs by chemical hydrolysis and, more importantly, by enzymes present in CEM cell extracts. However, some of the TriPPPro-compounds such as 14 showed a loss of antiviral activity in the infected TK-deficient cell cultures. One possible reason for this observation could be the instability of the NTPs in cell extracts due to their fast dephosphorylation. Additionally, in primer extension assays, the mono-masked AB- or ACB-intermediates proved to be substrates for HIV-RT in contrast to the original doubly-masked TriPPPro-compounds


9. These results guided us to conduct a study on a series of TriPPPro-prodrugs 39 and 43 bearing only one biodegradable acyloxybenzyl moiety in combination with a non-bioreversible moiety at the γ-phosphate group (Zhao et al. 2020b; Nack et al. 2020). Furthermore, a new class of TriPPPro-prodrugs 47 and 48 was prepared in which the γ-phosphate group has been replaced by a γ-alkyl-phosphonate moiety (Scheme 4) (Jia et al. 2020c). As can be seen in Scheme 4, the synthesis of γ-(AB;ketobenzyl (kb))-d4TTPs 39 was based on the phosphoramidite route (upper section) (Nack et al. 2020). In the first step, OTBDMS-protected 4-bromo-benzylalcohol 34 was reacted with i-PrMgCl·LiCl to form compound 35. Weinreb amides 36 were prepared in almost

Scheme 4 Synthesis of TriPPPro-compounds 39, 43, 47, and 48. Reagents and conditions: (a) i-PrMgCl·LiCl, THF/dioxane, rt, 16 h; (b) Me(OMe)NHCl, NEt3, CH2Cl2, rt, 2 h; (c) THF, 0 °C, 2 h; (d) TBAF, THF, rt, 1.5 h; (e) DCI, CH3CN, rt, 1 h; (f) (1) (n-Bu4N)2+d4TDP, DCI, CH3CN, rt, 2 h and (2) t-BuOOH, CH3CN, rt, 30 min; (g) pyridine, DPP, 38 °C, 3.3 h; (h) (1) NCS, CH3CN, rt, 2 h, (2) (H2PO4)Bu4N, CH3CN, rt, 1 h; (i) (1) TFAA, Et3N, CH3CN, 0 °C, 10 min, (2) 1-methylimidazole, Et3N, CH3CN, 0 °C–rt, 10 min, (3) d4TMP, rt, 3–5 h; (j) EDC, DMAP, CH2Cl2, rt, 12 h


quantitative yields. After the coupling reaction between these two compounds, TBAF was added to the mixture, which resulted in the formation of benzyl alcohols 37. Compounds 37 were converted into (AB;kb)-phosphoramidites 38 according to the procedure reported previously (Gollnest et al. 2015). Finally, d4TDP was mixed with (AB;kb)-phosphoramidites 38 to form γ-(AB;kb)-d4TTPs 39 (34–63%). In addition, γ-(AB;alkyl)-d4TTPs 43 were synthesized using the H-phosphonate route (middle section) (Zhao et al. 2020b). For the synthesis of γ-(AB)-γ-C-(alkyl)-d4TTPs 47, the H-phosphonate route was developed (lower section) (Jia et al. 2020c). The synthesis of phosphinic acids 44 was accomplished by using a known literature method (Jia et al. 2020c) in moderate yields ranging from 55 to 70%. Then, H-phosphinates 45 were prepared from phosphinic acids 44, 4-dimethylaminopyridine (DMAP), 1-(3-dimethylaminopropyl)-3-ethylcarbodiimide (EDC), and 4-acyloxybenzyl alcohols 7 in yields ranging from 48 to 85%. Next, H-phosphinates 45 were converted into phosphonate phosphates 46 in almost quantitative yield. After coupling with d4TMP, a series of γ-(AB)-γ-C-(alkyl)-d4TTPs 47 (n-Bu4N+) was obtained. These prodrugs were purified by automated rp18 chromatography and transformed into the corresponding ammonium form using Dowex 50WX8 ion-exchange resin. Finally, γ-(AB)-γ-C-(alkyl)-d4TTPs 47 (NH4+) were isolated as white solids (25–68%) after a second automated rp18 chromatography. For comparison, a series of γ-(ACB)-γ-C-(alkyl)-d4TTPs 48 was also prepared using the identical method. With the H-phosphonate synthesis method, the linkage between the α- and the β-phosphate was formed and no oxidation was necessary, which was similar to the previously reported H-phosphonate route (Gollnest et al. 2016).
In addition to the second generation TriPPPro-compounds 39, 43, 47, and 48, the alkylated nucleoside triphosphate derivatives 49, 53, and 57 were synthesized to investigate the hydrolysis pathways of the prodrugs and their delivery mechanism. The synthesis of compounds 49, 53, and 57 was based on the phosphoramidite approach, the H-phosphonate route, and the H-phosphonate method, respectively. These procedures are depicted below (Scheme 5). With PLE, γ-(kb)-d4TTPs 49 were formed from γ-(AB;kb)-d4TTPs 39 but only in low amounts. Alternatively, γ-(kb)-d4TTPs 49 were prepared using phosphoramidite chemistry, in which 9-fluorenylmethanol (Fm) instead of the β-cyanoethyl group was used as a protecting group. It is worth mentioning that the Fm moiety was readily cleaved using NEt3 in CH3CN and that the overall yields varied between 60% and 70% (upper section, Scheme 5). For the synthesis of compounds 53 and 57, the β-cyanoethyl group was included as a base-labile protecting group at the γ-phosphate and γ-phosphonate moiety, respectively. Under basic conditions, the β-cyanoethyl group was cleaved by a β-elimination reaction to form γ-(alkyl)-d4TTPs 53 (15–46%) and γ-C-(alkyl)-d4TTPs 57 (40–65%), respectively.

Chemical hydrolysis. As described above, it was expected that the biodegradable prodrug moiety (AB or ACB) would be cleaved to yield compounds 49, 53, and 57 comprising the second lipophilic but non-cleavable moiety at the γ-phosphate or γ-phosphonate group, respectively. The initial cleavage of the biodegradable groups (AB or ACB) in prodrugs 39, 43, 47, and 48 was initiated by an ester or carbonate hydrolysis and the hydrolysis mechanism proceeded similarly to the previously


Scheme 5 Synthesis of γ-(kb)-d4TTPs 49, γ-(alkyl)-d4TTPs 53, and γ-C-(alkyl)-d4TTPs 57. Reagents and conditions: (i) (FmO)P(N(i-Pr)2)2, THF, 0 °C, 2 h; (ii) (1) (n-Bu4N)2+NDP, DCI, CH3CN, rt, 2 h; (2) t-BuOOH, CH3CN, rt, 30 min; and (3) NEt3, CH3CN, rt, 15–70 min; (iii) PLE, H2O/phosphate buffer, 50 mM, pH = 7.3, 11 h; (iv) pyridine, DPP, 38 °C, 3.3 h; (v) (1) NCS, CH3CN, rt, 2 h, (2) (H2PO4)Bu4N, CH3CN, rt, 1 h; (vi) (1) TFAA, Et3N, CH3CN, 0 °C, 10 min; (2) 1-methylimidazole, Et3N, CH3CN, 0 °C–rt, 10 min; (3) d4TMP, rt, 3–5 h; (vii) (1) n-Bu4N+OH, 8–20 h, Dowex 50WX8 (NH4+ form) ion exchange; (viii) EDC, DMAP, CH2Cl2, rt, 12 h

reported cleavage pathways for TriPPPro-d4TTPs 14 (Fig. 1) (Jia et al. 2020a). In all cases, before complete consumption of the starting materials 39, 43, 47, and 48, an increase of alkylated nucleoside triphosphate derivatives 49, 53, and 57 (pathway A, Scheme 6) and some concomitant formation of d4TDP (pathway B, Scheme 6) were observed. At the same time, no d4TTP was detected in these hydrolysis studies (pathway D, Scheme 6), which was in line with our design. Furthermore, no further increase of d4TDP was observed after full conversion of the starting compounds 39, 43, 47, or 48, indicating that these prodrugs were prone to cleavage between the γ-phosphate/γ-phosphonate and the β-phosphate (pathway B). Chemical hydrolysis led to the alkylated nucleoside triphosphate derivatives 49, 53, and 57 as well as to a very small amount of d4TMP (<4%) (pathway C, Scheme 6) (Jia et al. 2020a). In phosphate buffer (PB, pH 7.3), the half-lives of γ-(AB;alkyl)-d4TTPs 43 were calculated to be between 94 h and 269 h (Zhao et al. 2020b), and those of γ-(AB;kb)-d4TTPs 39 were found to be between 11 h and 109 h (Nack et al. 2020). Neither showed a clear trend. In contrast, the stability of γ-(AB or ACB)-γ-C-(alkyl)-d4TTPs


Scheme 6 Hydrolysis mechanism of TriPPPro-compounds 39, 43, and 47

47 and 48 increased with increasing alkyl chain lengths (R1 and R2) (Jia et al. 2020c). Interestingly, the half-lives of γ-(AB)-γ-C-(alkyl)-d4TTPs 47 (t1/2 = 103–919 h) (Jia et al. 2020c) were higher than those of the corresponding ester products γ-(AB;ACB)-d4TTPs 14 (t1/2 = 25–95 h) (Jia et al. 2020a). Furthermore, as expected, the hydrolytic stabilities of the ester prodrugs 47 were found to be lower than those of the corresponding carbonate prodrugs γ-(ACB)-γ-C-(alkyl)-d4TTPs 48 (t1/2 = 123–1240 h) (Jia et al. 2020c).

Hydrolysis using PLE. The prodrugs were also incubated with PLE in pH 7.3 phosphate buffer. The prodrugs were quickly cleaved and released nucleoside triphosphate derivatives 49, 53, and 57 much faster than by hydrolysis in PB (pH 7.3). As an example, the half-life of γ-(ACB-C4)-γ-C-(alkyl-C12)-d4TTP 48 with PLE (t1/2 = 0.009 h) (Jia et al. 2020c) was found to be lower by almost a factor of 40,000 as compared to the half-life of this prodrug in phosphate buffer (t1/2 = 345 h), proving a strong contribution of the enzymatic cleavage. Moreover, no d4TTP was formed. This proved the initial concept of introducing an enzyme-stable moiety to the γ-phosphate and γ-phosphonate, respectively.

Hydrolysis in CEM cell extracts. The hydrolysis of TriPPPro-d4TTPs 39, 43, 47, and 48 was further investigated in CEM cell extracts. Half-lives between 0.043 h and 10 h were observed and found to be significantly lower than the half-lives in phosphate buffer. Interestingly, the biological stability of prodrugs 47 correlated well with the different chain lengths (R1 and R2).
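
The half-life comparisons above can be made concrete with a short calculation. The Python sketch below assumes simple (pseudo-)first-order decay of the prodrug, which is an assumption made here for illustration and is not stated in the text; the function names are likewise hypothetical. The half-lives are the values quoted above for the γ-(ACB-C4)/alkyl-C12 prodrug 48 in phosphate buffer and with PLE, and the lower and upper limits reported for CEM cell extracts.

```python
import math


def rate_constant(t_half_h: float) -> float:
    """First-order rate constant k (1/h) from a half-life, k = ln(2) / t_half."""
    return math.log(2) / t_half_h


def fraction_remaining(t_half_h: float, t_h: float) -> float:
    """Fraction of starting material left after t_h hours, assuming first-order decay."""
    return 0.5 ** (t_h / t_half_h)


# Half-lives quoted in the text, in hours.
T_HALF_BUFFER_H = 345.0          # prodrug 48 (ACB-C4, alkyl-C12) in phosphate buffer, pH 7.3
T_HALF_PLE_H = 0.009             # same prodrug with pig liver esterase
T_HALF_EXTRACT_RANGE_H = (0.043, 10.0)  # range reported for CEM cell extracts

# Ratio of half-lives; under first-order kinetics this equals the ratio of rate constants.
ACCELERATION = T_HALF_BUFFER_H / T_HALF_PLE_H

if __name__ == "__main__":
    print(f"PLE half-life: {T_HALF_PLE_H * 3600:.0f} s")
    print(f"Buffer/PLE half-life ratio: {ACCELERATION:,.0f}")
    for t_half in (T_HALF_BUFFER_H, *T_HALF_EXTRACT_RANGE_H):
        left = fraction_remaining(t_half, 1.0)
        print(f"t1/2 = {t_half} h -> {left:.1%} of prodrug left after 1 h")
```

The half-life ratio of roughly 38,000 obtained this way corresponds to the "almost a factor of 40,000" quoted for the PLE experiment.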
Studies showed that the half-lives of γ-(AB:C1-C12)-γ-C-(alkyl-C12)-d4TTPs 47 and γ-(ACB:C1-C12)-γ-C-(alkyl-C12)-d4TTPs 48 increased with increasing AB or ACB alkyl chain lengths (R1-moiety), while the stability of γ-(AB-C4)-γ-C-(alkyl:C12-C18)-d4TTPs 47 as well as γ-(ACB-C4)-γ-C-(alkyl:C12-C18)-d4TTPs 48 also increased with increasing alkyl chain lengths in the non-bioreversible group (R2). The stability of prodrugs γ-(AB)-γ-C-(alkyl)-d4TTPs 47 was found to be lower than that of the corresponding compounds γ-(ACB)-γ-C-(alkyl)-d4TTPs 48, which was in agreement with the results from the studies of these prodrugs described in PBS. In contrast to the studies with the first generation TriPPPro-d4TTPs 9, 10, 14, 15, 18, and 19 bearing two bioreversible groups, a predominant formation of the alkyl nucleoside triphosphate derivatives 49, 53, and 57 was observed in these studies


using prodrugs 39, 43, 47, and 48 in cell extracts (an example is shown in Fig. 8). In addition, a very low concentration of d4TDP was observed, but no d4TTP. Furthermore, in contrast to d4TTP, compounds 49, 53, and 57 (t1/2 > 30 h) (Zhao et al. 2020b; Nack et al. 2020; Jia et al. 2020c) proved very stable against dephosphorylation in cell extracts.

Antiviral evaluation. The effectiveness of the TriPPPro-prodrugs 39, 43, 47, and 48 and the corresponding alkylated nucleoside triphosphate derivatives 49, 53, and 57 as HIV-1 and HIV-2 inhibitors was determined in HIV-1- and HIV-2-infected wild-type (CEM/0) and HIV-2-infected TK-deficient (CEM/TK) cell cultures. TriPPPro-prodrugs 39 and 43 showed virtually similar or even slightly better activities against HIV-1 and HIV-2 than the parent nucleoside d4T in CEM/0 cell cultures. However, their antiviral activities did not differ much, likely due to the similar lipophilicity of TriPPPro-prodrugs 39 and 43 and the combination of aliphatic alkyl groups in the masking groups as well as in the stable γ-alkyl-moiety and γ-kb-moiety, respectively (Zhao et al. 2020b; Nack et al. 2020). In contrast, the inhibition of the replication of HIV by prodrugs 47 and 48 was at least similar to or even markedly higher than that by their parent nucleoside d4T in wild-type CEM/0 cells. The antiviral activity of TriPPPro-prodrugs 47 and 48 increased with increasing alkyl chain lengths in the R1-moiety and R2-moiety, respectively (Jia et al. 2020c). Furthermore, there were also some clear trends: (1) γ-(alkyl-C18)-d4TTP 53 (EC50 = 0.05 μM) (Zhao et al. 2020b), γ-(kb-C17)-d4TTP 49 (EC50 = 0.26 μM) (Nack et al. 2020), and γ-C-(alkyl-C18)-d4TTP 57 (EC50 = 1.58 μM) (Jia et al. 2020c) comprising longer alkyl residues in the non-cleavable

Fig. 8 HPLC profile of γ-(AB-C4)-γ-C-(alkyl-C12)-d4TTP 47 after incubation in CEM/0 cell extracts at different times (Jia et al. 2020c)


moiety at the γ-phosphate or γ-phosphonate group, respectively, were active against HIV-2 in CEM/TK cells, indicating a successful cell membrane passage of these compounds; (2) none of the prodrugs 39, 43, 47, or 48 showed significantly higher cytotoxicity than the parent d4T; and (3) almost all of the TriPPPro-prodrugs 39, 43, 47, and 48 were highly potent in CEM/TK cells. Surprisingly, the antiviral activity determined in the wild-type CEM/0 cell cultures was completely retained in mutant thymidine kinase-deficient CEM cells (CEM/TK) in the case of γ-(alkyl-C18)-d4TTP 53 (EC50 = 0.05 μM) (Zhao et al. 2020b) and γ-(kb-C17)-d4TTP 49 (EC50 = 0.26 μM) (Nack et al. 2020), which were more active than their original prodrugs γ-(AB-C4;alkyl-C18)-d4TTP 43 (EC50 = 0.17 μM) (Zhao et al. 2020b) and γ-(AB-C6;kb-C17)-d4TTP 39 (EC50 = 0.4 μM) (Nack et al. 2020), respectively. In contrast, as compared to γ-C-(alkyl-C18)-d4TTP 57 (EC50 = 1.58 μM) (Jia et al. 2020c), the antiviral activity of γ-(ACB-C4)-γ-C-(alkyl-C18)-d4TTP 48 (EC50 = 0.032 μM, 1000-fold more active than d4T) and γ-(AB-C4)-γ-C-(alkyl-C18)-d4TTP 47 (EC50 = 0.042 μM) in CEM/0 cells improved by 50-fold and 38-fold, respectively, indicating that it was advantageous to combine the prodrug strategy with the stable γ-alkyl-modifications (Jia et al. 2020c). In the case of compounds 43 and 48, the antiviral activity determined for γ-(ACB-C4)-γ-C-(alkyl-C18)-d4TTP 48 (EC50 = 0.0018 μM/HIV-1; EC50 = 0.026 μM/HIV-2) (Jia et al. 2020c) improved by 100-fold and six-fold, respectively, as compared to the corresponding γ-(AB-C4;alkyl-C18)-d4TTP 43 (EC50 = 0.18 μM/HIV-1; EC50 = 0.16 μM/HIV-2) (Zhao et al. 2020b). More importantly, with γ-(ACB-C4)-γ-C-(alkyl-C18)-d4TTP 48 (EC50 = 0.032 μM/HIV-2) (Jia et al. 2020c), the antiviral activity in CEM/TK cells was improved by 5-fold as compared to γ-(AB-C4;alkyl-C18)-d4TTP 43 (EC50 = 0.17 μM/HIV-2) (Zhao et al. 2020b).
With TriPPPro-prodrugs 39, 43, 47, and 48, the antiviral activity in CEM/TK cells was considerably improved as compared to first generation TriPPPro-prodrugs. Therefore, TriPPPro-prodrugs 39, 43, 47, and 48 (second generation) have a higher potential for use as antiviral agents than TriPPPro-prodrugs 9, 10, 14, 15, 18, or 19 (first generation). On the other hand, the first generation TriPPPro-technology appears to be very interesting for application with nucleoside analogues such as T-1105 that showed severe limitations in its activation to give the corresponding T-1105-RTP. It can be expected that this second generation TriPPPro-concept may be used to convert T-1105 and other inactive nucleoside analogues (such as Fdd(Cl)U) into powerful biologically active metabolites.
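
The fold-improvements stated in the antiviral evaluation above follow directly from ratios of the quoted EC50 values. As a quick consistency check, the Python sketch below recomputes them; it is an illustration only, the `check` helper and the 10% tolerance are arbitrary choices made here, and the compounds are labelled by their masking groups rather than by compound number.

```python
# Each entry: (comparison, reference EC50 in uM, improved EC50 in uM, fold stated in the text)
COMPARISONS = [
    ("gamma-C-(alkyl-C18)-d4TTP vs ACB-C4 phosphonate prodrug, CEM/0", 1.58, 0.032, 50),
    ("gamma-C-(alkyl-C18)-d4TTP vs AB-C4 phosphonate prodrug, CEM/0", 1.58, 0.042, 38),
    ("AB-C4;alkyl-C18 phosphate vs ACB-C4 phosphonate prodrug, HIV-1", 0.18, 0.0018, 100),
    ("AB-C4;alkyl-C18 phosphate vs ACB-C4 phosphonate prodrug, HIV-2", 0.16, 0.026, 6),
    ("AB-C4;alkyl-C18 phosphate vs ACB-C4 phosphonate prodrug, CEM/TK", 0.17, 0.032, 5),
]


def fold(reference_um: float, improved_um: float) -> float:
    """Fold-improvement in potency: a lower EC50 means higher activity."""
    return reference_um / improved_um


def check(comparisons, rel_tol: float = 0.10):
    """Return (label, computed fold, stated fold, agrees within rel_tol) per comparison."""
    rows = []
    for label, reference_um, improved_um, stated in comparisons:
        computed = fold(reference_um, improved_um)
        agrees = abs(computed - stated) <= rel_tol * stated
        rows.append((label, computed, stated, agrees))
    return rows


if __name__ == "__main__":
    for label, computed, stated, agrees in check(COMPARISONS):
        flag = "ok" if agrees else "CHECK"
        print(f"{label}: computed {computed:.1f}-fold, stated {stated}-fold [{flag}]")
```

With the values as quoted, each recomputed ratio agrees with the stated, rounded fold-improvement within the chosen tolerance.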

Primer Extension Assays

An important property of nucleoside analogue triphosphates is their ability to act as substrates for viral polymerases such as HIV-RT. Only by that conversion can they be incorporated into the growing viral DNA strand and then terminate replication due to a mechanism called obligate or immediate chain termination. This mechanism is attributed to nucleosides acting as obligate chain terminators (e.g., d4T) which lack the 3′-hydroxyl needed for extension of the DNA chain. Other mechanisms for triphosphate analogues include non-obligate chain terminators (delayed chain


termination; e.g., the sofosbuvir nucleoside triphosphate), which possess the 3′-hydroxyl but still lead to chain termination, or a mechanism known as error catastrophe due to extensive incorporation and multiple mutation events (e.g., ribavirin triphosphate or T-1105 triphosphate). Moreover, it is critical that the nucleoside analogue triphosphates be as selective as possible for the viral target polymerase compared to the cellular polymerases such as DNA polymerase γ because nonselective incorporation may be responsible for various side effects and toxicity (e.g., mitochondrial toxicity). Therefore, the γ-alkylated d4TTPs 49, 53, and 57 were studied in primer extension assays for their substrate recognition properties to HIV-RT as well as to three cellular DNA polymerases α, β, and γ. First, the hydrolysis mixture of γ-(AB-C8;AB-C8)-d4TTP 9 resulting from PLE incubation and the synthesized γ-(AB-C8)-d4TTP 22 were examined with HIV-RT (Gollnest et al. 2015). As expected, the incorporation of d4TMP was observed. This proved that d4TTP was released from TriPPPro-compounds 9. For comparison, γ-(AB-C8;AB-C8)-TTP was also investigated in primer extension assays, because the released natural TTP should act as a substrate for HIV-RT. According to the results from the hydrolysis studies, no d4TTP was formed from γ-alkylated TriPPPro-compounds 39, 43, 47, or 48 (Zhao et al. 2020b; Nack et al. 2020; Jia et al. 2020c), which was different from the first generation TriPPPro-compounds (Gollnest et al. 2015; Jia et al. 2020a; Zhao et al. 2020a). In our studies on the second generation TriPPPro-compounds 39, 43, 47, and 48, the alkylated nucleoside triphosphate derivatives 49, 53, and 57 also showed good antiviral activity against HIV-2 in CEM/TK cell cultures. It was speculated that 49, 53, and 57 were responsible for the inhibitory effect of TriPPPro-compounds 39, 43, 47, and 48.
To confirm the prodrug concept, compounds 49, 53, and 57 were studied in primer extension assays to investigate their suitability as substrates for HIV-RT as compared to three different human DNA polymerases: DNA polymerase α, DNA polymerase β, and DNA polymerase γ. In fact, compounds 49, 53, and 57 proved to be substrates for HIV-RT (Zhao et al. 2020b; Nack et al. 2020; Jia et al. 2020c). For example, the result of a primer extension assay in which γ-C-(alkyl)-d4TTPs 57 (lane 3 and lane 4) (Jia et al. 2020c) and HIV-RT were used is shown in Fig. 9. The n + 1 band was detected because d4TMP was incorporated and acted as an obligate chain terminator. Next, these compounds were studied using human DNA polymerases α, β, and γ, and d4TTP and/or TTP were used as reference compounds (Zhao et al. 2020b; Nack et al. 2020; Jia et al. 2020c). It was found that the γ-alkylated triphosphate analogues 49, 53, and 57 did not act as substrates for any of these DNA polymerases (Zhao et al. 2020b). Thus, a high selectivity of these compounds for HIV-RT over the cellular DNA polymerases was found (Zhao et al. 2020b).

Summary and Conclusion

In this minireview, the development of the first and second generation TriPPPro-prodrugs for the intracellular delivery of NTP derivatives with efficient metabolic bypass and superior antiviral properties is summarized. TriPPPro-compounds


Fig. 9 Primer extension assay using HIV-RT (Jia et al. 2020c). Lane 1 (+), dATP, dGTP, dCTP, and TTP with HIV-RT; lane 2 (−), dATP, dGTP, dCTP, and TTP without HIV-RT; lane 3 (γ-C18-d4TTP), γ-C-(alkyl-C18)-d4TTP; lane 4 (γ-C12-d4TTP), γ-C-(alkyl-C12)-d4TTP; lane 5, TTP; lane 6, d4TTP

bearing two lipophilic groups at the γ-phosphate or γ-phosphonate group, respectively, are designed to enable a successful crossing of biological barriers. Most of the first generation TriPPPro-NTPs (9, 10, 14, 15, 18, and 19) comprising two biodegradable masking groups showed high activities against HIV-1 and HIV-2 in CEM/0 cells and, more importantly, against HIV-2 in CEM/TK cells as well. The TriPPPro-strategy was applied successfully to different nucleoside analogues that showed different limitations in their activation process to give the corresponding nucleoside analogue 5′-triphosphates. This approach is not limited to anti-HIV active compounds because T-1106-TriPPPro-prodrugs 32a,b also retained full anti-influenza activities in MDCK-TGres (HGPRT-deficient) cells (Huchting et al. 2018; Jia et al. 2021). Furthermore, second generation TriPPPro-prodrugs (39, 43, 47, and 48) comprising a non-cleavable moiety in addition to a biodegradable prodrug moiety at the γ-phosphate or γ-phosphonate group, respectively, revealed a higher potential for use in antiviral chemotherapies than the first generation TriPPPro-prodrugs bearing two biodegradable masking groups. In summary, it was convincingly shown that the development of this TriPPPro-strategy can make a significant contribution to the development of improved antiviral agents based on nucleoside analogues.

References

Andersen KG, Rambaut A, Lipkin WI, Holmes EC, Garry RF (2020) The proximal origin of SARS-CoV-2. Nat Med 26:450–452
Asahchop EL, Wainberg MA, Sloan RD, Tremblay CL (2012) Antiviral drug resistance and the need for development of new HIV-1 reverse transcriptase inhibitors. Antimicrob Agents Chemother 56:5000–5008
Balzarini J, Pauwels R, Baba M, Herdewijn P, De Clercq E, Broder S, Johns DG (1988) The in vitro and in vivo anti-retrovirus activity, and intracellular metabolism of 3′-azido-2′,3′-dideoxythymidine and 2′,3′-dideoxycytidine are highly dependent on the cell species. Biochem Pharmacol 37:897–903
Balzarini J, Herdewijn P, De Clercq E (1989) Differential patterns of intracellular metabolism of 2′,3′-didehydro-2′,3′-dideoxythymidine and 3′-azido-2′,3′-dideoxythymidine, two potent anti-human immunodeficiency virus compounds. J Biol Chem 264:6127–6133
Bazzoli C, Jullien V, Le Tiec C, Rey E, Mentré F, Taburet A-M (2010) Intracellular pharmacokinetics of antiretroviral drugs in HIV infected patients, and their correlation with drug action. Clin Pharmacokinet 49:17–45
Bonnaffé D, Dupraz B, Ughetto-Monfrin J, Namane A, Huynh Dinh T (1995a) Synthesis of acyl pyrophosphates – application to the synthesis of nucleotide lipophilic prodrugs. Tetrahedron Lett 36:531–534
Bonnaffé D, Dupraz B, Ughetto-Monfrin J, Namane A, Huynh Dinh T (1995b) Synthesis of nucleotide lipophilic prodrugs containing 2 inhibitors targeted against different phases of the HIV replication cycle. Nucleosides Nucleotides Nucleic Acids 14:783–787
Bonnaffé D, Dupraz B, Ughetto-Monfrin J, Namane A, Henin Y, Huynh Dinh T (1996) Potential lipophilic nucleotide prodrugs: synthesis, hydrolysis, and antiretroviral activity of AZT and d4T acyl nucleotides. J Org Chem 61:895–902
Boswell-Casteel RC, Hays FA (2017) Equilibrative nucleoside transporters – a review. Nucleosides Nucleotides Nucleic Acids 36:7–30
Burton JR, Everson GT (2009) HCV NS5B polymerase inhibitors. Clin Liver Dis 13:453–465
Cevik M, Tate M, Lloyd O, Maraolo AE, Schafers J, Ho A (2020) SARS-CoV-2, SARS-CoV-1 and MERS-CoV viral load dynamics, duration of viral shedding and infectiousness: a systematic review and meta-analysis. Lancet Microbe 2:e13–e22
Cihlar T, Ray AS (2010) Nucleoside and nucleotide HIV reverse transcriptase inhibitors: 25 years after zidovudine. Antivir Res 85:39–58
Deval J (2009) Antimicrobial strategies: inhibition of viral polymerases by 3′-hydroxyl nucleosides. Drugs 69:151–166
Deville-Bonne D, El Amri C, Meyer P, Chen Y, Agrofoglio LA, Janin J (2010) Human and viral nucleoside/nucleotide kinases involved in antiviral drug activation: structural and catalytic properties. Antivir Res 86:101–120
El Safadi Y, Vivet-Boudou V, Marquet R (2007) HIV-1 reverse transcriptase inhibitors. Appl Microbiol Biotechnol 75:723–737
Erion MD, Reddy KR, Boyer SH, Matelich MC, Gomez-Galeno J, Lemus RH, Ugarkar BG, Colby TJ, Schanzer J, van Poelje PD (2004) Design, synthesis, and characterization of a series of cytochrome P-450 3A-activated prodrugs (HepDirect prodrugs) useful for targeting phosph(on)ate-based drugs to the liver. J Am Chem Soc 126:5154–5163
Furman PA, Fyfe JA, StClair MH, Weinhold K, Rideout JL, Freeman GA, Lehrman SN, Bolognesi DP, Broder S, Mitsuya H, Barry DW (1986) Phosphorylation of 3′-azido-3′-deoxythymidine and selective interaction of the 5′-triphosphate with human-immunodeficiency-virus reverse-transcriptase. Proc Natl Acad Sci USA 83:8333–8337
Gaunt ER, Hardie A, Claas ECJ, Simmonds P, Templeton KE (2010) Epidemiology and clinical presentations of the four human coronaviruses 229E, HKU1, NL63, and OC43 detected over 3 years using a novel multiplex real-time PCR method. J Clin Microbiol 48:2940–2947
Gollnest T, de Oliveira TD, Schols D, Balzarini J, Meier C (2015) Lipophilic prodrugs of nucleoside triphosphates as biochemical probes and potential antivirals. Nat Commun 6:8716
Gollnest T, de Oliveira TD, Rath A, Hauber I, Schols D, Balzarini J, Meier C (2016) Membrane-permeable triphosphate prodrugs of nucleoside analogues. Angew Chem Int Ed 55:5255–5258
Ho HT, Hitchcock MJM (1989) Cellular pharmacology of 2′,3′-dideoxy-2′,3′-didehydrothymidine, a nucleoside analog active against human immunodeficiency virus. Antimicrob Agents Chemother 33:844–849
Hostetler KY, Stuhmiller LM, Lenting HBM, Vandenbosch H, Richman DD (1990) Synthesis and antiretroviral activity of phospholipid analogs of azidothymidine and other antiviral nucleosides. J Biol Chem 265:6112–6117
Hostetler KY, Richman DD, Carson DA, Stuhmiller LM, Vanwijk GMT, Vandenbosch H (1992) Greatly enhanced inhibition of human-immunodeficiency-virus type-1 replication in CEM and HT4-6C cells by 3′-deoxythymidine diphosphate dimyristoylglycerol, a lipid prodrug of 3′-deoxythymidine. Antimicrob Agents Chemother 36:2025–2029
Hostetler KY, Parker S, Sridhar CN, Martin MJ, Li JL, Stuhmiller LM, Vanwijk GMT, Vandenbosch H, Gardner MF, Aldern KA, Richman DD (1993) Acyclovir diphosphate dimyristoylglycerol – a phospholipid prodrug with activity against acyclovir-resistant herpes-simplex virus. Proc Natl Acad Sci USA 90:11835–11839
Huchting J, Vanderlinden E, Winkler M, Nasser H, Naesens L, Meier C (2018) Prodrugs of the phosphoribosylated forms of hydroxypyrazinecarboxamide pseudobase T-705 and its de-fluoro analogue T-1105 as potent influenza virus inhibitors. J Med Chem 61:6193–6210
Jessen HJ, Schulz T, Balzarini J, Meier C (2008) Bioreversible protection of nucleoside diphosphates. Angew Chem Int Ed 47:8719–8722
Jia X, Schols D, Meier C (2020a) Anti-HIV-active nucleoside triphosphate prodrugs. J Med Chem 63:6003–6027
Jia X, Schols D, Meier C (2020b) Lipophilic triphosphate prodrugs of various nucleoside analogues. J Med Chem 63:6991–7007
Jia X, Weber S, Schols D, Meier C (2020c) Membrane permeable, bioreversibly modified prodrugs of nucleoside diphosphate-γ-phosphonates. J Med Chem 63:11990–12007
Jia X, Ganter B, Meier C (2021) Improving properties of the nucleobase analogs T-705/T-1105 as potential antiviral. Ann Rep Med Chem 57:1–47
Jordan PC, Stevens SK, Deval J (2018) Nucleosides for the treatment of respiratory RNA virus infections. Antivir Chem Chemother 26:1–19
Jordheim LP, Durantel D, Zoulim F, Dumontet C (2013) Advances in the development of nucleoside and nucleotide analogues for cancer and viral diseases. Nat Rev Drug Discov 12:447–464
Juliano AD, Roguski KM, Chang H (2018) Estimates of global seasonal influenza-associated respiratory mortality: a modelling study. Lancet 391:1285–1300
Kaner J, Schaack S (2016) Understanding Ebola: the 2014 epidemic.
Glob Health 12:53–60 Kore AR, Xiao ZJ, Senthilvelan A, Charles I, Shanmugasundaram M, Mukundarajan S, Srinivasan B (2012a) An efficient synthesis of pyrimidine specific 20 -deoxynucleoside-50 -tetraphosphates. Nucleosides Nucleotides Nucleic Acids 31:567–573 Kore AR, Shanmugasundaram M, Senthilvelan A, Srinivasan B (2012b) An improved protectionfree one-pot chemical synthesis of 20 -deoxynucleoside-50 -triphosphates. Nucleosides Nucleotides Nucleic Acids 31:423–431 Kreimeyer A, Ughetto Monfrin J, Namane A, Huynh-Dinh T (1996) Synthesis of acylphosphates of purine ribonucleosides. Tetrahedron Lett 37:8739–8742 Kreimeyer A, Andre F, Gouyette C, Huynh-Dinh T (1998) Transmembrane transport of adenosine 50 -triphosphate using a lipophilic cholesteryl derivative. Angew Chem Int Ed 37:2853–2855 Mehellou Y, Rattan HS, Balzarini J (2018) The ProTide prodrug technology: from the concept to the clinic. J Med Chem 61:2211–2226 Meier C, Balzarini J (2006) Application of the cycloSal-prodrug approach for improving the biological potential of phosphorylated biomolecules. Antivir Res 71:282–292 Meier C, Lorey M, De Clercq E, Balzarini J (1998) cycloSal-20 ,30 -dideoxy-20 ,30 -didehydrothymidine monophosphate (cycloSal-d4TMP): synthesis and antiviral evaluation of a new d4TMP delivery system. J Med Chem 41:1417–1427 Meier C, Jessen HJ, Schulz T, Weinschenk L, Pertenbreiter F, Balzarini J (2015) Rational development of nucleoside diphosphate prodrugs: DiPPro-Compounds. Curr Med Chem 22: 3933–3950 Mohamady S, Jakeman DL (2005) An improved method for the synthesis of nucleoside triphosphate analogues. J Org Chem 70:10588–10591 Mohamady S, Taylor SD (2011) General procedure for the synthesis of dinucleoside polyphosphates. J Org Chem 76:6344–6349 Nack T, de Oliveira TD, Weber S, Schols D, Balzarini J, Meier C (2020) γ-Ketobenzyl-modified nucleoside triphosphate prodrugs as potential antivirals. 
J Med Chem 63:13745–13761 Paff MT, Averett DR, Prus KL, Miller WH, Nelson DJ (1994) Intracellular metabolism of() and (+)-cis-5-fluoro-1-[2-(hydroxymethyl)-1,3-oxathiolan-5-yl]cytosine in Hepg2 derivative 2.2.15 (subclone P5a) Cells. Antimicrob Agents Chemother 38:1230–1238

69

First- and Second-Generation Nucleoside Triphosphate Prodrugs:. . .

2265

Pastuch-Gawolek G, Gillner D, Krol E, Walczak K, Wandzik I (2019) Selected nucleos(t)idebased prescribed drugs and their multi-target activity. Eur J Pharmacol 865:172747 Pertenbreiter F, Balzarini J, Meier C (2015) Nucleoside mono- and diphosphate prodrugs of 20 ,30 -dideoxyuridine and 20 ,30 -dideoxy-20 ,30 -didehydrouridine. Chem Med Chem 10:94–106 Peyrottes S, Egron D, Lefebvre I, Gosselin G, Imbach JL, Perigaud C (2004) Sate pronucleotide approaches: an overview. Mini-Rev Med Chem 4:395–408 Pradere U, Garnier-Amblard EC, Coats SJ, Amblard F, Schinazi RF (2014) Synthesis of nucleoside phosphate and phosphonate prodrugs. Chem Rev 114:9154–9218 Puech F, Gosselin G, Lefebvre I, Pompon A, Aubertin AM, Kirn A, Imbach JL (1993) Intracellular delivery of nucleoside monophosphates through a reductase-mediated activation process. Antivir Res 22:155–174 Rabaan AA, Al-Ahmed SH, Haque S, Sah R, Tiwari R, Malik YS, Dhama K, Yatoo MI, BonillaAldana DK, Rodriguez-Morales AJ (2020) SARS-CoV-2, SARS-CoV, and MERS-COV: a comparative overview. Infez Med 28:174–184 Roll PM, Weinfeld H, Carroll E, Brown GB (1956) Utilization of nucleotides by the mammal. IV. Triply labeled purine nucleotides. J Biol Chem 220:439–454 Schulz T, Balzarini J, Meier C (2014) The DiPPro approach: synthesis, hydrolysis, and antiviral activity of lipophilic d4T diphosphate prodrugs. Chem Med Chem 9:762–775 Seley-Radtke KL, Yates MK (2018) The evolution of nucleoside analogue antivirals: a review for chemists and non-chemists. Part 1: early structural modifications to the nucleoside scaffold. Antivir Res 154:66–86 Sowa T, Ouchi S (1975) Facile synthesis of 50 -nucleotides by selective phosphorylation of a primary hydroxyl group of nucleosides with phosphoryl chloride. 
Bull Chem Soc Jpn 48:2084–2090 Sulkowski MS, Gardiner DF, Rodriguez-Torres M, Reddy KR, Hassanein T, Jacobson I, Lawitz E, Lok AS, Hinestrosa F, Thuluvath PJ, Schwartz H, Nelson DR, Everson GT, Eley T, WindRotolo M, Huang SP, Gao M, Hernandez D, McPhee F, Sherman D, Hindes R, Symonds W, Pasquinelli C, Grasela DM (2014) Daclatasvir plus Sofosbuvir for previously treated or untreated chronic HCV infection. N Engl J Med 370:211–221 Tan XL, Chu CK, Boudinot FD (1999) Development and optimization of anti-HIV nucleoside analogs and prodrugs: a review of their cellular pharmacology, structure-activity relationships and pharmacokinetics. Adv Drug Deliver Rev 39:117–151 Thomson W, Nicholls D, Irwin WJ, Almushadani JS, Freeman S, Karpas A, Petrik J, Mahmood N, Hay AJ (1993) Synthesis, bioactivation and anti-HIV activity of the bis(4-acyloxybenzyl) and mono(4-acyloxybenzyl) esters of the 50 -monophosphate of AZT. J Chem Soc Perkin Trans 1(11):1239–1245 Van Rompay AR, Johansson M, Karlsson A (2000) Phosphorylation of nucleosides and nucleoside analogs by mammalian nucleoside monophosphate kinases. Pharmacol Ther 87:189–198 van Wijk GMT, Hostetler KY, van den Bosch H (1991) Lipid conjugates of antiretroviral: release of antiretroviral nucleoside monophosphate by a nucleoside diphosphate diglyceride hydrolase activity from rat liver mitochondria. Biochim Biophys Acta Lipids Lipid Metab 1084:307–310 Van Wijk GMT, Hostetler KY, Kroneman E, Richman DD, Sridhar CN, Kumar R, van den Bosch H (1994) Synthesis and antiviral activity of 30 -azido-30 -deoxythymidine triphosphate distearoylglycerol – a novel phospholipid conjugate of the anti-HiV agent AZT. Chem Phys Lipids 70:213–222 Vukadinovic D, Boge NPH, Balzarini J, Meier C (2005) “Lock-in” modified cycloSal nucleotides – the second generation of cycloSal prodrugs. 
Nucleosides Nucleotides Nucleic Acids 24: 939–942 Warnecke S, Meier C (2009) Synthesis of nucleoside di- and triphosphates and dinucleoside polyphosphates with cycloSal-Nucleotides. J Org Chem 74:3024–3030 Weinschenk L, Schols D, Balzarini J, Meier C (2015) Nucleoside diphosphate prodrugs: nonsymmetric DiPPro-nucleotides. J Med Chem 58:6114–6130 WHO Coronavirus (COVID-19) Dashboard 2021. https://www.who.int/emergencies/diseases/ novel-coronavirus-2019. Accessed 9 Jan 2022 Woldemeskel BA, Kwaa AK, Garliss CC, Laeyendecker O, Ray SC, Blankson JN (2020) Healthy donor T cell responses to common cold coronaviruses and SARS-CoV-2. J Clin Investig 130: 6631–6638

2266

X. Jia et al.

Wu F, Zhao S, Yu B, Chen YM, Wang W, Song ZG, Hu Y, Tao ZW, Tian JH, Pei YY, Yuan ML, Zhang YL, Dai FH, Liu Y, Wang QM, Zheng JJ, Xu L, Holmes EC, Zhang YZ (2020) A new coronavirus associated with human respiratory disease in China. Nature 579:265–269 Xia HJ, Xie XP, Shan C, Shi PY (2018) Potential mechanisms for enhanced Zika epidemic and disease. ACS Infect Dis 4:656–659 Yates MK, Seley-Radtke KL (2018) The evolution of antiviral nucleoside analogues: a review for chemists and non-chemists. Part II: complex modifications to the nucleoside scaffold. Antivir Res 162:5–21 Zhao CL, Jia X, Schols D, Balzarini J, Meier C (2020a) γ-Non-symmetrically dimasked TriPPProprodrugs as potential antiviral agents against HIV. Chem Med Chem 16:499–512 Zhao CL, Weber S, Schols D, Balzarini J, Meier C (2020b) Prodrugs of γ-alkyl-modified nucleoside triphosphates: improved inhibition of HIV reverse transcriptase. Angew Chem Int Ed 59: 22063–22071 Zhu Z, Ho HT, Hitchcock MJ, Sommadossi JP (1990) Cellular pharmacology of 20 ,30 -didehydro20 ,30 -dideoxythymidine (d4T) in human peripheral-blood mononuclear-cells. Biochem Pharmacol 39:R15–R19 Zhu YL, Dutschman GE, Liu SH, Bridges EG, Cheng YC (1998) Anti-hepatitis B virus activity and metabolism of 20 ,30 -dideoxy-20 ,30 -didehydro-beta-L(-)-5-fluorocytidine. Antimicrob Agents Chemother 42:1805–1810 Zhu N, Zhang DY, Wang WL, Li XW, Yang B, Song JD, Zhao X, Huang BY, Shi WF, Lu RJ, Niu PH, Zhan FX, Ma XJ, Wang DY, Xu WB, Wu GZ, Gao GGF, Tan WJ (2020) Coronavirus, C. N. A novel coronavirus from patients with pneumonia in China. 2019. N Engl J Med 382:727–733

New Molecular Technologies for Oligonucleotide Therapeutics-1: Properties and Synthesis of Boranophosphate DNAs

70

Kazuki Sato and Takeshi Wada

Contents

Introduction
Properties of PB Oligodeoxyribonucleotides
  Chemical Stability
  Duplex Stability
  Nuclease Resistance
  RNase H Activity
Syntheses of PB Oligodeoxynucleotides
  Challenges of PB Oligonucleotide Syntheses
  Synthesis of PB Oligonucleotides from the Phosphoramidite Monomer Bearing Amino-Protecting Groups is Compatible with a Boronation Reaction
  Synthesis of PB Oligonucleotides Employing a Nucleoside 3′-O-H-Phosphonate
  Synthesis with a P-Boronated Monomer
  Stereoselective Synthesis of PB Oligonucleotides Employing an Oxazaphospholidine Monomer Bearing an Acid-Labile Chiral Auxiliary and Amino-Protecting Groups
Conclusion
References

Abstract

Antisense oligonucleotides are among the most successful modalities of oligonucleotide therapeutics. The significant progress in antisense oligonucleotide research is largely attributed to the development of chemical modifications at the phosphorus and sugar moieties. However, there is still a growing demand for novel chemical modifications that enable the development of potent antisense oligonucleotides with high safety and efficacy. Under these circumstances, boranophosphate oligodeoxynucleotides, which bear borano groups on their phosphorus atoms, are regarded as one of the promising classes of analogs for antisense oligonucleotides. A boranophosphate oligodeoxynucleotide induces RNase H activity and exhibits high nuclease resistance and, notably, low cytotoxicity. The first half of this chapter introduces the details of these properties. Owing to these promising properties, the synthesis of boranophosphate oligodeoxynucleotides has been vigorously investigated. However, the generally used phosphoramidite chemistry is incompatible with this synthesis: the use of N-acyl-protected phosphoramidite monomers causes fatal side reactions during the synthesis. The latter half of the chapter describes several approaches to the synthesis of boranophosphate oligodeoxynucleotides, including a stereoselective one.

Keywords

Boranophosphate DNA · Stereoselective synthesis

K. Sato · T. Wada (*) Department of Medicinal and Life Sciences, Faculty of Pharmaceutical Sciences, Tokyo University of Science, Noda, Chiba, Japan
© Springer Nature Singapore Pte Ltd. 2023 N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_73

Introduction

Oligonucleotide therapeutics generally require chemical modifications to achieve substantial biological stability. Thus, numerous phosphate, sugar, and base modifications have been investigated (Deleavey and Damha 2012). Among the phosphate modifications, the phosphorothioate (PS) modification is the most efficient (Eckstein 2014). In addition to improving the biological stability of oligonucleotide drugs, the PS modification offers desirable pharmacokinetics, which may arise from interactions with certain proteins. However, this strong interaction with proteins causes cytotoxicity (Iannitti et al. 2014; Winkler et al. 2010). Therefore, boranophosphates (PBs) can be considered a prospective alternative for modifying oligonucleotide therapeutics (Li et al. 2007). In particular, PB deoxyribonucleic acids (PB-DNAs) have been well studied as candidates for antisense oligonucleotides. PB-DNAs exhibit higher resistance toward nuclease digestion than phosphorothioate DNAs (PS-DNAs); moreover, they can induce ribonuclease (RNase) H activity and exhibit low cytotoxicity (Hall et al. 1993). However, the complexity of synthesizing PB-DNA is a hurdle that must be overcome. In this chapter, the properties and syntheses of PB-DNA are described in detail. PB-DNA synthesis requires ingenuity, and the approaches fall into four categories employing (1) amino-protecting groups that are compatible with boronation; (2) the H-phosphonate method without amino-protecting groups; (3) P-boronated monomers; and (4) an oxazaphospholidine monomer bearing an acid-labile chiral auxiliary and amino-protecting groups.

Properties of PB Oligodeoxyribonucleotides

Since the first report on the chemical synthesis of PB-DNAs by Sood et al. (1990), PB oligodeoxynucleotides have been synthesized via chemical and enzymatic reactions, and their properties have been investigated (Li et al. 2007). The following subsections present the elucidated properties (chemical stability, duplex stability, nuclease resistance, and RNase H activity) of PB-DNAs.

Chemical Stability

Sood et al. synthesized a dithymidine boranophosphate 1 and investigated its chemical stability (Sood et al. 1990, Fig. 1). The PB diester linkage of 1 survived overnight treatment with concentrated aqueous NH3 at 55 °C. In addition, no more than 10% of compound 1 was converted into its phosphate counterpart by treatment with a 1 N aqueous HCl–MeOH mixture (1:1, v/v) at RT overnight. These results revealed the stability of the internucleotidic boranophosphodiester linkage in both acidic and basic media. Conversely, a boranophosphomonoester exhibited higher susceptibility to hydrolysis. Li et al. reported that the hydrolysis rate constant of 2 was 4 × 10⁻⁴ s⁻¹ and that the first hydrolysis products were thymidine and boranophosphate, indicating that the hydrolysis cleaved the P–O bond rather than the P→B bond (Li et al. 1996, Fig. 1).

Notably, boranophosphodiesters and triesters are susceptible to oxidation by cations, such as the trityl cation. Sergueeva et al. observed that an internucleotidic boranophosphodiester and a triester degraded in the presence of a dimethoxytrityl (DMTr) cation and that the former was more susceptible to the degradation, which converted it into an H-phosphonate diester. Thus, this side reaction must be monitored carefully. Alternatively, a boranophosphodiester can be regarded as a “protected” H-phosphonate. Shimizu et al. and Kawanaka et al. reported the syntheses of phosphate and P-modified analogs from a boranophosphodiester via an H-phosphonate diester (Kawanaka et al. 2007, 2008; Shimizu et al. 2004a, Scheme 1). Since the H-phosphonate diester is a valuable precursor of P-modified analogs, this strategy can be efficient for their synthesis. Further, Paul et al. utilized a boranophosphodiester as a precursor of P-modified analogs. They reported that phosphoramidates were efficiently obtained from the boranophosphodiester in the presence of I2 and an amine (Paul et al. 2015). Gołębiewska et al. extensively investigated the mechanism of the reaction, noting that it proceeded via an H-phosphonate diester (Gołębiewska et al. 2018).
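The acid-stability observation for compound 1 (no more than 10% conversion overnight) can be turned into a rough kinetic bound. The sketch below is illustrative only: it assumes simple first-order decay, and the 16 h duration is an assumption, since the text only says "overnight". The function names are hypothetical.

```python
import math

SECONDS_PER_HOUR = 3600.0


def max_rate_constant(max_fraction_converted: float, hours: float) -> float:
    """Upper bound on a first-order rate constant, in 1/s.

    If at most `max_fraction_converted` of the material reacted within
    `hours`, then exp(-k * t) >= 1 - max_fraction_converted.
    """
    if not 0.0 < max_fraction_converted < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    if hours <= 0.0:
        raise ValueError("duration must be positive")
    return -math.log(1.0 - max_fraction_converted) / (hours * SECONDS_PER_HOUR)


def min_half_life_hours(max_fraction_converted: float, hours: float) -> float:
    """Lower bound on the half-life, in hours, implied by the same observation."""
    k_max = max_rate_constant(max_fraction_converted, hours)
    return math.log(2.0) / k_max / SECONDS_PER_HOUR


if __name__ == "__main__":
    OVERNIGHT_HOURS = 16.0  # assumption: the text only says "overnight"
    k_max = max_rate_constant(0.10, OVERNIGHT_HOURS)
    print(f"k <= {k_max:.2e} 1/s")
    print(f"half-life >= {min_half_life_hours(0.10, OVERNIGHT_HOURS):.0f} h")
```

Under the assumed 16 h, the bound corresponds to a half-life of about 105 h or longer for the diester linkage in the acidic mixture; a different overnight duration scales the result proportionally.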

Fig. 1 Structures of dithymidine boranophosphate 1 and 5′-boranophosphate thymidine 2


Scheme 1 Reaction of the boranophosphodiester 3 and the DMTr cation

Duplex Stability

The stabilities of the duplexes formed by PB-DNA with complementary DNA and RNA have been investigated by several groups. Sergueev et al. reported that the melting temperature (Tm) values of the duplexes of an unmodified dodecaadenylate with phosphate (PO), methylphosphonate, PS, and PB dodecathymidylates were 47 °C, 32 °C, 28 °C, and 14 °C, respectively, indicating that PB-DNA exhibited the lowest hybridization ability toward complementary DNA (Sergueev and Shaw 1998). The hybridization abilities of PO-, PS-, and PB-DNAs comprising four types of nucleobases toward complementary DNA and RNA were investigated by Shimizu et al. (2006) and Uehara et al. (2014). The order of the ability corresponded to that reported by Sergueev et al., i.e., PO > PS > PB-DNA. Conversely, the Tm values of the duplexes of PB-DNA and PS-DNA with the complementary unmodified RNA were comparable (PB-DNA: 29.9 °C, PS-DNA: 32.2 °C) (Uehara et al. 2014). Thus, PB-DNA formed a relatively more stable duplex with complementary RNA than with complementary DNA. Further, circular dichroism measurements demonstrated that a PB-DNA and its complementary RNA formed an A-type duplex (Sato et al. 2019). Considering that a DNA/DNA duplex is typically B-type, PB-DNA might be more prone to forming an A-type duplex than a B-type one. Additionally, the duplex-forming ability of PO/PB-chimeric DNAs was examined, and the Tm value of the duplex of a PO/PB-chimeric DNA and its complementary RNA increased with the number of phosphate linkages (Brummel and Caruthers 2002; Sato et al. 2019; Takahashi et al. 2022).

Notably, the duplex-forming ability of PB-DNA depends on the stereochemistry of its phosphorus atoms. PB-DNAs with Rp linkages exhibit low duplex-forming abilities, whereas Sp PB-DNAs form significantly more stable duplexes with their complementary RNAs. Hara et al. synthesized stereocontrolled PB-DNAs and PS-DNAs bearing Rp and Sp linkages. Notably, the spatial orientation of an Rp PS linkage corresponds to that of an Sp PB linkage because the assignment priority of the atoms is sulfur > oxygen > boron. The order of the duplex-forming ability with a complementary RNA was PO-DNA > (Sp)-PB-DNA > (Rp)-PS-DNA > (Sp)-PS-DNA > (Rp)-PB-DNA (Hara et al. 2019). This result demonstrated that stereochemistry affected the properties of PB-DNA more strongly than those of PS-DNA. Notably, a homouridylate bearing 2′-O-Me and stereocontrolled (Sp)-PB linkages formed a more stable duplex with a homoadenylate than its PO counterpart (Nukaga et al. 2015).
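The statement that an Rp PS linkage and an Sp PB linkage share the same spatial orientation follows from a parity argument: the non-bridging substituent drops from the highest priority (sulfur) to the lowest (boron) while the three oxygens keep their relative order. The following minimal sketch checks this. The slot labels and the relative order among the three oxygens are illustrative assumptions; only "sulfur above oxygen above boron" is taken from the text, and the loop shows that the outcome does not depend on how the oxygens are ordered among themselves.

```python
from itertools import combinations, permutations


def is_odd_permutation(ranks):
    """Return True if `ranks` (distinct values) contains an odd number of inversions."""
    inversions = sum(
        1 for i, j in combinations(range(len(ranks)), 2) if ranks[i] > ranks[j]
    )
    return inversions % 2 == 1


def descriptor_differs(slots_a, priority_a, slots_b, priority_b):
    """Compare two centres whose substituents occupy the same spatial slots.

    `slots_*` lists the substituent in each fixed spatial position;
    `priority_*` lists the substituents from highest to lowest priority.
    The R/S label differs exactly when the two rank patterns have
    opposite permutation parity.
    """
    ranks_a = [priority_a.index(s) for s in slots_a]
    ranks_b = [priority_b.index(s) for s in slots_b]
    return is_odd_permutation(ranks_a) != is_odd_permutation(ranks_b)


# Same spatial arrangement; only the first slot changes from S to BH3.
PS_SLOTS = ["S", "O3'", "O5'", "O"]
PB_SLOTS = ["BH3", "O3'", "O5'", "O"]

if __name__ == "__main__":
    # The relative order of the three oxygens is an assumption, so try all six.
    for oxygens in permutations(["O3'", "O5'", "O"]):
        ps_priority = ["S", *oxygens]    # sulfur outranks every oxygen
        pb_priority = [*oxygens, "BH3"]  # boron ranks below every oxygen
        print(oxygens, descriptor_differs(PS_SLOTS, ps_priority, PB_SLOTS, pb_priority))
```

Every ordering reports that the descriptor differs, which is why stereochemical comparisons between PS and PB linkages pair Rp with Sp rather than like with like.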

Nuclease Resistance

The biological stability of PB-DNAs has been investigated by several groups. In preliminary experiments, Sood et al. demonstrated that dithymidine boranophosphate was stable toward digestion by calf spleen phosphodiesterase and snake venom phosphodiesterase (SVP). They also confirmed that >90% of the phosphate counterpart was cleaved under the same conditions (Sood et al. 1990). The nuclease resistance of PB-DNA oligomers has also been reported. Sergueev et al. compared the stabilities of PO, PS, and PB dodecathymidylates in the presence of different nucleases (SVP, bovine spleen phosphodiesterase (BSP), S1 nuclease, and P1 nuclease). PB dodecathymidylate exhibited a longer half-life than its PS counterpart, except when treated with BSP (Sergueev and Shaw 1998). The nuclease resistance of chimeric oligonucleotides bearing PB linkages has also been investigated. McCuen et al. reported that PO/PB-chimeric DNAs bearing PB linkages at every second or every third position demonstrated substantial resistance against DNase I digestion. Notably, this resistance was higher than that of the PO/PS chimeric counterparts (McCuen et al. 2008). Conversely, Takahashi et al. reported that a PB/PS/PO chimeric DNA and a 2′-OMe gapmer comprising three or four consecutive phosphate linkages in the central region were completely degraded by SVP. Although SVP is a representative 3′-exonuclease, it could act as an endonuclease and facilitate the digestion of consecutive phosphate linkages. Notably, PB/PS chimeric DNA remained largely intact after the SVP treatment (Takahashi et al. 2022).

RNase H Activity

RNase H is a nuclease that is mainly located in the nucleus. It plays a crucial role in the expression of antisense activity. RNase H recognizes a DNA/RNA duplex and then selectively cleaves the RNA strand. The ability to induce RNase H activity is therefore among the most crucial properties of RNase H-dependent antisense oligonucleotides. Phosphorus modifications that remove the negative charge on the phosphate moiety, e.g., a methylphosphonate, impede RNase H activity. Since a PB linkage bears a negative charge, PB-DNAs are expected to induce RNase H activity. Zhu et al. reported that PB tetradecathymidylate induced an E. coli RNase H-mediated cleavage of PO tetradecaadenylate and that the rate was ten times lower than that of PO tetradecathymidylate (Zhu et al. 1997). PB-DNAs typically exhibit lower RNase H activities than their unmodified and PS counterparts. The effect of chirality control was also investigated, revealing that stereocontrolled PB-DNAs comprising Sp linkages can induce E. coli RNase H-mediated RNA cleavage, whereas their Rp counterparts cannot. The order of E. coli RNase H activity was PO-DNA > (Rp)-PS-DNA > (Sp)-PB-DNA > (Sp)-PS-DNA > (Rp)-PB-DNA (Hara et al. 2019).

Syntheses of PB Oligodeoxynucleotides

Challenges of PB Oligonucleotide Syntheses

Oligonucleotides are generally synthesized via the phosphoramidite method, which was developed by Caruthers et al. A typical procedure for synthesizing an oligonucleotide bearing phosphate internucleotidic linkages is as follows (Scheme 2): a phosphoramidite monomer whose exocyclic amino groups are protected by acyl groups is condensed with a hydroxy group of another nucleoside or nucleotide on a solid support in the presence of an acidic activator, such as tetrazole (condensation). Thereafter, the unreacted hydroxy groups are capped with acetic anhydride (capping). The resultant phosphite is then oxidized into a phosphotriester by iodine and water (oxidation), followed by the removal of the DMTr group on the 5′-hydroxy group under acidic conditions (detritylation). These condensation, capping, oxidation, and detritylation steps are repeated until the desired length is achieved. The oligomers on the solid support are then deprotected and released from the support by treatment with aqueous ammonia or an aqueous amine solution to afford fully deprotected oligonucleotides (Beaucage and Iyer 1992).

Scheme 2 Synthesis of PO or PS-DNAs on a solid support via the phosphoramidite method

Although replacing the oxidation steps with boronation would, in principle, afford PB oligonucleotides, this is a very challenging task. The acyl-type protecting groups on the amino groups of the nucleobases cannot withstand the boronation conditions; they are reduced into alkyl groups, which cannot be removed (Sergueeva et al. 2001). Additionally, PB linkages are susceptible to the DMTr cations that are generated in the detritylation steps, as already described. Thus, the synthesis of PB oligomers requires a specific strategy. As already described, there are four representative strategies for synthesizing PB oligonucleotides, employing (1) amino-protecting groups that are compatible with the boronation reaction; (2) the H-phosphonate method without amino-protecting groups; (3) P-boronated monomers; and (4) an oxazaphospholidine monomer bearing an acid-labile chiral auxiliary and amino-protecting groups. These strategies are explained in the following subsections.
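The four-step phosphoramidite cycle described in this subsection can be written out as a small planning routine. This is a bookkeeping sketch only: the step names follow the text, whereas the function name, the data layout, and the assumption that the first nucleoside is already support-bound with a free hydroxy group are illustrative choices. No chemistry (yields, reagents, side reactions) is modelled.

```python
from typing import List, Optional, Tuple

# Step names follow the description in the text.
CYCLE_STEPS = ("condensation", "capping", "oxidation", "detritylation")


def plan_synthesis(
    length: int, p_step: str = "oxidation"
) -> List[Tuple[Optional[int], str]]:
    """List the operations for a `length`-mer as (cycle number, step) pairs.

    Assumes the first nucleoside is already on the support with a free
    hydroxy group, so `length - 1` cycles are required.  `p_step` names the
    step applied to the phosphite; "oxidation" gives phosphate linkages.
    """
    if length < 2:
        raise ValueError("at least two nucleosides are needed to form a linkage")
    steps = tuple(p_step if s == "oxidation" else s for s in CYCLE_STEPS)
    plan: List[Tuple[Optional[int], str]] = [
        (cycle, step) for cycle in range(1, length) for step in steps
    ]
    # The final treatment is performed once, outside the repeated cycle.
    plan.append((None, "deprotection and release from support"))
    return plan


if __name__ == "__main__":
    for cycle, step in plan_synthesis(3):
        print("final" if cycle is None else f"cycle {cycle}", "-", step)
```

Calling `plan_synthesis(n, p_step="boronation")` expresses the naive substitution discussed above; the routine itself cannot capture why that substitution fails with acyl protection or in the presence of DMTr cations.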

Synthesis of PB Oligonucleotides from the Phosphoramidite Monomer Bearing Amino-Protecting Groups is Compatible with a Boronation Reaction

Sood et al. first synthesized dithymidine boranophosphate with a 3′-O-phosphoramidite (11) (Scheme 3; Sood et al. 1990). Compound 11 was activated by tetrazole in MeCN and condensed with the 5′-hydroxy group of another molecule to yield a phosphite triester derivative (13). The phosphite triester was converted into its boranophosphotriester counterpart (14) via treatment with a BH3•SMe2 complex. Notably, the 5′-O-DMTr group was removed under the boronation conditions. The methyl group on the PB moiety and the 3′-O-Ac group were removed by treatment with aqueous ammonia to produce the fully deprotected dithymidine boranophosphate (Sood et al. 1990).

Scheme 3 Synthesis of dithymidine boranophosphate (15)

Despite this successful synthesis, the exocyclic amino groups on the nucleobases (adenine, cytosine, and guanine) must still be protected to avoid side reactions with an activated phosphoramidite monomer. However, as mentioned above, the typically employed acyl-type amino-protecting groups are not compatible with the boronation conditions. McCuen et al. utilized trityl derivatives to protect the amino groups, thereby completely suppressing the side reactions on the nucleobases (McCuen et al. 2006, 2008). Furthermore, Roy et al. synthesized 5′-O-DMTr 3′-O-methyl N,N-diisopropyl phosphoramidite nucleosides bearing di-t-butylisobutylsilyl (BIBS) protecting groups on the exocyclic amino groups (Roy et al. 2013). The outline of the synthesis was as follows: the phosphoramidite monomer was activated by tetrazole and condensed with a hydroxy group. After the unreacted hydroxy groups were capped with acetic anhydride, the resulting phosphite was boronated by a BH3•THF complex. The DMTr group on the 5′-position was removed under acidic conditions in the presence of trimethyl boranophosphate as a trityl-cation scavenger to suppress the side reaction on the borano groups. These cycles were repeated, and the methyl groups on the PB linkages were then removed by treatment with disodium 2-carbamoyl-2-cyanoethylene-1,1-dithiolate. Thereafter, the silyl protecting groups on the exocyclic amino groups were removed by a mixture of Et3N and HF, followed by the release of the oligomer from the solid support. PB-DNAs of up to 24mer were obtained. Notably, this method also afforded PB/phosphate (PB/PO) chimeric DNAs simply by replacing the boronation step with oxidation employing t-BuOOH for the phosphate linkages. To date, this method has yielded the longest PB-DNAs and PB/PO chimeric DNAs (Roy et al. 2013, Scheme 4).
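Because the chimeric synthesis differs from the all-PB synthesis only in the step applied to each phosphite, a linkage pattern maps directly onto a list of conversion steps. The sketch below illustrates that mapping under stated assumptions: the reagent labels are taken from the text, while the function names, the "PB"/"PO" tokens, and the convention that the pattern is listed in order of assembly are hypothetical.

```python
from typing import Dict, List

# Phosphite-conversion step for each linkage type, as named in the text.
P_CONVERSION: Dict[str, str] = {
    "PB": "boronation (BH3•THF)",
    "PO": "oxidation (t-BuOOH)",
}


def periodic_pattern(n_linkages: int, period: int) -> List[str]:
    """Place a PB linkage at every `period`-th position and PO elsewhere."""
    if n_linkages < 1 or period < 1:
        raise ValueError("n_linkages and period must be positive")
    return ["PB" if (i + 1) % period == 0 else "PO" for i in range(n_linkages)]


def conversion_steps(pattern: List[str]) -> List[str]:
    """Map each linkage, in order of assembly, to its phosphite-conversion step."""
    unknown = set(pattern) - set(P_CONVERSION)
    if unknown:
        raise ValueError(f"unknown linkage type(s): {sorted(unknown)}")
    return [P_CONVERSION[link] for link in pattern]


if __name__ == "__main__":
    pattern = periodic_pattern(n_linkages=5, period=2)
    for number, (link, step) in enumerate(zip(pattern, conversion_steps(pattern)), 1):
        print(f"linkage {number}: {link} via {step}")
```

Setting `period=1` gives the fully boronated case, so the same routine covers both the PB-DNA and the PB/PO chimera descriptions.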

Scheme 4 Solid-phase synthesis of PB-DNAs and PB/PO chimeric DNAs employing a phosphoramidite monomer bearing silyl-type amino-protecting groups


Synthesis of PB Oligonucleotides Employing a Nucleoside 30 -O-HPhosphonate Higson et al., Zhang et al., and Sergueev et al. almost simultaneously reported the synthesis of PB oligonucleotides, which were longer than a decamer, via the H-phosphonate method (Higson et al. 1998; Sergueev and Shaw 1998; Zhang et al. 1997). The typical procedure for synthesizing oligonucleoside H-phosphonates is, as follows (Scheme 5): a nucleoside 30 -H-phosphonate is employed as the monomer unit, which is condensed with a hydroxy group on a solid support, to form an internucleotidic H-phosphonate diester linkage in the presence of a condensing reagent, such as PivCl. The cycles, which comprised condensation and detritylation, were repeated to obtain an oligomer with the desired length. The internucleotidic H-phosphonate linkages were simultaneously converted into the corresponding silylphosphites by silylating reagents, such as N,O-bis (trimethylsilyl)acetamide (BSA), followed by boronation to yield silyl ester of PBs. The treatment with ammonia afforded the desired PB-DNA, and PB homothymidylates of up to 15mer were obtained via this strategy. In 2001, Lin et al. employed the H-phosphonate method to synthesize PB-DNAs bearing four kinds of nucleobases. They utilized N-unprotected 30 -H-phosphonate monomers to synthesize the oligomer, thereby avoiding the side reactions on the nucleobases (Lin and Shaw 2001). Wada et al. reported the condensation reaction of an N-unprotected 30 -H-phosphonate nucleoside via an O-selective process employing a phosphonium-type condensing reagent, such as 2-(benzotriazol-1-yloxy)-1,3-dimethyl2-pyrrolidin-1-yl-1,3,2-diazaphospholidinium hexafluorophosphate (BOMP; Wada et al. 1997). Sergueev et al. 
employed this strategy to synthesize an N-unprotected nucleoside H-phosphonate on a solid support and converted the internucleotidic linkages into their PB counterparts upon the treatment of BSA and a borane–DIPEA complex, obtaining up to 10mer (dTPBCPBAPBAPBCPBGPBTPBTPBGPBA). However, they reported that the synthetic mixture contained a substantial amount of shorter-thanexpected oligomers and that the yield was moderate probably because of the

Scheme 5 Solid-phase synthesis of PB-DNAs via the H-phosphonate method


K. Sato and T. Wada

susceptibility of the H-phosphonate diester linkage to attack by a 5′-hydroxy group, particularly under basic conditions. They also reported that, although an N-unprotected nucleobase generally forms a complex with borane, this complex dissociated upon treatment with an ammonia solution (Sergueev and Shaw 1998).
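The H-phosphonate route described above can be summarized as an ordered list of operations. The following Python sketch is purely illustrative: the function name, the step labels, and the assumption that the first nucleoside is already anchored to the solid support (so that an n-mer requires n − 1 condensation/detritylation cycles) are ours and are not taken from the cited reports.

```python
def h_phosphonate_route(n_mer):
    """List the steps for assembling a PB-DNA n-mer via the H-phosphonate route.

    Assumption (not from the cited reports): the first nucleoside is already
    attached to the solid support, so n_mer - 1 chain-elongation cycles are run.
    """
    if n_mer < 2:
        raise ValueError("at least a dimer is needed to form one linkage")
    steps = []
    for cycle in range(1, n_mer):
        steps.append(f"cycle {cycle}: condensation with a 3'-H-phosphonate monomer")
        steps.append(f"cycle {cycle}: detritylation")
    # All internucleotidic H-phosphonate diesters are converted together
    # after chain assembly, as described in the text.
    steps.append("silylation (e.g., BSA)")
    steps.append("boronation")
    steps.append("ammonia treatment")
    return steps


if __name__ == "__main__":
    for step in h_phosphonate_route(4):
        print(step)
```

The point of the sketch is that boronation is a single post-assembly operation in this route, which is why the nucleobases are exposed to the boronating reagent.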

Synthesis with a P-Boronated Monomer

Shimizu et al. developed a synthetic strategy employing P-boronated monomers to allow the utilization of acyl-type amino-protecting groups. First, they investigated the synthesis of PB-DNAs via the boranophosphotriester method (Scheme 6; Shimizu et al. 2006). A 3′-boranophosphodiester monomer, bearing a characteristic P→BH3 moiety, was condensed with a hydroxy group of a nucleoside or nucleotide in the presence of a condensing reagent to form an internucleotidic boranophosphotriester linkage. Thereafter, the DMTr group on the 5′-hydroxy group was removed under acidic conditions in the presence of a trityl-cation scavenger, such as Et3SiH, to circumvent side reactions on the borano group. After these steps had been repeated the designated number of times, the protecting groups on the internucleotidic boranophosphotriesters were removed to give the diester counterparts. Afterward, the oligomers were deprotected and released from the solid support. Since this strategy introduces the borano group on the phosphorus atom beforehand, it does not require a boronation step during the oligomer synthesis, thus allowing the utilization of acyl-type amino-protecting groups.

First, Wada et al. examined the synthesis of the monomer units (Scheme 7). The reaction of a nucleoside bearing a free 5′-hydroxy group with triethylammonium dimethyl PB as the boranophosphorylation reagent, employing bis(2-oxo-3-oxazolidin-1-yl)phosphinic chloride (Bop-Cl) as the

Scheme 6 Solid-phase synthesis of PB-DNAs employing P-boronated monomers (boranophosphodiesters)

70

New Molecular Technologies for Oligonucleotide Therapeutics-1:. . .


Scheme 7 Synthesis of the boranophosphodiester monomers

condensing reagent in the presence of iPr2NEt proceeded swiftly and afforded boranophosphotriesters (31) in good yields. Next, one of the methyl groups was removed by a benzenethiolate anion to yield the diester counterparts (32; Wada et al. 2002). Subsequently, 32 was condensed with a thymidine derivative bearing a free 5′-OH group in the presence of Bop-Cl (the condensing reagent), 3-nitro-1,2,4-triazole (NT, a nucleophilic catalyst), and iPr2NEt. Although the reactions employing the 3′-boranophosphodiester of deoxycytidine and the thymidine derivatives proceeded efficiently, those employing purine derivatives gave only moderate yields. These results were attributed to side reactions of the activated PB monomers with the O4 and O6 positions of thymidine and deoxyguanosine, respectively. Because the steric hindrance of the purine bases reduces the reactivity of the monomer unit, these side reactions become significant. Thus, Wada et al. synthesized N3-benzoyl thymidine and O6-diphenylcarbamoyl deoxyguanosine derivatives to suppress the side reactions. Notably, the dimers were successfully obtained in good isolated yields (Scheme 8). These results demonstrated that N3-protected thymidine and O6-protected deoxyguanosine derivatives can be employed in the boranophosphotriester method (Shimizu et al. 2004b). Shimizu et al. also applied the boranophosphotriester method to solid-phase synthesis and observed that 3-nitro-1,2,4-triazol-1-yl-tris(pyrrolidin-1-yl)phosphonium hexafluorophosphate (PyNTP) (Oka et al. 2006), a phosphonium-type condensing reagent, facilitated the efficient condensation of a boranophosphodiester with a hydroxy group. PyNTP carries NT as its leaving group; once released, NT acts as a nucleophilic catalyst for the condensation reaction. The potent reactivity of PyNTP allows for the utilization of weaker bases, such as


Scheme 8 Synthesis of the PB-DNA dimers employing boranophosphodiester monomers

2,6-lutidine and 2,4,6-collidine, as well as a cyanoethyl group as the protecting group for the PB moiety. Therefore, an N4-Bz-cytosine 3′-boranophosphodiester bearing a cyanoethyl group was condensed with the 5′-OH group of thymidine on the solid support employing PyNTP in the presence of 2,6-lutidine. Thereafter, the DMTr group on the 5′-position was removed via treatment with 3% DCA in the presence of Et3SiH as the trityl-cation scavenger. After the free 5′-position had been capped with an acetyl group, removal of the cyanoethyl group by DBU, followed by removal of the protecting groups on the nucleobases and cleavage of the linker by ammonia treatment, afforded a dimer bearing a PB linkage. However, the HPLC yield of the dimer, as determined by the area ratio of CPBT/(CPBT + T), was 78%, indicating that the reaction conditions were not suitable for the synthesis of oligomers. Therefore, it was desirable to develop a more potent condensing reagent. In addition to the leaving group of a condensing reagent, it has been proposed that the structure of the phosphonium framework strongly affects the rate of condensation reactions. Thus, Oka et al. developed 1,3-dimethyl-2-(3-nitro-1,2,4-triazol-1-yl)-2-pyrrolidin-1-yl-1,3,2-diazaphospholidinium hexafluorophosphate (MNTP) as a novel condensing reagent and observed that it improved the HPLC yield (Oka et al. 2006), although double coupling with 2,4,6-collidine or single coupling with 1,8-bis(dimethylamino)naphthalene (DMAN) as the base gave better results. The syntheses of a tetramer (d(CPBAPBGPBT)) and a dodecamer (d((CPBAPBGPBTPB)2CPBAPBGPBT)) were attempted under these reaction conditions; the tetramer and dodecamer were isolated in 30% and 16% yields, respectively (Scheme 9).

Notably, the procedure for deprotecting PB-DNAs was crucial to the reaction outcome, and the 5′-end hydroxy group of the oligomer was capped with an acetyl group before the removal of the cyanoethyl group on the internucleotidic linkage(s). Omitting the capping step caused a significant backbiting side reaction, in which the 5′-end hydroxy group attacked the adjacent 3′-internucleotidic linkage (Shimizu et al. 2006).

As described above, the boranophosphotriester method afforded medium-length PB-DNA. However, the efficiency of the condensation reactions was not sufficiently high. Thus, Higashida et al. and Uehara et al. developed a more efficient method employing a monomer unit with higher reactivity.
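To illustrate why a 78% coupling yield was judged unsuitable for oligomer synthesis, the short Python sketch below computes the HPLC yield from peak areas (product area divided by the sum of product and unreacted-thymidine areas, as in the text) and then compounds a uniform per-coupling yield over the n − 1 couplings of an n-mer. The function names and the simplifying assumptions (identical yield at every coupling, no losses in deprotection or purification) are ours and serve only as a rough estimate.

```python
def hplc_yield(area_product, area_unreacted):
    """Yield from HPLC peak areas: product / (product + unreacted starting material)."""
    return area_product / (area_product + area_unreacted)


def compounded_yield(per_coupling_yield, n_mer):
    """Upper-bound overall yield of an n-mer if every coupling has the same yield.

    Assumption: n_mer - 1 couplings, each independent, with no other losses.
    """
    if n_mer < 2:
        raise ValueError("an n-mer needs at least one coupling")
    return per_coupling_yield ** (n_mer - 1)


if __name__ == "__main__":
    # Hypothetical peak areas chosen to reproduce the 78% figure in the text.
    y = hplc_yield(78.0, 22.0)
    for n in (2, 4, 12):
        print(f"{n}-mer: {compounded_yield(y, n):.3f}")
```

Under these assumptions, a per-coupling yield of 78% would leave only about 6.5% of full-length dodecamer after 11 couplings, which is consistent with the need for a more potent condensing reagent.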


Scheme 9 Solid-phase synthesis of PB-DNAs with boranophosphodiester monomers

Generally, as demonstrated by the higher reactivities of phosphoramidites and H-phosphonates compared with phosphodiesters, a phosphorus atom with a lower oxidation number exhibits higher electrophilicity. Thus, Higashida et al. and Uehara et al. focused on an H-boranophosphonate monoester, which exhibits a characteristic H–P→BH3 structure. Since an H-boranophosphonate monoester comprises a phosphorus atom whose oxidation state is +1, the reactivity of the monomer can be higher than that of a PB diester. Additionally, the hydrogen atom of the H–P moiety in the H-boranophosphonate can be replaced by a heteroatom or functional group, such as an oxygen atom, a sulfur atom, or an amino group, and this feature offers the opportunity to synthesize many P-boronated derivatives. First, the synthesis of the monomers was attempted employing pyridinium H-boranophosphonate and the 3′-OH nucleosides. The reaction conditions employing Bop-Cl and pyridine as the condensing reagent and solvent, respectively, were effective, and the 3′-H-boranophosphonates of deoxyadenosine, deoxycytidine, deoxyguanosine, and thymidine bearing suitable protecting groups on the nucleobases were successfully obtained (Higashida et al. 2009). Subsequently, P-boronated oligonucleotides were synthesized employing the monomers. The monomer was condensed with a hydroxy group on a solid support, after which the 5′-DMTr group was removed under acidic conditions in the presence of Et3SiH. These cycles were repeated to obtain an oligomer of the desired length. Next, the internucleotidic linkages were oxidized into boranophosphodiesters. The resulting oligomer was treated with ammonia to deprotect it and release it from the support (Scheme 10). Thus, PB-DNAs of up to 12mer were successfully isolated, and the yields were higher than those obtained via the boranophosphotriester approach. This is attributable to the higher reactivity of the H-boranophosphonate monoester compared with that of its boranophosphodiester counterpart (Uehara et al. 2014). In addition to oligonucleotides whose internucleotidic linkages are all PB diesters, the synthesis of chimeric oligonucleotides has been attempted (Sato et al. 2019; Takahashi et al. 2022). Takahashi et al. utilized H-boranophosphonate, H-phosphonothioate, and H-phosphonate monoesters to construct internucleotidic


Scheme 10 Solid-phase synthesis of PB-DNAs employing the H-boranophosphonate monomers

H-boranophosphonate, H-phosphonothioate, and H-phosphonate diester linkages, and these linkages were simultaneously oxidized into PB, PS, and phosphate linkages, respectively, by treatment with CCl4 and H2O in the presence of a base. The most challenging task in these syntheses was the chemoselective condensation of the H-phosphonothioate, which contains two nucleophilic atoms (oxygen and sulfur). Activation of the oxygen atom leads to the H-phosphonothioate diester, whereas activation of the sulfur atom leads to the H-phosphonate diester. Thus, the oxygen atom must be selectively activated (Zain and Stawiński 1996). Phosphonium-type condensing reagents bearing 3-nitro-1,2,4-triazole were found to be effective for the chemoselective condensation reaction, and PyNTP was more potent than MNTP. Employing three kinds of monomers and suitable condensing reagents, Takahashi et al. obtained PB/PS/PO chimeric DNAs of up to 12mer. This is the first example of the synthesis of an oligodeoxynucleotide bearing both PB and PS modifications (Scheme 11; Takahashi et al. 2022).
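The one-pot oxidation described above amounts to a fixed mapping from each precursor linkage to its oxidized product. The Python sketch below records that mapping and applies it to a linkage pattern; the dictionary name, the function name, and the example pattern are hypothetical and are used only to make the bookkeeping explicit.

```python
# Precursor diester linkage -> linkage obtained after the simultaneous oxidation,
# following the correspondence stated in the text.
OXIDATION_PRODUCT = {
    "H-boranophosphonate": "PB",
    "H-phosphonothioate": "PS",
    "H-phosphonate": "PO",
}


def oxidize_backbone(precursor_linkages):
    """Map every precursor linkage of an oligomer to its oxidized counterpart."""
    return [OXIDATION_PRODUCT[linkage] for linkage in precursor_linkages]


if __name__ == "__main__":
    # Hypothetical pattern for a tetramer (three internucleotidic linkages).
    pattern = ["H-boranophosphonate", "H-phosphonothioate", "H-phosphonate"]
    print(oxidize_backbone(pattern))
```

Because the three precursor types are oxidized in a single step, the final PB/PS/PO pattern is fixed entirely by which monomer is coupled at each position during chain assembly.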

Stereoselective Synthesis of PB Oligonucleotides Employing an Oxazaphospholidine Monomer Bearing an Acid-Labile Chiral Auxiliary and Amino-Protecting Groups

The physicochemical and biological properties of P-modified oligonucleotides are significantly affected by the stereochemistry of the phosphorus atom. Thus, the stereocontrolled synthesis of P-modified oligonucleotides represents a promising strategy for obtaining improved antisense oligonucleotides. Sergueeva et al. reported the stereoselective synthesis of a PS-DNA employing an oxazaphospholidine derivative (Sergueeva et al. 1999). Oxazaphospholidine is a


Scheme 11 Solid-phase synthesis of PB/PS/PO-DNAs

Fig. 2 Structures of the oxazaphospholidine monomers

cyclic phosphoramidite derivative bearing a chiral auxiliary, and Sergueeva et al. synthesized it from ephedrine (Iyer et al. 1995; Fig. 2). Oka et al. investigated suitable oxazaphospholidine structures and observed that a bicyclic structure bearing a pyrrolidine ring, together with a Ph group at the 5-position of the oxazaphospholidine ring, conferred configurational stability at phosphorus and high stereoselectivity in the condensation reactions (Oka et al. 2008). Additionally, N-cyanomethyl pyrrolidinium triflate (CMPT) has been found to be a suitable acidic activator for stereocontrolled syntheses because of its non-nucleophilic nature (Oka et al. 2003). Oka et al. achieved the stereocontrolled synthesis of PS-DNAs and RNAs of up to 12mer (Oka et al. 2008). However, as mentioned above, the synthesis of stereocontrolled PB oligonucleotides requires a specific strategy because acyl-type protecting groups on the exocyclic nucleobase amino groups are incompatible with boronation. Iwamoto et al. developed a method for synthesizing stereocontrolled PB-DNAs with acid-labile chiral auxiliaries and amino-protecting groups (Iwamoto et al.


2012). They designed an oxazaphospholidine monomer bearing phenyl and methyl groups at the 5-position of the oxazaphospholidine ring (Iwamoto et al. 2009). Following the condensation reaction of the monomer with a hydroxy group, the phosphite intermediate was converted into the corresponding H-phosphonate diester, with liberation of a tertiary carbocation, under the acidic conditions used to remove the DMTr group on the 5′-hydroxy group. Further, the exocyclic amino groups were protected by trimethoxytrityl or dimethoxytrityl groups, which were also removed under acidic conditions. The O6 position of deoxyguanosine was protected with a 2-(trimethylsilyl)ethyl group (Gaffney and Jones 1982), which was removed in the detritylation step (Iwamoto et al. 2012). After the condensation and detritylation cycles had been repeated the designated number of times, the internucleotidic H-phosphonate diester linkages were subjected to boronation and treated with a silylating reagent to yield PB derivatives (Iwamoto et al. 2012, Scheme 12). The key point of the strategy is that the condensation reaction of the oxazaphospholidine monomer proceeded even when the exocyclic amino groups were unprotected; if the monomer reacted with the amino group, the intramolecular nucleophilic attack from the amino group of the chiral auxiliary would reverse the reaction (Oka et al. 2010). They synthesized a stereocontrolled 4mer comprising a full set of nucleobases, PB homothymidylate (Iwamoto et al. 2012), and 2′-O-Me PB homouridylate (Nukaga et al. 2015). However, the synthesis of oligomers longer than a 4mer and containing all four nucleobases was challenging because of the low coupling efficiency owing to the

Scheme 12 Stereocontrolled solid-phase synthesis of PB-DNAs employing the oxazaphospholidine monomer (48) bearing an acid-labile chiral auxiliary and amino-protecting groups

[Scheme 13 graphic not reproduced. Recoverable labels: condensation (activator CMPT); N-, O-, and P-deprotection under acidic conditions; silylation; 1) boronation, 2) release from solid support; compounds 52–56; protected nucleobases (BPRO): AMCbz, CMCbz, Gtse, T.]

Scheme 13 Stereocontrolled solid-phase synthesis of PB-DNAs employing the oxazaphospholidine monomer (53) bearing the acid-labile chiral auxiliary and amino-protecting groups

steric hindrance of the oxazaphospholidine ring and/or the trityl-type protecting groups on the nucleobases. Hara et al. developed another strategy and synthesized an oxazaphospholidine monomer bearing a methoxyphenyl group at the 5-position of the oxazaphospholidine ring (Hara et al. 2019, Scheme 13). The introduction of an electron-donating group on the phenyl group enabled the formation of a secondary carbocation. Additionally, the amino groups of adenine and cytosine were protected with 4-methoxybenzyloxycarbonyl (MCbz) groups, which were successfully removed under the detritylation conditions. The amino group of guanine was left unprotected because the MCbz group was difficult to introduce, and the O6 position was protected by the 2-(trimethylsilyl)ethyl group to ensure the solubility of the monomer in organic solvents. The monomers afforded dodecamers bearing four kinds of nucleobases in a stereoselective manner (Hara et al. 2019).
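To put the value of stereocontrol at the dodecamer level in perspective, the Python sketch below counts the P-diastereomers that a non-stereocontrolled synthesis could produce and estimates the fraction of the single desired isomer for a given per-linkage selectivity. It assumes that every internucleotidic linkage of an n-mer is a stereogenic PB linkage (n − 1 centers) and that each coupling is stereochemically independent; the function names and the selectivity values are illustrative and are not taken from the cited work.

```python
def diastereomer_count(n_mer):
    """Number of possible P-diastereomers when all n_mer - 1 linkages are stereogenic."""
    if n_mer < 2:
        raise ValueError("an n-mer needs at least one internucleotidic linkage")
    return 2 ** (n_mer - 1)


def stereopure_fraction(per_linkage_selectivity, n_mer):
    """Fraction of the single desired diastereomer.

    Assumption: each linkage forms with the same selectivity, independently
    of the others (0.5 corresponds to no stereocontrol at all).
    """
    return per_linkage_selectivity ** (n_mer - 1)


if __name__ == "__main__":
    n = 12  # dodecamer: 11 internucleotidic linkages
    print("possible diastereomers:", diastereomer_count(n))
    for s in (0.5, 0.99):  # hypothetical selectivities
        print(f"selectivity {s}: desired isomer fraction {stereopure_fraction(s, n):.4f}")
```

With no stereocontrol, a fully PB-modified dodecamer is a mixture of up to 2048 diastereomers, which is why monomers that fix the configuration at each coupling are needed to study defined stereoisomers.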

Conclusion

Since the first report of the synthesis of dithymidine boranophosphate in 1990, different groups have reported synthetic methods for PB-DNAs and their properties. The difficulty of synthesizing PB-DNAs still hampers their application as antisense oligonucleotides, although efficient synthetic methods have now been established. Additionally, since the syntheses of chimeric DNAs bearing PB linkages


and stereocontrolled PB-DNAs are becoming possible, the research on the properties of PB-DNAs as antisense oligonucleotides is expected to progress in the coming decades.

References

Beaucage SL, Iyer RP (1992) Advances in the synthesis of oligonucleotides by the phosphoramidite approach. Tetrahedron 48(12):2223–2311. https://doi.org/10.1016/S0040-4020(01)88752-4
Brummel HA, Caruthers MH (2002) Chemical synthesis of an oligodeoxythymidylate containing boranephosphate and phosphate linkages. Tetrahedron Lett 43(5):749–751. https://doi.org/10.1016/S0040-4039(01)02278-X
Deleavey GF, Damha MJ (2012) Designing chemically modified oligonucleotides for targeted gene silencing. Chem Biol 19(8):937–954. https://doi.org/10.1016/j.chembiol.2012.07.011
Eckstein F (2014) Phosphorothioates, essential components of therapeutic oligonucleotides. Nucleic Acid Ther 24(6):374–387. https://doi.org/10.1089/nat.2014.0506
Gaffney BL, Jones RA (1982) A new strategy for the protection of deoxyguanosine during oligonucleotide synthesis. Tetrahedron Lett 23(22):2257–2260. https://doi.org/10.1016/S0040-4039(00)87315-3
Gołębiewska J, Rachwalak M, Jakubowski T, Romanowska J, Stawinski J (2018) Reaction of boranephosphonate diesters with amines in the presence of iodine: the case for the intermediacy of H-phosphonate derivatives. J Org Chem 83(10):5496–5505. https://doi.org/10.1021/acs.joc.8b00419
Hall IH, Burnham BS, Rajendran KG, Chen SY, Sood A, Spielvogel BF, Shaw BR (1993) Hypolipidemic activity of boronated nucleosides and nucleotides in rodents. Biomed Pharmacother 47(2–3):79–87. https://doi.org/10.1016/0753-3322(93)90295-V
Hara RI, Saito T, Kogure T, Hamamura Y, Uchiyama N, Nukaga Y, Iwamoto N, Wada T (2019) Stereocontrolled synthesis of boranophosphate DNA by an oxazaphospholidine approach and evaluation of its properties. J Org Chem 84(12):7971–7983. https://doi.org/10.1021/acs.joc.9b00658
Higashida R, Oka N, Kawanaka T, Wada T (2009) Nucleoside H-boranophosphonates: a new class of boron-containing nucleotide analogues. Chem Commun (18):2466. https://doi.org/10.1039/b901045a
Higson AP, Sierzchala A, Brummel H, Zhao Z, Caruthers MH (1998) Synthesis of an oligothymidylate containing boranophosphate linkages. Tetrahedron Lett 39(23):3899–3902. https://doi.org/10.1016/S0040-4039(98)00687-X
Iannitti T, Morales-Medina J, Palmieri B (2014) Phosphorothioate oligonucleotides: effectiveness and toxicity. Curr Drug Targets 15(7):663–673. https://doi.org/10.2174/1389450115666140321100304
Iwamoto N, Oka N, Sato T, Wada T (2009) Stereocontrolled solid-phase synthesis of oligonucleoside H-phosphonates by an oxazaphospholidine approach. Angew Chem Int Ed 48(3):496–499. https://doi.org/10.1002/anie.200804408
Iwamoto N, Oka N, Wada T (2012) Stereocontrolled synthesis of oligodeoxyribonucleoside boranophosphates by an oxazaphospholidine approach using acid-labile N-protecting groups. Tetrahedron Lett 53(33):4361–4364. https://doi.org/10.1016/j.tetlet.2012.06.015
Iyer RP, Yu D, Ho N-H, Tan W, Agrawal S (1995) A novel nucleoside phosphoramidite synthon derived from 1R, 2S-ephedrine. Tetrahedron Asymmetry 6(5):1051–1054. https://doi.org/10.1016/0957-4166(95)00122-6
Kawanaka T, Shimizu M, Wada T (2007) Synthesis of dinucleoside phosphates and their analogs by the boranophosphotriester method using azido-based protecting groups. Tetrahedron Lett 48(11):1973–1976. https://doi.org/10.1016/j.tetlet.2007.01.064


Kawanaka T, Shimizu M, Shintani N, Wada T (2008) Solid-phase synthesis of backbone-modified DNA analogs by the boranophosphotriester method using new protecting groups for nucleobases. Bioorg Med Chem Lett 18(13):3783–3786. https://doi.org/10.1016/j.bmcl.2008.05.053
Li H, Hardin C, Shaw BR (1996) Hydrolysis of thymidine boranomonophosphate and stepwise deuterium substitution of the borane hydrogens. 31P and 11B NMR studies. J Am Chem Soc 118(28):6606–6614. https://doi.org/10.1021/ja9540280
Li P, Sergueeva ZA, Dobrikov M, Shaw BR (2007) Nucleoside and oligonucleoside boranophosphates: chemistry and properties. Chem Rev 107(11):4746–4796. https://doi.org/10.1021/cr050009p
Lin J, Shaw BR (2001) Synthesis of new classes of boron-containing nucleotides. Nucleosides Nucleotides Nucleic Acids 20(4–7):587–596. https://doi.org/10.1081/NCN-100002335
McCuen HB, Noé MS, Sierzchala AB, Higson AP, Caruthers MH (2006) Synthesis of mixed sequence borane phosphonate DNA. J Am Chem Soc 128(25):8138–8139. https://doi.org/10.1021/ja061757e
McCuen HB, Noe MS, Olesiak M, Sierzchala AB, Caruthers MH, Higson AP (2008) Synthesis and biochemical activity of new oligonucleotide analogs. Phosphorus Sulfur Silicon Relat Elem 183(2–3):349–363. https://doi.org/10.1080/10426500701734745
Nukaga Y, Takemura T, Iwamoto N, Oka N, Wada T (2015) Enhancement of the affinity of 2′-OMe-oligonucleotides for complementary RNA by incorporating a stereoregulated boranophosphate backbone. RSC Adv 5(4):2392–2395. https://doi.org/10.1039/C4RA11335G
Oka N, Wada T, Saigo K (2003) An oxazaphospholidine approach for the stereocontrolled synthesis of oligonucleoside phosphorothioates. J Am Chem Soc 125(27):8307–8317. https://doi.org/10.1021/ja034502z
Oka N, Shimizu M, Saigo K, Wada T (2006) 1,3-Dimethyl-2-(3-nitro-1,2,4-triazol-1-yl)-2-pyrrolidin-1-yl-1,3,2-diazaphospholidinium hexafluorophosphate (MNTP): a powerful condensing reagent for phosphate and phosphonate esters. Tetrahedron 62(15):3667–3673. https://doi.org/10.1016/j.tet.2006.01.084
Oka N, Yamamoto M, Sato T, Wada T (2008) Solid-phase synthesis of stereoregular oligodeoxyribonucleoside phosphorothioates using bicyclic oxazaphospholidine derivatives as monomer units. J Am Chem Soc 130(47):16031–16037. https://doi.org/10.1021/ja805780u
Oka N, Maizuru Y, Shimizu M, Wada T (2010) Solid-phase synthesis of oligodeoxyribonucleotides without base protection utilizing O-selective reaction of oxazaphospholidine derivatives. Nucleosides Nucleotides Nucleic Acids 29(2):144–154. https://doi.org/10.1080/15257771003612839
Paul S, Roy S, Monfregola L, Shang S, Shoemaker R, Caruthers MH (2015) Oxidative substitution of boranephosphonate diesters as a route to post-synthetically modified DNA. J Am Chem Soc 137(9):3253–3264. https://doi.org/10.1021/ja511145h
Roy S, Olesiak M, Shang S, Caruthers MH (2013) Silver nanoassemblies constructed from boranephosphonate DNA. J Am Chem Soc 135(16):6234–6241. https://doi.org/10.1021/ja400898s
Sato K, Imai H, Shuto T, Hara RI, Wada T (2019) Solid-phase synthesis of phosphate/boranophosphate chimeric DNAs using the H-phosphonate–H-boranophosphonate method. J Org Chem 84(23):15032–15041. https://doi.org/10.1021/acs.joc.9b01257
Sergueev DS, Shaw BR (1998) H-phosphonate approach for solid-phase synthesis of oligodeoxyribonucleoside boranophosphates and their characterization. J Am Chem Soc 120(37):9417–9427. https://doi.org/10.1021/ja9814927
Sergueeva ZA, Sergueev DS, Shaw BR (1999) Synthesis of dithymidine boranophosphates via stereospecific boronation of H-phosphonate diesters and assignment of their configuration. Tetrahedron Lett 40(11):2041–2044. https://doi.org/10.1016/S0040-4039(99)00088-X
Sergueeva ZA, Sergueev DS, Shaw BR (2001) Borane-amine complexes – versatile reagents in the chemistry of nucleic acids and their analogs. Nucleosides Nucleotides Nucleic Acids 20(4–7):941–945. https://doi.org/10.1081/NCN-100002464


Shimizu M, Tamura K, Wada T, Saigo K (2004a) BH3 as a protecting group for phosphonic acid: a novel method for the synthesis of dinucleoside H-phosphonate. Tetrahedron Lett 45(2):371–374. https://doi.org/10.1016/j.tetlet.2003.10.158
Shimizu M, Wada T, Oka N, Saigo K (2004b) A novel method for the synthesis of dinucleoside boranophosphates by a boranophosphotriester method. J Org Chem 69(16):5261–5268. https://doi.org/10.1021/jo0493875
Shimizu M, Saigo K, Wada T (2006) Solid-phase synthesis of oligodeoxyribonucleoside boranophosphates by the boranophosphotriester method. J Org Chem 71(11):4262–4269. https://doi.org/10.1021/jo0603779
Sood A, Shaw BR, Spielvogel BF (1990) Boron-containing nucleic acids. 2. Synthesis of oligodeoxynucleoside boranophosphates. J Am Chem Soc 112(24):9000–9001. https://doi.org/10.1021/ja00180a066
Takahashi Y, Sato K, Wada T (2022) Solid-phase synthesis of boranophosphate/phosphorothioate/phosphate chimeric oligonucleotides and their potential as antisense oligonucleotides. J Org Chem 87(6):3895–3909. https://doi.org/10.1021/acs.joc.1c01812
Uehara S, Hiura S, Higashida R, Oka N, Wada T (2014) Solid-phase synthesis of P-boronated oligonucleotides by the H-boranophosphonate method. J Org Chem 79(8):3465–3472. https://doi.org/10.1021/jo500185b
Wada T, Sato Y, Honda F, Kawahara S, Sekine M (1997) Chemical synthesis of oligodeoxyribonucleotides using N-unprotected H-phosphonate monomers and carbonium and phosphonium condensing reagents: O-selective phosphonylation and condensation. J Am Chem Soc 119(52):12710–12721. https://doi.org/10.1021/ja9726015
Wada T, Shimizu M, Oka N, Saigo K (2002) A new boranophosphorylation reaction for the synthesis of deoxyribonucleoside boranophosphates. Tetrahedron Lett 43(23):4137–4140. https://doi.org/10.1016/S0040-4039(02)00780-3
Winkler J, Stessl M, Amartey J, Noe CR (2010) Off-target effects related to the phosphorothioate modification of nucleic acids. ChemMedChem 5(8):1344–1352. https://doi.org/10.1002/cmdc.201000156
Zain R, Stawiński J (1996) Nucleoside H-phosphonates. 17. Synthetic and 31P NMR studies on the preparation of dinucleoside H-phosphonothioates. J Org Chem 61(19):6617–6622. https://doi.org/10.1021/jo960810m
Zhang J, Terhorst T, Matteucci MD (1997) Synthesis and hybridization study of a boranophosphate-linked oligothymidine deoxynucleotide. Tetrahedron Lett 38(28):4957–4960. https://doi.org/10.1016/S0040-4039(97)01090-3
Zhu X-X, Cai M-S, Zhou R-L (1997) Studies on the synthesis of two tetrasaccharides and the reactivity difference between them. Carbohydr Res 303(3):261–266. https://doi.org/10.1016/S0008-6215(97)00169-9

Extracellular Vesicle-Mediated CRISPR/Cas Delivery: Their Applications in Molecular Imaging and Precision Biomedicine

71

Dong Bingxue, Lang Wenchao, and Bengang Xing

Contents
Introduction
Extracellular Vesicle Platforms for Targeted CRISPR/Cas Delivery
Delivery of CRISPR/Cas with Various Genome Editing Modes via Extracellular Vesicles
Biomedical Applications via Extracellular Vesicle-Mediated CRISPR/Cas Delivery
Conclusion and Future Perspectives
References

Abstract

In this book chapter, we provide a comprehensive overview of the latest breakthroughs in extracellular vesicle (EV)-based delivery systems for CRISPR-Cas aimed at precise therapeutics and diagnosis. Representative examples that reflect the rapid progress in applications of EVs for CRISPR/Cas delivery are systematically presented. We highlight important principles of various extracellular delivery systems, including strategies based on endogenous extracellular vesicles, engineered extracellular vesicles, and hybrid extracellular vesicles, which show promise in overcoming the inherent targeting biases and immunogenicity of conventional delivery systems. In light of the great therapeutic potential of CRISPR/Cas technology, we further review and compare extracellular vesicle cargo systems for delivering CRISPR/Cas in various forms, including DNA, mRNA, and ribonucleoprotein (RNP). Importantly, this book chapter also covers a series of notable applications of EV-based CRISPR-Cas platforms in therapy and diagnosis. Moreover, the adoption of machine learning and artificial intelligence in the design of new EV-based CRISPR-Cas systems for precise disease therapy and diagnosis is also discussed. Finally, the challenges and future perspectives of CRISPR/Cas delivery systems based on extracellular vesicles are outlined. In summary, EV-based CRISPR-Cas systems are spurring a revolution in gene editing technology and early-stage theranostics, with great potential for applying CRISPR/Cas9 technology in clinical applications.

Keywords

CRISPR/Cas delivery · Extracellular vesicles · Molecular imaging · Precise gene therapy

D. Bingxue · L. Wenchao
Division of Chemistry and Biological Chemistry, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore

B. Xing (*)
Division of Chemistry and Biological Chemistry, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
School of Chemical and Biomedical Engineering, Nanyang Technological University, Singapore, Singapore

© Springer Nature Singapore Pte Ltd. 2023 N. Sugimoto (ed.), Handbook of Chemical Biology of Nucleic Acids, https://doi.org/10.1007/978-981-19-9776-1_74

Introduction

The clustered regularly interspaced short-palindromic repeat (CRISPR)-associated protein 9 (CRISPR-Cas9) nuclease system originated from the adaptive immune system in archaea and bacteria, where it serves as a defense mechanism that cleaves invading bacteriophage DNA (Jinek et al. 2012; Cong et al. 2013; Hwang et al. 2013; Mali et al. 2013). The two critical components of the CRISPR-Cas9 gene editing system are the guide RNA (gRNA) and the Cas9 endonuclease. The guide RNA specifically recognizes the genomic locus through base pairing with the corresponding DNA and precisely directs the Cas9 nuclease to cleave the recognized DNA sequence. The natural guide RNA contains two essential parts: CRISPR RNA (crRNA) and trans-activating CRISPR RNA (tracrRNA). The same Cas9 endonuclease can be used with different guide RNAs targeting different complementary genomic sequences, which gives the system flexibility and versatility for different applications. Owing to its robustness, target specificity, and programmability, this revolutionary gene editing technology has opened up a new avenue in biomedical research and has raised the prospect of widespread applications in gene therapeutics, molecular imaging, and disease diagnostics. These newly developed precise CRISPR-Cas technologies offer great therapeutic potential for various hereditary diseases and for metabolic disease diagnostics owing to their capacity to induce precise changes in the genome. Despite the major progress and promising traits of the CRISPR/Cas genome editing machinery, one of the biggest factors hampering the application of this powerful genome editing tool is the efficiency and safety of its delivery into the targeted cells in vivo and in vitro. The components of the CRISPR/Cas machinery must be transferred into the nucleus of the targeted cell. Thus, the development of an effective, systemic CRISPR-Cas delivery system that overcomes the biological barriers remains elusive.
Safe, effective, and stable delivery of CRISPR-Cas9 systems to


targeted cells and tissues is crucial to achieve therapeutic success in in vivo genome editing. At present, various promising delivery technologies have been attempted for targeted CRISPR-Cas delivery. Economical physical delivery methods such as microinjection and electroporation, which directly perturb the cell membrane, have been used in vitro on different cell lines. However, such methods either require highly skilled manipulation or cause inevitable damage to cells. This inconvenience and toxicity have hampered their practical application in vivo and in zygotes. Despite the high transfection efficiency of conventional viral delivery systems, including recombinant adeno-associated viral vectors and lentiviral vectors, their wide clinical application has been impeded by the non-transient insertion of viral genomes into host chromosomes, prolonged expression, limited packaging capacity (