Machine Learning-Augmented Spectroscopies for Intelligent Materials Design (Springer Theses) 303114807X, 9783031148071



Table of contents:
Supervisor's Foreword
Acknowledgments
Parts of This Thesis Have Been Published in the Following Journal Articles and Preprints
Contents
1 Introduction
1.1 Neutron and Photon Scattering and Spectroscopy
1.2 Integration of Machine Learning
1.3 Thesis Objectives
References
2 Background
2.1 Neutron and Photon Scattering and Spectroscopy
2.1.1 Inelastic Neutron Scattering
2.1.2 Raman Spectroscopy
2.1.3 Polarized Neutron Reflectometry
2.1.4 X-ray Absorption Spectroscopy
2.2 Data-Driven Methods
2.2.1 Dimensionality Reduction
Singular Value Decomposition
Principal Component Analysis
Non-negative Matrix Factorization
2.2.2 Machine Learning
Support Vector Machines
Neural Networks
References
3 Data-Efficient Learning of Materials' Vibrational Properties
3.1 Introduction
3.2 Materials Data Representations
3.3 Euclidean Neural Networks
3.3.1 Graph Representation of Crystal Structures
3.3.2 Network Operations
3.4 Phonon DoS Prediction
3.4.1 Data Processing
3.4.2 Results
3.4.3 Comparison with Experiment
3.4.4 High-CV Materials Discovery
3.4.5 Partial Phonon Density of States
3.4.6 Alloys and Strained Compounds
3.5 Unsupervised Representation Learning of Vibrational Spectra
3.5.1 Dimensionality Reduction
3.5.2 Data Processing Methods
3.5.3 Results
3.6 Conclusion
References
4 Machine Learning-Assisted Parameter Retrieval from Polarized Neutron Reflectometry Measurements
4.1 Introduction
4.2 Polarized Neutron Reflectometry
4.3 Variational Autoencoder
4.3.1 VAE-Based PNR Parameter Retrieval
4.3.2 Data Preparation
4.3.3 Results
4.4 Resolving Interfacial AFM Coupling
4.5 Discussion
4.6 Conclusion
References
5 Machine Learning Spectral Indicators of Topology
5.1 Introduction
5.2 Topological Materials Discovery
5.3 Data Preparation and Pre-processing
5.4 Exploratory Analysis
5.5 Results
5.6 Conclusion
References
6 Conclusion and Outlook
6.1 Thesis Summary
6.2 Perspectives and Outlook
Reference

Springer Theses Recognizing Outstanding Ph.D. Research

Aims and Scope

The series “Springer Theses” brings together a selection of the very best Ph.D. theses from around the world and across the physical sciences. Nominated and endorsed by two recognized specialists, each published volume has been selected for its scientific excellence and the high impact of its contents for the pertinent field of research. For greater accessibility to non-specialists, the published versions include an extended introduction, as well as a foreword by the student’s supervisor explaining the special relevance of the work for the field. As a whole, the series will provide a valuable resource both for newcomers to the research fields described, and for other scientists seeking detailed background information on special questions. Finally, it provides an accredited documentation of the valuable contributions made by today’s younger generation of scientists.

Theses may be nominated for publication in this series by heads of department at internationally leading universities or institutes and should fulfill all of the following criteria:

• They must be written in good English.
• The topic should fall within the confines of Chemistry, Physics, Earth Sciences, Engineering and related interdisciplinary fields such as Materials, Nanoscience, Chemical Engineering, Complex Systems and Biophysics.
• The work reported in the thesis must represent a significant scientific advance.
• If the thesis includes previously published material, permission to reproduce this must be gained from the respective copyright holder (a maximum 30% of the thesis should be a verbatim reproduction from the author’s previous publications).
• They must have been examined and passed during the 12 months prior to nomination.
• Each thesis should include a foreword by the supervisor outlining the significance of its content.
• The theses should have a clearly defined structure including an introduction accessible to new PhD students and scientists not expert in the relevant field.

Indexed by zbMATH.

Nina Andrejevic

Machine Learning-Augmented Spectroscopies for Intelligent Materials Design Doctoral Thesis accepted by Massachusetts Institute of Technology, USA

Nina Andrejevic
Argonne National Laboratory
Lemont, IL, USA

ISSN 2190-5053 | ISSN 2190-5061 (electronic)
Springer Theses
ISBN 978-3-031-14807-1 | ISBN 978-3-031-14808-8 (eBook)
https://doi.org/10.1007/978-3-031-14808-8

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

Supervisor’s Foreword

In materials science and condensed matter physics research, the interpretation of data plays a crucial role in bringing new knowledge to an experiment. For instance, it was the correct interpretation of neutron diffraction data that led to the discovery of antiferromagnetic ordering, which was theoretically predicted by Louis Néel but had once been considered an impossible ground state for a quantum system. However, the interpretation of experimental data does not always come easily. It is not uncommon that one phenomenon may be reasonably explained by multiple mechanisms, and there are numerous factors—both intrinsic and extrinsic—that can lead to different experimental outcomes. This challenge is becoming increasingly important to address as experimental apparatuses become more powerful and generate more data than ever, calling for streamlined and unbiased data analysis tools to facilitate the interpretation of big data.

Dr. Nina Andrejevic’s thesis demonstrates the possibilities for enhancing various materials characterization techniques by approaching experimental data analysis from the perspective of machine learning. Analyzing experimental data by overlaying them onto a theoretical curve is a common data fitting practice for experimentalists. However, the sensitivity of conventional fitting approaches is a long-standing problem in many applications. Nina demonstrated that machine learning methods can help improve the fitting of polarized neutron reflectometry data by leveraging the learned low-dimensional latent space, enabling a factor of two improvement in resolution to elucidate an elusive magnetic effect with vast spintronic applications. Additionally, machine learning can help clarify the hidden links between two quantities with complex relationships. For example, the past decade has witnessed a surge of topological materials research, yet experimental verification of electronic band topology has been highly nontrivial and requires extensive effort. Nina’s work develops a neural network classifier of band topology that reveals an unexpected link between topology and a much simpler experimental probe, and achieves an accuracy of 90%. This discovery raises additional questions, such as the possible theoretical foundation for such a connection, that may be explored in future work. This highlights the beauty of intersecting materials characterization and machine learning: data-driven insights can prompt us to revisit and reinterpret experimental outcomes from a fresh perspective. In light of this, Dr. Andrejevic’s thesis contains a few key insights to inform experimental design and materials discovery and advance the boundary of the grand question: what are measurables?

Mingda Li
Norman C. Rasmussen Assistant Professor of Nuclear Science and Engineering
Massachusetts Institute of Technology
Cambridge, MA, USA

Acknowledgments

This thesis work reflects the incredible support and wisdom of so many who have been an integral part of my graduate studies at MIT. First and foremost, I would like to thank my advisor, Professor Mingda Li, for embarking on this unique journey with me. Thank you for your consistent support and mentorship, and your willingness to pursue and nurture a variety of research interests in our group. I would also like to thank my committee members, Professor Silvija Gradečak and Professor Jeffrey Grossman, for their valuable insight and guidance over the course of my Ph.D.

I would like to express my sincere gratitude to the incredible collaborators with whom I had the pleasure of working: Professor Tess Smidt and Professor Jing Kong for the phonon DoS project; Dr. Maria Chan and Dr. Michael Davis for the Raman project; and Professor Chris Rycroft for the XAS project. I have learned so much through your insight and expertise in a broad range of disciplines. In addition, this work would not have been possible without the assistance of our collaborators at scientific user facilities: Dr. Ahmet Alatas, Dr. Jeffrey Lynn, Dr. Valeria Lauter, and Dr. Alexander Grutter.

I am incredibly fortunate to have been surrounded by amazing colleagues and group members during my Ph.D. I would like to thank Dr. Fei Han, Dr. Ricardo Pablo-Pedro, Dr. Yoichiro Tsurimaki, Thanh Nguyen, Zhantao Chen, Tongtong Liu, and Nathan Drucker for their advice, friendship, and countless helpful discussions over the past several years. I am deeply grateful to the amazing friends who made my graduate years so memorable. A special thank you to Elliott Kim, Vrindaa Somjit, Haeyeon Lee, Thanh Nguyen, and Jack Zhao for their encouragement and friendship, and the many dinners, game nights, and activities that brought so much fun and joy to my life outside of MIT. I am especially thankful to Yen-Ting Chi for his unfailing love and support over the past 5 years.

Last but not least, I owe many thanks to my family for their constant faith and support, no matter our distance. To my sister, Jovana, for her inspiring optimism and courage to face new challenges. And to my parents, Vladimir and Ivana, for their strength and unconditional love.


The work described in this thesis was supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. 1122374, and the U.S. DOE, Office of Science (SC), Basic Energy Sciences (BES) under award No. DE-SC0020148.

Parts of This Thesis Have Been Published in the Following Journal Articles and Preprints

1. Chen, Z. et al. Machine learning on neutron and X-ray scattering and spectroscopies. Chemical Physics Reviews 2, 031301 (2021).
2. Chen, Z. et al. Direct prediction of phonon density of states with Euclidean neural networks. Advanced Science 8, 2004214 (2021).
3. Andrejevic, N. et al. Elucidating proximity magnetism through polarized neutron reflectometry and machine learning. Applied Physics Reviews 9, 011421 (2022).
4. Andrejevic, N., Andrejevic, J., Rycroft, C. H. & Li, M. Machine learning spectral indicators of topology. arXiv preprint arXiv:2003.00994 (2020).



Chapter 1

Introduction

Abstract Neutron and photon scattering and spectroscopy represent two fundamental categories of characterization techniques used to interrogate materials’ structural and dynamical properties at atomic to mesoscopic length scales. As advances at scientific user facilities enable the collection of ever larger data volumes in higher-dimensional parameter spaces, the design, analysis, and interpretation of such experiments become both increasingly valuable and complex. At the same time, interest in novel functional and quantum materials for next-generation technologies, including dissipationless electronics, energy harvesting, and quantum computing, demands an understanding of unconventional or emergent properties beyond the scope of many approximate models. Machine learning methods are designed to leverage large, high-dimensional datasets in order to detect underlying patterns and make informed predictions on related tasks. In this chapter, we introduce the basic principles for integration of these data-driven methods with neutron and photon spectroscopies, setting the stage for the applications addressed in this thesis.

1.1 Neutron and Photon Scattering and Spectroscopy

A central objective in condensed matter research is to understand, manipulate, and ultimately control materials properties toward the development of next-generation technologies for energy-efficient electronics, energy harvesting, and quantum computation. Materials characterization is an integral part of this endeavor, comprising a broad range of experimental techniques which leverage specific interactions with matter to provide distinct and often complementary insights into materials’ structure and dynamics. Neutron and photon spectroscopies represent two essential technologies facilitating materials characterization down to near-fundamental time and length scales. Since the pioneering work of the 1950s, scientists have harnessed synchrotron radiation with improved precision and efficiency through decades of technological advances in beamline instrumentation and optics, contributing to orders-of-magnitude increases in beam brightness and resolution [1]. Together with the development of powerful experimental techniques such as X-ray absorption, photoelectron spectroscopy and holography, and inelastic X-ray scattering, among many others, X-ray science at scientific user facilities has revolutionized materials research in a wide range of disciplines. Similarly, neutron scattering techniques have increased in sensitivity and scope over the past several decades [2], giving rise to a variety of tailored methodologies and providing unprecedented insight into the rich interplay between spin, charge, and lattice degrees of freedom. A consequence of these technical advances is a rapidly growing data stream, requiring innovative computational tools for data processing, analysis, and interpretation at the scale of big data. Additionally, higher brightness and sensitivity enable measurement of more diverse types of materials at higher dimensions in a single scattering experiment. The emerging frontier of multimodal scattering, which simultaneously measures samples with multiple probes or under diverse in situ environments, introduces additional dimensions to the measured parameter space and adds inevitable complexities to the data analysis process [3]. However, as materials research is traditionally an iterative process in which theory, calculation, and/or experiments establish structure-property relationships that inform materials design, human capabilities are often limited to analyzing a well-defined subset of data, or recommending only a few candidate systems at a time. This suggests a need for time- and data-efficient methodologies to accelerate and broaden materials innovation. In its Roundtable on Producing and Managing Large Scientific Data [4], the Office of Basic Energy Sciences identifies the extraction of robust and meaningful information from vast, complex data produced at scientific user facilities as one of its primary research objectives, facilitated in large part by artificial intelligence and machine learning. Machine learning has already been widely applied to materials research, enabling prediction of materials’ mechanical [5–8], thermodynamic [7, 9–11], and electronic properties [5, 12–18] directly from structural inputs; acquisition of interatomic force fields and potential energy surfaces competitive with first-principles calculations at much lower computational cost [19–24]; analysis of neutron and X-ray diffraction [25–31] and small-angle scattering [32–40] experiments to recover underlying structures; and image reconstruction [41–44] and segmentation [45–47] of neutron and X-ray tomography, ptychography, and phase-contrast imaging series. A paradigm shift in our approach to materials data analysis is clearly underway, empowered by complementary developments in machine learning architectures and algorithms [48–54].

1.2 Integration of Machine Learning

Fig. 1.1 Machine learning in a neutron and X-ray scattering pipeline. (a) Schematic of a typical scattering setup to measure an observable $S_{exp}$. The typical workflows to recover materials properties from measured data through traditional fitting and machine learning methods are illustrated above. (b) Materials properties serving as input to a supervised machine learning model for direct prediction of a spectroscopic signature. (c) Spectral data serving as input to a supervised machine learning model for classification or property prediction tasks. (d) Spectral data as part of an unsupervised machine learning task to identify inherent patterns or clusters within the dataset, which may be correlated with physical parameters. Reproduced from [3], with the permission of AIP Publishing

Due to their flexibility and predictive power, machine learning models offer an avenue to accelerate and enhance data analysis, discover additional insights about complex datasets, and even refine experiments in operando. To illustrate the general relationship between machine learning and scattering experiments, we consider the prototypical scattering setup illustrated in Fig. 1.1a, in which a target sample scatters an incident photon or neutron beam with initial momentum $\mathbf{k}$ and energy $E$ to a final momentum $\mathbf{k} + \mathbf{Q}$ and energy $E - \hbar\omega$ and onto a detector, which records an observable $S_{exp}(\mathbf{Q}, \omega, t, \ldots)$. For instance, spectroscopies like time-of-flight inelastic neutron scattering measure the dynamical structure factor in 4D momentum–energy $(\mathbf{Q}, \omega)$ space, while X-ray photon correlation spectroscopy measures the intensity autocorrelation in 4D momentum–time $(\mathbf{Q}, t)$ space. These observables are often associated with theoretical models, $S_{model}(\mathbf{Q}, \omega, t, \ldots)$, parameterized by a set of fitting parameters $\mathbf{p}$ that specify the properties of the underlying materials system. Thus, the materials properties are typically extracted by obtaining the optimal fitting parameters $\mathbf{p}_{opt}$ through an optimization problem of the form

$$\mathbf{p}_{opt} = \underset{\mathbf{p}}{\operatorname{argmin}}\, L\big(S_{exp}(\mathbf{Q}, \omega, t, \ldots),\, S_{model}(\mathbf{Q}, \omega, t, \ldots)\big), \tag{1.1}$$

where $L$ is a loss function comparing the experimental and theoretical values of the observable. However, measurement of new functional and quantum materials often reveals novel or unexpected emergent properties that go beyond traditional models; yet, even with perfect agreement between $S_{exp}(\mathbf{Q}, \omega, t, \ldots)$ and $S_{model}(\mathbf{Q}, \omega, t, \ldots)$, the information that can be extracted is ultimately limited by the theoretical model itself. By contrast, the flexibility of machine learning models provides a way to access materials properties outside the parameter space of available theoretical models. For instance, given a set of parameter-observable ($\mathbf{p}$-$S$) pairs with an unknown relationship, supervised learning techniques can be used to establish a surrogate model, such as a neural network, relating the two quantities, which can subsequently be used to predict $S$ from a new choice of $\mathbf{p}$. This amounts to solving the forward problem of obtaining a measurable quantity from existing structural information through machine learning, which is illustrated in Fig. 1.1b for the case of spectrum prediction using materials structural or property data. Conversely, if disentangling $\mathbf{p}$ from a given observable is required, the network may be trained to predict a set of target parameters $\mathbf{p}$ using the observable $S_{exp}(\mathbf{Q}, \omega, t, \ldots)$ as input, shown in Fig. 1.1c. This constitutes an inverse scattering problem, or the inversion of data to a structural solution or other physical model. The choice of perspective—forward or inverse—is often guided by the accessibility of information in either domain. However, in either case, the machine learning model is generally parameterized by a set of trainable weights $\theta$, which are optimized using the training data to obtain a predictive model for the chosen task. For instance, in the example of Fig. 1.1c, the network weights would be optimized according to

$$\theta_{opt} = \underset{\theta}{\operatorname{argmin}}\, L\big(\mathbf{p},\, f_{NN}(S_{exp}(\mathbf{Q}, \omega, t, \ldots))\big), \tag{1.2}$$

where $f_{NN}$ represents a sequence of neural network operations on the input data, and $L$ is a suitable loss function comparing the predicted and target parameters $\mathbf{p}$. In this case, while the relationship between the materials’ properties and measured observable is unknown, the relevant quantities (e.g., the relevant parameter set $\mathbf{p}$) are identified beforehand to define the supervised learning task. However, one can also leverage unsupervised learning techniques such as dimensionality reduction, autoencoding, and clustering to build models that learn to both identify relevant physical parameters and perform predictions, as shown in Fig. 1.1d. Autoencoders represent one type of unsupervised learning framework in which the input is encoded in a lower-dimensional space and subsequently reconstructed from this reduced representation. For example, a network operating on the experimental observable $S_{exp}(\mathbf{Q}, \omega, t, \ldots)$ could be optimized according to

$$\theta_{opt} = \underset{\theta}{\operatorname{argmin}}\, L\big(S_{exp}(\mathbf{Q}, \omega, t, \ldots),\, f_{NN}(S_{exp}(\mathbf{Q}, \omega, t, \ldots))\big), \tag{1.3}$$

where $L$ is a loss function comparing the original and reconstructed signals. In this setup, while a well-defined set of descriptors $\mathbf{p}$ may not be obvious from the raw measured data, the latent representation can offer a useful and potentially more transparent view of the relevant parameters underlying the observations. For example, the number of relevant latent dimensions may inform the complexity of the system, while the presence of distinct clusters in the latent space mapping of the input data may suggest the presence of different measured phases. This hints at the potential for machine learning to overcome certain shortcomings of approximate models and uncover hidden insights by learning directly from the data. However, while the Universal Approximation Theorem asserts that neural networks can approximate any function—even one that generates a complex and high-dimensional observable—these networks are often regarded as inherently black-box methods without a clear understanding of how and if they encode any underlying mechanisms. Thus, a key consideration when applying machine learning methods to scientific problems is their interpretability. In particular, how do we incorporate scientific domain knowledge in the design of network architectures and algorithms, or discover scientific principles behind well-performing models, without diminishing their flexibility and predictive power? This question serves as a guiding principle for the development of machine learning models presented in this thesis.
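To make the optimization in Eq. 1.3 concrete, the following minimal sketch trains a small autoencoder on stand-in spectra. The network sizes, latent dimension, and random input data are illustrative assumptions for this sketch, not details of any model developed in this thesis.

```python
import torch
from torch import nn

# Stand-in for a batch of measured observables S_exp(Q, omega, t, ...):
# 256 spectra, each discretized into 128 bins (illustrative values only).
S_exp = torch.rand(256, 128)

# Encoder maps each spectrum to a low-dimensional latent vector;
# the decoder reconstructs the spectrum from that representation.
encoder = nn.Sequential(nn.Linear(128, 32), nn.ReLU(), nn.Linear(32, 2))
decoder = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 128))
f_NN = nn.Sequential(encoder, decoder)

# Eq. 1.3: optimize theta to minimize L(S_exp, f_NN(S_exp)), with L
# chosen here as the mean-squared reconstruction error.
optimizer = torch.optim.Adam(f_NN.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(f_NN(S_exp), S_exp)
    loss.backward()
    optimizer.step()

# The latent coordinates can then be inspected for clusters or for
# correlations with candidate physical parameters p.
latent = encoder(S_exp).detach()
```

The supervised objectives of Eqs. 1.1 and 1.2 follow the same pattern, with the loss computed against $S_{model}$ or the target parameters $\mathbf{p}$ rather than the input itself.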

1.3 Thesis Objectives

The success of machine learning methods is intimately linked to the choice of architectures, algorithms, and data representations used for a given problem; that is, how do we construct $f_{NN}$, select and optimize $L$, and represent $S$ and $\mathbf{p}$ to extract meaningful insights from trained models? In this thesis, we aim to develop a set of frameworks for improved experimental design and analysis of photon and neutron scattering and spectroscopies motivated by this set of questions. Each chapter implements one of the three prototypical architectures introduced in Fig. 1.1 in the context of a specific experimental technique. First, in Chap. 3 we apply supervised machine learning to develop a surrogate model that predicts the vibrational properties of crystalline solids, typically measured through inelastic neutron and X-ray scattering, directly from the atomic masses and positions of their constituent atoms [55]. In addition to demonstrating rapid, high-throughput prediction at low computational cost to support experimental planning and/or screening of candidate materials, we show that our model captures essential physics without explicit training, enabled by the choice of neural network architecture that is both symmetry-aware and equivariant to Euclidean transformations [56–60]. We further consider how machine learning methods can be applied to develop effective, low-dimensional representations of materials’ spectral signatures, exemplified through unsupervised representation learning of Raman spectra. Such learned representations are proposed as efficient prediction targets for supervised learning from relevant structural or chemical attributes, or as convenient parameter spaces for optimization of physical models. In Chap. 4, we implement a semi-supervised learning approach to improve upon conventional methods of parameter retrieval from polarized neutron reflectometry measurements [61]. In particular, we focus on elucidating the subtle interaction mechanisms in proximity-coupled heterostructures and show that the trained model learns an interpretable latent representation of the relevant parameters. Finally, in Chap. 5 we combine supervised and unsupervised learning methods to develop a neural network classifier of materials’ electronic band topology using X-ray absorption near-edge structure spectra [62]. This work illustrates the capabilities of machine learning methods to not only extract materials’ properties in the absence of direct analytical models, but also elucidate aspects of the spectroscopic signatures themselves. Through these examples, we aim to broaden the application scope of machine learning in the context of neutron and photon spectroscopies by providing novel machine learning frameworks for intelligent experimental design and analysis, which may help advance discoveries at scientific user facilities and the broader scientific community.

References

1. Yabashi, M., & Tanaka, H. (2017). The next ten years of X-ray science. Nature Photonics, 11, 12–14.
2. Taylor, A., et al. (2007). A route to the brightest possible neutron source? Science, 315, 1092–1095.
3. Chen, Z., et al. (2021). Machine learning on neutron and X-ray scattering and spectroscopies. Chemical Physics Reviews, 2, 031301.
4. Ratner, D., et al. (2019). BES roundtable on producing and managing large scientific data with artificial intelligence and machine learning. Tech. rep., DOE SC Office of Basic Energy Sciences.
5. Xie, T., & Grossman, J. C. (2018). Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Physical Review Letters, 120, 145301.
6. Chen, C., Ye, W., Zuo, Y., Zheng, C., & Ong, S. P. (2019). Graph networks as a universal machine learning framework for molecules and crystals. Chemistry of Materials, 31, 3564–3572.
7. Isayev, O., et al. (2017). Universal fragment descriptors for predicting properties of inorganic crystals. Nature Communications, 8, 1–12.
8. Pilania, G., Wang, C., Jiang, X., Rajasekaran, S., & Ramprasad, R. (2013). Accelerating materials property predictions using machine learning. Scientific Reports, 3, 1–6.
9. Carrete, J., Li, W., Mingo, N., Wang, S., & Curtarolo, S. (2014). Finding unprecedentedly low-thermal-conductivity half-Heusler semiconductors via high-throughput materials modeling. Physical Review X, 4, 011019.
10. Tawfik, S. A., Isayev, O., Spencer, M. J., & Winkler, D. A. (2020). Predicting thermal properties of crystals using machine learning. Advanced Theory and Simulations, 3, 1900208.
11. Van Roekeghem, A., Carrete, J., Oses, C., Curtarolo, S., & Mingo, N. (2016). High-throughput computation of thermal conductivity of high-temperature solid phases: The case of oxide and fluoride perovskites. Physical Review X, 6, 041061.
12. Dong, Y., et al. (2019). Bandgap prediction by deep learning in configurationally hybridized graphene and boron nitride. npj Computational Materials, 5, 1–8.
13. Meredig, B., et al. (2018). Can machine learning identify the next high-temperature superconductor? Examining extrapolation performance for materials discovery. Molecular Systems Design & Engineering, 3, 819–825.
14. Stanev, V., et al. (2018). Machine learning modeling of superconducting critical temperature. npj Computational Materials, 4, 1–14.
15. Scheurer, M. S., & Slager, R. J. (2020). Unsupervised machine learning and band topology. Physical Review Letters, 124, 226401.
16. Ward, L., Agrawal, A., Choudhary, A., & Wolverton, C. (2016). A general-purpose machine learning framework for predicting properties of inorganic materials. npj Computational Materials, 2, 1–7.
17. Zhuo, Y., Mansouri Tehrani, A., & Brgoch, J. (2018). Predicting the band gaps of inorganic solids by machine learning. The Journal of Physical Chemistry Letters, 9, 1668–1673.
18. Mortazavi, B., et al. (2020). Machine-learning interatomic potentials enable first-principles multiscale modeling of lattice thermal conductivity in graphene/borophene heterostructures. Materials Horizons, 7, 2359.
19. Botu, V., Batra, R., Chapman, J., & Ramprasad, R. (2017). Machine learning force fields: Construction, validation, and outlook. The Journal of Physical Chemistry C, 121, 511–522.
20. Glielmo, A., Sollich, P., & De Vita, A. (2017). Accurate interatomic force fields via machine learning with covariant kernels. Physical Review B, 95, 214302.
21. Kruglov, I., Sergeev, O., Yanilkin, A., & Oganov, A. R. (2017). Energy-free machine learning force field for aluminum. Scientific Reports, 7, 1–7.
22. Li, Z., Kermode, J. R., & De Vita, A. (2015). Molecular dynamics with on-the-fly machine learning of quantum-mechanical forces. Physical Review Letters, 114, 096405.
23. Zhang, L., Lin, D. Y., Wang, H., Car, R., & Weinan, E. (2019). Active learning of uniformly accurate interatomic potentials for materials simulation. Physical Review Materials, 3, 023804.
24. Deringer, V. L., et al. (2021). Origins of structural and electronic transitions in disordered silicon. Nature, 589, 59–64.
25. Garcia-Cardona, C., et al. (2019). Learning to predict material structure from neutron scattering data. In 2019 IEEE International Conference on Big Data (Big Data) (pp. 4490–4497).
26. Oviedo, F., et al. (2019). Fast and interpretable classification of small X-ray diffraction datasets using data augmentation and deep neural networks. npj Computational Materials, 5, 1–9.
27. Liu, C. H., Tao, Y., Hsu, D., Du, Q., & Billinge, S. J. (2019). Using a machine learning approach to determine the space group of a structure from the atomic pair distribution function. Acta Crystallographica Section A: Foundations and Advances, 75, 633–643.
28. Bai, J., et al. (2018). Phase mapper: Accelerating materials discovery with AI. AI Magazine, 39, 15–26.
29. Long, C., Bunker, D., Li, X., Karen, V., & Takeuchi, I. (2009). Rapid identification of structural phases in combinatorial thin-film libraries using X-ray diffraction and non-negative matrix factorization. Review of Scientific Instruments, 80, 103902.
30. Stanev, V., et al. (2018). Unsupervised phase mapping of X-ray diffraction data by nonnegative matrix factorization integrated with custom clustering. npj Computational Materials, 4, 1–10.
31. Venderley, J., et al. (2020). Harnessing interpretable and unsupervised machine learning to address big data from modern X-ray diffraction. Preprint. arXiv:2008.03275.
32. Franke, D., Jeffries, C. M., & Svergun, D. I. (2018). Machine learning methods for X-ray scattering data analysis from biomacromolecular solutions. Biophysical Journal, 114, 2485–2492.
33. Demerdash, O., et al. (2019). Using small-angle scattering data and parametric machine learning to optimize force field parameters for intrinsically disordered proteins. Frontiers in Molecular Biosciences, 6, 64.
34. Hura, G. L., et al. (2019). Small angle X-ray scattering-assisted protein structure prediction in CASP13 and emergence of solution structure differences. Proteins: Structure, Function, and Bioinformatics, 87, 1298–1314.
35. Liu, S., et al. (2019). Convolutional neural networks for grazing incidence X-ray scattering patterns: Thin film structure identification. MRS Communications, 9, 586–592.
36. Archibald, R. K., et al. (2020). Classifying and analyzing small-angle scattering data using weighted k nearest neighbors machine learning techniques. Journal of Applied Crystallography, 53, 326–334.
37. Chang, M. C., Wei, Y., Chen, W. R., & Do, C. (2020). Deep learning-based super-resolution for small-angle neutron scattering data: Attempt to accelerate experimental workflow. MRS Communications, 10, 11–17.
38. Chen, Y. L., & Pollack, L. (2020). Machine learning deciphers structural features of RNA duplexes measured with solution X-ray scattering. IUCrJ, 7, 870.
39. Do, C., Chen, W. R., & Lee, S. (2020). Small angle scattering data analysis assisted by machine learning methods. MRS Advances, 5, 1577–1584.
40. He, H., Liu, C., & Liu, H. (2020). Model reconstruction from small-angle X-ray scattering data using deep learning methods. iScience, 23, 100906.
41. Micieli, D., Minniti, T., Evans, L. M., & Gorini, G. (2019). Accelerating neutron tomography experiments through artificial neural network based reconstruction. Scientific Reports, 9, 1–12.
42. Yang, X., et al. (2020). Tomographic reconstruction with a generative adversarial network. Journal of Synchrotron Radiation, 27, 486–493.
43. Cherukara, M. J., et al. (2020). AI-enabled high-resolution scanning coherent diffraction imaging. Applied Physics Letters, 117, 044103.
44. Scheinker, A., & Pokharel, R. (2020). Adaptive 3D convolutional neural network-based reconstruction method for 3D coherent diffraction imaging. Journal of Applied Physics, 128, 184901.
45. Zhang, X. G., Xu, J. J., & Ge, G. Y. (2004). Defects recognition on X-ray images for weld inspection using SVM. In Proceedings of 2004 International Conference on Machine Learning and Cybernetics (IEEE Cat. No. 04EX826) (Vol. 6, pp. 3721–3725).
46. Rale, A. P., Gharpure, D., & Ravindran, V. (2009). Comparison of different ANN techniques for automatic defect detection in X-ray images. In 2009 International Conference on Emerging Trends in Electronic and Photonic Devices & Systems (pp. 193–197).
47. Zimmermann, J., et al. (2019). Deep neural networks for classifying complex features in diffraction images. Physical Review E, 99, 063309.
48. Butler, K. T., Davies, D. W., Cartwright, H., Isayev, O., & Walsh, A. (2018). Machine learning for molecular and materials science. Nature, 559, 547–555.
49. Rupp, M. (2015). Machine learning for quantum mechanics in a nutshell. International Journal of Quantum Chemistry, 115, 1058–1073.
50. Mehta, P., et al. (2019). A high-bias, low-variance introduction to machine learning for physicists. Physics Reports, 810, 1–124.
51. Carleo, G., et al. (2019). Machine learning and the physical sciences. Reviews of Modern Physics, 91, 045002.
52. Batra, R., Song, L., & Ramprasad, R. (2020). Emerging materials intelligence ecosystems propelled by machine learning. Nature Reviews Materials, 1–24.
53. Suh, C., Fare, C., Warren, J. A., & Pyzer-Knapp, E. O. (2020). Evolving the materials genome: How machine learning is fueling the next generation of materials discovery. Annual Review of Materials Research, 50, 1–25.
54. Doucet, M., Archibald, R. K., & Heller, W. T. (2021). Machine learning for neutron reflectometry data analysis of two-layer thin films. Machine Learning: Science and Technology, 2, 035001.
55. Chen, Z., et al. (2021). Direct prediction of phonon density of states with Euclidean neural networks. Advanced Science, 8, 2004214.
56. Geiger, M., et al. (2020). e3nn: A modular framework for Euclidean neural networks, version 0.1.1. https://doi.org/10.5281/zenodo.5292912
57. Smidt, T. E., Geiger, M., & Miller, B. K. (2020). Finding symmetry breaking order parameters with Euclidean neural networks. arXiv e-prints, arXiv:2007.02005 [cs.LG].
58. Weiler, M., Geiger, M., Welling, M., Boomsma, W., & Cohen, T. (2018). 3D steerable CNNs: Learning rotationally equivariant features in volumetric data. Advances in Neural Information Processing Systems, 32, 10402–10413.
59. Miller, B. K., Geiger, M., Smidt, T. E., & Noé, F. (2020). Relevance of rotationally equivariant convolutions for predicting molecular properties. arXiv e-prints, arXiv:2008.08461 [cs.LG].
60. Thomas, N., et al. (2018). Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds. arXiv e-prints, arXiv:1802.08219 [cs.LG].
61. Andrejevic, N., et al. (2022). Elucidating proximity magnetism through polarized neutron reflectometry and machine learning. Applied Physics Reviews, 9, 011421.
62. Andrejevic, N., Andrejevic, J., Rycroft, C. H., & Li, M. (2020). Machine learning spectral indicators of topology. Preprint. arXiv:2003.00994.

Chapter 2

Background

Abstract The primary effort of this thesis is to develop a set of machine learning workflows for improved experimental design and analysis of quintessential photon and neutron spectroscopic techniques. The areas of impact for machine learning methods are identified in the context of four distinct characterization techniques—inelastic X-ray and neutron scattering, Raman spectroscopy, polarized neutron reflectometry, and X-ray absorption spectroscopy—ubiquitous in materials research. In this chapter, we first provide a concise overview of the basic principles underlying these characterization methods, identifying the key challenges that call for data-driven insights. Then, we summarize several existing data-driven methodologies and introduce the fundamental building blocks of neural networks, which are implemented to address the identified challenges.

2.1 Neutron and Photon Scattering and Spectroscopy

2.1.1 Inelastic Neutron Scattering

Inelastic scattering by neutrons and photons forms the basis for experimental determination of materials’ vibrational properties. Today, energy- and momentum-resolved inelastic X-ray and neutron scattering can be used to map the phonon dispersion with meV resolution or obtain the phonon density of states through momentum integration. Additionally, inelastic scattering by visible light, i.e., Raman spectroscopy, is both a widely accessible and highly sensitive tool for interrogating phonon energies in the small-wavevector limit. As the energies of visible light are of the order of ∼1 eV, well above the typical energies of optical phonons, the fractional change in energy of the photon during Raman scattering is small, and the technique is also sometimes termed “quasielastic” [1]. In this section, we first review the basic principles of inelastic neutron scattering, which possesses a theoretical foundation similar to its X-ray counterpart. These concepts are used to inform the selection and interpretation of machine learning models for predicting the phonon density of states in Chap. 3. This section is followed by a concise summary of Raman scattering theory, which guides the development of informative low-dimensional spectral representations addressed at the end of Chap. 3.

Inelastic neutron scattering probes the dynamical structure factor $S(\mathbf{Q}, \omega)$, where the accessible energy and wavevector transfers are limited by the corresponding conservation laws. Consider a neutron with incident wavevector $\mathbf{k}_0$ and energy $E_0$, which is scattered into a state with wavevector $\mathbf{k}$ and energy $E$ through the emission of a phonon with wavevector $\mathbf{q}$ and energy $\hbar\omega_\mathbf{q}$. By momentum conservation, the momentum transfer to the target is

$$\mathbf{Q} = \mathbf{k}_0 - \mathbf{k}, \tag{2.1}$$

and by energy conservation, the corresponding energy transfer is

$$\hbar\omega = E_0 - E. \tag{2.2}$$

We can denote the wavefunctions of the initial and final states of the composite probe-target system as $|i\rangle = |\varphi_{\mathbf{k}_0}\rangle \otimes |\psi_i\rangle$ and $|f\rangle = |\varphi_{\mathbf{k}}\rangle \otimes |\psi_f\rangle$, respectively, where $|\varphi\rangle$ and $|\psi\rangle$ are the respective wavefunctions of the probe neutron and target systems. The interaction of neutrons with atomic nuclei is considered a strong interaction, with an interaction range of ∼10⁻¹⁵ m (1 fm). This is much smaller than the wavelength of thermal neutrons (∼1–10 Å), which allows for an approximation of the interaction potential between probe neutrons and scatterers in the target crystal as the sum of point-like sources [2],

$$V(\mathbf{r}) = \frac{2\pi\hbar^2}{m_n} \sum_{j=1}^{N} b_j\, \delta(\mathbf{r} - \mathbf{r}_j), \tag{2.3}$$

where $m_n$ is the neutron mass, $N$ is the number of scattering centers (nuclei), and $b_j$ is the nuclear scattering length of the nucleus at $\mathbf{r}_j$. For simplicity, we assume $N$ identical scatterers with nuclear scattering length $b$ in order to define

$$V(\mathbf{r}) = \frac{2\pi\hbar^2 b}{m_n} \sum_{j=1}^{N} \delta(\mathbf{r} - \mathbf{r}_j) \equiv \sum_{j=1}^{N} V_p(\mathbf{r} - \mathbf{r}_j), \tag{2.4}$$

noting that each interaction potential $V_p$ depends only on the difference between the probe neutron and target coordinates. The probability per unit time of transition from state $|i\rangle$ to state $|f\rangle$ mediated by this interaction is then given by Fermi’s golden rule [1],

$$P_{i \to f} = \frac{2\pi}{\hbar} \left| \langle f | \sum_{j=1}^{N} V_p(\mathbf{r} - \mathbf{r}_j) | i \rangle \right|^2 \delta(E_f - E_i - \hbar\omega), \tag{2.5}$$


where $E_i$ and $E_f$ denote the energies of the initial and final target states, respectively. We can explicitly compute the matrix elements of the form $\langle \varphi_{\mathbf{k}} | \sum_{j=1}^{N} V_p(\mathbf{r} - \mathbf{r}_j) | \varphi_{\mathbf{k}_0} \rangle$ using the plane wave approximation of the neutron wavefunction,

$$\langle \varphi_{\mathbf{k}} | \sum_{j=1}^{N} V_p(\mathbf{r} - \mathbf{r}_j) | \varphi_{\mathbf{k}_0} \rangle = \frac{1}{V} \sum_{j=1}^{N} \int e^{-i\mathbf{k}\cdot\mathbf{r}}\, V_p(\mathbf{r} - \mathbf{r}_j)\, e^{i\mathbf{k}_0\cdot\mathbf{r}}\, d\mathbf{r} \tag{2.6}$$

$$= \frac{1}{V} \sum_{j=1}^{N} e^{i\mathbf{Q}\cdot\mathbf{r}_j} \int e^{i\mathbf{Q}\cdot\mathbf{r}}\, V_p(\mathbf{r})\, d\mathbf{r} \tag{2.7}$$

$$= \frac{2\pi\hbar^2 b}{m_n V} \sum_{j=1}^{N} e^{i\mathbf{Q}\cdot\mathbf{r}_j}, \tag{2.8}$$

where $V$ is the normalization volume of the plane wave states, and we have substituted in the definition given in Eq. 2.4. This leads to the expression of the transition rate,

$$P_{i \to f} = \frac{(2\pi)^3 \hbar^3 b^2}{m_n^2 V^2} \left| \langle \psi_f | \sum_{j=1}^{N} e^{i\mathbf{Q}\cdot\mathbf{r}_j} | \psi_i \rangle \right|^2 \delta(E_f - E_i - \hbar\omega). \tag{2.9}$$

To obtain the transition rate for scattering the neutron probe from state $|\varphi_{\mathbf{k}_0}\rangle$ to $|\varphi_{\mathbf{k}}\rangle$ over all possible states of the target, we further perform a summation over the initial and final target states. For a scatterer in thermal equilibrium at a finite temperature $T$, the accessible states should be weighted by the probability $p_i = \exp(-E_i/k_B T) / \sum_i \exp(-E_i/k_B T)$, giving the transition rate [1],

$$P_{\mathbf{k}_0 \to \mathbf{k}} = \frac{(2\pi)^3 \hbar^3 b^2}{m_n^2 V^2}\, S(\mathbf{Q}, \omega), \tag{2.10}$$

where we have introduced the dynamical structure factor,

$$S(\mathbf{Q}, \omega) = \sum_{f,i} p_i \left| \langle \psi_f | \sum_{j=1}^{N} e^{i\mathbf{Q}\cdot\mathbf{r}_j} | \psi_i \rangle \right|^2 \delta(E_f - E_i - \hbar\omega). \tag{2.11}$$

Note that the measured signal in inelastic neutron scattering experiments is more precisely the double differential scattering cross-section, given by Grosso and Parravicini [1],

$$\frac{d^2\sigma}{d\omega\, d\Omega} = \frac{k}{k_0} \frac{V^2 m_n^2}{(2\pi)^3 \hbar^3}\, P_{\mathbf{k}_0 \to \mathbf{k}} = b^2 \frac{k}{k_0}\, S(\mathbf{Q}, \omega), \tag{2.12}$$

where $\Omega$ is the solid angle intercepting the scattered neutrons with specific momentum variation. The dynamical structure factor is conventionally expressed


in the language of correlation functions, specifically the time and space Fourier transforms of the autocorrelation function of the probe-target coupling term. Recall from Eq. 2.3 that the neutron interaction potential is proportional to $\sum_j \delta(\mathbf{r} - \mathbf{r}_j)$, which has the space Fourier transform $\sum_j e^{i\mathbf{Q}\cdot\mathbf{r}_j}$. We introduce the identity [1],

$$\sum_{f,i} p_i \left| \langle \psi_f | A | \psi_i \rangle \right|^2 \delta(E_f - E_i - \hbar\omega) = \frac{1}{2\pi\hbar} \int_{-\infty}^{\infty} dt\, e^{-i\omega t}\, \langle A^\dagger(0) A(t) \rangle, \tag{2.13}$$

where $A(t) = e^{iHt/\hbar} A e^{-iHt/\hbar}$ is an operator in the Heisenberg representation, and $\langle \ldots \rangle$ denotes a thermal average. Then, $S(\mathbf{Q}, \omega)$ can be rewritten as

$$S(\mathbf{Q}, \omega) = \frac{1}{2\pi\hbar} \int_{-\infty}^{\infty} dt\, e^{-i\omega t} \sum_{j,k} \langle e^{-i\mathbf{Q}\cdot\mathbf{r}_j} e^{i\mathbf{Q}\cdot\mathbf{r}_k(t)} \rangle, \tag{2.14}$$

where $A = \sum_j e^{i\mathbf{Q}\cdot\mathbf{r}_j}$. We can specialize the form of Eq. 2.14 for the case of scattering from a three-dimensional harmonic crystal, where $\mathbf{r}_j$ now indicates an atomic position in the Bravais lattice, $\mathbf{r}_j = \mathbf{R}_j + \mathbf{u}_j$, where $\mathbf{R}_j$ and $\mathbf{u}_j$ are, respectively, the lattice translation vector and the atomic displacement away from the equilibrium position with amplitude

$$\mathbf{u}_j = \frac{1}{\sqrt{N}} \sum_{\mathbf{q}} \sqrt{\frac{\hbar}{2M\omega_\mathbf{q}}} \left( a_\mathbf{q}\, e^{i(\mathbf{q}\cdot\mathbf{R}_j - \omega_\mathbf{q} t)} + a_\mathbf{q}^\dagger\, e^{-i(\mathbf{q}\cdot\mathbf{R}_j - \omega_\mathbf{q} t)} \right), \tag{2.15}$$

where $M$ is the mass of the nucleus, and $a^\dagger/a$ are the phonon creation/annihilation operators. Since $\mathbf{u}_j$ is linear in the phonon creation and annihilation operators, we can apply the identity $\langle e^A e^B \rangle = e^{\langle A^2 + B^2 + 2AB \rangle / 2}$ to rewrite Eq. 2.14 as

$$S(\mathbf{Q}, \omega) = \frac{N}{2\pi\hbar}\, e^{-2W} \sum_{\mathbf{R}_k} e^{i\mathbf{Q}\cdot\mathbf{R}_k} \int_{-\infty}^{\infty} dt\, e^{-i\omega t}\, e^{\langle \mathbf{Q}\cdot\mathbf{u}_0\, \mathbf{Q}\cdot\mathbf{u}_k(t) \rangle}, \tag{2.16}$$

where $e^{-2W}$ is the Debye-Waller factor with $2W = \langle (\mathbf{Q}\cdot\mathbf{u}_0)^2 \rangle$, and the double sum $\sum_{j,k}$ is reduced to $N \sum_{\mathbf{R}_k}$ with $\mathbf{R}_j = 0$ fixed by invoking the periodicity of the crystal lattice. We can see that by expanding the final exponential into a series, the dynamical structure factor decomposes naturally into a sum of scattering processes, with the $n$-th term giving the contribution of the $n$-phonon process,

$$S(\mathbf{Q}, \omega) = \frac{N}{2\pi\hbar}\, e^{-2W} \sum_{\mathbf{R}_k} e^{i\mathbf{Q}\cdot\mathbf{R}_k} \int_{-\infty}^{\infty} dt\, e^{-i\omega t} \sum_{n=0}^{\infty} \frac{1}{n!} \langle \mathbf{Q}\cdot\mathbf{u}_0\, \mathbf{Q}\cdot\mathbf{u}_k(t) \rangle^n. \tag{2.17}$$


For instance, we can compute the one-phonon process explicitly by evaluating

$$\langle \mathbf{Q}\cdot\mathbf{u}_0\, \mathbf{Q}\cdot\mathbf{u}_k(t) \rangle = \frac{\hbar Q^2}{2NM} \sum_{\mathbf{q}} \frac{1}{\omega_\mathbf{q}} \left( n_\mathbf{q}\, e^{i(\mathbf{q}\cdot\mathbf{R}_k - \omega_\mathbf{q} t)} + (n_\mathbf{q} + 1)\, e^{-i(\mathbf{q}\cdot\mathbf{R}_k - \omega_\mathbf{q} t)} \right), \tag{2.18}$$

where $n_\mathbf{q} = \langle a_\mathbf{q}^\dagger a_\mathbf{q} \rangle = (\exp(\hbar\omega_\mathbf{q}/k_B T) - 1)^{-1}$ is the average number of phonons, given by the Bose–Einstein distribution at a temperature $T$, and for simplicity we have defined $Q$ as the magnitude of $\mathbf{Q}$ in the direction of the atomic displacement. This leads to a first-order dynamical structure factor $S^{(1)}(\mathbf{Q}, \omega)$ of the form

$$S^{(1)}(\mathbf{Q}, \omega) = \frac{Q^2}{2M}\, e^{-2W} \sum_{\mathbf{q},\mathbf{R}} \left( \frac{n_\mathbf{q}}{\omega_\mathbf{q}}\, e^{i(\mathbf{Q}+\mathbf{q})\cdot\mathbf{R}}\, \delta(\omega_\mathbf{q} + \omega) + \frac{n_\mathbf{q} + 1}{\omega_\mathbf{q}}\, e^{i(\mathbf{Q}-\mathbf{q})\cdot\mathbf{R}}\, \delta(\omega_\mathbf{q} - \omega) \right), \tag{2.19}$$

where the two terms correspond to scattering processes involving one-phonon absorption (annihilation) and emission (creation), respectively. It is worth noticing that $S^{(1)}(\mathbf{Q}, \omega)$ depends entirely on the properties of the measured target, not the probe neutron. This facilitates extraction of key physical insights directly from inelastic scattering measurements and further suggests that the observable can be modeled by specifying the properties of the target alone. For example, it is often the goal of experimental data analysis to extract the underlying phonon density of states $g(\omega) = \frac{1}{N} \sum_{\mathbf{q}} \delta(\omega_\mathbf{q} - \omega)$ [3], which can be derived from the measured structure factor using the so-called “incoherent approximation”, whereby measured spectra are averaged over the solid angle $\Omega$. This leads to an approximate form of Eq. 2.19 for one-phonon emission [4],

$$S_{inc}^{(1)}(\omega) = \frac{Q^2}{2M}\, e^{-2W}\, \frac{n(\omega) + 1}{\omega}\, g(\omega). \tag{2.20}$$

While inelastic neutron scattering enables direct measurement of the phonon density of states, it is typically conducted at scientific user facilities and is therefore often a limited resource. Efficient use of available beamtime requires advance planning informed by effective, approximate models of the expected observations. However, the complexity of phonon density of states calculations scales as $O(N^4)$, posing a significant challenge for high-throughput evaluation of candidate materials. In Chap. 3, machine learning methods are introduced to efficiently predict the phonon density of states from structural and atomic descriptors, with outcomes directly comparable to experiment.
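As a numerical illustration of Eq. 2.20, the sketch below evaluates the one-phonon incoherent structure factor from a model density of states. The Debye-like $g(\omega)$, the temperature, and the unit prefactor are placeholder assumptions for this sketch, not values tied to any measurement discussed here.

```python
import numpy as np

k_B = 8.617333e-5          # Boltzmann constant (eV/K)
T = 300.0                  # temperature (K); illustrative choice
omega = np.linspace(1e-3, 60e-3, 600)   # phonon energies hbar*omega (eV)

# Model phonon density of states: Debye-like ~omega^2 rise with a
# 50 meV cutoff, normalized to unit area (a placeholder for a real g(w)).
g = np.where(omega < 50e-3, omega**2, 0.0)
g /= g.sum() * (omega[1] - omega[0])

# Bose-Einstein occupation n(w) = (exp(hbar*w / k_B T) - 1)^-1.
n = 1.0 / np.expm1(omega / (k_B * T))

# Eq. 2.20: S_inc(w) = (Q^2 / 2M) e^{-2W} (n(w) + 1) g(w) / w.
# The Q- and M-dependent prefactor and the Debye-Waller factor only
# rescale the spectrum and are set to 1 here.
S_inc = (n + 1.0) * g / omega
```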

2.1.2 Raman Spectroscopy

Raman spectroscopy measures the inelastic scattered intensity $I$ of a visible photon from a molecule or crystal. The inelastic scattering of visible light, called the Raman effect, is typically the result of optical phonon creation (Stokes process) or annihilation (anti-Stokes process), and $I$ thereby exhibits characteristic peaks at phonon energies $\hbar\omega_{\nu\mathbf{q}}$, each corresponding to a normal mode $\nu$ with wavevector $\mathbf{q}$ [5]. The Raman intensity for the Stokes processes can be derived within third-order perturbation theory by treating the electron–photon ($H_{eR}$) and electron–phonon ($H_{ep}$) interaction Hamiltonians as perturbations of the Hamiltonian of electrons and phonons in a crystal, $H_0$,

$$H = H_0 + H'(t) = H_0 + H_{eR}(t) + H_{ep}, \tag{2.21}$$

where the unperturbed and interaction Hamiltonians take the forms [6],

$$H_0 = \sum_{n\mathbf{k}} \varepsilon_{n\mathbf{k}}\, c_{n\mathbf{k}}^\dagger c_{n\mathbf{k}} + \sum_{\nu\mathbf{q}} \hbar\omega_{\nu\mathbf{q}} \left( a_{\nu\mathbf{q}}^\dagger a_{\nu\mathbf{q}} + \frac{1}{2} \right) \tag{2.22}$$

$$H_{eR}(t) = \frac{e}{m} \sum_{nm\mathbf{k}} \mathbf{A}(t) \cdot \mathbf{p}_{nm\mathbf{k}}\, c_{n\mathbf{k}}^\dagger c_{m\mathbf{k}} \tag{2.23}$$

$$H_{ep} = \sum_{nm\nu} \sum_{\mathbf{k}\mathbf{q}} \sqrt{\frac{\hbar}{\omega_{\nu\mathbf{q}}}}\, g_{nm\mathbf{k}}^{\nu\mathbf{q}}\, c_{n\mathbf{k}+\mathbf{q}}^\dagger c_{m\mathbf{k}} \left( a_{\nu\mathbf{q}} + a_{\nu(-\mathbf{q})}^\dagger \right). \tag{2.24}$$

In the above equations, c† /c and a † /a are the creation/annihilation operators of electrons and phonons, respectively; A is the vector potential; εnk is the energy of νq the single-particle electronic state |nk; and pnmk = nk|p|mk and gnmk = nk + q|∂νq V |mk are the momentum and electron–phonon matrix elements, respectively.


Note that a general time-dependent perturbation H′(t) can be written as H′(t) = Σ_{ω_n} H′(ω_n) exp(−iω_n t) [6]. In first-order Raman scattering, H′(t) consists of three distinct frequency components: the input and output frequencies ω_in and ω_out of the electron–photon interaction, and zero frequency for the time-independent electron–phonon coupling. In the spirit of Sect. 2.1.1, the perturbative Hamiltonians are used to specify a transition rate using Fermi's golden rule; specifically, here we construct the third-order transition rate for all one-phonon Stokes Raman processes from an initial state of the system, |i⟩, to a final state |f⟩ [6],

$$ P^{(3)}_{i \to f} = \frac{2\pi}{\hbar} \left| \sum_{ab} \sum_{(\omega_1 \omega_2 \omega_3)} \frac{\langle f|H'(\omega_1)|a\rangle \langle a|H'(\omega_2)|b\rangle \langle b|H'(\omega_3)|i\rangle}{(E_i - E_a + \hbar\omega_2 + \hbar\omega_3)(E_i - E_b + \hbar\omega_3)} \right|^2 \delta(E_f - E_i - \hbar\omega), \qquad (2.25) $$

where ℏω denotes the energy transfer of the scattered photon. The summations with respect to a, b are performed over all eigenstates |a⟩, |b⟩ with energies E_a, E_b of the unperturbed system, and the summation over (ω₁ω₂ω₃) indicates a sum over all permutations of (ω₁ω₂ω₃) with ω₁ + ω₂ + ω₃ = ω fixed. These are typically enumerated by constructing the Feynman diagrams for the scattering


process, which yields six distinct pathways [5]. Additionally, group theory may be used to decide whether a matrix element ⟨a|H′|b⟩ vanishes by symmetry, e.g., if H′|b⟩ is orthogonal to |a⟩. This can be deduced from the respective characters of H′, |a⟩, and |b⟩ to obtain the selection rules for each matrix element, which ultimately determines the Raman activity of the phonon mode. Equation 2.25 can further be simplified by considering the case in which the initial and final states of the system are given by |i⟩ = |0⟩ ⊗ |n_ν⟩ and |f⟩ = |0⟩ ⊗ |n_ν + 1⟩, respectively, where |0⟩ denotes the ground state of the electronic system, and |n_ν⟩ is a state with n_ν phonons at frequency ω_ν, with n_ν = (exp(ℏω_ν/k_B T) − 1)⁻¹ given by the Bose–Einstein distribution at a temperature T [6]. Due to momentum conservation, only the terms involving phonons with q = 0 remain, so ω_ν = ω_{ν0}. Under these conditions, E_f − E_i = ℏω_ν, and the Raman intensity I(ω) is obtained by summing over all possible final states, i.e., phonon modes ν [6],

$$ I(\omega) = I_0 \sum_{\nu} \frac{n_\nu + 1}{\omega_\nu} \left| \sum_{\alpha\beta} u^{\alpha}_{\mathrm{in}}\, R^{\nu}_{\alpha\beta}\, u^{\beta}_{\mathrm{out}} \right|^2 \delta(\omega - \omega_\nu), \qquad (2.26) $$

where the Raman tensor R^ν_{αβ} encapsulates the scattering amplitude of a photon with incident polarization u^α_in to one with polarization u^β_out through the creation of a phonon with energy ω_ν, and I₀ denotes a normalization constant. In practice, the shape of the Raman peak is broadened by a finite phonon lifetime, so rather than a delta function δ(ω − ω_ν), the lineshape is often approximated by a Lorentzian curve [5],

$$ D(\omega - \omega_\nu) = \frac{1}{\pi\gamma_\nu} \left[ 1 + \left( \frac{\omega - \omega_\nu}{\gamma_\nu} \right)^2 \right]^{-1}, \qquad (2.27) $$

for γ_ν ≪ ω_ν, where the full width at half maximum intensity (FWHM) is given by 2γ_ν. In Chap. 3, we invoke these fundamental features of Raman peak intensities and lineshapes to construct synthetic spectra for unsupervised learning of low-dimensional spectral representations.
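A minimal sketch of such a synthetic spectrum, combining Eqs. 2.26 and 2.27, is given below; the mode frequencies, linewidths, and scalar amplitudes are arbitrary placeholders standing in for computed Raman tensors.

```python
import numpy as np

def bose(omega_nu, T=300.0):
    # Bose-Einstein occupation with omega in cm^-1 (kB/(h*c) ~ 0.695 cm^-1/K)
    return 1.0 / np.expm1(omega_nu / (0.6950348 * T))

def lorentzian(omega, omega_nu, gamma_nu):
    # Eq. 2.27: normalized Lorentzian lineshape with FWHM = 2*gamma_nu
    return (1.0 / (np.pi * gamma_nu)) / (1.0 + ((omega - omega_nu) / gamma_nu) ** 2)

def synthetic_raman(omega, modes, T=300.0):
    """Sum of thermally weighted Lorentzian peaks, one per Raman-active mode.

    modes: list of (omega_nu, gamma_nu, amplitude), where amplitude stands in
    for the squared polarization-contracted Raman tensor of Eq. 2.26.
    """
    I = np.zeros_like(omega)
    for omega_nu, gamma_nu, amp in modes:
        I += (bose(omega_nu, T) + 1.0) / omega_nu * amp * lorentzian(omega, omega_nu, gamma_nu)
    return I / I.max()  # normalize to unit peak intensity

omega = np.linspace(50.0, 1000.0, 2000)   # Raman shift (cm^-1)
spectrum = synthetic_raman(omega, [(150, 4, 1.0), (520, 6, 2.5), (800, 10, 0.7)])
```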

2.1.3 Polarized Neutron Reflectometry

Polarized neutron reflectometry (PNR) is a method of depth-resolved measurement of nuclear and magnetic structure in thin film systems. Specifically, PNR probes the coefficient of specular reflection, or reflectivity, R, averaged over the lateral dimension of a thin film surface or interface. In this specular mode, governed by Snell's law, neutrons are reflected at an angle equal to the angle of incidence, θ, with reflected intensity determined by the depth-dependent refractive index. As the momentum transfer Q to the neutron is purely perpendicular to the sample surface,


PNR profiles are obtained by varying Q = 4π sin θ/λ, where λ represents the neutron wavelength. Accessible Q typically range from 0.1 to ∼1.8 nm⁻¹, with resulting reflectivity profiles R(Q) spanning over six orders of magnitude. This enables the determination of nuclear density profiles with subnanometer resolution over a few hundred nanometers in depth. At the same time, the sensitivity of neutrons to magnetic moments in magnetized samples can be leveraged to obtain spin-dependent R(Q) and thereby profile the depthwise magnetic structure. As mentioned in Sect. 2.1.1, the interaction of neutrons with atomic nuclei of the scattering medium can be approximated as the sum of point-like sources [2],

$$ V_N(\mathbf{r}) = \frac{2\pi\hbar^2}{m_n} \sum_{j=1}^{N} b_j\, \delta(\mathbf{r} - \mathbf{r}_j), \qquad (2.28) $$

where m_n is the neutron mass, N is the number of scattering centers (nuclei), and b_j is the nuclear scattering length of the nucleus at r_j. This is often approximated by an average nuclear interaction potential,

$$ V_N = \frac{2\pi\hbar^2}{m_n} \rho_N, \qquad (2.29) $$

where ρ_N = nb is the nuclear scattering length density (NSLD) and n is the number of scattering centers per unit volume. To construct the magnetic contribution to the interaction potential when measuring magnetic media, we first define the wavefunction describing a generic neutron state as the superposition of the spin up and down spinors,

$$ \psi = C_+ \chi_+ + C_- \chi_-. \qquad (2.30) $$

The magnetic contribution to the interaction potential is then given by the matrix potential [2],

$$ \hat{V}_M(\mathbf{r}) = -\hat{\boldsymbol{\mu}}_n \cdot \sum_{j=1}^{N} \mathbf{B}_j(\mathbf{r} - \mathbf{r}_j), \qquad (2.31) $$

where μ̂_n = −μ_n σ̂ is the neutron magnetic moment operator, denoting the neutron magnetic moment μ_n projected onto the vector of Pauli matrices σ̂ = (σ̂_x, σ̂_y, σ̂_z), and B_j corresponds to the effective magnetic induction generated by the magnetic moment of the j-th magnetic atom. Note that we can take the neutron polarization to be in the plane of the scattering interface, perpendicular to the direction of wavevector transfer Q, as any out-of-plane component does not contribute to scattering [7]. We can further restrict the scattering process to one which does not change the initial spin state of the neutron, also known as a "non-spin-flip" (NSF) process. This assumption isolates the σ̂_z component of μ̂_n and a single, parallel


component of B_j. We make the simplifying assumption that the magnetic induction takes an average value B = 4πM, where M is the magnetization of the magnetic film. Then, by introducing the magnetic scattering length,

$$ p = \frac{m_n}{2\pi\hbar^2} \mu_n B, \qquad (2.32) $$

we can define an average magnetic interaction potential,

$$ \hat{V}_M = \mu_n \hat{\sigma}_z B \cos\phi = \frac{2\pi\hbar^2}{m_n} \rho_M \hat{\sigma}_z \cos\phi, \qquad (2.33) $$

with magnetic scattering length density (MSLD) ρ_M = np, and φ representing the angle between the magnetization of the film and the neutron polarization. The NSF interaction consists only of diagonal entries corresponding to each neutron spin state, equivalent to

$$ V_M^{\pm} = \pm \frac{2\pi\hbar^2}{m_n} \rho_M \cos\phi. \qquad (2.34) $$

The total interaction potential is then V^± = V_N + V_M^±, giving the total spin-dependent neutron energy in the medium,

$$ E^{\pm} = \frac{\hbar^2 k^2}{2m_n} + V^{\pm} = \frac{\hbar^2 k^2}{2m_n} + \frac{2\pi\hbar^2}{m_n} \left( \rho_N \pm \rho_M \cos\phi \right), \qquad (2.35) $$

where k denotes the neutron wavevector in the medium. This is contrasted with the total energy of the free neutron,

$$ E = \frac{\hbar^2 k_0^2}{2m_n}, \qquad (2.36) $$

with wavevector k₀. By equating Eqs. 2.35 and 2.36 using energy conservation, we obtain the refractive index through the ratio k/k₀ [8],

$$ n(k_0) = \sqrt{1 - 4\pi(\rho_N \pm \rho_M \cos\phi)/k_0^2}. \qquad (2.37) $$

This gives a critical value for total external reflection of k₀ = √(4π(ρ_N ± ρ_M cos φ)). The reflectance r at the vacuum-film interface is given by the ratio of incident and reflected amplitudes of the neutron wavefunction, which is approximately

$$ r = \frac{k_0 - k}{k_0 + k}. \qquad (2.38) $$


This can be recognized as Fresnel's law. If instead the film were deposited on a substrate, we would observe interference fringes, with the total reflectance now given by

$$ r = \frac{r_1 + r_2 e^{2ikt}}{1 + r_1 r_2 e^{2ikt}}, \qquad (2.39) $$

where r₁ and r₂ denote the respective reflectances at the vacuum-film and film-substrate interfaces, and t is the film thickness. The experimentally measured quantity, the coefficient of specular reflection, or reflectivity, R, is simply R = |r|². In practice, this layered representation of the scattering medium can be extended to arbitrarily complex heterostructures with different nuclear and magnetic scattering lengths. In general, the spin-dependent coefficients of reflection R^±± measured by PNR can be obtained by solving the pair of time-independent wave equations,

$$ \left[ -\frac{\hbar^2}{2m_n} \frac{\partial^2}{\partial z^2} + V^{\pm}(z) - E \right] \psi^{\pm}(z) = 0, \qquad (2.40) $$

where z is a depthwise coordinate perpendicular to the sample surface. Substituting the expressions for E and V^±, these equations can be written as [8],

$$ \left[ \frac{\partial^2}{\partial z^2} + \frac{Q^2}{4} - 4\pi \left( \rho_N(z) \pm \rho_M(z) \cos\phi \right) \right] \psi^{\pm}(z) = 0, \qquad (2.41) $$

where the scattering wavevector Q = 2k₀. These equations are solved in a piecewise continuous manner, treating the nuclear and magnetic scattering lengths as constant within each layer and imposing continuity of the wavefunctions and their first derivatives across the interfaces [7]. The reflection coefficient is then similarly computed as the squared ratio of amplitudes of the reflected and incident neutron wavefunctions. The exact reflectometry profile for the case of specular scattering from a flat slab of infinite lateral extent takes the form [8],

$$ R^{\pm\pm}(Q) = \frac{16\pi^2}{Q^2} \left| \int_{-\infty}^{\infty} \left( \rho_N(z) \pm \rho_M(z) \cos\phi \right) \psi(Q, z)\, e^{-ik_0 z}\, dz \right|^2. \qquad (2.42) $$

At sufficiently large Q, interaction with the scattering medium does not significantly distort the incident neutron wavefunction from its form in vacuum. In this "Born approximation", the neutron wavefunction within the scattering medium may be replaced by that in vacuum, e^{−ik₀z}, to give

$$ R^{\pm\pm}(Q) = \frac{16\pi^2}{Q^2} \left| \int_{-\infty}^{\infty} \left( \rho_N(z) \pm \rho_M(z) \cos\phi \right) e^{-iQz}\, dz \right|^2. \qquad (2.43) $$


It is clear from Eq. 2.43 that PNR suffers information loss about the phase of the reflected neutrons, resulting in potentially similar reflectivities generated by very different NSLD and MSLD profiles. This presents one important challenge for parameter retrieval algorithms attempting to recover the underlying forms of ρN (z) and ρM (z). Independent knowledge of the underlying system can be used to construct a layer model in terms of physical parameters, such as the density, thickness, roughness, and magnetization of each constituent layer, which must be fit to the measured PNR data. Thus, our conclusions about systems measured by PNR depend on the success of the inversion strategy. In Chap. 4, we study the effectiveness of machine learning methods to obtain more robust solutions during PNR parameter retrieval.
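To make Eq. 2.43 concrete, the sketch below numerically evaluates the spin-dependent Born-approximation reflectivity of a single magnetic film; the layer thickness, scattering length densities, and in-plane angle φ are illustrative values only, and a production analysis would instead solve Eq. 2.41 layer by layer.

```python
import numpy as np

def born_reflectivity(Q, z, rho_N, rho_M, phi=0.0, spin=+1):
    """Spin-dependent reflectivity in the Born approximation (Eq. 2.43).

    Q: array of wavevector transfers; z: depth grid; rho_N, rho_M: NSLD and
    MSLD profiles sampled on z. Units must be mutually consistent.
    """
    rho = rho_N + spin * rho_M * np.cos(phi)
    # Fourier transform of the SLD profile, evaluated by the trapezoidal rule
    phase = np.exp(-1j * np.outer(Q, z))
    integral = np.trapz(phase * rho[None, :], z, axis=1)
    return (16 * np.pi**2 / Q**2) * np.abs(integral) ** 2

# Toy profile: 60 nm magnetic film in vacuum, SLDs in nm^-2
z = np.linspace(0.0, 100.0, 2001)                 # depth (nm)
rho_N = np.where(z < 60.0, 3.5e-4, 0.0)           # nuclear SLD
rho_M = np.where(z < 60.0, 1.0e-4, 0.0)           # magnetic SLD
Q = np.linspace(0.1, 1.8, 200)                    # nm^-1
R_up = born_reflectivity(Q, z, rho_N, rho_M, spin=+1)
R_dn = born_reflectivity(Q, z, rho_N, rho_M, spin=-1)
```

The spin splitting between R_up and R_dn directly reflects the ±ρ_M cos φ term of Eq. 2.43, which is the contrast exploited to profile the depthwise magnetization.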

2.1.4 X-ray Absorption Spectroscopy

X-ray absorption spectroscopy (XAS) is a core-level spectroscopic technique that can be used to characterize the chemical state and local atomic structure of atomic species in a material. Specifically, XAS refers to the excitation of a core-level electron to an unoccupied state through absorption of an X-ray photon. The X-ray absorption near-edge structure (XANES) describes the spectral region within approximately 50 eV of the absorption edge. This is contrasted with the extended X-ray absorption fine structure (EXAFS), which can extend up to 1000 eV or more [9]. These regimes are distinguished by the chemical and structural insights as well as the analysis methods valid in each energy range. For example, the XANES region is sensitive to the oxidation state, coordination chemistry, and orbital hybridization of the absorbing atom, while the EXAFS region probes the radial distribution of electron density around the absorbing atom and can quantitatively reveal the bond length and coordination number [10]. Quantitative interpretation of EXAFS is often possible due to an attainable analytical form; however, this breaks down in the XANES regime, which is thereby interpreted more qualitatively, with individual spectral features attributed to properties of the electronic structure through empirical evidence and spectral matching [11]. Nonetheless, several basic principles underlying the interaction of X-rays with atomic electrons are valid in both regimes. We consider the Hamiltonian of atomic electrons with momenta p_j and mass m,

$$ H_0 = \sum_j \left[ \frac{\mathbf{p}_j^2}{2m} + V(\mathbf{r}_j) \right], \qquad (2.44) $$

where the potential V(r_j) considers the Coulomb interaction with the nucleus and may also include the Coulomb repulsion between electrons and the spin–orbit interaction [12]. To first order, the interaction Hamiltonian is given by [12],

$$ H' = \frac{e}{mc} \sum_j \mathbf{p}_j \cdot \mathbf{A}(\mathbf{r}_j) + \frac{e}{2mc} \sum_j \boldsymbol{\sigma}_j \cdot \nabla \times \mathbf{A}(\mathbf{r}_j), \qquad (2.45) $$

where A(r_j) is the vector potential of the incident electromagnetic field, and σ_j is the electron spin. The vector potential is expressible in terms of the photon creation/annihilation operators b†_{kλ}/b_{kλ} with wavevector k and polarization λ [12],

$$ \mathbf{A}(\mathbf{r}) = \sum_{\mathbf{k}\lambda} A_0\, \mathbf{e}_{\mathbf{k}\lambda} \left( b_{\mathbf{k}\lambda} e^{i\mathbf{k}\cdot\mathbf{r}} + b^{\dagger}_{\mathbf{k}\lambda} e^{-i\mathbf{k}\cdot\mathbf{r}} \right), \qquad (2.46) $$

where e_{kλ} represents a directional unit vector. The observable obtained by XAS is the absorption coefficient μ as a function of the X-ray energy ℏω, which is determined from the decay in the X-ray beam intensity I with distance x through the sample according to the Beer-Lambert law [14],

$$ I = I_0 e^{-\mu x}, \qquad (2.47) $$

where I₀ is the intensity of the incident beam. When the X-ray energy corresponds to an allowed excitation of a core shell electron, the absorption coefficient increases abruptly, giving rise to an absorption edge above which the oscillatory spectral features of XANES and EXAFS are observable. At present, the most common theoretical description of the absorption coefficient is based on Fermi's golden rule [12–14],

$$ \mu(\omega) \propto \sum_f \left| \langle \psi_f | T | \psi_i \rangle \right|^2 \delta(E_f - E_i - \hbar\omega), \qquad (2.48) $$

specifying the transition rate between the initial and final eigenstates |ψ_i⟩ and |ψ_f⟩ of the electronic system with energies E_i and E_f, respectively, due to the absorption of an incident X-ray photon with energy ℏω. Note that formally the initial and final states comprise the product states of the photon and electron wave functions, but for simplicity the photon term is suppressed. The transition operator T is obtained to first order as [12],

$$ T^{(1)} = \frac{e}{mc} A_0 \sum_{\mathbf{k}\lambda j} \left[ b_{\mathbf{k}\lambda} (\mathbf{e}_{\mathbf{k}\lambda} \cdot \mathbf{p}_j) e^{i\mathbf{k}\cdot\mathbf{r}_j} + \frac{\hbar}{2} b_{\mathbf{k}\lambda} (\mathbf{e}_{\mathbf{k}\lambda} \cdot \boldsymbol{\sigma}_j \times \mathbf{k}) e^{i\mathbf{k}\cdot\mathbf{r}_j} \right]. \qquad (2.49) $$

Of particular significance is the first term of the transition operator, which comes from the electromagnetic interaction. Under the approximation of the series expansion eik·rj ≈ 1, this term is simply proportional to ekλ · pj , known as the dipole approximation. This enables a derivation of the dipole selection rules, which are particularly important for X-ray transitions. Because the initial core state is expressible as a single spherical harmonic, it possesses a well-defined symmetry which determines the orbital character of the final states permitted by the transition.


For instance, if the core state is an s-state, it can only make a transition to p-like states. The absorption edge of such a transition is termed the K-edge of the absorbing atom. Similarly, only s-like and d-like states are accessible from a core p-state; these transitions constitute the absorbing atom's L-edges. The transition probability directly relates to the strength of the measured absorption. It is worth mentioning that orbital mixing and hybridization, as well as quadrupole transitions obtained through consideration of the next term (proportional to k · r_j) in the series expansion, often permit observation of transitions forbidden by dipole selection rules (e.g., 1s → 3d) [9]. Nonetheless, as a result of the orbital selectivity, XAS can, to a certain extent, be regarded as an orbital-selective probe of the unoccupied electronic density of states (DoS). However, the rich geometric and electronic structural information contained in XAS signatures, beyond the simple theoretical picture portrayed here, should not be overlooked. Particularly in the XANES regime, the sensitivity to multiple scattering from distant atoms contributes to increased complexity of the experimental signature but at the same time offers important insight into the three-dimensional structure surrounding the absorbing atom. Despite substantial progress in recent years [15], computational methods are not as well-established for XANES as for EXAFS. This appears to be one catalyst of recent machine learning efforts introduced to automate the estimation of materials parameters such as coordination environments [16–20], oxidation states [17, 20], and crystal-field splitting [21] from XANES and other core-level spectroscopies, and even enable direct prediction of XANES spectra from structural and atomic descriptors [22–24]. In Chap. 5, we propose to take machine learning methods further by developing a neural network indicator of topological character based on XANES signatures.

2.2 Data-Driven Methods

Data-driven methods comprise a wide variety of data analysis techniques which rely on patterns in the data itself, rather than prescribed models, to draw conclusions or establish relationships between variables. In the following, we provide a brief summary of several established unsupervised and supervised learning methods implemented in this thesis work.

2.2.1 Dimensionality Reduction

One important achievement of unsupervised learning methods is the realization of efficient, low-dimensional representations capturing the most informative features of complex signals such as spectral signatures. Here, we give a concise overview of several established dimensionality reduction techniques. In the following, we


represent a spectral dataset by an m × n real-valued matrix A ∈ Rm×n consisting of n samples (observations) with m features, e.g., intensity values at m different wavenumbers.

Singular Value Decomposition

One of the most common methods of dimensionality reduction is to truncate the singular value decomposition (SVD) [25] of A to include only the d (d ≪ min(m, n)) largest singular values and thereby obtain a low-rank approximation Â. In particular,

$$ A \approx \hat{A} = U \Sigma V^{\top}, \qquad (2.50) $$

where U ∈ ℝ^{m×d} and V ∈ ℝ^{n×d} are transformation matrices with orthonormal columns (i.e., Uᵢᵀ Uⱼ = δᵢⱼ for columns Uᵢ, Uⱼ of U), and Σ ∈ ℝ^{d×d} is a diagonal matrix of the d largest singular values of A. The latent features of each column in A are given by the corresponding column in Z_A = Uᵀ A ∈ ℝ^{d×n}. Once U and V are established using the training dataset A, the latent features of a new dataset B can be obtained by projecting the columns of B along the left singular vectors, i.e., Z_B = Uᵀ B. The corresponding low-rank approximation B̂ is then given by the linear combination of left singular vectors weighted by the columns of Z_B, i.e., B̂ = U Z_B.
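A minimal NumPy sketch of this truncation and projection, using random matrices as stand-ins for real spectral datasets:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((400, 1000))          # m = 400 features, n = 1000 spectra
B = rng.random((400, 50))            # held-out spectra
d = 10                               # latent dimension, d << min(m, n)

# Full SVD, then truncate to the d largest singular values
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_d, s_d, Vt_d = U[:, :d], s[:d], Vt[:d, :]

A_hat = U_d @ np.diag(s_d) @ Vt_d    # rank-d approximation of A
Z_A = U_d.T @ A                      # latent features of the training data
Z_B = U_d.T @ B                      # projection of new data
B_hat = U_d @ Z_B                    # rank-d reconstruction of B
```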

Principal Component Analysis

Principal component analysis (PCA) is likewise a linear dimensionality reduction technique which projects the observations in A along a set of singular vectors, termed principal components; however, it also centers the columns of A before applying SVD. Centering subtracts the row-wise (per feature) means of A from the raw feature values, thereby removing the effect of the mean observation on the principal axes. On the other hand, kernel PCA (kPCA) [26] is an extension of PCA that introduces non-linearity in the form of a positive-definite kernel function k(x, y) computed between pairs of observations (e.g., Aᵢ, Aⱼ). The radial basis function (RBF) kernel, k(x, y) = exp(−γ‖x − y‖²), is a common choice and is the kernel adopted in this work. The kPCA is then conducted similarly to PCA by centering and decomposing the resulting kernel matrix K with entries Kᵢⱼ = k(Aᵢ, Aⱼ).
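In practice, both variants are available off the shelf; a short scikit-learn sketch, where the random data and the value of γ are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA, KernelPCA

rng = np.random.default_rng(0)
X = rng.random((1000, 400))           # samples as rows, as scikit-learn expects

Z_pca = PCA(n_components=10).fit_transform(X)

# Kernel PCA with the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)
Z_kpca = KernelPCA(n_components=10, kernel="rbf", gamma=1e-3).fit_transform(X)
```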

Non-negative Matrix Factorization

A key advantage of SVD and its derivatives is the ability to select the latent dimension d which minimizes information loss while significantly reducing the


dimensionality of the feature space. However, because these methods project data along linear combinations of the original feature vectors, assigning physical meaning to the resulting latent features is often difficult or even impossible. Non-negative matrix factorization (NMF) [27] assumes an alternative perspective in which the non-negative data matrix A ∈ ℝ^{m×n}_{≥0} is decomposed into two lower rank non-negative matrices W ∈ ℝ^{m×d}_{≥0} and H ∈ ℝ^{d×n}_{≥0} according to

$$ A \approx \hat{A} = WH. \qquad (2.51) $$

Here, each observation Ai is regarded as a purely additive linear combination of the latent vectors given by the columns of W weighted by the rows of H . In this case, the latent features ZA of A are given directly by the columns of H . Due to the non-negativity constraint, NMF can produce a parts-based representation of the dataset that is oftentimes more interpretable than the latent representations obtained by SVD or PCA.
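A corresponding scikit-learn sketch; note that scikit-learn factorizes data with samples as rows, i.e., the transpose of Eq. 2.51, and the random non-negative matrix here is a placeholder for a real spectral dataset:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.random((1000, 400))          # non-negative data, samples as rows (A^T)

nmf = NMF(n_components=10, init="nndsvd", max_iter=500)
H = nmf.fit_transform(X)             # (n, d): latent features per observation
W = nmf.components_                  # (d, m): non-negative basis spectra
```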

2.2.2 Machine Learning

Machine learning methods leverage large and often complex datasets in order to make predictions on related tasks or to detect underlying patterns in the data. In the following, we briefly introduce two common frameworks for building machine learning models: support vector machines and neural networks.

Support Vector Machines

Classification is at the heart of many machine learning problems, from discerning the subject of an image to detecting outliers in numerical data. In the simplest case of binary classification, we seek a mapping f : ℝᵐ → {−1, 1} that assigns a point in an m-dimensional feature space to one of two classes, −1 or 1. While there are a number of possibilities for such a mapping, a natural objective is to find the optimal way to correctly partition all points in the feature space into their respective classes. Support vector machines (SVMs) address this question by identifying the (m − 1)-dimensional hyperplane that achieves the largest separation, or margin, between data of opposite classes seen in training [28]. Consider a set of n training points x⁽¹⁾, x⁽²⁾, · · · , x⁽ⁿ⁾, where each x⁽ⁱ⁾ ∈ ℝᵐ, and associated classes y⁽¹⁾, y⁽²⁾, · · · , y⁽ⁿ⁾, for y⁽ⁱ⁾ ∈ {−1, 1}. An (m − 1)-dimensional hyperplane in ℝᵐ may be expressed as

$$ \theta^{\top} x + \theta_0 = 0, \qquad (2.52) $$


where θ ∈ ℝᵐ is a vector normal to the hyperplane, and θ₀ is a constant offset. For linearly separable data, we can identify two parallel hyperplanes with a shared normal vector θ but distinct offsets θ₀⁺ and θ₀⁻, respectively, whose distance w = |θ₀⁺ − θ₀⁻|/‖θ‖ is maximal, where θ₀ = (θ₀⁺ + θ₀⁻)/2. Proper rescaling allows us to set θ₀⁺ = θ₀ + 1 and θ₀⁻ = θ₀ − 1 without loss of generality, and thus w = 2/‖θ‖. Maximizing the margin therefore coincides with minimizing ‖θ‖ or, in a more convenient form, minimizing the loss L = ‖θ‖². The training data introduce constraints on the optimization. Specifically, each data point (x⁽ⁱ⁾, y⁽ⁱ⁾) must lie on the correct side of the hyperplane. This task may be expressed by the inequality constrained optimization problem,

$$ \min_{\theta} \|\theta\|^2, \quad \text{subject to } y^{(i)}\left( \theta^{\top} x^{(i)} + \theta_0 \right) \geq 1, \text{ for } 1 \leq i \leq n, \qquad (2.53) $$

resulting in a hard margin between the data, meaning that no misclassification is permitted. However, data are often not linearly separable, and some misclassification must be tolerated to produce the best separating hyperplane between points. This can be accomplished by introducing the hinge loss L_h, which produces a soft margin,

$$ L_h^{(i)} = \max\left( 0,\, 1 - y^{(i)}\left( \theta^{\top} x^{(i)} + \theta_0 \right) \right). \qquad (2.54) $$

The hinge loss substitutes the inequality constraint with a penalty for misclassification. The optimization problem then becomes,

$$ \min_{\theta} L, \quad \text{with } L = \lambda \|\theta\|^2 + \frac{1}{n} \sum_{i=1}^{n} L_h^{(i)}, \qquad (2.55) $$

where λ tunes the strength of the weight norm relative to the hinge loss. More complex relationships between the input features and target classes often require greater flexibility from the model f : ℝᵐ → {−1, 1}, which can be introduced through nonlinearities in the mapping. While the separating hyperplane produced by SVMs is a linear decision boundary, it is possible to construct nonlinear classifiers using kernel methods that transform the feature space, such that the boundary is nonlinear in the space of the original inputs. Alternatively, neural networks, discussed in the following section, constitute a broad class of universal function approximators that can represent arbitrarily complex, nonlinear mappings.
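The soft-margin objective of Eq. 2.55 can be minimized directly by subgradient descent; the sketch below does so in NumPy on synthetic two-class data, with the learning rate, iteration count, and λ chosen arbitrarily for illustration.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize L = lam*||theta||^2 + mean(hinge) by subgradient descent."""
    n, m = X.shape
    theta, theta0 = np.zeros(m), 0.0
    for _ in range(epochs):
        margins = y * (X @ theta + theta0)
        active = margins < 1                      # points with nonzero hinge loss
        # Subgradients of Eq. 2.55 with respect to theta and theta0
        grad_theta = 2 * lam * theta - (y[active, None] * X[active]).sum(0) / n
        grad_theta0 = -y[active].sum() / n
        theta -= lr * grad_theta
        theta0 -= lr * grad_theta0
    return theta, theta0

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.hstack([-np.ones(100), np.ones(100)])
theta, theta0 = train_linear_svm(X, y)
accuracy = np.mean(np.sign(X @ theta + theta0) == y)
```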

Neural Networks

Neural networks are another class of machine learning methods which greatly simplify the introduction of nonlinearities. The basic computational unit of a neural


Fig. 2.1 Building blocks of neural networks. Illustration of a single neural network layer (left), composed of individual neurons (right)

network is a neuron, which consists of a linear function f : ℝᵐ → ℝ followed by an activation function σ : ℝ → ℝ, as depicted schematically in Fig. 2.1. The activation function may be linear or nonlinear, and common choices include the sigmoid, hyperbolic tangent, and ReLU functions. The output z_j of a single neuron with index j is given by

$$ z_j = \sigma(f(x)) = \sigma\left( \theta^{\top} x + \theta_0 \right), \qquad (2.56) $$

where θ and θ₀ are a weight vector and bias, respectively. Several neurons may be arranged in parallel in a single layer of the neural network to construct a more expressive model through a higher dimensional, intermediate representation of the input data. In a feed-forward neural network, data flow proceeds in one direction from the input to the output through one or more layers of neurons. The output z ∈ ℝᵈ of a single layer of d neurons in a feed-forward network can be expressed as

$$ z = \sigma\left( \Theta^{\top} x + \Theta_0 \right), \qquad (2.57) $$

where Θ ∈ ℝ^{m×d}, Θ₀ ∈ ℝᵈ, and σ is applied element-wise to each component of the argument. This equation defines the basic operation performed by a layer of the neural network and is often repeated sequentially several times. However, neural network operations are typically tailored to the data types on which they operate by incorporating additional constraints called inductive biases. These encompass specific architectural or algorithmic assumptions which allow the learning algorithm to prioritize one solution over another [29], ranging from regularization terms added to avoid overfitting, to convolutional layers which enforce locality and translation invariance of the network operations. These basic principles underlie the development of the machine learning frameworks presented in the following chapters.
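As a small illustration of Eqs. 2.56 and 2.57, a two-layer feed-forward network in NumPy; the layer widths and random weights are placeholders, not trained values:

```python
import numpy as np

def layer(x, Theta, Theta0, sigma=np.tanh):
    # Eq. 2.57: z = sigma(Theta^T x + Theta0), with sigma applied element-wise
    return sigma(Theta.T @ x + Theta0)

rng = np.random.default_rng(2)
m, d1, d2 = 51, 32, 4                   # input, hidden, and output widths
Theta1, Theta01 = rng.normal(0, 0.1, (m, d1)), np.zeros(d1)
Theta2, Theta02 = rng.normal(0, 0.1, (d1, d2)), np.zeros(d2)

x = rng.random(m)                       # e.g., a discretized spectrum
z = layer(layer(x, Theta1, Theta01), Theta2, Theta02, sigma=lambda u: u)
```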


References

1. Grosso, G., & Parravicini, G. P. (2013). Solid state physics. Academic Press.
2. Toperverg, B. P. (2015). Polarized neutron reflectometry of magnetic nanostructures. The Physics of Metals and Metallography, 116, 1337–1375.
3. Fultz, B. (2010). Vibrational thermodynamics of materials. Progress in Materials Science, 55, 247–352.
4. Carpenter, J., & Price, D. (1985). Correlated motions in glasses studied by coherent inelastic neutron scattering. Physical Review Letters, 54, 441.
5. Jorio, A., Dresselhaus, M. S., Saito, R., & Dresselhaus, G. (2011). Raman spectroscopy in graphene related systems. John Wiley & Sons.
6. Taghizadeh, A., Leffers, U., Pedersen, T. G., & Thygesen, K. S. (2020). A library of ab initio Raman spectra for automated identification of 2D materials. Nature Communications, 11, 1–10.
7. Majkrzak, C. (1996). Neutron scattering studies of magnetic superlattices. Magnetic Neutron Scattering, 78.
8. Majkrzak, C., O'Donovan, K., & Berk, N. (2006). Neutron scattering from magnetic materials (pp. 397–471). Elsevier.
9. Penner-Hahn, J. E., et al. (2003). X-ray absorption spectroscopy. Comprehensive Coordination Chemistry II, 2, 159–186.
10. Newville, M. (2004). Fundamentals of XAFS, Consortium for Advanced Radiation Sources. University of Chicago.
11. Gaur, A., & Shrivastava, B. (2015). Speciation using X-ray absorption fine structure (XAFS). Review Journal of Chemistry, 5, 361–398.
12. De Groot, F., & Kotani, A. (2008). Core level spectroscopy of solids. CRC Press.
13. Rehr, J. J., & Albers, R. C. (2000). Theoretical approaches to X-ray absorption fine structure. Reviews of Modern Physics, 72, 621.
14. Rehr, J., & Ankudinov, A. (2005). Progress in the theory and interpretation of XANES. Coordination Chemistry Reviews, 249, 131–140.
15. Liang, Y., et al. (2017). Accurate x-ray spectral predictions: An advanced self-consistent-field approach inspired by many-body perturbation theory. Physical Review Letters, 118, 096402.
16. Carbone, M. R., Yoo, S., Topsakal, M., & Lu, D. (2019). Classification of local chemical environments from X-ray absorption spectra using supervised machine learning. Physical Review Materials, 3, 033604.
17. Torrisi, S. B., et al. (2020). Random forest machine learning models for interpretable X-ray absorption near-edge structure spectrum-property relationships. npj Computational Materials, 6, 1–11.
18. Zheng, C., Chen, C., Chen, Y., & Ong, S. P. (2020). Random forest models for accurate identification of coordination environments from x-ray absorption near-edge structure. Patterns, 1, 100013.
19. Kiyohara, S., Miyata, T., Tsuda, K., & Mizoguchi, T. (2018). Data-driven approach for the prediction and interpretation of core-electron loss spectroscopy. Scientific Reports, 8, 1–12.
20. Guda, A., et al. (2021). Understanding X-ray absorption spectra by means of descriptors and machine learning algorithms. npj Computational Materials, 7, 1–13.
21. Suzuki, Y., Hino, H., Kotsugi, M., & Ono, K. (2019). Automated estimation of materials parameter from X-ray absorption and electron energy-loss spectra with similarity measures. npj Computational Materials, 5, 1–7.
22. Carbone, M. R., Topsakal, M., Lu, D., & Yoo, S. (2020). Machine-learning X-ray absorption spectra to quantitative accuracy. Physical Review Letters, 124, 156401.
23. Rankine, C. D., Madkhali, M. M., & Penfold, T. J. (2020). A deep neural network for the rapid prediction of X-ray absorption spectra. The Journal of Physical Chemistry A, 124, 4263–4270.
24. Lueder, J. (2021). A machine learning approach to predict L-edge X-ray absorption spectra of light transition metal ion compounds. Preprint. arXiv:2107.13149.


25. Schütze, H., Manning, C. D., & Raghavan, P. (2008). Introduction to information retrieval. Cambridge University Press.
26. Schölkopf, B., Smola, A., & Müller, K. R. (1997). Kernel principal component analysis. In International Conference on Artificial Neural Networks (pp. 583–588).
27. Lee, D. D., & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755), 788–791.
28. Alpaydin, E. (2020). Introduction to machine learning. MIT Press.
29. Battaglia, P. W., et al. (2018). Relational inductive biases, deep learning, and graph networks. Preprint. arXiv:1806.01261.

Chapter 3

Data-Efficient Learning of Materials’ Vibrational Properties

Abstract One central objective of materials research is to establish structure-property relationships—that is, how specific atomic arrangements lead to certain macroscopic functionalities. While this question is historically addressed through a combination of structure and property characterization, theory, and calculation, machine learning methods guided by crystalline symmetry constraints may provide an alternate route. In this chapter, we demonstrate the use of Euclidean neural networks (E(3)NNs) to directly predict materials' phonon densities of states (DoS) using simple structural inputs: atomic species, masses, and positions. E(3)NNs are by construction equivariant to 3D rotations, translations, and inversion, inherently capturing crystalline symmetries and thereby enabling data-efficient learning. A model trained on only ∼1000 examples is found to reproduce key features of both computational and experimental data without the need for data augmentation. Moreover, we observe that the model learns the partial phonon DoS as intermediate features without explicit training. We demonstrate a potential application of our model by performing high-throughput predictions to screen for high phononic specific heat capacity compounds. Finally, we discuss a few avenues of future and ongoing work, including the application to alloy systems with substitutional disorder, and the prediction of other vibrational signatures, namely Raman spectra. This work provides an efficient method to obtain materials' vibrational properties directly from atomic structure, which is promising for accelerating materials design of high-performance thermal storage materials and phonon-mediated superconductors.

3.1 Introduction

Phonons, or collective excitations of atoms in a crystalline solid, arise naturally due to random fluctuations of the lattice at finite temperatures. Their ubiquity in all real systems calls for an understanding of their interactions with other quasiparticle excitations and the resulting implications on materials' properties. For example, phonons dictate the temperature dependence of electrical resistivity in metals, mediate Cooper pairing of electrons in conventional superconductors, and participate in optical absorption in indirect gap semiconductors [1]. Much of


a material’s phononic signature is encapsulated in the phonon density of states (DoS), or the number of phonon modes with a particular vibrational frequency. This property arises in the calculation of fundamental quantities such as lattice heat capacity and thermal conductivity, interfacial thermal resistance [2, 3], and the superconducting critical temperature [4]. However, the acquisition of experimental and computed phonon DoS is a nontrivial task due to limited inelastic scattering facility resources and high computational cost of ab initio calculations, particularly for alloys and disordered materials [1, 5]. Existing approaches like the virtual crystal approximation (VCA) can fail both qualitatively and quantitatively without wellcontrolled approximations [6]. This calls for an approach that acquires phonon DoS more efficiently and accessibly than the methods currently available. Recent advances in machine learning (ML) suggest a paradigm shift in how structure-property relationships can be directly obtained [7, 8]. To date, ML has seen success in a growing number of materials applications, including materials discovery and design [9–12], process automation and optimization [13, 14], and prediction of materials’ mechanical [15–18], thermal [16, 18–22], and electronic properties [17, 23–30], as well as atomistic potentials [31–37]. Thus, machine learning methods may offer a new pathway for predicting phonon DoS from accessible structural and atomic properties. However, the phonon DoS prediction task poses certain challenges for machine learning approaches. First, most property prediction studies consider a low-dimensional output consisting of one or few discrete points; thus, the prediction of continuous properties from limited input information remains challenging due to the output complexity and limited training data available (O(103 )). Moreover, the input data types—crystal structures— carry symmetries that the neural network must respect; specifically, the choice of coordinate system in which atomic positions are represented should not influence the final prediction. However, capturing arbitrary rotations or translations of the coordinate system through data augmentation is computationally expensive and inefficient. In this chapter, we develop a ML-based predictive model based on Euclidean neural networks [38] to directly predict the discretized phonon DoS over a range of frequencies using only a crystal’s atomic positions and masses as input [40]. Original code and an updated tutorial accompanying this work are also made publicly available [39, 41]. This chapter is organized as follows: First, we introduce Euclidean neural networks, underscoring their relevance to predicting materials properties of crystalline solids. Next, we apply E(3)NNs to our target problem of predicting the phonon DoS. We evaluate the performance of our model both on calculated and experimental phonon DoS examples. Then, we demonstrate a proposed use case of our trained model by performing high-throughput predictions of phonon DoS to screen for high heat capacity compounds. We further suggest additional avenues of interest for future work, including preliminary results for the application to alloy systems with substitutional disorder with no added computational cost. 
Finally, we offer a short perspective on unsupervised representation learning of other common vibrational signatures, namely Raman spectra, with the goal of developing interpretable low-dimensional representations for more effective prediction of highdimensional targets with E(3)NNs.


3.2 Materials Data Representations

Machine learning models are often shaped by the types of data structures on which they operate. In the case of materials data, these often consist of crystal structures with atomic descriptors or spectral signatures. For example, the representation of crystal structures in terms of raw atomic coordinates is often a poor choice, as a rigid rotation or translation of these coordinates still represents the same crystal. Such non-unique representations can incur a higher computational cost during training by requiring extensive data augmentation to capture these arbitrary transformations and thereby learn to respect the underlying symmetries. Thus, the crystal data representations used in many machine learning models often respect a certain set of criteria, shown schematically in Fig. 3.1a. In particular, the mapping between crystal structures and the representation space should map equivalent structures to the same features and map continuous deformations of a structure to smooth deformations of the features in the representation space [42]. These representations are then associated with atomic properties or descriptors that correlate strongly with the predicted targets, such as the fractional composition of a compound, or the atomic masses, radii, and electronegativities of its atoms. Examples of crystal structure representations used as neural network inputs include graph representations, in which atoms and bonds correspond to nodes and edges, respectively [17]; topological representations such as persistent homology based on a geometrical shape analysis of the crystal structure [43]; or data-driven continuous representations obtained through unsupervised or semi-supervised training of autoencoder networks [11]. Similar principles can also be applied to the representation of spectral signatures, illustrated in Fig. 3.1b. In

Fig. 3.1 Representation of materials data types. (a) Design principles for the representation of crystal structures. The desired mapping is structurally diverse, invariant to arbitrary rotations and translations of the crystal lattice, and smooth with respect to continuous deformations of the crystal structure. (b) Design principles for the representation of spectral signatures. The desired mapping encodes diverse spectral signatures and is smooth with respect to continuous evolution of spectral features (e.g., the height and position of an individual peak). Representation spaces with meaningful axis definitions, such as a dependence on chemical composition, are more interpretable but not always easily obtained.


this chapter, we adopt a graph-like representation of crystal structures described in Sect. 3.3.1. In Sect. 3.5, we also consider the development of effective spectral representations in the context of Raman spectroscopy.

3.3 Euclidean Neural Networks

Euclidean neural networks (E(3)NNs) are networks which operate on 3D geometries and are by construction equivariant to 3D translations, rotations, and inversion [38, 44–46]. This work utilizes E(3)NNs as implemented in the open-source e3nn repository [38], which merges the implementations of [44] and [46] and additionally implements inversion symmetry. E(3)NNs offer two important advantages over generic neural networks in the context of materials property prediction from crystal structure inputs. First, because they are inherently symmetry-aware, E(3)NNs preserve all geometric information of the input and thereby eliminate the need for expensive data augmentation to capture arbitrary rotations or translations of the input crystal structure [47]. Second, imposing equivariance not only injects physical information but also constrains the functional space over which we optimize during training, requiring fewer training examples for convergence. The architecture of Euclidean neural networks is similar to that of graph convolutional neural networks. To achieve Euclidean symmetry equivariance, E(3)NN convolutional filters are functions of the radial distance vector between two points and are composed of learned radial functions and spherical harmonics, W(r) = R(|r|)Y_l^m(r̂). As a consequence of this filter choice, all inputs, intermediate data, and outputs are geometric tensors. Therefore, all scalar operations (e.g., addition and multiplication) in the network must be replaced with general geometric tensor algebra. Additionally, nonlinearities applied to geometric tensor data must also be replaced with equivariant equivalents. A feature that emerges from equivariance is that the outputs of Euclidean neural networks are guaranteed to have equal or higher symmetry than the inputs, which means that these networks are guaranteed to respect the space group symmetries (which are a subgroup of Euclidean symmetry) of the input crystal geometry [47].

3.3.1 Graph Representation of Crystal Structures

The crystal structures on which E(3)NNs operate are first converted into periodic graphs of the unit cell, with atoms (nodes) connected by edges to neighboring atoms within a specified radial cutoff r_max, including any periodic images in neighboring unit cells (Fig. 3.2a). The input node features are denoted x_ai, where a is the atom index and i is the feature index, and are represented as arrays of 118 scalars that encode the atom type and mass through an atomic mass-weighted one-hot encoding scheme, shown in Fig. 3.2b. The atomic mass is inserted in the Z-th entry


Fig. 3.2 Overview of the E(3)NN architecture for phonon DoS prediction. (a) Crystals are converted to periodic graphs by considering all periodic neighbors within a radial cutoff rmax = 5 Å. The example of SrTiO3 is shown. (b) Node features are encoded as atomic mass-weighted one-hot arrays of the atom type. (c) Edges join neighboring atoms and are associated with the relative distance vector from the central atom to its neighbor. (d) The radial distance vectors are used to define the continuous convolutional filters W (rab ) comprising learned radial functions and spherical harmonics. (e) The E(3)NN operates on the node and edge features using convolution and gated nonlinear layers. The result is passed to a final activation, aggregation, and normalization layer to generate the predicted output. The network weights are trained by minimizing the loss function between the predicted and ground-truth phonon DoS. Reproduced from [40] under a Creative Commons Attribution License (CC BY)

of the feature vector, where Z is the corresponding atomic number. For instance, a hydrogen atom is encoded as xH = [mH , 0, . . . , 0]. Each edge in the graph carries the radial distance vector between the central atom a and its neighbor b, rab (Fig. 3.2c), which is used to define the continuous convolutional kernels of the E(3)NN (Fig. 3.2d).
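A minimal sketch of this encoding; the handful of atomic masses included here are for illustration only, and a full implementation would look masses up for all 118 elements (e.g., via pymatgen):

```python
import numpy as np

# Illustrative subset of atomic masses keyed by atomic number Z
ATOMIC_MASS = {1: 1.008, 8: 15.999, 22: 47.867, 38: 87.62}

def mass_weighted_one_hot(Z, n_elements=118):
    """Encode an atom as a length-118 array with its mass at index Z-1."""
    x = np.zeros(n_elements)
    x[Z - 1] = ATOMIC_MASS[Z]
    return x

# SrTiO3 node features: one row per atom in the unit cell (Sr, Ti, 3x O)
features = np.stack([mass_weighted_one_hot(Z) for Z in (38, 22, 8, 8, 8)])
```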

3.3.2 Network Operations

To articulate network operations, we use Einstein summation notation with implicit summation over repeated indices. A single layer of our network operates on input node features x_ai and relative distance vectors of graph edges r_ab,

$$ x^{(q+1)}_{ai} = \sigma\left( x^{(q)}_{bj} \otimes K_{ij}(\mathbf{r}_{ab}) \right), \qquad (3.1) $$

where σ is an equivariant nonlinearity, and ⊗ signifies a tensor product where representation indices of inputs and filters are contracted using Clebsch-Gordan coefficients. We used a "gated" rotation equivariant nonlinearity as implemented in e3nn, which was first introduced in [46] and is extended in e3nn to handle parity (inversion). K is the convolutional kernel, which is composed of learned radial functions and spherical harmonics. Clebsch-Gordan coefficients are included in the kernel to yield the traditional channel indices,

$$ K_{ij}(\mathbf{r}_{ab}) = K_{abij} = R_w(r_{ab})\, Y_k(\hat{\mathbf{r}}_{ab})\, C_{ijk}\, \delta_{w, k \in \mathrm{irrep}(w)}, \qquad (3.2) $$

where δ_{w,k∈irrep(w)} indicates that radial functions are shared for all components of a given irreducible representation (irrep), e.g., the 5 components of an L = 2 irrep share the same radial function, and C_ijk are the Clebsch-Gordan coefficients. In tensor notation, a convolutional operation is written as

$$ \mathrm{Conv}(x_{bi}, \mathbf{r}_{ab}) = x_{bj} K_{ij}(\mathbf{r}_{ab}) = y_{ai}. \qquad (3.3) $$

For the prediction of phonon DoS, we use convolutional filters up to L ≤ 1 and rotation order for intermediate features of L ≤ 1. The learned radial functions are dense (fully-connected) neural networks acting on a finite radial basis. For example, a two-layer radial function would be expressed as

$$ R(|\mathbf{r}_{ab}|) = W_{kh}\, \sigma\left( W_{hq} B_q(|\mathbf{r}_{ab}|) \right), \qquad (3.4) $$

where B_q are the set of radial basis functions; in this work, we use a finite set of Gaussian radial basis functions. The full architecture used for phonon DoS prediction is illustrated in Fig. 3.2e. After an initial embedding layer which converts the length-118 node feature vectors to 64 scalar features, the annotated graph is passed to the E(3)NN. The first two convolutional layers generate L = {0, 1} atomic features and additional scalars to be used by following gated blocks for nonlinearizing L = 1 pseudovectors [48]. The final convolution operation yields learned hidden features of order L = 0 on each atom. Finally, the hidden features at all sites within the unit cell are aggregated into a one-dimensional array Σ_{a∈N} x^{(q)}_{ia}. We then apply a ReLU activation and normalize by dividing by the maximum intensity to predict the phonon DoS. The absolute magnitude of the phonon DoS can easily be recovered from the normalized DoS by noticing that ∫g(ω)dω = 3N, where g(ω) is the phonon DoS at frequency ω and N is the number of atoms in the unit cell; thus, we ensure that normalization of the DoS does not compromise meaningful prediction.
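A PyTorch sketch of the radial function of Eq. 3.4, i.e., a small dense network acting on a Gaussian radial basis; the basis width, hidden size, and output multiplicity are illustrative choices rather than the exact e3nn internals.

```python
import torch
import torch.nn as nn

class RadialNet(nn.Module):
    """Two-layer radial function R(|r|) on a Gaussian basis (Eq. 3.4)."""

    def __init__(self, n_basis=10, n_hidden=64, n_out=32, r_max=5.0):
        super().__init__()
        self.register_buffer("centers", torch.linspace(0.0, r_max, n_basis))
        self.width = r_max / n_basis
        self.net = nn.Sequential(
            nn.Linear(n_basis, n_hidden), nn.ReLU(), nn.Linear(n_hidden, n_out)
        )

    def forward(self, r):
        # Gaussian radial basis B_q(r), followed by the learned dense layers
        B = torch.exp(-((r[:, None] - self.centers) / self.width) ** 2)
        return self.net(B)

radial = RadialNet()
weights = radial(torch.tensor([1.2, 2.7, 4.9]))   # one weight vector per edge
```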

3.4 Phonon DoS Prediction

3.4.1 Data Processing

The database of density functional perturbation theory (DFPT)-based phonon DoS [49] of 1521 crystalline solids is used to train the phonon DoS predictor. This database is also made available on the Materials Project [50] website along with the relaxed structures and calculation details. We subdivided the dataset at random into training (80%), validation (10%), and test (10%) sets. We also manually added 3


Fig. 3.3 (a) Representative examples of original calculated and interpolated phonon DoS used as the model output. (b) Loss history during training, showing the mean squared error versus number of epochs. Reproduced from [40] under a Creative Commons Attribution License (CC BY)

additional phonon DoS examples, Au, Cu, and Ag, from [51] in order to provide the network instances of single-element compounds. Cu and Ag examples were added to the training set and Au to the test set. The resulting training set consisted of 1220 examples, and the validation and test sets each had 152 samples. The DFPT-calculated phonon DoS data has high energy resolution, requiring a large number of parameters in the neural network to fit the output dimension. Given limited training data, it is challenging to train a predictive model with too many trainable weights. To ensure a balanced output dimension and resolution while retaining the main features of the phonon DoS, we interpolated a smoothed spectrum in the energy range 0 ≤ ω ≤ 1000 cm−1 to 51 equally-spaced points. Smoothing was achieved by applying a Savitzky-Golay filter of window length 101 and polynomial order 3, which was found to optimally reduce small fluctuations while retaining the main DoS profile. Representative raw and interpolated phonon DoS curves are shown in Fig. 3.3a. The number of interpolated points is chosen to approximately match the instrument resolution of inelastic scattering measurements, below which finer features are not distinguishable by any existing techniques; for instance, the state-of-the-art inelastic scattering instrument at NSLS-II has ∼2.0 meV resolution. The sampled data points correspond to an energy resolution of ∼2.4 meV (20 cm−1 ). It should also be noted that the phonon DoS of some calculated materials in the dataset contain negative frequencies. According to [49], the presence of small negative frequencies in the acoustic phonon modes close to the Γ point could be associated with poor choices of the k or q point grids; however, they find that these only rarely indicate a real incommensurate instability. Since our model is confined to the energy range 0 ≤ ω ≤ 1000 cm−1 , it is intrinsically unaware of the truncated information at negative frequencies and is thus not expected to distinguish between stable and unstable compounds.
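The smoothing and downsampling described above can be reproduced with standard SciPy tools; a sketch, assuming a raw DoS `g_raw` sampled on a dense frequency grid `omega_raw` (both hypothetical variable names):

```python
import numpy as np
from scipy.signal import savgol_filter

def smooth_and_resample(omega_raw, g_raw, n_points=51, omega_max=1000.0):
    """Savitzky-Golay smoothing followed by interpolation onto a coarse grid."""
    g_smooth = savgol_filter(g_raw, window_length=101, polyorder=3)
    omega = np.linspace(0.0, omega_max, n_points)     # ~20 cm^-1 spacing
    g = np.interp(omega, omega_raw, g_smooth)
    g = np.clip(g, 0.0, None)                         # remove negative artifacts
    return omega, g / g.max()                         # normalize to unit maximum
```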


Table 3.1 Hyperparameter search space and selections

Hyperparameter                   | Range                      | Initial selection | Final selection
Multiplicity of irreps^a         | 16, 32, 48, 64             | 64                | 32
Number of gated blocks           | 1, 2, 3, 4                 | 3                 | 2
Number of radial bases           | 5, 10, 15, 20              | 10                | 10
Length of embedding vector       | 16, 32, 64, 128, 160       | 128               | 64
AdamW optimizer learning rate^b  | 5e−4, 1e−3, 5e−3, 1e−2     | 1e−3              | 5e−3 × 0.96^k
AdamW optimizer weight decay     | 1e−3, 1e−2, 5e−2, 1e−1     | 1e−2              | 5e−2

^a For outputs of the first two convolutional layers only
^b We observe the lowest validation loss at a learning rate of 5e−3 among all values tested, and thus adopt an exponentially decaying learning rate with epoch k

3.4.2 Results

The E(3)NN weights are optimized by minimizing the mean squared error (MSE) loss function between the DFPT-computed DoS g(ω) and E(3)NN-predicted ĝ(ω). The loss history is shown in Fig. 3.3b. Table 3.1 lists the hyperparameters of the model that were tested in determining an optimal architecture, in addition to our final selections. We perform several analyses to evaluate our model given the limited training data. Figure 3.4a shows no obvious correlation between the MSE and the number of basis atoms within unit cells among training, validation, and test datasets. The overall test set error is higher compared to the training set but similar to that of the validation set, suggesting good generalizability. We further compute the average phonon frequency ω̄ = ∫dω g(ω)ω / ∫dω g(ω) for both predicted and DFPT ground truth spectra, which show excellent agreement (Fig. 3.4b). To visualize the model performance, we survey seven randomly selected examples from the test set in each error quartile, shown in Fig. 3.4c, with rows 1 through 4 corresponding to the first quartile with highest agreement through the fourth quartile with lowest agreement, respectively. The predicted DoS in the first and second quartiles show excellent agreement with DFPT calculations by reproducing fine features and relative peak heights, while the third and fourth quartiles show acceptable agreement by capturing overall energy ranges and coarse trends in DoS magnitude. The MSE distribution shown in the leftmost subplot of Fig. 3.4c is heavily peaked in the first and second quartiles, indicating overall good agreement for a majority of test set examples. We also compute the average MSE for all test set examples containing each element (Fig. 3.4d) and recompute the quartiles for this set of errors. The spread of MSE values is about half that found in Fig. 3.4c, suggesting relatively balanced performance across different elements. However, somewhat better prediction of heavy-element compounds emerges as a general trend. This may be attributed to the increased significance of the atomic mass, which is a direct input to the network, in the phonon DoS of heavy-element compounds, while the electrostatic effect is more pronounced in bonding involving lighter elements and is not considered in the current node features. Inclusion of additional


atomic features, such as electronegativity, characterizing bond covalency is expected to improve performance in such compounds.

3.4.3 Comparison with Experiment To place our model’s performance in a broader context, we compare the E(3)NN predictions of 16 materials with experimental and computed phonon DoS available in literature (Fig. 3.5a) [52–65]. Where available, we also show the corresponding calculated phonon DoS from the Materials Project (MP) database [49], which represents the current state-of-the-art for high-throughput phonon DoS calculation. Given the disorder and anharmonic effects in measured samples, disagreement between DFPT calculations and measured data are expected. As a result, lower agreement is expected between experimental and predicted DoS, since the groundtruths are based on DFPT calculations. Although the E(3)NN-predicted DoS do not match the fine features of the experimental spectra, several key features,

38

3 Data-Efficient Learning of Materials’ Vibrational Properties

Fig. 3.5 Comparison of ML-based and conventional acquisition of phonon DoS. (a) Comparison between E(3)NN model predictions (orange line), experimental data (blue dots), and calculated phonon DoS (black dashed line) reproduced from literature [52–65]. Where available, the corresponding calculated phonon DoS from the Materials Project (MP) database [49] is also shown (black solid line). (b) Boxplots of the mean squared errors (MSEs) between the experimental data and each of three computational approaches: tailored first principles calculations, first principles calculations from the Materials Project database, and E(3)NN predictions. Orange lines indicate median values

such as peak positions, gaps, and energy bandwidths, are still well-predicted and can be valuable in guiding experimental planning. In Fig. 3.5b, we compare the MSEs between the experimental data and all available computational approaches, including tailored first principles calculations accompanying the data in literature, first principles calculations from the Materials Project, and E(3)NN predictions. Specifically, 14 of the 16 experimental measurements are published alongside corresponding calculations, which are typically more fine-tuned than those in the MP database. Thus, we find the best agreement between the published calculations and experiment, with a median MSE less than 2 × 10−5 . Only 5 of the experimental compounds are found in the MP database; with limited statistics, we find a median MSE of approximately 3 × 10−5 . Interestingly, the median MSE of all 16 E(3)NN predictions is only slightly higher than that of the MP database, which we take as the state-of-the-art for high-throughput calculation. However, the E(3)NN model requires approximately 1 h of training time and is subsequently capable of evaluating thousands of phonon DoS with negligible computational cost, far exceeding the high-throughput capacity of the MP workflow. It is also worth noting that 3 of the 16 examples contain two unseen elements, Nd and U; that is, these elements were not present in any compounds used for training. Nonetheless, the model predictions are plausible and, importantly, capture the right energy bandwidths. A possible explanation of this unexpected generalizability to unseen elements is due to the atomic mass-weighting of the one-hot encoded element type; however, we leave further quantitative estimation of the limits to this generalizability to future work.


Fig. 3.6 High-throughput search for high specific heat capacity (CV) materials. (a) Periodic table colored by average CV of materials containing each element. (b) Histogram showing the distribution of CV evaluated from E(3)NN-predicted phonon DoS. The inset shows the average phonon DoS of materials with the highest CV. (c) Distribution of predicted phonon DoS along the first two principal components, colored by the E(3)NN-predicted CV. The inset shows the first two principal axes in the original frequency basis. (d) Average frequency ω̄ of the predicted phonon DoS versus the average atomic mass m̄ of the corresponding unit cell. The black solid line represents the least-squares fit to the hyperbolic relation ω̄ ∼ m̄−1/2. (e) Comparison between E(3)NN-predicted (orange line) and DFPT-calculated (dashed black line) phonon DoS. (f) Comparison of specific heat capacities evaluated from E(3)NN-predicted and DFPT-calculated phonon DoS for the 12 examples in (e). Reproduced from [40] under a Creative Commons Attribution License (CC BY)

3.4.4 High-CV Materials Discovery

To demonstrate a proposed application of our work, we apply our trained predictive model to conduct a high-throughput search for high heat capacity compounds from a set of 4346 unseen crystal structures without calculated phonon DoS in the Materials Project database [50]. We restrict the materials selection to those with up to 13 atoms in the primitive unit cell, consistent with most materials represented in [49]. As a check, we plot the average phonon frequency ω̄ of the predicted phonon DoS against the average atomic mass $\bar{m} = \left(\frac{1}{N}\sum_{i=1}^{N}\sqrt{m_i}\right)^2$ of atoms in the unit cell (Fig. 3.6d). The predicted data fit well to a hyperbolic curve ω̄ = C m̄−1/2, where the constant C is a measure of the crystal rigidity. The reasonable distribution of rigidity supports the physical validity of our model's predictions on unseen materials. We


then directly compute the phononic specific heat capacity (CV) using the predicted phonon DoS according to the relation [66],

$$C_V(T) = \frac{k_B}{m_{\mathrm{tot}}} \int_0^{\infty} \left( \frac{\hbar\omega}{2 k_B T} \right)^2 \mathrm{csch}^2\!\left( \frac{\hbar\omega}{2 k_B T} \right) g(\omega)\, d\omega, \qquad (3.5)$$

where mtot is the total mass of all N atoms in the unit cell, and the phonon DoS is normalized such that $\int g(\omega)\, d\omega = 3N$. Figure 3.6a illustrates the average CV of crystalline solids containing a given element. As expected, materials containing light elements tend to have higher heat capacities. The distribution of CV evaluated from Eq. 3.5 is shown in Fig. 3.6b, where ∼2% of materials show a CV greater than 1000 J/(kg · K). The inset shows the average phonon DoS of the highest-CV materials. Materials with higher CV appear to have high spectral weight at higher energies, consistent with expectation. This trend is also evident in the scatter plot of phonon DoS along the first two principal components (Fig. 3.6c), where high heat capacity materials appear clustered with respect to the first principal axis. The first principal axis has a broad negative peak extending to high energies; thus, the clustering of high-CV materials in the negative first principal direction parallels the shift of the phonon DoS towards higher energies. To validate our model's predictions of high-CV materials, we select 12 materials with ultrahigh predicted CV and carry out independent DFPT calculations. Since the maximum frequency of the training data was set to 1000 cm−1, which is sufficient for the majority of materials, the CV evaluated by DFPT was also truncated at 1000 cm−1 for fair comparison. The phonon densities of states obtained by E(3)NN and DFPT in these high-CV materials are shown in Fig. 3.6e, where satisfactory agreement is achieved in most examples, with the exception of H4NF and HCN. We attribute the large discrepancy in hydrogen- and lithium-rich materials to the electrostatic effect of hydrogen and lithium bonding [67], which may dominate the mass effect that is primarily considered in the current model. This further supports the inclusion of additional atomic features such as electronegativity to improve performance in light-element compounds. The CV values at room temperature, T = 293.15 K, evaluated from E(3)NN predictions and DFPT calculations are plotted in Fig. 3.6f, showing excellent agreement for most materials. Values of the plotted CV are also presented in Table 3.2. The largest discrepancies are found in the results for MgH2, H4NF, and HCN, and are partially induced by significant contributions to the DoS at higher energies outside the range considered for training (0–1000 cm−1), shown in Fig. 3.7. A potential remedy is to expand the energy range of the training data; however, without increasing the network dimension, this would decrease the energy resolution while benefiting only a small fraction of materials. The good agreement between E(3)NN predictions and DFPT ground-truth spectra achieved in the energy range chosen for this work sufficiently demonstrates the potential generalizability of our model.
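As a concrete illustration, Eq. 3.5 can be evaluated numerically from a tabulated DoS. The following is a minimal sketch, assuming the DoS is given on a wavenumber grid in cm−1; the function name and the use of scipy's physical constants are our own choices, not part of the published implementation:

```python
import numpy as np
from scipy.constants import h, c, k, u  # Planck, speed of light, Boltzmann, amu

def specific_heat(omega, g, masses_amu, T=293.15):
    """Numerically evaluate Eq. 3.5 for the phononic specific heat.

    omega      : wavenumber grid in cm^-1 (e.g., 0-1000 cm^-1)
    g          : phonon DoS on that grid, normalized so its integral is 3N
    masses_amu : atomic masses (amu) of the N atoms in the unit cell
    """
    E = h * c * 1e2 * omega                      # convert cm^-1 to energy in J
    x = np.clip(E / (2.0 * k * T), 1e-12, None)  # hbar*omega / 2kT, avoiding 0/0
    # x^2 * csch^2(x) = (x / sinh x)^2, which tends to 1 as x -> 0
    integrand = (x / np.sinh(x)) ** 2 * g
    m_tot = np.sum(masses_amu) * u               # total unit-cell mass in kg
    return (k / m_tot) * np.trapz(integrand, omega)  # J/(kg K)
```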


Fig. 3.7 Full phonon DoS of the three materials with the largest discrepancies (MgH2, H4NF, HCN), shown over the frequency range 0–4000 cm−1

Table 3.2 Comparisons of predicted and DFPT-calculated CV (J/(kg · K))

Material   CV (E(3)NN)   CV (DFPT)
Li4HN      2514.9        2320.2
H4NF       2396.6        3117.4
LiC        2101.0        2138.8
MgH2       1897.8        1908.5
HCN        1871.5        2047.2
Li9Al4     1540.3        1642.1
Li9S3N     1508.1        1498.3
LiBeBO3    1318.1        1286.1
LiBO2      1310.2        1172.9
Li3Al2     1307.4        1431.2
MgC2       1288.1        1307.8
LiMgN      1246.5        1216.2

3.4.5 Partial Phonon Density of States

Although the model described above was trained only to predict the total phonon DoS of pure phases and equilibrium structures, here we evaluate the model's potential to inform certain applications that go beyond these outcomes. To better understand the performance of the trained model, it is instructive to examine not just the output, but also the intermediate features. Specifically, prior to the aggregation step, the network has learned a set of site-specific feature vectors, one for each node in the crystal graph. These can be recovered by simply omitting the aggregation step in the trained network (Fig. 3.8a). These site-specific feature vectors may be regarded as fingerprints of the atomic species subject to the surrounding environment, which together generate the total phonon DoS. Similarly, the projected or partial phonon DoS is the relative contribution of a particular atom to the total DoS. Thus, it is sensible to compare the two quantities in search of interpretability or physicality behind the learned features. In Fig. 3.8b, we survey 6 randomly chosen compounds, plotting the site-specific feature vectors averaged over atoms of the same type against the corresponding partial phonon DoS obtained from DFPT. Despite the fact that the E(3)NN model was not explicitly trained using partial DoS information, these disaggregated features bear a striking resemblance to the partial phonon DoS for a number of atoms. It should also be noted that apart from Au, Cu, and Ag, no other single-atom examples were present in the training data, yet the element-specific features mimic the partial DoS for a broad range of elements


Fig. 3.8 Additional applications of the phonon DoS predictor. (a) Schematic illustration of the network architecture which outputs disaggregated predictions on each site in the unit cell, used in inspecting the learned feature vectors on each node. (b) Comparison between the site-specific feature vectors averaged over atoms of the same type (orange lines), and the corresponding partial phonon DoS obtained from DFPT (black lines) for 6 randomly selected compounds. The leftmost column shows the predicted (blue lines) and calculated (black lines) total phonon DoS. (c) Comparison between predicted (black lines) and calculated phonon DoS for the alloy Mg3Sb2(1−p)Bi2p. Blue lines indicate DFPT-calculated results for Mg3Sb2 and Mg3Bi2 curated from [49] and [68], respectively. The orange line represents the VCA calculation for Mg3Sb0.5Bi1.5 (p = 0.75). The inset shows the direct comparison of the E(3)NN and VCA results on Mg3Sb0.5Bi1.5. (d) Predicted (orange lines) and DFPT-calculated (blue dashed lines) phonon DoS of biaxially-strained SrTiO3 under compression (upper), no strain (middle), and tension (lower). Panels (c) and (d) are reproduced from [40] under a Creative Commons Attribution License (CC BY)

in the examples shown. While the agreement is certainly worse than that between the predicted and calculated total DoS, these results are a valuable indication of the model’s physicality and may further be used to correlate desired vibrational properties to the specific environments in which atoms are found.

3.4.6 Alloys and Strained Compounds

A natural extension of the current model is the prediction of phonon DoS for alloyed systems, specifically crystalline alloys with substitutional disorder, which can be obtained with no additional computational cost using the existing implementation.


For example, given a binary alloy with composition $A_p B_{1-p}$ (0 ≤ p ≤ 1), the input feature vector can take the form of a two-hot encoded vector,

$$x_{\mathrm{alloy}} = [0, \ldots, p\, m_A, \ldots, (1-p)\, m_B, \ldots, 0], \qquad (3.6)$$

where pmA and (1 − p)mB are positioned at the vector indices corresponding to the atomic numbers of A and B, respectively, and weighted by composition. With this definition, Eq. 3.6 can be directly reduced to the pure-phase, one-hot encoded A (B) by simply setting p = 1 (p = 0), or it can be generalized to more complicated alloys by introducing more non-zero terms. In fact, Eq. 3.6 is inherently similar to the virtual crystal approximation (VCA) formulation, which defines an atomic mass average mVCA = pmA + (1 − p)mB and scattering potential average VVCA = pVA + (1 − p)VB. However, in the case of Eq. 3.6, the effect of the atomic mass is contained in the numerical values of pmA and (1 − p)mB, while the effect of the potential is encoded by turning on the vector indices that correspond to the atomic species A and B. Since the computation of pure phases and alloys differs only in the method of encoding the input features, the alloy calculation does not generate any additional computational cost. We demonstrate the power of this approach with the alloy Mg3Sb2(1−p)Bi2p, with p ∈ [0, 1]. The model evaluation is done by the aforementioned two-hot encoding scheme while simultaneously interpolating the lattice constants between those of the two pure phases (namely, Mg3Bi2 for p = 1 and Mg3Sb2 for p = 0). In this case, both the input vectors and structures of the pure phases are recovered when p = 0 or 1. We use our trained model to predict the phonon DoS in the alloy Mg3Sb0.5Bi1.5 and compare it with independent VCA calculations, shown in Fig. 3.8c, and observe excellent agreement. The E(3)NN model used to evaluate this alloy system is identical to that described earlier with the addition of Mg3Bi2 [68] in the training set; Mg3Sb2 had already been included in the original training set. While the E(3)NN-based phonon DoS predictor has demonstrated success on unseen elements and alloys, it faces a limitation when applied to strained materials. In Fig. 3.8d, we compare the neural network predictions with DFPT calculations for SrTiO3 under compressive, tensile, and no strain. In all cases, the phonon DoS can be evaluated from the corresponding supercells within seconds by the trained E(3)NN neural network. While the DFPT-computed results show clear evolution of the phonon DoS features with strain, such as symmetry breaking-induced peak splitting near 300 and 600 cm−1, these features are not reproduced in the E(3)NN predictions. We attribute this discrepancy to the current model's lack of knowledge about equilibrium bond lengths. Annotating the crystal graph with edge attributes that reflect this property may be one avenue for improving performance in this context, in addition to providing strain-dependent training data.
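A minimal sketch of the two-hot encoding of Eq. 3.6 is given below. The vector length (one slot per element) and the atomic-number indexing convention are illustrative assumptions rather than details fixed by the text:

```python
import numpy as np

def two_hot_input(species_Z, fractions, masses, n_elem=118):
    """Mass-weighted 'two-hot' input vector of Eq. 3.6.

    species_Z : atomic numbers of the (possibly alloyed) site occupants
    fractions : occupancies p_i of each species, summing to 1
    masses    : atomic masses m_i of the corresponding species
    """
    x = np.zeros(n_elem)
    for Z, p, m in zip(species_Z, fractions, masses):
        x[Z - 1] = p * m   # p*m_A at the index of A, (1-p)*m_B at the index of B
    return x

# Mixed Sb/Bi site in Mg3Sb0.5Bi1.5 (Bi fraction p = 0.75); setting a
# fraction to 1 recovers the one-hot encoding of the pure phase.
x_site = two_hot_input([51, 83], [0.25, 0.75], [121.76, 208.98])
```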


3.5 Unsupervised Representation Learning of Vibrational Spectra

In addition to the representation of crystalline data types, machine learning of materials data can depend strongly on the effective representation of spectroscopic signatures. Since the phonon DoS spectrum is a momentum-integrated quantity, a somewhat coarse-grained representation such as the one implemented in this chapter is a sufficiently informative prediction target. However, momentum-resolved spectroscopic signatures often contain more detailed and sparse features, requiring an additional set of design rules for integration into machine learning workflows. As a prototypical example, we consider Raman spectroscopy, which captures materials' vibrational properties in the form of highly resolved fingerprints with characteristic peaks, providing diagnostic evidence of their chemical and structural properties. The Raman scattering intensity is typically expressed as a function of the Raman shift in wavenumbers (cm−1), which is measurable from approximately 10 up to 4000 cm−1 with a spectral resolution on the order of 1 cm−1. Thus, a single Raman spectrum can comprise thousands of data points, a relatively high-dimensional data format for analysis. As a result, connecting Raman spectra to underlying structural and chemical attributes can be nontrivial, and first principles calculations typically incur a high computational cost. While integration of machine learning methods in traditional computational workflows promises to accelerate scientific calculations, accurate prediction of high-dimensional targets, e.g., Raman spectra, from comparatively low-dimensional inputs, such as structural and atomic properties, remains challenging. However, since Raman scattering events are typically the result of phonon creation, Raman lines (peaks) tend to coincide with a finite number of Raman-active phonon modes with characteristic shifts and linewidths. As a result, these spectra are often considerably sparse, and their informative features are captured effectively by a suitable low-dimensional representation. In particular, dimensionality reduction (DR) techniques such as principal component analysis (PCA) are commonly used in Raman spectral analysis in order to obtain a compressed (latent) representation that facilitates analysis or interpretation, which can be useful to detect outliers or otherwise discriminate between complex spectra [69, 70]. More recently, data-driven workflows have been proposed to automate, accelerate, and improve Raman spectral analysis through denoising and reconstruction of low signal-to-noise ratio Raman signatures [71] and prediction or classification of a broad range of associated properties [72–77]. Machine learning methods have also been used to predict the surface-enhanced Raman spectroscopy signals of different molecular conformations [78]. However, direct prediction of Raman spectra from fundamental descriptors of chemically and structurally diverse materials remains an outstanding challenge. In this section, we show that unsupervised machine learning can be used to develop effective and interpretable representations of Raman spectra. We do so by optimizing a variational autoencoder (VAE) using an objective function inspired by optimal transport to obtain low-dimensional (latent) representations of Raman spectra. The VAE-based


approach is evaluated against established unsupervised dimensionality reduction techniques in terms of reconstruction quality and interpretability of the learned representations. The latent representation is proposed for a number of relevant applications, such as improved evaluation of spectral similarity, widely used in spectral matching for substance identification [79, 80], or as an optimization space for associated materials’ properties. The latent features are further envisioned as effective prediction targets for machine learning models to subsequently recover Raman spectra from materials’ structural and chemical attributes. We apply our framework to ab initio Raman spectra of 2D materials with varying complexity [81].
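For orientation, a linear DR baseline of the kind referenced above can be set up in a few lines. In the sketch below, the input file name and the use of scikit-learn are hypothetical, and the latent dimension of 26 anticipates the setting adopted later in this section:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical array of area-normalized Raman spectra, one spectrum per
# row, discretized at 1 cm^-1 resolution (feature dimension ~1700).
spectra = np.load("raman_spectra.npy")

pca = PCA(n_components=26)                     # latent dimension used below
latent = pca.fit_transform(spectra)            # compressed representation
reconstructed = pca.inverse_transform(latent)  # linear reconstruction
```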

3.5.1 Dimensionality Reduction

The variational autoencoder (VAE) is an unsupervised deep generative model consisting of an encoder that reduces the dimensionality of the input and a decoder that reconstructs the input from the latent features [82]. Given an input observation x with m features, the encoder network Eθ outputs parameters to a probability density, qθ(z | x), from which the latent features z are sampled. Here, θ denotes a set of trainable weights parameterizing the encoder network. The decoder network, Dφ, parameterized by a set of weights φ, then outputs the parameters to the probability distribution of the data, pφ(x | z), using the sampled latent features z. The encoder and decoder network weights are optimized by minimizing the loss function,

$$\mathcal{L} = -\mathbb{E}_{z \sim q_\theta(z|x)}\left[\log p_\phi(x \mid z)\right] + \beta\, D_{KL}\big(q_\theta(z \mid x)\,\|\,p(z)\big). \qquad (3.7)$$

The first term represents the reconstruction loss, or the expected negative log-likelihood of x given z. The second term regularizes the distribution of latent features. In particular, DKL denotes the Kullback-Leibler (KL) divergence, computed between the approximate posterior distribution qθ(z | x) of the latent features and the prior distribution p(z). Due to convenient properties of the Gaussian distribution, such as an analytical form of the KL divergence, the prior distribution is typically modeled as a unit Gaussian, p(z) ∼ N(0, 1), and the approximate posterior qθ(z | x) as a Gaussian with mean and variance estimated by the encoder. This choice also facilitates efficient gradient computation using the reparameterization trick. In Eq. 3.7, β represents a hyperparameter regulating the degree of entanglement between the learned latent channels [83]. Making an explicit choice of the likelihood pφ(x | z) in Eq. 3.7 specifies the reconstruction loss function between the target x and prediction x̂ = Dφ(z); for example, for pφ(x | z) ∼ N(μ, σ2), minimizing the negative log-likelihood of x with respect to θ is equivalent to minimizing the mean squared error (the square of the L2 loss or Euclidean distance) between x and x̂. However, we can expect a sufficiently complex, non-linear decoder to map the latent features to an arbitrarily complicated distribution of observations pφ(x | z). In particular, while the Euclidean distance can be used to quantify the difference between two spectra,


Fig. 3.9 Method overview. (a) Joint probability density of Raman peak shifts and amplitudes in the 2D materials database, used to construct the synthetic spectra. White crosses indicate the sampled points for an example spectrum, illustrated at the bottom. (b) Schematic illustration of dimensionality reduction by a VAE with Wasserstein-1 loss

it is also highly sensitive to even slight shifts in the wavenumber of an individual peak. This makes similarity metrics that instead rely on the local integrated density, such as the Wasserstein-1 (W1) or Earth Mover's distance, a more suitable choice when comparing spectra with sparse features [84]. The W1 metric computes the distance between two probability distributions as the L1 loss between their cumulative distribution functions (CDFs). Thus, by treating the area-normalized Raman spectrum x as a probability density function (PDF), we can train the VAE to predict the corresponding CDF, X, using the W1 metric as the reconstruction loss between the target X and predicted X̂. This choice of reconstruction loss corresponds to a zero-mean Laplace distribution for the likelihood, $p_\phi(X \mid z) \propto \exp(-\lambda \|X - D_\phi(z)\|_1)$. Figure 3.9b illustrates the proposed VAE-based DR scheme for Raman spectra, which employs a softmax-activated output layer to ensure that the predicted PDF x̂ and CDF X̂ are properly normalized before computing the loss.
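The sketch below illustrates how the W1 reconstruction term and the KL regularizer of Eq. 3.7 might be combined in practice. PyTorch is assumed; the β value and the uniform wavenumber spacing dq are illustrative choices, not values fixed by the text:

```python
import torch

def reparameterize(mu, log_var):
    """Differentiable sampling z = mu + sigma * eps (reparameterization trick)."""
    return mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)

def w1_vae_loss(x, x_hat, mu, log_var, beta=1e-3, dq=1.0):
    """Eq. 3.7 with a W1 reconstruction term: the spectra (PDFs) are
    converted to CDFs and compared with an L1 loss."""
    X = torch.cumsum(x, dim=-1) * dq          # target CDF
    X_hat = torch.cumsum(x_hat, dim=-1) * dq  # predicted CDF (x_hat softmax-normalized)
    w1 = (torch.abs(X - X_hat).sum(dim=-1) * dq).mean()
    # Analytical KL divergence between N(mu, sigma^2) and the unit Gaussian prior
    kl = -0.5 * torch.mean(1 + log_var - mu.pow(2) - log_var.exp())
    return w1 + beta * kl
```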

3.5.2 Data Processing Methods

In this work, we assess the reconstruction quality and interpretability of the VAE in comparison to the traditional linear and kernel-based unsupervised dimensionality reduction techniques surveyed in Chap. 2, in the context of computed Raman spectral examples obtained from the library of ab initio Raman spectra of 2D materials [81]. The computational database contains calculated Raman tensors and corresponding phonon frequencies of 733 monolayers, of which 717 were used in this study. The excluded examples contained Raman peaks above 3000 cm−1, while the selected 717 spectra had all Raman peaks below 1700 cm−1, allowing us to limit the input and output feature dimensions to ∼1700 by discretizing the spectra with a resolution of 1 cm−1. The resulting dataset contains spectra from compounds with diverse chemistry and up to 28 peaks over the selected range. To generate training, validation, and testing data for each DR technique, we constructed 10,000 synthetic spectra modeled after the computational dataset. First, a joint probability density of peak positions and intensities (amplitudes) was estimated using the peak statistics of the dataset. To generate each synthetic spectrum, the total


number of peaks N was sampled at random from the distribution of peak numbers in the dataset. The corresponding joint probability density was then sampled N times to obtain the position and amplitude of each peak, shown schematically in Fig. 3.9a. Each peak lineshape was modeled as a Lorentz distribution parameterized by the sampled position q0(i) and amplitude x0(i), as well as a scale parameter γ(i), and aggregated to produce the synthetic spectrum x(q),

$$x(q) = \sum_{i=1}^{N} \frac{x_0^{(i)}}{\pi \gamma^{(i)}} \left[ 1 + \left( \frac{q - q_0^{(i)}}{\gamma^{(i)}} \right)^2 \right]^{-1}, \qquad (3.8)$$

evaluated over a predefined range of wavenumbers q. Each γ(i) was set uniformly to 10 cm−1, which approximates the half-width at half-maximum of each peak. Each synthetic spectrum x(q) was normalized to have an area of 1 over the selected range of q values, and the numerical CDF X(q) was then computed over the same range. The 10,000 synthetic spectra were divided among training, validation, and testing sets using an 80/10/10% split.
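A minimal generator for such synthetic spectra, following Eq. 3.8, might look as follows. Here the peak positions and amplitudes are drawn uniformly as placeholders for the estimated joint density described above:

```python
import numpy as np

rng = np.random.default_rng(0)
q = np.arange(0.0, 1700.0, 1.0)           # 1 cm^-1 resolution, as in the dataset

def synthetic_spectrum(q, q0, x0, gamma=10.0):
    """Sum of Lorentzian lineshapes (Eq. 3.8), area-normalized,
    returning both the PDF and the numerical CDF."""
    x = np.zeros_like(q)
    for qi, xi in zip(q0, x0):
        x += xi / (np.pi * gamma) / (1.0 + ((q - qi) / gamma) ** 2)
    x /= np.trapz(x, q)                   # normalize PDF to unit area
    X = np.cumsum(x) * (q[1] - q[0])      # numerical CDF
    return x, X

# Up to 28 peaks per spectrum, matching the statistics quoted above.
n_peaks = rng.integers(1, 29)
x, X = synthetic_spectrum(q, rng.uniform(50, 1650, n_peaks),
                          rng.uniform(0.1, 1.0, n_peaks))
```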

3.5.3 Results

We evaluate the reconstruction fidelity and interpretability of latent representations obtained through PCA, kPCA, SVD, NMF, and VAE DR methods. In each case, the reconstruction quality is evaluated using six different similarity metrics defined in Table 3.3: L1, L2, cosine similarity, Wasserstein-1 distance (W1), energy distance, and Pearson correlation coefficient (PCC). The DR methods were trained with latent dimensions of 26, selected by balancing the minimization of both the loss and the size of the latent space.

Table 3.3 Similarity metrics used to assess reconstruction fidelity

Similarity metric: Mathematical expression
L1: $\sum_i |x_i - y_i|$
L2: $\sqrt{\sum_i |x_i - y_i|^2}$
Cosine similarity: $\sum_i x_i y_i \,/\, \big(\sqrt{\sum_i x_i^2}\,\sqrt{\sum_i y_i^2}\big)$
W1 (a): $\sum_i |X_i - Y_i|$
Energy (a): $\sqrt{2 \sum_i (X_i - Y_i)^2}$
Pearson correlation coefficient (b): $\sum_i (x_i - \mu_x)(y_i - \mu_y) \,/\, \big(\sqrt{\sum_i (x_i - \mu_x)^2}\,\sqrt{\sum_i (y_i - \mu_y)^2}\big)$

(a) X and Y are the CDFs of x and y, respectively. (b) μx and μy are the mean values of x and y, respectively.
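For reference, the Table 3.3 metrics can be evaluated with standard numerical tools; the sketch below leans on scipy's built-in distribution distances, treating each spectrum as weights over the wavenumber grid:

```python
import numpy as np
from scipy.stats import pearsonr, wasserstein_distance, energy_distance

def similarity_report(q, x, y):
    """Evaluate the Table 3.3 metrics for two area-normalized spectra
    x and y sampled on the common wavenumber grid q."""
    return {
        "L1": np.sum(np.abs(x - y)),
        "L2": np.sqrt(np.sum((x - y) ** 2)),
        "cosine": np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)),
        # scipy treats the spectra as weights of distributions over q
        "W1": wasserstein_distance(q, q, u_weights=x, v_weights=y),
        "energy": energy_distance(q, q, u_weights=x, v_weights=y),
        "PCC": pearsonr(x, y)[0],
    }
```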


Fig. 3.10 Reconstruction quality for 2D materials dataset. (a) Map of average reconstruction errors (rows) between the true synthetic and predicted spectra obtained by each DR method (columns). See main text for a description of each similarity metric used to evaluate the reconstruction error. (b) Example spectra reconstructed by each DR method from the corresponding latent representation. (c) Reconstruction error map and (d) example reconstructions of each DR method operating on the original Raman spectra of the 2D materials database

Figure 3.10a and c shows the reconstruction quality of each DR method evaluated on the synthetic test dataset and original Raman spectra, respectively, in terms of the six similarity metrics. kPCA is observed to be the best-performing model in terms of the L1, L2, cosine similarity, and PCC metrics, while the VAE performs best according to the W1 and energy distance metrics. Four representative examples of spectra and their reconstructions by each


Fig. 3.11 Latent representations of synthetic test data based on 2D materials dataset. (a) Projection of the latent representation along the two most significant latent dimensions, colored by the value X∗ = X(q ∗ ). (b) Correlation between true values of X ∗ and those predicted by a linear regressor operating on the full set of 26 latent features. (c) Same as (b) but operating on the two most significant latent features. See main text for specific treatment of NMF

DR method are shown in Fig. 3.10b and d for the synthetic and original spectra, respectively. It is generally observed that the traditional methods can recover finer spectral peaks that are occasionally washed out in the VAE reconstruction. However, these methods also appear to suffer from ringing artifacts that often resemble peak structures, particularly in the case of the PCA and SVD approaches. This suggests that prioritizing the integrated density-based metrics such as W1 and energy distance, in terms of which PCA and SVD tend to be worse-performing, is a more effective choice for evaluating reconstruction quality in the context of Raman-like signatures with sparse, peaked features. The latent axes produced by PCA, kPCA, and SVD methods are typically sorted in decreasing order of the corresponding singular value. The first latent axis reflects the most variance in the data, and the last axis the least. Because NMF is a purely additive approach, there is no explicit indication of the relative importance of different features in the order of latent axes. For the VAE, the latent features likewise do not appear in any particular order; however, a useful order is obtained by sorting the axes in decreasing order of the variance in the predicted mean value of z. In Fig. 3.11a, we show a two-component latent representation of the synthetic test data obtained by each DR method. With the exception of NMF, these two components represent the most significant latent features. In the case of NMF, this latent representation is obtained by re-optimizing the NMF model using only two latent dimensions. Each point is colored according to the value X∗ = X(q∗), where q∗ corresponds to the wavenumber at which X exhibits the maximal variance.


Fig. 3.12 Latent representations of original Raman spectra from 2D materials dataset. (a) Projection of the latent representation along the two most significant latent dimensions, colored by the value X∗ = X(q ∗ ). (b) Correlation between true values of X ∗ and those predicted by a linear regressor operating on the full set of 26 latent features. (c) Same as (b) but operating on the two most significant latent features. See main text for specific treatment of NMF

This quantity is a useful estimate of the smoothness and separability of the latent space according to meaningful trends in the underlying spectra. We quantify the separability by performing linear regression to recover the values X∗ from the full (26-component) and reduced (2-component) latent representations, shown in Fig. 3.11b and c, respectively. While utilization of the full set of latent features results in near-perfect recovery with R2 values at or above 0.99, reliance on just two components shows a marked decrease in performance for the PCA and kPCA methods, with the VAE performing best. This discrepancy is more apparent when the trained DR models are applied to the original computational spectra, as shown in Fig. 3.12. These results indicate an advantage of the SVD, NMF, and VAE approaches over centered methods like PCA and kPCA in separating the actions of specific spectral transformations along distinct latent channels. As a more realistic case study, we also consider the separability of the latent representations obtained for the original computational Raman spectra according to physical attributes of the underlying crystal structures. Specifically, in Fig. 3.13a we plot the same latent representations shown in Fig. 3.12a, now colored according to the average value of atomic radii in the unit cell, μAtomicRadius. In this case, the ability of a linear regressor to recover μAtomicRadius from the full set of latent features ranges from R2 values of 0.68 for the PCA and kPCA methods (worst) to 0.78 for the VAE approach (best). When considering only a two-component representation, the SVD, NMF, and VAE methods tend to perform comparably, while the performance of the PCA and kPCA methods is substantially poorer. This suggests that the VAE approach is a strong candidate for


Fig. 3.13 Interpreting the latent representations of original Raman spectra from 2D materials dataset. (a) Projection of the latent representation along the two most significant latent dimensions, colored by the average value of atomic radii in the unit cell, μAtomicRadius . (b) Correlation between true values of μAtomicRadius and those predicted by a linear regressor operating on the full set of 26 latent features. (c) Same as (b) but operating on the two most significant latent features. See main text for specific treatment of NMF

learning low-dimensional spectral representations with meaningful latent channels without explicit conditioning on physical attributes.

3.6 Conclusion

In this chapter, we first presented a machine learning model to directly predict phonon DoS using only accessible structural and atomic features, namely atomic positions, species, and masses. Due to their equivariance to Euclidean transformations, Euclidean neural networks are able to capture the symmetries of the input crystal structures, making them both physically informative and data-efficient. A small training set of only 1200 examples is sufficient to generate meaningful predictions without extensive data augmentation. Importantly, our method of input feature embedding extends naturally to alloy systems, which can be predicted with the same computational cost as pure phases. One limitation of the current model is identified in examples with elastic strain, for which we propose some directions for future work. Even so, Euclidean neural networks can be applied to predict broader properties in crystalline solids, where issues of data scarcity are common. Second, we offered a perspective on unsupervised representation learning of materials' vibrational signatures, namely Raman spectra, which typically exhibit


fine yet sparse features compared to momentum-integrated measurements. The VAE-based DR technique proposed in this chapter is envisioned for a number of potential applications which are the subject of ongoing work, including the prediction of Raman spectra from materials' structural and chemical attributes, and improved evaluation of spectral similarity in the context of spectral matching for substance identification. In contrast to ab initio calculations and inelastic scattering experiments that acquire vibrational spectra deterministically, a machine learning model is data-driven and probabilistic in nature. It is thus impractical to fully rely on such a model to acquire materials properties without further validation. However, the power of the ML approach goes far beyond obtaining property values at the individual material level. From an experimental perspective, the ML model can inform experimental planning to optimize the use of limited national facility resources. From a materials design perspective, ML demonstrates extremely high efficiency in rapidly screening candidates with a target property. The Dulong-Petit law poses a grand challenge in the search for promising thermal storage materials [85], and a highly efficient approach can further enable inverse design from target properties to candidate structures. Our model provides a promising framework to facilitate high-throughput screening and guide experimental planning for materials with exceptional thermal properties, elucidating the fundamental links between symmetry, structure, and elementary excitations in condensed matter.

References

1. Giustino, F. (2017). Electron-phonon interactions from first principles. Reviews of Modern Physics, 89, 015003.
2. Zhang, J., et al. (2015). Molecular dynamics study of interfacial thermal transport between silicene and substrates. Physical Chemistry Chemical Physics, 17, 23704–23710.
3. Wu, Y. J., Fang, L., & Xu, Y. (2019). Predicting interfacial thermal resistance by machine learning. npj Computational Materials, 5, 56.
4. Bardeen, J., Cooper, L. N., & Schrieffer, J. R. (1957). Theory of superconductivity. Physical Review, 108, 1175.
5. Baroni, S., De Gironcoli, S., Dal Corso, A., & Giannozzi, P. (2001). Phonons and related crystal properties from density-functional perturbation theory. Reviews of Modern Physics, 73, 515.
6. Seyf, H. R., et al. (2017). Rethinking phonons: The issue of disorder. npj Computational Materials, 3, 49. ISSN: 2057-3960.
7. Ramprasad, R., Batra, R., Pilania, G., Mannodi-Kanakkithodi, A., & Kim, C. (2017). Machine learning in materials informatics: recent applications and prospects. npj Computational Materials, 3, 1–13.
8. Schmidt, J., Marques, M. R., Botti, S., & Marques, M. A. (2019). Recent advances and applications of machine learning in solid-state materials science. npj Computational Materials, 5, 1–36.
9. Raccuglia, P., et al. (2016). Machine-learning-assisted materials discovery using failed experiments. Nature, 533, 73–76.
10. Oliynyk, A. O., et al. (2016). High-throughput machine-learning-driven synthesis of full-Heusler compounds. Chemistry of Materials, 28, 7324–7331.


11. Gómez-Bombarelli, R., et al. (2018). Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science, 4, 268–276.
12. Liu, Y., Zhao, T., Ju, W., & Shi, S. (2017). Materials discovery and design using machine learning. Journal of Materiomics, 3. High-throughput Experimental and Modeling Research toward Advanced Batteries, 159–177. ISSN: 2352-8478.
13. Häse, F., Roch, L. M., & Aspuru-Guzik, A. (2018). Chimera: enabling hierarchy based multi-objective optimization for self-driving laboratories. Chemical Science, 9, 7642–7655.
14. Granda, J. M., Donina, L., Dragone, V., Long, D. L., & Cronin, L. (2018). Controlling an organic synthesis robot with machine learning to search for new reactivity. Nature, 559, 377–381. ISSN: 1476-4687.
15. Pilania, G., Wang, C., Jiang, X., Rajasekaran, S., & Ramprasad, R. (2013). Accelerating materials property predictions using machine learning. Scientific Reports, 3, 1–6.
16. Isayev, O., et al. (2017). Universal fragment descriptors for predicting properties of inorganic crystals. Nature Communications, 8, 1–12.
17. Xie, T., & Grossman, J. C. (2018). Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Physical Review Letters, 120, 145301.
18. Chen, C., Ye, W., Zuo, Y., Zheng, C., & Ong, S. P. (2019). Graph networks as a universal machine learning framework for molecules and crystals. Chemistry of Materials, 31, 3564–3572.
19. Carrete, J., Li, W., Mingo, N., Wang, S., & Curtarolo, S. (2014). Finding unprecedentedly low-thermal-conductivity half-Heusler semiconductors via high-throughput materials modeling. Physical Review X, 4, 011019.
20. Van Roekeghem, A., Carrete, J., Oses, C., Curtarolo, S., & Mingo, N. (2016). High-throughput computation of thermal conductivity of high-temperature solid phases: the case of oxide and fluoride perovskites. Physical Review X, 6, 041061.
21. Tawfik, S. A., Isayev, O., Spencer, M. J., & Winkler, D. A. (2020). Predicting thermal properties of crystals using machine learning. Advanced Theory and Simulations, 3, 1900208.
22. Mortazavi, B., et al. (2020). Machine-learning interatomic potentials enable first-principles multiscale modeling of lattice thermal conductivity in graphene/borophene heterostructures. Materials Horizons, 7, 2359.
23. Ward, L., Agrawal, A., Choudhary, A., & Wolverton, C. (2016). A general-purpose machine learning framework for predicting properties of inorganic materials. npj Computational Materials, 2, 1–7.
24. Zhuo, Y., Mansouri Tehrani, A., & Brgoch, J. (2018). Predicting the band gaps of inorganic solids by machine learning. The Journal of Physical Chemistry Letters, 9, 1668–1673.
25. Dong, Y., et al. (2019). Bandgap prediction by deep learning in configurationally hybridized graphene and boron nitride. npj Computational Materials, 5, 1–8.
26. Meredig, B., et al. (2018). Can machine learning identify the next high-temperature superconductor? Examining extrapolation performance for materials discovery. Molecular Systems Design & Engineering, 3, 819–825.
27. Stanev, V., et al. (2018). Machine learning modeling of superconducting critical temperature. npj Computational Materials, 4, 1–14.
28. Andrejevic, N., Andrejevic, J., Rycroft, C. H., & Li, M. (2020). Machine learning spectral indicators of topology. Preprint. arXiv:2003.00994.
29. Claussen, N., Bernevig, B. A., & Regnault, N. (2020). Detection of topological materials with machine learning. Physical Review B, 101, 245117.
30. Scheurer, M. S., & Slager, R. J. (2020). Unsupervised machine learning and band topology. Physical Review Letters, 124, 226401.
31. Li, Z., Kermode, J. R., & De Vita, A. (2015). Molecular dynamics with on-the-fly machine learning of quantum-mechanical forces. Physical Review Letters, 114, 096405.
32. Chmiela, S., et al. (2017). Machine learning of accurate energy-conserving molecular force fields. Science Advances, 3, e1603015.
33. Kruglov, I., Sergeev, O., Yanilkin, A., & Oganov, A. R. (2017). Energy-free machine learning force field for aluminum. Scientific Reports, 7, 1–7.


34. Glielmo, A., Sollich, P., & De Vita, A. (2017). Accurate interatomic force fields via machine learning with covariant kernels. Physical Review B, 95, 214302.
35. Botu, V., Batra, R., Chapman, J., & Ramprasad, R. (2017). Machine learning force fields: construction, validation, and outlook. The Journal of Physical Chemistry C, 121, 511–522.
36. Legrain, F., et al. (2018). Vibrational properties of metastable polymorph structures by machine learning. Journal of Chemical Information and Modeling, 58, 2460–2466.
37. Zhang, L., Lin, D. Y., Wang, H., Car, R., & Weinan, E. (2019). Active learning of uniformly accurate interatomic potentials for materials simulation. Physical Review Materials, 3, 023804.
38. Geiger, M., et al. (2020). e3nn: A modular framework for Euclidean neural networks, version 0.1.1. https://doi.org/10.5281/zenodo.5292912
39. Chen, Z., et al. (2021). Direct prediction of phonon density of states with Euclidean neural networks. Advanced Science, 8, 2004214.
40. Chen, Z., Andrejevic, N., & Smidt, T. (2020). Code repository: Direct prediction of phonon density of states with Euclidean neural networks. https://github.com/zhantaochen/phonondos_e3nn
41. Andrejevic, N., & Chen, Z. (2021). Tutorial: Predicting phonon DoS with Euclidean neural networks. https://github.com/ninarina12/phononDoS_tutorial
42. Musil, F., et al. (2021). Physics-inspired structural representations for molecules and materials. Chemical Reviews, 121, 9759–9815.
43. Jiang, Y., et al. (2021). Topological representations of crystalline compounds for the machine-learning prediction of materials properties. npj Computational Materials, 7, 1–8.
44. Thomas, N., et al. (2018). Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds. arXiv e-prints, arXiv:1802.08219 [cs.LG].
45. Kondor, R., Lin, Z., & Trivedi, S. (2018). Clebsch–Gordan nets: a fully Fourier space spherical convolutional neural network. Advances in Neural Information Processing Systems, 32, 10117–10126.
46. Weiler, M., Geiger, M., Welling, M., Boomsma, W., & Cohen, T. (2018). 3D steerable CNNs: Learning rotationally equivariant features in volumetric data. Advances in Neural Information Processing Systems, 32, 10402–10413.
47. Smidt, T. E., Geiger, M., & Miller, B. K. (2020). Finding symmetry breaking order parameters with Euclidean neural networks. arXiv e-prints, arXiv:2007.02005 [cs.LG].
48. Miller, B. K., Geiger, M., Smidt, T. E., & Noé, F. (2020). Relevance of rotationally equivariant convolutions for predicting molecular properties. arXiv e-prints, arXiv:2008.08461 [cs.LG].
49. Petretto, G., et al. (2018). High-throughput density-functional perturbation theory phonons for inorganic materials. Scientific Data, 5, 180065.
50. Jain, A., et al. (2013). The Materials Project: A materials genome approach to accelerating materials innovation. APL Materials, 1, 011002. ISSN: 2166532X.
51. Lynn, J., Smith, H., & Nicklow, R. (1973). Lattice dynamics of gold. Physical Review B, 8, 3493.
52. Choudhury, N., Walter, E. J., Kolesnikov, A. I., & Loong, C. K. (2008). Large phonon band gap in SrTiO3 and the vibrational signatures of ferroelectricity in ATiO3 perovskites: First-principles lattice dynamics and inelastic neutron scattering. Physical Review B, 77, 134111.
53. Shimada, D., Tsuda, N., Paltzer, U., & De Wette, F. (1998). Tunneling phonon structures and the calculated phonon density of states for Bi2Sr2CaCu2O8. Physica C: Superconductivity, 298, 195–202.
54. Bedoya-Martínez, O., Hashibon, A., & Elsässer, C. (2016). Influence of point defects on the phonon thermal conductivity and phonon density of states of Bi2Te3. Physica Status Solidi (a), 213, 684–693.
55. Price, D., Ghose, S., Choudhury, N., Chaplot, S., & Rao, K. (1991). Phonon density of states in fayalite, Fe2SiO4. Physica B: Condensed Matter, 174, 87–90.
56. Nipko, J., Loong, C. K., Balkas, C., & Davis, R. (1998). Phonon density of states of bulk gallium nitride. Applied Physics Letters, 73, 34–36.
57. Christianson, A. D., et al. (2008). Phonon density of states of LaFeAsO1−xFx. Physical Review Letters, 101, 157004.


58. Lynn, J., et al. (1991). Phonon density of states and superconductivity in Nd1.85Ce0.15CuO4. Physical Review Letters, 66, 919.
59. Le Tacon, M., Krisch, M., Bosak, A., Bos, J. W., & Margadonna, S. (2008). Phonon density of states in NdFeAsO1−xFx. Physical Review B, 78, 140505.
60. Rauh, H., Geick, R., Kohler, H., Nucker, N., & Lehner, N. (1981). Generalized phonon density of states of the layer compounds Bi2Se3, Bi2Te3, Sb2Te3 and Bi2(Te0.5Se0.5)3, (Bi0.5Sb0.5)2Te3. Journal of Physics C: Solid State Physics, 14, 2705.
61. Pang, J. W., et al. (2014). Phonon density of states and anharmonicity of UO2. Physical Review B, 89, 115132.
62. Achar, B., & Barsch, G. (1976). Phonon density of states of V3Si. Physics Letters A, 59, 65–66.
63. Renker, B., et al. (1988). Phonon density-of-states for high-Tc (Y, RE)Ba2Cu3O7 superconductors and non-superconducting reference systems. Zeitschrift für Physik B Condensed Matter, 71, 437–442.
64. Chaplot, S., Pintschovius, L., Choudhury, N., & Mittal, R. (2006). Phonon dispersion relations, phase transitions, and thermodynamic properties of ZrSiO4: Inelastic neutron scattering experiments, shell model, and first-principles calculations. Physical Review B, 73, 094308.
65. Mittal, R., & Chaplot, S. (2000). Phonon density of states and thermodynamic properties in cubic and orthorhombic phases of ZrW2O8. Solid State Communications, 115, 319–322.
66. Lee, C., & Gonze, X. (1995). Ab initio calculation of the thermodynamic properties and atomic temperature factors of SiO2 α-quartz and stishovite. Physical Review B, 51, 8610.
67. Shahi, A., & Arunan, E. (2014). Hydrogen bonding, halogen bonding and lithium bonding: an atoms in molecules and natural bond orbital perspective towards conservation of total bond order, inter- and intra-molecular bonding. Physical Chemistry Chemical Physics, 16, 22935–22952.
68. Agne, M. T., et al. (2018). Heat capacity of Mg3Sb2, Mg3Bi2, and their alloys at high temperature. Materials Today Physics, 6, 83–88.
69. Gautam, R., Vanga, S., Ariese, F., & Umapathy, S. (2015). Review of multidimensional data processing approaches for Raman and infrared spectroscopy. EPJ Techniques and Instrumentation, 2, 1–38.
70. He, X., et al. (2018). Raman spectroscopy coupled with principal component analysis to quantitatively analyze four crystallographic phases of explosive CL-20. RSC Advances, 8, 23348–23352.
71. Horgan, C. C., et al. (2020). High-throughput molecular imaging via deep learning enabled Raman spectroscopy. Preprint. arXiv:2009.13318.
72. Cui, A., et al. (2019). Decoding phases of matter by machine-learning Raman spectroscopy. Physical Review Applied, 12, 054049.
73. Zhang, X., Lin, T., Xu, J., Luo, X., & Ying, Y. (2019). DeepSpectra: An end-to-end deep learning approach for quantitative spectral analysis. Analytica Chimica Acta, 1058, 48–57.
74. Chatzidakis, M., & Botton, G. (2019). Towards calibration-invariant spectroscopy using deep learning. Scientific Reports, 9, 1–10.
75. Fine, J. A., Rajasekar, A. A., Jethava, K. P., & Chopra, G. (2020). Spectral deep learning for prediction and prospective validation of functional groups. Chemical Science, 11, 4618–4630.
76. Carey, C., Boucher, T., Mahadevan, S., Bartholomew, P., & Dyar, M. (2015). Machine learning tools for mineral recognition and classification from Raman spectroscopy. Journal of Raman Spectroscopy, 46, 894–903.
77. Yu, S., et al. (2021). Analysis of Raman spectra by using deep learning methods in the identification of marine pathogens. Analytical Chemistry, 93, 11089–11098.
78. Hu, W., et al. (2019). Machine learning protocol for surface-enhanced Raman spectroscopy. The Journal of Physical Chemistry Letters, 10, 6026–6031.
79. Khan, S. S., & Madden, M. G. (2012). New similarity metrics for Raman spectroscopy. Chemometrics and Intelligent Laboratory Systems, 114, 99–108.
80. Samuel, A. Z., et al. (2021). On selecting a suitable spectral matching method for automated analytical applications of Raman spectroscopy. ACS Omega, 6, 2060–2065.


81. Taghizadeh, A., Leffers, U., Pedersen, T. G., & Thygesen, K. S. (2020). A library of ab initio Raman spectra for automated identification of 2D materials. Nature Communications, 11, 1–10.
82. Kingma, D. P., & Welling, M. (2013). Auto-encoding variational Bayes. Preprint. arXiv:1312.6114.
83. Higgins, I., et al. (2017). β-VAE: Learning basic visual concepts with a constrained variational framework. In Proc. ICLR.
84. Seifert, N. A., Prozument, K., & Davis, M. J. (2021). Computational optimal transport for molecular spectra: The fully discrete case. The Journal of Chemical Physics, 155, 184101.
85. Henry, A., Prasher, R., & Majumdar, A. (2020). Five thermal energy grand challenges for decarbonization. Nature Energy, 1–3.

Chapter 4

Machine Learning-Assisted Parameter Retrieval from Polarized Neutron Reflectometry Measurements

Abstract Polarized neutron reflectometry is a powerful technique to interrogate the structures of multilayered magnetic materials with depth sensitivity and nanometer resolution. However, when analyzed with traditional fitting methods, reflectometry profiles often present a complicated objective function landscape, posing a significant challenge for parameter retrieval. In this chapter, we develop a data-driven framework to recover the sample parameters from polarized neutron reflectometry data with minimal user intervention. In particular, we train a variational autoencoder to map reflectometry profiles with moderate experimental noise to an interpretable, low-dimensional space from which sample parameters can be extracted with high resolution. We apply our method to recover the scattering length density profiles of the topological insulator–ferromagnetic insulator heterostructure Bi2Se3/EuS exhibiting proximity magnetism, in good agreement with the results of conventional fitting. We further analyze a more challenging reflectometry profile of the topological insulator–antiferromagnet heterostructure (Bi,Sb)2Te3/Cr2O3 and identify possible interfacial proximity magnetism in this material. We anticipate that the framework developed here can be applied to resolve hidden interfacial phenomena in a broad range of layered systems. Parts of this chapter are reprinted from Andrejevic et al. [1] with the permission of AIP Publishing.

4.1 Introduction

Polarized neutron reflectometry (PNR) facilitates structural characterization of multilayered materials with depth sensitivity and subnanometer resolution. By probing materials' nuclear and magnetic depth profiles, PNR has enabled the study of hidden interfaces in a broad range of nanostructured and thin film systems [2–17]. Moreover, by leveraging the interaction of spin-polarized neutrons with magnetic moments, PNR can be used to detect magnetic interfacial phenomena [18–25], including the magnetic proximity effect (MPE). The MPE is a phenomenon in which a magnetic material induces magnetic order near the interface of an otherwise non-magnetic system. Due to its non-disruptive mechanism, proximity coupling is a promising pathway for magnetizing topological insulators (TIs)


without introducing magnetic dopants [26–43]. This opens up the possibility of realizing emergent phenomena such as the quantum anomalous Hall effect [44–48] or axion insulator state [49–51] at room temperature and advancing TI-based device applications. More recently, the realization of proximity magnetism in van der Waals heterostructures [52–68] presents new opportunities to engineer atomically thin devices with novel functionalities, and at the same time highlights an increasing need for precise characterization of interfacial effects at subnanometer length scales. Thus, accurate quantitative analysis of magnetic structural information obtained by PNR is critical to resolving important interfacial effects within a broad range of materials systems. However, subtle interfacial phenomena such as proximity magnetism can be difficult to single out from bulk contributions to the PNR signature. While PNR profiles can be fit to a theoretical model of the experimental system to retrieve the magnetic and structural parameters, the optimization landscape is often complex as a result of information loss about the phase of the reflected neutrons. This leads to an ambiguity in the fitted parameters and potentially incorrect solutions, particularly with poor parameter initialization. In this chapter, we develop an alternate, data-driven framework to retrieve the sample parameters of candidate proximity-coupled systems from their PNR profiles with minimal user intervention [1]. Using a variational autoencoder (VAE), we map reflectometry profiles simulated from a broad range of candidate physical parameters to a low-dimensional latent space from which the true sample parameters can be readily obtained. The decoded profiles directly inform the suitability of the parameter search space through the reconstruction quality and are robust to moderate perturbation of the input reflectivities emulating experimental noise. Importantly, we find that the latent mapping naturally bypasses the complexity of the original loss landscape, and is both well-organized and visually interpretable in terms of the physical parameters. Thus, the latent representation can further be used to automatically refine the parameter search space for poorly reconstructed profiles. We evaluate our model on its ability to recover the sample parameters of a prototypical TI-ferromagnetic (FM) insulator heterostructure, Bi2Se3/EuS, exhibiting proximity magnetism, as well as a more challenging PNR profile of the TI-antiferromagnet (AFM) heterostructure, (Bi,Sb)2Te3/Cr2O3, which points to possible proximity magnetism at the resolution limit. Original code for application to other candidate systems is also made publicly available [69].

4.2 Polarized Neutron Reflectometry

Neutron reflectometry measures the specular reflection of an incident beam from a thin film surface. The reflectometry profile, R(Q), is a function of the wave vector transfer Q = 4π sin θ/λ, where θ is the angle of reflection and λ is the wavelength of the neutron. In the Born approximation, this quantity is related to the nuclear scattering length density (SLD) according to [21]

$$R(Q) = \frac{16\pi^2}{Q^2} \left| \int_{-\infty}^{\infty} \rho_N(z)\, e^{-iQz}\, dz \right|^2, \qquad (4.1)$$

where $\rho_N = \sum_i n_i b_i$ is the nuclear scattering length density, ni is the number of scattering centers, e.g., atoms, of type i per unit volume in the compound, and bi is the corresponding coherent scattering length. The coordinate z measures the depth perpendicular to the sample surface. In the presence of magnetic thin films, the neutron magnetic moment also interacts with the atomic moments present in the sample, giving rise to the spin-dependent reflectometry profile,

$$R^{\pm\pm}(Q) = \frac{16\pi^2}{Q^2} \left| \int_{-\infty}^{\infty} \big( \rho_N(z) \pm \rho_M(z)\cos\phi \big)\, e^{-iQz}\, dz \right|^2, \qquad (4.2)$$

where $\rho_M = \sum_i n_i p_i$ is the magnetic scattering length density, ni is the number density of magnetic atoms of type i in the compound, pi is the corresponding magnetic scattering length, and φ is the angle between the magnetization and the neutron polarization. The superscripts + and − denote the neutron spin up and down states, respectively. The composition and magnetic profiles can in theory be recovered from the reflectometry profiles of a sample by simultaneously fitting the spin-dependent data to a candidate model of the nuclear and magnetic scattering length densities. This traditionally involves building a theoretical model of the experimental system in terms of structural parameters (density, thickness, interface roughness, and magnetization) of the constituent layers, and simulating the associated reflectometry profile using the methods of Parratt recursion [70] or the Abeles matrix formalism [71]. However, due to information loss about the phase of the reflected neutrons, different SLD profiles can generate highly similar reflectivities, leading to a complicated cost landscape between the theoretical and experimental profiles with potentially many local minima. Thus, parameter refinement often demands expert insight to identify a suitable starting point and adequately constrain the parameter space of the model. Methods to resolve the phase ambiguity through carefully designed experiments [72–76] have also been proposed. More generally, additional insights from X-ray diffraction, transmission electron microscopy, and/or bulk magnetometry are required, as well as selection of appropriate models [77, 78] and optimization methods, which can play a critical role in steering the refinement process. For example, many existing neutron and X-ray reflectivity refinement programs (e.g., GenX [79], Refl1D [80], StochFit [81]) implement stochastic optimization methods such as differential evolution, simulated annealing, or stochastic tunneling to better manage multiple local minima. More recently, machine learning-guided fitting approaches have been introduced to both improve and automate parameter retrieval from neutron reflectometry data with promising results [82–86]. However, elucidating subtle interfacial phenomena such as magnetic proximity from PNR signatures remains a significant challenge.
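To make Eqs. 4.1 and 4.2 concrete, a direct numerical evaluation in the Born approximation is sketched below. This is an illustration only: it assumes the SLD profile decays at the boundaries of the depth window (e.g., a free-standing film), whereas refinement programs rely on Parratt recursion or the Abeles matrices, which remain valid near the critical edge:

```python
import numpy as np

def born_reflectivity(Q, z, rho_n, rho_m=None, cos_phi=1.0, spin=+1):
    """Spin-dependent reflectivity in the Born approximation (Eq. 4.2).

    Q     : wave vector transfers (inverse angstroms)
    z     : depth grid (angstroms)
    rho_n, rho_m : nuclear/magnetic SLD profiles sampled on z
    spin  : +1 or -1 for the neutron spin up/down channels
    """
    rho = rho_n if rho_m is None else rho_n + spin * rho_m * cos_phi
    R = np.empty_like(Q, dtype=float)
    for i, Qi in enumerate(Q):
        amp = np.trapz(rho * np.exp(-1j * Qi * z), z)   # Fourier-type integral
        R[i] = 16.0 * np.pi**2 / Qi**2 * np.abs(amp) ** 2
    return R
```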


4.3 Variational Autoencoder

The data-driven approach developed in this chapter is based on the variational autoencoder (VAE) [87], a generative model trained to reconstruct the input from a low-dimensional encoding by minimizing the error between the input and prediction. The VAE is introduced at the end of Chap. 3 but is here summarized for completeness. In a VAE, the encoded (latent) vector elements are drawn from independent standard normal distributions, an assumption enforced by an additional regularization term in the loss function,

$$\mathcal{L} = \|x - x'\|^2 + \beta\, D_{KL}\big(q(z|x)\,\|\,p(z)\big), \qquad (4.3)$$

where x and x̂ represent the input and prediction, respectively, D_KL denotes the Kullback-Leibler (KL) divergence, which is computed between the returned distribution q(z|x) of the latent vector z and the prior distribution p(z), and β is a hyperparameter regulating the degree of entanglement between the learned latent channels [88]. Specifically, a large β enforces the prior more strongly and thereby encourages learning statistically independent latent factors. By encoding the input as a distribution rather than as a single point, the VAE compels the latent space to be smooth and continuous, with nearby points corresponding to similar reconstructions of the input. Thus, in the context of PNR, the VAE can be considered a way to map PNR profiles into a well-organized and informative low-dimensional space, since they naturally evolve as a function of a few well-defined structural parameters. To demonstrate the potential advantages of VAE-based parameter retrieval, we first consider a simplified example illustrated in Fig. 4.1. The objective is to determine the sample parameters y1 and y2 that best reproduce the comparatively high-dimensional target signature x_target, denoted in red in Fig. 4.1a. In this simple example, the landscape of a mean squared error (MSE) loss L = ‖x − x_target‖² contains one global and one local minimum in the (y1, y2)-space; thus, rather than optimize directly in this space, we train the VAE to recover the input from a low-dimensional (here, three-dimensional) latent representation z (Fig. 4.1b). Since the latent space is organized such that nearby points correspond to similar reconstructions of the input, the loss function viewed in this space has only a single, global minimum in the region of interest (Fig. 4.1d). Additionally, if the target signature were instead an outlier of the chosen domain for parameters y1 and y2 (teal point in Fig. 4.1a), we could readily single it out by its large reconstruction error, shown in Fig. 4.1c, indicating a need to expand the parameter space. Thus far, the framework considered is fully unsupervised, with no direct correspondence between the intermediate features and the original parameters y1 and y2. However, as we expect the output to evolve predictably with the sample parameters, we can supervise the VAE training by conditioning a subset of the latent features z to vary directly with y1 and y2, following the formulation of Zhao et al. [89], shown in Fig. 4.1e.


Fig. 4.1 Advantages of VAE-based parameter retrieval. (a) Toy model of a signature x generated by two parameters, y1 and y2 . The loss L between an arbitrary x and a target signature xtarget contains one global and one local minimum in (y1 , y2 )-space. (b) Schematic illustration of the VAE architecture. (c) Norms of the predicted versus true signatures of the test dataset, colored by the MSE between the associated xpred and xtrue . The signature xtarget (red point) generated by (y1 , y2 ) within the domain of training samples (yellow patch in (a)) is reconstructed with comparatively low MSE, while the signature xoutlier outside the domain (teal point) is an outlier. (d) Density plot of the loss L as a function of the 3D latent space z. A 2D slice through the z3 = 0 plane is shown at the right. The shaded disk and dashed circle denote one and three standard deviations away from the latent coordinate of the target signature, respectively. “×” denotes the minimum of the loss L. (e) Latent mapping of the test dataset colored by the true values of y1 (left) and y2 (right). Reproduced from Andrejevic et al. [1] with the permission of AIP Publishing

This conditioning can make the latent space more informative and interpretable by encouraging each selected dimension to organize according to a specific physical parameter, thereby facilitating quantitative prediction of their values.
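For concreteness, a minimal PyTorch sketch of the loss in Eq. (4.3) is given below, using the closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior, together with the reparameterization trick used to sample z differentiably. The function and variable names are illustrative and are not drawn from the thesis code.

```python
import torch
import torch.nn.functional as F

def reparameterize(mu, logvar):
    # Draw z ~ N(mu, sigma^2) differentiably: z = mu + sigma * eps, eps ~ N(0, 1)
    return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

def beta_vae_loss(x, x_hat, mu, logvar, beta=1.0):
    # Reconstruction error ||x - x_hat||^2 of Eq. (4.3)
    recon = F.mse_loss(x_hat, x, reduction="sum")
    # Closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over latent dimensions
    kl = -0.5 * torch.sum(1.0 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl
```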

4.3.1 VAE-Based PNR Parameter Retrieval

We now introduce the VAE-based approach to recover the SLD profile of a candidate proximity-coupled system. To outline the approach, we consider the specific thin-film system consisting of a Bi2Se3/EuS heterostructure atop a sapphire (α-Al2O3) substrate with an amorphous a-Al2O3 capping layer, as shown in Fig. 4.2a. Proximity-induced magnetism has been reported at the interface between the TI Bi2Se3 and EuS, a ferromagnet with a Curie temperature of approximately 16.6 K [33, 90]. The reflectometry profile consists of two curves, R++ and R−−, corresponding to the two neutron spin non-flip channels aligned parallel

Fig. 4.2 VAE with regression for PNR data analysis. (a) Schematic illustration of the proximity-coupled Bi2 Se3 /EuS system. A depiction of the spin-polarized neutron reflectometry experiment under an externally applied magnetic field is shown at right. (b) Reflectometry profile of the heterostructure in (a) measured at T = 5 K. Solid lines correspond to the best fit obtained using the GenX parameter refinement program. Error-bars represent ±1 standard deviation. (c) Nuclear, magnetic, and absorption SLD profiles corresponding to the fit in (b). (d) Schematic illustration of the VAE architecture used for PNR parameter retrieval. Reproduced from Andrejevic et al. [1] with the permission of AIP Publishing


and antiparallel to an in-plane external magnetic field Hext = 1 T, respectively (Fig. 4.2b). In a typical parameter refinement program, the theoretical model is fit simultaneously to both spin channels to obtain the SLD profile of the sample; an example fit obtained with the GenX refinement program is shown in Fig. 4.2c. However, due to the phase ambiguity and the large number of fitting parameters, different SLD profiles can produce excellent fits to the measured data. For instance, the best fit obtained in Fig. 4.2b corresponds to the SLD profile shown in Fig. 4.2c, which obscures proximity magnetism at the interface by instead proposing a high interface roughness of the EuS film. This, however, contradicts the expected behavior corroborated by other measurements [33], which can be obtained by an alternate choice of initial parameters. Thus, the objective of a data-driven approach is to retrieve the optimal physical parameters of a target sample from its PNR profile with minimal user intervention and without the need for additional experiments. Moreover, it should compute a reliable SLD profile from learned sample parameters with minimal influence from common issues in iterative optimization algorithms, such as sensitivity to parameter initialization and stagnation. An additional outcome of the VAE approach is to inform the suitability of the entire parameter search space, not just the predicted parameters, through the reconstruction error. Like most fitting programs, the VAE treats the R++ and R−− channels jointly, using a convolutional neural network (CNN) encoder with a one-dimensional kernel (Fig. 4.2d). The convolutional and pooling layers are followed by a set of fully-connected layers operating on the flattened CNN output, returning the predicted means μ and standard deviations σ of the normal distributions from which the latent vector z is sampled. When the latent representation is conditioned on the sample parameters, each latent channel can be interpreted as effectively returning a distribution over one parameter's value, illustrated schematically in Fig. 4.2d. The mean values of the latent distributions are fed to a simple regressor consisting of a single hidden and activation layer that predicts the physical parameter values. A ReLU activation is used to restrict the predicted values to be non-negative, in accordance with the physical parameters. At the same time, the sampled vector z is passed to the decoder, which has a construction symmetric to the encoder and returns



the reconstructed profiles R̂++ and R̂−−. All three networks—encoder, decoder, and regressor—are trained end-to-end by minimizing the total loss function,

$$\mathcal{L} = \|x - \hat{x}\|^2 + \beta D_{KL}\left(q(z|x)\,\|\,p(z)\right) + \lambda \sum_j \|v_j - \hat{v}_j\|^2, \qquad (4.4)$$

where v_j and v̂_j denote the true and predicted values of the j-th sample parameter, respectively, and λ is a hyperparameter weighting the contribution of parameter regression to the total loss.
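The overall encoder-regressor-decoder arrangement can be sketched compactly in PyTorch as below. The layer counts, channel widths, and latent dimensionality are placeholder values chosen for readability rather than the hyperparameters used in this work; the released code in the project repository [69] is the authoritative implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PNRVAE(nn.Module):
    # Encoder-regressor-decoder sketch; all sizes are illustrative placeholders.
    def __init__(self, n_q=128, latent_dim=16, n_params=10):
        super().__init__()
        # 1D CNN encoder acting jointly on the two spin channels (R++, R--)
        self.encoder = nn.Sequential(
            nn.Conv1d(2, 16, 5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(16, 32, 5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
            nn.Flatten())
        feat = 32 * (n_q // 4)
        self.fc_mu = nn.Linear(feat, latent_dim)
        self.fc_logvar = nn.Linear(feat, latent_dim)
        # Regressor on the latent means; final ReLU keeps parameters non-negative
        self.regressor = nn.Sequential(
            nn.Linear(latent_dim, 32), nn.ReLU(),
            nn.Linear(32, n_params), nn.ReLU())
        # Decoder mirrors the encoder and returns both reconstructed channels
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, feat), nn.ReLU(),
            nn.Unflatten(1, (32, n_q // 4)),
            nn.Upsample(scale_factor=2), nn.Conv1d(32, 16, 5, padding=2), nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv1d(16, 2, 5, padding=2))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterize
        return self.decoder(z), self.regressor(mu), mu, logvar

def total_loss(x, x_hat, v, v_hat, mu, logvar, beta=1.0, lam=1.0):
    # Eq. (4.4): reconstruction + KL regularization + parameter regression
    recon = F.mse_loss(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1.0 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl + lam * F.mse_loss(v_hat, v, reduction="sum")
```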


Fig. 4.3 Parameter ranges for Bi2Se3/EuS data generation. Distribution of parameter values for the density, thickness, interface roughness, and magnetization of each layer in the generated data. Distributions are split between samples with (solid) and without (hatched) an interfacial proximity layer. To maintain a similar total thickness for the TI layer in samples with and without proximity magnetism, the interfacial layer thickness is simply reallocated to the bulk TI if a zero is sampled for the proximity magnetism. Density and magnetization are sampled using their values in terms of formula units (middle column), which are compatible with the GenX simulation software. The corresponding values in conventional units are shown in the rightmost column (1 emu = 10−3 A·m2). Reproduced from Andrejevic et al. [1] with the permission of AIP Publishing

4.3.2 Data Preparation

To generate the training and development datasets for the neural network model, we used the GenX neutron reflectivity modeling code [79] to simulate the PNR profiles of 100,000 candidate systems of the Bi2Se3/EuS heterostructure. For each example, the constituent layers are parameterized by their density, thickness, roughness, and magnetization, each sampled uniformly at random over a range of experimentally feasible values (Fig. 4.3). Importantly, these parameter ranges can be quite broad around the set of nominal parameter values, and can differ in size for different quantities depending on their level of uncertainty. For example, the parameter ranges for the amorphous capping layer are intentionally broader than those for the TI and FM layer thicknesses, which are carefully controlled during growth. Density and magnetization are treated using their values in terms of formula units, which are compatible with the GenX simulation software; however, the final results are converted to conventional units before plotting. The proximity effect is modeled as a thin interfacial layer between the Bi2Se3 and EuS films with a sampled thickness, roughness, and magnetization, and sharing the density value of the


neighboring TI film. The proximity layer magnetization is constrained not to exceed the sampled value of the EuS magnetization for any given example. The minimum possible thickness and magnetization of this layer represent a particular resolution threshold of the trained model and are taken to be 6 Å and 28 electromagnetic units (emu) cm−3 (0.6 μB per formula unit), respectively (1 emu = 10−3 A·m2). These thresholds are not universal and are chosen simply to be slightly lower than the thickness of one quintuple layer of Bi2Se3 (∼9.6 Å) and about a tenth of the nominal magnetization of ferromagnetic EuS (6 μB per formula unit), which are accessible quantities found in the literature. Fifty percent of all examples are generated with this interfacial layer and are designated as proximity-coupled examples. The PNR profiles are simulated over the experimentally accessible Q-range from 0.1 to 1.3 nm−1, and the intensities are normalized to a maximum value of 1. To simulate experimental noise, the generated PNR profiles are randomly perturbed at each Q point by sampling a normal distribution with standard deviation equal to the corresponding error bar of the experimental reflectometry profile of the same system. Additionally, the instrument resolution and background are sampled uniformly at random between 0.001 and 0.01 nm−1, and between 10−8 and 10−4 on a logarithmic scale, respectively. Since the reflectivity spans nearly eight decades, the base-10 logarithm of the profiles is used as the input (output) of the encoder (decoder) to treat the intensity values more equitably. To similarly place the sample parameters on an equal footing, the output of the regressor is taken to be the standardized values of the physical parameters. The regressor is trained to predict the density and thickness of each constituent layer, as well as the magnetization of the magnetic and proximity layers. Note that the substrate thickness is not a fitting parameter, as the substrate is taken to be macroscopically thick. Although interface roughness is varied in the generated data, the regressor was not trained to determine interface roughnesses directly: given the potential trade-off between the number and accuracy of predicted parameters, this choice allows the more interesting proximity effect to be captured more accurately. However, the nuanced changes induced by different roughnesses can still be captured by the free latent dimensions. They can thus be regarded as an underlying degree of freedom, similar to the treatment of instrument resolution and background, which may be more complex functions of the latent space. The freedom to choose the number of output quantities, even as the training data reflect variations in the full set of sample parameters, is one advantage of the machine learning-based approach: it allows one to output only the most relevant quantities, reducing the needed training data volume and neural network size. Additionally, machine learning makes it possible to seek hidden relationships between the data and parameters that may not be captured in an approximate theoretical model. Finally, the generated data are subdivided into training, validation, and test sets according to a 70/10/20% split, with approximately 50% of the examples in each dataset exhibiting proximity magnetism. The data generation, model implementation, and analysis codes are provided in the associated GitHub repository [69].
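The noise injection and normalization steps just described can be summarized in a few lines of NumPy. This is a schematic of the pre-processing pipeline with hypothetical array names, not a verbatim excerpt from the released data-generation code [69].

```python
import numpy as np

rng = np.random.default_rng(0)

def prepare_profiles(r_sim, sigma_exp, params, param_mean, param_std):
    """Pre-process simulated profile pairs for training.

    r_sim     : (n_samples, 2, n_q) simulated R++/R-- intensities, max-normalized
    sigma_exp : (2, n_q) experimental error bars used as the noise scale
    params    : (n_samples, n_params) sampled physical parameters
    """
    # Perturb each Q point with Gaussian noise matched to the experimental error bars
    noisy = r_sim + rng.normal(0.0, sigma_exp, size=r_sim.shape)
    noisy = np.clip(noisy, 1e-10, None)    # keep intensities positive for the log
    x = np.log10(noisy)                    # reflectivity spans ~8 decades
    v = (params - param_mean) / param_std  # standardized regression targets
    return x, v
```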



Fig. 4.4 Training history and reconstruction accuracy for the Bi2Se3/EuS system. Trajectories of the (a) total loss, (b) mean squared error (MSE) between the decoded and true PNR profiles, and (c) MSE between predicted and true sample parameters as functions of training epochs. (d) Representative reconstructions of the test dataset within each error quartile, sorted from best (top row) to worst (bottom row). Predicted (true) reflectivities are plotted as light (bold) curves for each channel: R++ (blue) and R−− (red). The corresponding reconstruction error (MSE) is indicated in the top right corner of each subplot. (e) Distribution of reconstruction MSE values for the test dataset. Note the inverted y-axis of the plot, with the best performing quartile located at the top of the distribution. Quartiles are indicated and separated by dashed gray lines. Plotted points represent the reconstruction errors of the four experimental PNR profiles, colored by the corresponding measurement temperature. Reproduced from Andrejevic et al. [1] with the permission of AIP Publishing

4.3.3 Results

The performance of the VAE is first evaluated on the test dataset of simulated PNR profiles. The average loss trajectories of the training and validation sets for ten models with different initial weights are shown in Fig. 4.4a–c, indicating reproducible convergence to a low mean squared error (MSE). In Fig. 4.4d, we plot representative reconstructions of the test dataset in each error quartile for one of these models. In many cases, the true and predicted profiles appear nearly indistinguishable for both the R++ and R−− channels. The distribution of reconstruction MSE values for the test dataset, shown in Fig. 4.4e, is heavily peaked between the first and second quartiles. We also apply the trained VAE to four experimental PNR profiles corresponding to four different temperatures, 5, 50, 75, and 300 K, which are reproduced from Katmis et al. [33]. The four solid points in Fig. 4.4e indicate the reconstruction MSEs of the decoded experimental profiles. Notably, the decoded profiles are all found to be inliers of the distribution of reconstruction errors, namely ‖x − x̂‖². This suggests that the chosen parameter ranges are likely suitable for the data under consideration.


Fig. 4.5 Regressor performance and predictions on experimental data. (a) Histograms of predicted versus true values for different sample parameters of the test dataset. (b) The predictions of proximity layer thickness tprox and magnetization mprox , and FM layer magnetization mFM obtained from 10 instances of the VAE trained with different initial weights, shown as a function of the measurement temperature of the corresponding experiment. Gray dashed lines indicate the optimal threshold obtained for proximity classification. Scattered points above (below) the determined threshold are colored yellow (blue). Violin plots of the predicted values for experiments with majority predictions above (below) the threshold are shaded yellow (blue). Reproduced from Andrejevic et al. [1] with the permission of AIP Publishing

We next evaluate the trained VAE on its ability to recover the sample parameters from the simulated and experimental PNR profiles of the Bi2Se3/EuS system. In each subplot of Fig. 4.5a, the test data points are histogrammed according to the true and predicted values of a given sample parameter, such as the density, thickness, and magnetization of each thin film in the heterostructure. The values of the bulk layer properties are very well reproduced by the regressor, while tprox and mprox, which exhibit much weaker signatures and tend to be expressed most in the noisier, high-Q region of the PNR profiles, are somewhat underestimated for proximity-coupled samples, or overestimated in the case of samples modeled without this effect. Note that the sharp discontinuities in the tprox and mprox histograms correspond to the resolution thresholds of 6 Å and 28 emu cm−3, respectively, which are imposed on the generated data. To assess the reproducibility of the regression results, we trained ten identical models with different initial weights and collected statistics of the resulting predictions. Figure 4.5b shows the predictions of these models for the values of tprox, mprox, and the EuS magnetization mFM. We can optimize the trade-off between the true (tpr) and false (fpr) positive rates of correctly classifying proximity magnetism to obtain the classification thresholds of thickness and magnetization that best separate the data points between the two classes. Specifically, by minimizing |tpr − (1 − fpr)| over the validation set for each model,


we determine the average classification thresholds of thickness and magnetization, indicated by the dashed gray lines in Fig. 4.5b. These thresholds are found to be 7.7 Å and 42 emu cm−3, both corresponding to a true positive rate—or recall—of 70–71% in both classes. The optimal thresholds are thus only slightly higher than the actual resolution thresholds of the generated data. While a small spread in the predicted values across the ten models is observed, the overall trend in the predictions is as expected. Notably, all ten models predict tprox and mprox values close to or above their respective thresholds at 5 K; seven models predict both tprox and mprox above their respective thresholds, and nine predict mprox above the threshold magnetization. Both tprox and mprox values decay to 0 at 300 K, and similarly mFM drops rapidly beyond its Curie temperature and vanishes at 300 K. We note that the predicted values of tprox and mprox at intermediate temperatures are sometimes nonzero. This may be attributable to strong magnetic fluctuations above the EuS Curie temperature stabilizing a weak proximity effect below the resolution threshold of our model [58, 91]. Note that if weak proximity magnetism persists at high temperatures, it must be below the resolution threshold of our current model. A tailored network trained on a narrow range of parameters could potentially be devised to clarify even weaker signatures of proximity magnetism that may be expected at higher temperatures; however, the current model is highly suitable for surveying the evolution of proximity magnetism over a broad experimental parameter space, such as a wide temperature range.

Having both evaluated the model performance on the simulated data and verified the suitability of the chosen parameter space, we now analyze the model predictions on the four experimental profiles. Figure 4.6a shows the reconstructed reflectometry profiles for both channels for the measurements at 5 K. The right panel shows the spin asymmetry (R++ − R−−)/(R++ + R−−) calculated for both the measured and decoded profiles of the four experimental reflectivities at 5, 50, 75, and 300 K. Next, using the parameter values predicted by the regressor, we calculate the SLD profiles for the measurements at each temperature (Fig. 4.6b). Note that the SLD profiles are computed directly from the predicted parameter values and are not derived from the reconstructed PNR profiles. Additionally, since the interface roughnesses are considered in the training data but not directly predicted by the regressor, their values remain undetermined and are uniformly set to 1 Å for producing the plots in Fig. 4.6b.

The nuclear (NSLD) and absorption (ASLD) scattering length density profiles appear largely consistent for the measurements at different temperatures, suggesting that the predicted values of the temperature-independent parameters, such as the thickness and density of each layer, are physically plausible. However, we do observe some fluctuation of the NSLD at the bulk TI and FM layers that is worth mentioning. By examining the underlying parameters generating each SLD profile, we identify that the bulk TI thickness gradually increases with temperature; however, this is accounted for by the corresponding decrease in the proximity layer thickness, maintaining an approximately uniform thickness of the total TI layer.
By contrast, the fluctuation of the NSLD at the FM layer is due to slight changes of both the FM layer density and thickness as a function of temperature. At this stage, the exact origin of this temperature dependence is not well understood, but


Fig. 4.6 VAE performance on Bi2 Se3 /EuS. (a) Decoded PNR profile of the Bi2 Se3 /EuS heterostructure measured at 5 K (left). The spin asymmetry calculated from the data (points) and decoded profiles (solid lines) of PNR measurements taken at four different temperatures (right). Error-bars represent ±1 standard deviation. (b) Nuclear (NSLD), magnetic (MSLD), and absorption (ASLD) scattering length density profiles obtained from the regressor predictions for the measurements in (a). Roughness values are uniformly set to 1 Å for plotting. (c) Projections of the latent encoding of the test dataset along the latent dimensions with the largest gradients for different sample parameters (density ρ, thickness t, magnetization m) and proximity classification, colored by their true values. Outlined points show the latent encoding of the four experimental measurements from (a). (d) Parameter entanglement inferred from gradients along each latent channel. Reproduced from Andrejevic et al. [1] with the permission of AIP Publishing

a possibility is that higher temperatures contribute to a more substantial blurring of the interface between the FM and capping layers than can be accounted for by the interface roughness considered in our model. This would tend to slightly decrease (increase) the density of the FM (capping) layers. Likewise, the FM and capping layer thicknesses appear to evolve in a complementary manner that may help explain the NSLD fluctuation. The magnetic scattering length density (MSLD) profile is maximal at the EuS layer and exhibits a slight shoulder near the TI interface at 5 K, corresponding to the proximity layer in the TI. The MSLD magnitude drops progressively as the temperature is increased and disappears entirely beyond 75 K. These observations can be further traced to the latent representations of the four experimental examples. In Fig. 4.6c, we visualize the latent space by projecting the encoded test dataset along the two dimensions with the largest local gradients for a given parameter value, e.g., substrate density ρsub. Specifically, the horizontal and vertical axes of each subplot correspond to the latent dimensions with the largest


and second largest gradient of the target parameter, respectively. The local gradients ∂vi/∂zj, where vi denotes the i-th parameter and zj denotes the j-th latent channel, are estimated using the 32 nearest neighbors of each scattered point. The scattered points, each corresponding to one profile of the test dataset, are colored according to the true value of the parameter viewed in each subplot. We find that the latent space is well organized according to these parameter values, including the thickness tprox and magnetization mprox of the proximity layer. The last subplot of Fig. 4.6c shows the true classification classprox of each encoded test profile according to whether or not it exhibits proximity magnetism (i.e., whether it was generated with a nonzero proximity layer thickness and magnetization or not). The latent representations of the four experimental PNR profiles are indicated by the outlined circles in the projection plots and are colored by the corresponding temperature. Notably, for temperature-independent quantities such as the TI density ρTI, the experimental points at different temperatures are generally insensitive to the gradient direction of the underlying parameter value, while for quantities like tprox and mprox, the points at different temperatures follow the gradient direction of the parameter values closely. Notable exceptions are ρFM and ρcap, which lie partially along the gradient direction, as explained by the slight NSLD fluctuation. This corroborates our observation that the trained VAE learns a sensible and interpretable latent representation of PNR profiles from which the physical parameter values may be estimated. The degree of parameter entanglement can be inferred from Fig. 4.6d, which shows the magnitudes of the average gradients of each sample parameter with respect to each latent channel. The labeled latent dimensions are conditioned to vary with the values of the corresponding parameter, while those labeled "free" are not linked to physical parameters, i.e., they are regularized by the conventional standard normal prior distribution.
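The nearest-neighbor gradient estimate used to orient these projections can be realized as a local least-squares fit; the sketch below is an illustrative reconstruction of the procedure described in the text, not the thesis code itself.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def latent_gradients(z, v, k=32):
    """Estimate local gradients dv/dz at each encoded point.

    z : (n, d) latent coordinates;  v : (n,) values of one sample parameter
    Returns an (n, d) array of local gradient estimates.
    """
    _, idx = NearestNeighbors(n_neighbors=k).fit(z).kneighbors(z)
    grads = np.empty_like(z)
    for i, nn_idx in enumerate(idx):
        dz = z[nn_idx] - z[i]  # neighbor offsets in latent space
        dv = v[nn_idx] - v[i]  # corresponding parameter differences
        # Least-squares fit of the local linear model dv ~ dz @ g
        g, *_ = np.linalg.lstsq(dz, dv, rcond=None)
        grads[i] = g
    return grads
```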

4.4 Resolving Interfacial AFM Coupling

Lastly, we apply our approach to elucidate proximity magnetism from a more challenging PNR profile of an intrinsic TI, (Bi,Sb)2Te3, interfaced with the AFM Cr2O3, shown schematically in Fig. 4.7a. At the interface between a TI and an AFM, magnetic atoms on the AFM surface have been shown to induce interfacial ferromagnetic order in the TI which can survive at much higher temperatures than that produced by doping or interfacing with a FM film, owing to the typically higher Néel temperatures [34, 37, 38, 42, 92, 93]. However, magnetic proximity coupling between an AFM and a TI is comparatively weaker and thereby more challenging to isolate experimentally. Figure 4.7b shows the experimental PNR profile of the intrinsic (Bi,Sb)2Te3/Cr2O3 system measured at the Magnetism Reflectometer at the Spallation Neutron Source (SNS) at Oak Ridge National Laboratory [94]. Data were collected at a temperature of 5 K with an in-plane external magnetic field of 1 T. The TI has a nominal composition of (Bi0.2Sb0.8)2Te3. Subtle evidence of spin splitting is observed in the associated plot of the spin asymmetry, (R++ − R−−)/(R++ + R−−), shown in Fig. 4.7c.


Fig. 4.7 Proximity magnetism in the TI-AFM system. (a) Schematic illustration of the (Bi,Sb)2 Te3 /Cr2 O3 system. (b) Experimental PNR profile at 5 K and best fit obtained with the GenX program. An expanded view of the splitting between the two channels at high Q is shown in the inset. Error-bars represent ±1 standard deviation. (c) The spin asymmetry calculated from the measured (points) and best fit (solid line) profiles. (d) Projections of the latent encoding of the test dataset along the latent dimensions with the largest gradients for the thickness and magnetization of the interfacial AFM and TI layers. Red points show the latent encoding of the PNR profile in (b). (e) Decoded profiles of the PNR measurement in (b). An expanded view of the profiles at high Q is shown in the inset. (f) SLD profile obtained using the regressor predictions. Roughnesses are uniformly set to 1 Å for plotting. (g) The predictions of interfacial AFM layer magnetization, and proximity layer thickness and magnetization obtained from 10 instances of the VAE trained with different initial weights. Gray dashed lines indicate the optimal threshold for proximity classification. Scattered points above (below) the determined threshold are colored yellow (blue). Reproduced from Andrejevic et al. [1] with the permission of AIP Publishing

The experimental data in Fig. 4.7b, c are superimposed with the corresponding best fit obtained using the GenX parameter refinement program [79]. However, a major challenge encountered during conventional fitting is that repeated refinement with different initial populations failed to reproducibly predict the proximity effect, as the weak spin splitting observed in the PNR profiles could be attributed either to a small net magnetization at the AFM surface or to proximity magnetism in the interfacial TI layer. To address this challenge, we train our VAE on a set of synthetic PNR profiles of the heterostructure shown in Fig. 4.7a. Similar to the case of Bi2Se3/EuS, we simulate the PNR profiles of 100,000 candidate systems parameterized by the density, thickness, and roughness of each layer, which included the TI, the AFM, the sapphire substrate, a Te capping layer, and a possible TeO2 surface film. In a subset of these examples, we model the presence of an interfacial FM layer on either the AFM or TI surface, or both, parameterized by thickness, roughness, and magnetization, and sharing the density value of the corresponding bulk layer. In this case, due to the weaker nature of the possible proximity effect, the resolution thresholds for the thickness and magnetization of the interfacial layer are chosen to be 2 Å and 8 emu cm−3 (or


0.2 μB per formula unit), respectively, allowing the trained model to resolve lower values of magnetization and thickness. Fifty percent of all generated examples are modeled with this interfacial FM layer on the TI surface and are designated as proximity-coupled. The PNR profiles are simulated over the experimentally measured Q-range from 0.1 to 1.72 nm−1 and normalized to a maximum value of 1. The instrument background is sampled uniformly at random between 10−10 and 10−4 on a logarithmic scale. The remaining data preparation steps are conducted as for the Bi2Se3/EuS example. The trained VAE yields a latent representation for the test dataset and experimental example as shown in Fig. 4.7d. The points in each subplot are colored according to the true values of the thickness or magnetization of the interfacial FM layer on either the AFM surface, denoted tiAFM and miAFM, or the TI surface, denoted tprox and mprox, respectively. We note that the experimental PNR profile is mapped unambiguously to a region with tiAFM = 0 and miAFM = 0, while tprox and mprox are predicted to be approximately 7 Å and 20 emu cm−3, respectively. The reconstructed PNR profile, shown in Fig. 4.7e, matches the experimental data closely and is found to be in the top reconstruction error quartile. The corresponding SLD profile obtained using the predicted sample parameters is shown in Fig. 4.7f, where the interfacial FM layer is evidenced by the peak in the MSLD profile at the TI surface. To test the robustness of these predictions, we train another ten identical models with different initial weights. Figure 4.7g shows the predictions of the ten models for the values of miAFM, tprox, and mprox. The gray dashed lines delimiting proximity-coupled examples are similarly obtained by optimizing the trade-off between the true and false positive rates of correctly classifying proximity magnetism in the validation set for each model. The average threshold values are found to be 4.5 Å and 14 emu cm−3, corresponding to true positive rates of 73% and 74% in the two classes, respectively. While only half of the models predict that proximity magnetism is present using these thresholds, all models unambiguously predict miAFM = 0. Thus, although the VAE approach pushes the boundary for resolving subtle magnetic signatures, it is possible that a very weak proximity effect is present at or slightly below the resolution threshold we could achieve with the current model. Nonetheless, the predictions could be used as a valuable screening tool before conducting finer measurements with either longer acquisition times or at higher Q to more clearly resolve the spin splitting, which could potentially benefit experimental planning and optimize the use of scientific user facilities.
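The threshold selection used here and in Sect. 4.3.3 balances the true and false positive rates on the validation set. One possible realization with scikit-learn is sketched below; the function and variable names are hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_curve

def classification_threshold(y_true, scores):
    """Threshold minimizing |tpr - (1 - fpr)|, i.e. balancing the true
    positive and true negative rates of proximity classification.

    y_true : binary proximity labels;  scores : predicted t_prox or m_prox
    """
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    best = np.argmin(np.abs(tpr - (1.0 - fpr)))
    return thresholds[best]
```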

4.5 Discussion

Machine learning methods are valuable means of recovering hidden patterns in materials data and elucidating the relationships between structural descriptors and measured quantities. However, it is often desirable to balance the flexibility of "black box" neural networks with a degree of interpretability in terms of physical parameters. We accomplish this in our VAE-based framework by conditioning the latent channels to emulate the behavior of the original sample parameters,


which enables direct, visual inspection of encoded profiles in terms of meaningful physical quantities. However, our assessment of parameter entanglement in Fig. 4.6d reveals underlying correlations between sample parameters; for example, proximity layer thickness and magnetization are deeply entwined, since finite proximity magnetization implies finite thickness of the interfacial layer, and vice versa. This suggests that the prior assumption that all latent dimensions are sampled from independent normal distributions does not perfectly describe a latent space that is conditioned to vary directly with certain correlated physical parameters. A possible improvement to the existing approach would be to describe the latent space in terms of several joint distributions over a few strongly correlated parameters each, which can be tuned to balance the number of necessary network parameters. We note a few additional considerations for future work. In particular, density fluctuations may be present in certain samples and can require fitting a number of distinct sub-layers of each material. Additionally, although variations in interface roughness are taken into account in the training data, roughness values are not predicted by the regressor directly; however, quantifying the interface roughnesses may also be relevant for certain applications. These and other specialized features can be readily integrated into the framework presented in this work.

4.6 Conclusion

A quantitative understanding of the structural and magnetic information encoded in PNR measurements is often critical to resolving important interfacial phenomena, but experimental factors and a lack of adequate fitting constraints can impede parameter retrieval without expert insight. In this chapter, we constructed a data-driven framework for PNR parameter retrieval by training a conditioned VAE to map reflectometry profiles with moderate experimental noise to a well-organized, low-dimensional space from which sample parameters can be readily obtained. We balance the flexibility and interpretability of our model through latent space engineering, enabling in-depth analysis of the resulting predictions. Compared to traditional fitting methods, our framework involves minimal user intervention overall, requiring no expert insight for parameter initialization or refinement, yet it is capable of resolving parameter values near the experimental resolution limit. It further enables evaluation of the entire parameter search space by readily identifying outliers of the chosen domain. A possible extension of the framework is suggested to account for intrinsic correlations between conditioning variables. We apply our method to recover the SLD profiles of two proximity-coupled systems at sub-nanometer resolution, and we envision its potential application in a broader context of elusive phases expressed through weak experimental signatures, such as the axion insulator and topological superconducting phases. We anticipate that the methodology developed in this work can facilitate the development of comprehensive and fully automated analysis routines for PNR parameter retrieval


of a broad range of materials systems, as well as inform the wide spectrum of spectroscopic analysis workflows requiring parameter refinement.

References

1. Andrejevic, N., et al. (2022). Elucidating proximity magnetism through polarized neutron reflectometry and machine learning. Applied Physics Reviews, 9, 011421.
2. Fitzsimmons, M. R., et al. (2004). Neutron scattering studies of nanomagnetism and artificially structured materials. Journal of Magnetism and Magnetic Materials, 271, 103–146.
3. Lauter-Pasyuk, V. (2007). Neutron grazing incidence techniques for nano-science. Collection SFN, 7, s221–s240.
4. Daillant, J., & Gibaud, A. (2008). X-ray and neutron reflectivity: Principles and applications. Springer.
5. Hoffmann, A., et al. (2005). Suppressed magnetization in La0.7Ca0.3MnO3/YBa2Cu3O7−δ superlattices. Physical Review B, 72, 140407.
6. Fitzsimmons, M., et al. (2007). Pinned magnetization in the antiferromagnet and ferromagnet of an exchange bias system. Physical Review B, 75, 214412.
7. Fitzsimmons, M. R., et al. (2011). Upper limit to magnetism in LaAlO3/SrTiO3 heterostructures. Physical Review Letters, 107, 217201.
8. Bennett, S., et al. (2016). Giant controllable magnetization changes induced by structural phase transitions in a metamagnetic artificial multiferroic. Scientific Reports, 6, 1–7.
9. Gilbert, D. A., et al. (2016). Structural and magnetic depth profiles of magneto-ionic heterostructures beyond the interface limit. Nature Communications, 7, 1–8.
10. Gilbert, D. A., et al. (2015). Realization of ground-state artificial skyrmion lattices at room temperature. Nature Communications, 6, 1–7.
11. Fan, Y., et al. (2020). Manipulation of coupling and magnon transport in magnetic metal-insulator hybrid structures. Physical Review Applied, 13, 061002.
12. Theis-Bröhl, K., et al. (2020). Self-assembly of magnetic nanoparticles in ferrofluids on different templates investigated by neutron reflectometry. Nanomaterials, 10, 1231.
13. Need, R. F., et al. (2020). Magnetic properties and electronic origin of the interface between dilute magnetic semiconductors with orthogonal magnetic anisotropy. Physical Review Materials, 4, 054410.
14. Keunecke, M., et al. (2020). High-TC interfacial ferromagnetism in SrMnO3/LaMnO3 superlattices. Advanced Functional Materials, 30, 1808270.
15. Liu, C., et al. (2021). Ferroelectric self-polarization controlled magnetic stratification and magnetic coupling in ultrathin La0.67Sr0.33MnO3 films. ACS Applied Materials & Interfaces, 13, 30137.
16. Bhatnagar-Schöffmann, T., et al. (2021). Differentiation between strain and charge mediated magnetoelectric coupling in La0.7Sr0.3MnO3/Pb(Mg1/3Nb2/3)0.7Ti0.3O3(001). New Journal of Physics, 23, 063043.
17. Wang, M., et al. (2021). Optically induced static magnetization in metal halide perovskite for spin-related optoelectronics. Advanced Science, 8, 2004488.
18. Majkrzak, C. (1991). Polarized neutron reflectometry. Physica B: Condensed Matter, 173, 75–88.
19. Blundell, S., et al. (1995). Spin-orientation dependence in neutron reflection from a single magnetic film. Physical Review B, 51, 9395.
20. Ankner, J., & Felcher, G. (1999). Polarized-neutron reflectometry. Journal of Magnetism and Magnetic Materials, 200, 741–754.
21. Majkrzak, C., O'Donovan, K., & Berk, N. (2006). Neutron Scattering from Magnetic Materials (pp. 397–471). Elsevier.


22. Toperverg, B. P. (2015). Polarized neutron reflectometry of magnetic nanostructures. The Physics of Metals and Metallography, 116, 1337–1375.
23. Nichols, J., et al. (2016). Emerging magnetism and anomalous Hall effect in iridate–manganite heterostructures. Nature Communications, 7, 1–6.
24. Zhan, X., et al. (2019). Probing the transfer of the exchange bias effect by polarized neutron reflectometry. Scientific Reports, 9, 1–9.
25. Inyang, O., et al. (2019). Threshold interface magnetization required to induce magnetic proximity effect. Physical Review B, 100, 174418.
26. Bhattacharyya, S., et al. (2021). Recent progress in proximity coupling of magnetism to topological insulators. Advanced Materials, 33, 2007795.
27. Vobornik, I., et al. (2011). Magnetic proximity effect as a pathway to spintronic applications of topological insulators. Nano Letters, 11, 4079–4082.
28. Eremeev, S., Men'shov, V., Tugushev, V., Echenique, P. M., & Chulkov, E. V. (2013). Magnetic proximity effect at the three-dimensional topological insulator/magnetic insulator interface. Physical Review B, 88, 144430.
29. Lang, M., et al. (2014). Proximity induced high-temperature magnetic order in topological insulator-ferrimagnetic insulator heterostructure. Nano Letters, 14, 3459–3465.
30. Lee, A. T., Han, M. J., & Park, K. (2014). Magnetic proximity effect and spin-orbital texture at the Bi2Se3/EuS interface. Physical Review B, 90, 155103.
31. Li, M., et al. (2015). Magnetic proximity effect and interlayer exchange coupling of ferromagnetic/topological insulator/ferromagnetic trilayer. Physical Review B, 91, 014427.
32. Liu, W., et al. (2015). Enhancing magnetic ordering in Cr-doped Bi2Se3 using high-TC ferrimagnetic insulator. Nano Letters, 15, 764–769.
33. Katmis, F., et al. (2016). A high-temperature ferromagnetic topological insulating phase by proximity coupling. Nature, 533, 513–516.
34. He, Q. L., et al. (2017). Tailoring exchange couplings in magnetic topological-insulator/antiferromagnet heterostructures. Nature Materials, 16, 94–100.
35. Che, X., et al. (2018). Proximity-induced magnetic order in a transferred topological insulator thin film on a magnetic insulator. ACS Nano, 12, 5042–5050.
36. Koren, G. (2018). Magnetic proximity effect of a topological insulator and a ferromagnet in thin-film bilayers of Bi0.5Sb1.5Te3 and SrRuO3. Physical Review B, 97, 054405.
37. He, Q. L., et al. (2018). Exchange-biasing topological charges by antiferromagnetism. Nature Communications, 9, 1–8.
38. He, Q. L., et al. (2018). Topological transitions induced by antiferromagnetism in a thin-film topological insulator. Physical Review Letters, 121, 096802.
39. Hou, Y., Kim, J., & Wu, R. (2019). Magnetizing topological surface states of Bi2Se3 with a CrI3 monolayer. Science Advances, 5, eaaw1874.
40. Akiyama, R., et al. (2019). Direct probe of ferromagnetic proximity effect at the interface in Fe/SnTe heterostructure by polarized neutron reflectometry. Preprint. arXiv:1910.10540.
41. Watanabe, R., et al. (2019). Quantum anomalous Hall effect driven by magnetic proximity coupling in all-telluride based heterostructure. Applied Physics Letters, 115, 102403.
42. Pan, L., et al. (2020). Observation of quantum anomalous Hall effect and exchange interaction in topological insulator/antiferromagnet heterostructure. Advanced Materials, 32, 2001460.
43. Li, M., et al. (2015). Proximity-driven enhanced magnetic order at ferromagnetic-insulator–magnetic-topological-insulator interface. Physical Review Letters, 115, 087201.
44. Tokura, Y., Yasuda, K., & Tsukazaki, A. (2019). Magnetic topological insulators. Nature Reviews Physics, 1, 126–143.
45. Yu, R., et al. (2010). Quantized anomalous Hall effect in magnetic topological insulators. Science, 329, 61–64.
46. Kou, X., et al. (2014). Scale-invariant quantum anomalous Hall effect in magnetic topological insulators beyond the two-dimensional limit. Physical Review Letters, 113, 137201.
47. Kou, X., Fan, Y., Lang, M., Upadhyaya, P., & Wang, K. L. (2015). Magnetic topological insulators and quantum anomalous Hall effect. Solid State Communications, 215, 34–53.


48. Mogi, M., et al. (2019). Large anomalous Hall effect in topological insulators with proximitized ferromagnetic insulators. Physical Review Letters, 123, 016804.
49. Mogi, M., et al. (2017). A magnetic heterostructure of topological insulators as a candidate for an axion insulator. Nature Materials, 16, 516–521.
50. Mogi, M., et al. (2017). Tailoring tricolor structure of magnetic topological insulator for robust axion insulator. Science Advances, 3, eaao1669.
51. Xiao, D., et al. (2018). Realization of the axion insulator state in quantum anomalous Hall sandwich heterostructures. Physical Review Letters, 120, 056801.
52. Liang, X., et al. (2017). The magnetic proximity effect and electrical field tunable valley degeneracy in MoS2/EuS van der Waals heterojunctions. Nanoscale, 9, 9502–9509.
53. Karpiak, B., et al. (2019). Magnetic proximity in a van der Waals heterostructure of magnetic insulator and graphene. 2D Materials, 7, 015026.
54. Tong, Q., Chen, M., & Yao, W. (2019). Magnetic proximity effect in a van der Waals moiré superlattice. Physical Review Applied, 12, 024031.
55. Behera, S. K., Bora, M., Chowdhury, S. S. P., & Deb, P. (2019). Proximity effects in graphene and ferromagnetic CrBr3 van der Waals heterostructures. Physical Chemistry Chemical Physics, 21, 25788–25796.
56. Island, J., et al. (2019). Spin–orbit-driven band inversion in bilayer graphene by the van der Waals proximity effect. Nature, 571, 85–89.
57. Zollner, K., Junior, P. E. F., & Fabian, J. (2019). Proximity exchange effects in MoSe2 and WSe2 heterostructures with CrI3: Twist angle, layer, and gate dependence. Physical Review B, 100, 085128.
58. Huang, B., et al. (2020). Emergent phenomena and proximity effects in two-dimensional magnets and heterostructures. Nature Materials, 19, 1276–1289.
59. Ciorciaro, L., Kroner, M., Watanabe, K., Taniguchi, T., & Imamoglu, A. (2020). Observation of magnetic proximity effect using resonant optical spectroscopy of an electrically tunable MoSe2/CrBr3 heterostructure. Physical Review Letters, 124, 197401.
60. Zhao, W., et al. (2020). Magnetic proximity and nonreciprocal current switching in a monolayer WTe2 helical edge. Nature Materials, 19, 503–507.
61. Tang, C., Zhang, Z., Lai, S., Tan, Q., & Gao, W.-b. (2020). Magnetic proximity effect in graphene/CrBr3 van der Waals heterostructures. Advanced Materials, 32, 1908498.
62. Zhong, D., et al. (2020). Layer-resolved magnetic proximity effect in van der Waals heterostructures. Nature Nanotechnology, 15, 187–191.
63. Dayen, J. F., Ray, S. J., Karis, O., Vera-Marun, I. J., & Kamalakar, M. V. (2020). Two-dimensional van der Waals spinterfaces and magnetic-interfaces. Applied Physics Reviews, 7, 011303.
64. Zhang, Y., et al. (2020). Controllable magnetic proximity effect and charge transfer in 2D semiconductor and double-layered perovskite manganese oxide van der Waals heterostructure. Advanced Materials, 32, 2003501.
65. Zhang, L., et al. (2020). Proximity-coupling-induced significant enhancement of coercive field and Curie temperature in 2D van der Waals heterostructures. Advanced Materials, 32, 2002032.
66. Zou, R., et al. (2020). Intrinsic quantum anomalous Hall phase induced by proximity in the van der Waals heterostructure germanene/Cr2Ge2Te6. Physical Review B, 101, 161108.
67. Bora, M., & Deb, P. (2021). Magnetic proximity effect in two-dimensional van der Waals heterostructure. Journal of Physics: Materials, 4, 034014.
68. Liu, N., et al. (2021). Antiferromagnetic proximity coupling between semiconductor quantum emitters in WSe2 and van der Waals ferromagnets. Nanoscale, 13, 832–841.
69. Andrejevic, N. (2021). Repository for machine learning-assisted analysis of polarized neutron reflectometry measurements. https://github.com/ninarina12/ML_PNR
70. Parratt, L. G. (1954). Surface studies of solids by total reflection of X-rays. Physical Review, 95, 359.
71. Abelès, F. (1948). Sur la propagation des ondes électromagnétiques dans les milieux stratifiés. Annales de Physique, 12, 504–520.


72. Sivia, D., Hamilton, W., Smith, G., Rieker, T., & Pynn, R. (1991). A novel experimental procedure for removing ambiguity from the interpretation of neutron and x-ray reflectivity measurements: "Speckle holography". Journal of Applied Physics, 70, 732–738.
73. De Haan, V. O., Van Well, A., Adenwalla, S., & Felcher, G. (1995). Retrieval of phase information in neutron reflectometry. Physical Review B, 52, 10831.
74. Pleshanov, N. (1999). Polarized neutron reflectometry with phase analysis. Physica B: Condensed Matter, 269, 79–94.
75. O'Donovan, K., Borchers, J., Majkrzak, C., Hellwig, O., & Fullerton, E. (2002). Pinpointing chiral structures with front-back polarized neutron reflectometry. Physical Review Letters, 88, 067201.
76. Durant, J. H., Wilkins, L., & Cooper, J. F. (2021). Optimising experimental design in neutron reflectometry. Preprint. arXiv:2108.05605.
77. Sivia, D., & Webster, J. (1998). The Bayesian approach to reflectivity data. Physica B: Condensed Matter, 248, 327–337.
78. McCluskey, A. R., Cooper, J. F., Arnold, T., & Snow, T. (2020). A general approach to maximise information density in neutron reflectometry analysis. Machine Learning: Science and Technology, 1, 035002.
79. Björck, M., & Andersson, G. (2007). GenX: An extensible X-ray reflectivity refinement program utilizing differential evolution. Journal of Applied Crystallography, 40, 1174–1178.
80. Maranville, B. B. (2017). Interactive, web-based calculator of neutron and X-ray reflectivity. Journal of Research of the National Institute of Standards and Technology, 122, 1.
81. Danauskas, S. M., Li, D., Meron, M., Lin, B., & Lee, K. Y. C. (2008). Stochastic fitting of specular X-ray reflectivity data using StochFit. Journal of Applied Crystallography, 41, 1187–1193.
82. Greco, A., et al. (2019). Fast fitting of reflectivity data of growing thin films using neural networks. Journal of Applied Crystallography, 52, 1342–1347.
83. Mironov, D., Durant, J. H., Mackenzie, R., & Cooper, J. F. (2021). Towards automated analysis for neutron reflectivity. Machine Learning: Science and Technology, 2, 035006.
84. Loaiza, J. M. C., & Raza, Z. (2021). Towards reflectivity profile inversion through artificial neural networks. Machine Learning: Science and Technology, 2, 025034.
85. Doucet, M., Archibald, R. K., & Heller, W. T. (2021). Machine learning for neutron reflectometry data analysis of two-layer thin films. Machine Learning: Science and Technology, 2, 035001.
86. Aoki, H., Liu, Y., & Yamashita, T. (2021). Deep learning approach for an interface structure analysis with a large statistical noise in neutron reflectometry. Scientific Reports, 11, 1–9.
87. Kingma, D. P., & Welling, M. (2013). Auto-encoding variational Bayes. Preprint. arXiv:1312.6114.
88. Higgins, I., et al. (2017). β-VAE: Learning basic visual concepts with a constrained variational framework. In Proceedings of the International Conference on Learning Representations (ICLR).
89. Zhao, Q., Adeli, E., Honnorat, N., Leng, T., & Pohl, K. M. (2019). Variational autoencoder for regression: Application to brain aging analysis. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 823–831).
90. Lee, C., Katmis, F., Jarillo-Herrero, P., Moodera, J. S., & Gedik, N. (2016). Direct measurement of proximity-induced magnetism at the interface between a topological insulator and a ferromagnet. Nature Communications, 7, 1–6.
91. Nogueira, F. S., & Eremin, I. (2012). Fluctuation-induced magnetization dynamics and criticality at the interface of a topological insulator with a magnetically ordered layer. Physical Review Letters, 109, 237203.
92. Luo, W., & Qi, X. L. (2013). Massive Dirac surface states in topological insulator/magnetic insulator heterostructures. Physical Review B, 87, 085431.
93. Wang, F., et al. (2019). Observation of interfacial antiferromagnetic coupling between magnetic topological insulator and antiferromagnetic insulator. Nano Letters, 19, 2945–2952.
94. Lauter, V., Ambaye, H., Goyette, R., Lee, W. T. H., & Parizzi, A. (2009). Highlights from the magnetism reflectometer at the SNS. Physica B: Condensed Matter, 404, 2543–2546.

Chapter 5

Machine Learning Spectral Indicators of Topology

Abstract Topological materials discovery has emerged as an important frontier in condensed matter physics. While theoretical classification frameworks have been used to identify thousands of candidate topological materials, experimental determination of materials’ topology often poses significant technical challenges. X-ray absorption spectroscopy (XAS) is a widely-used materials characterization technique sensitive to atoms’ local symmetry and chemical environment, which are intimately linked to band topology by the theory of topological quantum chemistry. Moreover, as a local structural probe, XAS is known to have high quantitative agreement between experiment and calculation, suggesting that insights from computational spectra can effectively inform experiments. In this chapter, we show that XAS can potentially uncover materials’ topology when augmented by machine learning. Using the computed X-ray absorption near-edge structure (XANES) spectra of more than 10,000 inorganic materials, we train a neural network classifier that predicts topological class directly from XANES signatures with F1 scores of 82% and 87% for topological and trivial classes, respectively, and achieves F1 scores above 90% for materials containing certain elements. Given the simplicity of the XAS setup and its compatibility with multimodal sample environments, the proposed machine learning-empowered XAS topological indicator has the potential to discover broader categories of topological materials, such as non-cleavable compounds and amorphous materials, and may further inform a variety of field-driven phenomena in situ, such as magnetic field-driven topological phase transitions.

5.1 Introduction

Topological materials are characterized by a topologically nontrivial electronic band structure from which they derive their exceptional transport properties [1–6]. The prospect of developing these exotic phases into useful applications has garnered widespread efforts to identify and catalogue candidate topological materials, evidenced by the emergence of numerous theoretical frameworks based on electron-filling constraints [7, 8], symmetry-based indicators [9–16], connectivity


of electronic bands [8, 17–23], and spin–orbit spillage [24–26]. These frameworks have facilitated the prediction of over 8000 topologically non-trivial phases [27–33], a vast unexplored territory for experiments. This provides strong motivation to develop complementary experimental techniques for high-throughput screening of candidate materials. Current state-of-the-art techniques such as angle-resolved photoemission spectroscopy (ARPES), scanning tunneling microscopy (STM), and quantum transport measurements are commonly used to detect topological signatures, but a few limitations remain. Methods like ARPES directly probe band topology but are surface-sensitive and thereby place strict requirements on sample preparation and the sample environment, limiting the range of experimentally accessible materials [34, 35]. Transport measurements, on the other hand, can be performed on more versatile samples, but the topological character often needs to be inferred through substantial analysis. Neither approach yet fully meets the demands of a high-throughput classification program. Machine learning methods are increasingly being adapted to materials research to accelerate materials discovery [36–43] and facilitate inverse design through high-throughput property prediction [44–46]. Several recent studies have proposed data-driven frameworks for predicting band topology from structural and compositional attributes [47, 48] and quantum theoretical or simulated data [49–52]. At the same time, machine learning methods are being adopted to automate and improve data analysis for a broad range of experimental techniques [53–59]. Importantly, machine learning presents a potential opportunity not only to accelerate data analysis, but to derive useful information from complex data in the absence of reliable theoretical models, or to extract new insights beyond traditional models. In this work, we develop a classifier of electronic band topology using materials' X-ray absorption spectra. X-ray absorption spectroscopy (XAS) is widely used to characterize the chemical state and local atomic structure of atomic species in a material. This technique is suitable for the study of highly diverse samples and environments, including noncrystalline materials and extreme conditions of temperature and pressure. As a bulk probe, XAS also places few constraints on surface quality and sample preparation. The X-ray absorption near-edge structure (XANES), defined within approximately 50 eV of an XAS absorption edge, provides a species-specific fingerprint of coordination chemistry, orbital hybridization, and the density of available electronic states. However, despite the rich electronic structural information contained in XANES spectra, the lack of a simple analytic description of XANES has compelled largely qualitative treatment of this energy regime, with individual spectral features attributed to properties of the electronic structure through empirical evidence and spectral matching [60]. As a result, machine learning methods have been introduced to automate the estimation of materials parameters such as coordination environments [54, 60–64], oxidation states [61, 64], and crystal-field splitting [65] from XANES and other core-level spectroscopies, and even to enable direct prediction of XANES spectra from structural and atomic descriptors [66–68]. Here, we propose that machine learning models can be used to extract other hidden electronic properties, namely the electronic


In this chapter, we develop a machine learning-enabled indicator of band topology based on K-edge XANES spectral inputs, which correspond to electronic transitions from the 1s core-shell states to unoccupied states above the Fermi energy [69]. First, we summarize the data assembly procedure and conduct an exploratory analysis of topological indication for the K-edge XANES spectra of different elements based on principal component analysis (PCA) and k-means clustering. Then, we develop a convolutional neural network (CNN) classifier of topology that synthesizes insights from the XANES signatures of all elements in a given compound. The classifier achieves F1 scores of 82% and 87% for topological and trivial classes, respectively. Materials containing certain elements, including Be, Al, Sc, and Zr, are predicted with F1 scores above 90% in both classes. Our work suggests the potential of machine learning to uncover topological character embedded in complex spectral features, especially when a mechanistic understanding is challenging to acquire.

5.2 Topological Materials Discovery

Topological materials discovery has emerged as an important frontier in condensed matter physics, leading to numerous theoretical and experimental undertakings to establish materials' topological nature. Historically, three types of theoretical and computational approaches have been used to determine the topological characteristics of a band structure [4]: band structure calculations and computation of the Z2 (or higher-order) topological invariants; adiabatic continuity arguments connecting an unknown band structure to a known topological or trivial one through a series of adiabatic transformations of the Hamiltonian [70]; and direct computation of the surface or edge states. One prototypical route to obtaining non-trivial band topology is through band inversion, illustrated in Fig. 5.1a. The band structures of conventional insulating or semiconducting materials often consist of an s-like conduction band and a p- or d-like valence band. However, certain heavy elements can contribute s electrons lower in energy than the p and d electrons of the lighter elements, resulting in a semimetallic material whose conduction and valence bands have inverted orbital characters and cross at a nodal line [4, 73]. In the presence of spin-orbit coupling, the band crossing can either become fully gapped to form a topological insulator, or remain only at certain points in the Brillouin zone to form a topological semimetal. Figure 5.1b shows the general relationship between atomic number and the strength of spin-orbit splitting, which increases as the fourth power of the effective nuclear charge but decreases as the third power of the principal quantum number, leading to significantly larger interactions for atoms in a higher period (row) of the periodic table. One way to distinguish the trivial and topological phases is to consider the atomic limit, which is a band structure with exponentially localized Wannier functions that coincide with the atomic positions.
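For reference, the spin-orbit scaling noted above can be written compactly; this is the standard hydrogen-like estimate rather than a formula given explicitly in the text:

$$\Delta E_{\mathrm{SO}} \propto \frac{Z_{\mathrm{eff}}^{4}}{n^{3}},$$

where $Z_{\mathrm{eff}}$ is the effective nuclear charge and $n$ is the principal quantum number.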


Fig. 5.1 Band inversion route to topology. (a) Transformation from a normal (trivial) semiconductor to a topological insulator through band inversion, commonly driven by spin-orbit coupling. A normal semiconductor is adiabatically connected to the atomic limit, defined as a band structure with exponentially localized Wannier functions coinciding with the atomic positions. At infinite atomic separation, these Wannier functions coincide with localized atomic orbitals. A topological insulator is adiabatically disconnected from the atomic limit as its Hamiltonian cannot be smoothly and continuously deformed to this limit without closing the band gap. Adapted from Witting et al. [71] under a Creative Commons Attribution License (CC BY). (b) Strength of spin-orbit splitting as a function of atomic number, based on the data of Herman et al. [72]. (c) Ternary phase diagram of the relative orbital characters of the conduction bands in Ge-containing compounds, colored by topological character

If the atomic separation of such a band structure were taken to infinity, these Wannier functions would typically coincide with localized atomic orbitals. A normal semiconductor is adiabatically connected to the atomic limit, meaning the ground state of its Hamiltonian can be continuously deformed to this limit without closing the bulk band gap. Due to the band inversion, the Hamiltonian of a topological band structure can no longer be adiabatically connected to the atomic limit. Thus, band inversion represents an important mechanism to explain non-trivial topology in known materials or to identify other candidate topological systems. Figure 5.1c shows a ternary phase diagram of the relative orbital characters of the conduction bands in Ge-containing compounds, classified by electronic band topology using the database of topological materials [32], indicative of a qualitative trend that the predicted topological compounds possess conduction bands with increased p- and d-type over s-type orbital character compared to their trivial counterparts.


In addition to crystalline symmetry and species type, orbital character may thus serve as a relevant topological indicator that can be ascertained from a material's XANES spectrum (see Sect. 2.1.4). XAS has been used, for example, to determine the relative contributions of the s- and p-states of Pb to the conduction band in order to study the band inversion in PbxSn1−xTe topological crystalline insulators [74].

5.3 Data Preparation and Pre-processing

The materials data used for this study were curated from the Inorganic Crystal Structure Database (ICSD) [75] and labelled according to their classification in the database of topological materials [32], which is based on the formalism of topological quantum chemistry [17]. XAS data were obtained from the published database of computed K-edge XANES spectra [76] and additional examples distributed on the Materials Project [77–80]. The materials data were refined based on the availability of both high-quality topological classification and spectral data, resulting in 14,593 total materials considered: 5836 topological (∼40%) and 8757 trivial (∼60%). Moreover, the materials in this dataset are structurally and chemically diverse, covering 200 of the 230 space groups and 70 different elements, with primitive unit cells ranging from 1 to 62 atoms and containing up to 7 unique chemical species. Data were subdivided into training, validation, and test sets according to a 70/15/15% split. While samples were randomly distributed among the datasets, an assignment process was developed to ensure balanced representation of each absorbing element and topological class within each dataset. Specifically, the fractions of topological insulators (TI), topological semimetals (TSM), and topologically trivial materials among compounds containing a given element were balanced as shown in Fig. 5.2. For each example, the computed K-edge XANES spectra of each absorbing element were interpolated and re-sampled at 200 evenly-spaced energy values spanning an energy range of 56 eV surrounding the absorption edge. The spectra were standardized separately for each absorbing element by centering the mean of the spectral intensities over each energy range and scaling by the average intensity standard deviation.
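As a concrete illustration of this pre-processing, a minimal sketch in Python is given below. The function names are hypothetical, and the exact centering and scaling conventions are one plausible reading of the description above, not an implementation from the thesis.

```python
import numpy as np

N_POINTS = 200  # number of resampled energy values per spectrum
WINDOW = 56.0   # energy range (eV) spanned around the absorption edge

def resample_spectrum(energy, intensity, edge_energy):
    """Interpolate one computed K-edge XANES spectrum onto a fixed grid
    of N_POINTS evenly spaced energies spanning WINDOW eV around the
    absorption edge. `energy` must be sorted in ascending order."""
    grid = np.linspace(edge_energy - WINDOW / 2,
                       edge_energy + WINDOW / 2, N_POINTS)
    return np.interp(grid, energy, intensity)

def standardize_by_element(spectra_by_element):
    """Standardize spectra separately for each absorbing element:
    subtract the element's mean spectrum and scale by the average of
    the per-energy intensity standard deviations (an assumed reading
    of the convention described in the text)."""
    standardized = {}
    for element, spectra in spectra_by_element.items():
        X = np.asarray(spectra)                # (n_samples, N_POINTS)
        X = X - X.mean(axis=0, keepdims=True)  # center per energy value
        standardized[element] = X / X.std(axis=0).mean()
    return standardized
```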

5.4 Exploratory Analysis

Prior to training the neural network classifier, we conducted an exploratory analysis of the assembled XANES spectra to estimate the separability by topological class exhibited by different elements.


Fig. 5.2 Element and class representation in training, validation, and test sets. TI and TSM denote topological insulators and topological semimetals, respectively. (Three stacked bar charts, one each for the train, valid, and test sets, plot the fraction of trivial, TI, and TSM compounds for each absorbing element from Ag to Zr.)

For all examples containing a given element, we performed a principal component analysis (PCA) on the high-dimensional spectra and subsequently carried out unsupervised k-means clustering on a subset of principal components of the training set. The number of retained principal components was chosen to capture at least 80% of the explained variance of spectra for a given element. During k-means clustering, a small fraction of data with the greatest distance from the data mean (up to 2%) was removed from consideration to discourage forming a cluster composed primarily of outliers. A second PCA was performed on the reduced representation to visualize a projection of the decision boundary produced by k-means clustering in two dimensions. Results of the clustering analysis for a selection of elements are shown in Fig. 5.3. The decision boundary between the two clusters identified by k-means clustering lies at the intersection of the blue (trivial) and orange (topological) shaded regions. Since unsupervised clustering is blind to the true topological class of the examples, cluster assignment was performed by solving an optimal matching problem which finds the pairing between clusters and topological classes that minimizes the number of misclassified examples, corrected for class imbalance. The examples from all three datasets (training, validation, and testing) are plotted as scattered points in the low-dimensional space and colored according to their known topological class. Additional visualizations are shown in Fig. 5.4.
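A minimal sketch of this per-element analysis using scikit-learn is shown below. The hyperparameters mirror the values quoted above (80% explained variance, up to 2% outlier removal), while the function name and other details are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def cluster_element_spectra(X, var_target=0.80, outlier_frac=0.02):
    """Two-cluster analysis of the standardized XANES spectra of a
    single absorbing element, following the procedure in the text."""
    # Keep the smallest number of principal components explaining at
    # least `var_target` of the variance.
    pca = PCA().fit(X)
    n_keep = int(np.searchsorted(
        np.cumsum(pca.explained_variance_ratio_), var_target)) + 1
    Z = pca.transform(X)[:, :n_keep]

    # Drop up to `outlier_frac` of samples farthest from the data mean
    # (here measured in the reduced space) so that k-means does not
    # devote a cluster to outliers.
    dist = np.linalg.norm(Z - Z.mean(axis=0), axis=1)
    keep = dist <= np.quantile(dist, 1.0 - outlier_frac)

    labels = KMeans(n_clusters=2, n_init=10,
                    random_state=0).fit_predict(Z[keep])

    # A second PCA, purely for a 2D view of the decision boundary.
    viz2d = PCA(n_components=2).fit_transform(Z[keep])
    return labels, viz2d, keep
```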


Fig. 5.3 Exploratory analysis using principal components and k-means clustering. Decision boundary visualizations of classifications by unsupervised k-means clustering for selected elements. As detailed in the main text, the k-means clustering is performed on the subset of principal components accounting for at least 80% of the explained variance of spectra for a given element. A second PCA is performed for visualization purposes to display a projection of the cluster decision boundary in two dimensions. Scattered points are colored according to their true class: topological (orange) or trivial (blue). The background is shaded according to the cluster class. The principal components exhibited three typical patterns characteristic of different columns in the periodic table: (a) no apparent clustering by class, (b, d) primary segregation of topological examples, and (c) relatively balanced segregation of topological and trivial examples. These patterns are further expressed in the confusion matrices and class-averaged XANES spectra of representative examples shown in each category

We correlated these observations with the decision boundary visualizations and noted three distinct patterns in the results of our unsupervised clustering. For some elements, nearly all topological examples were segregated within a single cluster (Fig. 5.3b and d). This led to a strong score for topological examples but a weaker score for trivial ones for elements in the 4th and 14th columns of the periodic table. Other elements like Zn, Ga, and In exhibited more balanced classification accuracies between the two classes (Fig. 5.3c). On the other hand, there were a number of unsuccessful cases, such as alkali metals and halogens, for which clustering of the data did not appear coincident with topological class (Fig. 5.3a). A possible explanation for this is that the elements in these columns rarely contribute to frontier orbitals (valence and conduction bands) in materials, and are thereby poor indicators of topology. Given that the feature transformations performed in our exploratory analysis were element-specific, the potential to discriminate data between the two classes is encouraging. This also suggests a possible advantage of synthesizing information from all constituent elements in a given compound in order to improve prediction accuracy.
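The cluster-to-class assignment mentioned above can be cast as a small assignment problem. The sketch below uses SciPy's Hungarian solver; the inverse-class-frequency weighting is an assumed realization of the "corrected for class imbalance" criterion, and the function name is hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_clusters_to_classes(cluster_labels, true_labels):
    """Assign each k-means cluster to a topological class so that the
    number of misclassified examples, weighted by inverse class
    frequency to correct for class imbalance, is minimized.
    Assumes integer labels (e.g., 0 = trivial, 1 = topological)."""
    clusters = np.unique(cluster_labels)
    classes = np.unique(true_labels)
    counts = np.bincount(true_labels)
    cost = np.zeros((len(clusters), len(classes)))
    for i, c in enumerate(clusters):
        members = true_labels[cluster_labels == c]
        for j, t in enumerate(classes):
            # balanced error if cluster c were assigned to class t
            cost[i, j] = np.sum((members != t) / counts[members])
    rows, cols = linear_sum_assignment(cost)
    return {clusters[i]: classes[j] for i, j in zip(rows, cols)}
```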

5.5 Results

To construct the CNN classifier inputs, spectral data were assigned to channels of the corresponding absorbing atom, as shown in Fig. 5.5a.

Fig. 5.4 Decision boundary visualizations of classifications by unsupervised k-means clustering for all elements


Fig. 5.5 Data structure and model architecture. (a) A schematic of the complete XANES spectrum for a representative sample in the dataset, showing the distinct separation of signatures from different absorbing elements. As each absorber is only active in a narrow energy range, the spectral input to the machine learning model is organized into 118 element channels corresponding to each absorber. (b) Schematic of the convolutional neural network architecture operating on XANES spectra to obtain the predicted (binary) topological class

The core-electron binding energy increases substantially with increasing atomic number, ranging from 284 eV for the C K-edge to 115,606 eV for the U K-edge [81]; thus, representing the XANES spectra of all elements on a continuous energy scale would be either poorly resolved or exceedingly high-dimensional (Fig. 5.5a). The use of element-specific channels retains both spectrum resolution and, implicitly, element-type information. In addition to enabling the synthesis of information from different absorbers, a neural network comprises more complex, non-linear operations than PCA and thereby has the capability to learn more expressive representations of the input data. The network architecture is illustrated in Fig. 5.5b, consisting of a series of convolutional and fully-connected layers with a prediction of the binary topological class as output. Due to moderate class imbalance, samples were weighted to add greater penalty to the misclassification of topological examples.

Figure 5.6 summarizes the performance of the trained CNN classifier. The receiver operating characteristic (ROC) curve (Fig. 5.6a), which indicates the tradeoff between true and false positive rates as a function of the classification threshold, was used to determine an optimal threshold, $t_{\mathrm{cutoff}} = 0.57$; that is, samples with a predicted value greater than $t_{\mathrm{cutoff}}$ were classified as topological, and otherwise as trivial. We use three different metrics in assessing the quality of prediction: recall, precision, and F1 score, defined as

$$\text{recall:}\qquad r = \frac{t_p}{t_p + f_n}, \tag{5.1a}$$

$$\text{precision:}\qquad p = \frac{t_p}{t_p + f_p}, \tag{5.1b}$$

$$\text{F1 score:}\qquad F_1 = 2\,\frac{p \cdot r}{p + r}, \tag{5.1c}$$

where $t_p$ and $t_n$ denote the number of true positive and true negative predictions, and $f_p$ and $f_n$ denote the number of false positive and false negative predictions of a given class, respectively.
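As a concrete, hypothetical realization of the architecture sketched in Fig. 5.5b, the following PyTorch snippet follows the input shape (118 element channels by 200 energy points) and class-weighted loss described above; the number and widths of layers are illustrative assumptions, since they are not specified here.

```python
import torch
import torch.nn as nn

class XANESTopologyCNN(nn.Module):
    """1D CNN over element-channel XANES inputs of shape
    (batch, 118 element channels, 200 energy points); outputs the
    logit of the topological class. Layer counts and widths are
    illustrative placeholders."""
    def __init__(self, n_channels=118, n_energies=200):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(64, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * (n_energies // 4), 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, x):
        return self.classifier(self.features(x)).squeeze(-1)

model = XANESTopologyCNN()
# Weight the positive (topological) class more heavily to penalize its
# misclassification, compensating for the ~40/60 class imbalance.
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor(8757 / 5836))
```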


Fig. 5.6 CNN classifier performance. (a) The receiver operating characteristic (ROC) curve showing the tradeoff between true and false positives for the best performing model with changing classification threshold $t_{\mathrm{cutoff}}$. The area under the curve (AUC) for each dataset is noted in the legend along with the selected threshold. Comparative plots of the overall recall, precision, and F1 scores for topological (b) and trivial (c) examples obtained using different methods discussed in the main text. Element-specific F1 scores for topological (d) and trivial (e) examples. Each element's entry lists its atomic number, atomic symbol, and F1 score.

The CNN classifier achieved F1 scores of 82% and 87% for topological and trivial classes, respectively. We compare these results to the performance of a traditional support vector machine (SVM) operating on one-hot encoded element types only (denoted SVM-type), and on the flattened array of CNN inputs (denoted SVM), as well as to the average performance of the PCA and k-means clustering approach across all elements. Both the CNN and SVM classifiers based on XANES spectral inputs outperform the baseline model relying on element types alone, suggesting that XANES spectral features contribute insights significant to topological indication. The CNN further offers a slight improvement over the SVM, particularly in the precision of topological classification. These results are summarized in Fig. 5.6b and c.
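For completeness, a common way to pick the classification threshold from a validation ROC curve such as Fig. 5.6a is to maximize Youden's J statistic; the sketch below uses this criterion as an assumption, since the specific rule behind $t_{\mathrm{cutoff}} = 0.57$ is not stated.

```python
import numpy as np
from sklearn.metrics import roc_curve

def select_threshold(y_true, y_score):
    """Choose a classification threshold from the validation ROC curve
    by maximizing Youden's J = TPR - FPR (one common criterion)."""
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    return thresholds[int(np.argmax(tpr - fpr))]

# Samples with predicted value above the selected threshold (0.57 in
# this work) are classified as topological, the rest as trivial.
```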

Table 5.1 Predictions on mislabelled Weyl semimetals (predicted class: 1 = topological, 0 = trivial)

Material      Space group   Predicted class
TaAs*         109           1
NbAs*         109           1
NbP           109           1
WTe2*         31            1
Ag2Se*        17            0
LaAlGe        109           1
Ba7Al4Ge9     42            1
Cu2SnTe3      44            0
NaCu5S3       182           0
BiTeI*        143           0
Al4Mo         8             1
KOs2O6*       216           1
Zn2In2S5*     186           0

* Indicates sample present in training set

Next, we average the metric scores obtained by the CNN classifier for each absorbing element, shown in Fig. 5.6d and e for topological and trivial examples, respectively. We can see that the CNN classifier enables higher and more balanced predictive accuracy than the PCA and k-means clustering approach for a majority of elements, including significant improvement for alkali metals. Certain elements are better indicators of one class than the other; for instance, the chalcogens and halogens appear to serve as somewhat poor indicators of topological samples but are well-predicted in trivial compounds. This may be explained by the comparatively less frequent contribution of these elements to the frontier orbitals and thereby to the determination of topological class. Certain transition-metal elements, such as Mn, Fe, Co, and Ru, also exhibit imbalanced accuracy in the prediction of trivial and topological classes. This may require more in-depth or systematic clarification of the relevant spectroscopic features (pre-edge, edge, or post-edge) in connection with the corresponding electronic transitions (e.g. 1s → 3d) to better understand performance barriers for transition metals. Finally, we comment on the comparatively lower precision obtained for topological than for trivial examples (80% and 89%, respectively). While the higher false positive rate of topological materials may suggest additional model improvements are needed, it may also indicate missed topological candidates. In fact, since the topological quantum chemistry formalism considers only the characters of electronic bands at high-symmetry points, it incorrectly classifies certain Weyl semimetals with topological singularities at arbitrary k-points [32]. In particular, we identified 13 experimentally-verified [5] or theoretically-predicted [82] Weyl semimetals which were labelled as trivial in our dataset, 8 of which were correctly predicted as topological by our CNN classifier (Table 5.1) and thereby marked as false positives. As a number of the surveyed samples were also present in the training set, this can account for at least some loss of precision reported for classification of topological examples.


5.6 Conclusion

We explored the predictive power of XAS as a potential discriminant of band topology by training and evaluating a convolutional neural network classifier on more than 10,000 examples of computed XANES K-edge spectra. A number of important extensions are envisioned for this work, such as its application to experimental XANES data, incorporation of a multi-fidelity approach to favor experimentally validated examples [83], expansion of the energy range to the extended X-ray absorption fine structure (EXAFS) regime, and inquiry into the detailed contribution from spectral features of specific elements. Our results demonstrate a promising pathway to develop robust experimental protocols for high-throughput screening of candidate topological materials aided by machine learning methods. Additionally, the flexibility of the XAS sample environment can further enable the study of materials whose topological phases emerge when driven by electric, magnetic, or strain fields, and even presents the opportunity to study topology with strong disorder or in amorphous materials [84, 85]. Thus, machine learning-empowered XAS may be poised to become a simple but powerful experimental tool for topological classification.

References

1. Hasan, M. Z., & Kane, C. L. (2010). Colloquium: Topological insulators. Reviews of Modern Physics, 82, 3045–3067.
2. Qi, X. L., & Zhang, S. C. (2011). Topological insulators and superconductors. Reviews of Modern Physics, 83, 1057–1110.
3. Yan, B., & Zhang, S. C. (2012). Topological materials. Reports on Progress in Physics, 75, 096501.
4. Bansil, A., Lin, H., & Das, T. (2016). Colloquium: Topological band theory. Reviews of Modern Physics, 88, 021004.
5. Yan, B., & Felser, C. (2017). Topological materials: Weyl semimetals. Annual Review of Condensed Matter Physics, 8, 337–354.
6. Armitage, N. P., Mele, E. J., & Vishwanath, A. (2018). Weyl and Dirac semimetals in three-dimensional solids. Reviews of Modern Physics, 90, 015001.
7. Chen, R., Po, H. C., Neaton, J. B., & Vishwanath, A. (2018). Topological materials discovery using electron filling constraints. Nature Physics, 14, 55–61.
8. Watanabe, H., Po, H. C., & Vishwanath, A. (2018). Structure and topology of band structures in the 1651 magnetic space groups. Science Advances, 4, eaat8685.
9. Slager, R. J., Mesaros, A., Juričić, V., & Zaanen, J. (2013). The space group classification of topological band-insulators. Nature Physics, 9, 98–102.
10. Jadaun, P., Xiao, D., Niu, Q., & Banerjee, S. K. (2013). Topological classification of crystalline insulators with space group symmetry. Physical Review B, 88, 085110.
11. Chiu, C. K., Teo, J. C., Schnyder, A. P., & Ryu, S. (2016). Classification of topological quantum matter with symmetries. Reviews of Modern Physics, 88, 035005.
12. Po, H. C., Vishwanath, A., & Watanabe, H. (2017). Symmetry-based indicators of band topology in the 230 space groups. Nature Communications, 8, 1–9.
13. Song, Z., Zhang, T., Fang, Z., & Fang, C. (2018). Quantitative mappings between symmetry and topology in solids. Nature Communications, 9, 1–7.
14. Song, Z., Huang, S. J., Qi, Y., Fang, C., & Hermele, M. (2019). Topological states from topological crystals. Science Advances, 5, eaax2007.
15. Po, H. C. (2020). Symmetry indicators of band topology. Journal of Physics: Condensed Matter, 32, 263001.
16. Peng, B., Jiang, Y., Fang, Z., Weng, H., & Fang, C. (2021). Topological classification and diagnosis in magnetically ordered electronic materials. Preprint. arXiv:2102.12645.
17. Bradlyn, B., et al. (2017). Topological quantum chemistry. Nature, 547, 298–305.
18. Kruthoff, J., De Boer, J., Van Wezel, J., Kane, C. L., & Slager, R.-J. (2017). Topological classification of crystalline insulators through band structure combinatorics. Physical Review X, 7, 041069.
19. Cano, J., et al. (2018). Building blocks of topological quantum chemistry: Elementary band representations. Physical Review B, 97, 035139.
20. Elcoro, L., Song, Z., & Bernevig, B. A. (2020). Application of induction procedure and Smith decomposition in calculation and topological classification of electronic band structures in the 230 space groups. Physical Review B, 102, 035110.
21. Wieder, B. J., et al. (2021). Topological materials discovery from nonmagnetic crystal symmetry. Preprint. arXiv:2106.00709.
22. Bouhon, A., Lange, G. F., & Slager, R. J. (2021). Topological correspondence between magnetic space group representations and subdimensions. Physical Review B, 103, 245127.
23. Călugăru, D., et al. (2021). General construction and topological classification of all magnetic and non-magnetic flat bands. Preprint. arXiv:2106.05272.
24. Choudhary, K., Garrity, K. F., & Tavazza, F. (2019). High-throughput discovery of topologically non-trivial materials using spin-orbit spillage. Scientific Reports, 9, 1–8.
25. Choudhary, K., Garrity, K. F., Jiang, J., Pachter, R., & Tavazza, F. (2020). Computational search for magnetic and non-magnetic 2D topological materials using unified spin-orbit spillage screening. npj Computational Materials, 6, 1–8.
26. Choudhary, K., Garrity, K. F., Ghimire, N. J., Anand, N., & Tavazza, F. (2021). High-throughput search for magnetic topological materials using spin-orbit spillage, machine learning, and experiments. Physical Review B, 103, 155131.
27. Tang, F., Po, H. C., Vishwanath, A., & Wan, X. (2019). Comprehensive search for topological materials using symmetry indicators. Nature, 566, 486–489.
28. Tang, F., Po, H. C., Vishwanath, A., & Wan, X. (2019). Topological materials discovery by large-order symmetry indicators. Science Advances, 5, eaau8725.
29. Tang, F., Po, H. C., Vishwanath, A., & Wan, X. (2019). Efficient topological materials discovery using symmetry indicators. Nature Physics, 15, 470–476.
30. Zhang, T., et al. (2019). Catalogue of topological electronic materials. Nature, 566, 475–479.
31. Wang, D., et al. (2019). Two-dimensional topological materials discovery by symmetry-indicator method. Physical Review B, 100, 195108.
32. Vergniory, M., et al. (2019). A complete catalogue of high-quality topological materials. Nature, 566, 480–485.
33. Xu, Y., et al. (2020). High-throughput calculations of magnetic topological materials. Nature, 586, 702–707.
34. Suga, S., & Sekiyama, A. (2013). Photoelectron spectroscopy: Bulk and surface electronic structures. Springer.
35. Lv, B., Qian, T., & Ding, H. (2019). Angle-resolved photoemission spectroscopy and its application to topological materials. Nature Reviews Physics, 1, 609–626.
36. Raccuglia, P., et al. (2016). Machine-learning-assisted materials discovery using failed experiments. Nature, 533, 73–76.
37. Liu, Y., Zhao, T., Ju, W., & Shi, S. (2017). Materials discovery and design using machine learning. Journal of Materiomics, 3, 159–177.
38. Gómez-Bombarelli, R., et al. (2018). Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science, 4, 268–276.
39. Zhang, H., et al. (2019). Machine learning for novel thermal-materials discovery: early successes, opportunities, and challenges. Preprint. arXiv:1901.05801.

40. Mikulskis, P., Alexander, M. R., & Winkler, D. A. (2019). Toward interpretable machine learning models for materials discovery. Advanced Intelligent Systems, 1, 1900045.
41. Juan, Y., Dai, Y., Yang, Y., & Zhang, J. (2021). Accelerating materials discovery using machine learning. Journal of Materials Science & Technology, 79, 178.
42. Kusne, A. G., et al. (2020). On-the-fly closed-loop materials discovery via Bayesian active learning. Nature Communications, 11, 1–11.
43. Mannodi-Kanakkithodi, A., & Chan, M. K. (2021). Computational data-driven materials discovery. Trends in Chemistry, 3, 79.
44. Pilania, G., Wang, C., Jiang, X., Rajasekaran, S., & Ramprasad, R. (2013). Accelerating materials property predictions using machine learning. Scientific Reports, 3, 1–6.
45. Ward, L., Agrawal, A., Choudhary, A., & Wolverton, C. (2016). A general-purpose machine learning framework for predicting properties of inorganic materials. npj Computational Materials, 2, 1–7.
46. Carrete, J., Li, W., Mingo, N., Wang, S., & Curtarolo, S. (2014). Finding unprecedentedly low-thermal-conductivity half-Heusler semiconductors via high-throughput materials modeling. Physical Review X, 4, 011019.
47. Claussen, N., Bernevig, B. A., & Regnault, N. (2019). Detection of topological materials with machine learning. Preprint. arXiv:1910.10161.
48. Rodriguez-Nieva, J. F., & Scheurer, M. S. (2019). Identifying topological order through unsupervised machine learning. Nature Physics, 15, 790–795.
49. Zhang, Y., & Kim, E. A. (2017). Quantum loop topography for machine learning. Physical Review Letters, 118, 216401.
50. Lian, W., et al. (2019). Machine learning topological phases with a solid-state quantum simulator. Physical Review Letters, 122, 210503.
51. Scheurer, M. S., & Slager, R. J. (2020). Unsupervised machine learning and band topology. Physical Review Letters, 124, 226401.
52. Zhang, P., Shen, H., & Zhai, H. (2018). Machine learning topological invariants with neural networks. Physical Review Letters, 120, 066401.
53. Carleo, G., et al. (2019). Machine learning and the physical sciences. Reviews of Modern Physics, 91, 045002.
54. Carbone, M. R., Yoo, S., Topsakal, M., & Lu, D. (2019). Classification of local chemical environments from X-ray absorption spectra using supervised machine learning. Physical Review Materials, 3, 033604.
55. Cui, A., et al. (2019). Decoding phases of matter by machine-learning Raman spectroscopy. Physical Review Applied, 12, 054049.
56. Han, B., et al. (2019). Deep learning enabled fast optical characterization of two-dimensional materials. Preprint. arXiv:1906.11220.
57. Samarakoon, A. M., et al. (2019). Machine learning assisted insight to spin ice Dy2Ti2O7. Preprint. arXiv:1906.11275.
58. Zhang, Y., et al. (2019). Machine learning in electronic-quantum-matter imaging experiments. Nature, 570, 484–490.
59. Rem, B. S., et al. (2019). Identifying quantum phase transitions using artificial neural networks on experimental data. Nature Physics, 15, 917–920.
60. Gaur, A., & Shrivastava, B. (2015). Speciation using X-ray absorption fine structure (XAFS). Review Journal of Chemistry, 5, 361–398.
61. Torrisi, S. B., et al. (2020). Random forest machine learning models for interpretable X-ray absorption near-edge structure spectrum-property relationships. npj Computational Materials, 6, 1–11.
62. Zheng, C., Chen, C., Chen, Y., & Ong, S. P. (2020). Random forest models for accurate identification of coordination environments from X-ray absorption near-edge structure. Patterns, 1, 100013.
63. Kiyohara, S., Miyata, T., Tsuda, K., & Mizoguchi, T. (2018). Data-driven approach for the prediction and interpretation of core-electron loss spectroscopy. Scientific Reports, 8, 1–12.

64. Guda, A., et al. (2021). Understanding X-ray absorption spectra by means of descriptors and machine learning algorithms. npj Computational Materials, 7, 1–13.
65. Suzuki, Y., Hino, H., Kotsugi, M., & Ono, K. (2019). Automated estimation of materials parameter from X-ray absorption and electron energy-loss spectra with similarity measures. npj Computational Materials, 5, 1–7.
66. Carbone, M. R., Topsakal, M., Lu, D., & Yoo, S. (2020). Machine-learning X-ray absorption spectra to quantitative accuracy. Physical Review Letters, 124, 156401.
67. Rankine, C. D., Madkhali, M. M., & Penfold, T. J. (2020). A deep neural network for the rapid prediction of X-ray absorption spectra. The Journal of Physical Chemistry A, 124, 4263–4270.
68. Lueder, J. (2021). A machine learning approach to predict L-edge X-ray absorption spectra of light transition metal ion compounds. Preprint. arXiv:2107.13149.
69. Andrejevic, N., Andrejevic, J., Rycroft, C. H., & Li, M. (2020). Machine learning spectral indicators of topology. Preprint. arXiv:2003.00994.
70. Lin, H., et al. (2013). Adiabatic transformation as a search tool for new topological insulators: Distorted ternary Li2AgSb-class semiconductors and related compounds. Physical Review B, 87, 121202.
71. Witting, I. T., Ricci, F., Chasapis, T. C., Hautier, G., & Snyder, G. J. (2020). The thermoelectric properties of n-type bismuth telluride: bismuth selenide alloys. Research, 2020.
72. Herman, F., Kuglin, C. D., Cuff, K. F., & Kortum, R. L. (1963). Relativistic corrections to the band structure of tetrahedrally bonded semiconductors. Physical Review Letters, 11, 541.
73. Narang, P., Garcia, C. A., & Felser, C. (2021). The topology of electronic band structures. Nature Materials, 20, 293–300.
74. Mitrofanov, K., et al. (2014). Study of band inversion in the PbxSn1−xTe class of topological crystalline insulators using X-ray absorption spectroscopy. Journal of Physics: Condensed Matter, 26, 475502.
75. Bergerhoff, G., & Brown, I. (1987). In F. H. Allen et al. (Eds.), Crystallographic databases. Chester: International Union of Crystallography.
76. Mathew, K., et al. (2018). High-throughput computational X-ray absorption spectroscopy. Scientific Data, 5, 180151.
77. Jain, A., et al. (2013). The Materials Project: A materials genome approach to accelerating materials innovation. APL Materials, 1, 011002.
78. Zheng, C., et al. (2018). Automated generation and ensemble-learned matching of X-ray absorption spectra. npj Computational Materials, 4, 12.
79. Ong, S. P., et al. (2013). Python materials genomics (pymatgen): A robust, open-source python library for materials analysis. Computational Materials Science, 68, 314–319.
80. Ong, S. P., et al. (2015). The materials application programming interface (API): A simple, flexible and efficient API for materials data based on REpresentational State Transfer (REST) principles. Computational Materials Science, 97, 209–215.
81. Penner-Hahn, J. E., et al. (2003). X-ray absorption spectroscopy. Comprehensive Coordination Chemistry II, 2, 159–186.
82. Xu, Q., et al. (2020). Comprehensive scan for nonmagnetic Weyl semimetals with nonlinear optical response. npj Computational Materials, 6, 1–7.
83. Meng, X., & Karniadakis, G. E. (2020). A composite neural network that learns from multi-fidelity data: Application to function approximation and inverse PDE problems. Journal of Computational Physics, 401, 109020.
84. Prodan, E. (2011). Disordered topological insulators: a non-commutative geometry perspective. Journal of Physics A: Mathematical and Theoretical, 44, 113001.
85. Agarwala, A., & Shenoy, V. B. (2017). Topological insulators in amorphous systems. Physical Review Letters, 118, 236402.

Chapter 6

Conclusion and Outlook

Abstract In this chapter, we summarize the primary contributions of this thesis work and offer a short perspective on the possible extensions of each study, concluding with a discussion of outstanding challenges and emerging approaches in the field.

6.1 Thesis Summary

In this thesis, we proposed a set of frameworks for improved experimental design and analysis of photon and neutron scattering and spectroscopies, motivated by the need for both interpretable and flexible scientific models. In Chap. 3, we developed a physics-informed model based on Euclidean neural networks which directly predicts the phonon DoS of crystalline solids from the atomic masses and positions of their constituent atoms. The model is readily applicable to rapid, high-throughput prediction of the phonon DoS of candidate materials to support experimental planning. Moreover, we found that the trained model captured essential physics, such as partial DoS, without explicit training, enabled by the choice of a symmetry-aware network architecture equivariant to Euclidean transformations. We further offered a perspective on representation learning of materials' spectral signatures, exemplified through a case study on Raman spectroscopic data. In Chap. 4, we implemented a semi-supervised variational autoencoder to facilitate robust parameter retrieval from polarized neutron reflectometry measurements, targeting the presence of proximity magnetism in topological insulators interfaced with magnetic layers. Additionally, we observed that the trained model learns an interpretable latent representation of the relevant parameters and serves as a surrogate model for PNR profile simulation. Finally, in Chap. 5 we leveraged both supervised and unsupervised learning methods to develop a convolutional neural network classifier of materials' electronic band topology using K-edge XANES spectra. The success of our XAS-based topological indicator suggests the possibility to broaden the application scope of neutron and X-ray spectroscopies through machine learning, thereby accelerating scientific discovery.


6.2 Perspectives and Outlook

At the same time, a number of outstanding challenges and directions remain for future work. First, extension of the Euclidean neural network predictor to defected and even disordered compounds would have profound impact on the computation and design of materials systems for diverse applications. Thus, enhanced sensitivity to symmetry breaking in crystalline solids represents one important objective for neural network operations. Additionally, while the phonon DoS is fundamental to materials properties such as heat capacity, thermal conductivity, and phonon-mediated superconductivity, access to the full phonon dispersion is often needed to understand the origins of certain phenomena, such as momentum-dependent contributions to electron-phonon and phonon-phonon coupling. Since reciprocal space is intimately linked to the real-space symmetries already embedded in the Euclidean neural network operations, the extension to predicting a full phonon dispersion is certainly within reach of future studies.

Second, while the fitting approach proposed in Chap. 4 was tailored to PNR measurements, much of the underlying infrastructure, including data representation and network operations, is transferable to other spectroscopic techniques. The inverse scattering problem, or the inversion of data to a structural solution or other physical model, represents one of the most fundamental goals of experimental analysis. This problem generally demands sufficient expertise in a particular scientific domain and, as a result, often limits the extraction of scientific insights. Machine learning models offer a potentially universal or widely transferable solution to address this and other closely related problems, such as the treatment of experimental backgrounds and artifacts, noise, and loss of phase information. Thus, it is advantageous to assess the generality of the proposed semi-supervised framework for parameter retrieval to other spectroscopic techniques.

Third, the development of a topological indicator informed by a widely accessible experimental technique such as XAS is promising, but performance improvements are necessary to implement it in practice. To this end, examination of lower-energy absorption edges, such as the L or M edges, with selectivity to electronic transitions between different orbitals, as well as consideration of the extended X-ray absorption fine structure (EXAFS), may be worthwhile. Furthermore, the extension of machine learning models trained on computational data to experiments is often challenging, and thus an account of the possible experimental artifacts and their influence on the classifier predictions remains an important next step.

Lastly, this thesis considered four different spectroscopies with independent analysis programs where data representations and network architectures could be specialized to a single data type, such as a graph-based or spectral input. However, materials characterization often requires insight from multiple experimental techniques with sensitivity to different types of excitations, which together provide a more complete picture of materials properties and dynamics. Data acquired using different neutron and X-ray scattering techniques are often complementary but are typically synthesized manually by researchers. In this regard, machine learning may


provide an important avenue toward intelligent analysis across multiple modalities. By consolidating information from multiple sources, multimodal machine learning models have the potential to make more robust predictions and discover more sophisticated relationships among data. Their flexibility may further enable a richer and more informative synthesis of diverse data than can be attained with analytical models. At the same time, this approach introduces new prerequisites compared to learning from single modalities, such as the representation and fusion of heterogeneous data [1]. On the other hand, knowledge gained by learning from one modality can potentially assist a model trained on a different modality with more limited resources. In the context of neutron and photon experimental data analysis, different experimental techniques access widely different energy, time, length, and momentum scales, produce diverse data structures, and carry varying levels of uncertainty, all of which are important considerations in a multimodal approach. While the development of analysis workflows and supporting data infrastructure to aggregate measurements from diverse neutron and photon-based modalities is undoubtedly a significant undertaking, it is precisely the data-intensive setting that calls for intelligent analysis enabled by machine learning.

Reference

1. Baltrušaitis, T., Ahuja, C., & Morency, L. P. (2018). Multimodal machine learning: A survey and taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(2), 423–443.