Studies in Computational Intelligence 1013
Arash Shaban-Nejad Martin Michalowski Simone Bianco Editors
AI for Disease Surveillance and Pandemic Intelligence Intelligent Disease Detection in Action
Series Editor Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
The series “Studies in Computational Intelligence” (SCI) publishes new developments and advances in the various areas of computational intelligence—quickly and with a high quality. The intent is to cover the theory, applications, and design methods of computational intelligence, as embedded in the fields of engineering, computer science, physics and life sciences, as well as the methodologies behind them. The series contains monographs, lecture notes and edited volumes in computational intelligence spanning the areas of neural networks, connectionist systems, genetic algorithms, evolutionary computation, artificial intelligence, cellular automata, self-organizing systems, soft computing, fuzzy systems, and hybrid intelligent systems. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution, which enable both wide and rapid dissemination of research output. Indexed by SCOPUS, DBLP, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science.
More information about this series at https://link.springer.com/bookseries/7092
Editors Arash Shaban-Nejad Oak-Ridge National Lab (ORNL) Department of Pediatrics Center for Biomedical Informatics College of Medicine The University of Tennessee Health Science Center (UTHSC) Memphis, TN, USA
Martin Michalowski School of Nursing University of Minnesota Minneapolis, MN, USA
Simone Bianco IBM Almaden Research Center San José, CA, USA
ISSN 1860-949X ISSN 1860-9503 (electronic) Studies in Computational Intelligence ISBN 978-3-030-93079-0 ISBN 978-3-030-93080-6 (eBook) https://doi.org/10.1007/978-3-030-93080-6 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Surveillance activities play a crucial role in helping healthcare professionals, researchers, and public health organizations monitor, track, and assess emerging patterns and trends of disease. Artificial intelligence and digital technology have shown great potential to improve disease surveillance through the systematic collection and integration of big medical and non-medical data and the provision of advanced health data analytics in mission-critical medical and public health applications. This book highlights the latest achievements in the application of artificial intelligence to health care and medicine. The edited volume contains selected papers presented at the 2021 International Workshop on Health Intelligence, co-located with the thirty-fifth annual conference of the Association for the Advancement of Artificial Intelligence (AAAI). The papers present an overview of the issues, challenges, and potential in the field, along with new research results. The book makes the emerging topics of digital health and AI for disease surveillance accessible to a broad readership with a wide range of practical applications. It provides information for scientists, researchers, students, industry professionals, national and international public health agencies, and NGOs interested in the theory and practice of AI and machine learning in medicine, digital, and precision health, with emphasis on both population health and individual risk factors for disease prevention, diagnosis, and intervention.
Memphis, USA
Minneapolis, USA
San José, USA
Arash Shaban-Nejad Martin Michalowski Simone Bianco
Contents
Digital Technologies for Clinical, Public and Global Health Surveillance . . . 1
Arash Shaban-Nejad, Martin Michalowski, and Simone Bianco

Imputing Fine-Grain Patterns of Mental Health with Statistical Modelling of Online Data . . . 11
Esther Jack, Estie Kruger, and Marc Tennant

Lexical and Acoustic Correlates of Clinical Speech Disturbance in Schizophrenia . . . 27
Rony Krell, Wenqing Tang, Katrin Hänsel, Michael Sobolev, Sunghye Cho, Sarah Berretta, and Sunny X. Tang

A Prognostic Tool to Identify Youth at Risk of at Least Weekly Cannabis Use . . . 37
Marie-Pierre Sylvestre, Simon de Montigny, Laurence Boulanger, Danick Goulet, Isabelle Doré, Jennifer O’Loughlin, Slim Haddad, Richard E. Bélanger, and Scott Leatherdale

Neuro-Symbolic Neurodegenerative Disease Modeling as Probabilistic Programmed Deep Kernels . . . 49
Alexander Lavin

Self-Disclosure in Opioid Use Recovery Forums . . . 65
Anietie Andy

Identifying Prepubertal Children with Risk for Suicide Using Deep Neural Network Trained on Multimodal Brain Imaging . . . 75
Gun Ahn, Bogyeom Kim, Ka-kyeong Kim, Hyeonjin Kim, Eunji Lee, Woo-Young Ahn, Jae-Won Kim, and Jiook Cha
Improving Adverse Drug Event Extraction with SpanBERT on Different Text Typologies . . . 87
Beatrice Portelli, Daniele Passabì, Edoardo Lenzi, Giuseppe Serra, Enrico Santus, and Emmanuele Chersoni

Machine Learning Identification of Self-reported COVID-19 Symptoms from Tweets in Canada . . . 101
Jean-Philippe Gilbert, Jingcheng Niu, Simon de Montigny, Victoria Ng, and Erin Rees

RRISK: Analyzing COVID-19 Risk in Food Establishments . . . 113
Saahil Sundaresan, Shafin Khan, Faraz Rahman, and Chris Huang

AWS CORD-19 Search: A Neural Search Engine for COVID-19 Literature . . . 131
Parminder Bhatia, Lan Liu, Kristjan Arumae, Nima Pourdamghani, Suyog Deshpande, Ben Snively, Mona Mona, Colby Wise, George Price, Shyam Ramaswamy, Xiaofei Ma, Ramesh Nallapati, Zhiheng Huang, Bing Xiang, and Taha Kass-Hout

Inferring COVID-19 Biological Pathways from Clinical Phenotypes Via Topological Analysis . . . 147
Negin Karisani, Daniel E. Platt, Saugata Basu, and Laxmi Parida

The EpiBench Platform to Propel AI/ML-Based Epidemic Forecasting: A Prototype Demonstration Reaching Human Expert-Level Performance . . . 165
Ajitesh Srivastava, Tianjian Xu, and Viktor K. Prasanna

Interpretable Classification of Human Exercise Videos Through Pose Estimation and Multivariate Time Series Analysis . . . 181
Ashish Singh, Binh Thanh Le, Thach Le Nguyen, Darragh Whelan, Martin O’Reilly, Brian Caulfield, and Georgiana Ifrim

Interpreting Deep Neural Networks for Medical Imaging Using Concept Graphs . . . 201
Avinash Kori, Parth Natekar, Balaji Srinivasan, and Ganapathy Krishnamurthi

Do Deep Neural Networks Forget Facial Action Units?—Exploring the Effects of Transfer Learning in Health Related Facial Expression Recognition . . . 217
Pooja Prajod, Dominik Schiller, Tobias Huber, and Elisabeth André

Utilizing Predictive Analysis to Aid Emergency Medical Services . . . 235
Pratyush Kumar Sahoo, Nidhi Malhotra, Shirley Sanjay Kokane, Biplav Srivastava, Harsh Narayan Tiwari, and Sushant Sawant
Measuring Physiological Markers of Stress During Conversational Agent Interactions . . . 247
Shreya Datar, Libby Ferland, Esther Foo, Michael Kotlyar, Brad Holschuh, Maria Gini, Martin Michalowski, and Serguei Pakhomov

EvSys: A Relational Dynamic System for Sparse Irregular Clinical Events . . . 267
Duc Nguyen, Phuoc Nguyen, and Truyen Tran

Predicting Patient Outcomes with Graph Representation Learning . . . 281
Catherine Tong, Emma Rocheteau, Petar Veličković, Nicholas Lane, and Pietro Liò

Patient-Specific Seizure Prediction Using Single Seizure Electroencephalography Recording . . . 295
Zaid Bin Tariq, Arun Iyengar, Lara Marcuse, Hui Su, and Bulent Yener

Evaluation Metrics for Deep Learning Imputation Models . . . 309
Omar Boursalie, Reza Samavi, and Thomas E. Doyle

Logistic Regression is also a Black Box. Machine Learning Can Help . . . 323
Liam Butler, Fatma Gunturkun, Ibrahim Karabayir, and Oguz Akbilgic
Contributors
Gun Ahn Department of Material Sciences, College of Engineering, Seoul National University, Seoul, South Korea
Woo-Young Ahn Department of Psychology, College of Social Sciences, Seoul National University, Seoul, South Korea; Department of Brain and Cognitive Sciences, College of Natural Sciences, Seoul National University, Seoul, South Korea; AI Institute, Seoul National University, Seoul, South Korea
Oguz Akbilgic Loyola University Chicago, Maywood, IL, USA; Wake Forest School of Medicine, Winston-Salem, NC, USA
Elisabeth André Human Centered Multimedia, Augsburg University, Augsburg, Germany
Anietie Andy Penn Medicine, University of Pennsylvania, Philadelphia, PA, USA
Kristjan Arumae Amazon Web Services AI, Seattle, USA
Saugata Basu Purdue University, West Lafayette, USA
Sarah Berretta Zucker Hillside Hospital/Feinstein Institutes for Medical Research, Glen Oaks, NY, USA
Parminder Bhatia Amazon Web Services AI, Seattle, USA
Simone Bianco IBM Corporation, Almaden Research Center, San Jose, CA, USA
Laurence Boulanger Centre de recherche du CHUM, Montréal, Canada
Omar Boursalie School of Biomedical Engineering, McMaster University, Hamilton, Canada; Vector Institute, Toronto, Canada
Liam Butler Loyola University Chicago, Maywood, IL, USA; Stony Brook University, Stony Brook, NY, USA
Richard E. Bélanger Université Laval, Québec, Canada
Brian Caulfield Insight Centre for Data Analytics, University College Dublin, Dublin, Ireland
Jiook Cha Department of Psychology, College of Social Sciences, Seoul National University, Seoul, South Korea; Department of Brain and Cognitive Sciences, College of Natural Sciences, Seoul National University, Seoul, South Korea; AI Institute, Seoul National University, Seoul, South Korea
Emmanuele Chersoni The Hong Kong Polytechnic University, Hung Hom, Hong Kong
Sunghye Cho Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, USA
Shreya Datar University of Minnesota, Minneapolis, MN, USA
Simon de Montigny School of Public Health, University of Montreal, Montreal, Canada
Suyog Deshpande Amazon Web Services AI, Seattle, USA
Isabelle Doré Université de Montréal, Montréal, Canada
Thomas E. Doyle School of Biomedical Engineering, McMaster University, Hamilton, Canada; Department of Electrical and Computer Engineering, McMaster University, Hamilton, Canada; Vector Institute, Toronto, Canada
Libby Ferland University of Minnesota, Minneapolis, MN, USA
Esther Foo University of Minnesota, Minneapolis, MN, USA
Jean-Philippe Gilbert Laval University, Quebec, Canada
Maria Gini University of Minnesota, Minneapolis, MN, USA
Danick Goulet Université de Montréal, Montréal, Canada
Fatma Gunturkun University of Tennessee Health Sciences Center, Memphis, TN, USA
Slim Haddad Université Laval, Québec, Canada
Katrin Hänsel Cornell University, New York, NY, USA; Zucker Hillside Hospital/Feinstein Institutes for Medical Research, Northwell Health, Glen Oaks, NY, USA
Brad Holschuh University of Minnesota, Minneapolis, MN, USA
Chris Huang Island Trees HS, Levittown, NY, USA
Zhiheng Huang Amazon Web Services AI, Seattle, USA
Tobias Huber Human Centered Multimedia, Augsburg University, Augsburg, Germany
Georgiana Ifrim Insight Centre for Data Analytics, University College Dublin, Dublin, Ireland
Arun Iyengar IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA
Esther Jack University of Western Australia, Crawley, WA, Australia
Ibrahim Karabayir Loyola University Chicago, Maywood, IL, USA; Wake Forest School of Medicine, Winston-Salem, NC, USA
Negin Karisani Purdue University, West Lafayette, USA
Taha Kass-Hout Amazon Web Services AI, Seattle, USA
Shafin Khan Troy Athens HS, Troy, MI, USA
Bogyeom Kim Department of Psychology, College of Social Sciences, Seoul National University, Seoul, South Korea
Hyeonjin Kim Department of Psychology, College of Social Sciences, Seoul National University, Seoul, South Korea
Jae-Won Kim Division of Child and Adolescent Psychiatry, Seoul National University Hospital, Seoul, South Korea
Ka-kyeong Kim Department of Brain and Cognitive Sciences, College of Natural Sciences, Seoul National University, Seoul, South Korea
Shirley Sanjay Kokane Institute of Chemical Technology, Mumbai, India
Avinash Kori Indian Institute of Technology Madras, Chennai, India
Michael Kotlyar University of Minnesota, Minneapolis, MN, USA
Rony Krell Cornell University, New York, NY, USA
Ganapathy Krishnamurthi Indian Institute of Technology Madras, Chennai, India
Estie Kruger University of Western Australia, Crawley, WA, Australia
Nicholas Lane University of Cambridge, Samsung AI Center, Cambridge, UK
Alexander Lavin Latent Sciences, Cambridge, MA, USA
Scott Leatherdale University of Waterloo, Waterloo, Canada
Binh Thanh Le Insight Centre for Data Analytics, University College Dublin, Dublin, Ireland
Eunji Lee Department of Psychology, College of Social Sciences, Seoul National University, Seoul, South Korea
Edoardo Lenzi University of Udine, Udine, UD, Italy
Lan Liu Amazon Web Services AI, Seattle, USA
Pietro Liò University of Cambridge, Cambridge, UK
Xiaofei Ma Amazon Web Services AI, Seattle, USA
Nidhi Malhotra Indian Institute of Technology, BHU, Varanasi, India
Lara Marcuse Mount Sinai Hospital, New York, NY, USA
Martin Michalowski School of Nursing, University of Minnesota, Minneapolis, MN, USA
Mona Mona Amazon Web Services AI, Seattle, USA
Ramesh Nallapati Amazon Web Services AI, Seattle, USA
Parth Natekar Indian Institute of Technology Madras, Chennai, India
Victoria Ng Public Health Agency of Canada, Ottawa, Canada
Duc Nguyen Applied AI Institute, Waurn Ponds, VIC, Australia
Phuoc Nguyen Applied AI Institute, Waurn Ponds, VIC, Australia
Thach Le Nguyen Insight Centre for Data Analytics, University College Dublin, Dublin, Ireland
Jingcheng Niu University of Toronto, Toronto, Canada
Jennifer O’Loughlin Université de Montréal, Montréal, Canada
Martin O’Reilly Output Sports Limited, NovaUCD, Dublin, Ireland
Serguei Pakhomov University of Minnesota, Minneapolis, MN, USA
Laxmi Parida IBM T.J. Watson Research Center, New York, USA
Daniele Passabì University of Udine, Udine, UD, Italy
Daniel E. Platt IBM T.J. Watson Research Center, New York, USA
Beatrice Portelli University of Udine, Udine, UD, Italy
Nima Pourdamghani Amazon Web Services AI, Seattle, USA
Pooja Prajod Human Centered Multimedia, Augsburg University, Augsburg, Germany
Viktor K. Prasanna University of Southern California, Los Angeles, USA
George Price Amazon Web Services AI, Seattle, USA
Faraz Rahman DCDS, Beverly Hills, MI, USA
Shyam Ramaswamy Amazon Web Services AI, Seattle, USA
Erin Rees Public Health Agency of Canada, Ottawa, Canada
Emma Rocheteau University of Cambridge, Cambridge, UK
Pratyush Kumar Sahoo Indian Institute of Technology, BHU, Varanasi, India
Reza Samavi Department of Electrical, Computer, and Biomedical Engineering, Ryerson University, Toronto, Canada; Vector Institute, Toronto, Canada
Enrico Santus CSAIL MIT, Cambridge, MA, USA
Sushant Sawant Indian Institute of Technology, Bombay, India
Dominik Schiller Human Centered Multimedia, Augsburg University, Augsburg, Germany
Giuseppe Serra University of Udine, Udine, UD, Italy
Arash Shaban-Nejad Center for Biomedical Informatics, Department of Pediatrics, College of Medicine, The University of Tennessee Health Science Center—Oak-Ridge National Lab (UTHSC-ORNL), Memphis, TN, USA
Ashish Singh Insight Centre for Data Analytics, University College Dublin, Dublin, Ireland
Ben Snively Amazon Web Services AI, Seattle, USA
Michael Sobolev Cornell University, New York, NY, USA; Zucker Hillside Hospital/Feinstein Institutes for Medical Research, Glen Oaks, NY, USA
Balaji Srinivasan Indian Institute of Technology Madras, Chennai, India
Ajitesh Srivastava University of Southern California, Los Angeles, USA
Biplav Srivastava AI Institute, University of South Carolina, Columbia, USA
Hui Su Rensselaer Polytechnic Institute, Troy, NY, USA; IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA
Saahil Sundaresan Mountain View HS, Mountain View, CA, USA
Marie-Pierre Sylvestre Université de Montréal, Montréal, Canada
Sunny X. Tang Zucker Hillside Hospital/Feinstein Institutes for Medical Research, Glen Oaks, NY, USA
Wenqing Tang Cornell University, New York, NY, USA
Zaid Bin Tariq Rensselaer Polytechnic Institute, Troy, NY, USA
Marc Tennant University of Western Australia, Crawley, WA, Australia
Harsh Narayan Tiwari Indian Institute of Technology, BHU, Varanasi, India
Catherine Tong University of Oxford, Oxford, UK
Truyen Tran Applied AI Institute, Waurn Ponds, VIC, Australia
Petar Veličković University of Cambridge, Cambridge, UK
Darragh Whelan Output Sports Limited, NovaUCD, Dublin, Ireland
Colby Wise Amazon Web Services AI, Seattle, USA
Bing Xiang Amazon Web Services AI, Seattle, USA
Tianjian Xu University of Southern California, Los Angeles, USA
Bulent Yener Rensselaer Polytechnic Institute, Troy, NY, USA
Abbreviations
AA  Alcoholics Anonymous
ABCD  Adolescent Brain Cognitive Development
ABS  Australian Bureau of Statistics
ACM  Amazon Comprehend Medical
ACS  AWS CORD-19 Search
AD  Alzheimer’s Dementia
AED  Anti-Epileptic Drug
ASAP  As Safe As Possible
AU  Action Unit
AUC-ROC  Area Under the ROC Curve
AWS  Amazon Web Services
BAS  Behavioral Activation System
BIS  Behavioral Inhibition System
CA  Conversational Agent
CAM  Concept Attention Map
CDC  Centers for Disease Control and Prevention
CDT  Cohen’s Distance Test
CI  Confidence Interval
CKG  COVID-19 Knowledge Graph
CM  Comprehend Medical
CN  Cognitively Normal
CNN  Convolutional Neural Network
COVID-19  Coronavirus Disease 2019
CS  Criticality Score
DAG  Directed Acyclic Graph
DKL  Deep Kernel Learning
DR  Document Ranking
DSM-IV  Diagnostic and Statistical Manual of Mental Disorders IV
DSS  Decision Support System
ED  Emergency Department
EDA  Electrodermal Activity
EEG  Electroencephalogram
EHR  Electronic Health Record
EPI  Epidemic Prediction Initiative
ESI  Emergency Severity Index
FAQM  FAQ Matching
FCN  Fully Convolutional Network
GAIN  Generative Adversarial Imputation Nets
GAP  Global Average Pooling
GNN  Graph Neural Network
GP  Gaussian Process
HAR  Human Activity Recognition
HR  Heart Rate
HRV  Heart Rate Variability
ICC  Intra-Class Correlation
ICD-10  International Classification of Disease, Tenth Revision
ICD-9  International Classification of Disease, Ninth Revision
ICU  Intensive Care Unit
JM  Joint Modeling
KG  Knowledge Graph
KQ  Keyword-Based Query
KSADS  Kiddie Schedule for Affective Disorders and Schizophrenia
LDA  Latent Dirichlet Allocation
LIME  Local Interpretable Model-Agnostic Explanations
LIWC  Linguistic Inquiry and Word Count
LMICs  Low- and Middle-Income Countries
LOS  Length of Stay
LRP  Layer-Wise Relevance Propagation
LSTM  Long Short-Term Memory
MAE  Mean Absolute Error
MAR  Missing at Random
MCAR  Missing Completely at Random
MCI  Mild Cognitive Impairment
MI  Medical Imaging
MIDAs  Multiple Imputation with Denoising Autoencoders
ML  Machine Learning
MNAR  Missing-not-at-Random
MP  Military Press
MPNN  Message Passing Neural Network
MSE  Mean Squared Error
MTSC  Multivariate Time Series Classification
NHAMCS-ED  National Hospital Ambulatory Medical Care Survey ED
NLP  Natural Language Processing
NLQ  Natural Language Question
NSCA  National Strength and Conditioning Association
ODE  Ordinary Differential Equation
OUD  Opioid Use Disorder
PHQ-9  9-Item Patient Health Questionnaire
PMM  Predictive Mean Matching
PP-DKL  Probabilistic Programmed Deep Kernel Learning
PPL  Probabilistic Programming Language
PR  Passage Ranking
QA  Question Answering
QT  Quantile Transform
RAAS  Renin–Angiotensin–Aldosterone System
RMSE  Root Mean Square Error
RNN  Recurrent Neural Network
SAD  Seasonal Affective Disorder
SANS  Scale for the Assessment of Negative Symptoms
SST  Stop Signal Task
SVM  Support Vector Machine
TDA  Topological Data Analysis
TLC  Thought, Language, and Communication
UTSC  Univariate Time Series Classification
VAE  Variational Autoencoder
VR  Vietoris–Rips
WESAD  Wearable Stress and Affect Detection
WISQARS  Web-Based Injury Statistics Query and Reporting System
Digital Technologies for Clinical, Public and Global Health Surveillance Arash Shaban-Nejad, Martin Michalowski, and Simone Bianco
Abstract Digital intelligent technologies are widely used to support the monitoring, detection, and prevention of disease among individuals and communities. Artificial intelligence offers a wide range of tools, methodologies, and techniques to collect, integrate, process, and analyze data and to generate insights for improving care and conducting further exploratory and explanatory research. This introductory chapter first sets out the purpose of the book, which is to investigate the role of AI and digital technologies in improving personalized and population health, then summarizes some of the recent developments in the field and sets the stage for the rest of the chapters. Keywords Digital health · Smart health · Surveillance · Health intelligence
1 Introduction

According to the World Health Organization [1], health surveillance is defined as “the continuous and systematic collection, orderly consolidation and evaluation of pertinent data with prompt dissemination of results to those who need to know, particularly those who are in a position to take action”. The life cycle of health or disease (chronic or infectious) surveillance is iterative by nature and consists of (i) observing and monitoring individuals or communities at different geographical

A. Shaban-Nejad (B) Center for Biomedical Informatics, Department of Pediatrics, College of Medicine, The University of Tennessee Health Science Center—Oak-Ridge National Lab (UTHSC-ORNL), Memphis, TN, USA
e-mail: [email protected]

M. Michalowski School of Nursing, University of Minnesota, Minneapolis, MN, USA
e-mail: [email protected]

S. Bianco IBM Corporation, Almaden Research Center, San Jose, CA, USA
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
A. Shaban-Nejad et al. (eds.), AI for Disease Surveillance and Pandemic Intelligence, Studies in Computational Intelligence 1013, https://doi.org/10.1007/978-3-030-93080-6_1
resolutions, settings, and different periods; (ii) data collection, wrangling, integration, and analysis; (iii) using the insights gained through analytics to design and implement therapeutic or preventive interventions; (iv) collecting and processing data from individuals and communities to evaluate implemented interventions. There are different types of disease surveillance [2]. Surveillance can be active or passive. Active surveillance collects information through regular contacts with health care providers, whereas passive surveillance relies on reports submitted by hospitals, clinics, public health offices, etc. [2] Although active surveillance is more expensive and time-consuming, it generates more accurate and timely results. Depending on objectives, priorities, and needs, surveillance activities can take the form of “Sentinel Surveillance”, “Laboratory-Based Surveillance”, “Periodic Population-Based Surveys”, or “Integrated Disease Surveillance and Response” [2]. Artificial intelligence-based tools and methods [3] have great potential to assist throughout the entire surveillance life cycle. AI can improve the collection and management of large volumes of medical and health data [4], facilitate data and knowledge integration [5, 6], and maintain interoperability between heterogeneous data sources [7, 8]. AI and machine learning methods are instrumental in advanced health data analytics in today’s medical and health applications. Moreover, intelligent tools and techniques are frequently used to simulate health and medical interventions and programs to evaluate their impact [9, 10] and to provide explanations [11, 12] for their successes and failures.
In fact, the success of initiatives such as precision medicine, precision health, and precision population health [13, 14] is highly dependent on the effective use of advanced AI technologies such as machine learning, knowledge representation, natural language processing, image processing and visual recognition, speech recognition, expert systems, robotics, and the Internet of Things (IoT). AI solutions have brought significant improvements to clinical and hospital surveillance [15, 16] and mental health surveillance [17, 18], and have assisted in investigating health disparity and health equity [19–21] among different populations.
2 Artificial Intelligence-Based Tools and Methods

Several AI-based tools and methods have been developed to improve our understanding of diseases and to help implement clinical and public health interventions for better care and management. Digital technology is now an integral part of personalized care management [22–24] and is instrumental in improving patients’ adherence to therapy. Increasingly, digital tools are used to deliver patient education materials [25]. Recent years have seen a large increase in the number of medical applications (apps) and e-learning systems that teach patients about diseases, drugs, medical tests, and treatments. For example, the MedlinePlus app developed by the National Library of Medicine provides information about diseases, conditions, and wellness. Like printed educational materials, the information provided by the app
is generic and standardized. Short Message Service (SMS), the most common form of technology-based intervention to deliver tailored content, is used, for example, in asthma [26], diabetes [27], and cardiac care education [28]. Tailored health communication motivates patients to process health messages and improve their health behaviors [29]. A recent work [30] has focused on improving patients’ motivation and competence via a mobile application by combining digital tools with psychological theories such as the trans-theoretical model of behavior change [31] and specific behavioral change techniques [32]. This work was further extended [33] to support personalized educational interventions that improve patient understanding of their care by delivering educational materials in the form of multi-modal courses, customized to a patient in a manner suitable for the outpatient setting. Inspired by Intelligent Tutoring Systems (ITSs) [34], it uses interactive sessions of dynamically generated question answering and content delivery to increase a patient’s level of understanding based on the VARK (Visual, Aural, Read/Write, Kinesthetic) presentation model [35] and Bloom’s taxonomy of educational objectives [36]. Evidence shows that tailoring communication to a patient’s level of comprehension of their health care is a means to improve adherence to prescribed treatment [37], and the combination of artificial intelligence techniques and psychological theories is a promising new area of research for improving patient outcomes. Some other examples of applications of AI and digital technologies in disease surveillance and clinical and biomedical informatics are the following. Jack et al. [38] conduct a cross-sectional population-level observational study to examine the interplay between web search behaviors and mental health problems and forecast suicide rates up to 3 years ahead with 83% accuracy.
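Population-level forecasting of this kind typically reduces to regressing an outcome of interest on lagged features derived from online behavior. The sketch below illustrates only that general idea, using synthetic data and ordinary least squares; it is not the model of [38], and every value and variable name here is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic yearly data: search volume for 3 hypothetical query terms (rows = years)
n_years = 20
search = rng.normal(100.0, 10.0, size=(n_years, 3))
w_true = np.array([0.05, -0.02, 0.03])  # hypothetical effect of each term

# Assume the outcome in year t is driven by search behavior `lag` years earlier
lag = 3
outcome = search[:-lag] @ w_true + rng.normal(0.0, 0.1, size=n_years - lag)

# Fit ordinary least squares (with an intercept) on the lagged pairs
X = np.hstack([search[:-lag], np.ones((n_years - lag, 1))])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)

# Forecast the outcome `lag` years beyond the last observed year
forecast = np.append(search[-1], 1.0) @ beta
```

A real study would of course add feature selection, seasonal adjustment, and out-of-sample validation; the point here is only the lag structure that makes a multi-year-ahead forecast possible.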
They also study monthly variations, long-term trends, and the effect of seasonal affective disorder (SAD) by comparing data across both hemispheres. Krell et al. [39] leverage lexical and acoustic features as predictors of clinical ratings used to measure thought disorder and negative symptoms of schizophrenia. Lavin [40] presents a probabilistic programmed deep kernel learning approach to personalized, predictive modeling of neurodegenerative diseases (e.g., Alzheimer’s disease). This Bayesian approach combines the flexibility of Gaussian processes with the structural power of neural networks to model biomarker progressions, without needing clinical labels for training. Ahn et al. [41] study brain functional substrates associated with the risk for youth suicidality using a large, multi-site, multi-ethnic, prospective developmental population study in the US. They employ interpretable deep neural networks over functional brain imaging, behavioral, and self-reported questionnaire data. Sylvestre et al. [42] develop and validate a tool to identify youth at risk of initiating frequent (i.e., at least weekly) cannabis use in the next year. Most predictors selected into the tool pertained to substance use, including the use of cigarettes, e-cigarettes, alcohol, and energy drinks mixed with alcohol, but not to mental or physical health. Andy [43] analyzes posts published in two active Reddit opioid use recovery forums and builds machine learning models to measure the extent of different types of self-disclosures (positive and negative) expressed in these forums and of social supports sought (emotional and informational). This paper concludes
4
A. Shaban-Nejad et al.
that positive self-disclosure in posts correlates with emotional support sought, while negative self-disclosure correlates with informational support sought. The ongoing COVID-19 pandemic has generated an extraordinary amount of cross-disciplinary scientific work, ranging from genomic and molecular biology to bioinformatics, medical imaging, epidemiology, digital health, sociology, and psychology. Increasingly, solutions have appeared that employ data-driven and AI methods to coordinate data, predict the course of the disease, and increase the confidence of the stakeholders in allocating important resources to combat the disease [44]. Portelli et al. [45] propose a natural language processing technique to extract Adverse Drug Events from social media, blogs, and health forums. Gilbert et al. [46] present a method for classifying Tweets into narratives about COVID-19 symptoms to produce a dataset for downstream surveillance applications. Sundaresan et al. [47] present RRISK, a visual and interactive application that allows users to view COVID-19 risk assessments for restaurants in any area in the United States, search for the safest food establishments in their neighborhood, and find the best places to eat while still minimizing their risk of contracting COVID-19. Bhatia et al. [48] present AWS CORD-19 Search (ACS), a public, COVID-19-specific, neural search engine that is powered by several machine learning systems to support natural language-based searches, providing a scalable solution to COVID-19 researchers and policymakers in their search and discovery for answers to high-priority scientific questions. Karisani et al. [49] propose a pipeline to assist health practitioners in analyzing clinical notes and revealing the pathways associated with COVID-19. Srivastava et al.
[50] propose a prototype of the EpiBench platform consisting of community-driven benchmarks for AI/ML applied to COVID-19 forecasting (cases and deaths) to standardize the challenge with a uniform evaluation protocol. Epidemiology in particular has seen an increase in the use of data-driven tools for epidemic forecasting, fueled by the freely available corpus of disease incidence data present on various platforms [51–53]. The US Centers for Disease Control and Prevention has established an online platform where modeling groups can provide their forecasts [54] and has used an extract of those predictions to produce its policies. Epidemiologists generally employ two large classes of models: data-driven statistical models, which use sometimes-complex statistical estimators to provide short- and long-term forecasts of disease incidence; and compartmental models, which divide the population into compartments according to their disease state. The use of AI/ML tools in both types of work has in the past been limited to parameter estimation. The COVID-19 pandemic has brought renewed interest in the use of AI methods for epidemic forecasting [55]. Of particular importance is the bridge between AI and classical epidemiological theory. Epidemic modeling software frameworks like GLEAM [56] and STEM [57] have been augmented by AI methods to produce predictions, infer relevant parameters, and obtain important information about the biology and epidemiology of the disease [58, 59]. Singh et al. [60] present an approach for the classification and interpretation of human motion from video data to assist physiotherapists, coaches, and rehabilitation patients by providing feedback after the execution of physical exercises. Their proposed approach was evaluated using a real-world CrossFit Workout Activities
Digital Technologies for Clinical, Public and Global Health …
5
dataset. Kori et al. [61] attempt to understand the behavior of trained models that perform image processing tasks in the medical domain by building a graphical representation of the concepts they learn. Prajod et al. [62] present a process to investigate the effects of transfer learning for automatic facial expression recognition from emotions to pain. Sahoo et al. [63] investigate the use of machine learning algorithms to predict patient outcomes in the Emergency Department. Datar et al. [64] examine the effects of a Conversational Agent (CA) interaction on naive users' physiological markers of stress, i.e., heart rate (HR) and electrodermal activity (EDA). Nguyen et al. [65] propose a deep recurrent system that disentangles the observed measurement processes from latent health processes. The proposed model is validated over two public datasets, PhysioNet 2012 and MIMIC-III. Tong et al. [66] propose LSTM-GNN for patient outcome prediction tasks: a hybrid model combining Long Short-Term Memory networks (LSTMs) for extracting temporal features and Graph Neural Networks (GNNs) for extracting the patient neighborhood information. Bin Tariq et al. [67] propose a Siamese neural network-based seizure prediction method that takes a wavelet-transformed electroencephalogram (EEG) tensor as an input, with a convolutional neural network (CNN) as the base network for detecting change-points in EEG; in contrast, existing solutions in the literature utilize days of EEG recordings. Boursalie et al. [68] demonstrate the limitations of Root Mean Square Error (RMSE), which is used for evaluating the performance of deep learning-based imputation models, by conducting a comparative analysis between RMSE and alternative metrics in the statistical literature, including qualitative, predictive accuracy, and statistical distance metrics. Butler et al.
[69] assess several scenarios mimicking real-world problems to evaluate Logistic Regression in the identification of true model covariates, accurate estimation of regression coefficients, and classification accuracy with comparisons to a commonly used machine learning algorithm, Extreme Gradient Boosting (XGBoost).
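As background for the modeling work surveyed above: the compartmental models mentioned earlier divide the population into disease states (e.g., Susceptible, Infectious, Recovered) whose sizes evolve through coupled rate equations. A minimal forward-Euler SIR sketch follows; the parameter values are illustrative and not taken from any of the cited chapters.

```python
def sir_step(S, I, R, beta, gamma, dt):
    """One forward-Euler step of the SIR model (population fractions)."""
    new_inf = beta * S * I * dt   # S -> I transitions this step
    new_rec = gamma * I * dt      # I -> R transitions this step
    return S - new_inf, I + new_inf - new_rec, R + new_rec

def simulate(beta=0.3, gamma=0.1, days=160, dt=1.0):
    # Start with 0.1% of the population infectious
    S, I, R = 0.999, 0.001, 0.0
    traj = [(S, I, R)]
    for _ in range(int(days / dt)):
        S, I, R = sir_step(S, I, R, beta, gamma, dt)
        traj.append((S, I, R))
    return traj

traj = simulate()
peak_I = max(i for _, i, _ in traj)  # height of the epidemic wave
```

Because every transition moves mass between compartments, the total population fraction is conserved; with beta/gamma = 3 the sketch produces a single epidemic wave that peaks and then declines, which is the qualitative behavior AI methods are used to fit and extrapolate.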
References

1. World Health Organization: Public Health Surveillance. Retrieved on 5 Oct 2022. http://www.emro.who.int/health-topics/public-health-surveillance/index.html 2. Nsubuga, P., White, M.E., Thacker, S.B., et al.: Public health surveillance: A tool for targeting and monitoring interventions. In: Jamison, D.T., Breman, J.G., Measham, A.R., et al. (eds.) Disease Control Priorities in Developing Countries, 2nd edn., Chapter 53. The International Bank for Reconstruction and Development/The World Bank, Washington (DC); co-published by Oxford University Press, New York. Available from: https://www.ncbi.nlm.nih.gov/books/NBK11770/ 3. Shaban-Nejad, A., Michalowski, M., Buckeridge, D.L.: Health intelligence: How artificial intelligence transforms population and personalized health. npj Digit. Med. 1, 53 (2018) 4. Lovis, C.: Unlocking the power of artificial intelligence and big data in medicine. J. Med. Internet Res. 21(11), e16607 (2019) 5. Shaban-Nejad, A., Lavigne, M., Okhmatovskaia, A., Buckeridge, D.L.: PopHR: a knowledge-based platform to support integration, analysis, and visualization of population health data. Ann. N Y Acad. Sci. 1387(1), 44–53 (2017)
6. Brakefield, W.S., Ammar, N., Olusanya, O.A., Shaban-Nejad, A.: An urban population health observatory system to support COVID-19 pandemic preparedness, response, and management: Design and development study. JMIR Pub. Health Surveill. 7(6), e28269 (2021). https://doi.org/10.2196/28269 7. Brenas, J.H., Al Manir, M.S., Baker, C.J.O., Shaban-Nejad, A.: A malaria analytics framework to support evolution and interoperability of global health surveillance systems. IEEE Access 5, 21605–21619 (2017) 8. Al Manir, M.S., Brenas, J.H., Baker, C.J., Shaban-Nejad, A.: A surveillance infrastructure for malaria analytics: Provisioning data access and preservation of interoperability. JMIR Pub. Health Surveill. 4(2), e10218 (2018). https://doi.org/10.2196/10218 9. Brenas, J.H., Shaban-Nejad, A.: Health intervention evaluation using semantic explainability and causal reasoning. IEEE Access 8, 9942–9952 (2020) 10. Shaban-Nejad, A., Okhmatovskaia, A., Shin, E.K., Davis, R.L., Franklin, B.E., Buckeridge, D.L.: A semantic framework for logical cross-validation, evaluation and impact analyses of population health interventions. Stud. Health Technol. Inform. 235, 481–485 (2017) 11. Shaban-Nejad, A., Michalowski, M., Brownstein, J.S., Buckeridge, D.L.: Guest editorial explainable AI: Towards fairness, accountability, transparency and trust in healthcare. IEEE J. Biomed. Health Inform. 25(7), 2374–2375 (2021) 12. Shaban-Nejad, A., Michalowski, M., Buckeridge, D.L.: Explainability and interpretability: Keys to deep medicine. In: Shaban-Nejad, A., Michalowski, M., Buckeridge, D.L. (eds.) Explainable AI in Healthcare and Medicine. Studies in Computational Intelligence, vol. 914. Springer, Cham. https://doi.org/10.1007/978-3-030-53352-6_1 13. Shaban-Nejad, A., Michalowski, M.: Precision health and medicine—A digital revolution in healthcare. In: Studies in Computational Intelligence 843. Springer, Berlin (2020). ISBN 978-3-030-24408-8 14.
Shaban-Nejad, A., Michalowski, M., Peek, N., Brownstein, J.S., Buckeridge, D.L.: Seven pillars of precision digital health and medicine. Artif. Intell. Medicine 103, 101793 (2020) 15. Shaban-Nejad, A., Mamiya, H., Riazanov, A., Forster, A.J., Baker, C.J., Tamblyn, R., Buckeridge, D.L.: From cues to nudge: A knowledge-based framework for surveillance of healthcare-associated infections. J. Med. Syst. 40(1), 23 (2016). PMID: 26537131 16. Alghatani, K., Ammar, N., Rezgui, A., Shaban-Nejad, A.: Predicting intensive care unit length of stay and mortality using patient vital signs: Machine learning model development and validation. JMIR Med. Inform. 9(5), e21347 (2021). https://doi.org/10.2196/21347 17. Brenas, J.H., Shin, E.K., Shaban-Nejad, A.: Adverse childhood experiences ontology for mental health surveillance, research, and evaluation: Advanced knowledge representation and semantic web techniques. JMIR Ment Health 6(5), e13498 (2019) 18. Ammar, N., Shaban-Nejad, A.: Explainable artificial intelligence recommendation system by leveraging the semantics of adverse childhood experiences: proof-of-concept prototype development. JMIR Med. Inform. 8(11), e18752 (2020) 19. Shin, E.K., Kwon, Y., Shaban-Nejad, A.: Geo-clustered chronic affinity: pathways from socioeconomic disadvantages to health disparities. JAMIA Open. 2(3), 317–322 (2019) 20. Shin, E.K., Mahajan, R., Akbilgic, O., Shaban-Nejad, A.: Sociomarkers and biomarkers: predictive modeling in identifying pediatric asthma patients at risk of hospital revisits. NPJ Digit. Med. 2(1), 50 (2018) 21. Chen, I.Y., Joshi, S., Ghassemi, M.: Treating health disparities with artificial intelligence. Nat. Med. 26, 16–17 (2020). https://doi.org/10.1038/s41591-019-0649-2 22. Ammar, N., Bailey, J.E., Davis, R.L., Shaban-Nejad, A.: Using a personal health libraryenabled mhealth recommender system for self-management of diabetes among underserved populations: Use case for knowledge graphs and linked data. JMIR Form Res. 
5(3), e24738 (2021). https://doi.org/10.2196/24738 23. Ammar, N., Bailey, J.E., Davis, R.L., Shaban-Nejad, A.: The personal health library: A single point of secure access to patient digital health information. Stud. Health Technol. Inform. 270, 448–452 (2020). https://doi.org/10.3233/SHTI200200
24. Olusanya, O.A., Ammar, N., Davis, R.L., Bednarczyk, R.A., Shaban-Nejad, A.: A digital personal health library for enabling precision health promotion to prevent human papillomavirus-associated cancers. Front. Digit. Health, 21 July 2021. https://doi.org/10.3389/fdgth.2021.683161 25. Hamine, S., Gerth-Guyette, E., Faulx, D., Green, B.B., Ginsburg, A.S.: Impact of mHealth chronic disease management on treatment adherence and patient outcomes: A systematic review. J. Med. Internet Res. 17, e52 (2015). https://doi.org/10.2196/jmir.3951 26. Strandbygaard, U., Thomsen, S.F., Backer, V.: A daily SMS reminder increases adherence to asthma treatment: A three-month follow-up study. Respir. Med. 104, 166–171 (2010). https://doi.org/10.1016/J.RMED.2009.10.003 27. Quinn, C.C., Shardell, M.D., Terrin, M.L., Barr, E.A., Ballew, S.H., Gruber-Baldini, A.L.: Cluster-randomized trial of a mobile phone personalized behavioral intervention for blood glucose control. Diabetes Care 34, 1934–1942 (2011). https://doi.org/10.2337/dc11-0366 28. Khonsari, S., Subramanian, P., Chinna, K., Latif, L.A., Ling, L.W., Gholami, O.: Effect of a reminder system using an automated short message service on medication adherence following acute coronary syndrome. Eur. J. Cardiovasc. Nurs. 14, 170–179 (2015). https://doi.org/10.1177/1474515114521910 29. Hawkins, R.P., Kreuter, M., Resnicow, K., Fishbein, M., Dijkstra, A.: Understanding tailoring in communicating about health. Health Educ. Res. 23, 454–466 (2008). https://doi.org/10.1093/her/cyn004 30. Peleg, M., Michalowski, W., Wilk, S., Parimbelli, E., Bonaccio, S., O'Sullivan, D., Michalowski, M., Quaglini, S., Carrier, M.: Ideating mobile health behavioral support for compliance to therapy for patients with chronic disease: A case study of atrial fibrillation management. J. Med. Syst. 42 (2018). https://doi.org/10.1007/s10916-018-1077-4 31. Norcross, J.C., Krebs, P.M., Prochaska, J.O.: Stages of change. J. Clin. Psychol. 67, 143–154 (2011).
https://doi.org/10.1002/jclp.20758 32. Abraham, C., Michie, S.: A taxonomy of behavior change techniques used in interventions. Health Psychol. 27, 379–387 (2008). https://doi.org/10.1037/0278-6133.27.3.379 33. Michalowski, M., Wilk, S., Michalowski, W., O'Sullivan, D., Bonaccio, S., Parimbelli, E., Carrier, M., Le Gal, G., Kingwell, S., Peleg, M.: A health eLearning ontology and procedural reasoning approach for developing personalized courses to teach patients about their medical condition and treatment. Int. J. Environ. Res. Pub. Health 18(14), 7355 (2021). https://doi.org/10.3390/ijerph18147355 34. Sedlmeier, P.: Intelligent tutoring systems. Int. Encycl. Soc. Behav. Sci. 7674–7678 (2001). https://doi.org/10.1016/B0-08-043076-7/01618-1 35. Fleming, N.D., Mills, C.: Not another inventory, rather a catalyst for reflection. To Improv. Acad. 11, 137–155 (1992). https://doi.org/10.1002/j.2334-4822.1992.tb00213.x 36. Bloom, B.S.: Taxonomy of educational objectives: The classification of educational goals. Longmans, Green (1956). ISBN 0679302093 37. Schapira, M.M., Swartz, S., Ganschow, P.S., Jacobs, E.A., Neuner, J.M., Walker, C.M., Fletcher, K.E.: Tailoring educational and behavioral interventions to level of health literacy: A systematic review. MDM Policy Pract. 2, 2381468317714474 (2017). https://doi.org/10.1177/2381468317714474 38. Jack, E., Kruger, E., Tennant, M.: Imputing fine-grain patterns of mental health with statistical modelling of online data. In: AI for Disease Surveillance and Pandemic Intelligence: Intelligent Disease Detection in Action. Studies in Computational Intelligence. Springer, Berlin (2021) 39. Krell, R., Tang, W., Hänsel, K., Sobolev, M., Cho, S., Berretta, S., Tang, S.X.: Lexical and acoustic correlates of clinical speech disturbance in schizophrenia. In: AI for Disease Surveillance and Pandemic Intelligence: Intelligent Disease Detection in Action. Studies in Computational Intelligence. Springer, Berlin (2021) 40.
Lavin, A.: Neuro-symbolic neurodegenerative disease modeling as probabilistic programmed deep kernels. In: AI for Disease Surveillance and Pandemic Intelligence: Intelligent Disease Detection in Action. Studies in Computational Intelligence. Springer, Berlin (2021)
41. Ahn, G., Kim, B., Kim, K.K., Kim, H., Lee, E., Ahn, W.Y., Kim, J.W., Cha, J.: Identifying prepubertal children with risk for suicide using deep neural network trained on multimodal brain imaging. In: AI for Disease Surveillance and Pandemic Intelligence: Intelligent Disease Detection in Action. Studies in Computational Intelligence. Springer, Berlin (2021) 42. Sylvestre, M.P., de Montigny, S., Boulanger, L., Goulet, D., Doré, I., O'Loughlin, J., Haddad, S., Bélanger, R.S., Leatherdale, S.: A prognostic tool to identify youth at risk of at least weekly cannabis use. In: AI for Disease Surveillance and Pandemic Intelligence: Intelligent Disease Detection in Action. Studies in Computational Intelligence. Springer, Berlin (2021) 43. Andy, A.: Self-disclosure in opioid use recovery forums. In: AI for Disease Surveillance and Pandemic Intelligence: Intelligent Disease Detection in Action. Studies in Computational Intelligence. Springer, Berlin (2021) 44. Vaishya, R., Javaid, M., Khan, I.H., Haleem, A.: Artificial Intelligence (AI) applications for COVID-19 pandemic. Diabetes Metab. Syndr. 14(4), 337–339 (2020) 45. Portelli, B., Passabì, D., Lenzi, E., Serra, G., Santus, E., Chersoni, E.: Improving adverse drug event extraction with SpanBERT on different text typologies. In: AI for Disease Surveillance and Pandemic Intelligence: Intelligent Disease Detection in Action. Studies in Computational Intelligence. Springer, Berlin (2021) 46. Gilbert, J.P., de Montigny, S., Niu, J., Ng, V., Rees, E.: Machine learning identification of self-reported COVID-19 symptoms from Tweets in Canada. In: AI for Disease Surveillance and Pandemic Intelligence: Intelligent Disease Detection in Action. Studies in Computational Intelligence. Springer, Berlin (2021) 47. Sundaresan, S., Rahman, F., Khan, S., Huang, C.: RRISK: Analyzing COVID-19 risk in food establishments. In: AI for Disease Surveillance and Pandemic Intelligence: Intelligent Disease Detection in Action. Studies in Computational Intelligence. Springer, Berlin (2021) 48.
Bhatia, P., Liu, L., Arumae, K., Pourdamghani, N., Deshpande, S., Snively, B., Mona, M., Wise, C., Price, G., Ramaswamy, S., Ma, X., Nallapati, R., Huang, Z., Xiang, B., Kass-Hout, T.: AWS CORD-19 search: A neural search engine for COVID-19 literature. In: AI for Disease Surveillance and Pandemic Intelligence: Intelligent Disease Detection in Action. Studies in Computational Intelligence. Springer, Berlin (2021) 49. Karisani, N., Platt, D.E., Basu, S., Parida, L.: Inferring COVID-19 biological pathways from clinical phenotypes via topological analysis. In: AI for Disease Surveillance and Pandemic Intelligence: Intelligent Disease Detection in Action. Studies in Computational Intelligence. Springer, Berlin (2021) 50. Srivastava, A., Xu, T., Prasanna, V.K.: The EpiBench platform to propel AI/ML-based epidemic forecasting: A prototype demonstration reaching human expert-level performance. In: AI for Disease Surveillance and Pandemic Intelligence: Intelligent Disease Detection in Action. Studies in Computational Intelligence. Springer, Berlin (2021) 51. The New York Times COVID-19 Tracker: https://www.nytimes.com/interactive/2021/us/covid-cases-deaths-tracker.html 52. The Johns Hopkins University COVID-19 map: https://coronavirus.jhu.edu/map.html 53. USAFacts, COVID-19: https://usafacts.org/visualizations/coronavirus-covid-19-spread-map 54. CDC, August 7, 2020. COVID-19 Mathematical Modeling: https://www.cdc.gov/coronavirus/2019-ncov/science/forecasting/mathematical-modeling.html 55. Bullock, J., Luccioni, A., Pham, K.H., Lam, C.S.N., Luengo-Oroz, M.: Mapping the landscape of artificial intelligence applications against COVID-19. J. Artif. Intell. Res. 69, 807–845 (2020) 56. Balcan, D., Gonçalves, B., Hu, H., Ramasco, J.J., Colizza, V., Vespignani, A.: Modeling the spatial spread of infectious diseases: The GLobal Epidemic and Mobility computational model. J. Comput. Sci. 1(3), 132–145 (2010) 57.
Douglas, J.V., Bianco, S., Edlund, S., Engelhardt, T., Filter, M., Günther, T., Hu, M.H., Nixon, E.J., Sevilla, N., Swaid, A., Kaufman, J.H.: STEM: An open source tool for disease modeling. Health Secur. 17(4), 291–306 (2019) 58. Kraemer, M.U., Yang, C.H., Gutierrez, B., Wu, C.H., Klein, B., Pigott, D.M., Open COVID-19 Data Working Group, du Plessis, L., Faria, N.R., Li, R., Hanage, W.P., Brownstein, J.S., Layan, M., Vespignani, A., Tian, H., Dye, C., Pybus, O.G., Scarpino, S.V.: The effect of human mobility and control measures on the COVID-19 epidemic in China. Science 368(6490), 493–497 (2020) 59. Gopalakrishnan, V., Pethe, S., Kefayati, S., Srinivasan, R., Hake, P., Deshpande, A., Liu, X., Hoang, E., Davila, M., Bianco, S., Kaufman, J.H.: Globally local: Hyper-local modeling for accurate forecast of COVID-19. Epidemics 100510 (2021) 60. Singh, A., Le, B.T., Nguyen, T.L., Whelan, D., O'Reilly, M., Caulfield, B., Ifrim, G.: Interpretable classification of human exercise videos through pose estimation and multivariate time series analysis. In: AI for Disease Surveillance and Pandemic Intelligence: Intelligent Disease Detection in Action. Studies in Computational Intelligence. Springer, Berlin (2021) 61. Kori, A., Natekar, N., Krishnamurthi, G., Srinivasan, B.: Interpreting deep neural networks for medical imaging using concept graphs. In: AI for Disease Surveillance and Pandemic Intelligence: Intelligent Disease Detection in Action. Studies in Computational Intelligence. Springer, Berlin (2021) 62. Prajod, P., Schiller, D., Huber, T., André, E.: Do deep neural networks forget facial action units? Exploring the effects of transfer learning in health related facial expression recognition. In: AI for Disease Surveillance and Pandemic Intelligence: Intelligent Disease Detection in Action. Studies in Computational Intelligence. Springer, Berlin (2021) 63. Sahoo, P.K., Malhotra, N., Kokane, S.S., Srivastava, Tiwari, B.H.N., Sawant, S.: Utilizing predictive analysis to aid emergency medical services. In: AI for Disease Surveillance and Pandemic Intelligence: Intelligent Disease Detection in Action. Studies in Computational Intelligence. Springer, Berlin (2021) 64. Datar, S., Ferland, L., Foo, E., Kotlyar, M., Holschuh, B., Gini, M., Michalowski, M., Pakhomov, S.: Measuring physiological markers of stress during conversational agent interactions. In: AI for Disease Surveillance and Pandemic Intelligence: Intelligent Disease Detection in Action. Studies in Computational Intelligence. Springer, Berlin (2021) 65. Nguyen, D., Nguyen, P., Tran, T.: EvSys: A relational dynamic system for sparse irregular clinical events. In: AI for Disease Surveillance and Pandemic Intelligence: Intelligent Disease Detection in Action. Studies in Computational Intelligence. Springer, Berlin (2021) 66. Tong, C., Rocheteau, E., Velickovic, P., Lane, N., Lio, P.: Predicting patient outcomes with graph representation learning. In: AI for Disease Surveillance and Pandemic Intelligence: Intelligent Disease Detection in Action. Studies in Computational Intelligence. Springer, Berlin (2021) 67. Bin Tariq, Z., Iyengar, A., Marcuse, L., Su, H., Yener, B.: Patient-specific seizure prediction using single seizure electroencephalography recording. In: AI for Disease Surveillance and Pandemic Intelligence: Intelligent Disease Detection in Action. Studies in Computational Intelligence. Springer, Berlin (2021) 68. Boursalie, O., Samavi, R., Doyle, T.E.: Evaluation metrics for deep learning imputation models. In: AI for Disease Surveillance and Pandemic Intelligence: Intelligent Disease Detection in Action. Studies in Computational Intelligence. Springer, Berlin (2021) 69. Butler, L., Gunturkun, F., Karabayir, I., Akbilgic, O.: Logistic regression is also a black box. Machine learning can help. In: AI for Disease Surveillance and Pandemic Intelligence: Intelligent Disease Detection in Action. Studies in Computational Intelligence. Springer, Berlin (2021)
Imputing Fine-Grain Patterns of Mental Health with Statistical Modelling of Online Data Esther Jack, Estie Kruger, and Marc Tennant
Abstract Depression and suicide are large and acute global mental health problems. Timely, granular, and accurate estimates of the burden are critical for effective policy adjustments and interventions. In this cross-sectional population-level observational study, we examine the interplay between web search behaviors and these growing mental health problems. We map symptoms of depression—a known risk factor for suicide—as described in the PHQ-9 rubric to web search concepts as a noisy proxy indicating the person doing the search may be affected. We then perform statistical analyses to find the most salient web signals that explain annual suicide data. From this, we impute suicide incidence rates at previously unavailable monthly resolution, which allows us to extract and study seasonal patterns and predict future suicide incidences at this increased granularity and timeliness. We shed additional light on monthly variations, long-term trends, and the effect of seasonal affective disorder (SAD) by comparing data across both hemispheres. Finally, in addition to explainability of the model, we demonstrate its high predictive power—forecasting suicide rates up to 3 years ahead with 83% accuracy. This work is a complement to traditional techniques for more effective public health interventions, novel means of suicide prevention and awareness, and timely diagnosis and treatment of depression. Keywords Depression · Suicide · Statistical modelling · Epidemic intelligence · Predictive modelling
E. Jack (B) · E. Kruger · M. Tennant University of Western Australia, 35 Stirling Hwy, Crawley, WA 6009, Australia e-mail: [email protected] E. Kruger e-mail: [email protected] M. Tennant e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 A. Shaban-Nejad et al. (eds.), AI for Disease Surveillance and Pandemic Intelligence, Studies in Computational Intelligence 1013, https://doi.org/10.1007/978-3-030-93080-6_2
11
12
E. Jack et al.
1 Introduction

Suicide is a public health issue of high and increasing severity. For example, in the USA, suicide rates have risen annually in the past decade to consistently rank among the top ten leading causes of death in young people [1, 2]. In the United Kingdom (UK), close to 6000 suicides were registered in 2017. This accounts for 10.1 deaths per 100,000 population [3]. The UK government recognizes the severity of this crisis and has even elected Ministers for Loneliness and Suicide [4] as a stepping stone in addressing it. In Australia, intentional self-harm was the tenth leading cause of death amongst males in 2017 [5]. The mental health implications of the coronavirus disease 2019 (COVID-19) pandemic are only beginning to be understood, with a reported increase in depressive states and, likely, suicides [6].

Google is the world's leading search engine, with an estimated market share of 90% on mobile devices [7]. Google Trends, a publicly accessible database, can be used to study the relative popularity of search terms and topics historically and up to real time, at various geospatial resolutions worldwide. All data available from Google Trends is de-identified, allowing for large-scale population surveys without infringing upon individual privacy. This provides a large-scale view of search terms on the Internet according to location, as used, for example, in studies of influenza, food poisoning, and dengue [8–10]. Previously, Google Trends had been used to predict national suicide rates using queries related to the term 'suicide', but to our knowledge no work has specifically looked at risk factors pertaining to suicide, such as mood disorders [11]. In this paper, we focus on web search queries that are reflective of depressive states in the USA and Australia, representing two large countries in the northern and southern hemispheres, respectively, with English as their primary language. We cover over 10 years starting on January 1, 2008.
The search terms are mapped to the 9-item Patient Health Questionnaire [12] (PHQ-9) for depression, a validated tool frequently used by clinicians such as psychiatrists, general practitioners, and mental health workers. Ground truth data for suicide incidence is available online from the Centers for Disease Control and Prevention (CDC) in the USA and the Australian Bureau of Statistics (ABS). This data has limited resolution in time (annual only), timeliness (a lag of over a year), and often in space (entire country); these limitations hinder more effective and prompt public-health responses. Using statistical modelling and timely search data, here we lift all three limitations. Specifically, we impute suicide incidence at a monthly resolution and look for correlations, particularly in terms of seasonality, between the search queries and suicide incidences, and with this new capability revisit previously studied factors such as the weather and world news.
1 Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Imputing Fine-Grain Patterns of Mental Health with Statistical …
13
2 Methods

This is a cross-sectional, population-based observational study. The 9-item Patient Health Questionnaire (PHQ-9) [12] is a widely used self-report instrument based on criteria listed in the Diagnostic and Statistical Manual of Mental Disorders IV (DSM-IV) [13] to test for the presence of depressive symptoms. Being relatively sensitive and highly specific, the PHQ-9 is used as a diagnostic tool in clinical practice and in assessing severity of diagnosis. We use items listed in the PHQ-9 as a guide to keywords searched for online:

1. Anhedonia: "little/no interest", "bored", "feeling empty"
2. Feeling down, depressed, or hopeless: "sad", "low", "depressed", "hopeless"
3. Insomnia/hypersomnia: "trouble sleeping", "sleeping too much", "how to fall asleep", "insomnia", "why can't I sleep", "why do I sleep too much"
4. Fatigue: "tired", "low energy"
5. Anorexia/hyperphagia: "no appetite", "eating too much"
6. Guilt: "feeling bad about myself", "why am I useless"
7. Trouble concentrating: "can't focus"
8. Moving or speaking slowly, or fidgeting: "why am I slow", "why am I restless"
9. Thoughts about being better off dead or hurting oneself: "how to end life", "how to hurt self"
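The keyword mapping above can be represented programmatically, e.g. to drive automated retrieval of search-popularity series. A sketch (the grouping mirrors the PHQ-9 list; the variable and key names are ours, not from the study):

```python
# PHQ-9 item -> candidate web search keywords (taken from the list above)
PHQ9_KEYWORDS = {
    "anhedonia": ["little/no interest", "bored", "feeling empty"],
    "depressed_mood": ["sad", "low", "depressed", "hopeless"],
    "sleep": ["trouble sleeping", "sleeping too much", "how to fall asleep",
              "insomnia", "why can't I sleep", "why do I sleep too much"],
    "fatigue": ["tired", "low energy"],
    "appetite": ["no appetite", "eating too much"],
    "guilt": ["feeling bad about myself", "why am I useless"],
    "concentration": ["can't focus"],
    "psychomotor": ["why am I slow", "why am I restless"],
    "self_harm": ["how to end life", "how to hurt self"],
}

# Flat list of all terms, e.g. for batching queries to a trends API
all_terms = [kw for kws in PHQ9_KEYWORDS.values() for kw in kws]
```

A structure like this keeps the clinical grounding (one key per PHQ-9 item) explicit while allowing the keyword lists to be extended for other languages or cultural contexts.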
It must be stressed that the keywords pertaining to depression can be virtually limitless when considering the cultural diversity and backgrounds of Internet users. To mitigate this complexity, we used the PHQ-9 test as it offers a succinct representation of the notions people with depression are, in general, likely to exhibit in a clinical setting. They therefore have an increased likelihood of searching the web for those concepts. While it cannot be asserted that each person searching for these concepts has depression, we find in the experiments below that at a population level there is a strong correspondence, which enables novel applications in mental health.

The CDC Web-based Injury Statistics Query and Reporting System (WISQARS) reports the leading causes of death in the USA on an interactive and publicly accessible platform [14]. Data are obtained from the National Center for Health Statistics (NCHS) of the Department of Health and Human Services. The causes of death are recorded according to the ICD-10 [15, 16] codes for self-inflicted injury, X60-X84, Y87, and U03. Suicide data is published annually by the Australian Bureau of Statistics [5] and includes ICD-10 codes X60-84 and Y87. We take these official datasets as ground truth.

The Pearson correlations between the relative popularity of the search queries and the annual suicide rates for the most recent ten years available are computed for the USA and Australia. Next, an elastic net regression model using Google Trends data aggregates is fitted to the ground truth data at the available country/annual resolution. Subsequently, the model is used to impute the most likely estimate for each month, in essence building a model on available coarse-grained official data and running it in inference mode to impute fine-grained data points that are otherwise unavailable.
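The pipeline just described (screen terms against annual ground truth via Pearson correlation, fit an elastic net on annual aggregates, then run it in inference mode on the monthly signals) might look roughly as follows with SciPy and scikit-learn. All data here is synthetic, standing in for the Google Trends aggregates and the CDC/ABS rates:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)

# Synthetic stand-ins: 10 years x 12 months of popularity for 5 search terms
monthly = rng.random((120, 5))
annual = monthly.reshape(10, 12, 5).mean(axis=1)   # annual aggregates

# Synthetic "ground truth" annual rate, driven by terms 0, 1, and 3 plus noise
annual_rate = annual @ np.array([0.5, 0.3, 0.0, 0.2, 0.0]) + rng.normal(0, 0.01, 10)

# Step 1: Pearson correlation of each term's annual popularity with the rate
r0, p0 = pearsonr(annual[:, 0], annual_rate)

# Step 2: fit on coarse annual data, then predict at monthly resolution
model = ElasticNet(alpha=0.001, l1_ratio=0.5).fit(annual, annual_rate)
monthly_imputed = model.predict(monthly)           # 120 monthly estimates
```

The key move is that the regression is trained only on the coarse (annual) pairs that officially exist, while prediction is run on the finer-grained (monthly) covariates, which is what makes the monthly imputation possible at all.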
The monthly signal is further decomposed into seasonal patterns, longitudinal trend, and residual signal. From this, we are able to see whether particular search queries spike or decrease over certain periods of the year, driven by weather patterns, and whether they are grossly increased due to unusual events, such as a celebrity death. We emphasize that this analysis is only possible with this statistical model, because the ground truth data is currently only available at an annual resolution.
2.1 Regression Analysis

Regular linear regression models often fail to find dependable patterns because of a high noise-to-signal ratio and many cross-correlated independent variables (in our study, more than 10 Google Trends signals over a decade). Therefore, we apply elastic net regression [17, 18], whose objective function penalizes complex and likely overfitted models by giving preference to simpler models with sparser weights on the independent variables:

J = \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - x_i \hat{\beta} \right)^2 + \lambda \left( \frac{1-\alpha}{2} \sum_{j=1}^{m} \hat{\beta}_j^2 + \alpha \sum_{j=1}^{m} \left| \hat{\beta}_j \right| \right)    (1)
Equation 1 defines the loss function of our elastic net model. The constant n is the number of samples, m is the number of input dimensions (independent variables), y_i is the ground truth value (suicide rate provided by the CDC and ABS), and x_i \hat{\beta} is the suicide rate predicted by our model; hence the first term is the squared prediction error. For an elastic net, the hyperparameter α is set to 0.5 [17, 18]. The effect of this loss function is to push the model to use as few parameters as possible while still explaining the salient patterns; it can therefore be viewed as an instance of Occam's razor [19] embedded in our statistical model. Using standard methodology for training machine-learned models with hyperparameters [17, 18], we apply leave-one-out validation to determine the optimal value of λ, which trades off the loss accumulated on individual data points against the penalty incurred from introducing large or too many weights into the model. We see that only a subset of the search concepts is salient to modelling suicide:

Signal | Weight
Intercept | 6.701962e-16
Bored | .
Cancer | .
Can't focus | 1.310726e-01
Chess strategies | .
Depressed | 7.821943e-02
Eating too much | .
Feeling down | 5.263480e-02
Feeling empty | 9.053828e-02
Feeling worthless | .
How to end life | 1.028740e-01
Insomnia | .
Leg ulcer | .
No appetite | 1.161891e-01
No energy | .
No interest | 1.308742e-01
Sleeping too much | .
Tired | 7.269638e-02
Trouble sleeping | .
We include three sample queries unrelated to the PHQ-9 ('cancer', 'chess strategies', and 'leg ulcer') to test whether our model can detect their irrelevance and set their weights to 0, which it does. These were included as an empirical control; the main focus should be on the PHQ-9-related queries studied together as a representation of depression incidence. Signals whose weight is denoted by "." have been automatically regularized away by the parameter selection during model fitting.
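The elastic net fit with leave-one-out selection of λ described above can be sketched as follows. This is a minimal proximal-gradient implementation of Eq. 1 on toy data, not the authors' code; in practice a library such as scikit-learn's `ElasticNet`/`ElasticNetCV` would be used, and all names and data here are illustrative.

```python
import numpy as np

def elastic_net_fit(X, y, lam, alpha=0.5, lr=0.1, iters=500):
    """Minimize Eq. 1: (1/2n)*sum((y - X b)^2)
    + lam*((1-alpha)/2*||b||_2^2 + alpha*||b||_1)
    by proximal gradient descent (minimal sketch)."""
    n, m = X.shape
    b = np.zeros(m)
    for _ in range(iters):
        # Gradient of the smooth part (squared error + ridge term).
        grad = -X.T @ (y - X @ b) / n + lam * (1 - alpha) * b
        b = b - lr * grad
        # Proximal (soft-thresholding) step for the L1 term.
        b = np.sign(b) * np.maximum(np.abs(b) - lr * lam * alpha, 0.0)
    return b

def loo_select_lambda(X, y, lambdas, alpha=0.5):
    """Leave-one-out validation over candidate lambdas, as in the text."""
    errs = []
    for lam in lambdas:
        se = 0.0
        for i in range(len(y)):
            mask = np.arange(len(y)) != i
            b = elastic_net_fit(X[mask], y[mask], lam, alpha)
            se += (y[i] - X[i] @ b) ** 2
        errs.append(se / len(y))
    return lambdas[int(np.argmin(errs))]

# Toy data: 2 salient signals out of 5, mimicking the sparse weights above.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = 1.0 * X[:, 0] + 0.5 * X[:, 1] + 0.05 * rng.normal(size=40)
best_lam = loo_select_lambda(X, y, [0.01, 0.1, 1.0])
coef = elastic_net_fit(X, y, best_lam)
```

On this toy problem the irrelevant coefficients are driven to (near) zero by the L1 soft-thresholding step, mirroring the "." entries in the weight table.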
3 Results

3.1 Web Search Queries as Predictors of Annual Suicide Rates

We first turn our attention to the correlations between web searches for the PHQ-9-derived concepts and the actual ground truth suicide rates in the corresponding time periods. We see that concepts that raise the PHQ-9 score are typically positively correlated with suicide rates, with high confidence. Conversely, randomly chosen control terms are either anti-correlated (e.g., "chess strategies") or weakly and/or insignificantly correlated (e.g., "cancer"). The exception is the search for "leg ulcer", which may be correlated via a latent factor of older age, in which the incidence of both leg ulcers and suicide is increased. These patterns are consistent across the countries. The relation of searches for "cancer" to the suicide rate is particularly interesting and prompts further discussion. We see a weak correlation in the USA and a moderate correlation in Australia. Although it is known that cancer patients have an
increased risk of suicide [20], there are also many searches about cancer that are linked not to suicide but to symptom and disease management. The higher correlation in Australia suggests a possibly stronger link between cancer diagnosis and eventual suicide. Figure 1 shows the temporal relationship between these search concepts and the ground truth suicide rate in the USA between 2008 and 2018 (the most recent ground truth data available). Below, we will see that many of these correlations, and all unrelated concepts, are explained away by a small number of salient signals when regularized parameter selection is applied (Table 1).
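The correlation analysis can be sketched as follows. This is a numpy-only illustration: the rate values are placeholders, not the official CDC/ABS figures, and in practice `scipy.stats.pearsonr` would additionally supply the p-values reported in Table 1.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient (numpy-only sketch)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xd, yd = x - x.mean(), y - y.mean()
    return float((xd * yd).sum() / np.sqrt((xd ** 2).sum() * (yd ** 2).sum()))

# Illustrative 10-year series only (not the actual ground truth): a query
# popularity series that rises linearly with the annual suicide rate.
rate = np.array([11.6, 11.8, 12.1, 12.3, 12.6, 12.6, 13.0, 13.3, 13.5, 14.0])
query = 2.0 * rate + 1.0   # perfectly co-moving signal, so r = 1
anti = -query              # inverted signal, so r = -1
```

A positively co-moving query yields r near +1 (as "feeling empty" does against the ground truth), while an inverted one yields r near −1 (as "chess strategies" does).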
Fig. 1 Ground truth suicide rate in the USA (bold orange line 'g') and its statistical reconstruction using underlying PHQ-9-derived symptoms (individual colored lines). We see that a large number of symptoms (studied via the proxy of web search queries) are good predictors of the actual suicide rate. For example, the web search 'feeling empty' correlates 0.99 (Spearman correlation) with the official data g. For additional validation and context, we also see that an unrelated query such as 'chess strategies' has a very large anticorrelation (−0.93, p-value 0.0003). Interestingly, the 'bored' concept also has a significant anticorrelation (−0.82, p-value 0.003). We discuss this in the text below
Table 1 Pearson correlations between the monthly relative popularity of search queries and annual suicide incidents in the USA and Australia, respectively

Signal | USA | Australia
No interest | 0.65576, p = 0.005515 | 0.92904, p = 0.00010
Bored | −0.85999, p = 0.00294 | −0.8236, p = 0.0034
Feeling empty | 0.984581, p = 1.467e−6 | 0.9212, p = 0.000153
Feeling down | 0.94036, p = 0.0001609 | 0.9246, p = 0.00013
Depressed | 0.98318, p = 1.999e−6 | 0.9077, p = 0.00028
Feeling hopeless | 0.977343, p = 5.639e−6 | Insufficient data
Trouble sleeping | 0.7353024, p = 0.02398 | 0.64977, p = 0.04199
Insomnia | 0.9843048, p = 1.576e−6 | 0.18592, p = 0.6071
Sleeping too much | 0.7870307, p = 0.01183 | 0.7233, p = 0.01807
Tired | 0.9838177, p = 1.747e−6 | 0.92025, p = 0.00016
No energy | 0.985865, p = 1.09e−6 | 0.80068, p = 0.00055
No appetite | 0.975703, p = 7.189e−6 | 0.924973, p = 0.0001265
Eating too much | 0.9541436, p = 6.5e−5 | 0.8909, p = 0.8261
Feeling worthless | 0.9514336, p = 7.925e−5 | 0.74851, p = 0.01275
Can't focus | 0.987779, p = 6.565e−7 | 0.8904, p = 0.0006
Why am I slow | 0.91017, p = 6.565e−7 | Insufficient data
How to end life | 0.977373, p = 5.612e−6 | 0.8878, p = 0.000604
Leg ulcer | 0.952402, p = 7.393e−5 | 0.2102477, p = 0.5599
Chess strategies | −0.92897, p = 0.0002933 | −0.797602, p = 0.005705
Cancer | 0.3736022, p = 0.322 | 0.6683316, p = 0.03464
3.2 Imputing Monthly Near-Real-Time Resolution from Annual Ground Truth Data

The vast majority of ground truth mental health data (and in fact most epidemiological data) is available only with significant delays and at a limited resolution (countrywide, annual). Here we explore how the statistical model described in Methods can lift these limitations. We first impute the monthly suicide incidence rate by learning a mapping between search signals and annual suicide statistics. We then leverage the higher resolution of the search data to infer, using search data as noisy evidence, the most likely suicide rate for each month, shown in Fig. 2a, b. Further analyses assessing the presence of seasonality in the data from the USA and Australia are shown in Figs. 3 and 4. We take the suicide rate data at the new monthly granularity imputed by our model (top panels of Figs. 3 and 4) and decompose it into seasonal patterns (second panel), long-term trends (third), and residual fluctuations (fourth and final). Figure 5 shows the differences between the Northern and Southern hemispheres side by side. We see a stable periodic pattern over the 10 years and a clear inversion: in the USA, peaks of suicide occur in April and January; in Australia, two peaks are seen annually, in September and April.
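The coarse-to-fine imputation step can be sketched as follows: a model is fitted at the annual resolution and then run in inference mode at the monthly resolution. The data below are synthetic, and ordinary least squares stands in for the paper's elastic net model.

```python
import numpy as np

rng = np.random.default_rng(1)
# 10 years x 12 months of 3 query signals (synthetic relative popularity).
monthly_signals = rng.uniform(1.0, 2.0, size=(120, 3))
# Annual aggregates of the same signals, matching the ground truth resolution.
annual_signals = monthly_signals.reshape(10, 12, 3).mean(axis=1)
# Toy annual ground truth, linear in the aggregated signals.
annual_rate = annual_signals @ np.array([0.6, 0.3, 0.1])

# Fit on the 10 coarse-grained annual data points...
coef, *_ = np.linalg.lstsq(annual_signals, annual_rate, rcond=None)
# ...then impute 120 fine-grained monthly estimates from the monthly signals.
monthly_estimate = monthly_signals @ coef
```

By construction, the monthly estimates average back to the annual values they were fitted on, which is the consistency property one would want from such an imputation.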
3.3 Prediction

Since ground truth data is often unavailable for the most recent years, it is important to be able to provide more timely estimates of mental health indicators. To evaluate the predictive power of our approach, we fit model parameters on the first 2/3 of the available ground truth (6.7 years' worth of data) and then predict the withheld final 1/3 of the time period (the remaining 3.3 years). We find an accuracy of 83.3% and a correlation of 93.12% on the withheld data. Using only the most significant search term ("depressed"), the predictive accuracy is 60.4% and the correlation 86.6%. This indicates that the model captures salient features of search behavior related to suicide and is able to accurately predict over 3 years into the future.
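The holdout evaluation described above can be sketched as follows: fit on the first 2/3 of a series, predict the withheld final 1/3, and score by correlation. Data are synthetic, and plain least squares again stands in for the elastic net.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 4))                # 120 months of 4 signals (toy)
y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + 0.1 * rng.normal(size=120)

split = len(y) * 2 // 3                      # first 2/3 for fitting
coef, *_ = np.linalg.lstsq(X[:split], y[:split], rcond=None)
pred = X[split:] @ coef                      # predict the withheld 1/3
holdout_corr = np.corrcoef(pred, y[split:])[0, 1]
```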
4 Discussion

Internet search queries are a useful proxy for human behavior. Previous studies have used web signals to study human affect and high-risk behavior [23, 24]. We use Google as the main repository for our search queries as it is the most widely used Internet search engine in the world. With billions of searches per day, Google Trends provides a sample of users' web behavior. The dataset is normalized
Fig. 2 a Imputed monthly suicide rate in the USA, January 2008 to January 2018 (horizontal axis). Note that the currently available ground truth data for suicide would provide only a total of 9 data points on this graph, summarizing the 10 available years ending in 2018; in contrast, the chart shows 132 monthly data points on key mental health metrics that were previously inaccessible. b Imputed suicide rate in Australia, analogous to a, as a function of month
by the total searches of the geography and time range it represents, yielding relative popularity, which accounts for the population size within a region [25]. We apply our methodology in the USA and Australia because of their cultural similarities: they speak the same language and practice similar activities of daily living despite differing historical progressions, making the PHQ-9 rubric applicable in both. Yet they lie in opposite hemispheres, which, together with the novel monthly data granularity introduced in this work, allows us to study the effect of weather patterns
Fig. 3 Decomposition of USA data: the top subplot shows the monthly estimates of suicide incidences imputed from depression-related internet queries, the second the inferred seasonal component, the third the longitudinal trend, and the bottom subplot the residual. The monthly estimates are thereby automatically decomposed into a seasonal pattern, an overall increase of suicide over time, and a residual. With statistical modelling, we confirm previous findings in the literature that suicide rates are highest in springtime (April), and we observe the manifestation of seasonal affective disorder (SAD) [21]. Since the USA lies in the northern hemisphere, the peaks and troughs fall on winter and summer months, respectively. As we will see in Fig. 4, these patterns are inverted in the southern hemisphere, showing a link between the environment and the mechanisms of mental health epidemics. The highest residual spike, seen in August 2014, may be associated with the passing of a well-known celebrity (see Discussion)
on mental health. We acknowledge that there is no direct link between the PHQ-9 and suicide; instead, we infer an association by statistical modelling of the risk factors of depression, a well-known cause of suicide. To our knowledge, there is currently no available monthly-level suicide data in either the USA or Australia; American state-level data is only available up to the year 2018 [14]. We acknowledge that Internet data is often noisy and that the semantic context of search terms may be missing. We account for this by including many search terms in our statistical analyses and using machine learning to automatically determine their salience. As discussed in the Results section, only the search queries selected by our statistical model as most salient were included in the analyses. Further, using this model with the most salient queries and only 2/3 of the ground truth data (national suicide incidences), we were able to predict the following 1/3 of incidences with a relative accuracy of over 83%. Moreover, the popularity of certain
Fig. 4 Seasonal decomposition of Australian data, analogous to Fig. 3. The top panel shows the monthly estimates of suicide incidences imputed from depression-related internet queries, the second plot the inferred seasonal component, the third the longitudinal trend, and the bottom subplot the residual. With our statistical model, we confirm previous findings in the literature of suicide rates spiking in springtime (September) in the Southern hemisphere [22]
Fig. 5 Seasonal trends in USA (orange curve) and Australian (blue) imputed monthly data. Repeated patterns occur annually: in the USA, peaks occur in April and January; in Australia two peaks happen annually in September and April. Such fine-grained patterns were previously inaccessible because mental health data is available only at annual resolution
search queries reflects the demographics of the population making these searches: in Australia, the top related queries for 'bored' include 'bored studies' and 'bored of studies', while in the USA the term related most to 'what to do when bored' and 'when your (sic) bored'. This may suggest that the Australian Internet users searching for these concepts are of a younger, school-going age group, while users in the United States represent a broader age range. The density of Internet users in each country can also be inferred from the search volume: for example, the search terms 'feeling empty' and 'feeling worthless' did not yield sufficient data in Australia, reflecting a smaller population compared to the USA. We believe that our methodology is applicable in a variety of contexts to surmise timely data that could be used for better interventions and policy change: for example, we know from recent CDC reports that mental health, especially among the young, has been negatively impacted as a result of the ongoing COVID-19 pandemic [6]. Using our model, we can predict its impact and have primary and population-level interventions put in place. With current practice, this is only possible in hindsight, as ground truth data on mental health is expensive and time-consuming to collect. For example, at the time of writing, the most recent official suicide data from the USA and Australia is from 2018. In this study we show that a model built on available coarse-grained annual official data can be run in inference mode to impute fine-grained (monthly) data points that are otherwise unavailable. Currently, public health measures in suicide intervention typically include primary prevention with a population approach: training personnel to recognize individuals at risk for suicide, educating communities to practice good mental health habits, reducing access to means of suicide, and postvention protocols [26].
Programs such as 'As Safe As Possible' (ASAP) [27], which uses the smartphone app 'BRITE', have only been used as inpatient interventions following a suicide attempt. Using our approach, which helps with early and automatic detection, individuals in crisis or at risk could be contacted by public health agencies before any suicide attempt. To protect individual privacy, this could be made an opt-in feature on web browsers. Additionally, our population-level data allows for real-time syndromic surveillance without compromising individual privacy. These actions warrant deeper discussions between psychiatrists, psychologists, mental health workers, and the general population regarding the ethical and legal implications of authorities intervening with individual choices [28]. Figures 2, 3, 4 and 5 give us an unprecedented view into suicide patterns at a population scale. We are able to compare the suicide seasonality between the USA and Australia, which is impossible with existing ground truth data at an annual resolution. We further see that, even when controlling for these seasonal patterns, there is a definite long-term increase in suicide (Figs. 3 and 4), consistent with reports from the CDC and ABS [2, 5, 14]. In the USA, the overall increase in suicide incidence can be explained in part by an increased number of suicides among young people: death as a result of intentional self-harm is currently the second most common cause of mortality, after accidents, among young people between the ages of 10 and 24 [14]. The gender gap in suicides between male and female youth has also narrowed, though the number of suicides remains greater in males [5, 14].
Additionally, an increasingly older population at risk for depression raises the incidence of suicide overall, with multifactorial reasons for the geriatric population engaging in self-harm and suicide: depression, loneliness, loss of control, and previous episodes of self-harm [21]. Figure 5 demonstrates the seasonality of suicide in the opposite hemispheres. Studying the peaks for each country: in the USA, there are repeating annual peaks with decreasing amplitude in the months of April and December/January, corresponding to springtime and winter in the northern hemisphere; in Australia, two peaks are seen annually, in September and April, corresponding to springtime and fall in the southern hemisphere. The lowest troughs for the USA occur in July of every year, and in Australia every January and June, July and January being summer in the respective hemispheres. There is also a brief recurring drop in depression search queries between December and January within the USA; perhaps internet users were taking a break for the holiday season. The peaks of search queries in April for the USA and September for Australia agree with the current literature, which reports suicide incidence as being highest in springtime in both hemispheres [22, 29]. The secondary peaks in December and January for the USA can be explained by the length of daylight (these are the months with the shortest days) and by seasonal affective disorder (SAD) [22, 23, 30, 31]. Every July, however, the search trends for depression in the USA reach their nadir. This pattern is similarly observed in Australia, where the lowest points occur every January, corresponding to the days with the longest daylight; this finding can be explained by SAD, as discussed by Rock et al. [22]. Variations in sunlight exposure at increasing latitude, which affect photoperiodicity, have been studied as explanations for the seasonal patterns of suicide distribution [32–34].
This was true in studies performed in the Northern (e.g., New York, USA) [35] and Southern hemispheres (New South Wales, Australia) [36]. As the duration of the day varies with shifts in latitude, seasonal patterns in suicide deaths should diminish in equatorial countries, as was demonstrated by Parker et al. in Singapore [33]. On a physiological basis, there have been studies attempting to define a biological link between seasonality and the production of neurotransmitters such as serotonin and melatonin [37, 38]. Indeed, the rate of serotonin production was shown to rise with increasing sunlight intensity. Studying the residual pattern (bottom panel in Fig. 3), some dates stand out: in August 2014, the well-renowned American actor and comedian Robin Williams committed suicide. This took the world by surprise and led to an unusually high spike in search queries pertaining to depression in both countries. This is just one example of how world events are reflected on the internet; it does not necessarily mean that in that particular month there was also an increase in suicide incidence in the United States. In fact, there was no additional increase that year over what would be expected from the long-term trend shown in Fig. 3. From our regression and predictive power analysis, we see that our model is robust to such potential confounders and singular events, primarily because it leverages multiple signals jointly and identifies salient versus transient signals by automatically analyzing historical data.
The residual spikes in Australia are more evenly distributed throughout the year, reflecting generally broader confidence intervals on the Australian inferences due to a significantly lower data density compared to the USA. The high predictive accuracy and the strong correspondence between the search behaviors studied here and suicide rates suggest that they can be used in tandem with the currently available annual suicide data to estimate future incidences with greater granularity, allowing for timely prevention strategies.
5 Conclusion

This work shows that Internet search behaviors reflecting documented risk factors of suicide can be used to infer a fine-grained view of suicide incidence, and that they have utility in providing an accurate long-term forecast of likely future incidents. Additionally, we have demonstrated the seasonality of suicide and the impact of SAD in opposite hemispheres, agreeing with a number of independently performed clinical studies that previously arrived at this conclusion using classical approaches [39–42]. These findings are significant not only in shedding additional light on the association among web behavior, depression, and suicide, but also on the seasonal patterns of mental health globally. We view this methodology as a scalable and timely complement to traditional techniques for monitoring and studying mental health problems.
References

1. American Psychiatric Association: DSM History [Internet]. Washington, DC: American Psychiatric Association (2018) [cited 12 Dec 2018]. Available from: https://www.psychiatry.org/psychiatrists/practice/dsm/history-of-the-dsm
2. Hedegaard, H., Curtin, S.C., Warner, M.: Suicide mortality in the United States, 1999–2017. NCHS Data Brief, no. 330, pp. 1–7 (Nov 2018)
3. Office for National Statistics (UK): Suicides in the UK: 2017 registrations (2018) [cited 12 Dec 2018]. Available from: https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/bulletins/suicidesintheunitedkingdom/2017registrations
4. Yeginshu, C.: U.K. Appoints Minister for Suicide Prevention [Internet]. The New York Times (2018) [cited 26 Mar 2020]. Available from: https://www.nytimes.com/2018/10/10/world/europe/uk-minister-suicide-prevention.html
5. Australian Bureau of Statistics: Causes of Death, Australia, 2017. ACT (AU): Australian Bureau of Statistics (2018) [cited 12 Dec 2018]. Available from: http://www.abs.gov.au/ausstats/[email protected]/Lookup/by%20Subject/3303.0~2017~Main%20Features~Australia's%20leading%20causes%20of%20death,%202017~2
6. Czeisler, M.É., Lane, R.I., Petrosky, E., et al.: Mental health, substance use, and suicidal ideation during the COVID-19 pandemic—United States, June 24–30, 2020. MMWR Morb. Mortal. Wkly. Rep. 69, 1049–1057 (2020). https://doi.org/10.15585/mmwr.mm6932a1
7. Mangles, C.: Search engine statistics (2018) [cited 12 Dec 2018]. Available from: https://www.smartinsights.com/search-engine-marketing/search-engine-statistics/
8. Sasikiran, K., Shaman, J.: Reappraising the utility of Google Flu Trends. PLoS Comput. Biol. (2019). https://doi.org/10.1371/journal.pcbi.1007258
9. Sadilek et al.: Machine-learned epidemiology: real-time detection of foodborne illness at scale. npj Digital Med. (2018). https://doi.org/10.1038/s41746-018-0045-1
10. Gluskin, R.T., Johansson, M.A., Santillana, M., Brownstein, J.S.: Evaluation of internet-based dengue query data: Google Dengue Trends. PLoS Negl. Trop. Dis. 8(2), e2713 (2014). https://doi.org/10.1371/journal.pntd.0002713
11. Tran, U.S., Andel, R., Niederkrotenthaler, T., Till, B., Ajdacic-Gross, V., Voracek, M.: Low validity of Google Trends for behavioral forecasting of national suicide rates. PLoS ONE 12(8), e0183149 (2017). https://doi.org/10.1371/journal.pone.0183149
12. Kroenke, K., Spitzer, R.L., Williams, J.B.W.: The PHQ-9: validity of a brief depression severity measure. J. Gen. Intern. Med. 16, 606–613 (2001)
13. American Psychiatric Association: Diagnostic and Statistical Manual of Mental Disorders: DSM-5. American Psychiatric Publishing, Arlington, VA (2013)
14. Centers for Disease Control and Prevention: Injury Prevention and Control—Data and Statistics (WISQARS) (3 Dec 2018) [cited 19 Dec 2018]. Available from: https://www.cdc.gov/injury/wisqars/index.html
15. World Health Organization: ICD-10 Classifications of Mental and Behavioural Disorder: Clinical Descriptions and Diagnostic Guidelines. World Health Organization, Geneva (1992)
16. World Health Organization: International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10), 2008th edn. Geneva, Switzerland (2009)
17. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. B (Methodol.) 58, 267–288 (1996). https://doi.org/10.1111/j.2517-6161.1996.tb02080.x
18. Liu, W., Li, Q.: An efficient elastic net with regression coefficients method for variable selection of spectrum data. PLoS ONE 12(2), e0171122 (2017). https://doi.org/10.1371/journal.pone.0171122
19. Rothman, K.J.: Occam's razor pares the choice among statistical models. Am. J. Epidemiol. 108(5), 347–349 (1978)
20. Misono, S., Weiss, N.S., Fann, J.R., Redman, M., Yueh, B.: Incidence of suicide in persons with cancer. J. Clin. Oncol. 26(29), 4731–4738 (2008). https://doi.org/10.1200/JCO.2007.13.8941
21. Troya, M.I., Babatunde, O., Polidano, K., Bartlam, B., McCloskey, E., Dikomitis, L., et al.: Self-harm in older adults: systematic review. Br. J. Psychiatry 214, 186–200 (2019)
22. Rock, D.: Increasing seasonality of suicide in Australia 1970–1999. Psychiatry Res. 120, 43–51 (2003)
23. Yang, A.C., Huang, N.E., Peng, C.K., Tsai, S.J.: Do seasons have an influence on the incidence of depression? The use of an internet search engine query data as a proxy of human affect. PLoS ONE 5, 1–7 (2003)
24. Arora, V.S., Stuckler, D., McKee, M.: Tracking search engine queries for suicide in the United Kingdom, 2004–2013. Public Health 137, 147–153 (2016)
25. Google: How Trends data is adjusted [Internet] (2019) [cited May 2019]. Available from: https://support.google.com/trends/answer/4365533?hl=en&ref_topic=6248052
26. Centers for Disease Control and Prevention: Youth Risk Behavior Surveillance—United States, 2009. Surveillance Summaries, June 4. MMWR 59(No. SS-5) (2010)
27. Kennard, B.D., Goldstein, T., Foxwell, A.A., McMakin, D.L., Wolfe, K., Biernesser, C., Moorehead, A., Douaihy, A., Zullo, L., Wentroble, E., Owen, V., Zelazny, J., Iyengar, S., Porta, G., Brent, D.: As safe as possible (ASAP): a brief app-supported inpatient intervention to prevent postdischarge suicidal behavior in hospitalized, suicidal adolescents. Am. J. Psychiatry 175(9), 864–872 (2018). https://doi.org/10.1176/appi.ajp.2018.17101151. Erratum in: Am. J. Psychiatry 176(9), 764 (2019)
28. Mishara, B.L., Weisstub, D.N.: Ethical and legal issues in suicide research. Int. J. Law Psychiatry 28, 23–41. https://doi.org/10.1016/j.ijlp.2004.12.006
29. Rock, D., Greenberg, D., Hallmayer, J.: Season-of-birth as a risk factor for the seasonality of suicidal behaviour. Eur. Arch. Psychiatry Clin. Neurosci. 256, 98–105 (2006)
30. Altamura, C.: Seasonal and circadian rhythms in suicide in Cagliari, Italy. J. Affect. Disord. 52, 77–85 (1999)
31. Dixon, P.G.: Effects of temperature variation on suicide in five U.S. counties, 1991–2001. Int. J. Biometeorol. 51, 395–403 (2007)
32. Heerlein, A., Valeria, C., Medina, B.: Seasonal variation in suicidal deaths in Chile: its relationship to latitude. Psychopathology 39, 75–79 (2006)
33. Parker, G., Gao, F., Machin, D.: Seasonality of suicide in Singapore: data from the equator. Psychol. Med. 31, 549–553
34. Benedito-Silva, A.A., Pires, M.L.N., Calil, H.M.: Seasonal variation of suicide in Brazil. Chronobiol. Int. 24(4), 727–737 (2007)
35. Lester, D.: Seasonal variation in suicidal deaths. Br. J. Psychiatry 118, 627–628 (1971)
36. Parker, G., Walter, S.: Seasonal variation in depressive disorders and suicidal deaths in New South Wales. Br. J. Psychiatry 140, 626–632 (1982)
37. Lambert, G.W., Reid, C., Kaye, D.M., Jennings, G.L., Esler, M.D.: Effect of sunlight and season on serotonin turnover in the brain. Lancet 360, 1840–1842 (2002)
38. Wehr, T.A., Duncan, W.C., Sher, L., Aeschbach, D., Schwartz, P.J., Turner, E.H., et al.: A circadian signal of change of season in patients with seasonal affective disorder. Arch. Gen. Psychiatry 58(12), 1108–1114 (2001)
39. Preti, A.: The influence of seasonal change on suicidal behaviour in Italy. J. Affect. Disord. 44, 123–130 (1997)
40. Souêtre, E.: Seasonality of suicides: environmental, sociological and biological covariations. J. Affect. Disord. 13, 215–225 (1987)
41. Kim, Y., Kim, H., Kim, D.S.: Association between daily environmental temperature and suicide mortality in Korea (2001–2005). Psychiatry Res. 186, 390–396 (2011)
42. Lee, H.C., Lin, H.C., Tsai, S.Y., Li, C.Y., Chen, C.C., Huang, C.C.: Suicide rates and the association with climate: a population-based study. J. Affect. Disord. 92, 221–226 (2006)
Lexical and Acoustic Correlates of Clinical Speech Disturbance in Schizophrenia

Rony Krell, Wenqing Tang, Katrin Hänsel, Michael Sobolev, Sunghye Cho, Sarah Berretta, and Sunny X. Tang
Abstract There is potential to leverage lexical and acoustic features as predictors of clinical ratings used to measure thought disorder and negative symptoms of schizophrenia. In this paper, segments of speech from individuals identified as having schizophrenia were extracted from publicly available educational videos and used accordingly. We explored correlations and fit a LASSO regression model to predict individual clinical measures from the language and acoustic features. We were able to predict poverty of speech, perseveration, and latency with R² values of 0.60, 0.54, and 0.62 on a held-out test set. We note and discuss the important predicting features as well as synchronicities between the correlation and prediction results.

Keywords Natural language processing · Clinical linguistics · Language · Psychosis · Acoustics · Coherence · Semantics

The first two authors contributed equally to this work.

R. Krell · W. Tang (B) · K. Hänsel · M. Sobolev
Cornell University, New York, NY, USA
e-mail: [email protected]
R. Krell e-mail: [email protected]
K. Hänsel e-mail: [email protected]
M. Sobolev e-mail: [email protected]

K. Hänsel · M. Sobolev · S. Berretta · S. X. Tang
Zucker Hillside Hospital/Feinstein Institutes for Medical Research, Northwell Health, Glen Oaks, NY, USA
e-mail: [email protected]
S. X. Tang e-mail: [email protected]

S. Cho
Linguistic Data Consortium, University of Pennsylvania, Philadelphia, PA, USA
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
A. Shaban-Nejad et al. (eds.), AI for Disease Surveillance and Pandemic Intelligence, Studies in Computational Intelligence 1013, https://doi.org/10.1007/978-3-030-93080-6_3
R. Krell et al.
1 Introduction

Schizophrenia is a debilitating mental disorder characterized by positive symptoms such as hallucinations and delusions, and negative (or deficit) symptoms, e.g., amotivation, alogia, and blunted affect [3]. Affecting between 1.5 and 3 million Americans, the condition is associated with a loss of life expectancy of between 15 and 25 years and a suicide rate of around five percent [15, 20]. Early diagnosis greatly increases the likelihood of successful treatment. The ability to self-report on the condition and symptoms is often impaired for individuals with schizophrenia [11]; thus, diagnosis relies on lengthy interviews and ratings by highly trained clinicians. Assessing the impairment of verbal communication is one of several diagnostic methods. Some aspects of assessment rely on the content of the patient's report, while others rely on how information (or a lack thereof) is conveyed, e.g., coherence, flattened tonality, or slow speech [12]. Currently, clinical assessment of language disturbance in schizophrenia is time-consuming and highly subjective. Computational methods can support the assessment of these linguistic and speech disturbances and make the process more scalable and objective. Our approach relied on openly available interview samples collected from YouTube, similar to that of Nour et al. [16]. We considered acoustic properties for predicting ratings of disordered speech.
2 Related Work

Researchers have investigated relationships between coherence and acoustic-related features and clinical scales such as the Scale for the Assessment of Negative Symptoms (SANS) and the Thought, Language, and Communication (TLC) scale [2, 4]. Natural language processing-derived correlates have been found for global ratings and factor scores, but relationships to individual items are not well established. Elvevåg et al. [7] found that higher global TLC ratings (reflecting more severe thought disorder), but not poverty of speech, were correlated with lower latent semantic analysis (LSA) coherence scores. Moschopoulos et al. [14] found that auditory processing disorder was significantly correlated with high TLC factor scores. Püschel et al. [19] predicted high versus low negative symptom severity on measures including the SANS at hospital release with 78.6% accuracy using total recording time, total length of utterances, number of pauses, mean energy per second, variation of energy per second, and F0 contour. Finally, Dombrowski et al. [6] found significant negative correlations for a subset of the TLC items (pressure of speech, derailment, incoherence, distractible speech, and tangentiality), as well as similarities in interviewer and subject emotional frequencies during speech, when subjects were placed under a stressful condition. In this paper, we extend existing knowledge by investigating how coherence and acoustic-related features relate to the individual items on the TLC and SANS clinical scales.
Lexical and Acoustic Correlates of Clinical Speech …
3 Methods

3.1 Dataset

We identified videos on YouTube depicting interviews with 20 individuals identified by the video or its description as having schizophrenia. Only educational videos with clinician-interviewers were included to optimize validity and reliability.[1] The total duration of interview content was 38 min and 6 s (SD = 21 s; 8 female and 12 male subjects). We transcribed the segments using WebTrans[2] and exported segment- and word-level timestamps. Snippets with ambiguous or hard-to-transcribe words were excluded to avoid errors. Incomplete words and sentences, repeated words and phrases, neologisms, and dysfluencies were noted. The snippets were rated by three experienced clinical assessors, and final ratings were verified through consensus review. Ratings included the 18-dimension TLC and the SANS items for decreased vocal inflection and increased latency of response [2, 4]; see Appendix Table 2 for details.
3.2 Data Processing and Analysis

Tables 2 and 3 summarize the features and clinical targets. Coherence features use similarity metrics between different representations of contiguous words and sentences to determine how closely related different segments of the subject's speech are to one another [9, 13, 18]. Coherence scores per subject were calculated using the incoherence metric from Iter et al. [10]. OpenSmile [8], an open-source audio processing program, was used to extract acoustic features (cf. Table 3) from the audio of the YouTube subjects. The configuration file used was myIS13_ComParE_8K.conf [1]. The Spearman correlation between clinical features and non-normally distributed language features was computed. A Least Absolute Shrinkage and Selection Operator (LASSO) regression model was trained on 80% of the dataset (16 subjects) to predict clinical targets—TLC and SANS scores—from 23 total features using 10-fold cross-validation on the training set. LassoCV from the Python scikit-learn library [17] was used to find the best α to fit the model on the training set. The model was then used to predict clinical targets for the held-out dataset (4 subjects) from the calculated features. All features were standardized using the StandardScaler from scikit-learn before the model was fit; the StandardScaler was fit on the training set and then used to transform both the training and test sets. R2 scores and mean absolute error (MAE) were calculated for the training and test sets.
[1] Some videos included multiple interviewed subjects. The list of videos from which samples were derived can be found here: https://pastebin.com/fz1smhzn.
[2] WebTrans is an online version of XTrans developed by the Linguistic Data Consortium; see https://www.ldc.upenn.edu/language-resources/tools/xtrans.
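The modeling pipeline described above can be sketched with scikit-learn. The arrays below are synthetic stand-ins for the 20-subject, 23-feature dataset (the actual feature matrix is not reproduced here); everything else mirrors the stated setup: an 80/20 split, a scaler fit on the training set only, and LassoCV with 10-fold cross-validation.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for the 20-subject, 23-feature dataset.
X = rng.normal(size=(20, 23))
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.1, size=20)

# 80/20 split: 16 training subjects, 4 held out.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=4, random_state=0)

# Standardize: fit on the training set only, then transform both sets.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# LassoCV picks alpha by 10-fold cross-validation on the training set.
model = LassoCV(cv=10, random_state=0).fit(X_train_s, y_train)

pred = model.predict(X_test_s)
print(f"alpha={model.alpha_:.3f}  "
      f"R2={r2_score(y_test, pred):.2f}  "
      f"MAE={mean_absolute_error(y_test, pred):.2f}")
```

With real ratings as targets, the same loop is simply repeated once per clinical target (poverty of speech, perseveration, inflection, latency).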
4 Results

With 20 clinical features and 23 language features, we found that eighteen of the computed correlations had an absolute value above 0.5; Fig. 1 depicts variables with at least one coefficient |rs| greater than 0.5. The strongest correlation was between mean_lsa and perseveration (TLC), with a score of −0.77. The LASSO model applied to the held-out test set yielded R2 scores of 0.60 and 0.54 for poverty of speech (TLC) and perseveration (TLC), and R2 scores of 0.18 and 0.62 for inflection (SANS) and latency (SANS). The coefficients for the model's selected features, along with R2 and MAE scores, are listed in Table 1. We reran the LASSO on the dataset without features that were very closely tied to the specific circumstances surrounding the interviews, namely duration and total_words. The results of the second run were similar to the first with a few important differences: one clinical target, perseveration, ceased to have a positive test R2 value, and the coefficient for the mean_lsa feature in predicting poverty of speech increased from 0.09 to 0.29. We highlight parallels between high correlations of non-clinical and clinical feature pairs and high coefficient values for the non-clinical features when predicting the clinical ones. Poverty of speech and total_words have a correlation of −0.70, and total_words has the highest-magnitude coefficient (−0.24) in predicting poverty of speech (TLC). When total_words and duration were removed as features, mean_lsa (which has a correlation of 0.69 with poverty of speech) became its most important predictor (coefficient value of 0.29). We also note a correlation of 0.58 between mean_lsa and latency, and a corresponding coefficient of 0.47 for mean_lsa as a predictor of latency. Finally, mean_lsa has a high negative correlation with perseveration (−0.77), which aligns with its high negative coefficient (−0.39) when acting as a predictor of that clinical measure.
Fig. 1 Spearman coefficients rs of the target variables
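Rank correlations of this kind can be computed with SciPy's spearmanr; the arrays below are illustrative stand-ins, not the study data, with the negative mean_lsa/perseveration relationship simulated for demonstration.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
mean_lsa = rng.normal(size=20)  # illustrative coherence scores for 20 subjects
# Illustrative clinical ratings, built with a negative relationship to coherence.
perseveration = -0.8 * mean_lsa + rng.normal(scale=0.4, size=20)

# Spearman's rank correlation is appropriate for non-normally distributed features.
rs, p = spearmanr(mean_lsa, perseveration)
print(f"rs = {rs:.2f} (p = {p:.3f})")
```

In the study, only feature/rating pairs with |rs| > 0.5 were retained for display in Fig. 1.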
Table 1 Results of the LASSO regression. The R2 and mean absolute error (MAE) are reported on the training and test sets

Poverty of speech (TLC): training R2 = 0.97, MAE = 0.07; test R2 = 0.60, MAE = 0.42
  Selected coefficients: total_words (−0.24), incomplete_words_frac (−0.14), filler_words_frac (−0.14), words_per_second_std (−0.12), words_per_second_mean (−0.04), shimmerLocal_sma_stdev (−0.01), F0final_sma_stdev (0.11), mean_lsa (0.09), tfidf_w2v (0.07)

Perseveration (TLC): training R2 = 0.48, MAE = 0.54; test R2 = 0.52, MAE = 0.48
  Selected coefficients: mean_lsa (−0.39)

Inflection (SANS): training R2 = 0.55, MAE = 1.01; test R2 = 0.17, MAE = 1.18
  Selected coefficients: total_words (−0.80), words_per_second_mean (−0.04)

Latency (SANS): training R2 = 0.64, MAE = 0.60; test R2 = 0.62, MAE = 0.88
  Selected coefficients: incomplete_words_frac (−0.08), F0final_sma_stdev (0.50), mean_lsa (0.47), incomplete_words_frac (−0.02)
5 Discussion

Overall, the simple correlations we identified in the data also serve as significant predictors of clinical features. Specifically, these results suggest that incoherence/disorganization speech disturbances should not only be considered separately from negative symptom disturbances like poverty of speech, but may in fact be anti-correlated with them. Our regression models adequately predicted poverty of speech (TLC), perseveration (TLC), and latency (SANS). The most important coefficients for features predicting poverty of speech were those for the total word count of the interview, the fraction of incomplete words, and the fraction of filler words. Perseveration was inversely related to mean_lsa coherence scores—which was surprising from a linguistic analysis perspective—as coherence measures similarity between adjoining sentences in a subject's response. Clinically, however, perseveration has been noted as one manifestation of disorganized speech in schizophrenia. The most important features for predicting latency were the standard deviation of the smoothed pitch contour as well as mean_lsa coherence scores. The former echoes the results of Püschel et al. [19], who found the pitch contour useful in predicting high versus low SANS scores. When the total number of words spoken was removed as a feature, mean_lsa coherence replaced it as the most important predictor of poverty of speech. It is interesting that mean_lsa coherence had a powerful positive effect as a predictor of both poverty of speech and latency—which are measures of alogia—while it had a large negative effect for perseveration. This further suggests that negative speech symptoms like poverty of speech do not necessarily co-occur with positive speech symptoms like incoherence. In fact, in our sample, individuals with negative speech symptoms appear more coherent. A notable negative finding was that average pause length was not found to be a predictor of latency, which is
similar to the finding by Cohen et al. [5] that pause length was not related to alogia. Finally, it was surprising that mean_lsa was the most powerful coherence coefficient across targets, since other coherence metrics such as tfidf_w2v have been found to be better at distinguishing clinical from control populations both in our previous work and by Iter et al. [10].
5.1 Limitations and Future Work

One major limitation of the study was its small dataset; upon further examination with additional methods, we found our results to be less robust. More data, including independent datasets, will be needed to ascertain which features are valid predictors and to avoid over-training. As a future direction, we plan to add reduced-dimensionality representations of more than 90 additional acoustic features to the model to expand beyond pitch, shimmer, and jitter. We also plan to explore how the features change as clinical interviews progress, specifically whether features change together or in opposite directions, and whether this can help predict clinical scores such as TLC and SANS ratings. Initial efforts in this direction have not shown significantly improved results. We hope eventually to build a model that provides direct assistance to clinicians in the diagnosis and treatment of schizophrenia.

We also recognize that bias can exist in our dataset for multiple reasons. First, videos were likely selected based on clinical features, such as clinical stability. However, our research does not seek to characterize a representative schizophrenia population; it focuses instead on connecting various features to clinical ratings, which should not be significantly affected by such bias. Second, during snippet selection we relied solely on our own judgment of whether a subject was a native American-English speaker, which may have led to mistakes. Even for native speakers, differences in dialect and in the language conventions of the interview years can influence the language and acoustic features. We do not expect such effects to be large for the features used in this paper, but in the future we would like either to apply stricter selection criteria to our dataset or to conduct the interviews ourselves so as to control these factors and minimize bias.
Our research compares people with schizophrenia to generally healthy individuals from a clinical perspective. We plan to extend the comparisons to individuals with other disorders that affect speech, such as autism spectrum disorder, speech impediments, hearing disabilities, and depression, to validate the robustness of the schizophrenia results.
6 Acknowledgements and Conflicts of Interest

We acknowledge Nathalie Moreno, who contributed to the clinical ratings. We also acknowledge Mark Liberman, James Fiumara, Christopher Cieri, and Jonathan Wright from the Linguistic Data Consortium, who helped with the logistics of transcription. Sunny X. Tang holds equity in North Shore Therapeutics and received research funding from Winterlight Labs. She is a consultant for Winterlight Labs and Neurocrine Biosciences. The other authors report no conflicts of interest.
Features

See Tables 2 and 3.

Table 2 Clinical targets (TLC, SANS [2, 4])

Poverty of speech (TLC): Restriction in the amount of spontaneous speech
Poverty of content of speech (TLC): Vague speech conveying little information
Pressure of speech (TLC): Increased spontaneous speech
Distractible speech (TLC): Easily distracted by ambient stimuli
Tangentiality (TLC): Replying to questions obliquely or in an irrelevant manner
Derailment (TLC): Ideas slip off into obliquely related topics
Incoherence (TLC): Speech is incomprehensible at times
Illogicality (TLC): Conclusions do not follow logical train of thought
Clanging (TLC): Word choice governed by sounds rather than meaning
Neologism (TLC): New word formations
Word approximation (TLC): Old words used in new, unconventional ways
Circumstantiality (TLC): Indirect speech; delay in reaching goal
Loss of goal (TLC): Failure to follow through to natural conclusion
Perseveration (TLC): Persistent repetition of words, ideas, or subjects
Echolalia (TLC): Echoes the words of the interviewer
Blocking (TLC): Interruption before a train of thought is completed
Stilted speech (TLC): Excessively formal or outdated quality of speech
Self-reference (TLC): Increased references of the participant back to themselves
Decreased inflection (SANS): Decreased prosody; monotonous, lacking emphasis
Increased latency (SANS): Delayed responses; abnormally long pauses
Table 3 Description of all the language features

(a) Transcription features
Total_words: Total number of words in a subject's responses
Incomplete_words_frac: Total number of incomplete words in a subject's responses divided by total_words
Repetitions_frac: Total number of repeated words in a subject's responses divided by total_words
Stop_words_frac: Total number of stop words∗ in a subject's responses divided by total_words
Filler_words_frac: Total number of filler words† in a subject's responses divided by total_words

(b) Coherence features
mean_lsa: Coherence score with LSA word embeddings and the mean method for sentence embeddings
mean_w2v: Coherence score with Word2Vec word embeddings and the mean method for sentence embeddings
mean_glove: Coherence score with GloVe word embeddings and the mean method for sentence embeddings
tfidf_glove: Coherence score with GloVe word embeddings and the TF-IDF method for sentence embeddings
tfidf_w2v: Coherence score with Word2Vec word embeddings and the TF-IDF method for sentence embeddings

(c) Acoustic features
F0final_sma_mean: The average of the smoothed fundamental frequency (pitch)
F0final_sma_stdev: The standard deviation of the smoothed fundamental frequency (pitch)
jitterLocal_sma_mean: The average of jitter (variability of pitch)
jitterLocal_sma_stdev: The standard deviation of jitter
shimmerLocal_sma_mean: The average of shimmer (variability of volume)
shimmerLocal_sma_stdev: The standard deviation of shimmer

(d) Text-alignment and pause features
Duration: Total length of time, in seconds, of a subject's responses
Words_per_second_mean: The average of total_words divided by duration
Words_per_second_std: The standard deviation of speed (words divided by response duration) across a subject's responses
Avg_pause_length: The average pause length in milliseconds
Avg_pause_length_long: The average length, in milliseconds, of pauses of at least 150 ms
Median_pause_length: The median pause length in milliseconds
Median_pause_length_long: The median length, in milliseconds, of pauses of at least 150 ms
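As a rough sketch of how the mean-pooled coherence features in panel (b) can be computed: embed each sentence by averaging its word vectors, then average the cosine similarity of adjacent sentence embeddings. The random vectors below are stand-ins for trained LSA/Word2Vec/GloVe embeddings, and the function names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {}  # random stand-ins for trained word embeddings (LSA/Word2Vec/GloVe)

def word_vec(w, dim=50):
    if w not in vocab:
        vocab[w] = rng.normal(size=dim)
    return vocab[w]

def sentence_vec(sentence):
    # "Mean method": average the word vectors of the sentence.
    return np.mean([word_vec(w) for w in sentence.lower().split()], axis=0)

def mean_coherence(sentences):
    # Average cosine similarity between adjacent sentence embeddings.
    vecs = [sentence_vec(s) for s in sentences]
    sims = [np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
            for a, b in zip(vecs, vecs[1:])]
    return float(np.mean(sims))

score = mean_coherence(["I went to the store",
                        "The store was closed",
                        "So I went home"])
print(score)  # a cosine-similarity average in [-1, 1]
```

The tfidf_* variants replace the plain average in sentence_vec with a TF-IDF-weighted average of the word vectors.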
References

1. Alhanai, T.: acousticfeatures-fhs. https://github.com/talhanai/acousticfeatures-fhs (2017)
2. Andreasen, N.: Scale for the assessment of negative symptoms (SANS). Brit. J. Psychiatr. 155(S7), 53–58 (1989). https://doi.org/10.1192/S0007125000291502
3. Andreasen, N., Flaum, M., Swayze, V., Tyrrell, G., Arndt, S.: Positive and negative symptoms in schizophrenia. Archives General Psychiatry 47, 615–621 (1990)
4. Andreasen, N.: Scale for the assessment of thought, language, and communication (TLC). Schizophrenia Bulletin 12(3), 473–482 (1986). https://doi.org/10.1093/schbul/12.3.473
5. Cohen, A.S., Schwartz, E., Le, T.P., Cowan, T., Kirkpatrick, B., Raugh, I.M., Strauss, G.P.: Digital phenotyping of negative symptoms: the relationship to clinician ratings. Schizophrenia Bulletin 1–10 (2020). https://doi.org/10.1093/schbul/sbaa065
6. Dombrowski, M., McCleery, A., Gregory, S.W., Docherty, N.M.: Stress reactivity of emotional and verbal speech content in schizophrenia. J. Nerv. Mental Disease 202(8) (2014). https://doi.org/10.1097/NMD.0000000000000169
7. Elvevåg, B., Foltz, P.W., Weinberger, D.R., Goldberg, T.E.: Quantifying incoherence in speech: an automated methodology and novel application to schizophrenia. Schizophrenia Res. 93(1–3), 304–316 (2007). https://doi.org/10.1016/j.schres.2007.03.001
8. Eyben, F., Wöllmer, M., Schuller, B.: Opensmile: the Munich versatile and fast open-source audio feature extractor. In: Proceedings of the International Conference on Multimedia (MM '10). ACM Press, Firenze, Italy (2010). https://doi.org/10.1145/1873951.1874246
9. Günther, F., Dudschig, C., Kaup, B.: LSAfun—an R package for computations based on latent semantic analysis. Behav. Res. Methods 47(4), 930–944 (2015). https://doi.org/10.3758/s13428-014-0529-0
10. Iter, D., Yoon, J., Jurafsky, D.: Automatic detection of incoherent speech for diagnosing schizophrenia. In: Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic, pp. 136–146 (2018)
11. Joseph, B., Narayanaswamy, J.C.: Insight in schizophrenia: relationship to positive, negative and neurocognitive dimensions. Indian J. Psychol. Med. 37(1) (2015). https://doi.org/10.4103/0253-7176.15079
12. Kuperberg, G.R.: Language in schizophrenia part 1: an introduction. Lang. Linguistics Compass 4(8) (2010). https://doi.org/10.1111/j.1749-818X.2010.00216.x
13. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space (2013)
14. Moschopoulos, N., Nimatoudis, I., Kaprinis, S., Sidiras, C., Iliadou, V.: Auditory processing disorder may be present in schizophrenia and it is highly correlated with formal thought disorder. Psychiatry Res. 291 (2020). https://doi.org/10.1016/j.psychres.2020.113222
15. NIMH: Schizophrenia (2018). https://www.nimh.nih.gov/health/statistics/schizophrenia.shtml
16. Nour, M.M., Nour, M.H., Tsatalou, O.M., Barrera, A.: Schizophrenia on YouTube. Psychiatric Serv. 68(1), 70–74 (2017). https://doi.org/10.1176/appi.ps.201500541
17. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
18. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)
19. Püschel, J., Stassen, H., Bomben, G., Scharfetter, C., Hell, D.: Speaking behavior and speech sound characteristics in acute schizophrenia. J. Psychiatric Res. 32(2) (1998). https://doi.org/10.1016/S0022-3956(98)00046-6
20. SARDAA: The Truth About Schizophrenia (2010). https://www.slideshare.net/SARDAA/thetruth-about-schizophrenia
A Prognostic Tool to Identify Youth at Risk of at Least Weekly Cannabis Use Marie-Pierre Sylvestre, Simon de Montigny, Laurence Boulanger, Danick Goulet, Isabelle Doré, Jennifer O’Loughlin, Slim Haddad, Richard E. Bélanger, and Scott Leatherdale
Abstract We developed and validated an 8-item prognostic tool to identify youth at risk of initiating frequent (i.e., at least weekly) cannabis use in the next year. The tool, which aims to identify youth who would benefit most from clinician intervention, can be completed by the patient or clinician using a computer or smart phone application prior to or during a clinic visit. Methodological challenges in developing the tool included selecting a parsimonious model from a set of correlated predictors with missing data. We implemented Bach's bolasso algorithm, which combines lasso with the bootstrap, and investigated the performance of the prognostic tool in new data collected in a different time period (temporal validation) and in another location (geographic validation). The tool showed adequate discrimination ability, as reflected by a c-statistic above 0.8, in both validation samples. Most predictors selected into the tool pertained to substance use, including use of cigarettes, e-cigarettes, alcohol, and energy drinks mixed with alcohol, rather than to mental or physical health.

Keywords Cannabis use · Adolescents · Prognostic tool

M.-P. Sylvestre (B) · S. de Montigny · D. Goulet · J. O'Loughlin
Université de Montréal, 7101 avenue du Parc, Montréal, Canada
e-mail: [email protected]
S. de Montigny
e-mail: [email protected]
D. Goulet
e-mail: [email protected]
J. O'Loughlin
e-mail: [email protected]
L. Boulanger
Centre de recherche du CHUM, 800 St-Denis, Montréal, Canada
e-mail: [email protected]
I. Doré
Université de Montréal, 2100, boul. Édouard-Montpetit, Montréal, Canada
e-mail: [email protected]
S. Haddad · R. E. Bélanger
Université Laval, 1050, avenue de la Médecine, Québec, Canada
e-mail: [email protected]
R. E. Bélanger
e-mail: [email protected]
S. Leatherdale
University of Waterloo, 200 University Avenue West, Waterloo, Canada
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
A. Shaban-Nejad et al. (eds.), AI for Disease Surveillance and Pandemic Intelligence, Studies in Computational Intelligence 1013, https://doi.org/10.1007/978-3-030-93080-6_4
1 Introduction

Canadian youth have one of the highest prevalence rates of cannabis use among developed countries, with a past-12-month prevalence of 18% in 2018–19 [1]. The average age at first cannabis use in Canada is 14 years [1], and about one in six individuals who try cannabis during adolescence goes on to develop problematic use [2]. Age of onset and frequency of cannabis use during this critical developmental period [4] are strongly associated with adverse cannabis-related health impacts, including detrimental effects on the structure and function of the brain [3]. Addressing this issue before an adolescent becomes a frequent cannabis user could increase the likelihood of successful intervention [5]. However, although validated screening tools exist [6], they generally aim to identify problematic cannabis users (i.e., individuals already on the pathological spectrum of use disorder) [7]. No validated tool to date aims to identify adolescents at risk of weekly use (i.e., before pathological use is established), a key milestone in the natural course of problematic cannabis use that is strongly associated with adverse health outcomes. The objective of this paper is to describe the development and validation of a short, easy-to-administer screening tool to identify youth who are at the greatest risk of initiating weekly cannabis use in the next year. The tool is intended for use in clinical practice, since clinicians report lack of time (in addition to lack of knowledge about available tools) as a major barrier to discussing psychoactive substance use with adolescents [8]. The tool can be completed using an online application prior to or during a clinic visit so that clinicians can identify who would benefit most from intervention. A recent study of the perceptions of pediatric primary care providers regarding computer-administered screening tools for substance use suggested high utility, acceptability, and feasibility [9].
2 Methods

The current study adheres to the TRIPOD statement for the development and validation of prediction models [10].

Data source

Data were drawn from COMPASS, an ongoing prospective study (inception 2012–13) of grade 9–12 students in a convenience sample of Canadian high schools which
was designed to investigate how changes in the school environment and in provincial, territorial, and national policies affect youth health behaviours [11]. Students complete in-class self-report questionnaires annually that assess demographics, health behaviours, and school-related characteristics [11]. The current study uses data from the 61 schools sampled in Ontario (ON) in 2016–18 and the 36 schools sampled in Québec (QC) in 2017–18. The University of Waterloo Research Ethics Board and the Research Ethics Review Board of the Centre intégré universitaire de santé et de services sociaux de la Capitale-Nationale approved the COMPASS study in Ontario and Québec, respectively. Students were recruited in participating schools using active-information passive-consent permission protocols [12]. Parents/guardians received an information letter about the COMPASS study by mail and could opt out of the study by emailing or calling the COMPASS recruitment coordinator. Students could withdraw from the study at any time during the consent or data collection procedures without prejudice [11].

Study variables

Frequency of cannabis use was measured using the question "In the last 12 months, how often did you use marijuana or cannabis (a joint, pot, weed, hash)?" (never; not in the past 12 months; less than once a month; once a month; 2 or 3 times per month; once a week; 2 or 3 times per week; 4–6 times per week; every day). Participants were categorized as at least weekly cannabis users (yes, no) if they used cannabis at least weekly. A total of 45 potential predictor variables were selected based on the literature on risk factors for weekly or more frequent cannabis use, and included sociodemographic characteristics, indicators of substance use (i.e., cannabis, alcohol, tobacco and nicotine), personality traits, mental health, school connectedness, bullying/victimization, academic achievement, and health behaviours (e.g., physical activity, nutrition, sleep).
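The dichotomization of the outcome described above can be sketched as follows; the response labels are transcribed from the questionnaire options quoted in the text, and the helper name is ours.

```python
# Response options indicating at least weekly use, per the questionnaire item
# quoted in the text.
WEEKLY_OR_MORE = {
    "once a week",
    "2 or 3 times per week",
    "4-6 times per week",
    "every day",
}

def at_least_weekly_user(response: str) -> int:
    """Binary outcome coding: 1 indicates at least weekly cannabis use."""
    return int(response.lower() in WEEKLY_OR_MORE)

print(at_least_weekly_user("2 or 3 times per month"))  # 0
print(at_least_weekly_user("every day"))               # 1
```

The 45 predictors were coded analogously as binary indicators, with '1' marking the presence of each factor.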
Most potential predictor variables were coded as binary indicators to facilitate administration of the prognostic tool, with a value of '1' indicating the presence of the factor.

Data preprocessing

We created three analytical samples, each with two waves of data collection. The training sample was drawn from 13,759 participants who completed questionnaires in 2017 and 2018 in Ontario. A subset of 9174 participants had complete data on the 45 predictors and was used to train the prognostic tool. We assumed data were missing completely at random, but we also considered multiple imputation under the assumption that data were missing at random, as described in the sensitivity analyses. A temporal validation sample including 13,652 participants who completed questionnaires in Ontario in 2016 and 2017 was used to assess variation in the performance of the tool in the same location (Ontario) but in a different time period. The 2017–18 Ontario sample was selected for training the tool because it contained mental health variables that were not available in the 2016–17 Ontario sample. Finally, a geographical validation sample included 9435 participants who completed questionnaires in 2017–18 in Québec. This sample was used to test transferability of the model to other locations (i.e., in the same time period during which the prognostic tool was developed). A total of 6199 participants provided data for both the training and temporal
validation. However, the sample used for geographical validation was independent of the training sample and thus represents an external validation. Predictors were measured in the first wave of each sample, and the event (i.e., cannabis use at least weekly) was assessed a year later, in the second wave. Adolescents who had already used cannabis at least weekly in the first wave of each sample were excluded.

Algorithm

Selection of predictors was undertaken in the training sample using the bolasso algorithm proposed by Bach [13], which combines the variable selection algorithm of lasso (i.e., least absolute shrinkage and selection operator) with bootstrap aggregating (i.e., bagging) to improve the stability and accuracy of the prediction. Use of the bootstrap also addresses the potential issue of strongly correlated predictors, which may lead to inconsistent estimators with lasso [13]. As with lasso, model selection is performed by penalizing coefficients of less influential variables to exactly zero. However, with bolasso, a variable enters the final model only if selected in most bootstrapped copies. Our implementation of the bolasso algorithm uses 100 bootstrap copies and required variables to be selected in 99 of the 100 bootstrap copies to be included in the final model. In each of the 100 bootstrap copies, the hyperparameter controlling the level of shrinkage was selected by maximizing the AUC statistic using ten-fold cross-validation. The final value for the hyperparameter was obtained by averaging over the 100 resulting values. Coefficients of the variables selected by the bolasso algorithm were estimated using logistic regression. A decision rule to identify adolescents at risk of initiating cannabis use at least weekly in the next year was derived using a utility-based approach that emphasized sensitivity over specificity.
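A minimal sketch of the bolasso selection step, under stated simplifications: synthetic data stands in for the COMPASS predictors, and the shrinkage level is fixed at an illustrative value rather than cross-validated within each bootstrap copy and averaged as in the study. Scikit-learn's L1-penalized logistic regression plays the role of lasso for the binary outcome.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p = 500, 10
X = rng.normal(size=(n, p))
# Synthetic binary outcome driven by the first two predictors only.
logit = 1.5 * X[:, 0] - 1.2 * X[:, 1] - 1.0
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

B, required = 100, 99
counts = np.zeros(p, dtype=int)
for _ in range(B):
    idx = rng.integers(0, n, size=n)  # bootstrap copy (sampling with replacement)
    lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.05)
    lasso.fit(X[idx], y[idx])
    counts += (lasso.coef_.ravel() != 0).astype(int)

# A variable enters the final model only if selected in >= 99 of 100 copies.
final_vars = np.flatnonzero(counts >= required)
print("selected predictors:", final_vars)

# Coefficients of the selected variables re-estimated by (effectively
# unpenalized) logistic regression, as in the paper.
final = LogisticRegression(C=1e6).fit(X[:, final_vars], y)
print("coefficients:", np.round(final.coef_.ravel(), 2))
```

The strict 99-of-100 threshold is what stabilizes the selection: noise variables may enter individual bootstrap fits, but rarely survive in nearly all of them.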
This is warranted when the intervention (e.g., counselling) is not invasive, such that intervening with low-risk adolescents is a less important problem than failing to intervene with those at risk. Specifically, we selected the lowest threshold that maximized specificity under the constraint that sensitivity be at least 0.8 in the training sample.

Assessment of predictive ability

Discrimination was measured using the c-statistic. Calibration plots were used to assess the level of agreement between observed and predicted cannabis use at least weekly. Model accuracy was measured using the Brier score, a measure that combines components of both discrimination and calibration. Finally, Spiegelhalter's z-test was used to test the calibration component of the Brier score, with rejection of the null hypothesis suggesting poor calibration. The performance of the prognostic tool was assessed in the temporal validation sample. In addition, the prognostic tool was compared to a more parsimonious model reflecting data that clinicians could extract from charts and/or ask during a routine visit (i.e., age, sex, and the questions on cannabis use selected by the bolasso algorithm).

Model update and recalibration

Coefficients for the prognostic tool were re-estimated in the combined training and temporal validation samples, resulting in an updated model and decision rule. Then, the coefficient corresponding to the intercept in the updated prognostic tool was
refitted for Québec to reflect the difference in the prevalence of at least weekly cannabis use compared to Ontario. This recalibration was done by calculating the difference between the predicted and observed prevalence of at least weekly cannabis use in the geographical validation sample. Recalibration is preferred over developing new tools, since new models waste data, are prone to over-optimism, and can contribute to a proliferation of non-validated models, limiting uptake of such tools [16].

Sensitivity analyses

A total of six sensitivity analyses were performed to investigate the robustness of our findings. First, we investigated the stability of the selection of predictors under an alternative coding of the binary indicators that minimized loss of information by choosing the cut-point for dichotomization that led to a distribution as close as possible to a 50–50 split. Second, we considered inclusion of two-way interaction terms between predictors in the bolasso algorithm to investigate the appropriateness of a linear representation of the predictors. Third, we investigated the stability of the model by considering a more lenient threshold of 95 for the number of the 100 bootstrap copies in which a predictor had to be selected. Fourth, we assessed the impact of the decision rule on sensitivity and specificity by implementing the more conventional Youden index to identify the threshold that optimized both sensitivity and specificity in the training sample. Fifth, we assessed the benefit of sex-specific versions of the prognostic tool by conducting analyses in sex-stratified subsamples of the training sample and by testing the performance of the proposed model separately in each sex. Sixth, in a sensitivity analysis that assumed a missing-at-random process, we used multiple imputation by chained equations [14] to impute missing values in 10 imputation sets using the 45 prognostic factors in addition to the outcome variable.
Missing values were imputed by province and by year, using the entire sample and the original coding for each variable. The bolasso algorithm was adapted as follows: (i) the shrinkage hyperparameter was selected in each of the 10 imputed datasets; and (ii) a variable was selected into the prognostic tool if it was selected in all imputed datasets. Rubin's rules were applied to summarize the estimated coefficients and standard errors obtained from fitting a logistic regression on the selected variables in each imputed dataset. The data analysis was performed using R version 3.6.3 with the glmnet, rms, plyr, dplyr, tidyr, summarytools, pROC, mice and mitools packages.
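Rubin's rules combine the per-imputation estimates into one coefficient and standard error by pooling the within- and between-imputation variances. The authors used R's mitools for this step; the following is a minimal illustrative sketch in Python (the function name and example numbers are invented for illustration):

```python
import numpy as np

def rubin_pool(estimates, std_errors):
    """Pool one coefficient across m imputed datasets via Rubin's rules.

    estimates, std_errors: length-m sequences holding the per-imputation
    coefficient estimate and its standard error for a single variable.
    Returns the pooled estimate and pooled standard error.
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    m = len(estimates)
    pooled_est = estimates.mean()                        # average of the estimates
    within_var = (std_errors ** 2).mean()                # mean within-imputation variance
    between_var = estimates.var(ddof=1)                  # between-imputation variance
    total_var = within_var + (1 + 1 / m) * between_var   # Rubin's total variance
    return pooled_est, float(np.sqrt(total_var))

# Hypothetical coefficients for one predictor from m = 10 imputations
est, se = rubin_pool(
    [0.49, 0.51, 0.48, 0.50, 0.52, 0.49, 0.50, 0.51, 0.47, 0.50],
    [0.05] * 10,
)
```

Note that the pooled standard error always exceeds the average within-imputation standard error whenever the estimates disagree across imputations, which is how the uncertainty due to missing data enters the reported coefficients.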
3 Results

The 1-year cumulative incidence of cannabis use at least weekly was 6.3% in the training sample, but lower in the geographical validation sample (3.0%), which included a larger proportion of younger participants. Once standardized to the age distribution of the training sample, the 1-year cumulative incidence of cannabis use at least weekly was 4.2% in the geographical validation sample. The sample-specific distributions of selected predictors are presented in Table 1.
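The age-standardized figure follows from direct standardization: the age-specific incidences observed in the validation sample are averaged using the training sample's age distribution as weights. A minimal Python sketch; the weights below are the training-sample age distribution from Table 1, while the stratum-specific rates are invented placeholders, since they are not reported here:

```python
# Direct standardization: weight stratum-specific incidence in the target
# sample by the reference (training) age distribution.

def standardize(stratum_rates, reference_weights):
    """Weighted average of stratum-specific rates; weights must sum to 1."""
    assert abs(sum(reference_weights) - 1.0) < 1e-9
    return sum(r * w for r, w in zip(stratum_rates, reference_weights))

# Hypothetical age-specific 1-year incidences (ages 13 through >=18) in the
# validation sample; these are NOT the study's values.
rates = [0.01, 0.02, 0.04, 0.05, 0.06, 0.07]

# Reference weights: training-sample age distribution (Table 1), ages 13
# through >=18 (the <=12 stratum has weight 0.0 and is omitted).
weights = [0.013, 0.320, 0.346, 0.255, 0.061, 0.005]

standardized = standardize(rates, weights)
```

The same weights applied to the validation sample's true stratum-specific rates yield the 4.2% reported above.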
42
M.-P. Sylvestre et al.
Table 1  Selected descriptive statistics for the training sample (Ontario 2017–18), temporal validation sample (Ontario 2016–17) and geographic validation sample (Québec 2017–18), COMPASS study

                                                            Training sample   Temporal validation  Geographic validation
                                                            Ontario 2017–18   sample               sample
                                                            n = 11,792        Ontario 2016–17      Québec 2017–18
                                                                              n = 11,743           n = 8347
  1-year cumulative incidence of regular cannabis use, %    6.3               5.4                  3.0
  Sociodemographics
    Age (years), %
      ≤12                                                   0.0               0.0                  12.0
      13                                                    1.3               1.4                  24.1
      14                                                    32.0              30.7                 24.6
      15                                                    34.6              34.9                 24.5
      16                                                    25.5              25.6                 13.4
      17                                                    6.1               6.8                  1.3
      ≥18                                                   0.5               0.5                  0.2
    Male, %                                                 46.5              46.6                 43.6
  Substance use
    Used cannabis in the past 12 months, %                  11.5              26.0                 7.0
    Ever tried cigarettes, %                                11.3              11.5                 13.6
    Smoked a cigarette in the past 30 days, %               3.2               7.4                  3.6
    Ever tried e-cigarettes, %                              24.8              17.2                 31.9
    Used e-cigarettes in the past 30 days, %                12.0              15.9                 15.1
    Ever tried alcohol, %                                   66.6              77.0                 69.7
    Used alcohol weekly, %                                  3.9               8.6                  4.7
    Drink high-energy drinks weekly, %                      9.0               11.3                 8.4
    Mixed alcohol with an energy drink in past 12 months, % 5.7               5.5                  7.4
Table 2  Estimated model coefficients for the variables selected by bolasso with and without multiple imputation, COMPASS 2016–2018

                                                              Coefficient (SE)
  Variable                                                    Complete cases   Imputed data
  Age (years)                                                 −0.21 (0.03)     −0.12 (0.04)
  Male                                                        0.44 (0.06)      0.46 (0.08)
  Time since first use of cannabis (years)                    0.61 (0.04)      0.49 (0.05)
  Ever tried e-cigarettes                                     0.95 (0.07)      1.03 (0.09)
  Perceived easiness of obtaining cannabis (yes)              0.87 (0.07)      0.94 (0.55)
  Ever tried cigarettes                                       0.81 (0.07)      0.74 (0.09)
  Failed last math and/or last English/French class(es)       0.50 (0.07)      0.55 (0.10)
  Mixed alcohol with an energy drink in the last 12 months    0.49 (0.09)      0.36 (0.10)
  Drink high-energy drinks weekly                             NA               0.38 (0.10)
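Given coefficients like those in Table 2, an individual's predicted risk follows from the fitted logistic model: the sigmoid of the linear predictor. A minimal Python sketch; the intercept is a placeholder (it is not reported in the table) and the covariate values describe an invented individual:

```python
import math

def predicted_risk(intercept, coefs, features):
    """Logistic-regression risk: sigmoid of the linear predictor."""
    linear = intercept + sum(b * x for b, x in zip(coefs, features))
    return 1.0 / (1.0 + math.exp(-linear))

# Complete-case coefficients from Table 2, in order: age, male, years since
# first cannabis use, ever tried e-cigarettes, perceived easiness of obtaining
# cannabis, ever tried cigarettes, failed last class(es), mixed alcohol with
# an energy drink.
coefs = [-0.21, 0.44, 0.61, 0.95, 0.87, 0.81, 0.50, 0.49]
intercept = -3.0  # placeholder: the fitted intercept is not reported here

# Hypothetical 15-year-old male who first used cannabis a year ago, has tried
# e-cigarettes, and perceives cannabis as easy to obtain.
x = [15, 1, 1, 1, 1, 0, 0, 0]
risk = predicted_risk(intercept, coefs, x)
```

Because the model is additive on the log-odds scale, setting any binary predictor from 0 to 1 multiplies the individual's odds by exp of that coefficient, e.g. roughly 2.6 for ever trying e-cigarettes (exp 0.95).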
The prognostic tool derived using the training sample included 8 predictors of initiating at least weekly cannabis use. In addition to sex and age, it included one predictor pertaining to school performance (i.e., failing their last math and/or English/French class(es)). The remaining five predictors captured substance use: years since first use of cannabis, perceived easiness of obtaining cannabis, ever smoking cigarettes, ever using e-cigarettes, and mixing alcohol with energy drinks. The estimated coefficients for each selected predictor are shown in the left column of Table 2. The larger estimated coefficients suggested that the predictions were most heavily affected by ever trying e-cigarettes, followed by the perception that it was easy to obtain cannabis.

Performance statistics for each sample are shown in Table 3. The overall accuracy of prediction was satisfactory, as suggested by the low Brier score in the training and validation samples. The tool also showed adequate discrimination, as reflected by a c-statistic above 0.8 in all samples. The prognostic tool performed better than the reduced model that included age, sex and the two cannabis-related variables (i.e., likelihood ratio test p-value