Medical Image Computing and Computer-Assisted Intervention - MICCAI 2016, 19 conf., part 1 978-3-319-46719-1, 3319467190, 9783319467221, 3319467220, 9783319467252, 3319467255, 978-3-319-46720-7


English Pages 723 Year 2016

Table of contents :
Content: Brain analysis.- Brain analysis - connectivity.- Brain analysis - cortical morphology.- Alzheimer disease.- Surgical guidance and tracking.- Computer aided interventions.- Ultrasound image analysis.- Cancer image analysis.

LNCS 9900

Sebastien Ourselin · Leo Joskowicz Mert R. Sabuncu · Gozde Unal William Wells (Eds.)

Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016 19th International Conference Athens, Greece, October 17–21, 2016 Proceedings, Part I

Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison, Lancaster University, Lancaster, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Friedemann Mattern, ETH Zurich, Zurich, Switzerland
John C. Mitchell, Stanford University, Stanford, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max Planck Institute for Informatics, Saarbrücken, Germany

More information about this series at http://www.springer.com/series/7412

Editors
Sebastien Ourselin, University College London, London, UK
Gozde Unal, Istanbul Technical University, Istanbul, Turkey
Leo Joskowicz, The Hebrew University of Jerusalem, Jerusalem, Israel
William Wells, Harvard Medical School, Boston, MA, USA
Mert R. Sabuncu, Harvard Medical School, Boston, MA, USA

ISSN 0302-9743
ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-319-46719-1
ISBN 978-3-319-46720-7 (eBook)
DOI 10.1007/978-3-319-46720-7
Library of Congress Control Number: 2016952513
LNCS Sublibrary: SL6 – Image Processing, Computer Vision, Pattern Recognition, and Graphics

© Springer International Publishing AG 2016

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

In 2016, the 19th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2016) was held in Athens, Greece. It was organized by Harvard Medical School, The Hebrew University of Jerusalem, University College London, Sabancı University, Bogazici University, and Istanbul Technical University. The meeting took place at the Intercontinental Athenaeum Hotel in Athens, Greece, during October 18–20. Satellite events associated with MICCAI 2016 were held on October 19 and October 21. MICCAI 2016 and its satellite events attracted world-leading scientists, engineers, and clinicians, who presented high-standard papers, aiming at uniting the fields of medical image processing, medical image formation, and medical robotics.

This year the triple anonymous review process was organized in several phases. In total, 756 submissions were received. The review process was handled by one primary and two secondary Program Committee members for each paper. It was initiated by the primary Program Committee member, who assigned exactly three expert reviewers, who were blinded to the authors of the paper. Based on these initial anonymous reviews, 82 papers were directly accepted and 189 papers were rejected. Next, the remaining papers went to the rebuttal phase, in which the authors had the chance to respond to the concerns raised by reviewers. The reviewers were then given a chance to revise their reviews based on the rebuttals. After this stage, 51 papers were accepted and 147 papers were rejected based on a consensus reached among reviewers. Finally, the reviews and associated rebuttals were discussed in person among the Program Committee members during the MICCAI 2016 Program Committee meeting that took place in London, UK, during May 28–29, 2016, with 28 Program Committee members out of 55, the four Program Chairs, and the General Chair. The process led to the acceptance of another 95 papers and the rejection of 192 papers.

In total, 228 papers of the 756 submitted papers were accepted, which corresponds to an acceptance rate of 30.1%. For these proceedings, the 228 papers are organized in 18 groups as follows. The first volume includes Brain Analysis (12), Brain Analysis: Connectivity (12), Brain Analysis: Cortical Morphology (6), Alzheimer Disease (10), Surgical Guidance and Tracking (15), Computer Aided Interventions (10), Ultrasound Image Analysis (5), and Cancer Image Analysis (7). The second volume includes Machine Learning and Feature Selection (12), Deep Learning in Medical Imaging (13), Applications of Machine Learning (14), Segmentation (33), and Cell Image Analysis (7). The third volume includes Registration and Deformation Estimation (16), Shape Modeling (11), Cardiac and Vascular Image Analysis (19), Image Reconstruction (10), and MRI Image Analysis (16).

We thank Dekon, who did an excellent job in the organization of the conference. We thank the MICCAI society for provision of support and insightful comments, the Program Committee for their diligent work in helping to prepare the technical program, as well as the reviewers for their support during the review process. We also thank Andreas Maier for his support in editorial tasks. Last but not least, we thank our sponsors for the financial support that made the conference possible. We look forward to seeing you in Quebec City, Canada, in 2017!

August 2016

Sebastien Ourselin William Wells Leo Joskowicz Mert Sabuncu Gozde Unal

Organization

General Chair
Sebastien Ourselin, University College London, London, UK

General Co-chair
Aytül Erçil, Sabanci University, Istanbul, Turkey

Program Chair
William Wells, Harvard Medical School, Boston, MA, USA

Program Co-chairs
Mert R. Sabuncu, A.A. Martinos Center for Biomedical Imaging, Charlestown, MA, USA
Leo Joskowicz, The Hebrew University of Jerusalem, Israel
Gozde Unal, Istanbul Technical University, Istanbul, Turkey

Local Organization Chair
Bülent Sankur, Bogazici University, Istanbul, Turkey

Satellite Events Chair
Burak Acar, Bogazici University, Istanbul, Turkey

Satellite Events Co-chairs
Evren Özarslan, Harvard Medical School, Boston, MA, USA
Devrim Ünay, Izmir University of Economics, Izmir, Turkey
Tom Vercauteren, University College London, UK

Industrial Liaison
Tanveer Syeda-Mahmood, IBM Almaden Research Center, San Jose, CA, USA

Publication Chair
Andreas Maier, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany

MICCAI Society Board of Directors
Stephen Aylward (Treasurer), Kitware, Inc., NY, USA
Hervé Delingette, Inria, Sophia Antipolis, France
Simon Duchesne, Université Laval, Québec, QC, Canada
Gabor Fichtinger (Secretary), Queen’s University, Kingston, ON, Canada
Alejandro Frangi, University of Sheffield, UK
Pierre Jannin, INSERM/Inria, Rennes, France
Leo Joskowicz, The Hebrew University of Jerusalem, Israel
Shuo Li, Digital Imaging Group, Western University, London, ON, Canada
Wiro Niessen (President and Board Chair), Erasmus MC - University Medical Centre, Rotterdam, The Netherlands
Nassir Navab, Technical University of Munich, Germany
Alison Noble (Past President - Non Voting), University of Oxford, UK
Sebastien Ourselin, University College London, UK
Josien Pluim, Eindhoven University of Technology, The Netherlands
Li Shen (Executive Director), Indiana University, IN, USA

MICCAI Society Consultants to the Board
Alan Colchester, University of Kent, Canterbury, UK
Terry Peters, University of Western Ontario, London, ON, Canada
Richard Robb, Mayo Clinic College of Medicine, MN, USA

Executive Officers
President and Board Chair: Wiro Niessen
Executive Director (Managing Educational Affairs): Li Shen
Secretary (Coordinating MICCAI Awards): Gabor Fichtinger
Treasurer: Stephen Aylward
Elections Officer: Rich Robb

Non-Executive Officers
Society Secretariat: Janette Wallace, Canada
Recording Secretary and Web Maintenance: Jackie Williams, Canada
Fellows Nomination Coordinator: Terry Peters, Canada

Student Board Members
President: Lena Filatova
Professional Student Events Officer: Danielle Pace
Public Relations Officer: Duygu Sarikaya
Social Events Officer: Mathias Unberath

Program Committee
Arbel, Tal (McGill University, Canada)
Cardoso, Manuel Jorge (University College London, UK)
Castellani, Umberto (University of Verona, Italy)
Cattin, Philippe C. (University of Basel, Switzerland)
Chung, Albert C.S. (Hong Kong University of Science and Technology, Hong Kong)
Cukur, Tolga (Bilkent University, Turkey)
Delingette, Herve (Inria, France)
Feragen, Aasa (University of Copenhagen, Denmark)
Freiman, Moti (Philips Healthcare, Israel)
Glocker, Ben (Imperial College London, UK)
Goksel, Orcun (ETH Zurich, Switzerland)
Gonzalez Ballester, Miguel Angel (Universitat Pompeu Fabra, Spain)
Grady, Leo (HeartFlow, USA)
Greenspan, Hayit (Tel Aviv University, Israel)
Howe, Robert (Harvard University, USA)
Isgum, Ivana (University Medical Center Utrecht, The Netherlands)
Jain, Ameet (Philips Research North America, USA)
Jannin, Pierre (University of Rennes, France)
Joshi, Sarang (University of Utah, USA)
Kalpathy-Cramer, Jayashree (Harvard Medical School, USA)
Kamen, Ali (Siemens Corporate Technology, USA)
Knutsson, Hans (Linkoping University, Sweden)
Konukoglu, Ender (Harvard Medical School, USA)
Landman, Bennett (Vanderbilt University, USA)
Langs, Georg (University of Vienna, Austria)

Lee, Su-Lin (Imperial College London, UK)
Liao, Hongen (Tsinghua University, China)
Linguraru, Marius George (Children’s National Health System, USA)
Liu, Huafeng (Zhejiang University, China)
Lu, Le (National Institutes of Health, USA)
Maier-Hein, Lena (German Cancer Research Center, Germany)
Martel, Anne (University of Toronto, Canada)
Masamune, Ken (The University of Tokyo, Japan)
Menze, Bjoern (Technische Universität München, Germany)
Modat, Marc (Imperial College, London, UK)
Moradi, Mehdi (IBM Almaden Research Center, USA)
Nielsen, Poul (The University of Auckland, New Zealand)
Niethammer, Marc (UNC Chapel Hill, USA)
O’Donnell, Lauren (Harvard Medical School, USA)
Padoy, Nicolas (University of Strasbourg, France)
Pohl, Kilian (SRI International, USA)
Prince, Jerry (Johns Hopkins University, USA)
Reyes, Mauricio (University of Bern, Bern, Switzerland)
Sakuma, Ichiro (The University of Tokyo, Japan)
Sato, Yoshinobu (Nara Institute of Science and Technology, Japan)
Shen, Li (Indiana University School of Medicine, USA)
Stoyanov, Danail (University College London, UK)
Van Leemput, Koen (Technical University of Denmark, Denmark)
Vrtovec, Tomaz (University of Ljubljana, Slovenia)
Wassermann, Demian (Inria, France)
Wein, Wolfgang (ImFusion GmbH, Germany)
Yang, Guang-Zhong (Imperial College London, UK)
Young, Alistair (The University of Auckland, New Zealand)
Zheng, Guoyan (University of Bern, Switzerland)

Reviewers Abbott, Jake Abolmaesumi, Purang Acosta-Tamayo, Oscar Adeli, Ehsan Afacan, Onur Aganj, Iman Ahmadi, Seyed-Ahmad Aichert, Andre Akhondi-Asl, Alireza Albarqouni, Shadi Alberola-López, Carlos Alberts, Esther

Alexander, Daniel Aljabar, Paul Allan, Maximilian Altmann, Andre Andras, Jakab Angelini, Elsa Antony, Bhavna Ashburner, John Auvray, Vincent Awate, Suyash P. Bagci, Ulas Bai, Wenjia

Bai, Ying Bao, Siqi Barbu, Adrian Batmanghelich, Kayhan Bauer, Stefan Bazin, Pierre-Louis Beier, Susann Bello, Fernando Ben Ayed, Ismail Bergeles, Christos Berger, Marie-Odile Bhalerao, Abhir

Bhatia, Kanwal Bieth, Marie Bilgic, Berkin Birkfellner, Wolfgang Bloch, Isabelle Bogunovic, Hrvoje Bouget, David Bouix, Sylvain Brady, Michael Bron, Esther Brost, Alexander Buerger, Christian Burgos, Ninon Cahill, Nathan Cai, Weidong Cao, Yu Carass, Aaron Cardoso, Manuel Jorge Carmichael, Owen Carneiro, Gustavo Caruyer, Emmanuel Cash, David Cerrolaza, Juan Cetin, Suheyla Cetingul, Hasan Ertan Chakravarty, M. Mallar Chatelain, Pierre Chen, Elvis C.S. Chen, Hanbo Chen, Hao Chen, Ting Cheng, Jian Cheng, Jun Cheplygina, Veronika Chowdhury, Ananda Christensen, Gary Chui, Chee Kong Côté, Marc-Alexandre Ciompi, Francesco Clancy, Neil T. Claridge, Ela Clarysse, Patrick Cobzas, Dana Comaniciu, Dorin Commowick, Olivier Compas, Colin

Conjeti, Sailesh Cootes, Tim Coupe, Pierrick Crum, William Dalca, Adrian Darkner, Sune Das Gupta, Mithun Dawant, Benoit de Bruijne, Marleen De Craene, Mathieu Degirmenci, Alperen Dehghan, Ehsan Demirci, Stefanie Depeursinge, Adrien Descoteaux, Maxime Despinoy, Fabien Dijkstra, Jouke Ding, Xiaowei Dojat, Michel Dong, Xiao Dorfer, Matthias Du, Xiaofei Duchateau, Nicolas Duchesne, Simon Duncan, James S. Ebrahimi, Mehran Ehrhardt, Jan Eklund, Anders El-Baz, Ayman Elliott, Colm Ellis, Randy Elson, Daniel El-Zehiry, Noha Erdt, Marius Essert, Caroline Fallavollita, Pascal Fang, Ruogu Fenster, Aaron Ferrante, Enzo Fick, Rutger Figl, Michael Fischer, Peter Fishbaugh, James Fletcher, P. Thomas Forestier, Germain Foroughi, Pezhman

Forsberg, Daniel Franz, Alfred Freysinger, Wolfgang Fripp, Jurgen Frisch, Benjamin Fritscher, Karl Funka-Lea, Gareth Gabrani, Maria Gallardo Diez, Guillermo Alejandro Gangeh, Mehrdad Ganz, Melanie Gao, Fei Gao, Mingchen Gao, Yaozong Gao, Yue Garvin, Mona Gaser, Christian Gass, Tobias Georgescu, Bogdan Gerig, Guido Ghesu, Florin-Cristian Gholipour, Ali Ghosh, Aurobrata Giachetti, Andrea Giannarou, Stamatia Gibaud, Bernard Ginsburg, Shoshana Girard, Gabriel Giusti, Alessandro Golemati, Spyretta Golland, Polina Gong, Yuanhao Good, Benjamin Gooya, Ali Grisan, Enrico Gu, Xianfeng Gu, Xuan Gubern-Mérida, Albert Guetter, Christoph Guo, Peifang B. Guo, Yanrong Gur, Yaniv Gutman, Boris Hacihaliloglu, Ilker

Haidegger, Tamas Hamarneh, Ghassan Hammer, Peter Harada, Kanako Harrison, Adam Hata, Nobuhiko Hatt, Chuck Hawkes, David Haynor, David He, Huiguang He, Tiancheng Heckemann, Rolf Hefny, Mohamed Heinrich, Mattias Paul Heng, Pheng Ann Hennersperger, Christoph Herbertsson, Magnus Hütel, Michael Holden, Matthew Hong, Jaesung Hong, Yi Hontani, Hidekata Horise, Yuki Horiuchi, Tetsuya Hu, Yipeng Huang, Heng Huang, Junzhou Huang, Xiaolei Hughes, Michael Hutter, Jana Iakovidis, Dimitris Ibragimov, Bulat Iglesias, Juan Eugenio Iordachita, Iulian Irving, Benjamin Jafari-Khouzani, Kourosh Jain, Saurabh Janoos, Firdaus Jedynak, Bruno Jiang, Tianzi Jiang, Xi Jin, Yan Jog, Amod Jolly, Marie-Pierre Joshi, Anand Joshi, Shantanu

Kadkhodamohammadi, Abdolrahim Kadoury, Samuel Kainz, Bernhard Kakadiaris, Ioannis Kamnitsas, Konstantinos Kandemir, Melih Kapoor, Ankur Karahanoglu, F. Isik Karargyris, Alexandros Kasenburg, Niklas Katouzian, Amin Kelm, Michael Kerrien, Erwan Khallaghi, Siavash Khalvati, Farzad Köhler, Thomas Kikinis, Ron Kim, Boklye Kim, Hosung Kim, Minjeong Kim, Sungeun Kim, Sungmin King, Andrew Kisilev, Pavel Klein, Stefan Klinder, Tobias Kluckner, Stefan Konofagou, Elisa Kunz, Manuela Kurugol, Sila Kuwana, Kenta Kwon, Dongjin Ladikos, Alexander Lamecker, Hans Lang, Andrew Lapeer, Rudy Larrabide, Ignacio Larsen, Anders Boesen Lindbo Lauze, Francois Lea, Colin Lefèvre, Julien Lekadir, Karim Lelieveldt, Boudewijn Lemaitre, Guillaume

Lepore, Natasha Lesage, David Li, Gang Li, Jiang Li, Xiang Liang, Liang Lindner, Claudia Lioma, Christina Liu, Jiamin Liu, Mingxia Liu, Sidong Liu, Tianming Liu, Ting Lo, Benny Lombaert, Herve Lorenzi, Marco Loschak, Paul Loy Rodas, Nicolas Luo, Xiongbiao Lv, Jinglei Maddah, Mahnaz Mahapatra, Dwarikanath Maier, Andreas Maier, Oskar Maier-Hein (né Fritzsche), Klaus Hermann Mailhe, Boris Malandain, Gregoire Mansoor, Awais Marchesseau, Stephanie Marsland, Stephen Martí, Robert Martin-Fernandez, Marcos Masuda, Kohji Masutani, Yoshitaka Mateus, Diana Matsumiya, Kiyoshi Mazomenos, Evangelos McClelland, Jamie Mehrabian, Hatef Meier, Raphael Melano, Tim Melbourne, Andrew Mendelson, Alex F. Menegaz, Gloria Metaxas, Dimitris

Mewes, Philip Meyer, Chuck Miller, Karol Misra, Sarthak Misra, Vinith Mørup, Morten Moeskops, Pim Moghari, Mehdi Mohamed, Ashraf Mohareri, Omid Moore, John Moreno, Rodrigo Mori, Kensaku Mountney, Peter Mukhopadhyay, Anirban Müller, Henning Nakamura, Ryoichi Nambu, Kyojiro Nasiriavanaki, Mohammadreza Negahdar, Mohammadreza Nenning, Karl-Heinz Neumann, Dominik Neumuth, Thomas Ng, Bernard Ni, Dong Näppi, Janne Niazi, Muhammad Khalid Khan Ning, Lipeng Noble, Alison Noble, Jack Noblet, Vincent Nouranian, Saman Oda, Masahiro O’Donnell, Thomas Okada, Toshiyuki Oktay, Ozan Oliver, Arnau Onofrey, John Onogi, Shinya Orihuela-Espina, Felipe Otake, Yoshito Ou, Yangming Özarslan, Evren

Pace, Danielle Panayiotou, Maria Panse, Ashish Papa, Joao Papademetris, Xenios Papadopoulo, Theo Papież, Bartłomiej W. Parisot, Sarah Park, Sang hyun Paulsen, Rasmus Peng, Tingying Pennec, Xavier Peressutti, Devis Pernus, Franjo Peruzzo, Denis Peter, Loic Peterlik, Igor Petersen, Jens Petersen, Kersten Petitjean, Caroline Pham, Dzung Pheiffer, Thomas Piechnik, Stefan Pitiot, Alain Pizzolato, Marco Plenge, Esben Pluim, Josien Polimeni, Jonathan R. Poline, Jean-Baptiste Pont-Tuset, Jordi Popovic, Aleksandra Porras, Antonio R. Prasad, Gautam Prastawa, Marcel Pratt, Philip Preim, Bernhard Preston, Joseph Prevost, Raphael Pszczolkowski, Stefan Qazi, Arish A. Qi, Xin Qian, Zhen Qiu, Wu Quellec, Gwenole Raj, Ashish Rajpoot, Nasir

Randles, Amanda Rathi, Yogesh Reinertsen, Ingerid Reiter, Austin Rekik, Islem Reuter, Martin Riklin Raviv, Tammy Risser, Laurent Rit, Simon Rivaz, Hassan Robinson, Emma Rohling, Robert Rohr, Karl Ronneberger, Olaf Roth, Holger Rottman, Caleb Rousseau, François Roy, Snehashis Rueckert, Daniel Rueda Olarte, Andrea Ruijters, Daniel Salcudean, Tim Salvado, Olivier Sanabria, Sergio Saritas, Emine Sarry, Laurent Scherrer, Benoit Schirmer, Markus D. Schnabel, Julia A. Schultz, Thomas Schumann, Christian Schumann, Steffen Schwartz, Ernst Sechopoulos, Ioannis Seeboeck, Philipp Seiler, Christof Seitel, Alexander Sepasian, Neda Sermesant, Maxime Sethuraman, Shriram Shahzad, Rahil Shamir, Reuben R. Shi, Kuangyu Shi, Wenzhe Shi, Yonggang Shin, Hoo-Chang

Siddiqi, Kaleem Silva, Carlos Alberto Simpson, Amber Singh, Vikas Sivaswamy, Jayanthi Sjölund, Jens Skalski, Andrzej Slabaugh, Greg Smeets, Dirk Sommer, Stefan Sona, Diego Song, Gang Song, Qi Song, Yang Sotiras, Aristeidis Speidel, Stefanie Špiclin, Žiga Sporring, Jon Staib, Lawrence Stamm, Aymeric Staring, Marius Stauder, Ralf Stewart, James Studholme, Colin Styles, Iain Styner, Martin Sudre, Carole H. Suinesiaputra, Avan Suk, Heung-Il Summers, Ronald Sun, Shanhui Sundar, Hari Sushkov, Mikhail Suzuki, Takashi Szczepankiewicz, Filip Sznitman, Raphael Taha, Abdel Aziz Tahmasebi, Amir Talbot, Hugues Tam, Roger Tamaki, Toru Tamura, Manabu Tanaka, Yoshihiro Tang, Hui Tang, Xiaoying Tanner, Christine

Tasdizen, Tolga Taylor, Russell Thirion, Bertrand Tie, Yanmei Tiwari, Pallavi Toews, Matthew Tokuda, Junichi Tong, Tong Tournier, J. Donald Toussaint, Nicolas Tsaftaris, Sotirios Tustison, Nicholas Twinanda, Andru Putra Twining, Carole Uhl, Andreas Ukwatta, Eranga Umadevi Venkataraju, Kannan Unay, Devrim Urschler, Martin Vaillant, Régis van Assen, Hans van Ginneken, Bram van Tulder, Gijs van Walsum, Theo Vandini, Alessandro Vasileios, Vavourakis Vegas-Sanchez-Ferrero, Gonzalo Vemuri, Anant Suraj Venkataraman, Archana Vercauteren, Tom Veta, Mitko Vidal, Rene Villard, Pierre-Frederic Visentini-Scarzanella, Marco Viswanath, Satish Vitanovski, Dime Vogl, Wolf-Dieter von Berg, Jens Vrooman, Henri Wang, Defeng Wang, Hongzhi Wang, Junchen Wang, Li

Wang, Liansheng Wang, Linwei Wang, Qiu Wang, Song Wang, Yalin Warfield, Simon Weese, Jürgen Wegner, Ingmar Wei, Liu Wels, Michael Werner, Rene Westin, Carl-Fredrik Whitaker, Ross Wörz, Stefan Wiles, Andrew Wittek, Adam Wolf, Ivo Wolterink, Jelmer Maarten Wright, Graham Wu, Guorong Wu, Meng Wu, Xiaodong Xie, Saining Xie, Yuchen Xing, Fuyong Xu, Qiuping Xu, Yanwu Xu, Ziyue Yamashita, Hiromasa Yan, Jingwen Yan, Pingkun Yan, Zhennan Yang, Lin Yao, Jianhua Yap, Pew-Thian Yaqub, Mohammad Ye, Dong Hye Ye, Menglong Yin, Zhaozheng Yokota, Futoshi Zelmann, Rina Zeng, Wei Zhan, Yiqiang Zhang, Daoqiang Zhang, Fan

Zhang, Le Zhang, Ling Zhang, Miaomiao Zhang, Pei Zhang, Qing Zhang, Tianhao Zhang, Tuo

Zhang, Yong Zhen, Xiantong Zheng, Yefeng Zhijun, Zhang Zhou, Jinghao Zhou, Luping Zhou, S. Kevin

Zhu, Hongtu Zhu, Yuemin Zhuang, Xiahai Zollei, Lilla Zuluaga, Maria A.

Contents – Part I

Brain Analysis

Ordinal Patterns for Connectivity Networks in Brain Disease Diagnosis
  Mingxia Liu, Junqiang Du, Biao Jie, and Daoqiang Zhang
Discovering Cortical Folding Patterns in Neonatal Cortical Surfaces Using Large-Scale Dataset
  Yu Meng, Gang Li, Li Wang, Weili Lin, John H. Gilmore, and Dinggang Shen
Modeling Functional Dynamics of Cortical Gyri and Sulci
  Xi Jiang, Xiang Li, Jinglei Lv, Shijie Zhao, Shu Zhang, Wei Zhang, Tuo Zhang, and Tianming Liu
A Multi-stage Sparse Coding Framework to Explore the Effects of Prenatal Alcohol Exposure
  Shijie Zhao, Junwei Han, Jinglei Lv, Xi Jiang, Xintao Hu, Shu Zhang, Mary Ellen Lynch, Claire Coles, Lei Guo, Xiaoping Hu, and Tianming Liu
Correlation-Weighted Sparse Group Representation for Brain Network Construction in MCI Classification
  Renping Yu, Han Zhang, Le An, Xiaobo Chen, Zhihui Wei, and Dinggang Shen
Temporal Concatenated Sparse Coding of Resting State fMRI Data Reveal Network Interaction Changes in mTBI
  Jinglei Lv, Armin Iraji, Fangfei Ge, Shijie Zhao, Xintao Hu, Tuo Zhang, Junwei Han, Lei Guo, Zhifeng Kou, and Tianming Liu
Exploring Brain Networks via Structured Sparse Representation of fMRI Data
  Qinghua Zhao, Jianfeng Lu, Jinglei Lv, Xi Jiang, Shijie Zhao, and Tianming Liu
Discover Mouse Gene Coexpression Landscape Using Dictionary Learning and Sparse Coding
  Yujie Li, Hanbo Chen, Xi Jiang, Xiang Li, Jinglei Lv, Hanchuan Peng, Joe Z. Tsien, and Tianming Liu
Integrative Analysis of Cellular Morphometric Context Reveals Clinically Relevant Signatures in Lower Grade Glioma
  Ju Han, Yunfu Wang, Weidong Cai, Alexander Borowsky, Bahram Parvin, and Hang Chang
Mapping Lifetime Brain Volumetry with Covariate-Adjusted Restricted Cubic Spline Regression from Cross-Sectional Multi-site MRI
  Yuankai Huo, Katherine Aboud, Hakmook Kang, Laurie E. Cutting, and Bennett A. Landman
Extracting the Core Structural Connectivity Network: Guaranteeing Network Connectedness Through a Graph-Theoretical Approach
  Demian Wassermann, Dorian Mazauric, Guillermo Gallardo-Diez, and Rachid Deriche
Fiber Orientation Estimation Using Nonlocal and Local Information
  Chuyang Ye

Brain Analysis: Connectivity

Reveal Consistent Spatial-Temporal Patterns from Dynamic Functional Connectivity for Autism Spectrum Disorder Identification
  Yingying Zhu, Xiaofeng Zhu, Han Zhang, Wei Gao, Dinggang Shen, and Guorong Wu
Boundary Mapping Through Manifold Learning for Connectivity-Based Cortical Parcellation
  Salim Arslan, Sarah Parisot, and Daniel Rueckert
Species Preserved and Exclusive Structural Connections Revealed by Sparse CCA
  Xiao Li, Lei Du, Tuo Zhang, Xintao Hu, Xi Jiang, Lei Guo, and Tianming Liu
Modularity Reinforcement for Improving Brain Subnetwork Extraction
  Chendi Wang, Bernard Ng, and Rafeef Abugharbieh
Effective Brain Connectivity Through a Constrained Autoregressive Model
  Alessandro Crimi, Luca Dodero, Vittorio Murino, and Diego Sona
GraMPa: Graph-Based Multi-modal Parcellation of the Cortex Using Fusion Moves
  Sarah Parisot, Ben Glocker, Markus D. Schirmer, and Daniel Rueckert
A Continuous Model of Cortical Connectivity
  Daniel Moyer, Boris A. Gutman, Joshua Faskowitz, Neda Jahanshad, and Paul M. Thompson
Label-Informed Non-negative Matrix Factorization with Manifold Regularization for Discriminative Subnetwork Detection
  Takanori Watanabe, Birkan Tunc, Drew Parker, Junghoon Kim, and Ragini Verma
Predictive Subnetwork Extraction with Structural Priors for Infant Connectomes
  Colin J. Brown, Steven P. Miller, Brian G. Booth, Jill G. Zwicker, Ruth E. Grunau, Anne R. Synnes, Vann Chau, and Ghassan Hamarneh
Hierarchical Clustering of Tractography Streamlines Based on Anatomical Similarity
  Viviana Siless, Ken Chang, Bruce Fischl, and Anastasia Yendiki
Unsupervised Identification of Clinically Relevant Clusters in Routine Imaging Data
  Johannes Hofmanninger, Markus Krenn, Markus Holzer, Thomas Schlegl, Helmut Prosch, and Georg Langs
Probabilistic Tractography for Topographically Organized Connectomes
  Dogu Baran Aydogan and Yonggang Shi

Brain Analysis: Cortical Morphology

A Hybrid Multishape Learning Framework for Longitudinal Prediction of Cortical Surfaces and Fiber Tracts Using Neonatal Data
  Islem Rekik, Gang Li, Pew-Thian Yap, Geng Chen, Weili Lin, and Dinggang Shen
Learning-Based Topological Correction for Infant Cortical Surfaces
  Shijie Hao, Gang Li, Li Wang, Yu Meng, and Dinggang Shen
Riemannian Metric Optimization for Connectivity-Driven Surface Mapping
  Jin Kyu Gahm and Yonggang Shi
Riemannian Statistical Analysis of Cortical Geometry with Robustness to Partial Homology and Misalignment
  Suyash P. Awate, Richard M. Leahy, and Anand A. Joshi
Modeling Fetal Cortical Expansion Using Graph-Regularized Gompertz Models
  Ernst Schwartz, Gregor Kasprian, András Jakab, Daniela Prayer, Veronika Schöpf, and Georg Langs
Longitudinal Analysis of the Preterm Cortex Using Multi-modal Spectral Matching
  Eliza Orasanu, Pierre-Louis Bazin, Andrew Melbourne, Marco Lorenzi, Herve Lombaert, Nicola J. Robertson, Giles Kendall, Nikolaus Weiskopf, Neil Marlow, and Sebastien Ourselin

Alzheimer Disease

Early Diagnosis of Alzheimer’s Disease by Joint Feature Selection and Classification on Temporally Structured Support Vector Machine
  Yingying Zhu, Xiaofeng Zhu, Minjeong Kim, Dinggang Shen, and Guorong Wu
Prediction of Memory Impairment with MRI Data: A Longitudinal Study of Alzheimer’s Disease
  Xiaoqian Wang, Dinggang Shen, and Heng Huang
Joint Data Harmonization and Group Cardinality Constrained Classification
  Yong Zhang, Sang Hyun Park, and Kilian M. Pohl
Progressive Graph-Based Transductive Learning for Multi-modal Classification of Brain Disorder Disease
  Zhengxia Wang, Xiaofeng Zhu, Ehsan Adeli, Yingying Zhu, Chen Zu, Feiping Nie, Dinggang Shen, and Guorong Wu
Structured Outlier Detection in Neuroimaging Studies with Minimal Convex Polytopes
  Erdem Varol, Aristeidis Sotiras, and Christos Davatzikos
Diagnosis of Alzheimer’s Disease Using View-Aligned Hypergraph Learning with Incomplete Multi-modality Data
  Mingxia Liu, Jun Zhang, Pew-Thian Yap, and Dinggang Shen
New Multi-task Learning Model to Predict Alzheimer’s Disease Cognitive Assessment
  Zhouyuan Huo, Dinggang Shen, and Heng Huang
Hyperbolic Space Sparse Coding with Its Application on Prediction of Alzheimer’s Disease in Mild Cognitive Impairment
  Jie Zhang, Jie Shi, Cynthia Stonnington, Qingyang Li, Boris A. Gutman, Kewei Chen, Eric M. Reiman, Richard Caselli, Paul M. Thompson, Jieping Ye, and Yalin Wang
Large-Scale Collaborative Imaging Genetics Studies of Risk Genetic Factors for Alzheimer’s Disease Across Multiple Institutions
  Qingyang Li, Tao Yang, Liang Zhan, Derrek Paul Hibar, Neda Jahanshad, Yalin Wang, Jieping Ye, Paul M. Thompson, and Jie Wang
Structured Sparse Low-Rank Regression Model for Brain-Wide and Genome-Wide Associations
  Xiaofeng Zhu, Heung-Il Suk, Heng Huang, and Dinggang Shen

Surgical Guidance and Tracking

3D Ultrasonic Needle Tracking with a 1.5D Transducer Array for Guidance of Fetal Interventions . . . 353
Wenfeng Xia, Simeon J. West, Jean-Martial Mari, Sebastien Ourselin, Anna L. David, and Adrien E. Desjardins

Enhancement of Needle Tip and Shaft from 2D Ultrasound Using Signal Transmission Maps . . . 362
Cosmas Mwikirize, John L. Nosher, and Ilker Hacihaliloglu

Plane Assist: The Influence of Haptics on Ultrasound-Based Needle Guidance . . . 370
Heather Culbertson, Julie M. Walker, Michael Raitor, Allison M. Okamura, and Philipp J. Stolka

A Surgical Guidance System for Big-Bubble Deep Anterior Lamellar Keratoplasty . . . 378
Hessam Roodaki, Chiara Amat di San Filippo, Daniel Zapp, Nassir Navab, and Abouzar Eslami

Real-Time 3D Tracking of Articulated Tools for Robotic Surgery . . . 386
Menglong Ye, Lin Zhang, Stamatia Giannarou, and Guang-Zhong Yang

Towards Automated Ultrasound Transesophageal Echocardiography and X-Ray Fluoroscopy Fusion Using an Image-Based Co-registration Method . . . 395
Shanhui Sun, Shun Miao, Tobias Heimann, Terrence Chen, Markus Kaiser, Matthias John, Erin Girard, and Rui Liao

Robust, Real-Time, Dense and Deformable 3D Organ Tracking in Laparoscopic Videos . . . 404
Toby Collins, Adrien Bartoli, Nicolas Bourdel, and Michel Canis

Structure-Aware Rank-1 Tensor Approximation for Curvilinear Structure Tracking Using Learned Hierarchical Features . . . 413
Peng Chu, Yu Pang, Erkang Cheng, Ying Zhu, Yefeng Zheng, and Haibin Ling

Real-Time Online Adaption for Robust Instrument Tracking and Pose Estimation . . . 422
Nicola Rieke, David Joseph Tan, Federico Tombari, Josué Page Vizcaíno, Chiara Amat di San Filippo, Abouzar Eslami, and Nassir Navab

Integrated Dynamic Shape Tracking and RF Speckle Tracking for Cardiac Motion Analysis . . . 431
Nripesh Parajuli, Allen Lu, John C. Stendahl, Maria Zontak, Nabil Boutagy, Melissa Eberle, Imran Alkhalil, Matthew O’Donnell, Albert J. Sinusas, and James S. Duncan

The Endoscopogram: A 3D Model Reconstructed from Endoscopic Video Frames . . . 439
Qingyu Zhao, True Price, Stephen Pizer, Marc Niethammer, Ron Alterovitz, and Julian Rosenman

Robust Image Descriptors for Real-Time Inter-Examination Retargeting in Gastrointestinal Endoscopy . . . 448
Menglong Ye, Edward Johns, Benjamin Walter, Alexander Meining, and Guang-Zhong Yang

Kalman Filter Based Data Fusion for Needle Deflection Estimation Using Optical-EM Sensor . . . 457
Baichuan Jiang, Wenpeng Gao, Daniel F. Kacher, Thomas C. Lee, and Jagadeesan Jayender

Bone Enhancement in Ultrasound Based on 3D Local Spectrum Variation for Percutaneous Scaphoid Fracture Fixation . . . 465
Emran Mohammad Abu Anas, Alexander Seitel, Abtin Rasoulian, Paul St. John, Tamas Ungi, Andras Lasso, Kathryn Darras, David Wilson, Victoria A. Lessoway, Gabor Fichtinger, Michelle Zec, David Pichora, Parvin Mousavi, Robert Rohling, and Purang Abolmaesumi

Bioelectric Navigation: A New Paradigm for Intravascular Device Guidance . . . 474
Bernhard Fuerst, Erin E. Sutton, Reza Ghotbi, Noah J. Cowan, and Nassir Navab

Computer Aided Interventions

Process Monitoring in the Intensive Care Unit: Assessing Patient Mobility Through Activity Analysis with a Non-Invasive Mobility Sensor . . . 482
Austin Reiter, Andy Ma, Nishi Rawat, Christine Shrock, and Suchi Saria

Patient MoCap: Human Pose Estimation Under Blanket Occlusion for Hospital Monitoring Applications . . . 491
Felix Achilles, Alexandru-Eugen Ichim, Huseyin Coskun, Federico Tombari, Soheyl Noachtar, and Nassir Navab

Numerical Simulation of Cochlear-Implant Surgery: Towards Patient-Specific Planning . . . 500
Olivier Goury, Yann Nguyen, Renato Torres, Jeremie Dequidt, and Christian Duriez

Meaningful Assessment of Surgical Expertise: Semantic Labeling with Data and Crowds . . . 508
Marzieh Ershad, Zachary Koesters, Robert Rege, and Ann Majewicz

2D-3D Registration Accuracy Estimation for Optimised Planning of Image-Guided Pancreatobiliary Interventions . . . 516
Yipeng Hu, Ester Bonmati, Eli Gibson, John H. Hipwell, David J. Hawkes, Steven Bandula, Stephen P. Pereira, and Dean C. Barratt

Registration-Free Simultaneous Catheter and Environment Modelling . . . 525
Liang Zhao, Stamatia Giannarou, Su-Lin Lee, and Guang-Zhong Yang

Pareto Front vs. Weighted Sum for Automatic Trajectory Planning of Deep Brain Stimulation . . . 534
Noura Hamzé, Jimmy Voirin, Pierre Collet, Pierre Jannin, Claire Haegelen, and Caroline Essert

Efficient Anatomy Driven Automated Multiple Trajectory Planning for Intracranial Electrode Implantation . . . 542
Rachel Sparks, Gergely Zombori, Roman Rodionov, Maria A. Zuluaga, Beate Diehl, Tim Wehner, Anna Miserocchi, Andrew W. McEvoy, John S. Duncan, and Sebastien Ourselin

Recognizing Surgical Activities with Recurrent Neural Networks . . . 551
Robert DiPietro, Colin Lea, Anand Malpani, Narges Ahmidi, S. Swaroop Vedula, Gyusung I. Lee, Mija R. Lee, and Gregory D. Hager

Two-Stage Simulation Method to Improve Facial Soft Tissue Prediction Accuracy for Orthognathic Surgery . . . 559
Daeseung Kim, Chien-Ming Chang, Dennis Chun-Yu Ho, Xiaoyan Zhang, Shunyao Shen, Peng Yuan, Huaming Mai, Guangming Zhang, Xiaobo Zhou, Jaime Gateno, Michael A.K. Liebschner, and James J. Xia

Ultrasound Image Analysis

Hand-Held Sound-Speed Imaging Based on Ultrasound Reflector Delineation . . . 568
Sergio J. Sanabria and Orcun Goksel

Ultrasound Tomosynthesis: A New Paradigm for Quantitative Imaging of the Prostate . . . 577
Fereshteh Aalamifar, Reza Seifabadi, Marcelino Bernardo, Ayele H. Negussie, Baris Turkbey, Maria Merino, Peter Pinto, Arman Rahmim, Bradford J. Wood, and Emad M. Boctor

Photoacoustic Imaging Paradigm Shift: Towards Using Vendor-Independent Ultrasound Scanners . . . 585
Haichong K. Zhang, Xiaoyu Guo, Behnoosh Tavakoli, and Emad M. Boctor

4D Reconstruction of Fetal Heart Ultrasound Images in Presence of Fetal Motion . . . 593
Christine Tanner, Barbara Flach, Céline Eggenberger, Oliver Mattausch, Michael Bajka, and Orcun Goksel

Towards Reliable Automatic Characterization of Neonatal Hip Dysplasia from 3D Ultrasound Images . . . 602
Niamul Quader, Antony Hodgson, Kishore Mulpuri, Anthony Cooper, and Rafeef Abugharbieh

Cancer Image Analysis

Image-Based Computer-Aided Diagnostic System for Early Diagnosis of Prostate Cancer . . . 610
Islam Reda, Ahmed Shalaby, Mohammed Elmogy, Ahmed Aboulfotouh, Fahmi Khalifa, Mohamed Abou El-Ghar, Georgy Gimelfarb, and Ayman El-Baz

Multidimensional Texture Analysis for Improved Prediction of Ultrasound Liver Tumor Response to Chemotherapy Treatment . . . 619
Omar S. Al-Kadi, Dimitri Van De Ville, and Adrien Depeursinge

Classification of Prostate Cancer Grades and T-Stages Based on Tissue Elasticity Using Medical Image Analysis . . . 627
Shan Yang, Vladimir Jojic, Jun Lian, Ronald Chen, Hongtu Zhu, and Ming C. Lin

Automatic Determination of Hormone Receptor Status in Breast Cancer Using Thermography . . . 636
Siva Teja Kakileti, Krithika Venkataramani, and Himanshu J. Madhu

Prostate Cancer: Improved Tissue Characterization by Temporal Modeling of Radio-Frequency Ultrasound Echo Data . . . 644
Layan Nahlawi, Farhad Imani, Mena Gaed, Jose A. Gomez, Madeleine Moussa, Eli Gibson, Aaron Fenster, Aaron D. Ward, Purang Abolmaesumi, Hagit Shatkay, and Parvin Mousavi

Classifying Cancer Grades Using Temporal Ultrasound for Transrectal Prostate Biopsy . . . 653
Shekoofeh Azizi, Farhad Imani, Jin Tae Kwak, Amir Tahmasebi, Sheng Xu, Pingkun Yan, Jochen Kruecker, Baris Turkbey, Peter Choyke, Peter Pinto, Bradford Wood, Parvin Mousavi, and Purang Abolmaesumi

Characterization of Lung Nodule Malignancy Using Hybrid Shape and Appearance Features . . . 662
Mario Buty, Ziyue Xu, Mingchen Gao, Ulas Bagci, Aaron Wu, and Daniel J. Mollura

Author Index . . . 671

Contents – Part II

Machine Learning and Feature Selection

Feature Selection Based on Iterative Canonical Correlation Analysis for Automatic Diagnosis of Parkinson’s Disease . . . 1
Luyan Liu, Qian Wang, Ehsan Adeli, Lichi Zhang, Han Zhang, and Dinggang Shen

Identifying Relationships in Functional and Structural Connectome Data Using a Hypergraph Learning Method . . . 9
Brent C. Munsell, Guorong Wu, Yue Gao, Nicholas Desisto, and Martin Styner

Ensemble Hierarchical High-Order Functional Connectivity Networks for MCI Classification . . . 18
Xiaobo Chen, Han Zhang, and Dinggang Shen

Outcome Prediction for Patient with High-Grade Gliomas from Brain Functional and Structural Networks . . . 26
Luyan Liu, Han Zhang, Islem Rekik, Xiaobo Chen, Qian Wang, and Dinggang Shen

Mammographic Mass Segmentation with Online Learned Shape and Appearance Priors . . . 35
Menglin Jiang, Shaoting Zhang, Yuanjie Zheng, and Dimitris N. Metaxas

Differential Dementia Diagnosis on Incomplete Data with Latent Trees . . . 44
Christian Ledig, Sebastian Kaltwang, Antti Tolonen, Juha Koikkalainen, Philip Scheltens, Frederik Barkhof, Hanneke Rhodius-Meester, Betty Tijms, Afina W. Lemstra, Wiesje van der Flier, Jyrki Lötjönen, and Daniel Rueckert

Bridging Computational Features Toward Multiple Semantic Features with Multi-task Regression: A Study of CT Pulmonary Nodules . . . 53
Sihong Chen, Dong Ni, Jing Qin, Baiying Lei, Tianfu Wang, and Jie-Zhi Cheng

Robust Cancer Treatment Outcome Prediction Dealing with Small-Sized and Imbalanced Data from FDG-PET Images . . . 61
Chunfeng Lian, Su Ruan, Thierry Denœux, Hua Li, and Pierre Vera

Structured Sparse Kernel Learning for Imaging Genetics Based Alzheimer’s Disease Diagnosis . . . 70
Jailin Peng, Le An, Xiaofeng Zhu, Yan Jin, and Dinggang Shen

Semi-supervised Hierarchical Multimodal Feature and Sample Selection for Alzheimer’s Disease Diagnosis . . . 79
Le An, Ehsan Adeli, Mingxia Liu, Jun Zhang, and Dinggang Shen

Stability-Weighted Matrix Completion of Incomplete Multi-modal Data for Disease Diagnosis . . . 88
Kim-Han Thung, Ehsan Adeli, Pew-Thian Yap, and Dinggang Shen

Employing Visual Analytics to Aid the Design of White Matter Hyperintensity Classifiers . . . 97
Renata Georgia Raidou, Hugo J. Kuijf, Neda Sepasian, Nicola Pezzotti, Willem H. Bouvy, Marcel Breeuwer, and Anna Vilanova

Deep Learning in Medical Imaging

The Automated Learning of Deep Features for Breast Mass Classification from Mammograms . . . 106
Neeraj Dhungel, Gustavo Carneiro, and Andrew P. Bradley

Multimodal Deep Learning for Cervical Dysplasia Diagnosis . . . 115
Tao Xu, Han Zhang, Xiaolei Huang, Shaoting Zhang, and Dimitris N. Metaxas

Learning from Experts: Developing Transferable Deep Features for Patient-Level Lung Cancer Prediction . . . 124
Wei Shen, Mu Zhou, Feng Yang, Di Dong, Caiyun Yang, Yali Zang, and Jie Tian

DeepVessel: Retinal Vessel Segmentation via Deep Learning and Conditional Random Field . . . 132
Huazhu Fu, Yanwu Xu, Stephen Lin, Damon Wing Kee Wong, and Jiang Liu

Deep Retinal Image Understanding . . . 140
Kevis-Kokitsi Maninis, Jordi Pont-Tuset, Pablo Arbeláez, and Luc Van Gool

3D Deeply Supervised Network for Automatic Liver Segmentation from CT Volumes . . . 149
Qi Dou, Hao Chen, Yueming Jin, Lequan Yu, Jing Qin, and Pheng-Ann Heng

Deep Neural Networks for Fast Segmentation of 3D Medical Images . . . 158
Karl Fritscher, Patrik Raudaschl, Paolo Zaffino, Maria Francesca Spadea, Gregory C. Sharp, and Rainer Schubert

SpineNet: Automatically Pinpointing Classification Evidence in Spinal MRIs . . . 166
Amir Jamaludin, Timor Kadir, and Andrew Zisserman

A Deep Learning Approach for Semantic Segmentation in Histology Tissue Images . . . 176
Jiazhuo Wang, John D. MacKenzie, Rageshree Ramachandran, and Danny Z. Chen

Spatial Clockwork Recurrent Neural Network for Muscle Perimysium Segmentation . . . 185
Yuanpu Xie, Zizhao Zhang, Manish Sapkota, and Lin Yang

Automated Age Estimation from Hand MRI Volumes Using Deep Learning . . . 194
Darko Štern, Christian Payer, Vincent Lepetit, and Martin Urschler

Real-Time Standard Scan Plane Detection and Localisation in Fetal Ultrasound Using Fully Convolutional Neural Networks . . . 203
Christian F. Baumgartner, Konstantinos Kamnitsas, Jacqueline Matthew, Sandra Smith, Bernhard Kainz, and Daniel Rueckert

3D Deep Learning for Multi-modal Imaging-Guided Survival Time Prediction of Brain Tumor Patients . . . 212
Dong Nie, Han Zhang, Ehsan Adeli, Luyan Liu, and Dinggang Shen

Applications of Machine Learning

From Local to Global Random Regression Forests: Exploring Anatomical Landmark Localization . . . 221
Darko Štern, Thomas Ebner, and Martin Urschler

Regressing Heatmaps for Multiple Landmark Localization Using CNNs . . . 230
Christian Payer, Darko Štern, Horst Bischof, and Martin Urschler

Self-Transfer Learning for Weakly Supervised Lesion Localization . . . 239
Sangheum Hwang and Hyo-Eun Kim

Automatic Cystocele Severity Grading in Ultrasound by Spatio-Temporal Regression . . . 247
Dong Ni, Xing Ji, Yaozong Gao, Jie-Zhi Cheng, Huifang Wang, Jing Qin, Baiying Lei, Tianfu Wang, Guorong Wu, and Dinggang Shen

Graphical Modeling of Ultrasound Propagation in Tissue for Automatic Bone Segmentation . . . 256
Firat Ozdemir, Ece Ozkan, and Orcun Goksel

Bayesian Image Quality Transfer . . . 265
Ryutaro Tanno, Aurobrata Ghosh, Francesco Grussu, Enrico Kaden, Antonio Criminisi, and Daniel C. Alexander

Wavelet Appearance Pyramids for Landmark Detection and Pathology Classification: Application to Lumbar Spinal Stenosis . . . 274
Qiang Zhang, Abhir Bhalerao, Caron Parsons, Emma Helm, and Charles Hutchinson

A Learning-Free Approach to Whole Spine Vertebra Localization in MRI . . . 283
Marko Rak and Klaus-Dietz Tönnies

Automatic Quality Control for Population Imaging: A Generic Unsupervised Approach . . . 291
Mohsen Farzi, Jose M. Pozo, Eugene V. McCloskey, J. Mark Wilkinson, and Alejandro F. Frangi

A Cross-Modality Neural Network Transform for Semi-automatic Medical Image Annotation . . . 300
Mehdi Moradi, Yufan Guo, Yaniv Gur, Mohammadreza Negahdar, and Tanveer Syeda-Mahmood

Sub-category Classifiers for Multiple-instance Learning and Its Application to Retinal Nerve Fiber Layer Visibility Classification . . . 308
Siyamalan Manivannan, Caroline Cobb, Stephen Burgess, and Emanuele Trucco

Vision-Based Classification of Developmental Disorders Using Eye-Movements . . . 317
Guido Pusiol, Andre Esteva, Scott S. Hall, Michael Frank, Arnold Milstein, and Li Fei-Fei

Scalable Unsupervised Domain Adaptation for Electron Microscopy . . . 326
Róger Bermúdez-Chacón, Carlos Becker, Mathieu Salzmann, and Pascal Fua

Automated Diagnosis of Neural Foraminal Stenosis Using Synchronized Superpixels Representation . . . 335
Xiaoxu He, Yilong Yin, Manas Sharma, Gary Brahm, Ashley Mercado, and Shuo Li

Segmentation

Automated Segmentation of Knee MRI Using Hierarchical Classifiers and Just Enough Interaction Based Learning: Data from Osteoarthritis Initiative . . . 344
Satyananda Kashyap, Ipek Oguz, Honghai Zhang, and Milan Sonka

Dynamically Balanced Online Random Forests for Interactive Scribble-Based Segmentation . . . 352
Guotai Wang, Maria A. Zuluaga, Rosalind Pratt, Michael Aertsen, Tom Doel, Maria Klusmann, Anna L. David, Jan Deprest, Tom Vercauteren, and Sébastien Ourselin

Orientation-Sensitive Overlap Measures for the Validation of Medical Image Segmentations . . . 361
Tasos Papastylianou, Erica Dall’Armellina, and Vicente Grau

High-Throughput Glomeruli Analysis of μCT Kidney Images Using Tree Priors and Scalable Sparse Computation . . . 370
Carlos Correa Shokiche, Philipp Baumann, Ruslan Hlushchuk, Valentin Djonov, and Mauricio Reyes

A Surface Patch-Based Segmentation Method for Hippocampal Subfields . . . 379
Benoit Caldairou, Boris C. Bernhardt, Jessie Kulaga-Yoskovitz, Hosung Kim, Neda Bernasconi, and Andrea Bernasconi

Automatic Lymph Node Cluster Segmentation Using Holistically-Nested Neural Networks and Structured Optimization in CT Images . . . 388
Isabella Nogues, Le Lu, Xiaosong Wang, Holger Roth, Gedas Bertasius, Nathan Lay, Jianbo Shi, Yohannes Tsehay, and Ronald M. Summers

Evaluation-Oriented Training via Surrogate Metrics for Multiple Sclerosis Segmentation . . . 398
Michel M. Santos, Paula R.B. Diniz, Abel G. Silva-Filho, and Wellington P. Santos

Corpus Callosum Segmentation in Brain MRIs via Robust Target-Localization and Joint Supervised Feature Extraction and Prediction . . . 406
Lisa Y.W. Tang, Tom Brosch, XingTong Liu, Youngjin Yoo, Anthony Traboulsee, David Li, and Roger Tam

Automatic Liver and Lesion Segmentation in CT Using Cascaded Fully Convolutional Neural Networks and 3D Conditional Random Fields . . . 415
Patrick Ferdinand Christ, Mohamed Ezzeldin A. Elshaer, Florian Ettlinger, Sunil Tatavarty, Marc Bickel, Patrick Bilic, Markus Rempfler, Marco Armbruster, Felix Hofmann, Melvin D’Anastasi, Wieland H. Sommer, Seyed-Ahmad Ahmadi, and Bjoern H. Menze

3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation . . . 424
Özgün Çiçek, Ahmed Abdulkadir, Soeren S. Lienkamp, Thomas Brox, and Olaf Ronneberger

Model-Based Segmentation of Vertebral Bodies from MR Images with 3D CNNs . . . 433
Robert Korez, Boštjan Likar, Franjo Pernuš, and Tomaž Vrtovec

Pancreas Segmentation in MRI Using Graph-Based Decision Fusion on Convolutional Neural Networks . . . 442
Jinzheng Cai, Le Lu, Zizhao Zhang, Fuyong Xing, Lin Yang, and Qian Yin

Spatial Aggregation of Holistically-Nested Networks for Automated Pancreas Segmentation . . . 451
Holger R. Roth, Le Lu, Amal Farag, Andrew Sohn, and Ronald M. Summers

Topology Aware Fully Convolutional Networks for Histology Gland Segmentation . . . 460
Aïcha BenTaieb and Ghassan Hamarneh

HeMIS: Hetero-Modal Image Segmentation . . . 469
Mohammad Havaei, Nicolas Guizard, Nicolas Chapados, and Yoshua Bengio

Deep Learning for Multi-task Medical Image Segmentation in Multiple Modalities . . . 478
Pim Moeskops, Jelmer M. Wolterink, Bas H.M. van der Velden, Kenneth G.A. Gilhuijs, Tim Leiner, Max A. Viergever, and Ivana Išgum

Iterative Multi-domain Regularized Deep Learning for Anatomical Structure Detection and Segmentation from Ultrasound Images . . . 487
Hao Chen, Yefeng Zheng, Jin-Hyeong Park, Pheng-Ann Heng, and S. Kevin Zhou

Gland Instance Segmentation by Deep Multichannel Side Supervision . . . 496
Yan Xu, Yang Li, Mingyuan Liu, Yipei Wang, Maode Lai, and Eric I-Chao Chang

Enhanced Probabilistic Label Fusion by Estimating Label Confidences Through Discriminative Learning . . . 505
Oualid M. Benkarim, Gemma Piella, Miguel Angel González Ballester, and Gerard Sanroma

Feature Sensitive Label Fusion with Random Walker for Atlas-Based Image Segmentation . . . 513
Siqi Bao and Albert C.S. Chung

Deep Fusion Net for Multi-atlas Segmentation: Application to Cardiac MR Images . . . 521
Heran Yang, Jian Sun, Huibin Li, Lisheng Wang, and Zongben Xu

Prior-Based Coregistration and Cosegmentation . . . 529
Mahsa Shakeri, Enzo Ferrante, Stavros Tsogkas, Sarah Lippé, Samuel Kadoury, Iasonas Kokkinos, and Nikos Paragios

Globally Optimal Label Fusion with Shape Priors . . . 538
Ipek Oguz, Satyananda Kashyap, Hongzhi Wang, Paul Yushkevich, and Milan Sonka

Joint Segmentation and CT Synthesis for MRI-only Radiotherapy Treatment Planning . . . 547
Ninon Burgos, Filipa Guerreiro, Jamie McClelland, Simeon Nill, David Dearnaley, Nandita deSouza, Uwe Oelfke, Antje-Christin Knopf, Sébastien Ourselin, and M. Jorge Cardoso

Regression Forest-Based Atlas Localization and Direction Specific Atlas Generation for Pancreas Segmentation . . . 556
Masahiro Oda, Natsuki Shimizu, Ken’ichi Karasawa, Yukitaka Nimura, Takayuki Kitasaka, Kazunari Misawa, Michitaka Fujiwara, Daniel Rueckert, and Kensaku Mori

Accounting for the Confound of Meninges in Segmenting Entorhinal and Perirhinal Cortices in T1-Weighted MRI . . . 564
Long Xie, Laura E.M. Wisse, Sandhitsu R. Das, Hongzhi Wang, David A. Wolk, Jose V. Manjón, and Paul A. Yushkevich

7T-Guided Learning Framework for Improving the Segmentation of 3T MR Images . . . 572
Khosro Bahrami, Islem Rekik, Feng Shi, Yaozong Gao, and Dinggang Shen

Multivariate Mixture Model for Cardiac Segmentation from Multi-Sequence MRI . . . 581
Xiahai Zhuang

Fast Fully Automatic Segmentation of the Human Placenta from Motion Corrupted MRI . . . 589
Amir Alansary, Konstantinos Kamnitsas, Alice Davidson, Rostislav Khlebnikov, Martin Rajchl, Christina Malamateniou, Mary Rutherford, Joseph V. Hajnal, Ben Glocker, Daniel Rueckert, and Bernhard Kainz

Multi-organ Segmentation Using Vantage Point Forests and Binary Context Features . . . 598
Mattias P. Heinrich and Maximilian Blendowski

Multiple Object Segmentation and Tracking by Bayes Risk Minimization . . . 607
Tomáš Sixta and Boris Flach

Crowd-Algorithm Collaboration for Large-Scale Endoscopic Image Annotation with Confidence . . . 616
L. Maier-Hein, T. Ross, J. Gröhl, B. Glocker, S. Bodenstedt, C. Stock, E. Heim, M. Götz, S. Wirkert, H. Kenngott, S. Speidel, and K. Maier-Hein

Emphysema Quantification on Cardiac CT Scans Using Hidden Markov Measure Field Model: The MESA Lung Study . . . 624
Jie Yang, Elsa D. Angelini, Pallavi P. Balte, Eric A. Hoffman, Colin O. Wu, Bharath A. Venkatesh, R. Graham Barr, and Andrew F. Laine

Cell Image Analysis

Cutting Out the Middleman: Measuring Nuclear Area in Histopathology Slides Without Segmentation . . . 632
Mitko Veta, Paul J. van Diest, and Josien P.W. Pluim

Subtype Cell Detection with an Accelerated Deep Convolution Neural Network . . . 640
Sheng Wang, Jiawen Yao, Zheng Xu, and Junzhou Huang

Imaging Biomarker Discovery for Lung Cancer Survival Prediction . . . 649
Jiawen Yao, Sheng Wang, Xinliang Zhu, and Junzhou Huang

3D Segmentation of Glial Cells Using Fully Convolutional Networks and k-Terminal Cut . . . 658
Lin Yang, Yizhe Zhang, Ian H. Guldner, Siyuan Zhang, and Danny Z. Chen

Detection of Differentiated vs. Undifferentiated Colonies of iPS Cells Using Random Forests Modeled with the Multivariate Polya Distribution . . . 667
Bisser Raytchev, Atsuki Masuda, Masatoshi Minakawa, Kojiro Tanaka, Takio Kurita, Toru Imamura, Masashi Suzuki, Toru Tamaki, and Kazufumi Kaneda

Detecting 10,000 Cells in One Second . . . 676
Zheng Xu and Junzhou Huang

A Hierarchical Convolutional Neural Network for Mitosis Detection in Phase-Contrast Microscopy Images . . . 685
Yunxiang Mao and Zhaozheng Yin

Author Index . . . 693

Contents – Part III

Registration and Deformation Estimation

Learning-Based Multimodal Image Registration for Prostate Cancer Radiation Therapy . . . 1
Xiaohuan Cao, Yaozong Gao, Jianhua Yang, Guorong Wu, and Dinggang Shen

A Deep Metric for Multimodal Registration . . . 10
Martin Simonovsky, Benjamín Gutiérrez-Becker, Diana Mateus, Nassir Navab, and Nikos Komodakis

Learning Optimization Updates for Multimodal Registration . . . 19
Benjamín Gutiérrez-Becker, Diana Mateus, Loïc Peter, and Nassir Navab

Memory Efficient LDDMM for Lung CT . . . 28
Thomas Polzin, Marc Niethammer, Mattias P. Heinrich, Heinz Handels, and Jan Modersitzki

Inertial Demons: A Momentum-Based Diffeomorphic Registration Framework . . . 37
Andre Santos-Ribeiro, David J. Nutt, and John McGonigle

Diffeomorphic Density Registration in Thoracic Computed Tomography . . . 46
Caleb Rottman, Ben Larson, Pouya Sabouri, Amit Sawant, and Sarang Joshi

Temporal Registration in In-Utero Volumetric MRI Time Series . . . 54
Ruizhi Liao, Esra A. Turk, Miaomiao Zhang, Jie Luo, P. Ellen Grant, Elfar Adalsteinsson, and Polina Golland

Probabilistic Atlas of the Human Hippocampus Combining Ex Vivo MRI and Histology . . . 63
Daniel H. Adler, Ranjit Ittyerah, John Pluta, Stephen Pickup, Weixia Liu, David A. Wolk, and Paul A. Yushkevich

Deformation Estimation with Automatic Sliding Boundary Computation . . . 72
Joseph Samuel Preston, Sarang Joshi, and Ross Whitaker

Bilateral Weighted Adaptive Local Similarity Measure for Registration in Neurosurgery . . . 81
Martin Kochan, Marc Modat, Tom Vercauteren, Mark White, Laura Mancini, Gavin P. Winston, Andrew W. McEvoy, John S. Thornton, Tarek Yousry, John S. Duncan, Sébastien Ourselin, and Danail Stoyanov

Model-Based Regularisation for Respiratory Motion Estimation with Sparse Features in Image-Guided Interventions . . . 89
Matthias Wilms, In Young Ha, Heinz Handels, and Mattias Paul Heinrich

Carotid Artery Wall Motion Estimated from Ultrasound Imaging Sequences Using a Nonlinear State Space Approach . . . 98
Zhifan Gao, Yuanyuan Sun, Heye Zhang, Dhanjoo Ghista, Yanjie Li, Huahua Xiong, Xin Liu, Yaoqin Xie, Wanqing Wu, and Shuo Li

Accuracy Estimation for Medical Image Registration Using Regression Forests . . . 107
Hessam Sokooti, Gorkem Saygili, Ben Glocker, Boudewijn P.F. Lelieveldt, and Marius Staring

Embedding Segmented Volume in Finite Element Mesh with Topology Preservation . . . 116
Kazuya Sase, Teppei Tsujita, and Atsushi Konno

Deformable 3D-2D Registration of Known Components for Image Guidance in Spine Surgery . . . 124
A. Uneri, J. Goerres, T. De Silva, M.W. Jacobson, M.D. Ketcha, S. Reaungamornrat, G. Kleinszig, S. Vogt, A.J. Khanna, J.-P. Wolinsky, and J.H. Siewerdsen

Anatomically Constrained Video-CT Registration via the V-IMLOP Algorithm . . . 133
Seth D. Billings, Ayushi Sinha, Austin Reiter, Simon Leonard, Masaru Ishii, Gregory D. Hager, and Russell H. Taylor

Shape Modeling

A Multi-resolution T-Mixture Model Approach to Robust Group-Wise Alignment of Shapes . . . 142
Nishant Ravikumar, Ali Gooya, Serkan Çimen, Alejandro F. Frangi, and Zeike A. Taylor

Quantifying Shape Deformations by Variation of Geometric Spectrum . . . 150
Hajar Hamidian, Jiaxi Hu, Zichun Zhong, and Jing Hua

Myocardial Segmentation of Contrast Echocardiograms Using Random Forests Guided by Shape Model . . . 158
Yuanwei Li, Chin Pang Ho, Navtej Chahal, Roxy Senior, and Meng-Xing Tang

Low-Dimensional Statistics of Anatomical Variability via Compact Representation of Image Deformations . . . 166
Miaomiao Zhang, William M. Wells III, and Polina Golland

Contents – Part III

A Multiscale Cardiac Model for Fast Personalisation and Exploitation . . . 174
    Roch Mollero, Xavier Pennec, Hervé Delingette, Nicholas Ayache, and Maxime Sermesant

Transfer Shape Modeling Towards High-Throughput Microscopy Image Segmentation . . . 183
    Fuyong Xing, Xiaoshuang Shi, Zizhao Zhang, JinZheng Cai, Yuanpu Xie, and Lin Yang

Hierarchical Generative Modeling and Monte-Carlo EM in Riemannian Shape Space for Hypothesis Testing . . . 191
    Saurabh J. Shigwan and Suyash P. Awate

Direct Estimation of Wall Shear Stress from Aneurysmal Morphology: A Statistical Approach . . . 201
    Ali Sarrami-Foroushani, Toni Lassila, Jose M. Pozo, Ali Gooya, and Alejandro F. Frangi

Multi-task Shape Regression for Medical Image Segmentation . . . 210
    Xiantong Zhen, Yilong Yin, Mousumi Bhaduri, Ilanit Ben Nachum, David Laidley, and Shuo Li

Soft Multi-organ Shape Models via Generalized PCA: A General Framework . . . 219
    Juan J. Cerrolaza, Ronald M. Summers, and Marius George Linguraru

An Artificial Agent for Anatomical Landmark Detection in Medical Images . . . 229
    Florin C. Ghesu, Bogdan Georgescu, Tommaso Mansi, Dominik Neumann, Joachim Hornegger, and Dorin Comaniciu

Cardiac and Vascular Image Analysis

Identifying Patients at Risk for Aortic Stenosis Through Learning from Multimodal Data . . . 238
    Tanveer Syeda-Mahmood, Yanrong Guo, Mehdi Moradi, D. Beymer, D. Rajan, Yu Cao, Yaniv Gur, and Mohammadreza Negahdar

Multi-input Cardiac Image Super-Resolution Using Convolutional Neural Networks . . . 246
    Ozan Oktay, Wenjia Bai, Matthew Lee, Ricardo Guerrero, Konstantinos Kamnitsas, Jose Caballero, Antonio de Marvao, Stuart Cook, Declan O’Regan, and Daniel Rueckert

GPNLPerf: Robust 4d Non-rigid Motion Correction for Myocardial Perfusion Analysis . . . 255
    S. Thiruvenkadam, K.S. Shriram, B. Patil, G. Nicolas, M. Teisseire, C. Cardon, J. Knoplioch, N. Subramanian, S. Kaushik, and R. Mullick

Recognizing End-Diastole and End-Systole Frames via Deep Temporal Regression Network . . . 264
    Bin Kong, Yiqiang Zhan, Min Shin, Thomas Denny, and Shaoting Zhang

Basal Slice Detection Using Long-Axis Segmentation for Cardiac Analysis . . . 273
    Mahsa Paknezhad, Michael S. Brown, and Stephanie Marchesseau

Spatially-Adaptive Multi-scale Optimization for Local Parameter Estimation: Application in Cardiac Electrophysiological Models . . . 282
    Jwala Dhamala, John L. Sapp, Milan Horacek, and Linwei Wang

Reconstruction of Coronary Artery Centrelines from X-Ray Angiography Using a Mixture of Student’s t-Distributions . . . 291
    Serkan Çimen, Ali Gooya, Nishant Ravikumar, Zeike A. Taylor, and Alejandro F. Frangi

Barycentric Subspace Analysis: A New Symmetric Group-Wise Paradigm for Cardiac Motion Tracking . . . 300
    Marc-Michel Rohé, Maxime Sermesant, and Xavier Pennec

Extraction of Coronary Vessels in Fluoroscopic X-Ray Sequences Using Vessel Correspondence Optimization . . . 308
    Seung Yeon Shin, Soochahn Lee, Kyoung Jin Noh, Il Dong Yun, and Kyoung Mu Lee

Coronary Centerline Extraction via Optimal Flow Paths and CNN Path Pruning . . . 317
    Mehmet A. Gülsün, Gareth Funka-Lea, Puneet Sharma, Saikiran Rapaka, and Yefeng Zheng

Vascular Registration in Photoacoustic Imaging by Low-Rank Alignment via Foreground, Background and Complement Decomposition . . . 326
    Ryoma Bise, Yingqiang Zheng, Imari Sato, and Masakazu Toi

From Real MRA to Virtual MRA: Towards an Open-Source Framework . . . 335
    N. Passat, S. Salmon, J.-P. Armspach, B. Naegel, C. Prud’homme, H. Talbot, A. Fortin, S. Garnotel, O. Merveille, O. Miraucourt, R. Tarabay, V. Chabannes, A. Dufour, A. Jezierska, O. Balédent, E. Durand, L. Najman, M. Szopos, A. Ancel, J. Baruthio, M. Delbany, S. Fall, G. Pagé, O. Génevaux, M. Ismail, P. Loureiro de Sousa, M. Thiriet, and J. Jomier

Improved Diagnosis of Systemic Sclerosis Using Nailfold Capillary Flow . . . 344
    Michael Berks, Graham Dinsdale, Andrea Murray, Tonia Moore, Ariane Herrick, and Chris Taylor

Tensor-Based Graph-Cut in Riemannian Metric Space and Its Application to Renal Artery Segmentation . . . 353
    Chenglong Wang, Masahiro Oda, Yuichiro Hayashi, Yasushi Yoshino, Tokunori Yamamoto, Alejandro F. Frangi, and Kensaku Mori

Automatic, Robust, and Globally Optimal Segmentation of Tubular Structures . . . 362
    Simon Pezold, Antal Horváth, Ketut Fundana, Charidimos Tsagkas, Michaela Andělová, Katrin Weier, Michael Amann, and Philippe C. Cattin

Dense Volume-to-Volume Vascular Boundary Detection . . . 371
    Jameson Merkow, Alison Marsden, David Kriegman, and Zhuowen Tu

HALE: Healthy Area of Lumen Estimation for Vessel Stenosis Quantification . . . 380
    Sethuraman Sankaran, Michiel Schaap, Stanley C. Hunley, James K. Min, Charles A. Taylor, and Leo Grady

3D Near Infrared and Ultrasound Imaging of Peripheral Blood Vessels for Real-Time Localization and Needle Guidance . . . 388
    Alvin I. Chen, Max L. Balter, Timothy J. Maguire, and Martin L. Yarmush

The Minimum Cost Connected Subgraph Problem in Medical Image Analysis . . . 397
    Markus Rempfler, Bjoern Andres, and Bjoern H. Menze

Image Reconstruction

ASL-incorporated Pharmacokinetic Modelling of PET Data With Reduced Acquisition Time: Application to Amyloid Imaging . . . 406
    Catherine J. Scott, Jieqing Jiao, Andrew Melbourne, Jonathan M. Schott, Brian F. Hutton, and Sébastien Ourselin

Probe-Based Rapid Hybrid Hyperspectral and Tissue Surface Imaging Aided by Fully Convolutional Networks . . . 414
    Jianyu Lin, Neil T. Clancy, Xueqing Sun, Ji Qi, Mirek Janatka, Danail Stoyanov, and Daniel S. Elson

Efficient Low-Dose CT Denoising by Locally-Consistent Non-Local Means (LC-NLM) . . . 423
    Michael Green, Edith M. Marom, Nahum Kiryati, Eli Konen, and Arnaldo Mayer

Deep Learning Computed Tomography . . . 432
    Tobias Würfl, Florin C. Ghesu, Vincent Christlein, and Andreas Maier

Axial Alignment for Anterior Segment Swept Source Optical Coherence Tomography via Robust Low-Rank Tensor Recovery . . . 441
    Yanwu Xu, Lixin Duan, Huazhu Fu, Xiaoqin Zhang, Damon Wing Kee Wong, Baskaran Mani, Tin Aung, and Jiang Liu

3D Imaging from Video and Planar Radiography . . . 450
    Julien Pansiot and Edmond Boyer

Semantic Reconstruction-Based Nuclear Cataract Grading from Slit-Lamp Lens Images . . . 458
    Yanwu Xu, Lixin Duan, Damon Wing Kee Wong, Tien Yin Wong, and Jiang Liu

Vessel Orientation Constrained Quantitative Susceptibility Mapping (QSM) Reconstruction . . . 467
    Suheyla Cetin, Berkin Bilgic, Audrey Fan, Samantha Holdsworth, and Gozde Unal

Spatial-Angular Sparse Coding for HARDI . . . 475
    Evan Schwab, René Vidal, and Nicolas Charon

Compressed Sensing Dynamic MRI Reconstruction Using GPU-accelerated 3D Convolutional Sparse Coding . . . 484
    Tran Minh Quan and Won-Ki Jeong

MRI Image Analysis

Dynamic Volume Reconstruction from Multi-slice Abdominal MRI Using Manifold Alignment . . . 493
    Xin Chen, Muhammad Usman, Daniel R. Balfour, Paul K. Marsden, Andrew J. Reader, Claudia Prieto, and Andrew P. King

Fast and Accurate Multi-tissue Deconvolution Using SHORE and H-psd Tensors . . . 502
    Michael Ankele, Lek-Heng Lim, Samuel Groeschel, and Thomas Schultz

Optimisation of Arterial Spin Labelling Using Bayesian Experimental Design . . . 511
    David Owen, Andrew Melbourne, David Thomas, Enrico De Vita, Jonathan Rohrer, and Sebastien Ourselin

4D Phase-Contrast Magnetic Resonance CardioAngiography (4D PC-MRCA) Creation from 4D Flow MRI . . . 519
    Mariana Bustamante, Vikas Gupta, Carl-Johan Carlhäll, and Tino Ebbers

Joint Estimation of Cardiac Motion and T1 Maps for Magnetic Resonance Late Gadolinium Enhancement Imaging . . . 527
    Jens Wetzl, Aurélien F. Stalder, Michaela Schmidt, Yigit H. Akgök, Christoph Tillmanns, Felix Lugauer, Christoph Forman, Joachim Hornegger, and Andreas Maier

Correction of Fat-Water Swaps in Dixon MRI . . . 536
    Ben Glocker, Ender Konukoglu, Ioannis Lavdas, Juan Eugenio Iglesias, Eric O. Aboagye, Andrea G. Rockall, and Daniel Rueckert

Motion-Robust Reconstruction Based on Simultaneous Multi-slice Registration for Diffusion-Weighted MRI of Moving Subjects . . . 544
    Bahram Marami, Benoit Scherrer, Onur Afacan, Simon K. Warfield, and Ali Gholipour

Self Super-Resolution for Magnetic Resonance Images . . . 553
    Amod Jog, Aaron Carass, and Jerry L. Prince

Tight Graph Framelets for Sparse Diffusion MRI q-Space Representation . . . 561
    Pew-Thian Yap, Bin Dong, Yong Zhang, and Dinggang Shen

A Bayesian Model to Assess T2 Values and Their Changes Over Time in Quantitative MRI . . . 570
    Benoit Combès, Anne Kerbrat, Olivier Commowick, and Christian Barillot

Simultaneous Parameter Mapping, Modality Synthesis, and Anatomical Labeling of the Brain with MR Fingerprinting . . . 579
    Pedro A. Gómez, Miguel Molina-Romero, Cagdas Ulas, Guido Bounincontri, Jonathan I. Sperl, Derek K. Jones, Marion I. Menzel, and Bjoern H. Menze

XQ-NLM: Denoising Diffusion MRI Data via x-q Space Non-local Patch Matching . . . 587
    Geng Chen, Yafeng Wu, Dinggang Shen, and Pew-Thian Yap

Spatially Adaptive Spectral Denoising for MR Spectroscopic Imaging using Frequency-Phase Non-local Means . . . 596
    Dhritiman Das, Eduardo Coello, Rolf F. Schulte, and Bjoern H. Menze

Beyond the Resolution Limit: Diffusion Parameter Estimation in Partial Volume . . . 605
    Zach Eaton-Rosen, Andrew Melbourne, M. Jorge Cardoso, Neil Marlow, and Sebastien Ourselin

A Promising Non-invasive CAD System for Kidney Function Assessment . . . 613
    M. Shehata, F. Khalifa, A. Soliman, M. Abou El-Ghar, A. Dwyer, G. Gimel’farb, R. Keynton, and A. El-Baz

Comprehensive Maximum Likelihood Estimation of Diffusion Compartment Models Towards Reliable Mapping of Brain Microstructure . . . 622
    Aymeric Stamm, Olivier Commowick, Simon K. Warfield, and S. Vantini

Author Index . . . 631

Ordinal Patterns for Connectivity Networks in Brain Disease Diagnosis

Mingxia Liu, Junqiang Du, Biao Jie, and Daoqiang Zhang

School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
[email protected]

Abstract. Brain connectivity networks have been widely used for the diagnosis of brain-related diseases, e.g., Alzheimer’s disease (AD), mild cognitive impairment (MCI), and attention deficit hyperactivity disorder (ADHD). Although several network descriptors have been designed for representing brain connectivity networks, most of them ignore the important weight information of edges and, because they focus only on individual brain regions, cannot capture the modular local structures of brain connectivity networks. In this paper, we propose a new network descriptor (called ordinal pattern) for brain connectivity networks, and apply it to brain disease diagnosis. Specifically, we first define ordinal patterns that contain sequences of weighted edges based on a functional connectivity network. A frequent ordinal pattern mining algorithm is then developed to identify the frequent ordinal patterns in a set of brain connectivity networks. We further perform discriminative ordinal pattern selection, followed by an SVM classification process. Experimental results on both the ADNI and the ADHD-200 data sets demonstrate that the proposed method achieves a significant improvement over state-of-the-art brain connectivity network based methods.

1 Introduction

As a modern brain mapping technique, functional magnetic resonance imaging (fMRI) is an efficient and non-invasive way to map the patterns of functional connectivity of the human brain [1,2]. In particular, functional connectivity networks derived from task-free (resting-state) fMRI (rs-fMRI) have a small-world architecture, which reflects a robust functional organization of the brain. Recent studies [3–6] show the great promise of brain connectivity networks for understanding the pathology of brain diseases (e.g., AD, MCI, and ADHD) by exploring anatomical connections or functional interactions among different brain regions, where brain regions are treated as nodes and anatomical connections or functional associations are regarded as edges. Several network descriptors have been developed for representing brain connectivity networks, such as node degrees [3], clustering coefficients [4], and subnetworks [7]. Most existing descriptors are designed for un-weighted brain connectivity networks, so the valuable weight information of edges is ignored.

M. Liu and J. Du contributed equally to this paper.
© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 1–9, 2016. DOI: 10.1007/978-3-319-46720-7_1


M. Liu et al.

In practice, different edges are usually assigned different weights to measure the connectivity strength between pairs of nodes (i.e., brain regions). However, previous studies usually apply thresholds to transform the original weighted networks into un-weighted ones [2,5], which may lead to sub-optimal learning performance. In addition, existing descriptors mainly focus on individual brain regions rather than on local structures of brain networks, although there is considerable evidence that some brain diseases (e.g., AD and MCI) are highly related to modular local structures [8]. Unfortunately, it is hard to capture such local structures using existing network descriptors.

Fig. 1. An overview of ordinal pattern based learning for brain disease diagnosis. (Diagram labels: fMRI data; pre-processing; regional mean time series; weighted networks; patients’ networks and normal controls’ networks; frequent ordinal pattern mining; discriminative ordinal pattern selection; feature representation; SVM classification.)

In this paper, we propose a new network descriptor, i.e., ordinal pattern, for brain connectivity networks. The basic idea of the ordinal pattern is to construct a sequence of weighted edges on a weighted network by considering both the edge weights and the ordinal relations between edges. Compared with conventional network descriptors, ordinal patterns are directly constructed on weighted networks, which can naturally preserve the weight information and local structures of original networks. Then, an ordinal pattern based learning method is developed for brain disease diagnosis. Figure 1 presents the schematic diagram of the proposed framework with each network representing a specific subject. We first construct ordinal patterns on patients’ and normal controls’ (NCs) brain connectivity networks separately. A frequent ordinal pattern mining algorithm is then developed to identify ordinal patterns that frequently occur in patients’ and NCs’ brain networks. We then select the most discriminative ordinal patterns from those frequent ordinal patterns, and regard them as feature representation for subjects. Finally, we learn a support vector machine (SVM) classifier for brain disease diagnosis, by using ordinal pattern based feature representation.

Ordinal Patterns for Brain Connectivity Network

2 Method

2.1 Data and Preprocessing

The first data set contains rs-fMRI data from the ADNI database (footnote 1) with 34 AD patients, 99 MCI patients, and 50 NCs. The rs-fMRI data were pre-processed by brain skull removal, motion correction, temporal pre-whitening, spatial smoothing, global drift removal, slice time correction, and band pass filtering. By warping the automated anatomical labelling (AAL) [9] template to each subject, we parcellate the brain space of the rs-fMRI scans into 90 regions of interest (ROIs). For each ROI, the rs-fMRI time series of all voxels are averaged to give the mean time series of the ROI. With ROIs as nodes and Pearson correlations between pairs of ROIs as connectivity weights, a fully connected weighted functional network is constructed for each subject. The second data set is ADHD-200 with the Athena preprocessed rs-fMRI data, including 118 ADHD patients and 98 NCs (a detailed description of data acquisition and post-processing is given online; footnote 2).

2.2 Ordinal Pattern and Frequent Ordinal Pattern

Definition 1: Ordinal Pattern. Let $G = \{V, E, w\}$ denote a weighted network, where $V$ is a set of nodes, $E$ is a set of edges, and $w$ is the weight vector for those edges, with the $i$-th element $w(e_i)$ representing the weight of the edge $e_i$. If $w(e_i) > w(e_j)$ for all $0 < i < j \le M$, an ordinal pattern ($op$) of $G$ is defined as $op = \{e_1, e_2, \cdots, e_M\} \subseteq E$, where $M$ is the number of edges in $op$.

An illustration of the proposed ordinal patterns is given in Fig. 2(a), where a weighted network contains 5 nodes and 7 edges. We can get ordinal patterns that contain two edges, e.g., $op_1 = \{e_{a-b}, e_{b-c}\}$ and $op_2 = \{e_{b-c}, e_{c-e}\}$. The ordinal pattern $op_1$ actually denotes $w(e_{a-b}) > w(e_{b-c})$. We can further obtain ordinal patterns containing three edges, e.g., $op_4 = \{e_{a-b}, e_{b-c}, e_{c-e}\}$. Hence, the proposed ordinal pattern can be regarded as the combination of some ordinal relations between pairs of edges. We only consider connected ordinal patterns in this study, where an ordinal pattern is connected if and only if the edges it contains form a connected sub-network. Different from conventional methods, the ordinal pattern is defined on a weighted network directly to explicitly utilize the weight information of edges. Also, as a special sub-network, an ordinal pattern can model the ordinal relations conveyed in a weighted network, and thus can naturally preserve the local structures of the network.

Fig. 2. Illustration of (a) ordinal patterns, and (b) frequent ordinal pattern mining method. (Panel (a) shows a weighted network with nodes a–e together with ordinal patterns with 2 edges, $op_1 = \{e_{a-b}, e_{b-c}\}$, $op_2 = \{e_{b-c}, e_{c-e}\}$ and $op_3 = \{e_{b-e}, e_{e-d}\}$, and with 3 edges, $op_4 = \{e_{a-b}, e_{b-c}, e_{c-e}\}$ and $op_5 = \{e_{b-c}, e_{c-e}, e_{e-d}\}$. Panel (b) shows a search tree with a root node and Levels 1–4, in which infrequent patterns such as $op_M$ are discarded.)

Footnotes:
1 http://adni.loni.usc.edu/
2 http://www.nitrc.org/plugins/mwiki/index.php/neurobureau:AthenaPipeline

Definition 2: Frequent Ordinal Pattern. Let $D = \{G_1, G_2, \cdots, G_N\}$ represent a set of $N$ weighted networks. Given an ordinal pattern $op$, the frequency ratio of $op$ is defined as

$$f(op \mid D) = \frac{|\{G_n \mid op \text{ is an ordinal pattern of } G_n,\ G_n \in D\}|}{|D|} \qquad (1)$$

If $f(op \mid D) > \theta$, where $\theta$ is a pre-defined threshold value, the ordinal pattern $op$ is called a frequent ordinal pattern of $D$. We can see that frequent ordinal patterns are ordinal patterns that frequently appear in a weighted network set. For instance, a frequent ordinal pattern in a brain network set may represent common functional or structural information among subjects. Besides, frequent ordinal patterns have an appealing property that plays an important role in the data mining process. Specifically, for two ordinal patterns $op_i = \{e^i_1, e^i_2, \cdots, e^i_M\}$ and $op_j = \{e^j_1, e^j_2, \cdots, e^j_M, e^j_{M+1}\}$, if $e^i_m = e^j_m$ ($\forall m \in \{1, 2, \cdots, M\}$), $op_i$ is called the parent of $op_j$, and $op_j$ is called a child of $op_i$. As shown in Fig. 2(a), $op_1 = \{e_{a-b}, e_{b-c}\}$ is the parent of $op_4 = \{e_{a-b}, e_{b-c}, e_{c-e}\}$. It is easy to prove that the frequency ratio of an ordinal pattern is no larger than the frequency ratios of its parents. That is, if an ordinal pattern is not a frequent ordinal pattern, its children and descendants are not frequent ordinal patterns, either.

2.3 Ordinal Pattern Based Learning

Ordinal Pattern Construction: Using the above-mentioned preprocessing method, we construct one brain connectivity network for each subject, with each node denoting an ROI and each edge representing the Pearson correlation between a pair of ROIs. We then construct ordinal patterns on patients’ and normal controls’ (NCs) brain connectivity networks separately. Given all training subjects, we obtain a brain network set containing patients’ and NCs’ networks.

Frequent Ordinal Pattern Mining: We then propose a frequent ordinal pattern mining algorithm to identify ordinal patterns that frequently occur in a brain network set, by constructing a depth-first search (DFS) tree. We first randomly choose an edge whose frequency ratio is larger than a threshold $\theta$ as the root node. As illustrated in Fig. 2(b), a path from the root node to the current node forms a specific ordinal pattern, e.g., $op_1 = \{e_{a-b}, e_{b-c}\}$. We then record the number of occurrences and compute the frequency ratio of this ordinal pattern in a network set (with each network corresponding to a subject). If its frequency ratio defined in Eq. (1) is larger than $\theta$, the ordinal pattern (e.g., $op_1$) is a frequent ordinal pattern and its children (e.g., $op_4$) will be further searched. Otherwise, the ordinal pattern (e.g., $op_M$) is not a frequent ordinal pattern, and its descendants will be discarded directly. The maximum depth of a DFS tree is limited by the level number. For example, if the level is 3, the frequent ordinal patterns contain at most 3 edges. Obviously, more levels yield more frequent ordinal patterns but also a longer run-time.

Discriminative Ordinal Pattern Selection: The mining step yields a large number of frequent ordinal patterns, and some of them have little discriminative power. Accordingly, we perform a discriminative ordinal pattern selection process on those frequent ordinal patterns. Specifically, we first mine frequent ordinal patterns from the patients’ brain network set and the NCs’ brain network set separately. According to their discriminative power, we select the most discriminative ordinal patterns from all frequent ordinal patterns in both the patients’ and the NCs’ sets. The ratio score [10] is used to evaluate the discriminative power of frequent ordinal patterns. Given a frequent ordinal pattern $op_i$ mined from the patients’ brain network set (denoted as $D^+$), the ratio score of $op_i$ is defined as

$$RS(op_i) = \log \frac{|\{G_n \mid op_i \text{ is an ordinal pattern of } G_n,\ G_n \in D^+\}|}{|\{G_n \mid op_i \text{ is an ordinal pattern of } G_n,\ G_n \in D^-\}| + \epsilon} \times \frac{|D^-|}{|D^+|} \qquad (2)$$

where $D^-$ denotes the NCs’ brain network set, and $\epsilon$ is a small value that prevents the denominator from being 0. Similarly, for a frequent ordinal pattern $op_j$ mined from the NCs’ brain network set (i.e., $D^-$), the ratio score is computed as

$$RS(op_j) = \log \frac{|\{G_n \mid op_j \text{ is an ordinal pattern of } G_n,\ G_n \in D^-\}|}{|\{G_n \mid op_j \text{ is an ordinal pattern of } G_n,\ G_n \in D^+\}| + \epsilon} \times \frac{|D^+|}{|D^-|} \qquad (3)$$
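The definitions of Sect. 2.2 and the mining procedure described above can be read as the following minimal Python sketch. It assumes each network is stored as a dictionary from an edge (a sorted pair of node names) to its weight, tries every sufficiently frequent edge as a root instead of choosing roots at random, and records only patterns with at least two edges. The function names, the toy networks, and these choices are illustrative assumptions and are not taken from the paper.

```python
from itertools import chain


def occurs(op, net):
    """Definition 1: every edge of `op` is in `net` and the weights
    strictly decrease along the sequence."""
    if any(e not in net for e in op):
        return False
    weights = [net[e] for e in op]
    return all(w1 > w2 for w1, w2 in zip(weights, weights[1:]))


def frequency_ratio(op, nets):
    """Eq. (1): fraction of the networks in `nets` that contain `op`."""
    return sum(occurs(op, net) for net in nets) / len(nets)


def mine_frequent(nets, theta, max_level):
    """Depth-first search for connected frequent ordinal patterns
    with at most `max_level` edges."""
    edges = sorted(set(chain.from_iterable(nets)))
    frequent = []

    def dfs(op, nodes):
        if len(op) == max_level:
            return
        for e in edges:
            # Keep the pattern connected and never reuse an edge.
            if e in op or not nodes.intersection(e):
                continue
            child = op + (e,)
            # Prune: children of an infrequent pattern cannot be frequent.
            if frequency_ratio(child, nets) > theta:
                frequent.append(child)
                dfs(child, nodes | set(e))

    for root in edges:
        if frequency_ratio((root,), nets) > theta:
            dfs((root,), set(root))
    return frequent


# Hypothetical toy data: two subjects, three edges each.
g1 = {("a", "b"): 0.7, ("b", "c"): 0.5, ("c", "e"): 0.3}
g2 = {("a", "b"): 0.6, ("b", "c"): 0.4, ("c", "e"): 0.5}
nets = [g1, g2]
patterns = mine_frequent(nets, theta=0.5, max_level=3)
```

On these two toy networks only the pattern (a-b, b-c) exceeds the threshold. Its three-edge child (a-b, b-c, c-e) occurs in just one of the two networks, so it is not kept and the search below it stops, which is the pruning property stated in Sect. 2.2.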

Classification: A total of $k$ discriminative ordinal patterns are first selected, with half from the patients’ and the other half from the NCs’ brain connectivity network sets. We then combine those discriminative ordinal patterns to construct a feature matrix for representing subjects. Specifically, given $|D|$ brain connectivity networks (with each network corresponding to a specific subject) and $k$ selected discriminative ordinal patterns, we denote the feature matrix as $F \in \mathbb{R}^{|D| \times k}$, where the element $F_{i,j}$ represents the $j$-th feature of the $i$-th subject. Specifically, if the $j$-th discriminative ordinal pattern appears in the brain connectivity network of the $i$-th subject, $F_{i,j}$ is equal to 1, and otherwise 0. Finally, we adopt an SVM classifier to identify AD/MCI/ADHD patients from NCs.
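The selection and feature construction steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the logarithm in Eqs. (2) and (3) is read as covering the whole product, networks are dictionaries from edges to weights, and all function names and toy values are hypothetical. The SVM itself is omitted; the resulting binary matrix is what would be passed to it.

```python
import math


def occurs(op, net):
    """Definition 1: edges of `op` exist in `net` with strictly decreasing weights."""
    if any(e not in net for e in op):
        return False
    w = [net[e] for e in op]
    return all(a > b for a, b in zip(w, w[1:]))


def ratio_score(op, own, other, eps=0.1):
    """Eqs. (2)/(3): `own` is the set the pattern was mined from.
    A pattern that is frequent in `own` occurs there at least once,
    so the argument of the logarithm is positive."""
    n_own = sum(occurs(op, g) for g in own)
    n_other = sum(occurs(op, g) for g in other)
    return math.log(n_own / (n_other + eps) * len(other) / len(own))


def select_top(patterns, own, other, n):
    """Keep the n patterns with the highest ratio score."""
    ranked = sorted(patterns, key=lambda op: ratio_score(op, own, other), reverse=True)
    return ranked[:n]


def feature_matrix(nets, selected):
    """F[i][j] = 1 if the j-th selected pattern occurs in the i-th network, else 0."""
    return [[int(occurs(op, g)) for op in selected] for g in nets]


# Hypothetical toy data: two patients and two controls on ROIs a, b, c.
ab, bc = ("a", "b"), ("b", "c")
patients = [{ab: 0.7, bc: 0.5}, {ab: 0.6, bc: 0.4}]
controls = [{ab: 0.3, bc: 0.5}, {ab: 0.8, bc: 0.2}]
op_pos, op_neg = (ab, bc), (bc, ab)

# k = 2: one pattern from the patients' set and one from the controls' set.
selected = select_top([op_pos], patients, controls, 1) + select_top([op_neg], controls, patients, 1)
F = feature_matrix(patients + controls, selected)
```

Each row of `F` describes one subject and each column one selected pattern, so the matrix has the $|D| \times k$ shape described above.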

3 Experiments

Experimental Settings: We perform three classification tasks, i.e., AD vs. NC, MCI vs. NC and ADHD vs. NC classification, using a 10-fold cross-validation strategy. Note that the discriminative ordinal patterns are selected only from training data. Classification performance is evaluated by accuracy (ACC), sensitivity (SEN), specificity (SPE) and area under the ROC curve (AUC). The parameter $\epsilon$ of the ratio score in Eqs. (2) and (3) is set to 0.1 empirically. With an inner cross-validation strategy, the level number in our frequent ordinal pattern mining algorithm is chosen from [2, 6] with step 1, and the number of discriminative ordinal patterns is chosen from [10, 100] with step 10. We compare our method with two widely used network descriptors in brain connectivity network based studies, namely clustering coefficients [4] and discriminative sub-networks [7]. Since these two descriptors require a thresholding process, we adopt both single-threshold and multi-threshold [5,11] strategies to transform weighted networks into un-weighted ones. In summary, there are four competing methods: (1) clustering coefficients with a single threshold (CC), (2) clustering coefficients using multiple thresholds (CCMT), (3) discriminative sub-networks with a single threshold (DS), and (4) discriminative sub-networks using multiple thresholds (DSMT). A linear SVM with the default parameter (i.e., $C = 1$) is used as the classifier in all methods.

Results: Experimental results are listed in Table 1, from which we can see that our method consistently achieves the best performance in the three tasks. For instance, the accuracy achieved by our method is 94.05 % in AD vs. NC classification, which is significantly better than the second best result, obtained by DSMT (85.71 %). This demonstrates that, compared with conventional network descriptors, the ordinal patterns are discriminative in distinguishing AD/MCI/ADHD patients from NCs.

We further plot the top 2 discriminative ordinal patterns identified by our method in the three tasks in Fig. 3. For instance, the most discriminative ordinal pattern for AD, shown in the top left of Fig. 3(a), can be recorded as $op = \{e_{DCG.L-ACG.L}, e_{ACG.L-ROL.L}, e_{ROL.L-PAL.R}, e_{PAL.R-LING.L}, e_{PAL.R-MOG.R}\}$.

Table 1. Comparison of different methods in three classification tasks (all values in %)

            AD vs. NC                    MCI vs. NC                   ADHD vs. NC
Method      ACC    SEN    SPE    AUC     ACC    SEN    SPE    AUC     ACC    SEN    SPE    AUC
CC          72.62  73.53  67.94  70.94   71.14  72.73  68.00  68.69   71.29  72.03  70.41  70.51
CCMT        80.95  82.35  80.00  76.35   74.50  75.76  72.00  74.79   74.53  75.43  73.47  77.64
DS          76.19  76.47  76.00  75.59   77.18  78.79  74.00  74.89   81.01  81.36  80.61  80.82
DSMT        85.71  85.29  86.00  87.59   79.19  80.81  76.00  76.99   83.79  84.74  82.65  84.63
Proposed    94.05  96.77  92.45  96.35   88.59  87.27  92.31  84.57   87.50  88.89  85.85  87.37
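For reference, the four measures reported in Table 1 can be computed from predicted labels and decision scores as in the sketch below. It uses the standard definitions with patients as the positive class. How the paper aggregates the measures over the 10 folds is not stated, so the sketch treats a single set of predictions, and the example labels and scores are hypothetical.

```python
def confusion_metrics(y_true, y_pred):
    """Return (ACC, SEN, SPE) for binary labels, with 1 = patient and 0 = NC."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    acc = (tp + tn) / len(pairs)
    sen = tp / (tp + fn)  # true positive rate
    spe = tn / (tn + fp)  # true negative rate
    return acc, sen, spe


def auc(y_true, scores):
    """AUC as the probability that a patient receives a higher score
    than a control, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predictions for three patients and two controls.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6]
acc, sen, spe = confusion_metrics(y_true, y_pred)
area = auc(y_true, scores)
```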

Fig. 3. The most discriminative ordinal patterns identified by the proposed method in three tasks: (a) AD vs. NC classification, (b) MCI vs. NC classification, and (c) ADHD vs. NC classification. In each row, the first two columns show the top 2 discriminative ordinal patterns selected from the positive classes (i.e., AD, MCI, and ADHD), while the last two columns illustrate those selected from the negative class (i.e., NC).

These results imply that the proposed ordinal patterns do reflect some local structures of the original brain networks. We investigate the influence of the level number in frequent ordinal pattern mining and of the number of selected discriminative ordinal patterns, with results shown in Fig. 4. From this figure, we can see that our method achieves relatively stable results when the number of selected ordinal patterns is larger than 40. Also, our method achieves overall good performance when the level number in the frequent ordinal pattern mining algorithm is 4 in AD/MCI vs. NC classification and 5 in ADHD vs. NC classification. We perform an additional experiment using the weights of each edge in the ordinal patterns as raw features, and achieve accuracies of 71.43 %, 67.11 %, and 69.91 % in AD vs. NC, MCI vs. NC and ADHD vs. NC classification, respectively. We further utilize a real-valued network descriptor based on ordinal patterns (by taking the product of the weights in each ordinal pattern), and obtain accuracies of 78.52 %, 72.37 %, and 72.69 % in the three tasks, respectively.
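The two real-valued variants mentioned above can be illustrated as follows. The paper gives no further detail, so this sketch assumes that the features are computed for every subject from the weights of the pattern's edges, whether or not the ordinal relation holds in that subject's network; the function names and the toy network are hypothetical.

```python
import math


def raw_weight_features(op, net):
    """Weights of the pattern's edges, used directly as features."""
    return [net[e] for e in op]


def product_feature(op, net):
    """Single real-valued descriptor: product of the pattern's edge weights."""
    return math.prod(net[e] for e in op)


# Hypothetical subject network and a three-edge pattern.
ab, bc, ce = ("a", "b"), ("b", "c"), ("c", "e")
net = {ab: 0.7, bc: 0.5, ce: 0.3}
op4 = (ab, bc, ce)

raw = raw_weight_features(op4, net)
prod = product_feature(op4, net)
```

Both variants discard the explicit ordering constraint, which is consistent with the lower accuracies reported for them compared with the binary occurrence features.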

Fig. 4. Influence of the level number in frequent ordinal pattern mining method and the number of discriminative ordinal patterns in AD vs. NC (left), MCI vs. NC (middle), and ADHD vs. NC (right) classification. (Each panel plots ACC against the number of discriminative ordinal patterns, with one curve per level, Level = 1 to Level = 5.)

4 Conclusion

In this paper, we propose a new network descriptor (i.e., ordinal pattern) for brain connectivity networks. The proposed ordinal patterns are defined on weighted networks, and can therefore preserve the weight information of edges and the local structure of the original brain networks. We then develop an ordinal pattern based brain network classification method for the diagnosis of AD/MCI and ADHD. Experimental results on both the ADNI and the ADHD-200 data sets demonstrate the efficacy of our method.

Acknowledgment. This study was supported by the National Natural Science Foundation of China (Nos. 61422204, 61473149, 61473190, 61573023), the Jiangsu Natural Science Foundation for Distinguished Young Scholar (No. BK20130034), and the NUAA Fundamental Research Funds (No. NE2013105).

References

1. Robinson, E.C., Hammers, A., Ericsson, A., Edwards, A.D., Rueckert, D.: Identifying population differences in whole-brain structural networks: a machine learning approach. NeuroImage 50(3), 910–919 (2010)
2. Sporns, O.: From simple graphs to the connectome: networks in neuroimaging. NeuroImage 62(2), 881–886 (2012)
3. Rubinov, M., Sporns, O.: Complex network measures of brain connectivity: uses and interpretations. NeuroImage 52(3), 1059–1069 (2010)
4. Wee, C.Y., Yap, P.T., Li, W., Denny, K., Browndyke, J.N., Potter, G.G., Welsh-Bohmer, K.A., Wang, L., Shen, D.: Enriched white matter connectivity networks for accurate identification of MCI patients. NeuroImage 54(3), 1812–1822 (2011)
5. Jie, B., Zhang, D., Wee, C.Y., Shen, D.: Topological graph kernel on multiple thresholded functional connectivity networks for mild cognitive impairment classification. Hum. Brain Mapp. 35(7), 2876–2897 (2014)
6. Liu, M., Zhang, D., Shen, D.: Relationship induced multi-template learning for diagnosis of Alzheimer disease and mild cognitive impairment. IEEE Trans. Med. Imaging 35(6), 1463–1474 (2016)
7. Fei, F., Jie, B., Zhang, D.: Frequent and discriminative subnetwork mining for mild cognitive impairment classification. Brain Connect. 4(5), 347–360 (2014)


8. Brier, M.R., Thomas, J.B., Fagan, A.M., Hassenstab, J., Holtzman, D.M., Benzinger, T.L., Morris, J.C., Ances, B.M.: Functional connectivity and graph theory in preclinical Alzheimer’s disease. Neurobiol. Aging 35(4), 757–768 (2014)
9. Tzourio-Mazoyer, N., Landeau, B., Papathanassiou, D., Crivello, F., Etard, O., Delcroix, N., Mazoyer, B., Joliot, M.: Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. NeuroImage 15(1), 273–289 (2002)
10. Yan, X., Cheng, H., Han, J., Yu, P.S.: Mining significant graph patterns by leap search. In: Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 433–444. ACM (2008)
11. Sanz-Arigita, E.J., Schoonheim, M.M., Damoiseaux, J.S., Rombouts, S., Maris, E., Barkhof, F., Scheltens, P., Stam, C.J., et al.: Loss of ‘small-world’ networks in Alzheimer’s disease: graph analysis of FMRI resting-state functional connectivity. PLoS ONE 5(11), e13788 (2010)

Discovering Cortical Folding Patterns in Neonatal Cortical Surfaces Using Large-Scale Dataset

Yu Meng1,2, Gang Li2, Li Wang2, Weili Lin2, John H. Gilmore3, and Dinggang Shen2(&)

1 Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
2 Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA [email protected]
3 Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA

Abstract. The cortical folding of the human brain is highly complex and variable across individuals. Mining the major patterns of cortical folding from modern large-scale neuroimaging datasets is of great importance in advancing techniques for neuroimaging analysis and understanding the inter-individual variations of cortical folding and its relationship with cognitive function and disorders. As the primary cortical folding is genetically influenced and has been established at term birth, neonates, with minimal exposure to the complicated postnatal environmental influence, are ideal candidates for understanding the major patterns of cortical folding. In this paper, for the first time, we propose a novel method for discovering the major patterns of cortical folding in a large-scale dataset of neonatal brain MR images (N = 677). In our method, first, cortical folding is characterized by the distribution of sulcal pits, which are the locally deepest points in cortical sulci. Because deep sulcal pits are genetically related, relatively consistent across individuals, and also stable during brain development, they are well suited for representing and characterizing cortical folding. Then, the similarities between the sulcal pit distributions of any two subjects are measured from spatial, geometrical, and topological points of view. Next, these different measurements are adaptively fused together using a similarity network fusion technique, to preserve their common information and also capture their complementary information. Finally, leveraging the fused similarity measurements, a hierarchical affinity propagation algorithm is used to group similar sulcal folding patterns together. The proposed method has been applied to 677 neonatal brains (the largest neonatal dataset to our knowledge) in the central sulcus, superior temporal sulcus, and cingulate sulcus, and revealed multiple distinct and meaningful folding patterns in each region.

© Springer International Publishing AG 2016 S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 10–18, 2016. DOI: 10.1007/978-3-319-46720-7_2


1 Introduction

The human cerebral cortex is a highly convoluted and complex structure. Its cortical folding is quite variable across individuals (Fig. 1). However, certain common folding patterns exist in some specific cortical regions as shown in the classic textbook [1], which examined 25 autopsy specimen adult brains. Mining the major representative patterns of cortical folding from modern large-scale datasets is of great importance in advancing techniques for neuroimaging analysis and understanding the inter-individual variations of cortical folding and their relationship with structural connectivity, cognitive function, and brain disorders. For example, in cortical surface registration [2], typically a single cortical atlas is constructed for a group of brains. Such an atlas may not be able to reflect some important patterns of cortical folding, due to the averaging effect, thus leading to poor registration accuracy for some subjects that cannot be well characterized by the folding patterns in the atlas. Building multiple atlases, with each representing one major pattern of cortical folding, will lead to boosted accuracy in cortical surface registration and subsequent group-level analysis.

Fig. 1. Huge inter-individual variability of sulcal folding patterns in neonatal cortical surfaces, colored by the sulcal depth. Sulcal pits are shown by white spheres.

To investigate the patterns of cortical folding, a clustering approach has been proposed [3]. This approach used 3D moment invariants to represent each sulcus and used the agglomerative clustering algorithm to group major sulcal patterns in 150 adult brains. However, the discrimination of 3D moment invariants was limited in distinguishing different patterns. Hence, a more representative descriptor was proposed in [4], where the distance between any two sulcal folds in 62 adult brains was computed after they were aligned, resulting in more meaningful results. Meanwhile, sulcal pits, the locally deepest points in cortical sulci, were proposed for studying the inter-individual variability of cortical folding [5]. This is because sulcal pits have been suggested to be genetically affected and closely related to functional areas [6]. It has been found that the spatial distribution of sulcal pits is relatively consistent across individuals, compared to the shallow folding regions, in both adults (148 subjects) and infants (73 subjects) [7, 8]. In this paper, we propose a novel method for discovering major representative patterns of cortical folding on a large-scale neonatal dataset (N = 677). The motivation of using a neonatal dataset is that all primary cortical folding is largely genetically determined and has been established at term birth [9]; hence, neonates with the minimal exposure to the complicated postnatal environmental influence are the ideal candidates


for discovering the major cortical patterns. This is very important for understanding the biological relationships between cortical folding and brain functional development or neurodevelopmental disorders rooted during infancy. The motivation of using a large-scale dataset is that small datasets may not sufficiently cover all kinds of major cortical patterns and thus would likely lead to biased results. In our method, we leveraged the reliable deep sulcal pits to characterize the cortical folding, thus eliminating the effects of noisy shallow folding regions that are extremely heterogeneous and variable. Specifically, first, sulcal pits were extracted using a watershed algorithm [8] and represented using a sulcal graph. Then, the difference between the sulcal pit distributions of any two cortices was computed based on six complementary measurements, i.e., sulcal pit position, sulcal pit depth, ridge point depth, sulcal basin area, sulcal basin boundary, and sulcal pit local connection, thus resulting in six matrices. Next, these difference matrices were converted to similarity matrices and adaptively fused into one comprehensive similarity matrix using a similarity network fusion technique [10], to preserve their common information and also capture their complementary information. Finally, based on the fused similarity matrix, a hierarchical affinity propagation clustering algorithm was performed to group sulcal graphs into different clusters. The proposed method was applied to 677 neonatal brains (the largest neonatal dataset to our knowledge) in the central sulcus, superior temporal sulcus, and cingulate sulcus, and revealed multiple distinct and meaningful patterns of cortical folding in each region.
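The conversion of a difference matrix into a similarity matrix can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes a scaled exponential kernel of the kind used in similarity network fusion [10], with a scaling parameter `mu` and a neighborhood size `K`. The function name, the toy data, and the choice to exclude the diagonal when picking the K smallest row entries are assumptions made for this example.

```python
import numpy as np

def difference_to_similarity(M, mu=0.8, K=3):
    """Convert a symmetric difference matrix into a similarity matrix.

    Each entry becomes exp(-M[x, y]**2 / (mu * eps[x, y])), where eps[x, y]
    averages the local scales of rows x and y with M[x, y] itself.
    """
    M = np.asarray(M, dtype=float)
    M = M / M.max()                              # normalize by the largest element
    n = M.shape[0]
    off = M + np.diag(np.full(n, np.inf))        # assumption: ignore self-differences
    U = np.sort(off, axis=1)[:, :K].mean(axis=1)  # mean of the K smallest entries per row
    eps = (U[:, None] + U[None, :] + M) / 3.0
    return np.exp(-(M ** 2) / (mu * eps))

# Toy example: pairwise Euclidean distances between 6 random 2D points
# stand in for one of the six difference matrices.
rng = np.random.default_rng(0)
pts = rng.normal(size=(6, 2))
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
W = difference_to_similarity(D, mu=0.8, K=3)
```

The local scale makes the kernel adaptive: a given difference counts as "small" relative to how close each subject is to its nearest neighbors, so dense and sparse regions of the population are treated comparably.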

2 Methods

Subjects and Image Acquisition. MR images for N = 677 term-born neonates were acquired on a Siemens head-only 3T scanner with a circular polarized head coil. Before scanning, neonates were fed, swaddled, and fitted with ear protection. All neonates were unsedated during scanning. T1-weighted MR images with 160 axial slices were obtained using the parameters: TR = 1,820 ms, TE = 4.38 ms, and resolution = 1 × 1 × 1 mm³. T2-weighted MR images with 70 axial slices were acquired with the parameters: TR = 7,380 ms, TE = 119 ms, and resolution = 1.25 × 1.25 × 1.95 mm³.

Cortical Surface Mapping. All neonatal MRIs were processed using an infant-dedicated pipeline [2]. Specifically, it contained the steps of rigid alignment between T2 and T1 MR images, skull-stripping, intensity inhomogeneity correction, tissue segmentation, topology correction, cortical surface reconstruction, spherical mapping, spherical registration onto an infant surface atlas, and cortical surface resampling [2]. All results were visually checked to ensure quality.

Sulcal Pits Extraction and Sulcal Graph Construction. To characterize the sulcal folding patterns in each individual, sulcal pits, the locally deepest points of sulci, were extracted on each cortical surface (Fig. 1) using the method in [8]. The motivation is that deep sulcal pits were relatively consistent across individuals and stable during brain development as reported in [6], and thus were well suited as reliable landmarks for characterizing sulcal folding. To extract sulcal pits, each cortical surface was


partitioned into small basins using a watershed method based on the sulcal depth map [11], and the deepest point of each basin was identified as a sulcal pit, after pruning noisy basins [8]. Then, a sulcal graph was constructed for each cortical surface as in [5]. Specifically, each sulcal pit was defined as a node, and two nodes were linked by an edge if their corresponding basins were spatially connected.

Sulcal Graph Comparison. To compare two sulcal graphs, their similarities were measured using multiple metrics from spatial, geometrical, and topological points of view, to capture the multiple aspects of sulcal graphs. Specifically, we computed six distinct metrics, using sulcal pit position $D$, sulcal pit depth $H$, sulcal basin area $S$, sulcal basin boundary $B$, sulcal pit local connection $C$, and ridge point depth $R$. Given $N$ sulcal graphs from $N$ subjects, any two of them were compared using the above six metrics, so an $N \times N$ matrix was constructed for each metric. The difference between two sulcal graphs can be measured by comparing the attributes of the corresponding sulcal pits in the two graphs. In general, the difference between any sulcal-pit-wise attribute of sulcal graphs $P$ and $Q$ can be computed as

$$\mathrm{Diff}(P, Q, \mathrm{diff}_X) = \frac{1}{2}\left(\frac{1}{V_P}\sum_{i \in P} \mathrm{diff}_X(i, Q) + \frac{1}{V_Q}\sum_{j \in Q} \mathrm{diff}_X(j, P)\right) \qquad (1)$$

where $V_P$ and $V_Q$ are respectively the numbers of sulcal pits in $P$ and $Q$, and $\mathrm{diff}_X(i, Q)$ is the difference of a specific attribute $X$ between sulcal pit $i$ and its corresponding sulcal pit in graph $Q$. Note that we treat the closest pit as the corresponding sulcal pit, as all cortical surfaces have been aligned to a spherical surface atlas.

(1) Sulcal Pit Position. Based on Eq. 1, the difference between $P$ and $Q$ in terms of sulcal pit positions is computed as $D(P, Q) = \mathrm{Diff}(P, Q, \mathrm{diff}_D)$, where $\mathrm{diff}_D(i, Q)$ is the geodesic distance between sulcal pit $i$ and its corresponding sulcal pit in $Q$ on the spherical surface atlas.

(2) Sulcal Pit Depth. For each subject, the sulcal depth map is normalized by dividing by the maximum depth value, to reduce the effect of brain size variation. The difference between $P$ and $Q$ in terms of sulcal pit depth is computed as $H(P, Q) = \mathrm{Diff}(P, Q, \mathrm{diff}_H)$, where $\mathrm{diff}_H(i, Q)$ is the depth difference between sulcal pit $i$ and its corresponding sulcal pit in $Q$.

(3) Sulcal Basin Area. To reduce the effect of surface area variation across subjects, the area of each basin is normalized by the area of the whole cortical surface. The difference between $P$ and $Q$ in terms of sulcal basin area is computed as $S(P, Q) = \mathrm{Diff}(P, Q, \mathrm{diff}_S)$, where $\mathrm{diff}_S(i, Q)$ is the area difference between the basins of sulcal pit $i$ and its corresponding sulcal pit in $Q$.

(4) Sulcal Basin Boundary. The difference between $P$ and $Q$ in terms of sulcal basin boundary is formulated as $B(P, Q) = \mathrm{Diff}(P, Q, \mathrm{diff}_B)$, where $\mathrm{diff}_B(i, Q)$ is the difference between the sulcal basin boundaries of sulcal pit $i$ and its corresponding sulcal pit in $Q$. Specifically, we define a vertex as a boundary vertex of a sulcal basin if one of its neighboring vertices belongs to a different basin. Given two corresponding sulcal pits $i \in P$ and $i' \in Q$, their sulcal basin boundary vertices are respectively denoted as $B_i$ and $B_{i'}$. For any boundary vertex $a \in B_i$, its closest vertex $a'$ is found from $B_{i'}$; and similarly, for any boundary vertex $b' \in B_{i'}$, its closest vertex $b$ is found from $B_i$. Then,


the difference between the basin boundaries of sulcal pit $i$ and its corresponding pit $i' \in Q$ is defined as:

$$\mathrm{diff}_B(i, Q) = \frac{1}{2}\left(\frac{1}{N_{B_i}}\sum_{a \in B_i,\, a' \in B_{i'}} \mathrm{dis}(a, a') + \frac{1}{N_{B_{i'}}}\sum_{b' \in B_{i'},\, b \in B_i} \mathrm{dis}(b', b)\right) \qquad (2)$$

where $N_{B_i}$ and $N_{B_{i'}}$ are respectively the numbers of vertices in $B_i$ and $B_{i'}$, and $\mathrm{dis}(\cdot, \cdot)$ is the geodesic distance between two vertices on the spherical surface atlas.

(5) Sulcal Pit Local Connection. The difference between the local connections of two graphs $P$ and $Q$ is computed as $C(P, Q) = \mathrm{Diff}(P, Q, \mathrm{diff}_C)$, where $\mathrm{diff}_C(i, Q)$ is the difference of local connection after mapping sulcal pit $i$ to graph $Q$. Specifically, for a sulcal pit $i$, assume $k$ is one of its connected sulcal pits. Their corresponding sulcal pits in graph $Q$ are respectively $i'$ and $k'$. The change of local connection after mapping sulcal pit $i$ to graph $Q$ is measured by:

$$\mathrm{diff}_C(i, Q) = \frac{1}{N_{G_i}}\sum_{k \in G_i} \left|\mathrm{dis}(i, k) - \mathrm{dis}(i', k')\right| \qquad (3)$$

where $G_i$ is the set of sulcal pits connecting to $i$, and $N_{G_i}$ is the number of pits in $G_i$.

(6) Ridge Point Depth. Ridge points are the locations where two sulcal basins meet. As suggested by [5], the depth of the ridge point is an important indicator for distinguishing sulcal patterns. Thus, we compute the difference between the average ridge point depths of sulcal graphs $P$ and $Q$ as:

$$R(P, Q) = \left|\frac{1}{E_P}\sum_{e \in P} r_e - \frac{1}{E_Q}\sum_{e \in Q} r_e\right| \qquad (4)$$

where $E_P$ and $E_Q$ are respectively the numbers of edges in $P$ and $Q$; $e$ is the edge connecting two sulcal pits; and $r_e$ is the normalized depth of the ridge point in the edge $e$.

Sulcal Graph Similarity Fusion. The above six metrics measured the inter-individual differences of sulcal graphs from different points of view, and each provided complementary information to the others. To capture both the common information and the complementary information, we employed a similarity network fusion (SNF) method [10] to adaptively integrate all six metrics together. To do this, each difference matrix was normalized by its maximum element, and then transformed into a similarity matrix as:

$$W_M(x, y) = \exp\left(-\frac{M^2(x, y)}{\mu \, \dfrac{U_x + U_y + M(x, y)}{3}}\right) \qquad (5)$$

where $\mu$ was a scaling parameter; $M$ could be any one of the above six matrices; $U_x$ and $U_y$ were respectively the average values of the smallest $K$ elements in the $x$-th row and $y$-th row of $M$. Finally, six similarity matrices $W_D$, $W_H$, $W_R$, $W_S$, $W_B$, and $W_C$ were fused


together as a single similarity matrix $W$ by using SNF with $t$ iterations. The parameters were set as $\mu = 0.8$, $K = 30$, and $t = 20$, as suggested in [10].

Sulcal Pattern Clustering. To cluster sulcal graphs into different groups based on the fused similarity matrix $W$, we employed the Affinity Propagation Clustering (APC) algorithm [12], which could automatically determine the number of clusters based on the natural characteristics of data. However, since sulcal folding patterns were extremely variable across individuals, too many clusters were identified after performing APC, making it difficult to observe the most important major patterns. Therefore, we proposed a hierarchical APC framework to further group the clusters. Specifically, after running APC, (1) the exemplars of all clusters were used to perform a new-level APC, so fewer clusters were generated. Since the old clusters were merged, the old exemplars may no longer be representative of the new clusters. Thus, (2) a new exemplar was selected for each cluster based on the maximal average similarity to all the other samples in the cluster. We repeated these steps until the cluster number reduced to an expected level (

$l$ and $m \ll (n \times I)$) and a sparse coefficient weight matrix $\alpha^{w_j} \in \mathbb{R}^{m \times (n \times I)}$ using an effective online dictionary learning algorithm [14]. In brief, an empirical cost function considering the average loss of regression to $n \times I$ temporal segments is defined as

$$f_{n \times I}(D^{w_j}) \triangleq \frac{1}{n \times I}\sum_{k=1}^{n \times I} \min_{\alpha_k^{w_j} \in \mathbb{R}^m} \frac{1}{2}\left\|x_k^{w_j} - D^{w_j}\alpha_k^{w_j}\right\|_2^2 + \lambda\left\|\alpha_k^{w_j}\right\|_1 \qquad (2)$$

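where the $\ell_1$-norm regularization and $\lambda$ are adopted to trade off the regression residual and the sparsity level of $\alpha_k^{w_j}$, and $x_k^{w_j}$ is the $k$-th column of $X^{w_j}$. To make the coefficients in $\alpha^{w_j}$ comparable, we also have a constraint for the $k$-th column $d_k^{w_j}$ of $D^{w_j}$, as defined in Eq. (3). The whole problem is then rewritten as a matrix factorization problem in Eq. (4) and solved by [14] to obtain $D^{w_j}$ and $\alpha^{w_j}$.

$$C \triangleq \left\{D^{w_j} \in \mathbb{R}^{l \times m} \ \ \text{s.t.}\ \ \forall k = 1, \ldots, m,\ (d_k^{w_j})^T d_k^{w_j} \leq 1\right\} \qquad (3)$$

$$\min_{D^{w_j} \in C,\ \alpha^{w_j} \in \mathbb{R}^{m \times (n \times I)}} \frac{1}{2}\left\|x_k^{w_j} - D^{w_j}\alpha_k^{w_j}\right\|_2^2 + \lambda\left\|\alpha_k^{w_j}\right\|_1 \qquad (4)$$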
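The constrained factorization in Eqs. (2)–(4) can be illustrated with a small NumPy sketch. This is not the online algorithm of [14]; it is a simple batch alternation, assumed here for illustration, between ISTA sparse coding and a projected gradient step on the dictionary. It minimizes the column-wise loss summed over all columns of the data matrix. The function names, toy sizes, and iteration counts are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of the l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_code(X, D, lam, n_iter=100):
    """ISTA for min_A 0.5*||X - D A||_F^2 + lam*||A||_1 with D fixed."""
    L = np.linalg.norm(D, 2) ** 2 + 1e-12        # Lipschitz constant of the gradient
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        A = soft_threshold(A - D.T @ (D @ A - X) / L, lam / L)
    return A

def project_columns(D):
    """Enforce d_k^T d_k <= 1 by rescaling columns whose norm exceeds 1."""
    return D / np.maximum(np.linalg.norm(D, axis=0), 1.0)

def learn_dictionary(X, m, lam, n_outer=20, seed=0):
    """Alternate sparse coding and one projected gradient step on D."""
    rng = np.random.default_rng(seed)
    D = project_columns(rng.normal(size=(X.shape[0], m)))
    for _ in range(n_outer):
        A = sparse_code(X, D, lam)
        step = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1e-12)
        D = project_columns(D - step * (D @ A - X) @ A.T)
    return D, sparse_code(X, D, lam)

def objective(X, D, A, lam):
    """Summed column-wise loss: 0.5*||X - D A||_F^2 + lam*||A||_1."""
    return 0.5 * np.sum((X - D @ A) ** 2) + lam * np.abs(A).sum()

# Toy data: window length l = 20, 200 concatenated signals, m = 30 atoms (m > l).
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 200))
D, A = learn_dictionary(X, m=30, lam=0.1)
```

Each column of `D` plays the role of a temporal pattern, and the matching row of `A` holds its coefficients across the concatenated signals. The unit-norm constraint on the atoms is what keeps the coefficient magnitudes comparable.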

Since the dictionary learning and sparse representation maintain the organization of all temporal segments and subjects in $X^{w_j}$, the obtained $\alpha^{w_j}$ also preserves the spatial information of temporal segments across $I$ subjects. We therefore decompose $\alpha^{w_j}$ into $I$ sub-matrices $\alpha_1^{w_j}, \ldots, \alpha_I^{w_j} \in \mathbb{R}^{m \times n}$ corresponding to the $I$ subjects (Fig. 2a). The element $(r, s)$ in each sub-matrix represents the coefficient value of the $s$-th grayordinate with respect to the $r$-th dictionary atom in $D^{w_j}$ for each subject. In order to obtain a common sparse coefficient weight matrix across $I$ subjects, we perform a t-test of the null hypothesis for $(r, s)$ across $I$ subjects (p-value < 0.05), similarly to [15], to obtain the p-value matrix $p^{w_j} \in \mathbb{R}^{m \times n}$ (Fig. 2b), in which element $(r, s)$ represents the statistical coefficient value of the $s$-th grayordinate with respect to the $r$-th dictionary atom across all $I$ subjects. $p^{w_j}$ is thus the common sparse coefficient weight matrix. From a brain science perspective, $d_k^{w_j}$ (the $k$-th column of $D^{w_j}$) represents the temporal pattern of a specific group-wise consistent functional network, and its corresponding coefficient vector $p_k^{w_j}$ (the $k$-th row of $p^{w_j}$) can be mapped back to the cortical surface (color-coded by z-score transformed from


p-value) (Fig. 2b) to represent the spatial pattern of the network. We then identify those meaningful group-wise consistent functional networks from $p_k^{w_j}$ ($k = 1, \ldots, m$), similarly to [10]. Specifically, the GLM-derived activation maps and the intrinsic network templates provided in [16] are adopted as the network templates. The network from $p_k^{w_j}$ with the highest spatial pattern similarity with a specific network template (defined as $J(S, T) = |S \cap T| / |T|$, where $S$ and $T$ are the spatial patterns of a specific network and a template, respectively) is identified as a group-wise consistent functional brain network at $w_j$. Once we identify all group-wise consistent functional brain networks at $w_j$, the SOPFN at $w_j$ is defined as the set of all common cortical vertices $g_i$ ($i = 1, \ldots, 64984$) involved in the spatial patterns of all identified functional networks [9, 10]:

$$V_{w_j} = \left\{\forall g_i \ \ \text{s.t.}\ \ g_i \text{ belongs to all networks at } w_j\right\} \qquad (5)$$

2.4 Temporal Dynamics Assessment of SOPFN Distribution on Gyri/Sulci

Based on the identified SOPFN at time window $w_j$ in Eq. (5), we assess the SOPFN distribution on cortical gyral/sulcal regions at $w_j$. Denote the principal curvature value of cortical vertex $g_i$ ($i = 1, \ldots, 64984$), which is provided in HCP data [11], as

$$pcurv_{g_i} \begin{cases} \geq 0, & g_i \in \text{gyri} \\ < 0, & g_i \in \text{sulci} \end{cases}$$

The SOPFN $V_{w_j}$ on gyral and sulcal regions is then represented as $V_{w_j}|_{gyri} = \{\forall g_i \ \text{s.t.}\ g_i \in V_{w_j},\ pcurv_{g_i} \geq 0\}$ and $V_{w_j}|_{sulci} = \{\forall g_i \ \text{s.t.}\ g_i \in V_{w_j},\ pcurv_{g_i} < 0\}$, respectively. Note that $V_{w_j} = V_{w_j}|_{gyri} + V_{w_j}|_{sulci}$. We further define the SOPFN distribution percentage at $w_j$ as $P_{w_j}|_{gyri} = \left|V_{w_j}|_{gyri}\right| / \left|V_{w_j}\right|$ for gyri and $P_{w_j}|_{sulci} = \left|V_{w_j}|_{sulci}\right| / \left|V_{w_j}\right|$ for sulci, where $|\cdot|$ denotes the number of members of a set and $P_{w_j}|_{gyri} + P_{w_j}|_{sulci} = 1$. Finally, to assess the temporal dynamics of SOPFN distribution on gyral/sulcal regions, we define $P_{gyri} = [P_{w_1}|_{gyri}, P_{w_2}|_{gyri}, \ldots, P_{w_j}|_{gyri}]$ as a $(t - l + 1)$-dimensional feature vector representing the dynamics of SOPFN distribution percentage across all $(t - l + 1)$ time windows on gyri. Similarly, we define $P_{sulci} = [P_{w_1}|_{sulci}, P_{w_2}|_{sulci}, \ldots, P_{w_j}|_{sulci}]$ for sulci.
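The set operations behind the overlap rate $J(S, T)$, the SOPFN of Eq. (5), and the gyral/sulcal percentages can be sketched for a single time window as follows. The sketch assumes that each network is available as a boolean mask over cortical vertices and that the principal curvature is given as one value per vertex. The function names and the eight-vertex toy data are hypothetical.

```python
import numpy as np

def overlap_rate(S, T):
    """J(S, T) = |S ∩ T| / |T| for boolean maps S (network) and T (template)."""
    return np.logical_and(S, T).sum() / T.sum()

def sopfn(network_masks):
    """Vertices that belong to every identified network in one time window."""
    return np.logical_and.reduce(network_masks, axis=0)

def gyral_sulcal_percentage(v, pcurv):
    """Return (P_gyri, P_sulci) for SOPFN mask v; assumes v is not empty."""
    p_gyri = np.logical_and(v, pcurv >= 0).sum() / v.sum()
    return p_gyri, 1.0 - p_gyri

# Toy window: 3 networks over 8 vertices (the real surface has 64984 vertices).
masks = np.array([
    [1, 1, 1, 1, 0, 1, 1, 0],
    [1, 1, 0, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 0, 1, 1],
], dtype=bool)
pcurv = np.array([0.2, -0.1, 0.3, 0.5, -0.4, 0.1, 0.0, -0.2])

v = sopfn(masks)                                   # vertices 0, 1, 3 and 6
p_gyri, p_sulci = gyral_sulcal_percentage(v, pcurv)
j = overlap_rate(masks[0], masks[1])
```

Repeating the last two steps for every sliding window and stacking the results would give the vectors $P_{gyri}$ and $P_{sulci}$ described above.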

3 Experimental Results

For each of the seven tfMRI datasets, we equally divided all 64 subjects into two groups (32 each) for reproducibility studies. The window length $l$ was experimentally determined ($l = 20$) using a method similar to that in [8]. The values of $m$ and $\lambda$ in Eq. (4) were experimentally determined ($m = 50$ and $\lambda = 1.5$) using a method similar to that in [13].

3.1 Group-Wise Consistent Functional Networks Within Different Time Windows

We successfully identified group-wise consistent functional networks within different time windows based on the methods in Sect. 2.3. Figure 3 shows the spatial maps of two example functional networks identified within different time windows in one subject group of emotion tfMRI data. We can see that for each of the two networks (Fig. 3b–c), although the overall spatial pattern is similar, there is considerable variability of the spatial pattern across different time windows compared with the network template. Quantitatively, the mean spatial pattern similarity J defined in Sect. 2.3 is 0.69 ± 0.10 and 0.36 ± 0.06 for the two networks, respectively. This finding is consistent between the two subject groups for all seven tfMRI datasets. The spatial pattern variability of the same functional network across different time windows is in agreement with the argument that there is different involvement of specific brain regions in the corresponding networks across different time windows [7].

Fig. 3. Two example group-wise consistent functional networks within different time windows in one subject group of emotion tfMRI data. (a) Task design curves across time windows (TW) of emotion tfMRI data. 12 example TWs are indexed. Three different TW types are divided by black dashed lines and labeled. TW type #1 involves task design 1, TW type #2 involves task design 2, and TW type #3 involves both two task designs. The spatial patterns of (b) one example task-evoked functional network and (c) one example intrinsic connectivity network (ICN) within the 12 example TWs are shown.

3.2 Temporal Dynamics Difference of SOPFN Distribution on Gyri/Sulci

We identified the SOPFN based on all identified functional networks using Eq. (5) and further assessed the SOPFN distribution on cortical gyral/sulcal regions within each time window. Figure 4 shows the mean SOPFN distribution on gyri and sulci across


Fig. 4. The mean SOPFN distribution on gyral (G) and sulcal (S) regions across different time window types in the two subject groups of emotion tfMRI data. The common regions with higher density are highlighted by red arrows. The two example surfaces illustrate the gyri/sulci and are color-coded by the principal curvature value.

different TW types in emotion tfMRI data as an example. We can see that, despite certain common regions (with relatively higher density, as highlighted by red arrows), there is considerable SOPFN distribution variability between gyral and sulcal regions across different time windows. Quantitatively, the distribution percentage on gyral regions is statistically larger than that on sulcal regions across all time windows using a two-sample t-test (p < 0.05) for all seven tfMRI datasets, as reported in Table 1.

Table 1. The mean ratio of SOPFN distribution percentage on gyri vs. that on sulci across all time windows in the two subject groups of seven tfMRI datasets.

         Emotion  Gambling  Language  Motor  Relational  Social  WM
Group 1  1.47     1.60      1.46      1.32   1.59        1.46    1.67
Group 2  1.45     1.55      1.38      1.33   1.49        1.47    1.66

Finally, we calculated and visualized Pgyri and Psulci representing the dynamics of SOPFN distribution percentage across all time windows on gyri and sulci, respectively in Fig. 5. It is interesting that there are considerable peaks/valleys for the distribution percentage on gyri/sulci which are coincident with the specific task designs across the entire scan, indicating the temporal dynamics difference of SOPFN distribution between gyral and sulcal regions. These results indicate that gyri might participate


Fig. 5. The temporal dynamics of SOPFN distribution percentage on gyri (green curve) and sulci (yellow curve) across all time windows in the seven tfMRI datasets shown in (a)–(g), respectively. The task design curves in each sub-figure are represented by different colors. Y-axis represents the percentage value (*100 %).

more in those spatially overlapped and interacting concurrent functional networks (neural processes) than sulci under temporal dynamics. It should be noted that the identified temporal dynamics difference of SOPFN distribution between gyral and sulcal regions (Fig. 5) is reasonably reproducible between the two subject groups and across all seven high-resolution tfMRI datasets. Given the lack of ground truth in brain mapping, the reproducibility of our results is unlikely due to systematic artifact and thus should be a reasonable verification of the meaningfulness of the results.

4 Discussion and Conclusion

We proposed a novel computational framework to model the functional dynamics of cortical gyri and sulci. Experimental results based on 64 HCP subjects and their 7 tfMRI datasets demonstrated that a meaningful temporal dynamics difference of SOPFN distribution between cortical gyral and sulcal regions was identified. Our results provide a novel understanding of brain functional dynamics mechanisms. In the future, we will investigate other potential sources of the differences observed in the results. We will also apply the proposed framework to resting-state fMRI data and more tfMRI datasets, e.g., the recent 900 subjects’ tfMRI data released by HCP, to further reproduce and validate the findings.


References

1. Rakic, P.: Specification of cerebral cortical areas. Science 241, 170–176 (1988)
2. Nie, J., et al.: Axonal fiber terminations concentrate on gyri. Cereb. Cortex 22(12), 2831–2839 (2012)
3. Chen, H., et al.: Coevolution of gyral folding and structural connection patterns in primate brains. Cereb. Cortex 23(5), 1208–1217 (2013)
4. Takahashi, E., et al.: Emerging cerebral connectivity in the human fetal brain: an MR tractography study. Cereb. Cortex 22(2), 455–464 (2012)
5. Deng, F., et al.: A functional model of cortical gyri and sulci. Brain Struct. Funct. 219(4), 1473–1491 (2014)
6. Jiang, X., et al.: Sparse representation of HCP grayordinate data reveals novel functional architecture of cerebral cortex. Hum. Brain Mapp. 36(12), 5301–5319 (2015)
7. Gilbert, C.D., Sigman, M.: Brain states: top-down influences in sensory processing. Neuron 54(5), 677–696 (2007)
8. Li, X., et al.: Dynamic functional connectomics signatures for characterization and differentiation of PTSD patients. Hum. Brain Mapp. 35(4), 1761–1778 (2014)
9. Duncan, J.: The multiple-demand (MD) system of the primate brain: mental programs for intelligent behaviour. Trends Cogn. Sci. 14(4), 172–179 (2010)
10. Lv, J.: Sparse representation of whole-brain fMRI signals for identification of functional networks. Med. Image Anal. 20(1), 112–134 (2015)
11. Glasser, M.F., et al.: The minimal preprocessing pipelines for the Human Connectome Project. Neuroimage 80, 105–124 (2013)
12. Lee, K., et al.: A data-driven sparse GLM for fMRI analysis using sparse dictionary learning with MDL criterion. IEEE Trans. Med. Imaging 30(5), 1076–1089 (2011)
13. Lv, J., et al.: Holistic atlases of functional networks and interactions reveal reciprocal organizational architecture of cortical function. IEEE TBME 62(4), 1120–1131 (2015)
14. Mairal, J., et al.: Online learning for matrix factorization and sparse coding. J. Mach. Learn. Res. 11, 19–60 (2010)
15. Lv, J., et al.: Assessing effects of prenatal alcohol exposure using group-wise sparse representation of fMRI data. Psychiatry Res. 233, 254–268 (2015)
16. Smith, S.M., et al.: Correspondence of the brain’s functional architecture during activation and rest. Proc. Natl. Acad. Sci. U.S.A. 106(31), 13040–13045 (2009)

A Multi-stage Sparse Coding Framework to Explore the Effects of Prenatal Alcohol Exposure

Shijie Zhao1, Junwei Han1(&), Jinglei Lv1,2, Xi Jiang2, Xintao Hu1, Shu Zhang2, Mary Ellen Lynch3, Claire Coles3, Lei Guo1, Xiaoping Hu3, and Tianming Liu2

1 School of Automation, Northwestern Polytechnical University, Xi’an, China [email protected]
2 Cortical Architecture Imaging and Discovery, Department of Computer Science and Bioimaging Research Center, The University of Georgia, Athens, GA, USA
3 Emory University, Atlanta, GA, USA

Abstract. In clinical neuroscience, task-based fMRI (tfMRI) is a popular method to explore the brain network activation difference between healthy controls and brain diseases like Prenatal Alcohol Exposure (PAE). Traditionally, most studies adopt the general linear model (GLM) to detect task-evoked activations. However, GLM has been demonstrated to be limited in reconstructing concurrent heterogeneous networks. In contrast, sparse representation based methods have attracted increasing attention due to the capability of automatically reconstructing concurrent brain activities. However, this data-driven strategy is still challenged in establishing accurate correspondence across individuals and characterizing group-wise consistent activation maps in a principled way. In this paper, we propose a novel multi-stage sparse coding framework to identify group-wise consistent networks in a structured method. By applying this novel framework on two groups of tfMRI data (healthy control and PAE), we can effectively identify group-wise consistent activation maps and characterize brain networks/regions affected by PAE.

Keywords: Task-based fMRI · Dictionary learning · Sparse coding · PAE

© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 28–36, 2016. DOI: 10.1007/978-3-319-46720-7_4

1 Introduction

TfMRI has been widely used in clinical neuroscience to understand functional brain disorders [1]. Among state-of-the-art tfMRI analysis methodologies, the general linear model (GLM) is the most popular approach for detecting functional networks under specific task performance [2]. The basic idea underlying GLM is that task-evoked brain activities can be discovered by subtracting the activity of a control condition [3, 4]. In common practice, experimental and control trials are performed several times and fMRI signals are averaged to increase the signal-to-noise ratio [3]. Thus task-dominant brain activities are greatly enhanced, while other subtle and concurrent activities are largely overlooked. An alternative approach is independent component analysis


(ICA) [5]. However, the theoretical foundation of ICA-based methods has been challenged in recent studies [6]. Therefore, more advanced tfMRI activation detection methods are still needed. Recently, dictionary learning and sparse representation methods have been adopted for fMRI data analysis [6, 7] and have attracted considerable attention. The basic idea is to factorize the fMRI signal matrix into an over-complete dictionary of basis signals and a coefficient matrix via dictionary learning algorithms [8]. Specifically, each dictionary atom represents the functional activity of a specific brain network, and its corresponding coefficient vector represents the spatial distribution of this brain network [7]. The decomposed coefficient matrix thus naturally reveals the spatial patterns of the inferred brain networks. This strategy naturally accounts for the various brain networks that might be involved in concurrent functional processes [9, 10]. However, a notable challenge for the current data-driven strategy is how to establish accurate network correspondence across individuals and characterize group-wise consistent activation maps in a structured way. Since each dictionary is learned in a data-driven way, it is hard to establish correspondence across subjects. To address this challenge, we propose a novel multi-stage sparse coding framework to identify diverse group-consistent brain activities and characterize subtle cross-group differences under specific task conditions. Specifically, we first concatenate all fMRI datasets temporally and adopt a dictionary learning method to identify group-level activation maps across all subjects. After that, we constrain spatial/temporal features in the dictionary learning procedure to identify individualized temporal and spatial patterns from individual fMRI data. These constrained features naturally preserve the correspondence across different subjects. Finally, a statistical mapping method is adopted to identify group-wise consistent maps. By applying the proposed framework to two groups of tfMRI data (healthy control and PAE groups), we successfully identified diverse group-wise consistent brain networks for each group, as well as specific brain networks/regions that are affected by PAE under an arithmetic task.

2 Materials and Methods

2.1 Overview

Figure 1 summarizes the computational pipeline of the multi-stage sparse coding framework. There are four major steps. First, we concatenate all the subjects’ datasets temporally to form a concatenated time × voxels data matrix (Fig. 1a) and employ dictionary learning and sparse coding algorithms [8] to identify the group-level activation maps in the population. Then, for each subject’s fMRI data, we adopt a supervised dictionary learning method that constrains the group-level spatial patterns to learn an individualized dictionary (Fig. 1b). These individualized dictionaries are learned from individual data, so inter-subject variability is better preserved. After that, for each subject, supervised dictionary learning that constrains the individualized dictionary is adopted to learn an individualized coefficient matrix (Fig. 1c). In this way, the individualized spatial maps are reconstructed. Finally, based on the correspondence established in our method, a statistical coefficient mapping method is adopted to characterize the group-consistent activation maps for each group (Fig. 1d). Therefore, the correspondence between different subjects is preserved throughout the whole procedure and the group-consistent activation maps are identified in a structured way.

Fig. 1. The computational framework of the proposed method. (a) Concatenated sparse coding. t is the number of time points, n is the number of voxels, and k is the number of dictionary atoms. (b) Supervised dictionary learning with spatial maps fixed. (c) Supervised dictionary learning with temporal features fixed. (d) Statistical mapping to identify group-wise consistent maps for each group.

2.2 Data Acquisition and Pre-processing

Thirty subjects participated in the arithmetic task-based fMRI experiment under IRB approval [11]. They are young adults aged 20–26 from two groups: unexposed healthy controls (16 subjects) and PAE-affected subjects (14 subjects). Two participants from the healthy control group were excluded due to poor data quality. All participants were scanned in a 3T Siemens Trio scanner, and 10 task blocks alternated between a letter-matching control task and a subtraction arithmetic task. The acquisition parameters are as follows: TR = 3 s, TE = 32 ms, FA = 90°, resolution 3.44 mm × 3.44 mm × 3 mm, and dimension 64 × 64 × 34. Preprocessing was performed in FSL [12] and included motion correction, slice timing correction, spatial smoothing, and global drift removal. The processed volumes were then registered to the standard space (MNI 152) for further analysis.

2.3 Dictionary Learning and Sparse Representation

Given the fMRI signal matrix $S \in \mathbb{R}^{L \times n}$, where $L$ is the number of fMRI time points and $n$ is the number of voxels, dictionary learning and sparse representation methods aim to represent each signal in $S$ as a sparse linear combination of the atoms of a dictionary $D$, with the coefficients collected in the matrix $A$, i.e., $S = DA$. The empirical cost function is defined as

$$f_n(D) \triangleq \frac{1}{n} \sum_{i=1}^{n} \ell(s_i, D) \qquad (1)$$

where $D$ is the dictionary, $\ell$ is the loss function, $n$ is the number of voxels, and $s_i$ is a training sample which represents the time course of a voxel. The problem of minimizing the empirical cost can be rewritten as a matrix factorization problem with a sparsity penalty:

$$\min_{D \in C,\; A \in \mathbb{R}^{k \times n}} \frac{1}{2} \|S - DA\|_2^2 + \lambda \|A\|_{1,1} \qquad (2)$$

where $\lambda$ is a sparsity regularization parameter, $k$ is the number of dictionary atoms, and $C$ is the constraint set that prevents $D$ from having arbitrarily large values. To solve this problem, we adopt the online dictionary learning and sparse coding method [8]; the algorithm pipeline is summarized in Algorithm 1.
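As an illustration of the factorization in Eq. (2), the following minimal sketch alternates between a sparse coding step and a dictionary step. It is not the online algorithm of [8]: it is a plain batch scheme (ISTA on A, one projected gradient step on D), and the function names, the unit-norm constraint used for the set C, and the toy parameters are all assumptions made for this example.

```python
import numpy as np

def soft_threshold(x, thr):
    """Element-wise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - thr, 0.0)

def objective(S, D, A, lam):
    """Cost of Eq. (2): 0.5 * ||S - DA||^2 + lam * ||A||_{1,1}."""
    return 0.5 * np.sum((S - D @ A) ** 2) + lam * np.sum(np.abs(A))

def sparse_code(S, D, A, lam, n_iter=50):
    """ISTA iterations on A with the dictionary D held fixed."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 + 1e-12)
    for _ in range(n_iter):
        grad = D.T @ (D @ A - S)
        A = soft_threshold(A - step * grad, step * lam)
    return A

def update_dictionary(S, D, A):
    """One projected gradient step on D with A held fixed.
    Assumption: C is the set of dictionaries whose columns have l2 norm <= 1."""
    step = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1e-12)
    D = D - step * (D @ A - S) @ A.T
    return D / np.maximum(np.linalg.norm(D, axis=0), 1.0)

def learn_dictionary(S, k, lam, n_outer=20, seed=0):
    """Batch alternating minimization of Eq. (2) for S of shape (L, n)."""
    rng = np.random.default_rng(seed)
    L, n = S.shape
    D = rng.standard_normal((L, k))
    D /= np.linalg.norm(D, axis=0)
    A = np.zeros((k, n))
    for _ in range(n_outer):
        A = sparse_code(S, D, A, lam)
        D = update_dictionary(S, D, A)
    return D, A
```

For the concatenated stage of Fig. 1a, S would be the temporally concatenated matrix of all subjects; each row of the returned A is then one candidate spatial map.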

2.4 Constrain Spatial Maps in Dictionary Learning

In this section, we adjust the dictionary learning procedure to constrain the spatial maps and thereby learn the individualized dictionary. Similar to GLM, we call each identified network an activation map. First, each group-level activation map is converted into a binary vector by thresholding, which gives the matrix $V \in \{0, 1\}^{k \times n}$. Since both A and S share the same number of voxels, they have similar structures. We use the vectors in $V$ as constraints when updating the coefficient matrix. Specifically, if a coefficient matrix element is zero at a location where the constraint matrix is one, this element is replaced with 0.1 (any other small nonzero value is acceptable) to keep it ‘active’. In other words, the coefficient matrix is updated as usual, except that the constrained elements are kept ‘active’ (nonzero). The coefficient matrix updating procedure can be written as follows.

$$A_i \triangleq \arg\min_{A_i \in \mathbb{R}^m} \frac{1}{2} \|s_i - D^{(t-1)} A_i\|_2^2 + \lambda \|A_i\|_1, \qquad A_i^p = 0.1 \ \text{ if } \ A_i^p = 0 \ \&\& \ V(i, p) = 1 \qquad (5)$$

2.5 Constrain Temporal Features in Dictionary Learning

In this stage, the dictionary is fully fixed, so the learning problem reduces to a simple regression problem. Specifically, this dictionary learning and sparse representation problem leads to the following formulation:

$$\min_{A \in \mathbb{R}^{k \times n}} \frac{1}{2} \|S - D_c A\|_2^2 + \lambda \|A\|_{1,1} \qquad (6)$$

where $D_c$ is the fixed individualized dictionary, $k$ is the number of dictionary atoms, and $A$ is the coefficient matrix learned from each individual's fMRI data with the individualized dictionary held fixed.
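The two constrained steps of Sects. 2.4 and 2.5 can be sketched as follows. This is an illustrative reading, not the authors' implementation: `keep_active` applies the replacement rule of Eq. (5) to a whole coefficient matrix (indexing V as atoms × voxels is an assumption), and `lasso_ista` solves the fixed-dictionary problem of Sect. 2.5 with plain ISTA. Both function names are hypothetical.

```python
import numpy as np

def soft_threshold(x, thr):
    """Element-wise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - thr, 0.0)

def keep_active(A, V, value=0.1):
    """Rule of Eq. (5): where the binary constraint V is 1 but the coefficient
    is exactly 0, set the coefficient to a small nonzero value."""
    A = A.copy()
    A[(A == 0) & (V == 1)] = value
    return A

def lasso_ista(S, D_c, lam, n_iter=200):
    """Sparse coding with a fully fixed dictionary D_c (Sect. 2.5):
    minimize 0.5 * ||S - D_c A||^2 + lam * ||A||_{1,1} over A."""
    A = np.zeros((D_c.shape[1], S.shape[1]))
    step = 1.0 / (np.linalg.norm(D_c, 2) ** 2 + 1e-12)
    for _ in range(n_iter):
        grad = D_c.T @ (D_c @ A - S)
        A = soft_threshold(A - step * grad, step * lam)
    return A
```

In the spatially constrained stage, `keep_active` would be called after each coefficient update and before the dictionary update, so that the atoms tied to the group-level maps keep contributing to the individualized dictionary.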

2.6 Statistical Mapping

With the help of the constrained features in the dictionary learning procedure, the correspondences of spatial activation maps between different subjects are naturally preserved. To reconstruct accurate consistency maps for the different groups, we take as the null hypothesis that each element of the coefficient matrix is zero across the group, and a standard t-test is carried out to test this hypothesis. Specifically,

$$T(i, j) = \frac{\bar{A}_{G_x}(i, j)}{\sqrt{\mathrm{var}(A_{G_x}(i, j))}} \qquad (7)$$

where $\bar{A}_{G_x}(i, j)$ represents the average value of element $(i, j)$ over the subjects in group $x$, and $x$ denotes the patient group or the control group. The t-test acceptance threshold is set to p < 0.05. The derived T-value is further transformed to the standard z-score. In this way, each group yields a group-consistent Z statistic map, and each row of Z can be mapped back to the brain volume, representing the spatial distribution of the dictionary atom.
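A minimal sketch of the element-wise statistic is given below. It computes the ratio exactly as written in Eq. (7) and, for comparison, the conventional one-sample t-statistic, which additionally scales by the square root of the group size; whether the paper intends that scaling is not stated, so both are returned. The function name and the (subjects, atoms, voxels) array layout are assumptions, and the conversion to p-values and z-scores is omitted.

```python
import numpy as np

def group_t_map(A_group):
    """A_group has shape (n_subjects, k, n): one coefficient matrix per subject.
    Returns (ratio, t_stat), each of shape (k, n).
    ratio  : mean / sqrt(var), as written in Eq. (7)
    t_stat : ratio * sqrt(n_subjects), the usual one-sample t-statistic
    """
    m = A_group.shape[0]
    mean = A_group.mean(axis=0)
    var = A_group.var(axis=0, ddof=1)  # unbiased sample variance
    with np.errstate(divide="ignore", invalid="ignore"):
        # Elements with zero variance across subjects are mapped to 0.
        ratio = np.where(var > 0, mean / np.sqrt(var), 0.0)
    return ratio, ratio * np.sqrt(m)
```

Because the constrained learning stages keep atom order fixed across subjects, stacking the individual coefficient matrices along the first axis is all that is needed before calling the function.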

3 Experimental Results

The proposed framework was applied to two groups of tfMRI data: unexposed healthy controls and PAE patients. In each stage, the dictionary size is 300, the sparsity is around 0.05, and the optimization method is stochastic approximation. Briefly, we identified 263 meaningful networks in the concatenated sparse coding stage, and 22 of them were affected by PAE. The detailed experimental results are reported as follows.

3.1 Identified Group-Level Activation Maps by Concatenated Sparse Coding

Figure 2 shows a few examples of the group-level activation maps identified by concatenated sparse coding (Fig. 1a). From these figures, we can see that both the GLM activation map and common resting state networks [13] are identified, which indicates that sparse coding based methods are powerful in identifying diverse and concurrent brain activities. The quantitative measurement is shown in Table 1. The spatial similarity is defined as:

$$R(X, T) = \frac{|X \cap T|}{|T|} \qquad (8)$$

where $X$ is the learned spatial network from $A_l$ and $T$ is the RSN template.
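The overlap rate of Eq. (8) is straightforward to compute on binarized maps. The sketch below assumes that both the learned network and the template have already been thresholded to binary masks on the same voxel grid; the function name is hypothetical.

```python
import numpy as np

def overlap_rate(X, T):
    """Spatial overlap rate of Eq. (8): |X intersect T| / |T|.
    X and T are binary masks (any array-like of 0/1 or bool) of equal shape."""
    X = np.asarray(X, dtype=bool)
    T = np.asarray(T, dtype=bool)
    n_template = T.sum()
    if n_template == 0:
        raise ValueError("template mask is empty")
    return float(np.logical_and(X, T).sum() / n_template)
```

Note that the measure is asymmetric: it is normalized by the template size only, so a learned network that covers the template and extends beyond it still scores 1.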

Fig. 2. Examples of meaningful networks identified by concatenated sparse coding. The first row is the template name and the second row is the template spatial map. The third row is the corresponding component network number in concatenated sparse coding. The last row is the corresponding spatial map in concatenated sparse coding. RSN denotes the common resting state networks in [13], and the GLM result is computed with the FSL FEAT software.

Table 1. The spatial overlap rates between the identified networks and the corresponding GLM activation map and resting state templates.

Template      GLM    RSN#1  RSN#2  RSN#3  RSN#4  RSN#5  RSN#6  RSN#7  RSN#8  RSN#9
Overlap rate  0.47   0.45   0.57   0.45   0.37   0.29   0.36   0.48   0.34   0.34

3.2 Learned Individualized Temporal Patterns

After concatenated sparse coding, in order to better account for inter-subject activation variability, we constrained the identified spatial patterns in the dictionary learning procedure and learned individualized temporal patterns for each subject (the method is detailed in Sect. 2.4). Figure 3 shows two typical kinds of learned individualized temporal patterns and the correlation matrices between different subjects. Specifically, Fig. 3a shows the temporal patterns learned by constraining the task-evoked group activation map (Network #175 in Fig. 2). The red line is the task design paradigm convolved with the hemodynamic response function. The individualized temporal patterns learned by constraining the task-evoked activation map are quite consistent, and their average is similar to the task paradigm regressor. The correlation matrix between subjects in the healthy control group is visualized on the right of Fig. 3a; the average value is as high as 0.5. Another kind of dictionary pattern is learned by constraining resting state networks. Figure 3b shows the learned temporal patterns and the correlation matrix between healthy control subjects when constraining a resting state network (#152 in Fig. 2). The temporal patterns differ considerably among subjects, and the average correlation value is as low as 0.15. From these results, we can see that the learned individualized temporal patterns are reasonable according to current neuroscience knowledge, and that the subtle differences in temporal activation patterns among subjects under the same task condition are recognized by the proposed framework (Fig. 4).

Fig. 3. Identified individualized temporal patterns and correlation matrix between different subjects. (a) Individualized temporal patterns identified by constraining the same task-evoked activation map (identified in concatenated sparse coding) in the dictionary learning procedure. The red line is the task paradigm pattern and the other lines are the individualized temporal activity patterns derived from healthy control subjects for the same task-evoked activation map. The right figure is the correlation matrix between different subjects. (b) Individualized temporal patterns identified by constraining a resting state activation map (identified in concatenated sparse coding).
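The inter-subject consistency reported above (average correlation of about 0.5 for the task-evoked atom and about 0.15 for the resting state atom) can be computed along the following lines. This is a sketch under the assumption that the average is taken over distinct subject pairs, excluding the diagonal; the text does not specify this, and the function name is hypothetical.

```python
import numpy as np

def mean_pairwise_correlation(patterns):
    """patterns has shape (n_subjects, n_timepoints): the same dictionary atom
    taken from each subject's individualized dictionary.
    Returns the subject-by-subject Pearson correlation matrix and the mean of
    its upper triangle (diagonal excluded)."""
    C = np.corrcoef(patterns)
    upper = np.triu_indices_from(C, k=1)
    return C, float(C[upper].mean())
```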

Fig. 4. Examples of identified group-wise activation maps in different groups. (a) and (b) are organized in the same fashion. The first row shows the component number and the second row shows the concatenated sparse coding results. The third row shows the reconstructed statistical activation map in the healthy control group, and the last row shows the statistical activation map in the PAE group. Blue circles highlight the differences between the statistical maps of the two groups.

3.3 Affected Activation Networks by Prenatal Alcohol Exposure

In order to identify individualized spatial activation maps, we then constrained the individualized dictionary in the dictionary learning procedure (detailed in Sect. 2.5) for each subject. These fixed features naturally preserve the correspondence information between subjects. After that, we adopted the statistical mapping of Sect. 2.6 to generate statistical group-wise consistency maps for each group. Although the general spatial shapes are similar, there are subtle differences between the statistical consistency maps of the two groups, which indicates that the multi-stage sparse coding better captures individual variability. Specifically, blue circles highlight the brain regions that differ between the healthy control group and the PAE group. These areas include left inferior occipital areas, left superior and right inferior parietal regions, and the medial frontal gyrus, which have been reported to be related to Prenatal Alcohol Exposure [11]. Further, there is a clear reduction of region size in the corresponding group consistency networks, suggesting an effect of Prenatal Alcohol Exposure similar to that reported in the literature [11].


4 Conclusion

We proposed a novel multi-stage sparse coding framework for inferring group consistency maps and characterizing subtle group response differences under specific task performance. Specifically, we combined concatenated sparse coding, supervised dictionary learning, and statistical mapping to identify statistical group consistency maps in each group. This framework addresses the lack of correspondence between different subjects in current sparse coding based methods and provides a structured way to identify statistical group-consistent maps. Experiments on healthy control and PAE tfMRI data have demonstrated the advantage of the proposed framework in identifying meaningful and diverse group-consistent brain networks. In the future, we will further investigate the evaluation of subjects’ individual maps within the framework, study parameter optimization, and test our framework on a variety of other tfMRI datasets.

Acknowledgements. J. Han was supported by the National Science Foundation of China under Grant 61473231 and 61522207. X. Hu was supported by the National Science Foundation of China under grant 61473234, and the Fundamental Research Funds for the Central Universities under grant 3102014JCQ01065. T. Liu was supported by the NIH Career Award (NIH EB006878), NIH R01 DA033393, NSF CAREER Award IIS-1149260, NIH R01 AG-042599, NSF BME-1302089, and NSF BCS-1439051.

References

1. Matthews, P.M., et al.: Applications of fMRI in translational medicine and clinical practice. Nat. Rev. Neurosci. 7(9), 732–744 (2006)
2. Fox, M.D., et al.: The human brain is intrinsically organized into dynamic, anticorrelated functional networks. PNAS 102(27), 9673–9678 (2005)
3. Mastrovito, D.: Interactions between resting-state and task-evoked brain activity suggest a different approach to fMRI analysis. J. Neurosci. 33(32), 12912–12914 (2013)
4. Friston, K.J., et al.: Statistical parametric maps in functional imaging: a general linear approach. Hum. Brain Mapp. 2(4), 189–210 (1994)
5. McKeown, M.J., et al.: Spatially independent activity patterns in functional MRI data during the Stroop color-naming task. PNAS 95(3), 803–810 (1998)
6. Lee, K., et al.: A data-driven sparse GLM for fMRI analysis using sparse dictionary learning with MDL criterion. IEEE Trans. Med. Imaging 30(5), 1076–1089 (2011)
7. Lv, J., et al.: Sparse representation of whole-brain fMRI signals for identification of functional networks. Med. Image Anal. 20(1), 112–134 (2015)
8. Mairal, J., et al.: Online learning for matrix factorization and sparse coding. J. Mach. Learn. Res. 11, 19–60 (2010)
9. Pessoa, L.: Beyond brain regions: network perspective of cognition–emotion interactions. Behav. Brain Sci. 35(03), 158–159 (2012)
10. Anderson, M.L., Kinnison, J., Pessoa, L.: Describing functional diversity of brain regions and brain networks. Neuroimage 73, 50–58 (2013)
11. Santhanam, P., et al.: Effects of prenatal alcohol exposure on brain activation during an arithmetic task: an fMRI study. Alcohol. Clin. Exp. Res. 33(11), 1901–1908 (2009)
12. Jenkinson, M., Smith, S.: A global optimization method for robust affine registration of brain images. Med. Image Anal. 5(2), 143–156 (2001)
13. Smith, S.M., et al.: Correspondence of the brain’s functional architecture during activation and rest. PNAS 106(31), 13040–13045 (2009)

Correlation-Weighted Sparse Group Representation for Brain Network Construction in MCI Classification

Renping Yu (1, 2), Han Zhang (2), Le An (2), Xiaobo Chen (2), Zhihui Wei (1), and Dinggang Shen (2, corresponding author)

1 School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
2 Department of Radiology and BRIC, UNC at Chapel Hill, Chapel Hill, NC, USA
[email protected]

Abstract. Analysis of brain functional connectivity network (BFCN) has shown great potential in understanding brain functions and identifying biomarkers for neurological and psychiatric disorders, such as Alzheimer’s disease and its early stage, mild cognitive impairment (MCI). In all these applications, the accurate construction of a biologically meaningful brain network is critical. Due to the sparse nature of the brain network, sparse learning has been widely used for complex BFCN construction. However, the conventional l1-norm penalty in sparse learning equally penalizes each edge (or link) of the brain network, which ignores the link strength and could remove strong links in the brain network. Besides, the conventional sparse regularization often overlooks group structure in the brain network, i.e., a set of links (or connections) sharing a similar attribute. To address these issues, we propose to construct the BFCN by integrating both link strength and group structure information. Specifically, a novel correlation-weighted sparse group constraint is devised to account for and balance among (1) sparsity, (2) link strength, and (3) group structure, in a unified framework. The proposed method is applied to MCI classification using the resting-state fMRI from the ADNI-2 dataset. Experimental results show that our method is effective in modeling human brain connectomics, as demonstrated by a superior MCI classification accuracy of 81.8%. Moreover, our method is promising for its capability in modeling more biologically meaningful sparse brain networks, which will benefit both basic and clinical neuroscience studies.

R. Yu was supported by the Research Fund for the Doctoral Program of Higher Education of China (RFDP) (No. 20133219110029), the Key Research Foundation of Henan Province (15A520056) and NFSC (No. 61171165, No. 11431015).
© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 37–45, 2016. DOI: 10.1007/978-3-319-46720-7_5

1 Introduction

Study of the brain functional connectivity network (BFCN), based on resting-state fMRI (rs-fMRI), has shown great potential in understanding brain functions

and identifying biomarkers for neurological disorders [1]. Many BFCN modeling approaches have been proposed, and most of them represent the brain network as a graph by treating brain regions as nodes and the connectivity between a pair of regions as an edge (or link) [2]. Specifically, the brain can first be parcellated into different regions-of-interest (ROIs), and the connectivity between a pair of ROIs can then be estimated by the correlation between the mean blood-oxygen-level dependent (BOLD) time series of these ROIs. The most common BFCN modeling approach is based on pairwise Pearson’s correlation (PC). However, PC is insufficient to account for the interaction among multiple brain regions [3], since it only captures pairwise relationships. Another common modeling approach is based on sparse representation (SR). For example, the sparse estimation of partial correlation with l1-regularization can measure the relationship among certain ROIs while factoring out the effects of other ROIs [4]. This technique has been applied to construct brain networks in studies of Alzheimer’s disease (AD), mild cognitive impairment (MCI) [3], and autism spectrum disorder [5]. However, the human brain inherently contains not only sparse connections but also group structure [6], and the latter has received more attention in recent BFCN modeling methods. A pioneering work [7] proposed non-overlapping group sparse representation by considering group structures and supporting group selections. The group structure has been utilized in various ways. For example, Varoquaux et al. [8] used a group sparsity prior to constrain all subjects to share the same network topology. Wee et al. [9] used group constrained sparsity to overcome inter-subject variability in brain network construction. To introduce sparsity within each group, sparse group representation (SGR) has also been developed by combining l1-norm and lq,1-norm constraints. For example, a recent work [10] defined “group” based on the anatomical connectivity, and then applied SGR to construct the BFCN from the whole-brain fMRI signals. Note that, in all these existing methods, the l1-norm constraint in both SR and SGR penalizes each edge equally. That is, when learning the sparse representation for a certain ROI, BOLD signals in all other ROIs are treated equally. This process ignores the similarity between the BOLD signals of the considered ROI and those of the other ROIs during network reconstruction. In fact, if the BOLD signals of two ROIs are highly similar, their strong connectivity should be kept or enhanced during BFCN construction, while weak connectivity should be restrained. In light of this, we introduce a link-strength related penalty in sparse representation. Moreover, to make the penalty consistent across all similar links in the whole brain network, we propose a group structure based constraint on similar links, allowing them to share the same penalty during network construction. In this way, we can jointly model the whole brain network, instead of separately modeling a sub-network for each ROI. This is implemented by a novel weighted sparse group regularization that considers sparsity, link strength, and group structure in a unified framework. To validate the effectiveness of our proposed method in constructing brain functional networks, we conduct experiments on a real fMRI dataset for the BFCN


construction and also for BFCN-based brain disorder diagnosis. The experimental results in distinguishing MCI subjects from normal controls (NCs) confirm that our proposed method, with a simple t-test for feature selection and a linear SVM for classification, can achieve superior classification performance compared to the competing methods. The features (i.e., network connections) selected by our method can be utilized as potential biomarkers in future studies on early intervention for such a progressive and incurable disease.

2 Brain Network Construction and MCI Classification

Suppose that each brain has been parcellated into $N$ ROIs according to a certain brain atlas. The regional mean time series of the $i$th ROI can be denoted by a column vector $x_i = [x_{1i}; x_{2i}; \ldots; x_{Ti}] \in \mathbb{R}^{T}$, where $T$ is the number of time points in the time series, and thus $X = [x_1, \ldots, x_i, \ldots, x_N] \in \mathbb{R}^{T \times N}$ denotes the data matrix of a subject. Then the key step of constructing the BFCN for this subject is to estimate the connectivity matrix $W \in \mathbb{R}^{N \times N}$, given the $N$ nodes (i.e., $x_i$, $i = 1, 2, \ldots, N$), each of which represents the signals in an ROI. Many studies model the connectivity of brain regions by a sparse network [4]. The optimization of the BFCN construction based on SR can be formulated as

$$\min_{W} \sum_{i=1}^{N} \frac{1}{2} \Big\| x_i - \sum_{j \neq i} x_j W_{ji} \Big\|_2^2 + \lambda \sum_{i=1}^{N} \sum_{j \neq i} |W_{ji}|. \qquad (1)$$

The l1-norm penalty involved in Eq. (1) penalizes each representation coefficient with the same weight. In other words, it treats each ROI equally when reconstructing a target ROI ($x_i$). As a result, sparse modeling methods based on this formulation tend to reconstruct the target ROI using some ROIs whose signals are very different from those of the target ROI. Furthermore, the reconstruction of each ROI is independent of the reconstructions of other ROIs; thus, the estimated reconstruction coefficients for similar ROIs could vary a lot, and this could lead to an unstable BFCN construction. Hence, the link strength that indicates the signal similarity of two ROIs should be considered in the BFCN construction.

2.1 Correlation-Weighted Sparse Group Representation for BFCN Construction

To take into account the link strength, we introduce a correlation-weighted sparse penalty in Eq. (1). Specifically, if the BOLD signals of two ROIs have high similarity, i.e., their link is strong, then this link should be penalized less. On the other hand, a weak link should be penalized more, with a larger weight. To measure the link strength between the signals of two ROIs, the PC coefficient can be calculated. Then the penalty weight for $W_{ji}$, i.e., the link between the $i$th ROI $x_i$ and the $j$th ROI $x_j$, can be defined as:

$$C_{ji} = e^{-P_{ji}^2 / \sigma}, \qquad (2)$$

where $P_{ji}$ is the PC coefficient between the $i$th ROI $x_i$ and the $j$th ROI $x_j$, and $\sigma$ is a parameter used to adjust the weight decay speed for the link strength adaptor. Accordingly, the correlation-weighted sparse representation (WSR) can be formulated as

$$\min_{W} \sum_{i=1}^{N} \frac{1}{2} \Big\| x_i - \sum_{j \neq i} x_j W_{ji} \Big\|_2^2 + \lambda \sum_{i=1}^{N} \sum_{j \neq i} C_{ji} |W_{ji}|, \qquad (3)$$

where $C \in \mathbb{R}^{N \times N}$ is the link strength adaptor matrix, with each element $C_{ji}$ being inversely proportional to the similarity (i.e., PC coefficient) between the signals in ROI $x_j$ and the signals in the target ROI $x_i$. Note that the reconstruction of $x_i$, i.e., the $i$th sub-network construction, is still independent of the reconstructions of the sub-networks for other ROIs. To make this link-strength related penalty consistent across all links with similar strength in the whole network, we propose a group structure constraint on similar links, allowing them to share the same penalty during the whole BFCN construction. In this way, we can model the whole brain network jointly, instead of separately modeling the sub-networks of all ROIs.
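The link-strength weights of Eq. (2) and the WSR objective of Eq. (3) can be sketched as follows. The paper solves its sparse models with the SLEP toolbox; the proximal gradient loop below is a stand-in chosen only to keep the example self-contained, and the function names are hypothetical.

```python
import numpy as np

def link_weights(X, sigma):
    """X has shape (T, N): one mean BOLD time series per ROI (column).
    Returns the Pearson correlation matrix P and the penalty weights
    C = exp(-P**2 / sigma) of Eq. (2)."""
    P = np.corrcoef(X, rowvar=False)
    return P, np.exp(-P ** 2 / sigma)

def weighted_sr(X, C, lam, n_iter=200):
    """Proximal gradient sketch for Eq. (3): column i of W holds the
    coefficients that reconstruct ROI i from the other ROIs."""
    N = X.shape[1]
    W = np.zeros((N, N))
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 + 1e-12)
    for _ in range(n_iter):
        grad = X.T @ (X @ W - X)          # gradient of the data-fit term
        W = W - step * grad
        # Weighted soft-thresholding: strong links (small C) are shrunk less.
        W = np.sign(W) * np.maximum(np.abs(W) - step * lam * C, 0.0)
        np.fill_diagonal(W, 0.0)          # enforce j != i (no self-links)
    return W
```

Setting every entry of C to 1 recovers the plain SR model of Eq. (1), which makes it easy to compare the two penalties on the same data.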

Fig. 1. Illustration of group partition for a typical subject in our data. (a) Pearson correlation coefficient matrix P. (b) The corresponding group partition (K = 10) of (a), shown as group size against absolute PC value for groups G1 to G10.

To identify the group structure, we partition all links, i.e., the pairwise connections among ROIs, into $K$ groups based on the PC coefficients. Specifically, $K$ non-overlapping groups of links are pre-specified by their corresponding PC coefficients. Assuming the numerical range of the absolute value of the PC coefficient $|P_{ij}|$ is $[P_{\min}, P_{\max}]$ with $P_{\min} \geq 0$ and $P_{\max} \leq 1$, we partition $[P_{\min}, P_{\max}]$ into $K$ uniform and non-overlapping partitions with the same interval $\Delta = (P_{\max} - P_{\min})/K$. The $k$th group is defined as $G_k = \{(i, j) \mid |P_{ij}| \in [P_{\min} + (k - 1)\Delta,\ P_{\min} + k\Delta]\}$. Figure 1 shows the grouping results with $K = 10$ for illustration purposes. Most links in the network are weak, while strong connectivity accounts for only a small number of links. To integrate constraints on link strength, group structure, and sparsity in a unified framework, we propose a novel weighted sparse group regularization formulated as:

$$\min_{W} \sum_{i=1}^{N} \frac{1}{2} \Big\| x_i - \sum_{j \neq i} x_j W_{ji} \Big\|_2^2 + \lambda_1 \sum_{i=1}^{N} \sum_{j \neq i} C_{ji} |W_{ji}| + \lambda_2 \sum_{k=1}^{K} d_k \|W_{G_k}\|_q, \qquad (4)$$

where $\|W_{G_k}\|_q = \sqrt[q]{\sum_{(i,j) \in G_k} (W_{ij})^q}$ is the $l_q$-norm (with $q = 2$ in this work). $d_k = e^{-E_k^2 / \sigma}$ is a pre-defined weight for the $k$th group, with $E_k = \frac{1}{|G_k|} \sum_{(i,j) \in G_k} P_{ij}$. $\sigma$ is the same parameter as in Eq. (2), set as the mean of all subjects’ standard variances of absolute PC coefficients. In Eq. (4), the first regularizer (l1-norm penalty) controls the overall sparsity of the reconstruction model, and the second regularizer (lq,1-norm penalty) contributes the sparsity at the group level.
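The group partition and the group-level term of Eq. (4) can be sketched as below. Several details are assumptions made for the example: intervals are treated as half-open so that every link falls in exactly one group, the diagonal is excluded, and an empty group simply contributes nothing. The function names are hypothetical.

```python
import numpy as np

def partition_links(P, K):
    """Assign each off-diagonal link (i, j) to one of K groups by |P_ij|,
    using K equal-width bins over [P_min, P_max]. Returns an (N, N) integer
    label matrix with values 0..K-1 and -1 on the diagonal."""
    N = P.shape[0]
    off = ~np.eye(N, dtype=bool)
    absP = np.abs(P)
    p_min, p_max = absP[off].min(), absP[off].max()
    delta = (p_max - p_min) / K
    if delta == 0:
        labels = np.zeros((N, N), dtype=int)
    else:
        labels = np.clip(np.floor((absP - p_min) / delta).astype(int), 0, K - 1)
    labels[~off] = -1
    return labels

def group_weights(P, labels, K, sigma):
    """d_k = exp(-E_k**2 / sigma), with E_k the mean PC coefficient in group k."""
    d = np.zeros(K)
    for k in range(K):
        mask = labels == k
        if mask.any():
            d[k] = np.exp(-P[mask].mean() ** 2 / sigma)
    return d

def group_penalty(W, labels, d):
    """Group term of Eq. (4) with q = 2: sum over k of d_k * ||W_{G_k}||_2."""
    return float(sum(d[k] * np.linalg.norm(W[labels == k]) for k in range(len(d))))
```

Because the groups are defined on the correlation matrix alone, the labels and weights can be computed once per subject before any optimization of W starts.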

2.2 MCI Classification

The estimated BFCNs are applied to the classification of MCI and NC. Note that the learned connectivity matrix $W$ could be asymmetric. Therefore, we simply make it symmetric by $W^* = (W + W^T)/2$, and use $W^*$ to represent the final network, which contains $N(N - 1)/2$ effective connectivity measures due to symmetry. These connectivity measures are used as the imaging features, with a feature dimensionality of 4005 for the case of $N = 90$. For feature selection, we use a two-sample t-test with a significance level of p < 0.05 to select features that significantly differentiate between the MCI and NC classes. After feature selection, we employ a linear SVM [11] with c = 1 for classification.
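The feature extraction step described above can be sketched as follows. The two-sample statistic shown is the pooled-variance (Student) form, which is an assumption since the text does not say which variant was used; converting it to a p-value and training the linear SVM are left out to keep the example dependency-free. The function names are hypothetical.

```python
import numpy as np

def connectivity_features(W):
    """Symmetrize W as (W + W^T) / 2 and return the N(N-1)/2 upper-triangular
    entries as a feature vector."""
    W_sym = (W + W.T) / 2.0
    upper = np.triu_indices(W.shape[0], k=1)
    return W_sym[upper]

def two_sample_t(F_a, F_b):
    """Pooled-variance two-sample t-statistic per feature.
    F_a and F_b have shape (n_subjects, n_features), one row per subject.
    Features with zero pooled variance are not handled specially."""
    n_a, n_b = F_a.shape[0], F_b.shape[0]
    var_a = F_a.var(axis=0, ddof=1)
    var_b = F_b.var(axis=0, ddof=1)
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    return (F_a.mean(axis=0) - F_b.mean(axis=0)) / np.sqrt(pooled * (1.0 / n_a + 1.0 / n_b))
```

In a cross-validated setting, the statistic should be computed on the training subjects only, so that the held-out subject does not influence which connections are selected.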

3 Experiments

The Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset is used in this study. Specifically, 50 MCI patients and 49 NCs are selected from the ADNI-2 dataset in our experiments. Subjects from both groups were scanned using 3.0T Philips scanners. The SPM8 toolbox (http://www.fil.ion.ucl.ac.uk/spm/) was used to preprocess the rs-fMRI data according to the well-accepted pipeline [6].

3.1 Brain Functional Network Construction

The Automated Anatomical Labeling (AAL) template is used to define 90 brain ROIs, and the mean rs-fMRI signal is extracted from each ROI to model the BFCN. For comparison, we also construct the brain networks using other methods, including PC, SR, WSR, and SGR (corresponding to Cji = 1 in Eq. (4)). The SLEP toolbox [12] is used to solve the sparsity-related models in this paper. Figure 2 visualizes the BFCNs of one typical subject constructed by the 5 different methods. As can be seen from Fig. 2(a), the intrinsic grouping in brain connectivity can be indirectly observed, although PC only measures pairwise ROI interaction. Comparing Fig. 2(b) and (d), we can observe that there are relatively fewer non-zero elements in the SGR-based model due to the use of group sparse regularization. Similarly, the group structure is more obvious in Fig. 2(e), obtained by our WSGR method, than in Fig. 2(c), obtained by WSR.

Fig. 2. Comparison of BFCNs of the same subject reconstructed by 5 different methods: (a) PC, (b) SR, (c) WSR, (d) SGR, (e) WSGR.

Regarding the effectiveness of using the link-strength related weights, we can see from Fig. 2(c) and (e) that the sparse constraint with the link-strength related weights is more reasonable for modeling the BFCN than their counterparts without weights (shown in Fig. 2(b) and (d)).

3.2 Classification Results

A leave-one-out cross-validation (LOOCV) strategy is adopted in our experiments. To set the values of the regularization parameters (i.e., λ in SR and WSR, and λ1, λ2 in SGR and WSGR), we employed a nested LOOCV strategy on the training set. The parameters are grid-searched over the values $[2^{-5}, 2^{-4}, \ldots, 2^{0}, \ldots, 2^{4}, 2^{5}]$. To evaluate the classification performance, we use seven evaluation measures: accuracy (ACC), sensitivity (SEN), specificity (SPE), area under curve (AUC), Youden’s index (YI), F-score and balanced accuracy (BAC). As shown in Fig. 3, the proposed WSGR model achieves the best classification performance with an accuracy of 81.8 %, followed by WSR (78.8 %). By comparison, we can verify the effectiveness of the link-strength related weights from two aspects. First, the WSR model, with link-strength related weights from PC, performs much better than both the PC and SR models. Second, our model outperforms the SGR model (72.73 %). Similarly, by comparing the results of the SR and WSR models with those of the SGR and WSGR models, the effectiveness of the introduced group-structure based penalty can be well justified. With DeLong’s non-parametric statistical significance test [13], our proposed method significantly outperforms PC, SR, WSR and SGR at the 95 % confidence level, with p-values of $1.7 \times 10^{-7}$, $3.6 \times 10^{-6}$, 0.048 and 0.0017, respectively. The superior performance of our method suggests that weighted group sparsity is beneficial in constructing brain networks and is also able to improve the classification performance.

Fig. 3. Comparison of classification results of the five methods (PC, SR, WSR, SGR, WSGR): (a) classification performance on the seven metrics (ACC, SEN, SPE, AUC, YI, F-score, BAC); (b) ROC curves (true positive rate versus false positive rate).

As the features selected by the t-test in each validation fold might be different, we record all selected features during the training process. The 76 most frequently selected features are visualized in Fig. 4, where the thickness of an arc indicates the discriminative power of an edge, which is inversely proportional to the estimated p-value. The colors of the arcs are randomly generated to differentiate ROIs and connectivity for clear visualization. We can see that several brain regions (as highlighted in the figure) are jointly selected as important features for MCI classification. For example, a set of brain regions in the temporal pole, olfactory areas and medial orbitofrontal cortex, as well as the bilateral fusiform, are found to have dense connections which are pivotal to MCI classification [14].

Fig. 4. The most frequently selected connections for the 90 ROIs of AAL template. The thickness of an arc indicates the discriminative power of an edge for MCI classification.
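The parameter grid and six of the seven evaluation measures named in Sect. 3.2 can be written down compactly. The paper does not spell out the formulas, so the sketch below uses the standard definitions and assumes that MCI is the positive class; AUC is omitted because it requires the continuous classifier scores rather than confusion counts. The function name and the example counts in the usage note are hypothetical.

```python
def classification_measures(tp, fn, tn, fp):
    """Standard measures from confusion counts (positive class assumed: MCI)."""
    sen = tp / (tp + fn)            # sensitivity (recall)
    spe = tn / (tn + fp)            # specificity
    precision = tp / (tp + fp)
    return {
        "ACC": (tp + tn) / (tp + fn + tn + fp),
        "SEN": sen,
        "SPE": spe,
        "YI": sen + spe - 1.0,      # Youden's index
        "F-score": 2.0 * precision * sen / (precision + sen),
        "BAC": (sen + spe) / 2.0,   # balanced accuracy
    }

# Candidate values for the regularization parameters: 2^-5, 2^-4, ..., 2^5.
lambda_grid = [2.0 ** e for e in range(-5, 6)]
```

For example, `classification_measures(8, 2, 6, 4)` describes a made-up fold with 8 true positives, 2 false negatives, 6 true negatives and 4 false positives. In a nested LOOCV, each candidate in `lambda_grid` (or each pair, for λ1 and λ2) would be scored on the inner folds of the training set only.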

4 Conclusion

In this paper, we have proposed a novel weighted sparse group representation method for brain network modeling, which integrates link strength, group structure, and sparsity in a unified framework. In this way, the complex brain network can be modeled more accurately than with other commonly used methods. Our proposed method was validated in the task of MCI and NC classification, and superior results were obtained compared to the classification performance of other brain network modeling approaches. In future work, we plan to develop more effective grouping strategies in order to further improve the modeling accuracy and the MCI diagnosis performance.

References

1. Fornito, A., Zalesky, A., Breakspear, M.: The connectomics of brain disorders. Nat. Rev. Neurosci. 16, 159–172 (2015)
2. Smith, S.M., Miller, K.L., et al.: Network modelling methods for FMRI. NeuroImage 54, 875–891 (2011)
3. Huang, S., Li, J., Sun, L., Ye, J., Fleisher, A., Wu, T.: Alzheimer’s Disease NeuroImaging Initiative: learning brain connectivity of Alzheimer’s disease by sparse inverse covariance estimation. NeuroImage 50, 935–949 (2010)
4. Meinshausen, N., Bühlmann, P.: High-dimensional graphs and variable selection with the lasso. Ann. Stat., 1436–1462 (2006)
5. Lee, H., Lee, D.S., et al.: Sparse brain network recovery under compressed sensing. IEEE Trans. Med. Imaging 30, 1154–1165 (2011)
6. Rubinov, M., Sporns, O.: Complex network measures of brain connectivity: uses and interpretations. NeuroImage 52, 1059–1069 (2010)
7. Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Series. B. Stat. Methodol 68, 49–67 (2006)
8. Varoquaux, G., Gramfort, A., Poline, J.B., Thirion, B.: Brain covariance selection: better individual functional connectivity models using population prior. In: Advances in Neural Information Processing Systems, pp. 2334–2342 (2010)
9. Wee, C.Y., et al.: Group-constrained sparse fMRI connectivity modeling for mild cognitive impairment identification. Brain Struct. Funct. 219, 641–656 (2014)
10. Jiang, X., Zhang, T., Zhao, Q., Lu, J., Guo, L., Liu, T.: Fiber connection pattern-guided structured sparse representation of whole-brain fMRI signals for functional network inference. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A. (eds.) MICCAI 2015. LNCS, vol. 9349, pp. 133–141. Springer, Heidelberg (2015)
11. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27 (2011)
12. Liu, J., Ji, S., Ye, J.: SLEP: sparse learning with efficient projections. Arizona State Univ. 6, 491 (2009)
13. DeLong, E.R., DeLong, D.M., Clarke-Pearson, D.L.: Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics, 837–845 (1988)
14. Albert, M.S., DeKosky, S.T., Dickson, D., et al.: The diagnosis of mild cognitive impairment due to Alzheimer’s disease: recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimer’s Dement. 7, 270–279 (2011)

Temporal Concatenated Sparse Coding of Resting State fMRI Data Reveal Network Interaction Changes in mTBI

Jinglei Lv1,2(&), Armin Iraji3, Fangfei Ge1,2, Shijie Zhao1, Xintao Hu1, Tuo Zhang1, Junwei Han1, Lei Guo1, Zhifeng Kou3,4, and Tianming Liu2

1 School of Automation, Northwestern Polytechnical University, Xi’an, China [email protected]
2 Cortical Architecture Imaging and Discovery Lab, Department of Computer Science and Bioimaging Research Center, The University of Georgia, Athens, GA, USA
3 Department of Biomedical Engineering, Wayne State University, Detroit, MI, USA
4 Department of Radiology, Wayne State University, Detroit, MI, USA

Abstract. Resting state fMRI (rsfMRI) has been a useful imaging modality for network-level understanding and diagnosis of brain diseases, such as mild traumatic brain injury (mTBI). However, there is a need for effective methodologies that can detect group-wise and longitudinal changes of network interactions in mTBI. The major challenges are twofold: (1) there is no individualized and common network system that can serve as a reference platform for statistical analysis; (2) networks and their interactions are usually not modeled in the same algorithmic structure, which results in bias and uncertainty. In this paper, we propose a novel temporal concatenated sparse coding (TCSC) method to address these challenges. Based on sparse graph theory, the proposed method can model the commonly shared spatial maps of networks and the local dynamics of the networks in each subject in one algorithmic structure. The local dynamics are not comparable across subjects or across groups in rsfMRI; however, based on the correspondence established by the common spatial profiles, the interactions of these networks can be modeled individually and statistically assessed in a group-wise fashion. The proposed method has been applied to an mTBI dataset with acute and sub-acute stages, and experimental results have revealed meaningful network interaction changes in mTBI.

© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 46–54, 2016. DOI: 10.1007/978-3-319-46720-7_6

1 Introduction

Mild traumatic brain injury (mTBI) has received increasing attention as a significant public health care burden worldwide [1, 2]. Microstructural damage can be found in most cases of mTBI using diffusion MRI [3, 4]. Meanwhile, many studies based on resting state fMRI (rsfMRI) have reported functional impairments at the network level in memory, attention, executive function and processing time [5–7]. However, there is still a lack of effective methodologies that can model the changes of interactions among brain networks longitudinally, which reflect the neural plasticity and functional compensation during different stages of mTBI. The challenges are mainly twofold: (1) there is no individualized and common network system that can serve as a reference platform for statistical analysis; (2) networks and their interactions are usually not modeled in the same algorithmic structure, which results in bias and uncertainty.

Conventional network analysis methods mainly include three streams: seed-based network analysis [8], graph theory based quantitative network analysis [9], and data-driven ICA component analysis [6, 10, 11]. Recently, sparse coding has attracted intense attention in the fMRI analysis field because the sparsity constraint coincides with the nature of neural activities, which makes it feasible to model the diversity of brain networks [12–14]. Based on sparse graph theory, whole-brain fMRI signals can be modeled by a learned dictionary of bases and a sparse parameter matrix. The signal of each voxel in the brain is sparsely and linearly represented by the learned dictionary with a sparse parameter vector [12–14]. The sparse parameters can be projected to the brain volume as spatial functional networks. This methodology has been validated to be effective in reconstructing concurrent brain networks from fMRI data [12–14]. However, functional interactions among these networks have not been well explored, especially for group-wise statistics on rsfMRI data.

In this paper, we propose a novel group-wise temporal concatenated sparse coding method for modeling resting state functional networks and their network-level interactions. Briefly, a dictionary matrix and a parameter matrix are learned from the temporally concatenated fMRI data from multiple subjects and groups. Common network spatial profiles can then be reconstructed from the parameter matrix. Notably, the learned dictionary is also temporally concatenated, and it can be decomposed into the dictionary of each subject of each group to represent the local dynamics of the common networks. Although the local dynamics of each network are quite individualized, it turns out that their interactions are comparable based on the correspondence built by the common spatial profiles. The proposed method has been applied to a longitudinal mTBI data set, and our results have shown that network interaction changes can be detected in different stages of mTBI, suggesting brain recovery and plasticity after injury.

2 Materials and Method

2.1 Overview

Briefly, our method is designed for cross-group analysis and longitudinal modeling. RsfMRI data from multiple subjects and groups are first pre-processed and then spatially and temporally normalized, after which the fMRI signals are temporally concatenated. There are two main steps in our framework. As shown in Fig. 1, in the first step, based on temporal concatenated sparse coding (TCSC), we model the common spatial profiles of brain networks and the local network dynamics at the same time. In the second step (Fig. 2), based on the local dynamics, functional interactions among networks are calculated, statistically assessed and compared among groups. In this research, there are two groups of subjects: healthy controls and mTBI patients. For each group, there are two longitudinal stages: stage 1, with patients at the acute stage and controls at the first visit, and stage 2, with patients at the sub-acute stage and controls at the second visit.

2.2 Data Acquisition and Pre-processing

This study was approved by both the Human Investigation Committee of Wayne State University and the Institutional Review Board of Detroit Medical Center, where the original data were collected [6]. Each participant signed an informed consent form before enrollment. RsfMRI data were acquired from 16 mTBI patients at the acute and sub-acute stages and from 24 healthy controls at two stages with a one-month interval, on a 3-Tesla Siemens Verio scanner with a 32-channel radiofrequency head-only coil, using a gradient EPI sequence with the following imaging parameters: TR/TE = 2000/30 ms, slice thickness = 3.5 mm, slice gap = 0.595 mm, pixel spacing = 3.125 × 3.125 mm, matrix size = 64 × 64, flip angle = 90°, 240 volumes for whole-brain coverage, NEX = 1, acquisition time of 8 min [6]. Pre-processing included skull removal, motion correction, slice-time correction, spatial smoothing (FWHM = 5 mm), detrending and band-pass filtering (0.01 Hz–0.1 Hz). All fMRI images were registered into the MNI space, and a common brain mask was applied to all fMRI data across groups. To avoid bias caused by individual variability, the signal of each voxel in each subject was normalized to zero mean and unit standard deviation. Based on the voxel correspondence built by registration, the signals of each voxel can be temporally concatenated across subjects.
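The per-voxel normalization and temporal concatenation described above can be sketched as follows. This is an illustration under stated assumptions: each subject's masked data is taken to be a (time points × voxels) array with the same voxel order for every subject, the function names are hypothetical, and constant signals are left at zero rather than divided by a zero standard deviation.

```python
import numpy as np

def zscore_signals(S):
    """Normalize each voxel signal (column of S, shape t x n) to mean 0, std 1."""
    mean = S.mean(axis=0, keepdims=True)
    std = S.std(axis=0, keepdims=True)
    std[std == 0] = 1.0  # assumption: constant signals are only centered
    return (S - mean) / std

def concatenate_subjects(signal_matrices):
    """Stack normalized subjects along time: result has shape (sum of t_k, n)."""
    return np.vstack([zscore_signals(S) for S in signal_matrices])
```

With 240 volumes per scan, concatenating k scans yields a matrix with 240·k rows, while the number of columns (voxels in the common mask) is unchanged.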

2.3 Concatenated Sparse Coding

The method of temporal concatenated sparse coding (TCSC) is summarized in Fig. 1. The whole-brain fMRI signals of each subject are extracted in a predefined order and arranged in a signal matrix, as shown in Fig. 1a. The signal matrices from multiple subjects and multiple groups are then concatenated as the input of dictionary learning and sparse coding, $S = [s_1, s_2, \ldots, s_i, \ldots, s_n]$ (Fig. 1b). Eventually, the concatenated input matrix is decomposed into a concatenated dictionary matrix D (Fig. 1c) and a parameter matrix $A = [a_1, a_2, \ldots, a_i, \ldots, a_n]$ (Fig. 1d). Each row of the matrix A is projected to the brain volume to represent a functional network (Fig. 1e). As the learning is based on groups of subjects, the networks are group-wise, common spatial profiles.

Fig. 1. The framework of TCSC method on longitudinal mTBI rsfMRI data.

The dictionary learning and sparse coding problem is a matrix factorization optimization problem in the machine learning field [15]. The cost function of the problem is summarized in Eq. (1), which considers the average loss of single representation.

$$f_n(D) \triangleq \frac{1}{n} \sum_{i=1}^{n} \ell(s_i, D) \qquad (1)$$

The loss function for each input signal is defined in Eq. (2), in which an $l_1$ regularization term is introduced to yield a sparse solution of $a_i$.

$$\ell(s_i, D) \triangleq \min_{D \in C,\, a_i \in \mathbb{R}^m} \frac{1}{2} \|s_i - D a_i\|_2^2 + \lambda \|a_i\|_1 \qquad (2)$$

$$C \triangleq \left\{ D \in \mathbb{R}^{t \times m} \ \ \text{s.t.} \ \ \forall j = 1, \ldots, m, \ \ d_j^T d_j \le 1 \right\} \qquad (3)$$

For this problem, an established, open-source parallel computing solution is provided by the online dictionary learning method [15] in the SPArse Modeling Software (SPAMS, http://spams-devel.gforge.inria.fr/). We adopt the SPAMS implementation to solve our temporal concatenated sparse coding problem.
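To make Eqs. (1)–(3) concrete, the following sketch evaluates the loss of Eq. (2) for a fixed dictionary and solves the inner sparse coding step with plain ISTA (iterative soft-thresholding). It is deliberately not the SPAMS online algorithm used in the paper: ISTA is swapped in here only because it fits in a few lines, and all function names are hypothetical.

```python
import numpy as np

def project_atoms(D):
    """Rescale columns so that d_j^T d_j <= 1, i.e. D lies in the set C of Eq. (3)."""
    norms = np.linalg.norm(D, axis=0)
    return D / np.maximum(norms, 1.0)

def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def sparse_code(s, D, lam, n_iter=200):
    """Minimize 0.5 * ||s - D a||_2^2 + lam * ||a||_1 over a (D fixed) by ISTA."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the smooth part
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - s)
        a = soft_threshold(a - grad / L, lam / L)
    return a

def loss(s, D, a, lam):
    """Objective of Eq. (2) for one signal s and one code a."""
    r = s - D @ a
    return 0.5 * (r @ r) + lam * np.abs(a).sum()
```

A larger λ drives more entries of each code to exactly zero; once λ exceeds the largest absolute entry of D^T s, the code is the zero vector. Averaging `loss` over all columns of S gives the empirical cost of Eq. (1).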

2.4 Network Interaction Statistics

As shown in Figs. 1c and 2a, the learned dictionary also follows the concatenation order of the original fMRI signals. By decomposing the dictionary D into the small dictionaries of each individual, D1 … Dk, we can map the local dynamics of the common networks. These signal patterns (Fig. 2a) are quite individualized and not comparable across subjects, because the resting state brain activities of different subjects are not synchronized. However, the interactions among these networks are comparable: thanks to the correspondence established by the common spatial maps, individual variability is balanced and statistical analysis can be carried out across groups. As shown in Fig. 2b, for each subject from each group, we define the interaction matrix by calculating the Pearson’s correlations among dictionary atoms. The statistics on the interactions include three steps. First, a global null hypothesis t-test is performed across all groups and stages, so that weak interactions which are globally close to zero are removed. Second, on the surviving interactions, one-way ANOVA is employed to detect interactions which exhibit significant differences across the two stages and two groups. Finally, based on the output of the significance analysis, we use two-sample t-tests to detect significant interaction differences between control subjects and mTBI patients. In addition, longitudinal interaction changes can also be analyzed.

Fig. 2. Statistics on network interactions across two stages and two groups.
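The decomposition of the concatenated dictionary and the per-subject interaction matrix can be sketched as below. It assumes that D has one row per concatenated time point and one column per network (atom), that the number of time points of each scan is known, and that only the columns of the retained networks are passed in; the function names are hypothetical.

```python
import numpy as np

def split_dictionary(D, lengths):
    """Split the concatenated dictionary (sum(lengths) x m) into per-scan blocks."""
    bounds = np.cumsum(lengths)[:-1]
    return np.split(D, bounds, axis=0)

def interaction_matrix(D_k):
    """Pearson correlations between the atoms (columns) of one scan's dictionary."""
    return np.corrcoef(D_k, rowvar=False)

def interaction_vector(D_k):
    """Unique pairwise interactions: upper triangle of the interaction matrix."""
    C = interaction_matrix(D_k)
    rows, cols = np.triu_indices(C.shape[0], k=1)
    return C[rows, cols]
```

With m retained networks there are m(m − 1)/2 unique interactions per scan. Stacking the interaction vectors of all scans gives a scans × interactions table on which the one-sample t-test, one-way ANOVA and two-sample t-tests described above would operate.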

3 Results

The causes of mTBI are complex, and the micro-damage in the brain tissue differs considerably across subjects. However, according to cognitive tests and literature reports, patients usually suffer from similar functional deficits, so we group them together to explore common functional interaction changes. In this section, we first present meaningful networks from the concatenated sparse coding and then analyze the statistical interaction differences among the four groups. Note that there are two scans for each subject, and we use the following abbreviations: C1: Stage 1 of control group; C2: Stage 2 of control group; P1: Stage 1 of patient group; and P2: Stage 2 of patient group.

3.1 Common Networks from Temporal Concatenated Sparse Coding

In the pipeline of Sect. 2.3, we set the dictionary size to 50, based on previous ICA and sparse coding works [6, 14]; thus 50 common networks are learned from the concatenated sparse coding. However, based on visual inspection, a number of these networks are artifact components. We therefore removed these networks, as well as networks related to white matter and CSF, from the following analysis. Finally, 29 networks are kept for interaction analysis, as shown in Fig. 3, where the networks are reordered and renamed from N1 to N29. Among these networks, there are conventional intrinsic networks such as the visual, auditory, motor, executive control and default mode networks. Together, these networks cover the whole cerebral cortex and also include subcortical regions, such as the cerebellum, thalamus, and corpus callosum.


Fig. 3. The networks reconstructed from the matrix A of the TCSC method. Each network is visualized with volume rendering and surface mapping from the most representative view.

3.2 Interaction Analysis and Comparison

With the statistical steps in Sect. 2.4, we analyzed the network interaction differences across groups. After the first step of the global null hypothesis t-test (p < 0.05) and the second step of one-way ANOVA (p < 0.05), 16 out of the 406 interactions show significant differences among the four groups. In order to determine the direction of the differences, we designed a series of two-sample t-tests, as shown in Table 1. In Table 1, each element indicated by two group IDs shows the number of interactions with a significant difference, e.g., element (C1, P1) is the number of interactions with significance of C1 > P1, and element (P1, C1) is the number of interactions with significance of P1 > C1. Note that we also pool C1 and C2 together as the C group, and P1 and P2 together as the P group, in the lower part of Table 1. Considering the multiple comparisons, we used a corrected threshold of p < 0.01 to determine significance. In Fig. 4, we visualize all the interactions with significant differences with red lines. In general, only two network interactions are weakened because of mTBI; however, 8 network interactions are strengthened, as shown in Fig. 4a–b.

Table 1. T-test design and number of interactions with significant difference (p < 0.01). The row gives the first group ID and the column the second group ID of each element.

T-Test Design   C1    C2    P1    P2
C1              Non   0     3     0
C2              0     Non   2     0
P1              2     4     Non   0
P2              2     4     3     Non

T-Test Design   C     P
C               Non   2
P               8     Non


Fig. 4. Visualization of the network interactions with significant difference in Table 1.

This indicates that, in order to compensate for the functional loss caused by the micro-injury of mTBI, multiple networks and their interactions are combined to generate alternative functional pathways [16]. These could be signs of neural plasticity [17]. For the longitudinal analysis, we expect P1 (acute stage) and P2 (sub-acute stage) to be different, so t-tests are performed separately against the control groups as well as between the two patient groups. For validation, we also treat C1 and C2 as different groups. First, from Table 1, no difference is detected between C1 and C2, which is as expected. Interestingly, C1 and C2 have stronger interactions than P1 (Fig. 4c–d), but have no interactions stronger than P2. This indicates that patients at the sub-acute stage are recovering towards normal, and during the recovery some interactions are also strengthened in P2 (Fig. 4e). The interaction of N14 and N20 is stably decreased in the P1 group, which makes sense because both N14 and N20 are related to memory function. P1 and P2 both have stronger interactions than the control group, but the strengthened interactions are quite different (Fig. 4f–i). For example, N24 (DMN) centered interactions are enhanced in P1, which might suggest strengthened functional regularization for functional compensation, whereas N18 (cerebellum) centered interactions are enhanced in P2. These findings are interesting, and explicit interpretations of these interactions will be explored in the future.


4 Conclusion

In this paper, we proposed a network interaction modeling method to determine the longitudinal changes of mTBI. The method is based on temporal concatenated sparse coding, by which common spatial network profiles can be modeled across groups while local dynamics are modeled for each individual at the same time. Based on the network correspondence established by the common spatial maps, network interactions are statistically compared across groups and longitudinally. Our method has been applied to an mTBI data set with acute and sub-acute stages. Experimental results have shown that neural plasticity and functional compensation can be observed through the interaction changes.

Acknowledgement. This work was supported by NSF CAREER Award IIS-1149260, NSF BCS-1439051, NSF CBET-1302089, NIH R21NS090153 and Grant W81XWH-11-1-0493.

References

1. Iraji, A., et al.: The connectivity domain: analyzing resting state fMRI data using feature-based data-driven and model-based methods. Neuroimage 134, 494–507 (2016)
2. Kou, Z., Iraji, A.: Imaging brain plasticity after trauma. Neural Regen. Res. 9, 693–700 (2014)
3. Kou, Z., VandeVord, P.J.: Traumatic white matter injury and glial activation: from basic science to clinics. Glia 62, 1831–1855 (2014)
4. Niogi, S.N., Mukherjee, P.: Diffusion tensor imaging of mild traumatic brain injury. J. Head Trauma Rehabil. 25, 241–255 (2010)
5. Mayer, A.R., et al.: Functional connectivity in mild traumatic brain injury. Hum. Brain Mapp. 32, 1825–1835 (2011)
6. Iraji, A., et al.: Resting state functional connectivity in mild traumatic brain injury at the acute stage: independent component and seed based analyses. J. Neurotrauma 32, 1031–1045 (2014)
7. Stevens, M.C., et al.: Multiple resting state network functional connectivity abnormalities in mild traumatic brain injury. Brain Imaging Behav. 6, 293–318 (2012)
8. Fox, M., Raichle, M.: Spontaneous fluctuations in brain activity observed with functional magnetic resonance imaging. Nat. Rev. Neurosci. 8(9), 700 (2007)
9. Bullmore, E., Sporns, O.: Complex brain networks: graph theoretical analysis of structural and functional systems. Nat. Rev. Neurosci. 10(3), 186–198 (2009)
10. van de Ven, V., Formisano, E., Prvulovic, D., Roeder, C., Linden, D.: Functional connectivity as revealed by spatial independent component analysis of fMRI measurements during rest. Hum. Brain Mapp. 22(3), 165–178 (2004)
11. Iraji, A., et al.: Compensation through functional hyperconnectivity: a longitudinal connectome assessment of mild traumatic brain injury. Neural Plast. 2016, 4072402 (2016)
12. Lee, Y.B., Lee, J., Tak, S., et al.: Sparse SPM: sparse-dictionary learning for resting-state functional connectivity MRI analysis. Neuroimage 125 (2015)
13. Lv, J., et al.: Assessing effects of prenatal alcohol exposure using group-wise sparse representation of fMRI data. Psychiatry Res. Neuroimaging 233(2), 254–268 (2015)
14. Lv, J., et al.: Sparse representation of whole-brain FMRI signals for identification of functional networks. Med. Image Anal. 20(1), 112–134 (2014)
15. Mairal, J., Bach, F., Ponce, J., et al.: Online learning for matrix factorization and sparse coding. J. Mach. Learn. Res. 11(1), 19–60 (2010)
16. Chen, H., Iraji, A., Jiang, X., Lv, J., Kou, Z., Liu, T.: Longitudinal analysis of brain recovery after mild traumatic brain injury based on groupwise consistent brain network clusters. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015, Part II. LNCS, vol. 9350, pp. 194–201. Springer, Heidelberg (2015). doi:10.1007/978-3-319-24571-3_24
17. Mishina, M.: Neural plasticity and compensation for human brain damage. Nihon Ika Daigaku Igakkai Zasshi 10(2), 101–105 (2014)

Exploring Brain Networks via Structured Sparse Representation of fMRI Data

Qinghua Zhao1,3, Jianfeng Lu1(&), Jinglei Lv2,3, Xi Jiang3, Shijie Zhao2,3, and Tianming Liu3(&)

1 School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China [email protected]
2 School of Automation, Northwestern Polytechnical University, Xi’an, China
3 Cortical Architecture Imaging and Discovery, Department of Computer Science and Bioimaging Research Center, The University of Georgia, Athens, GA, USA [email protected]

Abstract. Investigating functional brain networks and activities using sparse representation of fMRI data has received significant interest in the neuroimaging field. It has been reported that sparse representation is effective in reconstructing concurrent and interactive functional brain networks. However, previous data-driven reconstruction approaches rarely take into consideration the anatomical structures, which are the substrate of brain function. Furthermore, it has rarely been explored whether structured sparse representation with anatomical guidance could facilitate functional network reconstruction. To address this problem, in this paper, we propose to reconstruct brain networks using anatomy-guided structured multi-task regression (AGSMR), in which 116 anatomical regions from the AAL template are employed as prior knowledge to guide the network reconstruction. Using the publicly available Human Connectome Project (HCP) Q1 dataset as a test bed, our method demonstrated that anatomy-guided structured sparse representation is effective in reconstructing concurrent functional brain networks.

Keywords: Sparse representation · Dictionary learning · Group sparsity · Functional networks

© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 55–62, 2016. DOI: 10.1007/978-3-319-46720-7_7

1 Introduction

Functional magnetic resonance imaging (fMRI) signal analysis and functional brain network investigation using sparse representation have received increasing interest in the neuroimaging field [1, 10]. The main theoretical assumption is that each brain fMRI signal can be represented as a sparse linear combination of a set of signal bases in an over-complete dictionary. The data-driven strategy of dictionary learning and sparse coding is efficient and effective in reconstructing concurrent and interactive functional networks from both resting state fMRI (rsfMRI) and task-based fMRI (tfMRI) data [1, 10]. However, these approaches leave room for further improvement, because purely data-driven sparse coding does not integrate brain science domain knowledge when reconstructing functional networks. In the neuroscience field, it is widely believed that brain anatomy and structure play crucial roles in determining brain function, and that anatomical structure is the substrate of brain function. Thus, integrating anatomical structure information into brain network representation is well motivated and justified.

In this paper, we propose a novel anatomy-guided structured multi-task regression (AGSMR) method for functional network reconstruction, which employs anatomical group structures to guide the sparse representation of fMRI data. In general, group-wise structured multi-task regression is an established methodology, which imposes a group structure on the multiple tasks and employs a combination of ℓ2 and ℓ1 norms in order to learn both intra-group homogeneity and inter-group sparsity [2, 6]. Our premise is that fMRI voxels from the same anatomical structure should potentially play a similar role in brain function. Thus, employing the 116 brain regions from the AAL template as anatomical group information could effectively improve the network representation by constraining both homogeneity within anatomical structures and sparsity across anatomical structures. After applying our method to the recently publicly released Human Connectome Project (HCP) data, our experimental results demonstrate that the networks are improved, with higher similarity, which also provides anatomical clues for understanding the detected brain networks.

2 Method

2.1 Overview

Our computational framework of AGSMR is illustrated in Fig. 1. The fMRI images from each individual brain are first registered into a standard space (MNI) to align with the AAL template. fMRI signals are then extracted within a whole-brain mask, and an over-complete signal dictionary is learned via the online dictionary learning method. With the learned dictionary as a set of features (regressors), the group structured multi-task regression employs anatomical structures as group information to regress the whole-brain signals. Finally, the rows of the coefficient matrix are mapped back to the brain volume to represent functional brain networks.
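The role of the anatomical groups can be illustrated with the mixed ℓ2/ℓ1 penalty that group-structured multi-task regression typically uses. The exact AGSMR objective is not given in this overview, so the sketch below is an assumption: the coefficient matrix is taken to have one row per dictionary atom and one column per voxel, each voxel carries an AAL region label, and the penalty sums, over atoms and regions, the ℓ2 norm of the coefficients inside that region. The function name is hypothetical.

```python
import numpy as np

def group_l21_penalty(coef, voxel_labels):
    """Sum over atoms and anatomical groups of the l2 norm of the coefficients.

    coef: (m, n) array, atoms x voxels.
    voxel_labels: (n,) array with the region index of each voxel.
    """
    total = 0.0
    for region in np.unique(voxel_labels):
        block = coef[:, voxel_labels == region]      # (m, n_region)
        total += np.linalg.norm(block, axis=1).sum()  # l2 within, l1 across
    return total
```

The ℓ2 norm inside a region encourages the voxels of that region to use an atom together (homogeneity), while the sum across regions acts like an ℓ1 norm and switches whole regions off (sparsity across anatomical structures).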

2.2 Data Acquisition and Preprocessing

The fMRI data recently released by the Human Connectome Project (HCP) (Q1) were used in this paper. The Q1 dataset was acquired from 68 subjects and includes 7 tasks: Motor, Emotion, Gambling, Language, Relational, Social, and Working Memory. The acquisition parameters of the tfMRI data are as follows: 90 × 104 matrix, 220 mm FOV, 72 slices, TR = 0.72 s, TE = 33.1 ms, flip angle = 52°, BW = 2290 Hz/Px, in-plane FOV = 208 × 180 mm, 2.0 mm isotropic voxels. The preprocessing pipeline included motion correction, spatial smoothing, temporal pre-whitening, slice time correction, and global drift removal. More details about the task descriptions and preprocessing can be found in [9]. After preprocessing, all fMRI images are registered into a standard template space (MNI space). fMRI signals are then extracted from the voxels within a brain mask, and each signal is normalized to zero mean and a standard deviation of 1.
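The final normalization step can be illustrated with a minimal NumPy sketch. It assumes the extracted signals are stored as a matrix with one column per voxel; the function name `zscore_signals`, the `eps` guard for constant signals, and the toy data are illustrative assumptions, not part of the HCP pipeline.

```python
import numpy as np

def zscore_signals(X, eps=1e-12):
    """Normalize each column (one voxel's time series) to zero mean and unit std."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0, keepdims=True)
    std = X.std(axis=0, keepdims=True)
    # Constant signals have std 0; divide by 1 instead so they stay at zero.
    return (X - mean) / np.where(std > eps, std, 1.0)

# Toy data (assumption): t = 50 time points, n = 8 voxels.
rng = np.random.default_rng(0)
X = rng.normal(loc=100.0, scale=15.0, size=(50, 8))
Z = zscore_signals(X)
```

After this step every column of `Z` has mean 0 and standard deviation 1, so voxels with different baseline intensities contribute on the same scale to dictionary learning.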

Exploring Brain Networks via Structured Sparse Representation of fMRI

[Figure 1: flowchart. Panel labels: Step 1, Data Acquisition, Preprocess, Extract Signals; Step 2, Dictionary Learning (D); Step 3, Label of Signals Using AAL; Step 4, Anatomy-guided Structured Multi-task Regression; Step 5, Feature Mapping.]

Fig. 1. Flowchart of the proposed AGSMR pipeline. Step 1: data acquisition, preprocessing, and extraction of the whole-brain signals. Step 2: learning the dictionary D from the whole-brain signals. Step 3: labelling the whole-brain signals via the AAL template. Step 4: feature selection based on the AGSMR method. Step 5: mapping the selected features (coefficient matrix) back to the whole brain to identify meaningful functional networks.

2.3 The Whole Brain Signals Dictionary Learning

In our method, an over-complete dictionary D is first learned from the whole-brain fMRI signals $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{t \times n}$ (t is the number of fMRI time points and n is the number of voxels) using the online dictionary learning method [4]. The underlying assumption is that each whole-brain fMRI signal can be represented by a sparse linear combination of a set of signal bases, i.e., dictionary atoms. The empirical cost function of learning is defined in Eq. (1), with the per-signal loss given in Eq. (2):

$$f_n(D) \triangleq \frac{1}{n} \sum_{i=1}^{n} \ell(x_i, D) \qquad (1)$$

$$\ell(x_i, D) \triangleq \min_{\alpha_i \in \mathbb{R}^m} \frac{1}{2} \|x_i - D\alpha_i\|_2^2 + \lambda \|\alpha_i\|_1 \qquad (2)$$

where $D = [d_1, d_2, \ldots, d_m] \in \mathbb{R}^{t \times m}$ (m is the number of dictionary atoms) is the dictionary, with each column representing a basis vector. The $\ell_1$ regularization in Eq. (2) is adopted to generate a sparse solution, and D and $\alpha$ are updated alternately by the online dictionary learning algorithm [4]. The learned D is adopted as the set of features (regressors) for sparse representation; the proposed structured sparse representation of brain fMRI signals is detailed in Sect. 2.5.
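To make the per-signal loss in Eq. (2) concrete, the sketch below solves the inner minimization for one signal with a fixed dictionary, using the iterative shrinkage-thresholding algorithm (ISTA). This is only an illustration of the sparse coding step: it is not the online dictionary learning algorithm of [4], and the function names, iteration count, and toy data are assumptions.

```python
import numpy as np

def soft_threshold(v, thresh):
    """Proximal operator of thresh * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)

def sparse_code_ista(x, D, lam, n_iter=500):
    """Approximately minimize 0.5*||x - D a||_2^2 + lam*||a||_1 over a."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)           # gradient of the least-squares term
        a = soft_threshold(a - grad / L, lam / L)
    return a

def lasso_cost(x, D, a, lam):
    """Value of the objective inside the min of Eq. (2)."""
    r = x - D @ a
    return 0.5 * (r @ r) + lam * np.abs(a).sum()

# Toy problem (assumption): t = 30 time points, m = 60 atoms (over-complete).
rng = np.random.default_rng(0)
t, m = 30, 60
D = rng.normal(size=(t, m))
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms
a_true = np.zeros(m)
a_true[[3, 17, 42]] = [1.5, -2.0, 1.0]
x = D @ a_true
a_hat = sparse_code_ista(x, D, lam=0.05)
```

In a full dictionary learning loop this coding step would alternate with an update of D; here D is held fixed so that only the role of the $\ell_1$ penalty is shown.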

Q. Zhao et al.

2.4 Grouping fMRI Signals with Anatomical AAL Template

Using the AAL template [7], 116 brain regions are employed in our method, as shown in Fig. 1. Specifically, the whole-brain voxels are separated into 116 groups based on the AAL template. Before signal extraction, each subject is registered into the standard space (MNI) and aligned with the AAL template, so that each voxel in the brain mask is associated with a template label. Voxels with the same AAL label are grouped together. Thus, in each brain, the voxels and their fMRI signals are categorized and labeled as 116 AAL groups. This anatomical group information is used to guide the coefficient matrix learning in the next section.
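The grouping step amounts to collecting, for each atlas label, the indices of the voxels that carry it. A minimal sketch follows; the function name is hypothetical, and treating label 0 as "unlabeled" is an assumption that the text does not state.

```python
import numpy as np

def group_voxels_by_label(labels):
    """Map each nonzero atlas label to the indices of the voxels carrying it."""
    labels = np.asarray(labels)
    return {int(lab): np.flatnonzero(labels == lab)
            for lab in np.unique(labels) if lab != 0}

# Toy label vector for 8 in-mask voxels (assumption: 0 means "no atlas label").
labels = np.array([1, 1, 2, 0, 3, 2, 1, 3])
groups = group_voxels_by_label(labels)
```

With the real atlas, `labels` would hold one AAL label per in-mask voxel and `groups` would contain up to 116 index arrays, one per anatomical region.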

2.5 Anatomical Guided Structured Multi-task Regression (AGSMR)

In the conventional approach, once the dictionary D is defined, learning the coefficient matrix reduces to the typical LASSO [5] problem in Eq. (3):

$$\hat{\alpha} = \arg\min_{\alpha} \ell(\alpha) + \lambda \phi(\alpha) \qquad (3)$$

where $\ell(\alpha)$ is the loss function, $\phi(\alpha)$ is the regularization term, which performs feature selection by inducing sparsity, and $\lambda > 0$ is the regularization parameter. Given the learned dictionary $D = [d_1, d_2, \ldots, d_m] \in \mathbb{R}^{t \times m}$ (Sect. 2.3), the conventional LASSO regresses the brain fMRI signals $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{t \times n}$ on D to obtain a sparse coefficient matrix $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_n] \in \mathbb{R}^{m \times n}$:

$$\hat{\alpha} = \arg\min_{\alpha} \sum_{i=1}^{n} \|x_i - D\alpha_i\|_2^2 + \lambda \sum_{i=1}^{n} \sum_{j=1}^{m} |\alpha_{ij}| \qquad (4)$$

where $\ell(\alpha)$ is defined as the least-squares loss, $\phi(\alpha)$ is the $\ell_1$-norm regularization term that induces sparsity, $\alpha_{ij}$ is the coefficient element in the i-th column and j-th row, and m is the dictionary size. Equation (4) is the LASSO-penalized least-squares problem, and the conventional LASSO is a purely data-driven approach. However, previous studies [2, 3, 6] have shown that prior structure information such as disjoint or overlapping groups, trees, and graphs may significantly improve classification and regression performance and help identify the important features [3]. In this paper, we introduce a novel structured sparse representation approach (group-guided structured multi-task regression) into the regression of fMRI signals. Specifically, the group information of the fMRI signals is defined by the anatomical structure in Sect. 2.4, i.e., the whole-brain fMRI signals are separated into V groups $\{G_1, G_2, \ldots, G_V\}$ based on the AAL template. Whereas the conventional LASSO uses only the $\ell_1$-norm regularization term to induce sparsity (Eq. (4)), here an $\ell_2$-norm penalty is added to the penalty term, as shown in Eq. (5), to improve intra-group homogeneity. Combining the $\ell_1$ norm with the $\ell_2$ norm in Eq. (5) induces both intra-group sparsity and inter-group sparsity.

$$\hat{\alpha} = \arg\min_{\alpha} \sum_{i=1}^{n} \|x_i - D\alpha_i\|_2^2 + \lambda \sum_{i=1}^{n} \sum_{j=1}^{m} |\alpha_{ij}| + (1 - \lambda) \sum_{j=1}^{m} \sum_{s=1}^{V} \omega_s \|\alpha_j^{G_s}\|_2 \qquad (5)$$

Thus, Eq. (5) can also be viewed as a structured sparse penalized multi-task least-squares problem, and it summarizes our final learning problem. The detailed solution of this structured LASSO-penalized multi-task least-squares problem with combined $\ell_1$ and $\ell_2$ norms is given in [6, 8]. The SLEP package (http://yelab.net/software/SLEP/) is employed to solve the problem and learn the coefficient matrix $\alpha$. From a brain science perspective, the learned coefficient matrix $\alpha$ contains the spatial features of functional networks: each row of $\alpha$ is mapped back to the brain volume to identify and quantitatively characterize meaningful functional networks, similar to the methods in [1].
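As a rough illustration of how an objective with the structure of Eq. (5) can be evaluated, the sketch below adds a group-wise $\ell_2$ term to the LASSO objective: for every dictionary atom (row of the coefficient matrix) and every anatomical group, it takes the $\ell_2$ norm of the coefficients of the voxels in that group. The function name, the separate weights `lam_l1` and `lam_group`, and the default unit group weights are assumptions made for illustration; the sketch only evaluates the objective and does not reproduce the SLEP solver.

```python
import numpy as np

def agsmr_objective(X, D, A, groups, lam_l1, lam_group, weights=None):
    """Least-squares loss + l1 penalty + group-wise l2 penalty.

    X: (t, n) signals, D: (t, m) dictionary, A: (m, n) coefficients.
    groups: list of index arrays partitioning the n voxels (columns of A).
    """
    if weights is None:
        weights = np.ones(len(groups))     # assumption: equal group weights
    loss = np.sum((X - D @ A) ** 2)
    l1 = np.abs(A).sum()
    # For each group, the l2 norm of every row of A restricted to that group.
    group_l2 = sum(w * np.linalg.norm(A[:, idx], axis=1).sum()
                   for w, idx in zip(weights, groups))
    return loss + lam_l1 * l1 + lam_group * group_l2

# Toy example (assumption): 2 atoms, 3 voxels, two groups {0, 1} and {2}.
D = np.eye(2)
A = np.array([[3.0, 4.0, 0.0],
              [0.0, 0.0, 2.0]])
X = D @ A                                  # perfect fit, so the loss term is 0
groups = [np.array([0, 1]), np.array([2])]
value = agsmr_objective(X, D, A, groups, lam_l1=0.5, lam_group=0.5)
```

In the toy example the $\ell_1$ term equals 9 and the group term equals 5 + 2 = 7, so `value` is 0.5 · 9 + 0.5 · 7 = 8. The group term penalizes an atom once per group it is active in, which is what favors coefficients that switch on or off together within an anatomical region.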

3 Results

3.1 Identifying Resting State Networks on Seven Task Datasets

To evaluate the identified networks, we defined a spatial similarity coefficient to measure the spatial similarity between the identified networks and the resting state network (RSN) templates [11]:

$$S = \frac{|A \cap B|}{|B|} \qquad (6)$$

where A is the spatial map of our identified network component, B is that of the RSN template network, and $|\cdot|$ denotes the number of voxels in a map. We performed quantitative measurements on the Working Memory task dataset to demonstrate the performance of our method, selecting 10 well-known resting state networks for the spatial similarity comparison. The identified networks are visualized in Fig. 2, where RSNs#1–RSNs#10 are the 10 RSN templates and #1–#10 are our identified networks. The networks identified by our method are consistent with the templates. Networks #1, #2, and #3 are visual networks, corresponding to the medial, occipital pole, and lateral visual areas; #4 is the default mode network (DMN); #5 to #8 are the cerebellum, sensorimotor, auditory, and executive control networks, respectively; and #9 and #10 are frontoparietal networks. The activated areas of all identified networks are consistent with the template networks; detailed comparison results are given in Table 1. To validate the effectiveness and robustness of our method, we tested it on seven different task datasets. Figure 3 shows the results, and Table 1 lists the similarity to the templates on the 7 datasets.
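The coefficient in Eq. (6) is straightforward to compute once both maps are available as binary masks on the same voxel grid. The sketch below assumes such masks as input; how the continuous coefficient maps are thresholded into masks is not specified in the text, so that step is left out, and the function name is hypothetical.

```python
import numpy as np

def spatial_similarity(A, B):
    """Overlap rate |A ∩ B| / |B| between two binary spatial maps."""
    A = np.asarray(A, dtype=bool)
    B = np.asarray(B, dtype=bool)
    n_template = B.sum()
    if n_template == 0:
        raise ValueError("template map B is empty")
    return np.logical_and(A, B).sum() / n_template

# Toy masks over 6 voxels: 2 of the 4 template voxels are covered.
A = np.array([1, 1, 1, 0, 0, 0])
B = np.array([0, 1, 1, 1, 1, 0])
S = spatial_similarity(A, B)
```

Because the denominator is |B| only, the coefficient measures how much of the template is covered and does not penalize voxels of A that fall outside the template.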

Fig. 2. Comparison of the 10 resting state networks (RSNs) with the networks identified by our method on the Working Memory task dataset. RSNs#1–RSNs#10 show the 10 resting state template networks [11], and #1–#10 show the networks identified by our method.

Table 1. Similarity coefficients between our results and the templates. The first column lists the 7 tasks, and the first row (#1–#10) indexes the 10 networks. The average similarity across the 7 tasks is 0.623.

Task        #1    #2    #3    #4    #5    #6    #7    #8    #9    #10
WM          0.84  0.66  0.72  0.61  0.67  0.74  0.68  0.45  0.63  0.69
Emotion     0.82  0.65  0.61  0.46  0.54  0.84  0.62  0.42  0.65  0.70
Gambling    0.86  0.65  0.61  0.53  0.54  0.57  0.55  0.43  0.66  0.73
Language    0.86  0.66  0.62  0.57  0.74  0.56  0.62  0.45  0.67  0.72
Motor       0.83  0.66  0.62  0.47  0.47  0.53  0.51  0.41  0.60  0.79
Relational  0.81  0.68  0.62  0.47  0.47  0.53  0.51  0.41  0.60  0.79
Social      0.82  0.66  0.67  0.48  0.54  0.56  0.63  0.42  0.71  0.71

3.2 Comparison Between Our Method and Traditional Method

In this section, we compare our method with the LASSO method on the Working Memory and Gambling datasets. Figure 4 shows the visual network, executive control network, and auditory network identified by our method and by LASSO. Table 2 compares the similarities of the two methods with the templates on the two task datasets. These comparisons show that our method achieves higher similarity with the templates for most networks, and in this sense it reconstructs functional networks better than the traditional LASSO method, which does not use anatomical structure.

Fig. 3. (a) and (b) show the 10 resting state networks of one randomly selected subject from the HCP Q1 dataset. The first row labels the 7 tasks; the first seven columns correspond to the 7 tasks, and the last column shows the corresponding resting state network templates.

Fig. 4. Template, LASSO, and our method's identified visual network (a), and executive control network and auditory network (b), on the Working Memory dataset.

Table 2. Comparison of the two methods by their similarities with the templates. The first row indexes the 10 resting state networks (#1–#10). The first column gives the method and the second column the dataset, working memory (WM) or gambling (GB). In general, our method has higher similarity than the LASSO method.

Method  Dataset  #1    #2    #3    #4    #5    #6    #7    #8    #9    #10
LASSO   WM       0.79  0.64  0.71  0.62  0.50  0.48  0.53  0.44  0.58  0.68
LASSO   GB       0.83  0.64  0.55  0.49  0.52  0.56  0.47  0.43  0.52  0.62
AGSMR   WM       0.84  0.66  0.73  0.61  0.71  0.74  0.68  0.45  0.63  0.69
AGSMR   GB       0.86  0.65  0.61  0.53  0.54  0.57  0.55  0.43  0.66  0.73

4 Conclusion

In this paper, we propose a novel anatomy-guided structured multi-task regression method for brain network identification. Experiments on 7 different task datasets have demonstrated the effectiveness of our AGSMR method in identifying consistent brain networks. Comparisons have shown that our method yields networks with higher similarity to the templates than the traditional LASSO method. In general, our approach provides the anatomical substrates for the reconstructed functional networks. In the future, we plan to apply and test the AGSMR method on larger fMRI datasets and compare it with other brain network construction methods. In addition, it will be applied to clinical fMRI datasets to potentially reveal abnormalities of brain networks in diseased brains.

Acknowledgements. This research was supported in part by the Jiangsu Natural Science Foundation (Project No. BK20131351) and by the China Scholarship Council (CSC).

References

1. Lv, J., Jiang, X., Li, X., Zhu, D., Chen, H., Zhang, T., Hu, X., Han, J., Huang, H., Zhang, J.: Sparse representation of whole-brain fMRI signals for identification of functional networks. Med. Image Anal. 20, 112–134 (2015)
2. Kim, S., Xing, E.P.: Tree-guided group lasso for multi-task regression with structured sparsity. In: ICML, pp. 543–550 (2010)
3. Ye, J., Liu, J.: Sparse methods for biomedical data. ACM SIGKDD Explor. Newslett. 14, 4–15 (2012)
4. Mairal, J., Bach, F., Ponce, J., Sapiro, G.: Online learning for matrix factorization and sparse coding. J. Mach. Learn. Res. 11, 19–60 (2010)
5. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. Ser. B (Methodological) 58, 267–288 (1996)
6. Liu, J., Ji, S., Ye, J.: Multi-task feature learning via efficient l2,1-norm minimization. In: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, pp. 339–348. AUAI Press (2009)
7. Tzourio-Mazoyer, N., Landeau, B., Papathanassiou, D., Crivello, F., Etard, O., Delcroix, N., Mazoyer, B., Joliot, M.: Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage 15, 273–289 (2002)
8. Liu, J., Ji, S., Ye, J.: SLEP: Sparse learning with efficient projections. Arizona State University (2009)
9. Van Essen, D.C., Smith, S.M., Barch, D.M., Behrens, T.E., Yacoub, E., Ugurbil, K., WU-Minn HCP Consortium: The WU-Minn human connectome project: an overview. Neuroimage 80, 62–79 (2013)
10. Lv, J., Jiang, X., Li, X., Zhu, D., Zhang, S., Zhao, S., Chen, H., Zhang, T., Hu, X., Han, J., Ye, J.: Holistic atlases of functional networks and interactions reveal reciprocal organizational architecture of cortical function. IEEE Trans. Biomed. Eng. 62, 1120–1131 (2015)
11. Smith, S., Fox, P., Miller, K., Glahn, D., Fox, P., Mackay, C., Filippini, N., Watkins, K., Toro, R., Laird, A., Beckmann, C.: Correspondence of the brain's functional architecture during activation and rest. Proc. Natl. Acad. Sci. U.S.A. 106, 13040–13045 (2009)

Discover Mouse Gene Coexpression Landscape Using Dictionary Learning and Sparse Coding

Yujie Li1(✉), Hanbo Chen1, Xi Jiang1, Xiang Li1, Jinglei Lv1,2, Hanchuan Peng3(✉), Joe Z. Tsien4(✉), and Tianming Liu1(✉)

1 Cortical Architecture Imaging and Discovery Lab, Department of Computer Science and Bioimaging Research Center, The University of Georgia, Athens, GA, USA
[email protected]
2 School of Automation, Northwestern Polytechnical University, Xi'an, China
3 Allen Institute for Brain Science, Seattle, WA, USA
4 Brain and Behavior Discovery Institute, Medical College of Georgia at Augusta University, Augusta, GA, USA

Abstract. Gene coexpression patterns carry rich information about complex brain structures and functions. Characterizing these patterns in an unbiased and integrated manner will illuminate the higher-order transcriptome organization and offer molecular foundations of functional circuitry. Here we demonstrate a data-driven method that can effectively extract coexpression networks from transcriptome profiles, using the Allen Mouse Brain Atlas dataset. For each of the obtained networks, both the genetic composition and the spatial distribution in the brain volume are learned. Simultaneous knowledge of the precise spatial distribution of a specific gene, the networks the gene participates in, and the weights it carries can bring insights into the molecular mechanisms of brain formation and function. Gene ontologies and comparisons with published data reveal interesting functions of the identified coexpression networks, including major cell types, biological functions, brain regions, and/or brain diseases.

Keywords: Gene coexpression network · Sparse coding · Transcriptome

1 Introduction

Gene coexpression patterns carry a rich amount of valuable information regarding enormously complex cellular processes. Previous studies have shown that genes displaying similar expression profiles are very likely to participate in the same biological processes [1]. The gene coexpression network (GCN), offering an integrated and effective representation of gene interactions, has shown advantages in deciphering the biological and genetic mechanisms across species and during evolution [2]. In addition to revealing the intrinsic transcriptome organization, GCNs have also demonstrated superior performance when used to generate novel hypotheses for molecular mechanisms of diseases, because many disease phenotypes result from the dysfunction of complex networks of molecular interactions [3].

Various proposals have been made to identify GCNs, including classical clustering methods [4, 5] and methods that apply network concepts and models to describe gene-gene interactions [6]. Given the high dimensionality of genetic data and the urgent need to make comparisons that unveil the changes or the consensus between subjects, one common theme of these methods is dimension reduction. Instead of analyzing the interactions between tens of thousands of genes, grouping the data by coexpression patterns can considerably reduce the complexity of comparisons, from tens of thousands of genes to dozens of networks or clusters, while preserving the original interactions.

Along the line of data reduction, we propose a dictionary learning and sparse coding (DLSC) algorithm for GCN construction. DLSC is an unbiased data-driven method that learns a set of new dictionaries from the signal matrix so that the original signals can be represented in a sparse and linear manner. Because of the sparsity constraint, the dimensionality of the genetic data can be significantly reduced. The grouping by coexpression patterns is encoded in the sparse coefficient matrix, under the assumption that if two genes use the same dictionary to represent their original signals, their expression must share similar patterns, and they are thereby considered 'coexpressed'. The proposed method overcomes a potential issue seen in many clustering methods [3], namely overlooking the multiple roles of regulatory domains in different networks, because DLSC does not require the bases to be orthogonal, so one gene can be claimed by multiple networks. More importantly, for each of the obtained GCNs, both the genetic composition and the spatial distribution are learned. Simultaneous knowledge of the precise distribution of a specific gene, the networks the gene participates in, and the weights it carries can bring insights into the molecular mechanisms of brain formation and function.

In this study, the proposed framework was applied to the Allen Mouse Brain Atlas (AMBA) [7], which surveyed the expression patterns of over 20,000 genes in the C57BL/6J mouse brain using in situ hybridization (ISH). One major advantage of ISH is its ability to preserve the precise spatial distribution of genes. This powerful dataset, featuring whole-genome scale, cellular resolution, and anatomical coverage, has made possible a holistic understanding of the molecular underpinnings and related functional circuitry. Using AMBA, the GCNs identified by DLSC showed significant enrichment for major cell types, biological functions, anatomical regions, and/or brain disorders, which holds promise as a foundation for exploring different cell types and functional processes in diseased and healthy brains.

Y. Li and H. Chen are co-first authors.
© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 63–71, 2016. DOI: 10.1007/978-3-319-46720-7_8

2 Methods

2.1 Experimental Setup

We downloaded the 4,345 3D volumes of expression energy of coronal sections and the Allen Reference Atlas (ARA) from the AMBA website (http://mouse.brain-map.org/). The ISH data were collected in tissue sections, then digitally processed, stacked, registered, gridded, and quantified, creating 3D maps of gene "expression energy" at 200 micron resolution. Coronal sections were chosen because they register more accurately to the reference model than their sagittal counterparts. Each 3D volume is composed of 67 slices with a dimension of 41 × 58.

Fig. 1. Computational pipeline for constructing GCNs. (a) Input is one slice of the 3D expression grids of all genes. (b) Raw ISH data preprocessing step that removes unreliable genes and voxels and estimates the remaining missing data. (c) Dictionary learning and sparse coding of the ISH matrix with sparse and non-negative constraints on the coefficient matrix α. (d) Visualization of spatial distributions of GCNs. (e) Enrichment analysis of GCNs.

As the ISH data were acquired by coronal slice before being stitched and aligned into a complete 3D volume, quite significant changes in the average expression level of the same gene are observed between adjacent slices, in spite of extensive preprocessing steps. To alleviate the artifacts due to slice handling and preprocessing, we decided to study the coexpression networks slice by slice. The input of the pipeline is the expression grids of one of the 67 coronal slices (Fig. 1a). A preprocessing module (Fig. 1b) is first applied to handle the foreground voxels with missing data (−1 in expression energy). Specifically, this module includes extraction, filtering, and estimation steps. First, the foreground voxels of the slice were extracted based on the ARA. Then the genes with low variance or with missing values in over 20% of foreground voxels were excluded. A similar filtering step was applied to remove voxels in which over 20% of genes have no data. Most missing values were resolved in the filtering steps. The remaining missing values were recursively estimated as the mean of the foreground voxels in their 8-neighborhood until all missing values were filled. After preprocessing, the cleaned expression energies were organized into a matrix and sent to DLSC (Fig. 1c). In DLSC, the gene expression matrix is factorized into a dictionary matrix D and a coefficient matrix α. These two matrices encode the distribution and composition of the GCNs (Fig. 1d–e) and are further analyzed.
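The filtering and estimation steps of the preprocessing module can be sketched as follows. This is an illustrative reading of the description, not the authors' code: the function names, the (voxels × genes) matrix layout, and the toy data are assumptions, and the low-variance gene filter is omitted because its threshold is not given in the text.

```python
import numpy as np

MISSING = -1  # value marking missing expression energy, as stated in the text

def filter_missing(E, max_missing=0.2):
    """Drop genes (columns), then voxels (rows), missing in more than 20% of entries.

    E is a (voxels x genes) matrix for the foreground of one slice.
    """
    miss = (E == MISSING)
    keep_genes = miss.mean(axis=0) <= max_missing
    E, miss = E[:, keep_genes], miss[:, keep_genes]
    keep_voxels = miss.mean(axis=1) <= max_missing
    return E[keep_voxels], keep_genes, keep_voxels

def fill_missing_2d(grid, foreground, max_passes=100):
    """Fill missing foreground voxels with the mean of their known 8-neighbors.

    Repeats until nothing is left to fill, so voxels without a known neighbor
    are filled in a later pass from values estimated earlier.
    """
    grid = grid.astype(float)              # astype returns a copy
    known = foreground & (grid != MISSING)
    todo = foreground & ~known
    H, W = grid.shape
    for _ in range(max_passes):
        if not todo.any():
            break
        vals = np.pad(np.where(known, grid, 0.0), 1)   # zero-padded border
        cnt = np.pad(known.astype(float), 1)
        total = np.zeros((H, W))
        n_known = np.zeros((H, W))
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if dy or dx:               # skip the voxel itself
                    total += vals[1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
                    n_known += cnt[1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
        fillable = todo & (n_known > 0)
        if not fillable.any():
            break                          # isolated voxels with no known neighbor
        grid[fillable] = total[fillable] / n_known[fillable]
        known |= fillable
        todo &= ~fillable
    return grid

# Toy data (assumption): 6 voxels x 4 genes, and one 3 x 3 grid for a single gene.
E = np.array([[1, -1, 3, 4],
              [2, -1, 4, 5],
              [-1, -1, -1, 6],
              [3, 7, 5, 7],
              [4, -1, 6, 8],
              [5, -1, 7, 9]])
E_clean, keep_genes, keep_voxels = filter_missing(E)

grid = np.array([[1, 2, 3], [4, -1, 6], [7, 8, 9]])
filled = fill_missing_2d(grid, np.ones((3, 3), dtype=bool))
```

In the toy matrix the second gene and the third voxel are removed, which also eliminates every missing entry; in the toy grid the central voxel is replaced by the mean of its eight neighbors.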

2.2 Dictionary Learning and Sparse Coding

DLSC is an effective method to achieve a compressed and succinct representation for ideally all signal vectors. Given a set of M-dimensional input signals $X = [x_1, \ldots, x_N] \in \mathbb{R}^{M \times N}$, learning a fixed number of dictionaries for sparse representation of X can be accomplished by solving the optimization problem in Eq. (1). Because each entry of $\alpha$ indicates the degree of conformity of a gene to a coexpression pattern, as discussed later, a non-negative constraint was added to the $\ell_1$ regularization:

$$\langle D, \alpha \rangle = \arg\min \frac{1}{2} \|X - D \times \alpha\|_2^2 \quad \text{s.t.} \quad \|\alpha\|_1 \le \lambda, \; \forall i, \; \alpha_i \ge 0 \qquad (1)$$

where $D \in \mathbb{R}^{M \times K}$ is the dictionary matrix, $\alpha \in \mathbb{R}^{K \times N}$ is the corresponding coefficient matrix, $\lambda$ is a sparsity constraint factor that indicates each signal has fewer than $\lambda$ items in its decomposition, $\|\cdot\|_1$ and $\|\cdot\|_2$ denote the sums of the $\ell_1$ and $\ell_2$ norms of each column, and $\|X - D \times \alpha\|_2^2$ is the reconstruction error.

In practice, the gene expression grids are arranged into a matrix $X \in \mathbb{R}^{M \times N}$ such that rows correspond to foreground voxels and columns correspond to genes (Fig. 1c). After normalizing each column by its Frobenius norm, the publicly available online DLSC package [8] was applied to solve the matrix factorization problem in Eq. (1). Eventually, X is represented as sparse combinations of the learned dictionary atoms D. Each column in D is one dictionary consisting of a set of voxels. Each row in $\alpha$ details the coefficient of each gene in a particular dictionary. The key assumption behind enforcing sparseness is that each gene is involved in a very limited number of gene networks. The non-negativity constraint on $\alpha$ ensures that genes with opposite expression patterns are not placed in the same network.

In the context of GCN construction, we consider that if two genes use the same dictionary to represent their original signals, then the two genes are 'coexpressed' in this dictionary. This set-up has several benefits. First, both the dictionaries and the coefficients are learned from the data and are therefore intrinsic to the data. Second, the levels of coexpression are quantifiable and comparable not only within one dictionary but across the entire $\alpha$ matrix. Further, if we consider each dictionary as one network, the corresponding row of $\alpha$ contains all genes that use this dictionary for sparse representation, i.e., the genes that are 'coexpressed'. Each entry of $\alpha$ measures the extent to which a gene conforms to the coexpression pattern described by the dictionary atom. This is how a network, denoted the coexpression network, is formed. Since a dictionary atom is composed of multiple voxels, mapping each atom in D back to the ARA space lets us visualize the spatial patterns of the coexpression networks. Combining information from both D and $\alpha$, we obtain a set of intrinsically learned GCNs with knowledge of both their anatomical patterns and their gene compositions. As a dictionary is the equivalent of a network, these two terms will be used interchangeably.

The choice of the number of dictionaries and of the regularization parameter $\lambda$ is crucial to an effective sparse representation. The final goal is a set of parameters that results in a sparse and accurate representation of the original signal while achieving the highest overlap with the ground truth, i.e., the known anatomy. A grid search over the parameters is performed using three criteria: (1) the reconstruction error; (2) the mutual information between the dictionaries and the ARA; (3) the density of $\alpha$, measured by the percentage of non-zero-valued elements in $\alpha$. As different numbers of genes are expressed in different slices, a gene-dictionary ratio, defined as the ratio between the number of genes expressed and the number of dictionaries, is used instead of a fixed number of dictionaries. Guided by these criteria, $\lambda = 0.5$ and a gene-dictionary ratio of 100 are selected as the optimal parameters.
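The following sketch illustrates, on a toy example, how non-negative sparse coefficients can be computed for a fixed dictionary, how the density criterion is measured, and how the genes of one network are read off a row of the coefficient matrix. It uses a penalized form (an $\ell_1$ weight `lam` with a non-negativity projection) rather than the constrained form of Eq. (1), and a plain projected-gradient loop rather than the package cited as [8]; all function names and data are assumptions for illustration.

```python
import numpy as np

def nonneg_sparse_code(X, D, lam, n_iter=300):
    """Approximately minimize 0.5*||X - D A||_F^2 + lam*sum(A) subject to A >= 0."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ A - X)
        # Gradient step, l1 shrinkage, and projection onto the non-negative orthant.
        A = np.maximum(A - (grad + lam) / L, 0.0)
    return A

def density(A):
    """Fraction of non-zero entries in the coefficient matrix."""
    return np.count_nonzero(A) / A.size

def network_members(A, k, gene_names):
    """Genes with a positive weight in network k, sorted by decreasing weight."""
    order = np.argsort(-A[k])
    return [(gene_names[g], float(A[k, g])) for g in order if A[k, g] > 0]

# Toy example (assumption): M = 3 voxels, K = 2 dictionaries, N = 3 genes.
genes = ["g0", "g1", "g2"]
D = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
X = np.array([[2.0, 1.0, 0.0],
              [0.0, 0.0, 3.0],
              [0.0, 0.0, 0.0]])
A = nonneg_sparse_code(X, D, lam=0.1)
members = network_members(A, 0, genes)
```

Here genes `g0` and `g1` both load on the first dictionary and are therefore 'coexpressed' in that network, with `g0` carrying the larger weight, while `g2` belongs only to the second network.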

2.3 Enrichment Analysis of GCNs

GCNs were characterized based on common gene ontology (GO) categories (molecular function, biological process, cellular component), using the Database for Annotation, Visualization and Integrated Discovery (DAVID) [9]. Enrichment analysis was performed by cross-referencing with published lists of genes related to cell type markers, known and predicted disease genes, specific biological functions, etc. These lists come from 32 publications and were downloaded from [10]. Significance was assessed using a one-sided Fisher's exact test with a threshold of p < 0.01.
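A one-sided Fisher's exact test for enrichment reduces to a hypergeometric tail probability: the chance of seeing at least the observed overlap between a network and a published gene list if the network's genes were drawn at random from the background. The sketch below uses only the Python standard library; the function name, the choice of background set, and the way a network is turned into a gene set are assumptions, since the text does not specify them.

```python
from math import comb

def fisher_exact_greater(k, n_network, n_list, n_total):
    """One-sided p-value P(overlap >= k) under the hypergeometric null.

    k:         observed number of genes shared by the network and the list
    n_network: number of genes in the network
    n_list:    number of genes in the published list
    n_total:   number of genes in the background set
    """
    upper = min(n_network, n_list)
    denom = comb(n_total, n_network)
    tail = sum(comb(n_list, i) * comb(n_total - n_list, n_network - i)
               for i in range(k, upper + 1))
    return tail / denom

# Toy example: 10 background genes, a list of 4, a network of 3, overlap of 2.
p = fisher_exact_greater(k=2, n_network=3, n_list=4, n_total=10)
significant = p < 0.01   # threshold used in the text
```

For the toy numbers the tail probability is (36 + 4) / 120 = 1/3, which is far from significant; with thousands of background genes, even modest overlaps can fall below the 0.01 threshold.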

3 Results

DLSC yields readily interpretable results when the spatial distributions of the GCNs are plotted. A visual inspection showed a set of spatially contiguous clusters partitioning the slice (Fig. 2a, e). Many of the clusters correspond to one or more canonical anatomical regions, providing an intuitive validation of the approach. We demonstrate the effectiveness of DLSC by showing that the GCNs are mathematically valid and biologically meaningful. Since the grouping of genes is based purely on their expression patterns, a mathematically sound method will partition the genes so that expression patterns are similar within a group and dissimilar between groups. One caveat is that a gene may be expressed in multiple cell types or participate in multiple functional pathways, so maintaining the dissimilarity between groups may not be necessary. At the same time, the method should also be able to find functionally enriched networks. As examples, slices 27 and 38 are analyzed and discussed in depth because of their good anatomical coverage of various brain regions. Using a fixed gene-dictionary ratio of 100, 29 GCNs were identified for slice 27 and 31 GCNs for slice 38.

Fig. 2. Visualization of the spatial distribution of GCNs and the corresponding raw ISH data. The slice ID and GCN ID are given on the left. The second column shows the spatial maps of two GCNs, one for each slice, followed by the raw ISH data of 3 representative genes. Gene acronyms and their weights in the GCN are listed at the bottom. The weights indicate the extent to which a gene conforms to the GCN.

3.1 Validation Against Raw ISH Data

One reliable way to examine whether the expression patterns are consistent within a GCN is to visually inspect the raw ISH data from which the GCNs are derived. As seen in Fig. 2, the raw ISH data in general match well with the respective spatial maps. In GCN 22 of slice 27, the expression peaks at the hypothalamus and extends to the substantia innominata and the bed nuclei of the stria terminalis (Fig. 2a). All three genes showed strong signals in these areas in the raw data (Fig. 2b–d). Similarly, the expression pattern of GCN 6 in slice 38 is centered at field CA1 (Fig. 2e), and all three genes showed significantly enhanced signals in the CA1 region compared with other regions (Fig. 2f–h).

The weight in parentheses measures the degree to which a gene conforms to the coexpression pattern. As the weight decreases, the resemblance of the raw data to the spatial map becomes weaker. One example is the comparison between Zkscanl6 and the other two genes: for Zkscanl6, weaker signals were found in the lateral septal nucleus (Fig. 2b–d, red arrows), the preoptic area (Fig. 2b–d, blue arrows), and the lateral olfactory tract (Fig. 2b–d, green arrows). On the other hand, the spatial map of GCN 6 of slice 38 features an abrupt change of expression level at the boundary between field CA1 and field CA2. This feature is seen clearly in Fibcd (Fig. 2f, red arrows) and Arhgap12 (Fig. 2g, red arrows), but not in Osbp16 (Fig. 2h, red arrows). Also, the spatial map shows an absence of expression in the dentate gyrus (DG) and field CA3. However, Arhgap12 displays strong signals in the DG (Fig. 2g, blue arrows), and Osbp16 shows high expression in both the DG and CA3 (Fig. 2h, green arrows). The decreased similarity agrees well with the declining weights.

Overall, we have demonstrated good agreement between the raw ISH data and the corresponding spatial maps, and the level of agreement is reflected by the weight. These results visually validate the mathematical ability of DLSC to group genes with similar expression patterns.

3.2 Enrichment Analysis of GCNs

Enrichment analysis using GO terms and existing published gene lists [10] provided biological insights into the constructed GCNs. For convenience of presentation, we roughly categorize the networks into four types. In fact, one GCN often falls into multiple categories, as these categories characterize GCNs from different perspectives. A comparison with the gene lists generated using purified cellular populations [11, 12] indicates that GCN1 (Fig. 3a), GCN5 (Fig. 3b), GCN28 (Fig. 3c), and GCN25 (Fig. 3d) of slice 27 are significantly enriched with markers of oligodendrocytes, astrocytes, neurons, and interneurons, with p-values of $1.1 \times 10^{-7}$, $1.7 \times 10^{-8}$, $2.5 \times 10^{-3}$, and $15 \times 10^{-10}$, respectively. The findings have not only been confirmed by several other studies using microarray and ISH data, but are also corroborated by the GO terms. For example, two significant GO terms in GCN1 are myelination ($p = 5.7 \times 10^{-4}$) and axon ensheathment ($p = 2.5 \times 10^{-5}$), which are characteristic functions of oligodendrocytes, with established markers such as Mbp, Serinc5, and Ugt8a. Visualization of the spatial map also offers a useful complementary source of evidence. For example, the fact that GCN5 (Fig. 3b) is located at the lateral ventricle, where the subventricular zone is rich in astrocytes, confirms its enrichment in astrocytes.

Fig. 3. Visualization of spatial distribution of GCNs enriched for major cell types, particular brain regions and function/disease related genes. In each panel, top row: Slice ID and GCN ID; second row: spatial map; third row: sub-category; fourth row: highly weighted genes in the sub-category.

In addition to cell-type-specific GCNs, we also found some GCNs remarkably selective for particular brain regions, such as GCN3 (Fig. 3e) in CA1, GCN5 (Fig. 3f) in thalamus, GCN11 (Fig. 3g) in hypothalamus, and GCN16 (Fig. 3h) in caudoputamen. Other GCNs with more complex anatomical patterning revealed close associations with biological functions and brain diseases. The GCNs associated with ubiquitous functions such as ribosomal (Fig. 3j) and mitochondrial functions (Fig. 3k) have wide coverage of the brain. A functional annotation suggested that GCN12 of slice 27 is highly enriched for the ribosome pathway (p = 6.3 × 10−5). As for GCN21 on the same slice, besides mitochondrial function (p = 1.5 × 10−8), it is also enriched in categories including neuron (p = 5.4 × 10−8) and postsynaptic proteins (p = 6.3 × 10−8) when compared with the literature [10]. One significant GO term, synaptic transmission (p = 1.1 × 10−5), might help explain the strong signals in the cortical regions. GCN13 of slice 38 (Fig. 3i) showed strong associations with genes found to be downregulated in Alzheimer's disease. Comparisons with autism susceptibility genes generated from microarray and high-throughput RNA-sequencing data [13] indicate an association for GCN 24 of slice 27 (p = 1.0 × 10−3) (Fig. 3h). Despite slightly lower weights, the three most significant genes, Met, Pip5k1b, and Avpr1a, have all been reported to be altered in autism patients [13].

4 Discussion

We have presented a data-driven framework that can derive biologically meaningful GCNs from gene expression data. Using the rich and spatially resolved ISH AMBA data, we found a set of GCNs that are significantly enriched for major cell types, anatomical regions, biological pathways and/or brain diseases. The main advantage of this method is its capability of visualizing the spatial distribution of the GCNs while knowing the gene constituents and the weights they carry in the network. Although the edges in the network are not explicitly stated, this does not affect the biological interpretation of the GCNs. In future work, new strategies will be developed to integrate the gene-gene interactions on a single slice and to construct brain-scale GCNs. These GCNs can offer new insights in multiple applications. For example, GCNs can serve as a baseline network to enable comparisons across different species to understand brain evolution. Also, characterizing GCNs of brains at different stages can generate new hypotheses about the brain formation process. When the GCNs are correlated with neuroimaging measurements as brain phenotypes, we will be able to make new associations between the molecular scale and macro-scale measurements, advance the understanding of how genetic functions regulate and support brain structures and functions, and find new genetic variants that might account for the variations in brain structure and function.

References

1. Tavazoie, S., Hughes, J.D., et al.: Systematic determination of genetic network architecture. Nat. Genet. 22, 281–285 (1999)
2. Stuart, J.M.: A gene-coexpression network for global discovery of conserved genetic modules. Science 302, 249–255 (2003)
3. Gaiteri, C., Ding, Y., et al.: Beyond modules and hubs: the potential of gene coexpression networks for investigating molecular mechanisms of complex brain disorders. Genes. Brain Behav. 13, 13–24 (2014)
4. Bohland, J.W., Bokil, H., et al.: Clustering of spatial gene expression patterns in the mouse brain and comparison with classical neuroanatomy. Methods 50, 105–112 (2010)
5. Eisen, M.B., Spellman, P.T., et al.: Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. U. S. A. 95, 12930–12933 (1999)
6. Langfelder, P., Horvath, S.: WGCNA: an R package for weighted correlation network analysis. BMC Bioinform. 9, 559 (2008)
7. Lein, E.S., Hawrylycz, M.J., Ao, N., et al.: Genome-wide atlas of gene expression in the adult mouse brain. Nature 445, 168–176 (2007)
8. Mairal, J., Bach, F., et al.: Online learning for matrix factorization and sparse coding. J. Mach. Learn. Res. 11, 19–60 (2010)
9. Dennis, G., Sherman, B.T., et al.: DAVID: database for annotation, visualization, and integrated discovery. Genome Biol. 4, P3 (2003)
10. Miller, J.A., Cai, C., et al.: Strategies for aggregating gene expression data: the collapseRows R function. BMC Bioinform. 12, 322 (2011)
11. Cahoy, J., Emery, B., Kaushal, A., et al.: A transcriptome database for astrocytes, neurons, and oligodendrocytes: a new resource for understanding brain development and function. J. Neurosci. 28, 264–278 (2004)
12. Winden, K.D., Oldham, M.C., et al.: The organization of the transcriptional network in specific neuronal classes. Mol. Syst. Biol. 5, 1–18 (2009)
13. Voineagu, I., Wang, X., et al.: Transcriptomic analysis of autistic brain reveals convergent molecular pathology. Nature 474(7351), 380–384 (2011)

Integrative Analysis of Cellular Morphometric Context Reveals Clinically Relevant Signatures in Lower Grade Glioma

Ju Han1,2, Yunfu Wang1,5, Weidong Cai3, Alexander Borowsky4, Bahram Parvin1,2, and Hang Chang1,2(✉)

1 Lawrence Berkeley National Laboratory, Berkeley, CA, USA [email protected]
2 Department of Electrical and Biomedical Engineering, University of Nevada, Reno, USA
3 School of Information Technologies, University of Sydney, Sydney, NSW, Australia
4 Center for Comparative Medicine, University of California, Davis, CA, USA
5 Department of Neurology, Taihe Hospital, Hubei University of Medicine, Hubei, China

Abstract. Integrative analysis based on quantitative representation of whole slide images (WSIs) in a large histology cohort may provide predictive models of clinical outcome. On one hand, the efficiency and effectiveness of such representation are hindered by the large technical variations (e.g., fixation, staining) and biological heterogeneities (e.g., cell type, cell state) that are always present in a large cohort. On the other hand, perceptual interpretation/validation of important multi-variate phenotypic signatures is often difficult due to the loss of visual information during feature transformation in hyperspace. To address these issues, we propose a novel approach for integrative analysis based on cellular morphometric context, which is a robust representation of WSIs, with an emphasis on tumor architecture and tumor heterogeneity, built upon cellular-level morphometric features within the spatial pyramid matching (SPM) framework. The proposed approach is applied to The Cancer Genome Atlas (TCGA) lower grade glioma (LGG) cohort, where experimental results (i) reveal several clinically relevant cellular morphometric types, which enable both perceptual interpretation/validation and further investigation through gene set enrichment analysis; and (ii) indicate significantly increased survival rates in one of the cellular morphometric context subtypes derived from the cellular morphometric context.

Keywords: Lower grade glioma · Cellular morphometric context · Cellular morphometric type · Spatial pyramid matching · Consensus clustering · Survival analysis · Gene set enrichment analysis

This work was supported by NIH R01 CA184476 carried out at Lawrence Berkeley National Laboratory.
© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 72–80, 2016. DOI: 10.1007/978-3-319-46720-7_9


1 Introduction

Histology sections provide a wealth of information about the tissue architecture that contains multiple cell types at different states of the cell cycle. These sections are often stained with hematoxylin and eosin (H&E), which label DNA (e.g., nuclei) and protein contents, respectively, in various shades of color. Morphometric aberrations in tumor architecture often lead to disease progression, and it is desirable to quantify indices associated with these aberrations since they can be tested against clinical outcomes, e.g., survival and response to therapy. For the quantitative analysis of H&E stained sections, several excellent reviews can be found in [7,8]. Fundamentally, the trend has been based either on nuclear segmentation and corresponding morphometric representation, or on patch-based representation of the histology sections that aids in clinical association. The major challenge for tissue morphometric representation is the large amount of technical and biological variation in the data. To overcome this problem, recent studies have focused on either fine-tuning human-engineered features [1,4,11,12] or applying automatic feature learning [5,9,15,16,19,20] for robust representation and characterization. Even though there are inter- and intra-observer variations [6], a trained pathologist always uses rich content (e.g., various cell types, cellular organization, cell state and health), in context, to characterize tumor architecture and heterogeneity for the assessment of disease state.

Motivated by the works of [13,18], we encode cellular morphometric signatures within the spatial pyramid matching (SPM) framework for robust representation (i.e., cellular morphometric context) of WSIs in a large cohort, with an emphasis on tumor architecture and tumor heterogeneity. Based on this representation, an integrative analysis pipeline is constructed for the association of cellular morphometric context with clinical outcomes and molecular data, with potential for hypothesis generation regarding imaging biomarkers for personalized diagnosis or treatment. The proposed approach is applied to the TCGA LGG cohort, where experimental results (i) reveal several clinically relevant cellular morphometric types, which enable both perceptual interpretation/validation and further investigation through gene set enrichment analysis; and (ii) indicate significantly increased survival rates in one of the cellular morphometric context subtypes derived from the cellular morphometric context.

2 Approaches

The proposed approach starts with the construction of cellular morphometric types and cellular morphometric context, followed by integrative analysis with both clinical and molecular data. Specifically, the nuclear segmentation method in [4] was adopted given its demonstrated robustness in the presence of biological and technical variations, where the corresponding nuclear morphometric descriptors are described in [3], and the constructed cellular morphometric context representations are released on our website¹.

2.1 Construction of Cellular Morphometric Types and Cellular Morphometric Context

For a set of WSIs and corresponding nuclear segmentation results, let M be the total number of segmented nuclei; N be the number of morphometric descriptors extracted from each segmented nucleus (e.g., nuclear size and nuclear intensity); and X be the set of morphometric descriptors for all segmented nuclei, where $X = [x_1, \ldots, x_M]^\top \in \mathbb{R}^{M \times N}$. The construction of cellular morphometric types and cellular morphometric context is described as follows:

1. Construct cellular morphometric types (D), where $D = [d_1, \ldots, d_K]^\top$ are the K cellular morphometric types to be learned by the following optimization:

$$\min_{D,Z} \sum_{m=1}^{M} \| x_m - z_m D \|^2 \quad \text{subject to } \operatorname{card}(z_m) = 1,\ |z_m| = 1,\ z_m \succeq 0,\ \forall m \tag{1}$$

where $Z = [z_1, \ldots, z_M]^\top$ indicates the assignment of the cellular morphometric type, $\operatorname{card}(z_m)$ is a cardinality constraint enforcing only one nonzero element of $z_m$, $z_m \succeq 0$ is a non-negative constraint on the elements of $z_m$, and $|z_m|$ is the L1-norm of $z_m$. During training, Eq. 1 is optimized with respect to both Z and D; in the coding phase, for a new set of X, the learned D is applied, and Eq. 1 is optimized with respect to Z only.

2. Construct cellular morphometric context via SPM. This is done by repeatedly subdividing an image and computing the histograms of different cellular morphometric types over the resulting subregions. As a result, the spatial histogram, H, is formed by concatenating the appropriately weighted histograms of all cellular morphometric types at all resolutions. For more details about SPM, please refer to [13].

In our experiment, K is fixed to be 64. Meanwhile, given the fact that each patient may have multiple WSIs, SPM is applied at a single scale for the convenient construction of cellular morphometric context as well as the integrative analysis at the patient level, where both cellular morphometric types and the subtypes of cellular morphometric context are associated with clinical outcomes and molecular information.

2.2 Integrative Analysis

The construction of cellular morphometric context at the patient level in a large cohort enables integrative analysis with both clinical and molecular information, which consists of the following components:

1. Identification of cellular morphometric subtypes/clusters: consensus clustering [14] is performed for identifying subtypes/clusters across patients. The input to consensus clustering is the cellular morphometric context at the patient level. Consensus clustering aggregates consensus across multiple runs of a base clustering algorithm. Moreover, it provides a visualization tool to explore the number of clusters in the data, as well as to assess the stability of the discovered clusters.
2. Survival analysis: the Cox proportional hazards (PH) regression model is used for survival analysis.
3. Enrichment analysis: Fisher's exact test is used for the enrichment analysis between cellular morphometric context subtypes and genomic subtypes.
4. Genomic association: linear models are used for assessing differential expression of genes between subtypes of cellular morphometric context, and the correlation between genes and cellular morphometric types.

¹ http://bmihub.org/project/tcgalggcellularmorphcontext.
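As a recap of the representation built in Sect. 2.1: because each $z_m$ is constrained to have exactly one nonzero, non-negative entry with unit L1-norm, Eq. 1 takes the form of the K-means objective, and the single-scale context is a normalized histogram of type assignments. The Python sketch below illustrates the training phase, the coding phase and the histogram on toy data. The function names, the farthest-point initialization, the fixed iteration count and the toy descriptors are illustrative assumptions, not the authors' implementation.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def assign(x, D):
    """Coding phase for one nucleus: index k of the closest type d_k."""
    return min(range(len(D)), key=lambda k: sq_dist(x, D[k]))

def init_types(X, K):
    """Farthest-point initialization (an assumption; any K-means init works)."""
    D = [list(X[0])]
    while len(D) < K:
        far = max(X, key=lambda x: min(sq_dist(x, d) for d in D))
        D.append(list(far))
    return D

def learn_types(X, K, n_iter=20):
    """Training phase: alternate between updating Z and updating D."""
    D = init_types(X, K)
    for _ in range(n_iter):
        Z = [assign(x, D) for x in X]
        for k in range(K):
            members = [x for x, z in zip(X, Z) if z == k]
            if members:  # keep the previous d_k if no nucleus is assigned
                D[k] = [sum(col) / len(members) for col in zip(*members)]
    return D

def morphometric_context(nuclei, D):
    """Single-scale context: normalized frequency of each learned type."""
    counts = [0] * len(D)
    for x in nuclei:
        counts[assign(x, D)] += 1
    return [c / len(nuclei) for c in counts]

# Toy 2-D descriptors forming two well-separated groups (hypothetical values).
group_a = [[0.01 * i, 0.02 * i] for i in range(50)]
group_b = [[5 + 0.01 * i, 5 + 0.02 * i] for i in range(50)]
D = learn_types(group_a + group_b, K=2)

# A toy "patient" with 50 nuclei from the first group and 25 from the second.
H = morphometric_context(group_a + group_b[:25], D)
print(H)
```

In the paper's setting K is 64 and each patient's histogram is computed over all nuclei in that patient's WSIs; the toy example uses K = 2 only to keep the behavior easy to follow.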

3 Experiments and Discussion

The proposed approach has been applied to the TCGA LGG cohort, including 215 WSIs from 209 patients, where the clinical annotation of 203 patients is available. For quality control purposes, background and border portions of each whole slide image were detected and removed from the analysis.

3.1 Phenotypic Visualization and Integrative Analysis of Cellular Morphometric Types

The TCGA LGG cohort consists of ∼80 million segmented nuclear regions, from which 2 million were randomly selected for construction of cellular morphometric types. As described in Sect. 2, the cellular morphometric context representation for each patient is a 64-dimensional vector, where each dimension represents the normalized frequency of a specific cellular morphometric type appearing in the WSIs of the patient. Initial integrative analysis is performed by linking individual cellular morphometric types to clinical outcomes and molecular data. Each cellular morphometric type is chosen as the predictor variable in the Cox proportional hazards (PH) regression model together with the age of the patient (implemented through the R survival package). For each cellular morphometric type, the frequencies are further correlated with the gene expression values across all patients. The top-ranked genes of positive correlation and negative correlation, respectively, are imported into the MSigDB [17] for gene set enrichment analysis. Table 1 summarizes cellular morphometric types that best predict the survival distribution, and the corresponding enriched gene sets. Figure 1 shows the top-ranked examples for these cellular morphometric types.

As shown in Table 1, 8 out of 64 cellular morphometric types are significantly associated with survival (FDR adjusted p-value < 0.01). The first four cellular morphometric types in Fig. 1 all have a hazard ratio > 1, indicating that a higher frequency of these cellular morphometric types may lead to a worse prognosis. A common phenotypic property of these cellular morphometric types is the loss of chromatin content in the nuclear regions, which may be associated with poor prognosis of lower grade glioma. The last four cellular morphometric types in Fig. 1 all have a hazard ratio < 1, indicating that a higher frequency of these cellular morphometric types may lead to a better prognosis. Table 1 also indicates the enrichment of genes up-regulated in response to IFNG in cellular morphometric types #28, #29 and #52. In the glioma microenvironment, tumor cells and local T cells produce abnormally low levels of IFNG. IFNG acts on cell-surface receptors, and activates transcription of genes that offer potential in the treatment of brain tumors by increasing tumor immunogenicity, disrupting proliferative mechanisms, and inhibiting tumor angiogenesis [10]. The observation of IFNG as a positive survival factor is consistent with the prognostic effect of these cellular morphometric types: #28 – negative correlation and worse prognosis; #29 and #52 – positive correlation and better prognosis. Other interesting observations include that three cellular morphometric types of better prognosis are enriched with genes up-regulated by IL6 via STAT3, and two cellular morphometric types of better prognosis are enriched with genes regulated by NF-kB in response to TNF and genes up-regulated in response to TGFB1, respectively.

Table 1. Top cellular morphometric types for predicting the survival distribution based on the Cox proportional hazards (PH) regression model, and the corresponding enriched gene sets with respect to genes that best correlate with the frequency of the cellular morphometric type appearing in the WSIs of the patient, positively or negatively. Hazard ratio (HR) is the ratio of the hazard rates corresponding to the conditions with a unit difference of an explanatory variable, and a higher HR indicates a higher hazard of death.

Type | p-value | q-value | Hazard ratio | Enriched gene sets
Worse prognosis
#5 | 7.25e−4 | 7.73e−3 | 3.47e4 |
#28 | 2.05e−5 | 4.37e−4 | 9.32e3 | Negatively correlated with: genes up-regulated in response to IFNG; genes up-regulated in response to alpha interferon proteins
#39 | 8.57e−7 | 2.74e−5 | 5.07e3 | Positively correlated with: genes encoding proteins involved in oxidative phosphorylation; genes up-regulated during unfolded protein response, a cellular stress response related to the endoplasmic reticulum; genes involved in DNA repair. Negatively correlated with: genes involved in the G2/M checkpoint, as in progression through the cell division cycle; genes important for mitotic spindle assembly; genes defining response to androgens; genes up-regulated by activation of the PI3K/AKT/mTOR pathway
#43 | 1.57e−9 | 1.00e−7 | 9.40e3 | Negatively correlated with: genes up-regulated by activation of Notch signaling
Better prognosis
#29 | 3.01e−4 | 3.85e−3 | 1.74e−8 | Positively correlated with: genes up-regulated by IL6 via STAT3; genes defining inflammatory response; genes up-regulated in response to IFNG; genes regulated by NF-kB in response to TNF; genes up-regulated in response to TGFB1; genes up-regulated in response to alpha interferon proteins; genes involved in DNA repair; genes mediating programmed cell death (apoptosis) by activation of caspases; genes up-regulated through activation of mTORC1 complex; genes involved in p53 pathways and networks
#31 | 1.23e−4 | 1.96e−3 | 5.49e−12 | Positively correlated with: genes encoding components of the complement system, which is part of the innate immune system; genes up-regulated by KRAS activation; genes up-regulated by IL6 via STAT3
#46 | 1.17e−3 | 9.84e−3 | 1.07e−8 |
#52 | 1.23e−3 | 9.84e−3 | 6.86e−11 | Positively correlated with: genes up-regulated during transplant rejection; genes up-regulated during formation of blood vessels; genes up-regulated in response to IFNG; genes regulated by NF-kB in response to TNF; genes up-regulated in response to TGFB1; genes up-regulated by IL6 via STAT3; genes mediating programmed cell death (apoptosis) by activation of caspases
 | | | | Positively correlated with: a subgroup of genes regulated by MYC; genes defining response to androgens; genes involved in DNA repair; genes encoding cell cycle related targets of E2F transcription factors

Fig. 1. Top-ranked examples for cellular morphometric types that best predict the survival distribution, as shown in Table 1. Each example is an image patch of 101 × 101 pixels centered on the retrieved cell, which is marked with a green dot. The first four cellular morphometric types (hazard ratio > 1) indicate a worse prognosis and the last four cellular morphometric types (hazard ratio < 1) indicate a protective effect. Note that this figure is best viewed in color at 400% zoom.

3.2 Subtyping and Integrative Analysis of Cellular Morphometric Context

Hierarchical clustering was adopted as the clustering algorithm for consensus clustering, which is implemented via the R Bioconductor ConsensusClusterPlus package with χ2 distance as the distance function. The procedure was run for 500 iterations with a sampling rate of 0.8 on 203 patients, and the corresponding consensus clustering matrices with 2 to 9 clusters are shown in Fig. 2, where the matrices with 2 to 5 clusters reveal different levels of similarity among patients and matrices with 6 to 9 clusters provide little further information. Thus, we use the five-cluster result for integrative analysis with clinical outcomes and genomic signatures, where, due to insufficient patients in subtypes #1 (1 patient) and #2 (2 patients), we focus on the remaining three subtypes.

[Figure: consensus CDF plot; x-axis: consensus index value (0 to 1); y-axis: CDF (0 to 1); one curve each for 2 to 9 clusters]

Fig. 2. Consensus clustering matrices and corresponding consensus CDFs of 203 TCGA patients with LGG for cluster number of N = 2 to N = 9 based on cellular morphometric context.
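The resampling procedure behind the consensus matrices in Fig. 2 can be sketched in a few lines of Python. This is a simplified illustration and not the ConsensusClusterPlus implementation: the particular form of the χ2 distance, the use of average linkage, the naive agglomerative routine, the default of 50 runs and the toy histograms are all assumptions made for the example.

```python
import random
from itertools import combinations

def chi2_distance(p, q):
    """One common form of the chi-square distance between two histograms."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)

def average_linkage(items, dist, k):
    """Naive agglomerative clustering: repeatedly merge the two clusters
    with the smallest mean pairwise distance until k clusters remain."""
    clusters = [[i] for i in items]
    while len(clusters) > k:
        def linkage(pair):
            a, b = clusters[pair[0]], clusters[pair[1]]
            return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))
        i, j = min(combinations(range(len(clusters)), 2), key=linkage)
        merged = clusters.pop(j)  # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return clusters

def consensus_matrix(H, k, n_runs=50, rate=0.8, seed=0):
    """Fraction of resampling runs in which two items share a cluster,
    among the runs in which both items were sampled."""
    n = len(H)
    dist = [[chi2_distance(H[i], H[j]) for j in range(n)] for i in range(n)]
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    rng = random.Random(seed)
    for _ in range(n_runs):
        subset = rng.sample(range(n), int(rate * n))
        for i in subset:
            for j in subset:
                sampled[i][j] += 1
        for cluster in average_linkage(subset, dist, k):
            for i in cluster:
                for j in cluster:
                    together[i][j] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]

# Toy patient-level histograms forming two clear groups (hypothetical values).
group_a = [[0.8 - 0.01 * t, 0.1 + 0.01 * t, 0.1] for t in range(5)]
group_b = [[0.1, 0.1 + 0.01 * t, 0.8 - 0.01 * t] for t in range(5)]
C = consensus_matrix(group_a + group_b, k=2)
print(C[0][1], C[0][9])
```

For well-separated groups, consensus values are close to 1 within a group and close to 0 across groups. Repeating the computation for several values of k and comparing the distributions of the matrix entries gives the kind of CDF comparison shown in Fig. 2.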

Figure 3(a) shows the Kaplan-Meier survival plot for the three major subtypes of the five-cluster consensus clustering result. The log-rank test p-value of 2.82e−5 indicates that the difference between the survival times of subtype #5 patients and subtype #3 patients is statistically significant. The integration of genome-wide data from multiple platforms uncovered three molecular classes of lower-grade gliomas that were best represented by IDH and 1p/19q status: wild-type IDH, IDH mutation with 1p/19q codeletion, and IDH mutation without 1p/19q codeletion [2]. Further analysis with Fisher's exact test reveals no enrichment between the cellular morphometric subtypes and these molecular subtypes. On the other hand, differentially expressed genes between subtype #5 and subtype #3 (Fig. 3(b)) indicate enrichment of genes that mediate programmed cell death (apoptosis) by activation of caspases, and genes defining epithelial-mesenchymal transition, as in wound healing, fibrosis and metastasis (via MSigDB).
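To make the subtype comparison concrete, the following Python sketch implements a standard two-group log-rank test with a chi-square reference distribution on one degree of freedom. It is an illustration of the general technique, not the code used for the reported p-value; the function name and the toy survival times are hypothetical.

```python
from math import erfc, sqrt

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    events: 1 = death observed, 0 = censored.
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
           [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)               # at risk, both groups
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, group A
        d = sum(1 for ti, e, _ in data if ti == t and e)         # deaths at time t
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    stat = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return stat, erfc(sqrt(stat / 2))

# Hypothetical survival times (all deaths observed), for illustration only.
stat, p = logrank_test([1, 2, 3, 4, 5], [1] * 5,
                       [10, 11, 12, 13, 14], [1] * 5)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

Censored patients contribute to the at-risk counts until their last follow-up time but never to the death counts, which is what allows the test to use incomplete follow-up data.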

Fig. 3. (a) Kaplan-Meier plot for three major subtypes associated with patient survival, where subtypes #3 (53 patients), #4 (65 patients) and #5 (82 patients) correspond to the three major subtypes from top-left to bottom-right, respectively, in Fig. 2 (N = 5). (b) Top genes that are differentially expressed between subtype #5 and subtype #3.

4 Conclusion and Future Work

In this paper, we encode cellular morphometric signatures within the SPM framework for robust representation (i.e., cellular morphometric context) of WSIs in a large cohort at the patient level, based on which an integrative analysis pipeline is constructed for the association of cellular morphometric context with clinical outcomes and molecular data. The integrative analysis, performed on the TCGA LGG cohort, reveals clinically relevant cellular morphometric types and morphometric context subtypes, and the corresponding enriched gene sets. We believe that the proposed approach has the potential to contribute to hypothesis generation regarding imaging biomarkers for personalized diagnosis or treatment, which will be further validated on independent cohorts.

References

1. Bhagavatula, R., Fickus, M., Kelly, W., Guo, C., Ozolek, J., Castro, C., Kovacevic, J.: Automatic identification and delineation of germ layer components in H&E stained images of teratomas derived from human and nonhuman primate embryonic stem cells. In: IEEE ISBI, pp. 1041–1044 (2010)
2. Cancer Genome Atlas Research Network: Comprehensive, integrative genomic analysis of diffuse lower-grade gliomas. N. Engl. J. Med. 372(26), 2481–2498 (2015)
3. Chang, H., Borowsky, A., Spellman, P.T., Parvin, B.: Classification of tumor histology via morphometric context. In: IEEE CVPR, pp. 2203–2210 (2013)
4. Chang, H., Han, J., Borowsky, A., Loss, L., Gray, J.W., Spellman, P.T., Parvin, B.: Invariant delineation of nuclear architecture in glioblastoma multiforme for clinical and molecular association. IEEE Trans. Med. Imaging 32(4), 670–682 (2013)
5. Chang, H., Zhou, Y., Borowsky, A., Barner, K.E., Spellman, P.T., Parvin, B.: Stacked predictive sparse decomposition for classification of histology sections. Int. J. Comput. Vis. 113(1), 3–18 (2015)
6. Dalton, L., Pinder, S., Elston, C., Ellis, I., Page, D., Dupont, W., Blamey, R.: Histological gradings of breast cancer: linkage of patient outcome with level of pathologist agreements. Mod. Pathol. 13(7), 730–735 (2000)
7. Demir, C., Yener, B.: Automated cancer diagnosis based on histopathological images: a systematic survey (2009)
8. Gurcan, M., Boucheron, L., Can, A., Madabhushi, A., Rajpoot, N., Bulent, Y.: Histopathological image analysis: a review. IEEE Rev. Biomed. Eng. 2, 147–171 (2009)
9. Huang, C.H., Veillard, A., Lomeine, N., Racoceanu, D., Roux, L.: Time efficient sparse analysis of histopathological whole slide images. Comput. Med. Imaging Graph. 35(7–8), 579–591 (2011)
10. Kane, A., Yang, I.: Interferon-gamma in brain tumor immunotherapy. Neurosurg. Clin. N. Am. 21(1), 77–86 (2010)
11. Kong, J., Cooper, L., Sharma, A., Kurk, T., Brat, D., Saltz, J.: Texture based image recognition in microscopy images of diffuse gliomas with multi-class gentle boosting mechanism. In: IEEE ICASSP, pp. 457–460 (2010)
12. Kothari, S., Phan, J.H., Osunkoya, A.O., Wang, M.D.: Biological interpretation of morphological patterns in histopathological whole slide images. In: Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (2012)
13. Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: IEEE CVPR, pp. 2169–2178 (2006)
14. Monti, S., Tamayo, P., Mesirov, J., Golub, T.: Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Mach. Learn. 52, 91–118 (2003)
15. Romo, D., Garcla-Arteaga, J.D., Arbelez, P., Romero, E.: A discriminant multiscale histopathology descriptor using dictionary learning. In: SPIE 9041 Medical Imaging (2014)
16. Sirinukunwattana, K., Khan, A.M., Rajpoot, N.M.: Cell words: modelling the visual appearance of cells in histopathology images. Comput. Med. Imaging Graph. 42, 16–24 (2015)
17. Subramanian, A., Tamayo, P., Mootha, V., Mukherjee, S., Ebert, B., Gillette, M., Paulovich, A., Pomeroy, S., Golub, T., Lander, E., Mesirov, J.: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102(43), 15545–15550 (2005)
18. Yang, J., Yu, K., Gong, Y., Huang, T.: Linear spatial pyramid matching using sparse coding for image classification. In: IEEE CVPR, pp. 1794–1801 (2009)
19. Zhou, Y., Chang, H., Barner, K.E., Parvin, B.: Nuclei segmentation via sparsity constrained convolutional regression. In: IEEE ISBI, pp. 1284–1287 (2015)
20. Zhou, Y., Chang, H., Barner, K.E., Spellman, P.T., Parvin, B.: Classification of histology sections via multispectral convolutional sparse coding. In: IEEE CVPR, pp. 3081–3088 (2014)

Mapping Lifetime Brain Volumetry with Covariate-Adjusted Restricted Cubic Spline Regression from Cross-Sectional Multi-site MRI

Yuankai Huo1(✉), Katherine Aboud2, Hakmook Kang3, Laurie E. Cutting2, and Bennett A. Landman1

1 Department of Electrical Engineering, Vanderbilt University, Nashville, TN, USA [email protected]
2 Department of Special Education, Vanderbilt University, Nashville, TN, USA
3 Department of Biostatistics, Vanderbilt University, Nashville, TN, USA

Abstract. Understanding brain volumetry is essential to understanding neurodevelopment and disease. Historically, age-related changes have been studied in detail for specific age ranges (e.g., early childhood, teen, young adults, elderly, etc.) or more sparsely sampled for wider considerations of lifetime aging. Recent advancements in data sharing and robust processing have made available considerable quantities of brain images from normal, healthy volunteers. However, existing analysis approaches have had difficulty (1) addressing complex volumetric developments in a large cohort across the lifetime (e.g., beyond cubic age trends), (2) accounting for confound effects, and (3) maintaining an analysis framework consistent with the general linear model (GLM) approach pervasive in neuroscience. To address these challenges, we propose to use covariate-adjusted restricted cubic spline (C-RCS) regression within a multi-site cross-sectional framework. This model allows for flexible consideration of nonlinear age-associated patterns while accounting for traditional covariates and interaction effects. As a demonstration of this approach on lifetime brain aging, we derive normative volumetric trajectories and 95% confidence intervals from 5111 healthy patients from 64 sites while accounting for confounding sex, intracranial volume and field strength effects. The volumetric results are shown to be consistent with traditional studies that have explored more limited age ranges using single-site analyses. This work represents the first integration of C-RCS with neuroimaging and the derivation of structural covariance networks (SCNs) from a large study of multi-site, cross-sectional data.

1 Introduction

Brain volumetry across the lifespan is essential in neurological research and clinical investigation. Magnetic resonance imaging (MRI) allows for quantification of such changes, and consequent investigation of specific age ranges or more sparsely sampled lifetime data [1]. Contemporaneous advancements in data sharing have made considerable quantities of brain images available from normal, healthy populations. However, the regression models prevalent in volumetric mapping (e.g., linear, polynomial, non-parametric models) have had difficulty in modeling complex, cross-sectional large cohorts while accounting for confound effects.

This paper proposes a novel multi-site cross-sectional framework using Covariate-adjusted Restricted Cubic Spline (C-RCS) regression to map brain volumetry on a large cohort (5111 MR 3D images) across the lifespan (4–98 years). The C-RCS extends the Restricted Cubic Spline [2, 3] by regressing out the confound effects in a general linear model (GLM) fashion. Multi-atlas segmentation is used to obtain whole brain volume (WBV) and 132 regional volumes. The regional volumes are further grouped into 15 networks of interest (NOIs). Then, structural covariance networks (SCNs), i.e., regions or networks that mature or decline together during developmental periods, are established based on NOIs using hierarchical clustering analysis (HCA). To validate the large-scale framework, confidence intervals (CI) are provided for both C-RCS regression and clustering from 10,000 bootstrap samples.

© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 81–88, 2016. DOI: 10.1007/978-3-319-46720-7_10

Table 1. Data summary of 5111 multi-site images.

Study name | Website | Images | Sites
Baltimore Longitudinal Study of Aging (BLSA) | www.blsa.nih.gov | 605 | 4
Cutting Pediatrics | vkc.mc.vanderbilt.edu/ebrl | 586 | 2
Autism Brain Imaging Data Exchange (ABIDE) | fcon_1000.projects.nitrc.org/indi/abide | 563 | 17
Information eXtraction from Images (IXI) | www.nitrc.org/projects/ixi_dataset | 523 | 3
Attention Deficit Hyperactivity Disorder (ADHD200) | fcon_1000.projects.nitrc.org/indi/adhd200 | 949 | 8
National Database for Autism Research (NDAR) | ndar.nih.gov | 328 | 6
Open Access Series on Imaging Study (OASIS) | www.oasis-brains.org | 312 | 1
1000 Functional Connectome (fcon_1000) | fcon_1000.projects.nitrc.org | 1102 | 22
Nathan Kline Institute Rockland (NKI_rockland) | fcon_1000.projects.nitrc.org/indi/enhanced | 143 | 1

2 Methods

2.1 Extracting Volumetric Information

The complete cohort aggregates 9 datasets with a total of 5111 MR T1w 3D images from normal healthy subjects (Table 1). Forty-five atlases are non-rigidly registered [4] to a target image, and non-local spatial STAPLE (NLSS) label fusion [5] is used to fuse the labels from each atlas to the target image using the BrainCOLOR protocol [6] (Fig. 1). WBV and regional volumes are then calculated by multiplying the volume of a single voxel by the number of labeled voxels in original image space. In total, 15 NOIs are defined by structural and functional covariance networks including visual, frontal, language, memory, motor, fusiform, basal ganglia (BG) and cerebellum (CB).

Fig. 1. The large-scale cross-sectional framework on 5111 multi-site MR 3D images.

2.2 Covariate-Adjusted Restricted Cubic Spline (C-RCS)

We define $x$ as the ages of all subjects and $S(x)$ as the corresponding brain volumes. In canonical $n$th degree spline regression, splines are used to model non-linear relationships between the variables $S(x)$ and $x$ by deciding the connections between $K$ knots ($t_1 < t_2 < \cdots < t_K$). In this work, such knots were determined based on previously identified developmental shifts [1], specifically corresponding with transitions between childhood (7–12), late adolescence (12–19), young adulthood (19–30), middle adulthood (30–55), older adulthood (55–75), and late life (75–90). Using the expression from Durrleman and Simon [2], the canonical $n$th degree spline function is defined as

$$S(x) = \sum_{j=0}^{n} \dot{\beta}_{0j}\, x^{j} + \sum_{i=1}^{K} \dot{\beta}_{in}\, (x - t_i)_+^{n} \qquad (1)$$

where $(x - t_i)_+ = x - t_i$ if $x > t_i$, and $(x - t_i)_+ = 0$ if $x \le t_i$.

To regress out confound effects, new covariates $X'_1, X'_2, \ldots, X'_C$ (with coefficients $\beta'_1, \beta'_2, \ldots, \beta'_C$) are introduced to the $n$th degree spline regression

$$S(x) = \sum_{j=0}^{n} \dot{\beta}_{0j}\, x^{j} + \sum_{i=1}^{K} \dot{\beta}_{in}\, (x - t_i)_+^{n} + \sum_{u=1}^{C} \beta'_u X'_u \qquad (2)$$

where $C$ is the number of confound effects. In the RCS regression, a linear constraint is introduced [2] to address the poor behavior of the cubic spline model in the tails ($x < t_1$ and $x > t_K$) [7]. Using the same principle, C-RCS regression extends the RCS regression ($n = 3$) and restricts the relationship between $S(x)$ and $x$ to be a linear function in the tails. First, for $x < t_1$, every truncated term $(x - t_i)_+^3$ vanishes, so

$$S(x) = \dot{\beta}_{00} + \dot{\beta}_{01} x + \dot{\beta}_{02} x^2 + \dot{\beta}_{03} x^3 + \sum_{u=1}^{C} \beta'_u X'_u \qquad (3)$$

where $\dot{\beta}_{02} = \dot{\beta}_{03} = 0$ ensures the linearity before the first knot. Second, for $x > t_K$,

$$S(x) = \dot{\beta}_{00} + \dot{\beta}_{01} x + \dot{\beta}_{13} (x - t_1)_+^3 + \cdots + \dot{\beta}_{K3} (x - t_K)_+^3 + \sum_{u=1}^{C} \beta'_u X'_u \qquad (4)$$

To guarantee the linearity of C-RCS after the last knot, we expand the previous expression and force the coefficients of $x^2$ and $x^3$ to be zero. After expansion,

$$\begin{aligned} S(x) = {} & \Big(\dot{\beta}_{00} - \dot{\beta}_{13} t_1^3 - \cdots - \dot{\beta}_{K3} t_K^3 + \sum_{u=1}^{C} \beta'_u X'_u\Big) + \big(\dot{\beta}_{01} + 3\dot{\beta}_{13} t_1^2 + \cdots + 3\dot{\beta}_{K3} t_K^2\big)\, x \\ & - \big(3\dot{\beta}_{13} t_1 + 3\dot{\beta}_{23} t_2 + \cdots + 3\dot{\beta}_{K3} t_K\big)\, x^2 + \big(\dot{\beta}_{13} + \dot{\beta}_{23} + \cdots + \dot{\beta}_{K3}\big)\, x^3 \end{aligned} \qquad (5)$$

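As a result, linearity of $S(x)$ at $x > t_K$ implies that $\sum_{i=1}^{K} \dot{\beta}_{i3}\, t_i = 0$ and $\sum_{i=1}^{K} \dot{\beta}_{i3} = 0$. Following such restrictions, $\dot{\beta}_{(K-1)3}$ and $\dot{\beta}_{K3}$ are derived as

$$\dot{\beta}_{(K-1)3} = -\frac{\sum_{i=1}^{K-2} \dot{\beta}_{i3}\,(t_K - t_i)}{t_K - t_{K-1}} \quad \text{and} \quad \dot{\beta}_{K3} = \frac{\sum_{i=1}^{K-2} \dot{\beta}_{i3}\,(t_{K-1} - t_i)}{t_K - t_{K-1}} \qquad (6)$$

and the complete C-RCS regression model is defined as

$$S(x) = \dot{\beta}_{00} + \dot{\beta}_{01} x + \sum_{i=1}^{K-2} \dot{\beta}_{i3} \Big[ (x - t_i)_+^3 - \frac{t_K - t_i}{t_K - t_{K-1}} (x - t_{K-1})_+^3 + \frac{t_{K-1} - t_i}{t_K - t_{K-1}} (x - t_K)_+^3 \Big] + \sum_{u=1}^{C} \beta'_u X'_u \qquad (7)$$

2.3 Regressing Out Confound Effects by C-RCS Regression in GLM Fashion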

To adapt C-RCS regression in the GLM fashion, we redefine the coefficients $\beta_0, \beta_1, \beta_2, \ldots, \beta_{K-1}$ as in Harrell [3], where $\beta_0 = \dot{\beta}_{00}$, $\beta_1 = \dot{\beta}_{01}$, $\beta_2 = \dot{\beta}_{13}$, $\beta_3 = \dot{\beta}_{23}$, $\beta_4 = \dot{\beta}_{33}$, $\ldots$, $\beta_{K-1} = \dot{\beta}_{(K-2)3}$. Then, the C-RCS regression with confound effects becomes

$$S(x) = \beta_0 + \sum_{j=1}^{K-1} \beta_j X_j + \sum_{u=1}^{C} \beta'_u X'_u \qquad (8)$$

where $C$ is the number of confound effects ($X'_u$), $X_1 = x$, and for $j = 2, \ldots, K-1$

$$X_j = (x - t_{j-1})_+^3 - \frac{t_K - t_{j-1}}{t_K - t_{K-1}} (x - t_{K-1})_+^3 + \frac{t_{K-1} - t_{j-1}}{t_K - t_{K-1}} (x - t_K)_+^3 \qquad (9)$$

Then, the beta coefficients are solvable under the GLM framework. Once $\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_{K-1}$ are obtained, the two terms that enforce linearity, $\hat{\beta}_K$ and $\hat{\beta}_{K+1}$, are estimated as

$$\hat{\beta}_K = \frac{\sum_{i=2}^{K-1} \hat{\beta}_i\,(t_{i-1} - t_K)}{t_K - t_{K-1}} \quad \text{and} \quad \hat{\beta}_{K+1} = \frac{\sum_{i=2}^{K-1} \hat{\beta}_i\,(t_{i-1} - t_{K-1})}{t_{K-1} - t_K} \qquad (10)$$

The final estimated volumetric trajectories $\hat{S}(x)$ can be fitted as

$$\hat{S}(x) = \hat{\beta}_0 + \hat{\beta}_1 x + \sum_{j=2}^{K+1} \hat{\beta}_j\, (x - t_{j-1})_+^3 + \sum_{u=1}^{C} \hat{\beta}'_u X'_u \qquad (11)$$

In this work, gender, field strength and total intracranial volume (TICV) are employed as covariates $X'_u$. TICV values are calculated using SIENAX [8]. Field strength and TICV are used to regress out site effects rather than using site categories directly, since the sites are highly correlated with the explanatory variable age.
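The relation between the GLM regressors of Eq. 9 and the expanded form of Eq. 11 can be checked numerically. The following is a minimal sketch in plain Python, not the authors' released software: the function names are hypothetical, the knots are assumed to be the seven boundary ages of the age ranges quoted in Sect. 2.2, the coefficient values are made up for illustration, and the confound terms are omitted because they enter both forms identically.

```python
def pos_cube(x, t):
    """Truncated cubic (x - t)_+^3."""
    d = x - t
    return d ** 3 if d > 0 else 0.0


def rcs_basis(x, knots):
    """Regressors X_1, ..., X_{K-1} of Eq. 9 for a single age x."""
    K = len(knots)
    t_pen, t_last = knots[K - 2], knots[K - 1]   # t_{K-1} and t_K
    span = t_last - t_pen
    basis = [x]                                  # X_1 = x
    for j in range(2, K):                        # j = 2, ..., K-1
        t = knots[j - 2]                         # t_{j-1} in 1-based notation
        basis.append(
            pos_cube(x, t)
            - (t_last - t) / span * pos_cube(x, t_pen)
            + (t_pen - t) / span * pos_cube(x, t_last)
        )
    return basis


def tail_coefficients(beta, knots):
    """beta_K and beta_{K+1} of Eq. 10, given beta = [b_0, ..., b_{K-1}]."""
    K = len(knots)
    t_pen, t_last = knots[K - 2], knots[K - 1]
    b_K = sum(beta[i] * (knots[i - 2] - t_last) for i in range(2, K)) / (t_last - t_pen)
    b_K1 = sum(beta[i] * (knots[i - 2] - t_pen) for i in range(2, K)) / (t_pen - t_last)
    return b_K, b_K1


def trajectory_glm(x, beta, knots):
    """Eq. 8 without the confound terms."""
    return beta[0] + sum(b * X for b, X in zip(beta[1:], rcs_basis(x, knots)))


def trajectory_expanded(x, beta, knots):
    """Eq. 11 without the confound terms."""
    full = list(beta) + list(tail_coefficients(beta, knots))
    return full[0] + full[1] * x + sum(
        full[j] * pos_cube(x, knots[j - 2]) for j in range(2, len(knots) + 2)
    )


# Assumed knot ages (boundaries of the age ranges in Sect. 2.2).
KNOTS = [7, 12, 19, 30, 55, 75, 90]
# Made-up coefficients b_0, ..., b_{K-1}; in practice they come from the GLM fit.
BETA = [1.0, 0.02, -3e-4, 2e-4, -1e-4, 5e-5, -2e-5]
```

With these definitions, `trajectory_glm` and `trajectory_expanded` agree at any age, and both are straight lines below the first knot and above the last one, which is the restriction the section derives.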

2.4 SCNs and CI Using Bootstrap Method

Using the aforementioned C-RCS regression, the lifespan volumetric trajectories of WBV and 15 NOIs are obtained from 5111 images. Simultaneously, the piecewise volumetric trajectories within a particular age bin (between adjacent knots) of all 15 NOIs ($\hat{S}_i(x)$, $i = 1, 2, \ldots, 15$) are separated to establish SCN dendrograms using HCA [9]. The distance metric $D$ used in HCA is defined as $D = 1 - \mathrm{corr}(\hat{S}_i(x), \hat{S}_j(x))$, $i, j \in [1, 2, \ldots, 15]$ and $i \ne j$, where $\mathrm{corr}(\cdot)$ is Pearson's correlation between any two C-RCS fitted piecewise trajectories $\hat{S}_i(x)$ and $\hat{S}_j(x)$ in the same age bin.

The stability of the proposed approaches is demonstrated by the CIs of C-RCS regression and SCNs using the bootstrap method [10]. First, the 95% CIs of volumetric trajectories on WBV (Fig. 2) and 15 NOIs (Fig. 3) are derived by deploying C-RCS regression on 10,000 bootstrap samples. Then, the distances $D$ between all pairs of clustered NOIs are derived using 15 (NOIs) × 10,000 (bootstrap) C-RCS fitted trajectories. Then, the 95% CIs are obtained for each pair of clustered NOIs and shown on six SCN dendrograms (Fig. 4). The average network distance (AND), the average distance between the 15 NOIs for a dendrogram, can be calculated 10,000 times using bootstrap. The AND reflects the modularity of connections between all NOIs. We test whether the AND differs significantly between brain development periods by applying the two-sample t-test to the AND values (10,000 per age bin) of different age bins.

Fig. 2. Volumetry and growth rate. The left plot in (a) shows the volumetric trajectory of whole brain volume (WBV) using C-RCS regression on 5111 MR images. The right plot in (a) shows the growth rate curve, i.e. the volumetric change per year of the volumetric trajectory. In (b), C-RCS regression is deployed on the same dataset by additionally regressing out TICV. Our growth rate curves are compared with 40 previous longitudinal studies [1] on smaller cohorts (21 studies in (a) without regressing out TICV and 19 studies in (b) regressing out TICV). The standard deviations of previous studies are provided as black bars (if available). The 95% CIs in all plots are calculated from 10,000 bootstrap samples.
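The distance metric and the AND described above can be sketched in a few lines of plain Python. This is an illustrative sketch only: the function names are hypothetical, the curves are toy values rather than fitted trajectories, and the percentile rule in `percentile_ci` is an assumption, since the text does not state how the 95% interval is read off the bootstrap replicates.

```python
def pearson(a, b):
    """Pearson correlation between two equally sampled curves."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def distance(a, b):
    """D = 1 - corr, the distance used for hierarchical clustering."""
    return 1.0 - pearson(a, b)


def average_network_distance(curves):
    """Mean of D over all unordered pairs of curves (the AND)."""
    pairs = [(i, j) for i in range(len(curves)) for j in range(i + 1, len(curves))]
    return sum(distance(curves[i], curves[j]) for i, j in pairs) / len(pairs)


def percentile_ci(values, level=0.95):
    """Percentile interval over bootstrap replicates (assumed CI rule)."""
    s = sorted(values)
    lo = int((1 - level) / 2 * (len(s) - 1))
    hi = int((1 + level) / 2 * (len(s) - 1))
    return s[lo], s[hi]


# Toy piecewise trajectories sampled at four ages inside one age bin.
toy_curves = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
toy_and = average_network_distance(toy_curves)
```

In the setting of the paper, `percentile_ci` would be applied to the 10,000 replicate values of a pairwise distance (or of the AND) obtained by refitting C-RCS on each bootstrap sample.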

Fig. 3. Lifespan trajectories of 15 NOIs are provided with 95% CI from 10,000 bootstrap samples. The upper 3D figures indicate the definition of NOIs (in red). The lower figures show the trajectories with CI using the C-RCS regression method by regressing out gender, field strength and TICV (same model as Fig. 2b). For each NOI, the piecewise CIs of six age bins are shown in different colors. The piecewise volumetric trajectories and CIs are separated by 7 knots in the lifespan C-RCS regression rather than obtained by conducting independent fittings. The volumetric trajectories on both sides of each NOI are derived separately except for CB.

3 Results

Figure 2a shows the lifespan volumetric trajectories using C-RCS regression as well as the growth rate (volume change in percentage per year) of WBV when regressing out gender and field strength effects. Figure 2b shows the C-RCS regression on the same dataset with TICV added as an additional covariate. The cross-sectional growth rate curve using C-RCS regression is compared with 40 previous longitudinal studies (19 are TICV corrected) [1], which are typically limited to smaller age ranges. Using the same C-RCS model as in Fig. 2b, Fig. 3 shows both the lifespan and piecewise volumetric trajectories of the 15 NOIs. In Fig. 4, the piecewise volumetric trajectories of the 15 NOIs within each age bin are clustered using HCA and shown in one SCN dendrogram. Then, six SCN dendrograms are obtained by repeating HCA on different age bins, which demonstrate the evolution of SCNs during different developmental periods. The differences in AND between any two age bins in Fig. 4 are statistically significant (p < 0.001).

Fig. 4. The six structural covariance network (SCN) dendrograms using hierarchical clustering analysis (HCA) indicate which NOIs develop together during different developmental periods (age bins). The distance on the x-axis, which equals one minus Pearson's correlation between two curves, is shown in log scale. The correlation between NOIs becomes stronger from right to left on the x-axis. The horizontal range of each colored rectangle indicates the 95% CI of distance from 10,000 bootstrap samples. Note that the colors are chosen for visualization purposes without quantitative meaning.

4 Conclusion and Discussion

This paper proposes a large-scale cross-sectional framework to investigate lifetime brain volumetry using C-RCS regression. C-RCS regression captures complex brain volumetric trajectories across the lifespan while regressing out confound effects in a GLM fashion. Hence, it can be used by researchers within a familiar context. The estimated volume trends are consistent with 40 previous smaller longitudinal studies. The stable estimation of volumetric trends for NOIs (exhibited by narrow confidence bands) provides a basis for assessing patterns in brain changes through SCNs. Moreover, we demonstrate how to compute confidence intervals for SCNs and correlations between NOIs. The significant difference in AND indicates that the C-RCS regression detects the changes of average SCN connections during brain development. The software is freely available online¹.

Acknowledgments. This research was supported by NSF CAREER 1452485, NIH 5R21EY 024036, NIH 1R21NS064534, NIH 2R01EB006136, NIH 1R03EB012461, NIH R01NS095291 and also supported by the Intramural Research Program, National Institute on Aging, NIH.

References

1. Hedman, A.M., van Haren, N.E., Schnack, H.G., Kahn, R.S., Hulshoff Pol, H.E.: Human brain changes across the life span: a review of 56 longitudinal magnetic resonance imaging studies. Hum. Brain Mapp. 33, 1987–2002 (2012)
2. Durrleman, S., Simon, R.: Flexible regression models with cubic splines. Stat. Med. 8, 551–561 (1989)
3. Harrell, F.: Regression Modeling Strategies: with Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. Springer, Switzerland (2015)
4. Avants, B.B., Epstein, C.L., Grossman, M., Gee, J.C.: Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Med. Image Anal. 12, 26–41 (2008)
5. Asman, A.J., Dagley, A.S., Landman, B.A.: Statistical label fusion with hierarchical performance models. In: Proceedings - Society of Photo-Optical Instrumentation Engineers, vol. 9034, p. 90341E (2014)
6. Klein, A., Dal Canton, T., Ghosh, S.S., Landman, B., Lee, J., Worth, A.: Open labels: online feedback for a public resource of manually labeled brain images. In: 16th Annual Meeting for the Organization of Human Brain Mapping (2010)
7. Stone, C.J., Koo, C.-Y.: Additive splines in statistics, p. 48 (1986)
8. Smith, S.M., Zhang, Y., Jenkinson, M., Chen, J., Matthews, P.M., Federico, A., De Stefano, N.: Accurate, robust, and automated longitudinal and cross-sectional brain change analysis. Neuroimage 17, 479–489 (2002)
9. Anderberg, M.R.: Cluster Analysis for Applications: Probability and Mathematical Statistics: A Series of Monographs and Textbooks. Academic Press, New York (2014)
10. Efron, B., Tibshirani, R.J.: An Introduction to the Bootstrap. CRC Press, Boca Raton (1994)

¹ https://masi.vuse.vanderbilt.edu/index.php/C-RCSregression

Extracting the Core Structural Connectivity Network: Guaranteeing Network Connectedness Through a Graph-Theoretical Approach

Demian Wassermann1(B), Dorian Mazauric2, Guillermo Gallardo-Diez1, and Rachid Deriche1

1 Athena EPI, Inria Sophia Antipolis-Méditerranée, Sophia Antipolis 06902, France
[email protected]
2 ABS EPI, Inria Sophia Antipolis-Méditerranée, Sophia Antipolis 06902, France

Abstract. We present a graph-theoretical algorithm to extract the connected core structural connectivity network of a subject population. Extracting this core common network across subjects is a main problem in current neuroscience. Such a network facilitates cognitive and clinical analyses by reducing the number of connections that need to be explored. Furthermore, insights into the human brain structure can be gained by comparing core networks of different populations. We show that our novel algorithm has theoretical and practical advantages. First, contrary to the current approach, our algorithm guarantees that the extracted core subnetwork is connected, agreeing with current evidence that the core structural network is tightly connected. Second, our algorithm shows enhanced performance when used as a feature selection approach for connectivity analysis on populations.

1 Introduction

Isolating the common core structural connectivity network (SCN) of a population is an important problem in current neuroscience [3,5]. This procedure facilitates cognitive and clinical studies based on diffusion MRI, e.g. [1,5], by increasing their statistical power through a reduction of the number of analyzed structural connections. We illustrate this process in Fig. 1. Furthermore, recent evidence indicates that a core common network exists in human and macaque brains and that it is tightly connected [2]. In this work we develop, for the first time, a group-wise core SCN extraction algorithm which guarantees a connected network output. Furthermore, we show the potential of such a network to select gender-specific connections through an experiment on 300 human subjects.

The most used population-level core SCN extraction technique [5] is based on an effective statistical procedure to extract a population SCN: (1) computing, for each subject, a connectivity matrix using a standardised parcellation; and (2) extracting a binary graph by analysing each connection separately and rejecting the hypothesis that the connection is not in the population. The resulting graph can be a set of disconnected subgraphs. This is problematic, as recent studies have shown the core network to be tightly connected [2]. However, extracting a connected group-wise core SCN is far from simple: an algorithm to find the largest core network of a population cannot find an approximated solution in polynomial time.

In this work, we propose a graph-theoretical algorithm to obtain the connected core SCN of a subject sample. Our approach guarantees a connected core SCN, agreeing with recent evidence on structural connectivity network topology, e.g. [2]. We start by proving that we can formulate the problem such that core network extraction is NP-complete in general, but in our case we find an exact polynomial-time algorithm to perform the extraction. Finally, we show that our algorithm outperforms that of Gong et al. [5] as a tool for selecting regressors in structural connectivity (SC). For this, we use 300 subjects from the HCP database and compare the performance of the networks obtained with both algorithms in predicting connectivity values from gender in a subsection of the core network.

Fig. 1. Scheme of analyses involving the core structural connectivity matrix. Panels: tractographies from the subject sample; structural connectivity matrices derived from tractographies; groupwise core connectivity matrix; statistical analyses.

© Springer International Publishing AG 2016 S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 89–96, 2016. DOI: 10.1007/978-3-319-46720-7_11

2 Definitions, Problems, and Contributions

A first approach to core sub-network identification can be derived from the binary connectivity model. In this model the cortical and sub-cortical regions are common across subjects and what varies is whether these regions are connected or not [5]. Using this approach, a sample of human brain connectivity of a given population can be represented by $k \ge 1$ graphs $G_1 = (V, E_1), \ldots, G_k = (V, E_k)$. In this formalism each graph $G_i$ corresponds to a subject and, in accordance with Gong et al. [5], the vertices $V$, stable across subjects, are cortical and sub-cortical regions and the edges $E_i$ are white matter bundles connecting those regions. Note that all graphs have the same ordered set of nodes. A first approach to compute, or approximate, the core sub-network of the population sample consists in finding the core SCN graph $G^* = (V^*, E^*)$, where $V^* \subseteq V$, such that $G^*$ and every $G_i$ have some quantitative common properties. In this article, we model the difference between the core SCN, $G^*$, and the subject ones, $G_i$, by a function $f_\lambda$. This function measures the difference between the sets of edges (and the sets of non-edges) of the core network and those of the subjects:

$$f_\lambda(G^*, G_i) = \lambda\, |\{e \in E^*,\ e \notin E(G_i[V^*])\}| + (1 - \lambda)\, |\{e \notin E^*,\ e \in E(G_i[V^*])\}|,$$

where $\lambda \in [0, 1]$, $G_i[V^*]$ is the subgraph of $G_i$ induced by the set of nodes $V^* \subseteq V$, and $|S|$ is the cardinality of a set $S$. In other words, $f_\lambda$ represents the difference between the set of edges of $G^*$ and the set of edges of $G_i$, modulated by the parameter $\lambda$. In the following, we will refer to $f_\lambda(G^*, G_i)$ as the difference threshold of a core sub-network $G^*$ with respect to $G_i$. Note that if $\lambda = 1$, only the edges of the core network that are missing from the subject count, $|\{e \in E^*,\ e \notin E(G_i[V^*])\}|$, and if $\lambda = 0$, only the edges of the subject that are missing from the core network count, $|\{e \notin E^*,\ e \in E(G_i[V^*])\}|$. In Definition 1, we formalize the problem of computing the core sub-network as a combinatorial optimization problem:

Definition 1 (Core Sub-network Problem). Let $G_1 = (V, E_1), \ldots, G_k = (V, E_k)$ be $k \ge 1$ undirected graphs. Let $\lambda$ be any real such that $\lambda \in [0, 1]$. Let $n \ge 0$ be any integer. Then, the core sub-network problem consists in computing a connected graph $G^* = (V^*, E^*)$ such that $|V^*| \ge n$ and such that the sum of the difference thresholds $\sum_{i=1}^{k} f_\lambda(G^*, G_i)$ is minimum.

Small Example: Consider the instance depicted in Fig. 2 with $\lambda = \frac{1}{2}$. Figure 2(a), (b), and (c) represent $G_1$, $G_2$, $G_3$, respectively. Figure 2(d) is a solution $G^* = (V^*, E^*)$ when $n = 5$. Indeed, we have $f_\lambda(G^*, G_3) = \frac{1}{2}$ because the difference between $G^*$ and $G_3$ is the edge connecting nodes 2 and 5, that is, the two-element set $\{2, 5\}$; we have $f_\lambda(G^*, G_2) = 2$ because the difference between $G^*$ and $G_2$ is the four edges $\{1, 5\}$, $\{1, 4\}$, $\{3, 4\}$, and $\{4, 5\}$; and we have $f_\lambda(G^*, G_1) = 1$ because the difference between $G^*$ and $G_1$ is the two edges $\{3, 5\}$ and $\{4, 5\}$. We get $f_\lambda(G^*, G_1) + f_\lambda(G^*, G_2) + f_\lambda(G^*, G_3) = \frac{7}{2}$.
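The difference threshold is a weighted count over two set differences, which a short sketch makes concrete. The code below is illustrative plain Python with hypothetical function names; the toy graph is made up for this example and is not the instance of Fig. 2.

```python
def edge(u, v):
    """Undirected edge as an order-independent key."""
    return frozenset((u, v))


def induced_edges(edges, nodes):
    """Edges of the subgraph induced by `nodes`."""
    return {e for e in edges if e <= nodes}


def difference_threshold(core_nodes, core_edges, subject_edges, lam):
    """f_lambda(G*, G_i) for a core graph and one subject graph."""
    subject = induced_edges(subject_edges, core_nodes)
    only_core = len(core_edges - subject)      # e in E*, e not in E(G_i[V*])
    only_subject = len(subject - core_edges)   # e not in E*, e in E(G_i[V*])
    return lam * only_core + (1 - lam) * only_subject


# Hypothetical toy instance on four nodes.
toy_nodes = {1, 2, 3, 4}
toy_core = {edge(1, 2), edge(2, 3), edge(3, 4)}
toy_subject = {edge(1, 2), edge(2, 3), edge(1, 4), edge(1, 3)}
toy_f = difference_threshold(toy_nodes, toy_core, toy_subject, 0.5)
```

Here one core edge is missing from the subject and two subject edges are missing from the core, so with λ = 1/2 the value is 0.5 · 1 + 0.5 · 2 = 1.5.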

Fig. 2. Instance of the common sub-network problem. (a–c) Brain connectivity of different subjects, namely $G_1$, $G_2$ and $G_3$. (d) Extracted common sub-network $G^*$ that is optimal for $n = 5$ with $\lambda = \frac{1}{2}$: the difference threshold is $\frac{7}{2}$.

In the rest of this section we state our main contribution, an optimal polynomial-time exact algorithm for the core sub-network problem if the number of nodes is sufficiently large (optimal means here that there is no exact algorithm with better complexity). Solving the problem in Definition 1 is hard: it can be proved that, given an integer $n \ge 0$ and a real number $\delta \ge 0$, the decision version of the SCN problem is NP-complete even if $k = 2$. However, focusing on the problem of minimizing $f_\lambda$, we obtain a polynomial-time algorithm for SCN extraction. The main point of this work is to present an algorithm for core graph extraction and assess its potential for clinical and cognitive studies. Even if the problem is very difficult to solve in general, we design our polynomial-time core sub-network extraction algorithm and show that it is optimal when we focus on the problem of minimizing the difference threshold and when the number of nodes of the core sub-network is large.

Theorem 1. Consider $k \ge 1$ undirected graphs $G_1 = (V, E_1), \ldots, G_k = (V, E_k)$ and consider any real number $\lambda \in [0, 1]$. Then, Core-Sum-Alg (Algorithm 1) is an $O(\max(k, \log(|V|)) \cdot |V|^2)$-time complexity exact algorithm for the core sub-network problem when $n = |V|$.

Algorithm 1. Core-Sum-Alg: exact polynomial-time complexity algorithm for the core sub-network problem when $n = |V|$.

Require: SC graphs for each subject $G_1 = (V, E_1), \ldots, G_k = (V, E_k)$, and $\lambda \in [0, 1]$

Start by computing a core graph that can have multiple connected components:
1: Construct $G = (V, V \times V)$, the completely connected graph
2: Compute $w_0(\cdot)$, $w_1(\cdot)$ across all subject graphs as in Eq. 1
3: Compute the set of edges to be added to the core graph, $E^1 = \{e \mid w_1(e) \le w_0(e)\}$, and construct $G^1 = (V, E^1)$

Compute the connected components of $G^1$ and connect them:
4: Compute the set of maximal connected components $cc(G^1) = (cc_1(G^1), \ldots, cc_t(G^1))$
5: Construct $G_{cc} = (V_{cc}, V_{cc} \times V_{cc})$ with $V_{cc} = \{u_1, \ldots, u_t\}$
6: Compute $w_{cc}$ as in Eq. 4
7: Compute the set of edges $E^0$ that correspond to the argument minimum of Eq. 4. In other words, for every $\{u_i, u_j\}$, select the edge $e$ connecting the maximal connected components $cc_i(G^1)$ and $cc_j(G^1)$ such that $w_1(e) - w_0(e) = w_{cc}(\{u_i, u_j\})$
8: Compute a minimum spanning tree $T_{cc}$ of $G_{cc}$
9: Compute the set of edges $E^0_* \subseteq E^0$ that corresponds to the set of edges of the previous minimum spanning tree
10: Construct $G^* = (V^*, E^*)$ with $V^* := V$ and $E^* := E^1 \cup E^0_*$
11: return $G^*$, the connected core structural connectivity network

In the following, we aim at proving Theorem 1. Consider $k \ge 1$ undirected graphs $G_1 = (V, E_1), \ldots, G_k = (V, E_k)$ and consider any real number $\lambda$ such that $\lambda \in [0, 1]$. Let us define some notation and auxiliary graphs. Let $G = (V, V \times V)$ be the completely connected cortical network graph, with $V \times V$ the set of all pairs of elements of $V$. We define two edge-weighting functions $w_0$ and $w_1$:

$$w_0(e) = (1 - \lambda)\, |\{i : e \in E_i,\ 1 \le i \le k\}|, \qquad w_1(e) = \lambda\, |\{i : e \notin E_i,\ 1 \le i \le k\}|. \qquad (1)$$

Intuitively, $w_0(e)$ represents the cost of not adding the edge $e$ to the solution and $w_1(e)$ represents the cost of adding the edge $e$ to the solution. From this, we define the graph induced by the set of edges to keep in the core sub-network

$$G^1 = (V, E^1), \qquad E^1 = \{e \mid w_1(e) \le w_0(e)\} \subseteq V \times V. \qquad (2)$$
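To see which edges the rule in Eq. 2 selects, it may help to write $m(e)$ for the number of subjects whose graph contains $e$; this symbol is introduced here only for illustration and does not appear in the original. Assuming $k \ge 1$, the condition then unfolds as:

```latex
\begin{align*}
w_1(e) \le w_0(e)
  &\iff \lambda\,\bigl(k - m(e)\bigr) \le (1 - \lambda)\, m(e) \\
  &\iff \lambda k \le m(e) \\
  &\iff \frac{m(e)}{k} \ge \lambda .
\end{align*}
```

Under this reading, $E^1$ keeps the edges present in at least a fraction $\lambda$ of the subjects. For instance, with $k = 3$ and $\lambda = \frac{1}{2}$, an edge found in two subjects has $w_0 = 1$ and $w_1 = \frac{1}{2}$ and is kept, whereas an edge found in a single subject has $w_0 = \frac{1}{2}$ and $w_1 = 1$ and is dropped.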

If $G^1$ is a connected graph, then it is an optimal solution. Otherwise, we have to add edges in order to obtain a connected graph while minimizing the cost of such an addition. To add such edges we define the fully connected graph in which each node represents a maximal connected component:

$$G_{cc} = (V_{cc}, E_{cc}) \quad \text{with} \quad V_{cc} = \{u_1, \ldots, u_t\} \ \text{and} \ E_{cc} = V_{cc} \times V_{cc}, \qquad (3)$$

where $cc(G^1) = (cc_1(G^1), \ldots, cc_t(G^1))$ are the $t$ maximal connected components of $G^1$. Then, to select which maximal connected components to include in our core sub-network graph, we define a weight function $w_{cc}$:

$$w_{cc}(\{u_i, u_j\}) = \min_{v \in V(cc_i(G^1)),\ v' \in V(cc_j(G^1))} \bigl[ w_1(e) - w_0(e) \bigr], \quad \text{where } e = \{v, v'\} \text{ and } 1 \le i < j \le t. \qquad (4)$$

We formally prove in Lemma 1 that the problem of obtaining a minimum connected graph from $G^1$, that is, solving the core sub-network problem when $n = |V|$, consists in computing a minimum spanning tree of $G_{cc}$.

Lemma 1. The core sub-network problem when $n = |V|$ is equivalent to computing a minimum spanning tree of $G_{cc} = (V_{cc}, E_{cc})$ with weight function $w_{cc}$.

Proof. The core sub-network problem when $n = |V|$ consists in computing a graph $G^* = (V^*, E^*)$ such that $V^* = V$ and $\delta^* = \sum_{i=1}^{k} f_\lambda(G^*, G_i)$ is minimum. Consider the graph $G^1 = (V, E^1)$ previously defined. Observe that

$$\delta^* \ge \sum_{e \in V \times V} \min(w_0(e), w_1(e)) = \sum_{e \in E^1} w_1(e) + \sum_{e \in V \times V \mid e \notin E^1} w_0(e).$$

Indeed, for every pair of nodes $v, v'$ of $V$, either we set $\{v, v'\} \in E^*$ if $w_1(\{v, v'\}) \le w_0(\{v, v'\})$, or we set $\{v, v'\} \notin E^*$. Thus, if $G^1$ is a connected graph, then $G^* = G^1$ is an optimal solution such that $\delta^* = \sum_{e \in V \times V} \min(w_0(e), w_1(e))$. Otherwise, we have to add edges to $E^1$ in order to get a connected graph (that is, a spanning graph) and the "cost" of this addition has to be minimized. Thus, suppose that the graph $G^1$ contains at least two maximal connected components. Let $cc(G^1) = (cc_1(G^1), \ldots, cc_t(G^1))$ be the $t \ge 2$ maximal connected components of $G^1$. We have to connect these different components while minimizing the increase of the difference threshold. Let $E^0$ be the set of candidate edges constructed as follows. For every $i, j$, $1 \le i < j \le t$, let $\{v_i, v_j\}$ be an edge such that for every $v \in V(cc_i(G^1))$ and for every $v' \in V(cc_j(G^1))$, we have $w_1(\{v_i, v_j\}) - w_0(\{v_i, v_j\}) \le w_1(\{v, v'\}) - w_0(\{v, v'\})$. In other words, $\{v_i, v_j\}$ is an edge that minimizes the marginal cost for connecting $cc_i(G^1)$ and $cc_j(G^1)$. We add $\{v_i, v_j\}$ to $E^0$. Thus, we have to add exactly $t - 1$ edges of $E^0$ in order to get a connected graph and we aim at minimizing the cost of this addition.

More precisely, we get our optimal core network by finding, for every $i, j$, $1 \le i < j \le t$, an edge $e$ such that $w_{cc}(\{u_i, u_j\}) = w_1(e) - w_0(e)$, that is, an edge $e$ of minimum marginal cost between the maximal connected component $cc_i(G^1)$ and the maximal connected component $cc_j(G^1)$. Let $E^0_*$ be such a subset of $t - 1$ edges. Thus, we get that

$$\delta^* = \sum_{e \in E^1} w_1(e) + \sum_{e \in V \times V \mid e \notin E^1,\ e \in E^0_*} w_1(e) + \sum_{e \in V \times V \mid e \notin E^1,\ e \notin E^0_*} w_0(e).$$

Observe that the part of $\delta^*$ that depends on the choice of $E^0_*$, namely $\sum_{e \in E^0_*} \bigl(w_1(e) - w_0(e)\bigr)$, is exactly the cost of a minimum spanning tree of the graph $G_{cc}$ defined before. □

We are now able to prove Theorem 1.

Proof [of Theorem 1]. Core-Sum-Alg (Algorithm 1) follows the proof of Lemma 1 and so solves the core sub-network problem when $n = |V|$. The construction of $G$ (line 1) can be done in linear time in the number of edges, that is, in $O(|V|^2)$ time. The time complexity of line 2 is $O(k|V|^2)$. The construction of $G^1$ (line 3) can be done in $O(|V| + |E^1|)$ time, $O(|V|^2)$ time in the worst case. The computation of the maximal connected components of $G^1$ (line 4) can be done in linear time in the size of $G^1$, that is, in $O(|V| + |E^1|)$ time, $O(|V|^2)$ time in the worst case. The construction of $G_{cc}$ (line 5) can be done in linear time in the size of $G_{cc}$, that is, in the worst case, in $O(|V|^2)$ time. The time complexity of lines 6 and 7 is $O(|V|^2)$. There is an $O(m \log(n))$-time algorithm for the problem of computing a minimum spanning tree of a graph composed of $n$ nodes and $m$ edges (line 8). Thus, in our case, we get an $O(\log(|V|)\,|V|^2)$-time algorithm. The time complexity of line 9 is $O(|V|)$. Finally, the construction of $G^* = (V^*, E^*)$ (line 10) can be done in constant time because $V^* = V$ and $E^* = E^1 \cup E^0_*$. □

Having developed the core sub-network extraction algorithm that guarantees a connected core network (Algorithm 1), we proceed to assess its performance.

3 Experiments and Results

To assess the performance of our method, we compared our novel approach with the currently used one [5]: first, we compared the stability of the obtained binary graph across randomly chosen subpopulations; second, we compared connectivity prediction performance. For these comparisons, we used a homogeneous set from the HCP500 dataset [6]: all subjects aged 21–40 with complete dMRI protocol, which resulted in 309 subjects (112 male). We obtained the weighted connectivity matrices between the cortical regions defined by the Desikan atlas [4] as done by Bassett et al. [1]. To verify the unthresholded graph construction, we computed the average degree, that is, the number of connections over the number of possible connections, on each subject. Bassett et al. [1] reported an average degree of 0.20 and we obtained 0.20 ± 0.01 (min: 0.17, max: 0.25), showing that our preprocessing is in agreement.

3.1 Consistency of the Extracted Graph

To quantify the consistency of the core graph extraction procedure we performed 500 Leave-N-Out experiments. In each experiment we randomly sampled 100 subjects from the total and computed the core graphs with both techniques. We performed the extraction at 4 different levels of the parameter for each technique, choosing the parameters such that the density of the resulting graph connections is stable across methods. Also, we reported the number of unstable connections, defined as the connections that were neither present in all experiments nor absent in all experiments. We show the results of this experiment in Fig. 3. In this figure we can observe that the resulting graphs are similar, while the number of unstable connections is larger for Gong et al. [5] by an order of magnitude.
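The notion of an unstable connection used above can be expressed as a set difference over the graphs extracted in the repeated experiments. The sketch below is plain Python with hypothetical function names; `extract_core` stands for any core extraction routine and is an assumed callback, and the toy majority-vote extractor is only a stand-in, not either of the compared methods.

```python
import random


def unstable_connections(edge_sets):
    """Edges present in some, but not all, of the extracted graphs."""
    always = set.intersection(*map(set, edge_sets))
    ever = set.union(*map(set, edge_sets))
    return ever - always


def leave_n_out(subjects, extract_core, n_keep=100, n_runs=500, seed=0):
    """Repeat extraction on random subsamples and collect unstable edges."""
    rng = random.Random(seed)
    runs = [extract_core(rng.sample(subjects, n_keep)) for _ in range(n_runs)]
    return unstable_connections(runs)


def majority_core(sample):
    """Toy stand-in extractor: keep edges found in at least half the sample."""
    edges = set().union(*sample)
    return {e for e in edges if 2 * sum(e in s for s in sample) >= len(sample)}


# Hypothetical toy population: six subjects described by their edge sets.
toy_population = [
    {("a", "b"), ("b", "c"), ("c", "d")},
    {("a", "b"), ("b", "c")},
    {("a", "b"), ("b", "c")},
    {("a", "b")},
    {("a", "b")},
    {("a", "b")},
]
toy_unstable = leave_n_out(toy_population, majority_core, n_keep=4, n_runs=20)
```

The default arguments mirror the 100 subjects and 500 repetitions quoted in the text; the toy call uses smaller values so that it runs on six subjects.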

Fig. 3. Consistency analysis for extracted core graphs. We performed a Leave-N-Out procedure to quantify the consistency across methods at 4 different parameter levels. The results show similar graphs for both methods. However, our method, in blue, has a smaller number of connections that are neither present nor absent across all experiments, i.e. unstable connections (marked in red).

3.2 Predicting Gender-Specific Connectivity

To assess model fit and prediction we implemented a nested Leave-1/3-Out procedure. The outer loop performs model selection on 1/3 of the subjects, randomly selected. First, it computes the core graph of a population, with our approach and that of Gong et al. [5]. Then, it selects the features F that are most determinant for gender classification using the f-test feature selection procedure. The features are taken from the core graph, adding the connectivity weights of each subject. The inner loop performs model fitting and prediction using the selected features F. First, we randomly take 1/3 of the remaining subjects and fit a linear model on F for predicting gender. Second, we predict the values of the features F from the gender column. The outer loop is performed 100 times and the inner loop 500 times per outer loop. This totals 50,000 experiments. Finally, for each experiment, we quantify the prediction performance of the linear model at each inner loop with the mean squared error (MSE) of the prediction and the Akaike Information Criterion (AIC) for model fitting. We show the experiment's results in Fig. 4. In these results we can see that our approach, in blue, performed better than that of Gong et al. [5], in green, as the number of cases with lower AIC and MSE is larger in our case.

Fig. 4. Performance of the core network as feature selection for a linear model for gender-specific connectivity. We evaluate model fit (left) and prediction (right), with Gong et al. [5] in green and ours in blue. We show the histograms of both values from our nested Leave-1/3-Out experiment. In both measures, our approach has more frequent lower values, showing a better performance.

4 Discussion and Conclusion

We present, for the first time, an algorithm to extract the core structural connectivity network of a subject population while guaranteeing connectedness. We start by formalizing the problem and showing that, although the problem is very hard (it is NP-complete), we can produce a polynomial-time exact algorithm to extract such a network when its number of nodes is large. Finally, we show an example in which our network constitutes a better feature selection step for statistical analyses of structural connectivity. For this, we performed a nested leave-1/3-out experiment on 300 subjects. The results show that performing feature selection with our technique outperforms the most commonly used approach.

Acknowledgments. This work has received funding from the European Research Council (ERC Advanced Grant agreement No. 694665).

References

1. Bassett, D.S., Brown, J.A., Deshpande, V., Carlson, J.M., Grafton, S.T.: Conserved and variable architecture of human white matter connectivity. Neuroimage 54(2), 1262–1279 (2011)
2. Bassett, D.S., Wymbs, N.F., Rombach, M.P., Porter, M.A., Mucha, P.J., Grafton, S.T.: Task-based core-periphery organization of human brain dynamics. PLoS Comput. Biol. 9(9), e1003171 (2013)
3. Bullmore, E., Sporns, O.: Complex brain networks: graph theoretical analysis of structural and functional systems. Nat. Rev. Neurosci. 10(3), 186–198 (2009)
4. Desikan, R.S., Ségonne, F., Fischl, B., Quinn, B.T., Dickerson, B.C., Blacker, D., Buckner, R.L., Dale, A.M., Maguire, R.P., Hyman, B.T., Albert, M., Killiany, R.J.: An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage 31(3), 968–980 (2006)
5. Gong, G., He, Y., Concha, L., Lebel, C., Gross, D.W., Evans, A.C., Beaulieu, C.: Mapping anatomical connectivity patterns of human cerebral cortex using in vivo diffusion tensor imaging tractography. Cereb. Cortex 19(3), 524–536 (2009)
6. Sotiropoulos, S.N., Jbabdi, S., Xu, J., Andersson, J.L., Moeller, S., Auerbach, E.J., Glasser, M.F., Hernandez, M., Sapiro, G., Jenkinson, M., Feinberg, D.A., Yacoub, E., Lenglet, C., Van Essen, D.C., Ugurbil, K., Behrens, T.E.J.: Advances in diffusion MRI acquisition and processing in the Human Connectome Project. Neuroimage 80, 125–143 (2013)

Fiber Orientation Estimation Using Nonlocal and Local Information

Chuyang Ye(✉)

Brainnetome Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China
[email protected]

Abstract. Diffusion magnetic resonance imaging (dMRI) enables in vivo investigation of white matter tracts, where the estimation of fiber orientations (FOs) is a crucial step. Dictionary-based methods have been developed to compute FOs with a lower number of dMRI acquisitions. To reduce the effect of noise that is inherent in dMRI acquisitions, spatial consistency of FOs between neighbor voxels has been incorporated into dictionary-based methods. Because many fiber tracts are tube- or sheet-shaped, voxels belonging to the same tract could share similar FO configurations even when they are not adjacent to each other. Therefore, it is possible to use nonlocal information to improve the performance of FO estimation. In this work, we propose an FO estimation algorithm, Fiber Orientation Reconstruction using Nonlocal and Local Information (FORNLI), which adds nonlocal information to guide FO computation. The diffusion signals are represented by a set of fixed prolate tensors. For each voxel, we compare its patch-based diffusion profile with those of the voxels in a search range, and its nonlocal reference voxels are determined as the k nearest neighbors in terms of diffusion profiles. Then, FOs are estimated by iteratively solving weighted ℓ1-norm regularized least squares problems, where the weights are determined using local neighbor voxels and nonlocal reference voxels. These weights encourage FOs that are consistent with the local and nonlocal information. FORNLI was performed on simulated and real brain dMRI, which demonstrates the benefit of incorporating nonlocal information for FO estimation.

Keywords: Diffusion MRI · FO estimation · Nonlocal information

1 Introduction

© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 97–105, 2016. DOI: 10.1007/978-3-319-46720-7_12

By capturing the anisotropy of water diffusion in tissue, diffusion magnetic resonance imaging (dMRI) enables in vivo investigation of white matter tracts. The fiber orientation (FO) is a crucial feature computed from dMRI, which plays an important role in fiber tracking [5]. Voxelwise FO estimation methods have been proposed and widely applied, such as constrained spherical deconvolution [16], multi-tensor models [9,13,17],

and ensemble average propagator methods [10]. In particular, to reduce the number of dMRI acquisitions required for resolving crossing fibers, sparsity assumption has been incorporated in the estimation problem. For example, it has been used in the multi-tensor framework [1,9,13], leading to dictionary-based FO estimation algorithms that have been shown to reconstruct FOs of good quality yet using a lower number of dMRI acquisitions [1]. Because of image noise that adversely affects FO estimation, the regularization of spatial consistency has been used in FO estimation problems. For example, smoothness of diffusion tensors and FOs has been used as regularization terms in the estimation in [12,15], respectively, but no sparsity regularization is introduced. Other methods incorporate both sparsity and smoothness assumption. For example, in [11,14] sparsity regularization is used together with the smoothness of diffusion images in a spherical ridgelets framework, where FO smoothness is enforced indirectly. More recently, [4,18] manage to directly encode spatial consistency of FOs between neighbor voxels with sparsity regularization in the multi-tensor models by using weighted ℓ1 -norm regularization, where FOs that are consistent with neighbors are encouraged. These methods have focused on the use of local information for robust FO estimation. However, because fiber tracts are usually tube-like or sheet-like [19], voxels that are not adjacent to each other can also share similar FO configurations. Thus, nonlocal information could further contribute to improved FO reconstruction by providing additional information. In this work, we propose an FO estimation algorithm that improves estimation quality by incorporating both nonlocal and local information, which is named Fiber Orientation Reconstruction using Nonlocal and Local Information (FORNLI). 
We use a dictionary-based FO estimation framework, where the diffusion signals are represented by a tensor basis so that sparsity regularization can be readily incorporated. We design an objective function that consists of data fidelity terms and weighted ℓ1 -norm regularization. The weights in the weighted ℓ1 -norm encourage spatial consistency of FOs and are here encoded by both local neighbors and nonlocal reference voxels. To determine the nonlocal reference voxels for each voxel, we compare its patch-based diffusion profile with those of the voxels in a search range, and select the k nearest neighbors in terms of diffusion profiles. FOs are estimated by minimizing the objective function, where weighted ℓ1 -norm regularized least squares problems are iteratively solved.

2 Methods

2.1 Background: A Signal Model with Sparsity and Smoothness Regularization

Sparsity regularization has been shown to improve FO estimation and reduce the number of gradient directions required for resolving crossing fibers [1]. A commonly used strategy to incorporate sparsity is to model the diffusion signals using a fixed basis. The prolate tensors have been a popular choice because of


their explicit relationship with FOs [1,9,13]. Specifically, let $\{D_i\}_{i=1}^{N}$ be a set of N fixed prolate tensors. The primary eigenvector (PEV) $v_i$ of each $D_i$ represents a possible FO and these PEVs are evenly distributed on the unit sphere. The eigenvalues of the basis tensors can be determined by examining the diffusion tensors in noncrossing tracts [9]. Then, the diffusion weighted signal $S_m(g_k)$ at voxel m associated with the gradient direction $g_k$ (k = 1, 2, ..., K) and b-value $b_k$ can be represented as

$$S_m(g_k) = S_m(0) \sum_{i=1}^{N} f_{m,i}\, e^{-b_k g_k^T D_i g_k} + n_m(g_k), \qquad (1)$$

where $S_m(0)$ is the baseline signal without diffusion weighting, $f_{m,i}$ is $D_i$'s unknown nonnegative mixture fraction ($\sum_{i=1}^{N} f_{m,i} = 1$), and $n_m(g_k)$ is noise. We define $y_m(g_k) = S_m(g_k)/S_m(0)$ and $\eta_m(g_k) = n_m(g_k)/S_m(0)$, and let $y_m = (y_m(g_1), y_m(g_2), \ldots, y_m(g_K))^T$ and $\eta_m = (\eta_m(g_1), \eta_m(g_2), \ldots, \eta_m(g_K))^T$. Then, Eq. (1) can be written as

$$y_m = G f_m + \eta_m, \qquad (2)$$

where G is a K × N dictionary matrix with $G_{ki} = e^{-b_k g_k^T D_i g_k}$, and $f_m = (f_{m,1}, f_{m,2}, \ldots, f_{m,N})^T$. Based on the assumption that at each voxel the number of FOs is small with respect to the number of gradient directions, the mixture fractions can be estimated using a voxelwise sparse reconstruction formulation

$$\hat{f}_m = \arg\min_{f_m \ge 0,\ \|f_m\|_1 = 1} \|G f_m - y_m\|_2^2 + \beta \|f_m\|_0. \qquad (3)$$
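As an illustration of Eqs. (1) and (2), the dictionary matrix G can be assembled directly from the basis tensors and the gradient table. The sketch below is not the author's implementation: the eigenvalues of the prolate tensors, the number of basis directions, and the use of random (rather than evenly distributed) directions are assumptions made only to keep the example self-contained.

```python
import numpy as np

def prolate_tensor(v, lam_par=2.0e-3, lam_perp=5.0e-4):
    """Prolate tensor with primary eigenvector v (eigenvalues are assumed values in mm^2/s)."""
    v = v / np.linalg.norm(v)
    return lam_perp * np.eye(3) + (lam_par - lam_perp) * np.outer(v, v)

def dictionary_matrix(gradients, bvals, basis_dirs):
    """G[k, i] = exp(-b_k * g_k^T D_i g_k), as in Eq. (2)."""
    tensors = np.stack([prolate_tensor(v) for v in basis_dirs])       # (N, 3, 3)
    quad = np.einsum("kx,ixy,ky->ki", gradients, tensors, gradients)  # g_k^T D_i g_k
    return np.exp(-bvals[:, None] * quad)

rng = np.random.default_rng(0)

def random_unit_vectors(n):
    x = rng.normal(size=(n, 3))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

gradients = random_unit_vectors(30)    # K = 30 gradient directions
bvals = np.full(30, 1000.0)            # b = 1000 s/mm^2
basis_dirs = random_unit_vectors(60)   # N = 60 basis directions (random here)
G = dictionary_matrix(gradients, bvals, basis_dirs)

f_true = np.zeros(60)
f_true[[3, 17]] = 0.5                  # two crossing fibers with equal fractions
y = G @ f_true                         # noise-free normalized signal y_m
```

In practice the basis directions would be spread evenly over the sphere, as the text describes, and the eigenvalues would be taken from noncrossing tracts [9].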

In practice, the constraint of $\|f_m\|_1 = 1$ is usually relaxed, and the sparse reconstruction can be either solved directly [8] or by approximating the ℓ0-norm with the ℓ1-norm [1,9,13]. Basis directions corresponding to nonzero mixture fractions are determined as FOs. To further incorporate spatial coherence of FOs, weighted ℓ1-norm regularization has been introduced into dictionary-based FO estimation [4,18]. For example, in [18] FOs in all voxels are jointly estimated by solving

$$\{\hat{f}_m\}_{m=1}^{M} = \arg\min_{f_1, f_2, \ldots, f_M \ge 0} \sum_{m=1}^{M} \|G f_m - y_m\|_2^2 + \beta \|C_m f_m\|_1, \qquad (4)$$

where M is the number of voxels and $C_m$ is a diagonal matrix that encodes neighbor interaction. It places smaller penalties on mixture fractions associated with basis directions that are more consistent with neighbor FOs so that these mixture fractions are more likely to be positive and their associated basis directions are thus encouraged.

2.2 FO Estimation Incorporating Nonlocal Information

In image denoising or segmentation problems, nonlocal information has been used to improve the performance [3,6]. In FO estimation, because fiber tracts


are usually tube-shaped (e.g., the cingulum bundle) or sheet-shaped (e.g., the corpus callosum) [19], voxels that are not adjacent to each other can still have similar FO patterns, and it is possible to use nonlocal information to improve the estimation. We choose to use a weighted ℓ1-norm regularized FO estimation framework similar to Eq. (4), and encode the weighting matrix $C_m$ using both nonlocal and local information.

Finding Nonlocal Reference Voxels. For each voxel m, the nonlocal information is extracted from a set $\mathcal{R}_m$ of voxels, which are called nonlocal reference voxels and should have diffusion profiles similar to that of m. To identify the nonlocal reference voxels for m, we compute patch-based dissimilarities between the voxel m and the voxels in a search range $\mathcal{S}_m$, as is common practice in nonlocal image processing [3,6]. Specifically, we choose a search range of an 11 × 11 × 11 cube [3] whose center is m. The patch at each voxel $n \in \mathcal{S}_m$ is formed by the diffusion tensors of its 6-connected neighbors and the diffusion tensor at n, which is represented as $\Delta_n = (\Delta_{n,1}, \ldots, \Delta_{n,7})$. We define the following patch-based diffusion dissimilarity between two voxels m and n

$$d_\Delta(\Delta_m, \Delta_n) = \frac{1}{7} \sum_{j=1}^{7} d(\Delta_{m,j}, \Delta_{n,j}), \qquad (5)$$

where d(·, ·) is the log-Euclidean tensor distance [2]

$$d(\Delta_{m,j}, \Delta_{n,j}) = \sqrt{\mathrm{Trace}\left(\{\log(\Delta_{m,j}) - \log(\Delta_{n,j})\}^2\right)}. \qquad (6)$$
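Eqs. (5) and (6) can be evaluated with an eigendecomposition-based matrix logarithm. The following is a minimal sketch under the assumption that every tensor is symmetric positive-definite; the function names and the toy patches are illustrative and do not come from the paper.

```python
import numpy as np

def spd_log(tensor):
    """Matrix logarithm of a symmetric positive-definite tensor."""
    w, v = np.linalg.eigh(tensor)
    return (v * np.log(w)) @ v.T

def log_euclidean_distance(a, b):
    """Eq. (6). For a symmetric matrix X, sqrt(Trace(X^2)) is its Frobenius norm."""
    return np.linalg.norm(spd_log(a) - spd_log(b), "fro")

def patch_dissimilarity(patch_m, patch_n):
    """Eq. (5): mean log-Euclidean distance over the 7 tensors of each patch."""
    return float(np.mean([log_euclidean_distance(a, b)
                          for a, b in zip(patch_m, patch_n)]))

# Toy patches: 7 isotropic tensors each, the second patch scaled by e.
patch_m = [np.eye(3) * 1.0e-3] * 7
patch_n = [np.eye(3) * 1.0e-3 * np.e] * 7
d = patch_dissimilarity(patch_m, patch_n)
```

For these toy patches the tensor logarithms differ by the identity matrix, so the dissimilarity equals the square root of 3.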

For each m we find its k nearest neighbors in terms of the diffusion dissimilarity in Eq. (5), and define them as the nonlocal reference voxels. k is a parameter to be specified by users. Note that although we call these reference voxels nonlocal, it is possible that $\mathcal{R}_m$ contains the neighbors of m as well, if they have very similar diffusion profiles to that of m. We used the implementation of k nearest neighbors in the scikit-learn toolkit¹ based on a ball tree search algorithm.

¹ http://scikit-learn.org/stable/modules/neighbors.html.

Guided FO Estimation. We seek to guide FO estimation using the local neighbors and nonlocal reference voxels. Like [18], we use a 26-connected neighborhood $\mathcal{N}_m$ of m. Then, the set of voxels guiding FO estimation at m is $\mathcal{G}_m = \mathcal{N}_m \cup \mathcal{R}_m$. Using $\mathcal{G}_m$, we extract a set of likely FOs for m to determine the weighting of basis directions and guide FO estimation. First, a voxel similarity between m and each voxel $n \in \mathcal{G}_m$ is defined

$$w(m, n) = \begin{cases} \exp\{-\mu d^2(D_m, D_n)\}, & \text{if } n \in \mathcal{N}_m \\ \exp\{-\mu d_\Delta^2(\Delta_m, \Delta_n)\}, & \text{otherwise} \end{cases}, \qquad (7)$$

where μ = 3.0 is a constant [18], and $D_m$ and $D_n$ are the diffusion tensors at m and n, respectively. When n is a neighbor of m, the voxel similarity is exactly the one defined in [18]; when n is not adjacent to m, the voxel similarity is defined using the patches $\Delta_m$ and $\Delta_n$. Second, suppose the FOs at a voxel n are $\{w_{n,j}\}_{j=1}^{W_n}$, where $W_n$ is the number of FOs at n. For each m we can compute the similarity between the basis direction $v_i$ and the FO configurations of the voxels in the guiding set $\mathcal{G}_m$

$$R_m(i) = \sum_{n \in \mathcal{G}_m} w(m, n) \max_{j=1,2,\ldots,W_n} |v_i \cdot w_{n,j}|, \quad i = 1, 2, \ldots, N. \qquad (8)$$
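Eqs. (7) and (8) amount to a weighted vote of the guiding voxels for each basis direction. The sketch below assumes the dissimilarities to the guiding voxels have already been computed and that every guiding voxel has at least one FO; the function names and the toy inputs are hypothetical.

```python
import numpy as np

def voxel_similarity(dissimilarity, mu=3.0):
    """Eq. (7): w(m, n) = exp(-mu * d^2), with d the tensor or patch dissimilarity."""
    return np.exp(-mu * np.asarray(dissimilarity) ** 2)

def aggregate_similarity(basis_dirs, guide_fos, guide_weights):
    """Eq. (8): R_m(i) = sum_n w(m, n) * max_j |v_i . w_{n,j}|.

    basis_dirs: (N, 3) unit vectors; guide_fos: list of (W_n, 3) arrays,
    one per guiding voxel; guide_weights: the matching w(m, n) values.
    """
    R = np.zeros(len(basis_dirs))
    for fos, w in zip(guide_fos, guide_weights):
        R += w * np.abs(basis_dirs @ fos.T).max(axis=1)
    return R

basis_dirs = np.eye(3)                                       # toy basis: x, y, z axes
guide_fos = [np.array([[1.0, 0.0, 0.0]]),                    # one FO along x
             np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])]   # crossing of x and y
weights = voxel_similarity([0.0, 0.5])                       # hypothetical dissimilarities
R = aggregate_similarity(basis_dirs, guide_fos, weights)
```

Here the x direction collects votes from both guiding voxels, the y direction only from the second one, and the z direction from none.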

When $v_i$ is aligned with the FOs in many voxels in the guiding set $\mathcal{G}_m$ and these voxels are similar to m, large $R_m(i)$ is observed, indicating that $v_i$ is likely to be an FO. Note that $R_m(i)$ is similar to the aggregate basis-neighbor similarity defined in [18]. Here we have replaced the neighborhood $\mathcal{N}_m$ in [18] with the guiding set $\mathcal{G}_m$ containing both local and nonlocal information. These $R_m(i)$ can then be plotted on the unit sphere according to their associated basis directions, and the basis directions with local maximal $R_m(i)$ are determined as likely FOs $\mathcal{U}_m = \{u_{m,p}\}_{p=1}^{U_m}$ ($U_m$ is the cardinality of $\mathcal{U}_m$) at m [18]. With the likely FOs $\mathcal{U}_m$, the diagonal entries of $C_m$ are specified as [18]

$$C_{m,i} = \frac{1 - \alpha \max_{p=1,2,\ldots,U_m} |v_i \cdot u_{m,p}|}{\min_{q=1,2,\ldots,N} \left(1 - \alpha \max_{p=1,2,\ldots,U_m} |v_q \cdot u_{m,p}|\right)}, \quad i = 1, 2, \ldots, N, \qquad (9)$$

where α is a constant controlling the influence of guiding voxels. Smaller weights are associated with basis directions closer to likely FOs, and these directions are encouraged. In this work, we set α = 0.8 as suggested by [18]. We estimate FOs in all voxels by minimizing the following objective function with weighted ℓ1-norm regularization,

$$E(f_1, f_2, \ldots, f_M) = \sum_{m=1}^{M} \|G f_m - y_m\|_2^2 + \frac{\beta}{W_m} \|C_m f_m\|_1, \qquad (10)$$

where $f_m \ge 0$ and β is a constant. Note that we assign smaller weights to the weighted ℓ1-norm when the number of FOs is larger, which in practice increases accuracy. In this work, we set β = 0.3, which is smaller than the one used in [18] because the number of gradient directions in the dMRI data is smaller than that in [18]. Because $C_m$ is a function of the unknown FOs, to solve Eq. (10) we iteratively solve $f_m$ sequentially. At iteration t, for each $f_m$ we have

$$\hat{f}_m^{t} = \arg\min_{f_m \ge 0} E(\hat{f}_1^{t}, \ldots, \hat{f}_{m-1}^{t}, f_m, \hat{f}_{m+1}^{t-1}, \ldots, \hat{f}_M^{t-1}) = \arg\min_{f_m \ge 0} \|G f_m - y_m\|_2^2 + \frac{\beta}{W_m^{t-1}} \|C_m^{t} f_m\|_1, \qquad (11)$$

which is a weighted Lasso problem that can be solved using the strategy in [17].
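To make the per-voxel update concrete, the sketch below computes the diagonal weights of Eq. (9) and then solves one subproblem of the form in Eq. (11). The solver is plain projected gradient descent, chosen here for brevity; it is not the strategy of [17]. It relies on the fact that for $f_m \ge 0$ and positive diagonal weights c, the weighted ℓ1-norm reduces to the linear term $c^T f_m$. The parameter `lam` stands for $\beta / W_m$, and the toy dictionary and signal are assumptions.

```python
import numpy as np

def weighting_diagonal(basis_dirs, likely_fos, alpha=0.8):
    """Eq. (9): weights normalized so the most FO-aligned direction gets weight 1."""
    alignment = np.abs(basis_dirs @ likely_fos.T).max(axis=1)  # max_p |v_i . u_{m,p}|
    raw = 1.0 - alpha * alignment
    return raw / raw.min()

def weighted_nonneg_lasso(G, y, c, lam, n_iter=500):
    """Minimize ||G f - y||_2^2 + lam * sum_i c_i f_i subject to f >= 0.

    Generic projected gradient descent (an assumption, not the method of [17]).
    """
    step = 1.0 / (2.0 * np.linalg.norm(G, 2) ** 2)  # inverse Lipschitz constant
    f = np.zeros(G.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * G.T @ (G @ f - y) + lam * c
        f = np.maximum(f - step * grad, 0.0)         # project onto f >= 0
    return f

basis_dirs = np.eye(3)                       # toy basis directions
likely_fos = np.array([[1.0, 0.0, 0.0]])     # one likely FO along x
c = weighting_diagonal(basis_dirs, likely_fos)
G = np.eye(3)                                # toy dictionary
y = np.array([1.0, 0.5, 0.05])
f_hat = weighted_nonneg_lasso(G, y, c, lam=0.1)
```

With the identity dictionary each coordinate decouples, so the fraction along the likely FO is barely penalized while the small third coefficient is driven to zero.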


Fig. 1. 3D rendering of the digital phantom.

3 Results

3.1 3D Digital Crossing Phantom

A 3D digital phantom (see Fig. 1) with the same tract geometries and diffusion properties used in [18] was created to simulate five tracts. Thirty gradient directions (b = 1000 s/mm²) were used to simulate the diffusion weighted images (DWIs). Rician noise was added to the DWIs. The signal-to-noise ratio (SNR) is 20 on the b0 image. FORNLI with k = 4 was applied to the phantom and compared with CSD [16], CFARI [9], and FORNI [18] using the FO error proposed in [18]. CSD and CFARI are voxelwise FO estimation methods, and FORNI incorporates neighbor information for FO estimation. We used the CSD implementation in the Dipy software², and implemented CFARI and FORNI using the parameters reported in [9,18], respectively. The errors over the entire phantom and in the regions with noncrossing or crossing tracts are plotted in Fig. 2(a), where FORNLI achieves the most accurate result. In addition, we compared the two best algorithms here, FORNI and FORNLI, using a paired Student's t-test. In all four cases, errors of FORNLI are significantly smaller than those of FORNI (p < 0.05), and the effect sizes (Cohen's d) are between 0.5 and 0.6.

Next, we studied the impact of the number of nonlocal reference voxels. Using different k, the errors in regions with noncrossing or crossing tracts are shown in Fig. 2(b). Note that k = 0 represents cases where only the local information from neighbors is used. Incorporation of nonlocal information improves the estimation quality, especially in the more complex regions with three crossing tracts. When k reaches four, the estimation accuracy becomes stable, so we will use k = 4 for the brain dMRI dataset.

² http://nipy.org/dipy/examples_built/reconst_csd.html.

Fig. 2. FO estimation errors. (a) Means and standard deviations of the FO errors of CSD, CFARI, FORNI, and FORNLI; (b) mean FORNLI FO errors with different numbers of nonlocal reference voxels in regions with noncrossing or crossing tracts.

3.2 Brain dMRI

We selected a random subject from the publicly available COBRE dataset [7]. The DWIs and b0 images were acquired on a 3T Siemens Trio scanner, where 30 gradient directions (b = 800 s/mm²) were used. The resolution is 2 mm isotropic. The SNR is about 20 on the b0 image. To evaluate FORNLI (with k = 4) and compare it with CSD, CFARI, and FORNI, we show the results in a region containing the crossing of the corpus callosum (CC) and the superior longitudinal fasciculus (SLF) in Fig. 3. We also show the results of FORNLI with k = 0, where no nonlocal information is used. By enforcing spatial consistency of FOs, FORNI and FORNLI improve the estimation of crossing FOs. In addition, in the orange box FORNLI (k = 4) achieves more consistent FO configurations than FORNI; and in the blue box, compared with FORNI and FORNLI (k = 0), FORNLI (k = 4) avoids the FO configurations in the upper-right voxels that seem to contradict the adjacent voxels by having sharp turning angles.

Fig. 3. FO estimation in the crossing regions of SLF and CC overlaid on the fractional anisotropy map. Note the highlighted region for comparison.

4 Conclusion

We have presented an FO estimation algorithm FORNLI which is guided by both local and nonlocal information. Results on simulated and real brain dMRI data demonstrate the benefit of the incorporation of nonlocal information for FO estimation.


References

1. Aranda, R., Ramirez-Manzanares, A., Rivera, M.: Sparse and adaptive diffusion dictionary (SADD) for recovering intra-voxel white matter structure. Med. Image Anal. 26(1), 243–255 (2015)
2. Arsigny, V., Fillard, P., Pennec, X., Ayache, N.: Log-Euclidean metrics for fast and simple calculus on diffusion tensors. Magn. Reson. Med. 56(2), 411–421 (2006)
3. Asman, A.J., Landman, B.A.: Non-local statistical label fusion for multi-atlas segmentation. Med. Image Anal. 17(2), 194–208 (2013)
4. Auría, A., Daducci, A., Thiran, J.P., Wiaux, Y.: Structured sparsity for spatially coherent fibre orientation estimation in diffusion MRI. NeuroImage 115, 245–255 (2015)
5. Basser, P.J., Pajevic, S., Pierpaoli, C., Duda, J., Aldroubi, A.: In vivo fiber tractography using DT-MRI data. Magn. Reson. Med. 44(4), 625–632 (2000)
6. Buades, A., Coll, B., Morel, J.M.: A non-local algorithm for image denoising. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 60–65. IEEE (2005)
7. Cetin, M.S., Christensen, F., Abbott, C.C., Stephen, J.M., Mayer, A.R., Cañive, J.M., Bustillo, J.R., Pearlson, G.D., Calhoun, V.D.: Thalamus and posterior temporal lobe show greater inter-network connectivity at rest and across sensory paradigms in schizophrenia. NeuroImage 97, 117–126 (2014)
8. Daducci, A., Van De Ville, D., Thiran, J.P., Wiaux, Y.: Sparse regularization for fiber ODF reconstruction: from the suboptimality of ℓ2 and ℓ1 priors to ℓ0. Med. Image Anal. 18(6), 820–833 (2014)
9. Landman, B.A., Bogovic, J.A., Wan, H., ElShahaby, F.E.Z., Bazin, P.L., Prince, J.L.: Resolution of crossing fibers with constrained compressed sensing using diffusion tensor MRI. NeuroImage 59(3), 2175–2186 (2012)
10. Merlet, S.L., Deriche, R.: Continuous diffusion signal, EAP and ODF estimation via compressive sensing in diffusion MRI. Med. Image Anal. 17(5), 556–572 (2013)
11. Michailovich, O., Rathi, Y., Dolui, S.: Spatially regularized compressed sensing for high angular resolution diffusion imaging. IEEE Trans. Med. Imaging 30(5), 1100–1115 (2011)
12. Pasternak, O., Assaf, Y., Intrator, N., Sochen, N.: Variational multiple-tensor fitting of fiber-ambiguous diffusion-weighted magnetic resonance imaging voxels. Magn. Reson. Imaging 26(8), 1133–1144 (2008)
13. Ramirez-Manzanares, A., Rivera, M., Vemuri, B.C., Carney, P., Mareci, T.: Diffusion basis functions decomposition for estimating white matter intravoxel fiber geometry. IEEE Trans. Med. Imaging 26(8), 1091–1102 (2007)
14. Rathi, Y., Michailovich, O., Laun, F., Setsompop, K., Grant, P.E., Westin, C.F.: Multi-shell diffusion signal recovery from sparse measurements. Med. Image Anal. 18(7), 1143–1156 (2014)
15. Reisert, M., Kiselev, V.G.: Fiber continuity: an anisotropic prior for ODF estimation. IEEE Trans. Med. Imaging 30(6), 1274–1283 (2011)
16. Tournier, J.D., Calamante, F., Connelly, A.: Robust determination of the fibre orientation distribution in diffusion MRI: non-negativity constrained super-resolved spherical deconvolution. NeuroImage 35(4), 1459–1472 (2007)
17. Ye, C., Murano, E., Stone, M., Prince, J.L.: A Bayesian approach to distinguishing interdigitated tongue muscles from limited diffusion magnetic resonance imaging. Comput. Med. Imaging Graph. 45, 63–74 (2015)
18. Ye, C., Zhuo, J., Gullapalli, R.P., Prince, J.L.: Estimation of fiber orientations using neighborhood information. Med. Image Anal. 32, 243–256 (2016)
19. Yushkevich, P.A., Zhang, H., Simon, T.J., Gee, J.C.: Structure-specific statistical mapping of white matter tracts. NeuroImage 41(2), 448–461 (2008)

Reveal Consistent Spatial-Temporal Patterns from Dynamic Functional Connectivity for Autism Spectrum Disorder Identification

Yingying Zhu1, Xiaofeng Zhu1, Han Zhang1, Wei Gao2, Dinggang Shen1, and Guorong Wu1(✉)

1 Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, USA
[email protected]
2 Biomedical Imaging Research Institute, Department of Biomedical Sciences and Imaging, Cedars-Sinai Medical Center, Los Angeles, USA

Abstract. Functional magnetic resonance imaging (fMRI) provides a noninvasive way to investigate brain activity. Recently, convergent evidence shows that the correlations of spontaneous fluctuations between two distinct brain regions dynamically change even in resting state, due to the condition-dependent nature of brain activity. Thus, quantifying the patterns of functional connectivity (FC) in a short time period and the changes of FC over time can potentially provide valuable insight into both individual-based diagnosis and group comparison. In light of this, we propose a novel computational method to robustly estimate both static and dynamic spatial-temporal connectivity patterns from the observed noisy signals of an individual subject. We achieve this goal in two parts: (1) Construct static functional connectivity across brain regions. Due to the low signal-to-noise ratio induced by possible non-neural noise, the estimated FC strength is very sensitive and it is hard to define a good threshold to distinguish between real and spurious connections. To alleviate this issue, we propose to optimize FC so that it is in consensus with not only the low level region-to-region signal correlations but also the similarity of high level principal connection patterns learned from the estimated link-to-link connections. Since the brain network is intrinsically sparse, we also encourage sparsity during FC optimization. (2) Characterize dynamic functional connectivity along time. It is hard to synchronize the estimated dynamic FC patterns and the real cognitive state changes, even using learning-based methods. To address these limitations, we further extend the above FC optimization method into the spatial-temporal domain by arranging the FC estimations along a set of overlapped sliding windows into a tensor structure as the window slides. Then we employ a low rank constraint in the temporal domain, assuming there are likely a small number of discrete states that the brain traverses during a short period of time. We applied the learned spatial-temporal patterns from fMRI images to identify autism subjects. Promising classification results have been achieved, suggesting high discrimination power and great potential in computer assisted diagnosis.

© Springer International Publishing AG 2016 S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 106–114, 2016. DOI: 10.1007/978-3-319-46720-7_13


1 Introduction

In general, resting-state functional connectivity is a set of pair-wise connectivity measurements, each of which describes the strength of co-activity between two regions in the human brain. In many group comparison studies, FC obtained from resting-state fMRI shows observable abnormal patterns in patient cohorts, which helps to understand different disease mechanisms. In clinical practice, FC is regarded as an important biomarker for disease diagnosis and monitoring in various clinical applications such as Alzheimer's disease [1] and Autism [2]. In current functional brain network studies, Pearson's correlation on BOLD (Blood Oxygen Level Dependent) signals is widely used to measure the strength of FC between two brain regions [2, 3]. It is worth noting that such a correlation based connectivity measure is exclusively calculated based on the observed BOLD signals and fixed for the subsequent data analysis. However, the BOLD signal usually has a very poor signal-to-noise ratio and is mixed with substantial non-neural noise and artefacts. Therefore, it is hard for current state-of-the-art methods to determine a good threshold of FC measure which can effectively distinguish real and spurious connections. For simplicity, many FC characterization methods assume that connectivity patterns in the brain do not change over the course of a resting-state fMRI scan. There is a growing consensus in the neuroimaging field, however, that the spontaneous fluctuations and correlations of signals between two distinct brain regions change in correspondence with cognitive states, even in a task-free environment [4]. Thus, dynamic FC patterns have been investigated recently, mainly by using the sliding window technique [4, 11–13]. However, it is very difficult to synchronize the estimated dynamic patterns with the real fluctuations of cognitive state, even using advanced machine learning techniques such as clustering [5] and hidden Markov models [6]. For example, both

Fig. 1. The advantage of our learning-based spatial-temporal FC optimization method (bottom) over the conventional method (top), which calculates the FC based on signal correlations. As the trajectory of FC at the amygdala (right) shows, the dynamic FC optimized by our learning-based method is more reasonable than that of the conventional correlation-based method.


methods have to determine the number of states (clusters), which might work well on the training data but may not generalize to unseen testing subjects. To address the above issues, we propose a novel data-driven solution to reveal the consistent spatial-temporal FC patterns from resting-state fMRI images. Our work has two parts. First, we present a robust learning-based method to optimize FC from the BOLD signals in a fixed sliding window. In order to avoid the unreliable calculation of FC based on signal correlations, a high level feature representation is necessary to guide the optimization of FC. Specifically, we apply singular value decomposition (SVD) to the tentatively estimated FC matrix and regard the top ranked eigenvectors as the high level network features which characterize the principal connection patterns across all brain regions. Thus, we can optimize functional connections for each brain region based on not only the observed region-to-region signal correlations but also the similarity between high level principal connection patterns. In turn, the refined FC can lead to a more reasonable estimation of principal connection patterns. Since the brain network is intrinsically economic and sparse, a sparsity constraint is used to control the number of connections during the joint estimation of principal connection patterns and the optimization of FC. Second, we further extend the above FC optimization framework from one sliding window (capturing the static FC patterns) to a set of overlapped sliding windows (capturing the dynamic FC patterns), as shown in the middle of Fig. 1. The key idea is that we arrange the FCs along time into a tensor structure (pink cube in Fig. 1) and employ an additional low rank constraint to penalize the oscillatory changes of FC in the temporal domain.
In this paper, we apply our learning-based method to find the spatial-temporal functional connectivity patterns for identifying childhood autism spectrum disorders (ASD). Compared with conventional approaches which simply calculate FC based on signal correlations, more accurate classification results have been achieved in classifying normal control (NC) and ASD subjects by using our learned spatial-temporal FC patterns.
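For orientation, the conventional baseline that the introduction contrasts against can be sketched in a few lines: Pearson correlation matrices computed over overlapping sliding windows and stacked into a tensor, followed by an SVD of one window to obtain principal connection patterns. This is not the proposed optimization, only the correlation-based starting point; the window length, stride, number of regions, and random signals are assumptions for illustration.

```python
import numpy as np

def sliding_window_fc(signals, window, stride):
    """Stack Pearson correlation matrices over overlapping windows.

    signals: (T, R) array of mean BOLD signals for R regions.
    Returns an (R, R, n_windows) tensor.
    """
    starts = range(0, signals.shape[0] - window + 1, stride)
    mats = [np.corrcoef(signals[s:s + window].T) for s in starts]
    return np.stack(mats, axis=-1)

rng = np.random.default_rng(0)
signals = rng.normal(size=(120, 10))   # hypothetical: 120 time points, 10 regions
fc = sliding_window_fc(signals, window=30, stride=10)

# Principal connection patterns of the first window, taken from its SVD.
u, s, _ = np.linalg.svd(fc[:, :, 0])
top_patterns = u[:, :3]
```

The resulting tensor has one correlation matrix per window; the method described in the text would refine these matrices jointly rather than use them as is.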

2 Method

2.1 Construct Robust Functional Connectivity

Let xi 2 λ1 > ... > λκ , which can be very time-consuming. Enhanced Dual Polytope Projection (EDPP) [10] is a highly efficient safe screening rules. Implementation details of EDPP is available on the GitHub: http://dpc-screening.github.io/lasso.html. To address the problem of data privacy, we propose a distributed Lasso screening rule, termed Distributed Enhanced Dual Polytope Projection (DEDPP), to identify and discard inactive features along a sequence of parameter values in a distributed manner. The idea of D-EDPP is similar to LQM. Specifically, to update the global variables, we apply LQM to query each local center for intermediate results–computed locally–and we aggregate them at global center. After obtaining the reduced matrix for each institution, we apply LQM to solve i , i = 1, ..., m. We assume that j the Lasso problem on the reduced data set A

340

Q. Li et al.

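Algorithm 1. Distributed Enhanced Dual Polytope Projection (D-EDPP)
Require: A set of data pairs $\{(A_1, y_1), (A_2, y_2), ..., (A_n, y_n)\}$, where the ith institution holds the data pair $(A_i, y_i)$. A sequence of parameters: $\lambda_{max} = \lambda_0 > \lambda_1 > ... > \lambda_\kappa$.
Ensure: The learnt models: $\{x^*(\lambda_0), x^*(\lambda_1), ..., x^*(\lambda_\kappa)\}$.
1: Perform the computation on n institutions. For the ith institution:
2: Let $R_i = A_i^T y_i$, compute $R = \sum_i^m R_i$ by LQM. Then we get $\lambda_{max}$ by $\|R\|_\infty$.
3: $J = \arg\max_j |R|$, $v_i = [A_i]_J$, where $[A_i]_J$ is the Jth column of $A_i$.
4: Let $\lambda_0 \in (0, \lambda_{max}]$ and $\lambda \in (0, \lambda_0]$.
5: $\theta_i(\lambda) = \begin{cases} y_i / \lambda_{max}, & \text{if } \lambda = \lambda_{max}, \\ (y_i - A_i x^*(\lambda)) / \lambda, & \text{if } \lambda \in (0, \lambda_{max}). \end{cases}$
6: $T_i = v_i^T y_i$, compute $T = \sum_i^m T_i$ by LQM.
7: $v_1(\lambda_0)_i = \begin{cases} \mathrm{sign}(T) \, v_i, & \text{if } \lambda_0 = \lambda_{max}, \\ y_i / \lambda_0 - \theta_i(\lambda_0), & \text{if } \lambda_0 \in (0, \lambda_{max}). \end{cases}$
8: $v_2(\lambda, \lambda_0)_i = y_i / \lambda - \theta_i(\lambda_0)$, $S_i = \|v_1(\lambda_0)_i\|_2^2$, compute $S = \sum_i^m S_i$ by LQM.
9: $v_2^\perp(\lambda, \lambda_0)_i = v_2(\lambda, \lambda_0)_i - v_1(\lambda_0)_i$.
10: Given a sequence of parameters $\lambda_{max} = \lambda_0 > \lambda_1 > ... > \lambda_\kappa$, for $k \in [1, \kappa]$, we make a prediction of screening on $\lambda_k$ if $x^*(\lambda_{k-1})$ is known:
11: for j = 1 to p do
12:   $w_i = [A_i]_j^T (\theta_i(\lambda_{k-1}) + \frac{1}{2} v_2^\perp(\lambda_k, \lambda_{k-1})_i)$, compute $w = \sum_i^m w_i$ by LQM.
13:   if $w < 1 - \frac{1}{2} \|v_2^\perp(\lambda_k, \lambda_{k-1})\|_2 \, \|[A]_j\|_2$ then
14:     We identify $[x^*(\lambda_k)]_j = 0$.
15: end for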

indicates the jth column in A, j = 1, ..., p, where p is the number of features. We summarize the proposed D-EDPP in Algorithm 1. To calculate R, we apply LQM by aggregating all the $R_i$ at the global center, $R = \sum_i^m R_i$, and sending R back to every institution. The same approach is used to calculate T, S and w in D-EDPP. The calculation of $\|[A]_j\|_2$ and $\|v_2^\perp(\lambda_k, \lambda_{k-1})\|_2$ follows the same way as in D-SAFE. The discarding result for $\lambda_k$ relies on the previous optimal solution $x^*(\lambda_{k-1})$. In particular, $\lambda_k$ equals $\lambda_{max}$ when k is zero. Thus, we identify all the elements of $x^*(\lambda_0)$ to be zero. When k is 1, we can perform screening based on $x^*(\lambda_0)$.
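The aggregation of R described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Spark implementation: the function names `local_correlation` and `aggregate_lambda_max` are hypothetical, the data are synthetic, and the three institution sizes simply mirror the 326/215/176 split used in the experiments. Each institution shares only the length-p vector $A_i^T y_i$, never its raw data.

```python
import numpy as np

def local_correlation(A_i, y_i):
    # Runs inside institution i: R_i = A_i^T y_i, a length-p vector.
    return A_i.T @ y_i

def aggregate_lambda_max(local_results):
    # Runs at the global center: R = sum_i R_i and lambda_max = ||R||_inf.
    R = np.sum(local_results, axis=0)
    return R, float(np.max(np.abs(R)))

rng = np.random.default_rng(0)
p = 50                      # number of features (synthetic, for illustration)
sizes = (326, 215, 176)     # subjects held by each institution
A_parts = [rng.standard_normal((n_i, p)) for n_i in sizes]
y_parts = [rng.standard_normal(n_i) for n_i in sizes]

R, lambda_max = aggregate_lambda_max(
    [local_correlation(A_i, y_i) for A_i, y_i in zip(A_parts, y_parts)]
)

# Reference value on pooled data, which a real deployment could not compute
# because the raw data never leave the institutions.
A_pooled = np.vstack(A_parts)
y_pooled = np.concatenate(y_parts)
lambda_max_pooled = float(np.max(np.abs(A_pooled.T @ y_pooled)))
print(np.isclose(lambda_max, lambda_max_pooled))
```

Because $A^T y = \sum_i A_i^T y_i$ when the rows of A are partitioned across institutions, the aggregated value agrees with the pooled one up to floating-point rounding.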

3.4 Local Query Model for Lasso

To further accelerate the learning process, we apply FISTA [1] to solve the Lasso problem in a distributed manner. The convergence rate of FISTA is $O(1/k^2)$, compared to $O(1/k)$ for ISTA, where k is the iteration number. We integrate FISTA with LQM (F-LQM) to solve the Lasso problem on the reduced matrix $\tilde{A}_i$. We summarize the updating rule of F-LQM in the kth iteration as follows:

Step 1: $\nabla g_i^k = \tilde{A}_i^T (\tilde{A}_i x^k - y_i)$, update $\nabla g^k = \sum_i^m \nabla g_i^k$ by LQM.
Step 2: $z^k = \Gamma_{\lambda t_k}(x^k - t_k \nabla g^k)$ and $t_{k+1} = \frac{1 + \sqrt{1 + 4 t_k^2}}{2}$.
Step 3: $x^{k+1} = z^k + \frac{t_k - 1}{t_{k+1}} (z^k - z^{k-1})$.

Large-Scale Collaborative Imaging Genetics Studies


The matrix $\tilde{A}_i$ denotes the reduced matrix for the ith institution obtained by the D-EDPP rule. We repeat this procedure until a satisfactory global model is obtained. Step 1 calculates $\nabla g_i^k$ from the local data $(\tilde{A}_i, y_i)$. Then, each institution performs LQM to get the gradient $\nabla g^k$ based on (5). Step 2 updates the auxiliary variable $z^k$ and the step size $t_k$. Step 3 updates the model x. Similar to LQM, the data privacy of the institutions is well preserved by F-LQM.
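The three steps above can be sketched in Python as follows. This is a minimal single-process simulation rather than the authors' implementation, and it makes several assumptions: the function names `soft_threshold` and `f_lqm` are hypothetical, the Lasso objective is taken to be $\frac{1}{2}\|Ax - y\|_2^2 + \lambda\|x\|_1$, and a fixed gradient step `1/L` is kept separate from the FISTA momentum parameter `t` (the text writes both as $t_k$). The constant `L` is an upper bound on the Lipschitz constant assembled from one scalar per institution.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 (the operator written as Gamma above).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def f_lqm(A_parts, y_parts, lam, n_iter=500):
    p = A_parts[0].shape[1]
    # Each institution reports ||A_i||_2^2; the sum bounds the largest
    # eigenvalue of sum_i A_i^T A_i, so 1/L is a safe step size.
    L = sum(np.linalg.norm(A_i, 2) ** 2 for A_i in A_parts)
    step = 1.0 / L
    x = np.zeros(p)        # extrapolated point x^k
    z_prev = np.zeros(p)   # previous proximal iterate z^{k-1}
    t = 1.0
    for _ in range(n_iter):
        # Step 1: local gradients, summed at the global center.
        grad = sum(A_i.T @ (A_i @ x - y_i) for A_i, y_i in zip(A_parts, y_parts))
        # Step 2: proximal gradient update and momentum parameter update.
        z = soft_threshold(x - step * grad, lam * step)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        # Step 3: extrapolation.
        x = z + ((t - 1.0) / t_next) * (z - z_prev)
        z_prev, t = z, t_next
    return z_prev

rng = np.random.default_rng(0)
p = 20
x_true = np.zeros(p)
x_true[:3] = [2.0, -1.5, 1.0]
A_parts = [rng.standard_normal((n_i, p)) for n_i in (326, 215, 176)]
y_parts = [A_i @ x_true + 0.1 * rng.standard_normal(A_i.shape[0]) for A_i in A_parts]

lambda_max = float(np.max(np.abs(sum(A_i.T @ y_i for A_i, y_i in zip(A_parts, y_parts)))))
x_hat = f_lqm(A_parts, y_parts, lam=0.1 * lambda_max)
```

Only gradients and scalar norms cross institution boundaries in this sketch, which is the property the text attributes to F-LQM.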

4 Experiment

We implement the proposed framework across three institutions on Apache Spark, a fast and efficient state-of-the-art distributed platform for large-scale data computing. Experiments show the efficiency and effectiveness of the proposed models.

4.1 Comparison of Lasso with and Without D-EDPP Rule

We choose the volume of the lateral ventricle as the response variable; after removing subjects without labels, the trials contain 717 subjects. The volumes of brain regions were extracted from each subject's T1 MRI scan using Freesurfer: http://freesurfer.net. We evaluate the efficiency of D-EDPP across three research institutions that maintain 326, 215, and 176 subjects, respectively. The subjects are stored as HDFS files. We solve the Lasso problem along a sequence of 100 parameter values equally spaced on the linear scale of λ/λmax from 1.00 to 0.05. We randomly select 0.1 million to 1 million features and solve each problem with F-LQM, since FISTA was proved to converge faster than ISTA [1]. We report the results in Fig. 2; D-EDPP+F-LQM achieved a speedup of about 66-fold compared to F-LQM alone.

[Figure 2: running time (in minutes) versus the number of features (in millions, 0.1 to 1) for D-EDPP+F-LQM and F-LQM, with speedup annotations ranging from x21 to x66.]

Fig. 2. Running time comparison of Lasso with and without D-EDPP rules.

4.2 Stability Selection for Top Risk Genetic Factors

We employ stability selection [6,11] with D-EDPP+F-LQM to select top risk SNPs from the entire GWAS with 5,906,152 features. We conduct four groups


Table 1. Top 5 selected risk SNPs associated with diagnosis and with the volumes of the hippocampus, entorhinal cortex, and lateral ventricle at baseline, based on ADNI.

Diagnosis at baseline
No.  Chr  Position    RS ID       Gene
1    19   45411941    rs429358    APOE
2    19   45410002    rs769449    APOE
3    12   9911736     rs3136564   CD69
4    1    172879023   rs2227203   unknown
5    20   58267891    rs6100558   PHACTR3

Hippocampus at baseline
No.  Chr  Position    RS ID       Gene
1    19   45411941    rs429358    APOE
2    8    145158607   rs34173062  SHARPIN
3    11   11317240    rs10831576  GALNT18
4    10   71969989    rs12412466  PPA1
5    6    168107162   rs71573413  unknown

Entorhinal cortex at baseline
No.  Chr  Position    RS ID       Gene
1    19   45411941    rs429358    APOE
2    15   89688115    rs8025377   ABHD2
3    Y    10070927    rs79584829  unknown
4    14   47506875    rs41354245  MDGA2
5    3    30106956    rs55904134  unknown

Lateral ventricle at baseline
No.  Chr  Position    RS ID       Gene
1    Y    3164319     rs2261174   unknown
2    10   62162053    rs10994327  ANK3
3    Y    13395084    rs62610496  unknown
4    1    77895410    rs2647521   AK5
5    1    114663751   rs2629810   SYT6

of trials in Table 1. In each trial, D-EDPP+F-LQM is carried out along a sequence of 100 parameter values on the linear scale from 1 to 0.05. We repeat this simulation 200 times, using 500 subjects in each round. Table 1 shows the top 5 selected SNPs. APOE, one of the top genetic risk factors for AD [5], is ranked #1 in three of the four groups.

Acknowledgments. This work was supported in part by NIH Big Data to Knowledge (BD2K) Center of Excellence grant U54 EB020403, funded by a cross-NIH consortium including NIBIB and NCI.

References

1. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
2. Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math. 57(11), 1413–1457 (2004)
3. Ghaoui, L.E., Viallon, V., Rabbani, T.: Safe feature elimination for the lasso and sparse supervised learning problems. arXiv preprint arXiv:1009.4219 (2010)
4. Harold, D., et al.: Genome-wide association study identifies variants at CLU and PICALM associated with Alzheimer's disease. Nature Genet. 41(10), 1088–1093 (2009)
5. Liu, C.C., Kanekiyo, T., Xu, H., Bu, G.: Apolipoprotein E and Alzheimer disease: risk, mechanisms and therapy. Nature Rev. Neurol. 9(2), 106–118 (2013)
6. Meinshausen, N., Bühlmann, P.: Stability selection. J. R. Stat. Soc. Series B (Stat. Methodol.) 72(4), 417–473 (2010)
7. Sasieni, P.D.: From genotypes to genes: doubling the sample size. Biometrics, 1253–1261 (1997)
8. Shalev-Shwartz, S., Tewari, A.: Stochastic methods for l1-regularized loss minimization. J. Mach. Learn. Res. 12, 1865–1892 (2011)
9. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Series B (Methodol.), 267–288 (1996)
10. Wang, J., Zhou, J., Wonka, P., Ye, J.: Lasso screening rules via dual polytope projection. In: Advances in Neural Information Processing Systems (2013)
11. Yang, T., et al.: Detecting genetic risk factors for Alzheimer's disease in whole genome sequence data via lasso screening. In: IEEE International Symposium on Biomedical Imaging, pp. 985–989 (2015)

Structured Sparse Low-Rank Regression Model for Brain-Wide and Genome-Wide Associations

Xiaofeng Zhu1, Heung-Il Suk2, Heng Huang3, and Dinggang Shen1(B)

1 Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, USA
[email protected]
2 Department of Brain and Cognitive Engineering, Korea University, Seoul, Republic of Korea
3 Computer Science and Engineering, University of Texas at Arlington, Arlington, USA

Abstract. With the advances in neuroimaging techniques and in the understanding of genome sequences, phenotype and genotype data have been utilized to study brain diseases (known as imaging genetics). One of the most important topics in imaging genetics is to discover the genetic basis of phenotypic markers and their associations. In such studies, linear regression models have played an important role by providing interpretable results. However, due to their modeling characteristics, they are limited in effectively utilizing the inherent information among the phenotypes and genotypes, which is helpful for better understanding their associations. In this work, we propose a structured sparse low-rank regression method to explicitly consider the correlations within the imaging phenotypes and the genotypes simultaneously for the Brain-Wide and Genome-Wide Association (BW-GWA) study. Specifically, we impose the low-rank constraint as well as the structured sparse constraint on both phenotypes and genotypes. Using the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, we conducted experiments on predicting the phenotype data from genotype data and achieved a performance improvement of 12.75 % on average in terms of the root-mean-square error over the state-of-the-art methods.

1 Introduction

Recently, it has been of great interest to identify the genetic basis (e.g., Single Nucleotide Polymorphisms: SNPs) of phenotypic neuroimaging markers (e.g., features in Magnetic Resonance Imaging: MRI) and study the associations between them, known as imaging-genetic analysis. In previous work, Vounou et al. categorized the association studies between neuroimaging phenotypes and genotypes into four classes depending on both the dimensionality of the phenotype being investigated and the size of the genomic regions being searched for association [13]. In this work, we focus on the Brain-Wide and Genome-Wide Association (BW-GWA) study, in which we search for non-random associations across both the whole brain and the entire genome.

© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 344–352, 2016. DOI: 10.1007/978-3-319-46720-7_40


The BW-GWA study has the potential to help discover important associations between neuroimaging-based phenotypic markers and genotypes from a different perspective. For example, by identifying strong associations between specific SNPs and some brain regions related to Alzheimer's Disease (AD), one can utilize the information of the corresponding SNPs to predict the risk of incident AD much earlier, even before pathological changes begin. This will give clinicians more time to track the progress of AD and find potential treatments to prevent AD.

Due to the high-dimensional nature of brain phenotypes and genotypes, there have been only a few studies on BW-GWA [3,8]. Conventional methods formulated the problem as Multi-output Linear Regression (MLR) and estimated the coefficients independently, thus resulting in unsatisfactory performance. Recent studies were mostly devoted to dimensionality reduction that keeps the final results interpretable. For example, Stein et al. [8] and Vounou et al. [13] employed the t-test and sparse reduced-rank regression, respectively, to conduct association studies between voxel-based neuroimaging phenotypes and SNP genotypes.

In this paper, we propose a novel structured sparse low-rank regression model for the BW-GWA study, with MRI features of the whole brain as phenotypes and SNPs as genotypes. To do this, we first impose a low-rank constraint on the coefficient matrix of the MLR. With a low-rank constraint, the coefficient matrix can be viewed as the product of two low-rank matrices, i.e., two transformation subspaces, which separately transfer the high-dimensional phenotypes and genotypes into their own low-rank representations by considering the correlations among the response variables and the features. We then introduce a structured sparsity-inducing penalty (i.e., an ℓ2,1-norm regularizer) on each of the transformation matrices to conduct biomarker selection on both phenotypes and genotypes by taking the correlations among the features into account. The structured sparsity constraint allows the low-rank regression to select highly predictive genotypes and phenotypes, as a large number of them are not expected to be important and involved in the BW-GWA study [14]. In this way, our new method integrates the low-rank constraint with structured sparsity constraints in a unified framework.

We apply the proposed method to study the genotype-phenotype associations using the Alzheimer's Disease Neuroimaging Initiative (ADNI) data. Our experimental results show that our new model consistently outperforms the competing methods in terms of prediction accuracy.

2 Methodology

2.1 Notations

In this paper, we denote matrices, vectors, and scalars as boldface uppercase letters, boldface lowercase letters, and normal italic letters, respectively. For a matrix $\mathbf{X} = [x_{ij}]$, its i-th row and j-th column are denoted as $\mathbf{x}^i$ and $\mathbf{x}_j$, respectively. Also, we denote the Frobenius norm and the ℓ2,1-norm of a matrix $\mathbf{X}$ as $\|\mathbf{X}\|_F = \sqrt{\sum_i \|\mathbf{x}^i\|_2^2} = \sqrt{\sum_j \|\mathbf{x}_j\|_2^2}$ and $\|\mathbf{X}\|_{2,1} = \sum_i \|\mathbf{x}^i\|_2$, respectively.
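The two norms defined above can be illustrated with a short Python sketch. The helper names `frobenius_norm` and `l21_norm` and the small example matrix are assumptions made for illustration only.

```python
import numpy as np

def frobenius_norm(X):
    # sqrt of the sum of squared l2-norms of the rows of X
    return np.sqrt(np.sum(np.linalg.norm(X, axis=1) ** 2))

def l21_norm(X):
    # sum of the l2-norms of the rows of X
    return np.sum(np.linalg.norm(X, axis=1))

X = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0]])

print(frobenius_norm(X))  # sqrt(25 + 0 + 1) = sqrt(26)
print(l21_norm(X))        # 5 + 0 + 1 = 6.0
```

An all-zero row contributes nothing to the ℓ2,1-norm, which is why penalizing this norm encourages entire rows to vanish.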


We further denote the transpose operator, the trace operator, the rank, and the inverse of a matrix $\mathbf{X}$ as $\mathbf{X}^T$, $\mathrm{tr}(\mathbf{X})$, $\mathrm{rank}(\mathbf{X})$, and $\mathbf{X}^{-1}$, respectively.

2.2 Low-Rank Multi-output Linear Regression

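We denote $\mathbf{X} \in \mathbb{R}^{n \times d}$ and $\mathbf{Y} \in \mathbb{R}^{n \times c}$ as the matrices of n samples of d SNPs and c MRI features, respectively. We assume that there exists a linear relationship between them and thus formulate it as follows:

$$\mathbf{Y} = \mathbf{X}\mathbf{W} + \mathbf{e}\mathbf{b} \qquad (1)$$

where $\mathbf{W} \in \mathbb{R}^{d \times c}$ is a coefficient matrix, $\mathbf{b} \in \mathbb{R}^{1 \times c}$ is a bias term, and $\mathbf{e} \in \mathbb{R}^{n \times 1}$ denotes a column vector with all ones. If the covariance matrix $\mathbf{X}^T\mathbf{X}$ has full rank, i.e., $\mathrm{rank}(\mathbf{X}^T\mathbf{X}) = d$, the solution of $\mathbf{W}$ in Eq. (1) can be obtained by the Ordinary Least Square (OLS) estimation [4] as:

$$\hat{\mathbf{W}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T(\mathbf{Y} - \mathbf{e}\mathbf{b}). \qquad (2)$$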

However, the MLR illustrated in Fig. 1(a) with the OLS estimation in Eq. (2) has at least two limitations. First, Eq. (2) is equivalent to fitting mass-univariate linear models, i.e., fitting each of the c univariate response variables independently. This obviously does not make use of possible relations among the response variables (i.e., ROIs). Second, neither X nor Y in MLR is guaranteed to have full rank, due to noise, outliers, and correlations in the data [13]. For the non-full-rank (or low-rank) case of $\mathbf{X}^T\mathbf{X}$, Eq. (2) is not applicable.
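Both limitations can be checked numerically. The following Python sketch uses synthetic data and assumes, for simplicity, that the bias b is known; all variable names are illustrative. It shows that Eq. (2) coincides with c separate univariate regressions, and that a matrix with fewer samples than SNPs cannot have full column rank, so that $\mathbf{X}^T\mathbf{X}$ is singular.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, c = 100, 5, 3
X = rng.standard_normal((n, d))
W_true = rng.standard_normal((d, c))
b = rng.standard_normal((1, c))
e = np.ones((n, 1))
Y = X @ W_true + e @ b + 0.01 * rng.standard_normal((n, c))

# Eq. (2): multi-output OLS, solved without forming the explicit inverse.
W_hat = np.linalg.solve(X.T @ X, X.T @ (Y - e @ b))

# The same coefficients from c independent univariate regressions.
Y_centered = Y - e @ b
W_cols = np.column_stack(
    [np.linalg.lstsq(X, Y_centered[:, k], rcond=None)[0] for k in range(c)]
)
print(np.allclose(W_hat, W_cols))

# With fewer samples than features, rank(X^T X) = rank(X) <= n < d,
# so the inverse in Eq. (2) does not exist.
X_wide = rng.standard_normal((3, d))
print(np.linalg.matrix_rank(X_wide), "<", d)
```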

[Figure 1: block diagrams of (a) multi-output linear regression, Y (n × c) = X (n × d) W (d × c) + E (n × c), and (b) low-rank regression, Y (n × c) = X (n × d) B (d × r) A^T (A is c × r) + E (n × c).]

Fig. 1. Illustration of multi-output linear regression and low-rank regression.

The principle of parsimony in many areas of science and engineering, especially in machine learning, justifies hypothesizing low-rankness of the data, i.e., the MRI phenotypes and the SNP genotypes in our work. The low-rankness leads to the inequality $\mathrm{rank}(\mathbf{W}) \le \min(d, c)$, or even $\mathrm{rank}(\mathbf{W}) \le \min(n, d, c)$ in the case of limited samples. It thus allows us to decompose the coefficient matrix $\mathbf{W}$ into the product of two low-rank matrices, i.e., $\mathbf{W} = \mathbf{B}\mathbf{A}^T$, where $\mathbf{B} \in \mathbb{R}^{d \times r}$, $\mathbf{A} \in \mathbb{R}^{c \times r}$, and r is the rank of $\mathbf{W}$. For a fixed r, the low-rank MLR model illustrated in Fig. 1(b) is formulated as:

$$\min_{\mathbf{A}, \mathbf{B}, \mathbf{b}} \|\mathbf{Y} - \mathbf{X}\mathbf{B}\mathbf{A}^T - \mathbf{e}\mathbf{b}\|_F^2. \qquad (3)$$
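To make the factorization $\mathbf{W} = \mathbf{B}\mathbf{A}^T$ concrete, the following Python sketch fits Eq. (3) using the classical closed-form solution of unregularized reduced-rank regression: project the OLS fit onto the top r right singular vectors of the fitted values. This is a textbook construction offered as an illustration under simplifying assumptions (bias term omitted, X of full column rank, synthetic data), not the optimization procedure of this paper; the function name `reduced_rank_fit` is hypothetical.

```python
import numpy as np

def reduced_rank_fit(X, Y, r):
    # Full-rank OLS coefficients (d x c).
    W_ols = np.linalg.lstsq(X, Y, rcond=None)[0]
    # Right singular vectors of the fitted values X @ W_ols.
    _, _, Vt = np.linalg.svd(X @ W_ols, full_matrices=False)
    A = Vt[:r].T       # c x r, with orthonormal columns (A^T A = I)
    B = W_ols @ A      # d x r
    return B, A

rng = np.random.default_rng(0)
n, d, c, r = 200, 10, 6, 2
X = rng.standard_normal((n, d))
W_true = rng.standard_normal((d, r)) @ rng.standard_normal((r, c))  # rank-r truth
Y = X @ W_true + 0.1 * rng.standard_normal((n, c))

B, A = reduced_rank_fit(X, Y, r)
W = B @ A.T
print(B.shape, A.shape, np.linalg.matrix_rank(W, tol=1e-8))
```

In this construction A has orthonormal columns, and the number of free parameters drops from d·c to r·(d + c).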


The assumption that latent factors exist in either phenotypes or genotypes has been reported to help imaging-genetic analysis obtain accurate estimates [1,15]. Equation (3) seeks low-rank representations of the phenotypes and genotypes, but it does not produce interpretable results, nor does it address the issues of a non-invertible $\mathbf{X}^T\mathbf{X}$ and over-fitting. Naturally, a regularizer is preferred.

2.3 Structured Sparse Low-Rank Multi-output Linear Regression

From a statistical point of view, a well-defined regularizer may produce a generalized solution and thus result in stable estimation. In this section, we devise new regularizers for identifying statistically interpretable BW-GWA.

The high-dimensional feature matrix often suffers from multi-collinearity, i.e., lack of orthogonality among features, which may lead to the singularity problem and the inflation of the variance of the coefficients [13]. In order to circumvent this problem, we introduce an orthogonality constraint on A to Eq. (3).

In the BW-GWA study, there are a large number of SNP genotypes and MRI phenotypes, some of which may not be related to the association between them. These irrelevant SNP genotypes (or MRI phenotypes) may affect the extraction of the r latent factors of X (or Y). In these cases, it is not known with certainty which quantitative phenotypes or genotypes provide good estimation for the model. As the human brain is a complex system, brain regions may depend on each other [3,14]. This motivates us to conduct feature selection via structured sparsity constraints on both X (i.e., SNPs) and Y (i.e., brain regions) while conducting subspace learning via the low-rank constraint. The rationale for using a structured sparsity constraint (e.g., an ℓ2,1-norm regularizer on A, i.e., $\|\mathbf{A}\|_{2,1}$) is that it effectively selects highly predictive features (i.e., discards the unimportant features from the model) by considering the correlations among the features. Such a process amounts to extracting latent vectors from 'purified data' (i.e., the data after removing irrelevant features by feature selection), or to conducting feature selection with the help of the low-rank constraint. By applying the constraints of orthogonality and structured sparsity, Eq. (3) can be rewritten as follows:

$$\min_{\mathbf{A}, \mathbf{B}, \mathbf{b}, r} \|\mathbf{Y} - \mathbf{X}\mathbf{B}\mathbf{A}^T - \mathbf{e}\mathbf{b}\|_F^2 + \alpha\|\mathbf{B}\|_{2,1} + \beta\|\mathbf{A}\|_{2,1}, \quad \text{s.t. } \mathbf{A}^T\mathbf{A} = \mathbf{I}. \qquad (4)$$

Clearly, the ℓ2,1-norm regularizers on B and A penalize the coefficients of B and A in a row-wise manner for joint selection or un-selection of the features and the response variables, respectively. Sparse Reduced-Rank Regression (RRR) [13] exploits ℓ1-norm regularization terms on B and A to sequentially output one vector of either B or A at a time, which leads to suboptimal solutions of B and A. In contrast, our method penalizes the ℓ2,1-norms of $\mathbf{B}\mathbf{A}^T$ and A to explicitly conduct feature selection on X and Y. Furthermore, the orthogonality constraint on A helps avoid the multi-collinearity problem and simplifies the objective function so that only B (instead of $\mathbf{B}\mathbf{A}^T$) and A are optimized: since $\mathbf{A}^T\mathbf{A} = \mathbf{I}$, each row of $\mathbf{B}\mathbf{A}^T$ has the same ℓ2-norm as the corresponding row of B, so $\|\mathbf{B}\mathbf{A}^T\|_{2,1} = \|\mathbf{B}\|_{2,1}$.


Finally, after optimizing Eq. (4), we conduct feature selection by discarding the features (or the response variables) whose corresponding rows in B (or A) are all zeros.
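As an illustration of how the objective in Eq. (4) and the final row-wise selection can be evaluated, consider the Python sketch below. It does not implement the optimization itself, which is not described in this section; the helper names `objective` and `selected_rows`, the tolerance `tol`, and the hand-built matrices B and A are all assumptions made for the example.

```python
import numpy as np

def l21_norm(M):
    return np.sum(np.linalg.norm(M, axis=1))

def objective(X, Y, B, A, b, alpha, beta):
    # Value of Eq. (4) for given B, A, b (the constraint A^T A = I is not enforced here).
    e = np.ones((X.shape[0], 1))
    residual = Y - X @ B @ A.T - e @ b
    return np.linalg.norm(residual, "fro") ** 2 + alpha * l21_norm(B) + beta * l21_norm(A)

def selected_rows(M, tol=1e-8):
    # Indices of rows that are not (numerically) all zero.
    return np.flatnonzero(np.linalg.norm(M, axis=1) > tol)

# Hand-built example: d = 5 SNPs, c = 4 ROIs, rank r = 2.
B = np.array([[1.0, 2.0], [0.0, 0.0], [0.5, -1.0], [0.0, 0.0], [0.0, 0.0]])
A = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # A^T A = I
b = np.array([[0.1, -0.2, 0.3, 0.0]])

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 5))
Y = X @ B @ A.T + np.ones((30, 1)) @ b   # noise-free responses

print(selected_rows(B))  # SNP indices kept: rows 0 and 2
print(selected_rows(A))  # ROI indices kept: rows 0 and 2
print(np.isclose(l21_norm(B @ A.T), l21_norm(B)))
```

With noise-free responses the residual term vanishes, so the objective reduces to the two penalty terms, and the last line checks numerically that the ℓ2,1-norm of $\mathbf{B}\mathbf{A}^T$ equals that of B when A has orthonormal columns.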

3 Experimental Analysis

We conducted various experiments on the ADNI dataset ('www.adni-info.org') by comparing the proposed method with the state-of-the-art methods.

3.1 Preprocessing and Feature Extraction

Following the literature [9,11,20], we used baseline MRI images of 737 subjects, including 171 AD, 362 mild cognitive impairment, and 204 normal control subjects. We preprocessed the MRI images by sequentially applying spatial distortion correction, skull-stripping, and cerebellum removal. We then segmented the images into gray matter, white matter, and cerebrospinal fluid, and further warped them into 93 Regions Of Interest (ROIs). We computed the gray matter tissue volume in each ROI by integrating the gray matter segmentation result of each subject. Finally, we acquired 93 features for each MRI image.

The genotype data of all participants were first obtained from ADNI 1 and then genotyped using the Human 610-Quad BeadChip. In our experiments, 2,098 SNPs, from 153 AD candidate genes (boundary: 20 KB) listed on the AlzGene database (www.alzgene.org) as of 4/18/2011, were selected by the standard quality control (QC) and imputation steps. The QC criteria include (1) call rate check per subject and per SNP marker, (2) gender check, (3) sibling pair identification, (4) the Hardy-Weinberg equilibrium test, (5) marker removal by the minor allele frequency, and (6) population stratification. The imputation step imputed the QC'ed SNPs using the MaCH software.

3.2 Experimental Setting

The comparison methods include the standard regularized Multi-output Linear Regression (MLR) [4], sparse feature selection with an ℓ2,1-norm regularizer (L21 for short) [2], Group sparse Feature Selection (GFS) [14], sparse Canonical Correlation Analysis (CCA) [6,17], and sparse Reduced-Rank Regression (RRR) [13]. The former two are the most widely used methods in both statistical learning and medical image analysis, while the last three are the state-of-the-art methods in imaging-genetic analysis. Besides, we define the method 'Baseline' by removing the third term (i.e., $\beta\|\mathbf{A}\|_{2,1}$) in Eq. (4) to only select SNPs using our model.

We conducted a 5-fold Cross Validation (CV) on all methods, and then repeated the whole process 10 times. The final result was computed by averaging the results of all 50 experiments. We also used a 5-fold nested CV to tune the parameters (such as α and β in Eq. (4)) in the space of $\{10^{-5}, 10^{-4}, ..., 10^{4}, 10^{5}\}$ for all methods in our experiments. As for the rank of the coefficient matrix W, we varied the value of r in {1, 2, ..., 10} for our method. Following previous work [3,14], we picked the top {20, 40, ..., 200} SNPs to predict the test data. The performance of each experiment was assessed by the Root-Mean-Square Error (RMSE), a widely used measurement for regression analysis, and by 'Frequency' (∈ [0, 1]), defined as the fraction of the 50 experiments in which a feature is selected. The larger the value of 'Frequency', the more likely the corresponding SNP (or ROI) is selected.
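The two evaluation measures can be sketched in Python as follows. This is a minimal illustration under assumptions: the RMSE is averaged over all entries of the prediction matrix (the exact normalization used in the experiments is not stated), and the function names and toy inputs are hypothetical.

```python
import numpy as np

def rmse(Y_true, Y_pred):
    # Root-mean-square error over all entries.
    return float(np.sqrt(np.mean((np.asarray(Y_true) - np.asarray(Y_pred)) ** 2)))

def selection_frequency(selected_per_run, n_features):
    # Fraction of runs in which each feature index was selected.
    counts = np.zeros(n_features)
    for selected in selected_per_run:
        counts[np.unique(np.asarray(selected, dtype=int))] += 1
    return counts / len(selected_per_run)

# Toy example: 4 runs over 4 features (the paper uses 50 runs).
runs = [[0, 1], [0, 2], [0], [0, 1]]
freq = selection_frequency(runs, n_features=4)
print(freq)  # feature 0 is selected in every run, feature 3 in none

err = rmse([[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 6.0]])
print(err)   # one entry is off by 2, so sqrt(4 / 4) = 1.0
```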

3.3 Experimental Results

We summarized the RMSE performances of all methods in Fig. 2(a), where the mean and standard deviation of the RMSEs were obtained from the 50 (5-fold CV × 10 repetitions) experiments. Figure 2(b) and (c) showed, respectively, the values of 'Frequency' of the top 10 selected SNPs by the competing methods and the frequency of the top 10 selected ROIs by our method.

Figure 2(a) leads to the following observations: (i) The RMSE values of all methods decreased as the number of selected SNPs increased. In our experiments, using more SNPs led to better performance in the BW-GWA study. (ii) The proposed method obtained the best performance, followed by the Baseline, RRR, GFS, CCA, L21, and MLR. Specifically, our method improved by on average 12.75 % compared to the other competing methods. In the paired-sample t-test at 95 % confidence level, all p-values between the proposed method and the comparison methods were less than 0.00001. Moreover, our method was considerably more stable than the comparison methods. This clearly demonstrated the advantage of the proposed method, which integrates a low-rank constraint with structured sparsity constraints in a unified framework. (iii) The Baseline method improved by on average 8.26 % compared to the comparison

[Figure 2: (a) RMSE versus the number of selected SNPs (20 to 200) for MLR, L21, GFS, CCA, RRR, Baseline, and Proposed; (b) frequency of the top 10 selected SNPs (rs429358, rs11234495, rs7938033, rs10792820, rs7945931, rs2276346, rs6584307, rs1329600, rs17367504, rs10779339) for each method; (c) frequency of the top 10 selected ROIs (indexed 1 to 10).]

Fig. 2. (a) RMSE with respect to different number of selected SNPs of all methods; (b) Frequency of top 10 selected SNPs by all methods; and (c) Frequency of the top 10 selected ROIs by our method in our 50 experiments. The name of the ROIs (indexed from 1 to 10) are middle temporal gyrus left, perirhinal cortex left, temporal pole left, middle temporal gyrus right, amygdala right, hippocampal formation right, middle temporal gyrus left, amygdala left, inferior temporal gyrus right, and hippocampal formation left.


methods and the p-values were less than 0.001 in the paired-sample t-tests at 95 % confidence level. This showed that our model without selecting ROIs (i.e., Baseline) still outperformed all comparison methods. It is noteworthy that our proposed method improved by on average 4.49 % over the Baseline method, and the paired-sample t-tests also indicated that the improvements were statistically significant. This verified again that it is essential to simultaneously select a subset of ROIs and a subset of SNPs.

Figure 2(b) indicated that phenotypes could be affected by genotypes to different degrees: (i) The selected SNPs in Fig. 2(b) belonged to genes such as PICALM, APOE, SORL1, ENTPD7, DAPK1, MTHFR, and CR1, which have been reported as the top AD-related genes on the AlzGene website. (ii) Although we know little about the underlying mechanisms of genotypes in relation to AD, Fig. 2(b) offers the potential to gain biological insights from the BW-GWA study. (iii) The ROIs selected by the proposed method in Fig. 2(c) were known to be highly related to AD in previous studies [10,12,19]. It is noteworthy that all methods selected the ROIs in Fig. 2(c) as their top ROIs, but with different probabilities.

Finally, our method conducted the BW-GWA study to select a subset of SNPs and a subset of ROIs, which were also known to be related to AD by the previous state-of-the-art methods. The consistent performance of our method clearly demonstrated that it enables a more statistically meaningful BW-GWA study than the comparison methods.

4 Conclusion

In this paper, we proposed an efficient structured sparse low-rank regression method to select highly associated MRI phenotypes and SNP genotypes in a BW-GWA study. The experimental results on the association study between neuroimaging data and genetic information verified the effectiveness of the proposed method in comparison with the state-of-the-art methods. Our method treated all SNPs (or ROIs) equally. However, SNPs are naturally connected via different pathways, while ROIs have various functional or structural relations to each other [6,7]. In our future work, we will extend our model to take the interlinked structures within both genotypes and incomplete multi-modality phenotypes [5,16,18] into account to further improve the performance of the BW-GWA study.

Acknowledgements. This work was supported in part by NIH grants (EB006733, EB008374, EB009634, MH100217, AG041721, AG042599). Heung-Il Suk was supported in part by Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. B0101-16-0307, Basic Software Research in Human-level Lifelong Machine Learning (Machine Learning Center)). Heng Huang was supported in part by NSF IIS 1117965, IIS 1302675, IIS 1344152, DBI 1356628, and NIH AG049371. Xiaofeng Zhu was supported in part by the National Natural Science Foundation of China under grants 61573270 and 61263035.


References

1. Du, L., et al.: A novel structure-aware sparse learning algorithm for brain imaging genetics. In: Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI 2014. LNCS, vol. 8675, pp. 329–336. Springer, Heidelberg (2014). doi:10.1007/978-3-319-10443-0_42
2. Evgeniou, A., Pontil, M.: Multi-task feature learning. NIPS 19, 41–48 (2007)
3. Hao, X., Yu, J., Zhang, D.: Identifying genetic associations with MRI-derived measures via tree-guided sparse learning. In: Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI 2014. LNCS, vol. 8674, pp. 757–764. Springer, Heidelberg (2014). doi:10.1007/978-3-319-10470-6_94
4. Izenman, A.J.: Reduced-rank regression for the multivariate linear model. J. Multivar. Anal. 5(2), 248–264 (1975)
5. Jin, Y., Wee, C.Y., Shi, F., Thung, K.H., Ni, D., Yap, P.T., Shen, D.: Identification of infants at high-risk for autism spectrum disorder using multiparameter multiscale white matter connectivity networks. Hum. Brain Mapp. 36(12), 4880–4896 (2015)
6. Lin, D., Cao, H., Calhoun, V.D., Wang, Y.P.: Sparse models for correlative and integrative analysis of imaging and genetic data. J. Neurosci. Methods 237, 69–78 (2014)
7. Shen, L., Thompson, P.M., Potkin, S.G., et al.: Genetic analysis of quantitative phenotypes in AD and MCI: imaging, cognition and biomarkers. Brain Imaging Behav. 8(2), 183–207 (2014)
8. Stein, J.L., Hua, X., Lee, S., Ho, A.J., Leow, A.D., Toga, A.W., Saykin, A.J., Shen, L., Foroud, T., Pankratz, N., et al.: Voxelwise genome-wide association study (vGWAS). NeuroImage 53(3), 1160–1174 (2010)
9. Suk, H., Lee, S., Shen, D.: Hierarchical feature representation and multimodal fusion with deep learning for AD/MCI diagnosis. NeuroImage 101, 569–582 (2014)
10. Suk, H., Wee, C., Lee, S., Shen, D.: State-space model with deep learning for functional dynamics estimation in resting-state fMRI. NeuroImage 129, 292–307 (2016)
11. Thung, K., Wee, C., Yap, P., Shen, D.: Neurodegenerative disease diagnosis using incomplete multi-modality data via matrix shrinkage and completion. NeuroImage 91, 386–400 (2014)
12. Thung, K.H., Wee, C.Y., Yap, P.T., Shen, D.: Identification of progressive mild cognitive impairment patients using incomplete longitudinal MRI scans. Brain Struct. Funct., 1–17 (2015)
13. Vounou, M., Nichols, T.E., Montana, G.: ADNI: discovering genetic associations with high-dimensional neuroimaging phenotypes: a sparse reduced-rank regression approach. NeuroImage 53(3), 1147–1159 (2010)
14. Wang, H., Nie, F., Huang, H., et al.: Identifying quantitative trait loci via group-sparse multitask regression and feature selection: an imaging genetics study of the ADNI cohort. Bioinformatics 28(2), 229–237 (2012)
15. Yan, J., Du, L., Kim, S., et al.: Transcriptome-guided amyloid imaging genetic analysis via a novel structured sparse learning algorithm. Bioinformatics 30(17), i564–i571 (2014)
16. Zhang, C., Qin, Y., Zhu, X., Zhang, J., Zhang, S.: Clustering-based missing value imputation for data preprocessing. In: IEEE International Conference on Industrial Informatics, pp. 1081–1086 (2006)
17. Zhu, X., Huang, Z., Shen, H.T., Cheng, J., Xu, C.: Dimensionality reduction by mixed kernel canonical correlation analysis. Pattern Recogn. 45(8), 3003–3016 (2012)
18. Zhu, X., Li, X., Zhang, S.: Block-row sparse multiview multilabel learning for image classification. IEEE Trans. Cybern. 46(2), 450–461 (2016)
19. Zhu, X., Suk, H.I., Lee, S.W., Shen, D.: Canonical feature selection for joint regression and multi-class identification in Alzheimer's disease diagnosis. Brain Imaging Behav., 1–11 (2015)
20. Zhu, X., Suk, H., Shen, D.: A novel matrix-similarity based loss function for joint regression and classification in AD diagnosis. NeuroImage 100, 91–105 (2014)

3D Ultrasonic Needle Tracking with a 1.5D Transducer Array for Guidance of Fetal Interventions

Wenfeng Xia1(B), Simeon J. West2, Jean-Martial Mari3, Sebastien Ourselin4, Anna L. David5, and Adrien E. Desjardins1

1 Department of Medical Physics and Biomedical Engineering, University College London, Gower Street, London WC1E 6BT, UK
[email protected]
2 Department of Anaesthesia, Main Theatres, Maple Bridge Link Corridor, Podium 3, University College Hospital, 235 Euston Road, London NW1 2BU, UK
3 GePaSud, University of French Polynesia, Faa'a 98702, French Polynesia
4 Translational Imaging Group, Centre for Medical Image Computing, Department of Medical Physics and Biomedical Engineering, University College London, Wolfson House, London NW1 2HE, UK
5 Institute for Women's Health, University College London, 86-96 Chenies Mews, London WC1E 6HX, UK

Abstract. Ultrasound image guidance is widely used in minimally invasive procedures, including fetal surgery. In this context, maintaining visibility of medical devices is a significant challenge. Needles and catheters can readily deviate from the ultrasound imaging plane as they are inserted. When the medical device tips are not visible, they can damage critical structures, with potentially profound consequences including loss of pregnancy. In this study, we performed 3D ultrasonic tracking of a needle using a novel probe with a 1.5D array of transducer elements that was driven by a commercial ultrasound system. A fiber-optic hydrophone integrated into the needle received transmissions from the probe, and data from this sensor were processed to estimate the position of the hydrophone tip in the coordinate space of the probe. Golay coding was used to increase the signal-to-noise ratio (SNR). The relative tracking accuracy was better than 0.4 mm in all dimensions, as evaluated using a water phantom. To obtain a preliminary indication of the clinical potential of 3D ultrasonic needle tracking, an intravascular needle insertion was performed in an in vivo pregnant sheep model. The SNR values ranged from 12 to 16 at depths of 20 to 31 mm and at an insertion angle of 49° relative to the probe surface normal. The results of this study demonstrate that 3D ultrasonic needle tracking with a fiber-optic hydrophone sensor and a 1.5D array is feasible in clinically realistic environments.

1 Introduction

© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 353–361, 2016. DOI: 10.1007/978-3-319-46720-7_41

Ultrasound (US) image guidance is of crucial importance during percutaneous interventions in many clinical fields including fetal medicine, regional anesthesia, interventional pain management, and interventional oncology. Fetal interventions such as amniocentesis, chorionic villus sampling and fetal blood sampling are commonly performed under US guidance [1,2]. Two-dimensional (2D) US imaging is typically used to visualize anatomy and to identify the location of the needle tip. The latter is often challenging, however. One reason is that the needle tip can readily deviate from the US imaging plane, particularly with needle insertions at large depths. A second reason is that the needles tend to have poor echogenicity during large-angle insertions, as the incident US beams can be reflected outside the aperture of the external US imaging probe. In the context of fetal interventions, misplacement of the needle tip can result in severe complications, including the loss of pregnancy [2]. A number of methods have been proposed to improve needle tip visibility during US guidance, including the use of echogenic surfaces, which tend to be most relevant at steep insertion angles. However, a recent study on peripheral nerve blocks found that even with echogenic needles, tip visibility was lost in approximately 50% of the procedure time [3]. Other methods for improving needle tip visibility are based on the introduction of additional sources of image contrast, including shaft vibrations [4], acoustic radiation force imaging [5], Doppler imaging [6], and photoacoustic imaging [7]. Electromagnetic (EM) tracking has many advantages, but the accuracy of EM tracking can be severely degraded by EM field disturbances such as those arising from metal in tables [8], and the sensors integrated into needles tend to be bulky and expensive. A needle tracking method that is widely used in clinical practice has remained elusive. Ultrasonic needle tracking is an emerging method that has shown promise in terms of its accuracy and its compatibility with clinical workflow: positional information and ultrasound images can be acquired from the same probe.
With this method, there is ultrasonic communication between the external US imaging probe and the needle. One implementation involves integrating a miniature US sensor into the needle that receives transmissions from the imaging probe; the location of the needle tip can be estimated from the times between transmission onset and reception, which we refer to here as the “times-of-flight”. With their flexibility, small size, wide bandwidths, and low manufacturing costs, fiber-optic US sensors are ideally suited for this purpose [9–11]. Recently, ultrasonic tracking with coded excitation was performed in utero, in an in vivo ovine model [12]. A piezoelectric ring sensor has also been used [13]. In this study, we present a novel system for ultrasonic tracking that includes a 1.5D array of 128 US transducer elements to identify the needle tip position in three dimensions (3D). Whilst ultrasonic tracking can be performed with 3D US imaging probes, including those with 2D matrix arrays [14,15], the use of these probes in clinical practice is limited. Indeed, 3D imaging probes tend to be bulky and expensive, 2D matrix arrays are only available on a few high-end systems, and it can be challenging to interpret 3D image volumes acquired from complex tissue structures in real time. In contrast, the 1.5D array in this study is compatible with a standard commercial US system that drives 1D US imaging probes. We evaluated the relative tracking accuracy with a water phantom, and validated the system with an in vivo pregnant sheep model.

3D Ultrasonic Needle Tracking with a 1.5D Transducer Array

2 Materials and Methods

2.1 System Configuration

The ultrasonic tracking system was centered on a clinical US imaging system (SonixMDP, Analogic Ultrasound, Richmond, BC, Canada) that was operated in research mode (Fig. 1a). A custom 1.5D tracking probe, which comprised four linear rows of 32 transducer elements with a nominal bandwidth of 4–9 MHz (Fig. 1b), was produced by Vermon (Tours, France). This array was denoted as “1.5D” to reflect the much larger number of elements in one dimension than in the other. The US sensor was a fiber-optic hydrophone (FOH) that was integrated into the cannula of a 20 gauge spinal needle (Terumo, Surrey, UK). The FOH sensor (Precision Acoustics, Dorchester, UK) has a Fabry-Pérot cavity at the distal end, so that impinging ultrasound waves result in changes in optical reflectivity [16]. It was epoxied within the needle cannula so that its tip was flush with the bevel surface, and used to receive US transmissions from the tracking probe. Three transmission sequences were used for tracking. The first comprised bipolar pulses; the second and third, 32-bit Golay code pairs [17]. Transmissions were performed from individual transducer elements, sequentially across rows (Fig. 1b). The synchronization of data acquisition from the FOH sensor with US transmissions was presented in detail in Refs. [10,11]. Briefly, two output triggers were used: a frame trigger (FT) for the start of all 128 transmissions, and a line trigger (LT) for each transmission. The FOH sensor signal was digitized at 100 MS/s (USB-5132, National Instruments, Austin, TX). Transmissions from the ultrasonic tracking probe were controlled by a custom LabVIEW program operating on the ultrasound scanner PC, with access to low-level libraries.
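To illustrate why complementary code pairs are attractive for the coded transmissions described above, the following minimal sketch (Python with NumPy) builds a 32-bit Golay pair and checks that the two autocorrelations sum to a single peak with no side lobes. The recursive construction and the helper names `golay_pair` and `autocorr` are assumptions chosen for illustration; the specific codes of Ref. [17] and the authors' own implementation are not reproduced here.

```python
import numpy as np


def golay_pair(n_bits):
    """Return a complementary Golay pair (a, b) of length n_bits (a power of 2).

    Uses the standard doubling rule a' = [a, b], b' = [a, -b]; this is an
    illustrative construction, not necessarily the one used in the paper.
    """
    if n_bits < 1 or n_bits & (n_bits - 1):
        raise ValueError("n_bits must be a power of 2")
    a = np.array([1.0])
    b = np.array([1.0])
    while a.size < n_bits:
        a, b = np.concatenate([a, b]), np.concatenate([a, -b])
    return a, b


def autocorr(code):
    """Full autocorrelation, i.e. convolution with the time-reversed code."""
    return np.correlate(code, code, mode="full")


a, b = golay_pair(32)
# Decoding step: correlate each received code with itself and sum the pair.
summed = autocorr(a) + autocorr(b)
peak_index = int(np.argmax(summed))
```

In this idealized, noise-free setting the summed response is zero everywhere except at the central lag, where it equals 2 × 32 = 64. With uncorrelated noise, averaging over 64 chips would be expected to improve the amplitude SNR by a factor of 8, which is the figure quoted in the Results.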

Fig. 1. The 3D ultrasonic needle tracking system, shown schematically (a). The tracking probe was driven by a commercial ultrasound (US) scanner; transmissions from the probe were received by a fiber-optic hydrophone sensor at the needle tip. The transducer elements in the probe (b) were arranged in four rows (A–D).


Fig. 2. The algorithm to estimate the needle tip position from the sensor data is shown schematically (top). Representative data from all transducer elements obtained before Golay decoding (1) and after (2), show improvements in SNR relative to bipolar excitation (3). These three datasets are plotted on a linear scale as the absolute value of their Hilbert transforms, normalized separately to their maximum values.

2.2 Tracking Algorithms

The algorithm for converting raw FOH sensor data to a 3D needle tip position estimate is shown schematically in Fig. 2. It was implemented offline using custom scripts written in Matlab. First, band-pass frequency filtering matched to the bandwidth of the transducer elements of the tracking probe was performed (Chebyshev Type I; 5th order; 2–6 MHz). For Golay-coded transmissions, the frequency-filtered data from each pair of transmissions were convolved with the time-reversed versions of the oversampled Golay codes. As the final decoding step, these convolved data from each pair were summed. The decoded data were concatenated according to the rows of transducer elements from which the transmissions originated to form 4 tracking images. The 4 tracking images were processed to obtain an estimate of the needle tip position in the coordinate space of the tracking probe, $(\tilde{x}, \tilde{y}, \tilde{z})$. The horizontal coordinate of each tracking image was the transducer element number; the vertical coordinate, the distance from the corresponding transducer element. Typically, each tracking image comprised a single region of high signal amplitude. For the $k$th tracking image ($k = \{1, 2, 3, 4\}$), the coordinate of the image for which the signal was a maximum, $(h^{(k)}, v^{(k)})$, was identified. The $h^{(k)}$ values were consistent across tracking images (Fig. 2). Accordingly, $\tilde{y}$ was calculated as their mean, offset from center and scaled by the distance between transducer elements. To obtain $\tilde{x}$ and $\tilde{z}$, the measured times-of-flight $t_m^{(k)}$ were calculated as $v^{(k)}/c$, where $c$ is the speed of sound. The $t_m^{(k)}$ values were compared with a set of simulated time-of-flight values $t_s^{(k)}$. The latter were pre-computed at each point $(x_i, z_j)$ of a 2D grid in the X-Z coordinate space of the tracking probe, where $i$ and $j$ are indices. This grid had ranges of −20 to 20 mm in X and 0 to 80 mm in Z, with a spacing of 0.025 mm. For estimation, the squared differences between $t_m^{(k)}$ and $t_s^{(k)}$ were minimized:

$$(\tilde{x}, \tilde{z}) = \arg\min_{(x_i, z_j)} \left\{ \frac{\sum_{k=1}^{4} \left\{ \left[ t_m^{(k)} - t_s^{(k)}(x_i, z_j) \right] \cdot w^{(k)} \right\}^2}{\sum_{k=1}^{4} \left[ w^{(k)} \right]^2} \right\} \qquad (1)$$

where the signal amplitudes at the coordinates $(h^{(k)}, v^{(k)})$ were used as weighting factors, $w^{(k)}$, so that tracking images with higher signal amplitudes contributed more prominently.

2.3 Relative Tracking Accuracy

The relative tracking accuracy of the system was evaluated with a water phantom. The needle was fixed on a translation stage, with its shaft oriented to simulate an out-of-plane insertion: it was positioned within an X-Z plane with its tip approximately 38 mm in depth from the tracking probe, and angled at 45° to the water surface normal (Fig. 3a). The tracking probe was translated relative to the needle in the out-of-plane dimension, X. This translation was performed across 20 mm, with a step size of 2 mm. At each position, FOH sensor data were acquired for needle tip tracking. Each needle tip position estimate was compared with a corresponding reference position. The relative tracking accuracy was defined as the absolute difference between these two quantities. The X component of the reference position was obtained from the translation stage, centered relative to the probe axis. As Y and Z were assumed to be constant during translation of the tracking probe, the Y and Z components of the reference position were taken to be the mean values of these components of the position estimates.
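The weighted grid search of Eq. (1) and the accuracy definition above can be sketched together as follows, in Python with NumPy (the authors' Matlab scripts are not given in the text). The row offsets `ROW_X`, the speed of sound `C`, the straight-line time-of-flight model in `simulated_tof`, the example weights, and the coarser 0.1 mm grid are all assumptions made for illustration; the paper specifies only the grid ranges and a 0.025 mm spacing.

```python
import numpy as np

C = 1.48  # assumed speed of sound in water, mm/us
ROW_X = np.array([-3.0, -1.0, 1.0, 3.0])  # hypothetical X offsets of the 4 rows, mm


def simulated_tof(x, z):
    """Straight-line time-of-flight (us) from each row to (x, z); shape (..., 4)."""
    x = np.asarray(x, dtype=float)[..., None]
    z = np.asarray(z, dtype=float)[..., None]
    return np.sqrt((x - ROW_X) ** 2 + z ** 2) / C


def estimate_xz(t_m, w, spacing=0.1):
    """Grid search over X in [-20, 20] mm and Z in [0, 80] mm, following Eq. (1)."""
    xs = np.arange(-20.0, 20.0 + spacing / 2, spacing)
    zs = np.arange(0.0, 80.0 + spacing / 2, spacing)
    gx, gz = np.meshgrid(xs, zs, indexing="ij")
    t_s = simulated_tof(gx, gz)  # pre-computed values, shape (nx, nz, 4)
    cost = np.sum(((t_m - t_s) * w) ** 2, axis=-1) / np.sum(w ** 2)
    i, j = np.unravel_index(np.argmin(cost), cost.shape)
    return xs[i], zs[j]


# Synthetic, noise-free measurement for a tip at a known reference position.
ref_x, ref_z = 4.0, 38.0
t_m = simulated_tof(ref_x, ref_z)      # measured times-of-flight, one per row
w = np.array([0.7, 1.0, 0.9, 0.6])     # hypothetical peak amplitudes as weights
x_est, z_est = estimate_xz(t_m, w)

# Relative tracking accuracy: absolute difference between estimate and reference.
accuracy_x = abs(x_est - ref_x)
accuracy_z = abs(z_est - ref_z)
```

Because the synthetic measurement is noise-free and the reference point lies on the grid, the estimate should coincide with the reference to within the grid spacing. With real data the weights would come from the peak amplitudes of the four tracking images.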

Fig. 3. (a) Relative tracking accuracy measurements were performed with the needle and the ultrasonic needle tracking (UNT) probe in water. (b) The signal-to-noise ratios (SNRs) of the tracking images were consistently higher for Golay-coded transmissions than for bipolar transmissions, and they increased with proximity to the center of the probe (X = 0). The error bars in (b) represent standard deviations calculated from the four tracking images. (c) Estimated relative tracking accuracies for Golay-coded transmissions along orthogonal axes; error bars represent standard deviations calculated from all needle tip positions.

2.4 In Vivo Validation

To obtain a preliminary indication of the system’s potential for guiding fetal interventions, 3D needle tracking was performed in a pregnant sheep model in vivo [18]. The primary objective of this experiment was to measure the signal-to-noise ratios (SNRs) in a clinically realistic environment. All procedures on animals were conducted in accordance with U.K. Home Office regulations and the Guidance for the Operation of Animals (Scientific Procedures) Act (1986). Ethics approval was provided by the joint animal studies committee of the Royal Veterinary College and the University College London, United Kingdom. Gestational age was confirmed using ultrasound. The sheep was placed under general anesthesia and monitored continuously. The needle was inserted into the uterus, towards a vascular target (Fig. 4a), with the bevel facing upward. During insertion, tracking was performed continuously, so that 4 tracked positions were identified.

Fig. 4. In vivo validation of the 3D ultrasonic needle tracking system in a pregnant sheep model. (a) Schematic illustration of the measurement geometry showing the out-of-plane needle insertion into the abdomen of the sheep. The needle tip was tracked at 4 positions (p1–p4). (b) Comparison of signal-to-noise ratios (SNRs) using Golay-coded and bipolar excitation, for all 4 tracked positions. The error bars represent standard deviations obtained at each tracked position. (c) The tracked needle tip positions, which were used to calculate the needle trajectory.

2.5 SNR Analysis

The SNR was calculated for each tracking image at each needle tip position. The numerator was defined as the maximum signal value attained for each tracking image; the denominator, as the standard deviation of signal values obtained from each tracking image in a region above the needle tip, where there was a visual absence of signal (20 mm × 16 tracking elements).
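The SNR definition above amounts to a ratio of the image maximum to the standard deviation in a signal-free window. A minimal sketch in Python with NumPy follows; the function name `tracking_snr`, the synthetic image, its size, and the position of the noise window are hypothetical and serve only to show the calculation.

```python
import numpy as np


def tracking_snr(image, noise_rows, noise_cols):
    """SNR = maximum of the tracking image / std. dev. in a signal-free region."""
    noise = image[noise_rows, noise_cols]
    return float(image.max() / noise.std())


# Synthetic tracking image: depth samples (rows) x 32 transducer elements (columns).
rng = np.random.default_rng(0)
image = np.abs(rng.normal(0.0, 1.0, size=(400, 32)))  # noise floor (envelope-like)
image[300, 12] = 40.0                                  # synthetic needle-tip response

# Noise window placed above the tip and spanning 16 elements (assumed location).
snr = tracking_snr(image, slice(0, 100), slice(4, 20))
```

In practice the window would be chosen by inspection, as in the paper, so that it contains no visible signal.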

3 Results and Discussion

With the needle in water (Fig. 3a), transmissions from the tracking probe could clearly be identified in the received signals without averaging. With bipolar excitation, the SNR values ranged from 12 to 21, with the highest values obtained when the needle was approximately centered relative to the probe axis (X ∼ 0). With Golay-coded excitation, they increased by factors of 7.3 to 8.5 (Fig. 3b). The increases were broadly consistent with those anticipated: the temporal averaging provided by a pair of 32-bit Golay codes results in an SNR improvement of $\sqrt{32 \times 2} = 8$. In water, the mean relative tracking accuracy depended on the spatial dimension: 0.32 mm, 0.31 mm, and 0.084 mm in X, Y, and Z, respectively (Fig. 3c). By comparison, these values are smaller than the inner diameter of 22 G needles that are widely used in percutaneous procedures. They are also smaller than recently reported EM tracking errors of 2 ± 1 mm [19]. The Z component of the mean relative tracking accuracy is particularly striking; it is smaller than the ultrasound wavelength at 9 MHz. This result reflects a high level of consistency in the tracked position estimates.

With the pregnant sheep model in vivo, in which clinically realistic ultrasound attenuation was present, the SNR values were sufficiently high for obtaining tracking estimates. As compared with conventional bipolar excitation, the SNR was increased with Golay-coded excitation. In the former case, the SNR values were in the range of 2.1 to 3.0; coding increased this range by factors of 5.3 to 6.2 (Fig. 4b). From the tracked position estimates, a needle insertion angle of 49° and a maximum needle tip depth of 31 mm were calculated.

We presented, for the first time, a 3D ultrasonic tracking system based on a 1.5D transducer array and a fiber-optic ultrasound sensor. A primary advantage of this system is its compatibility with existing US imaging scanners, which could facilitate clinical translation. There are several ways in which the tracking system developed in this study could be improved. For future iterations, imaging array elements and a corresponding cylindrical acoustic lens could be included to enable simultaneous 3D tracking and 2D US imaging.
The SNR could be improved by increasing the sensitivity of the FOH sensor, which could be achieved with a Fabry-Pérot interferometer cavity that has a curved distal surface to achieve a high finesse [20]. Additional increases in the SNR could be obtained with larger code lengths that were beyond the limits of the particular ultrasound scanner used in this study. The results of this study demonstrate that 3D ultrasonic needle tracking with a 1.5D array of transducer elements and a FOH sensor is feasible in clinically realistic environments and that it provides highly consistent results. When integrated into an ultrasound imaging probe that includes a linear array for acquiring 2D ultrasound images, this method has strong potential to reduce the risk of complications and decrease procedure times.

Acknowledgments. This work was supported by an Innovative Engineering for Health award by the Wellcome Trust (No. WT101957) and the Engineering and Physical Sciences Research Council (EPSRC) (No. NS/A000027/1), by a Starting Grant from the European Research Council (ERC-2012-StG, Proposal No. 310970 MOPHIM), and by an EPSRC First Grant (No. EP/J010952/1). A.L.D. is supported by the UCL/UCLH NIHR Comprehensive Biomedical Research Centre.

References

1. Daffos, F., et al.: Fetal blood sampling during pregnancy with use of a needle guided by ultrasound: a study of 606 consecutive cases. Am. J. Obstet. Gynecol. 153(6), 655–660 (1985)
2. Agarwal, K., et al.: Pregnancy loss after chorionic villus sampling and genetic amniocentesis in twin pregnancies: a systematic review. Ultrasound Obstet. Gynecol. 40(2), 128–134 (2012)
3. Hebard, S., et al.: Echogenic technology can improve needle visibility during ultrasound-guided regional anesthesia. Reg. Anesth. Pain Med. 36(2), 185–189 (2011)
4. Klein, S.M., et al.: Piezoelectric vibrating needle and catheter for enhancing ultrasound-guided peripheral nerve blocks. Anesth. Analg. 105, 1858–1860 (2007)
5. Rotemberg, V., et al.: Acoustic radiation force impulse (ARFI) imaging-based needle visualization. Ultrason. Imaging 33(1), 1–16 (2011)
6. Fronheiser, M.P., et al.: Vibrating interventional device detection using real-time 3-D color doppler. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 55(6), 1355–1362 (2008)
7. Xia, W., et al.: Performance characteristics of an interventional multispectral photoacoustic imaging system for guiding minimally invasive procedures. J. Biomed. Opt. 20(8), 086005 (2015)
8. Poulin, F.: Interference during the use of an electromagnetic tracking system under OR conditions. J. Biomech. 35, 733–737 (2002)
9. Guo, X., et al.: Photoacoustic active ultrasound element for catheter tracking. In: Proceedings of SPIE, vol. 8943, p. 89435M (2014)
10. Xia, W., et al.: Interventional photoacoustic imaging of the human placenta with ultrasonic tracking for minimally invasive fetal surgeries. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9349, pp. 371–378. Springer, Heidelberg (2015). doi:10.1007/978-3-319-24553-9_46
11. Xia, W., et al.: In-plane ultrasonic needle tracking using a fiber-optic hydrophone. Med. Phys. 42(10), 5983–5991 (2015)
12. Xia, W., et al.: Coded excitation ultrasonic needle tracking: an in vivo study. Med. Phys. 43(7), 4065–4073 (2016)
13. Nikolov, S.I.: Precision of needle tip localization using a receiver in the needle. In: IEEE International Ultrasonics Symposium Proceedings, Beijing, pp. 479–482 (2008)
14. Mung, J., et al.: A non-disruptive technology for robust 3D tool tracking for ultrasound-guided interventions. In: Fichtinger, G., Martel, A., Peters, T. (eds.) MICCAI 2011. LNCS, vol. 6891, pp. 153–160. Springer, Heidelberg (2011). doi:10.1007/978-3-642-23623-5_20
15. Mung, J.: Ultrasonically marked instruments for ultrasound-guided interventions. In: IEEE Ultrasonics Symposium (IUS), pp. 2053–2056 (2013)
16. Morris, P., et al.: A Fabry-Pérot fiber-optic ultrasonic hydrophone for the simultaneous measurement of temperature and acoustic pressure. J. Acoust. Soc. Am. 125(6), 3611–3622 (2009)
17. Budisin, S.Z., et al.: New complementary pairs of sequences. Electron. Lett. 26(13), 881–883 (1990)
18. David, A.L., et al.: Recombinant adeno-associated virus-mediated in utero gene transfer gives therapeutic transgene expression in the sheep. Hum. Gene Ther. 22, 419–426 (2011)
19. Boutaleb, S., et al.: Performance and suitability assessment of a real-time 3D electromagnetic needle tracking system for interstitial brachytherapy. J. Contemp. Brachyther. 7(4), 280–289 (2015)
20. Zhang, E.Z., Beard, P.C.: A miniature all-optical photoacoustic imaging probe. In: Proceedings of SPIE, p. 78991F (2011). http://proceedings.spiedigitallibrary.org/proceeding.aspx?articleid=1349009

Enhancement of Needle Tip and Shaft from 2D Ultrasound Using Signal Transmission Maps

Cosmas Mwikirize1(B), John L. Nosher2, and Ilker Hacihaliloglu1,2

1 Department of Biomedical Engineering, Rutgers University, Piscataway, USA [email protected]
2 Department of Radiology, Rutgers Robert Wood Johnson Medical School, New Brunswick, USA

Abstract. New methods for needle tip and shaft enhancement in 2D curvilinear ultrasound are proposed. Needle tip enhancement is achieved using an image regularization method that utilizes ultrasound signal transmission maps to model inherent signal loss due to attenuation. Shaft enhancement is achieved by optimizing the proposed signal transmission map using the information based on trajectory constrained boundary statistics derived from phase oriented features. The enhanced tip is automatically localized using spatially distributed image statistics from the estimated shaft trajectory. Validation results from 100 ultrasound images of ex vivo bovine, porcine, kidney and liver tissue reveal a mean localization error of 0.3 ± 0.06 mm, a 43% improvement in localization over the previous state of the art.

Keywords: Image regularization · Confidence maps · Needle enhancement · Local phase · Ultrasound

1 Introduction

© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 362–369, 2016. DOI: 10.1007/978-3-319-46720-7_42

Ultrasound (US) is a popular image-guidance tool used to facilitate real-time needle visualization in interventional procedures such as fine needle and core tissue biopsies, catheter placement, drainages, and anesthesia. During such procedures, it is important that the needle precisely reaches the target with a minimum number of attempts. Unfortunately, successful visualization of the needle in US-based procedures is greatly affected by the orientation of the needle to the US beam and is inferior for procedures involving steep needle insertion angles. The visualization becomes especially problematic for curvilinear transducers since only a small portion or none of the needle gives strong reflection. There is a wealth of literature on improving needle visibility and detection in US. A sampling of the literature is provided here to show the wide range of approaches. External tracking technologies are available to track the needle [1], but this requires custom needles and changes to the clinical work-flow. Hough Transform [2,3], Random Sample Consensus (RANSAC) [4,5] and projection based methods [6,7] were proposed for needle localization. In most of the previous approaches, assumptions were made for the appearance of the needle in US images, such as the needle having the longest and straightest line feature with high intensity. Recently, Hacihaliloglu et al. [6] combined local phase-based image projections with spatially distributed needle trajectory statistics and achieved an error of 0.43 ± 0.31 mm for tip localization. Although the method is suitable in instances when the shaft is discontinuous, it fails when a priori information on shaft orientation is less available and when the tip does not appear as a characteristic high intensity along the needle trajectory. Regarding shaft visibility, approaches based on beam steering [8], using linear transducers, or mechanically introduced vibration [9] have been proposed. The success of beam steering depends on the angle values used during the procedure. Furthermore, only a portion of the needle is enhanced with curvilinear arrays so the tip is still indistinguishable. Vibration-based approaches sometimes require external mechanical devices, increasing the overall complexity of the system. In view of the above mentioned limitations, there is a need to develop methods that perform both needle shaft and tip enhancement for improved localization and guidance without changing the clinical work-flow and increasing the overall complexity of the system. The proposed method is specifically useful for procedures, such as lumbar blocks, where needle shaft visibility is poor and the tip does not have a characteristic high intensity appearance. We introduce an efficient $L_1$-norm based contextual regularization that enables us to incorporate a filter bank into the image enhancement method by taking into account US specific signal propagation constraints. Our main novelty is the incorporation of US signal modeling, for needle imaging, into an optimization problem to estimate the unknown signal transmission map, which is used for enhancement of needle shaft and tip.
Qualitative and quantitative validation results on scans collected from porcine, bovine, kidney and liver tissue samples are presented. Comparison results against the previous state of the art [6], for tip localization, are also provided.

2 Methods

The proposed framework relies on the following prior information: the needle insertion side (left or right) is known a priori, the needle is inserted in plane, and the shaft close to the transducer surface is visible. Explicitly, we are interested in the enhancement of needle images obtained from 2D curvilinear transducers.

2.1 $L_1$-Norm Based Contextual Regularization for Image Enhancement

Modeling of US signal transmission has been one of the main topics of research in US-guided procedures [10]. The interaction of the US signal within the tissue can be characterized into two main categories, namely, scattering and attenuation. Since the information of the backscattered US signal from the needle interface to the transducer is modulated by these two interactions, they can be viewed as mechanisms of structural information coding. Based on this, we develop a model, called the US signal transmission map, for recovering the pertinent needle structure from the US images. The US signal transmission map maximizes the visibility of high intensity features inside a local region and satisfies the constraint that the mean intensity of the local region is less than the echogenicity of the tissue confining the needle. In order to achieve this, we propose the following linear interpolation model, which combines scattering and attenuation effects in the tissue:

$$US(x, y) = US_A(x, y)\, US_E(x, y) + (1 - US_A(x, y))\, \alpha.$$

Here, $US(x, y)$ is the B-mode US image, $US_A(x, y)$ is the signal transmission map, $US_E(x, y)$ is the enhanced needle image and $\alpha$ is a constant value representative of echogenicity in the tissue surrounding the needle. Our aim is the extraction of $US_E(x, y)$, which is obtained by estimating the signal transmission map image $US_A(x, y)$. In order to calculate $US_A(x, y)$, we make use of the well-known Beer-Lambert law:

$$US_T(x, y) = US_0(x, y) \exp(-\eta\, d(x, y)),$$

which models the attenuation as a function of depth. Here $US_T(x, y)$ is the attenuated intensity image, $US_0(x, y)$ is the initial intensity image, $\eta$ the attenuation coefficient, and $d(x, y)$ the distance from the source/transducer. $US_T(x, y)$ is modeled as a patch-wise transmission function modulated by attenuation and orientation of the needle, which will be explained in the next section. Once $US_T(x, y)$ is obtained, $US_A(x, y)$ is estimated by minimizing the following objective function [11]:

$$\frac{\lambda}{2} \left\| US_A(x, y) - US_T(x, y) \right\|_2^2 + \sum_{j \in \omega} \left\| W_j \circ (D_j \star US_A(x, y)) \right\|_1 . \qquad (1)$$

Here $\omega$ is an index set, $\circ$ represents element-wise multiplication, and $\star$ is the convolution operator. $D_j$ is obtained using a bank of high order differential filters consisting of eight Kirsch filters and one Laplacian filter [11]. The incorporation of this filter bank into the contextual regularization framework helps in attenuating the image noise and results in the enhancement of ridge edge features such as the needle shaft and tip in the local region. $W_j$ is a weighting matrix calculated using $W_j(x, y) = \exp(-|D_j(x, y) \star US(x, y)|^2)$. In Eq. (1), the first part is the data term, which measures the dependability of $US_A(x, y)$ on $US_T(x, y)$. The second part of Eq. (1) models the contextual constraints of $US_A(x, y)$, and $\lambda$ is the regularization parameter used to balance the two terms. The optimization of Eq. (1) is achieved using variable splitting, where several auxiliary variables are introduced to construct a sequence of simple sub-problems, the solutions of which finally converge to the optimal solution of the original problem [11]. Once $US_A(x, y)$ is estimated, the enhanced needle image $US_E(x, y)$ can be extracted using

$$US_E(x, y) = \frac{US(x, y) - \alpha}{\left[ \max(US_A(x, y), \epsilon) \right]^{\delta}} + \alpha.$$

Here $\epsilon$ is a small constant used to avoid division by zero and $\delta$ is related to $\eta$, the attenuation coefficient. In the next sections we explain how $US_T(x, y)$ is modeled for needle tip and shaft enhancement.

Needle Tip Enhancement: For tip enhancement (Fig. 1), we apply the regularization framework described in the previous section. With reference to Eq. (1), we require $US_T(x, y)$ to represent boundary constraints imposed on the US image by attenuation and orientation of the needle within the tissue. Therefore, we first calculate the US confidence map, denoted as $US_{CM}(x, y)$, using US specific domain constraints previously proposed by Karamalis et al. [10]. The confidence map is a measure of the probability that a random walk starting from a pixel would be able to reach the virtual transducer elements, given the US image and US specific constraints. The B-mode US image, $US(x, y)$, is represented as a weighted connected graph and random walks from virtual transducers at the top of the image are used to calculate the apparent signal strengths at each pixel location. Let $e_{ij}$ denote the edge between nodes $i$ and $j$. The graph Laplacian has the weights:

$$w_{ij} = \begin{cases} w_{ij}^{H} = \exp(-\beta\, |c_i - c_j| + \gamma), & \text{if } i, j \text{ adjacent and } e_{ij} \in E_H \\ w_{ij}^{V} = \exp(-\beta\, |c_i - c_j|), & \text{if } i, j \text{ adjacent and } e_{ij} \in E_V \\ w_{ij}^{D} = \exp(-\beta\, |c_i - c_j| + \sqrt{2}\,\gamma), & \text{if } i, j \text{ adjacent and } e_{ij} \in E_D \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$

where $E_H$, $E_V$ and $E_D$ are the horizontal, vertical and diagonal edges on the graph and $c_i = US(x, y)_i \exp(-\eta\, \rho(x, y)_i)$. Here $US(x, y)_i$ is the image intensity at node $i$ and $\rho(x, y)_i$ is the normalized closest distance from the node to the nearest virtual transducer [10]. The attenuation coefficient $\eta$ is inherently integrated in the weighting function, $\gamma$ is used to model the beam width and $\beta = 90$ is an algorithmic constant. $US_T(x, y)$ is obtained by taking the complement of $US_{CM}(x, y)$ ($US_T(x, y) = US_{CM}(x, y)^C$). Since we expect the signal transmission map function $US_A(x, y)$ to display higher intensity with increasing depth, the complement of the confidence map provides the ideal patch-wise transmission map, $US_T(x, y)$, in deriving the minimal objective function. The result of needle tip enhancement is shown in Fig. 1. Investigating Fig. 1(b), we can see that the calculated transmission map $US_A(x, y)$, using Eq. (1), has low intensity values close to the transducer surface (shallow image regions) and high intensity features in the regions away from the transducer (deep image regions). Furthermore, it provides a smooth attenuation density estimate for the US image formation model.
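As a numerical aside on the image formation model and its inversion stated earlier in this section, the following minimal sketch (Python with NumPy) synthesizes a B-mode image from a known enhanced image and a given transmission map, then recovers the enhanced image. The transmission map here is a hypothetical depth ramp rather than a solution of Eq. (1), and `delta = 1`, `eps`, `alpha` and the function names are assumptions chosen so that the inversion is exact.

```python
import numpy as np


def forward_model(us_e, us_a, alpha):
    """US = US_A * US_E + (1 - US_A) * alpha (linear interpolation model)."""
    return us_a * us_e + (1.0 - us_a) * alpha


def recover_enhanced(us, us_a, alpha, eps=1e-3, delta=1.0):
    """US_E = (US - alpha) / max(US_A, eps)^delta + alpha."""
    return (us - alpha) / np.maximum(us_a, eps) ** delta + alpha


rng = np.random.default_rng(1)
us_e_true = rng.uniform(0.0, 1.0, size=(64, 64))   # synthetic "enhanced" image
depth = np.linspace(0.0, 1.0, 64)[:, None]         # normalized depth, top to bottom
us_a = 0.2 + 0.7 * depth * np.ones((1, 64))        # hypothetical map, rising with depth
alpha = 0.3                                        # assumed tissue echogenicity constant

us = forward_model(us_e_true, us_a, alpha)         # simulated B-mode image
us_e = recover_enhanced(us, us_a, alpha)           # inversion of the model
```

With `delta = 1` and a transmission map bounded away from zero, the recovery is an exact algebraic inverse of the forward model. In the method described here the map would instead be estimated by minimizing Eq. (1), and `delta` would be tied to the attenuation coefficient.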
Finally, the mean intensity of the local region in the estimated signal transmission map is less than the echogenicity of the tissue confining the needle. This translates into the enhanced image, where the tip will be represented by a local average of the surrounding points, thus giving a uniform intensity region with a high intensity feature belonging to the needle tip in the enhanced image U SE (x, y). Needle Tip Localization: The first step in tip localization is the enhancement of the needle shaft appearing close to the transducer surface (Fig. 1(c) top right) in the enhanced US image U SE (x, y). This is achieved by constructing a phasebased image descriptor, called phase symmetry (PS), using a 2D Log-Gabor 2 −(θ−θm )2 filter whose function is defined as: LG(ω, θ) = exp( −log(ω/κ) 2log(σω )2 )exp( 2(σθ )2 ). Here, ω is the filter frequency while θ is its orientation, k is the center frequency, σω is the bandwidth on the frequency spectrum, σθ is the angular bandwidth and θm is the filter orientation. These filter parameters are selected automatically using the framework proposed in [6]. An initial needle trajectory is calculated by using the peak value of the Radon Transformed PS image. This initial trajectory is further optimized using Maximum Likelihood Estimation Sample Consensus


C. Mwikirize et al.

Fig. 1. Needle tip enhancement by the proposed method. (a) B-mode US image showing a needle inserted at an angle of 45°. The needle tip has a low contrast to the surrounding tissue and the needle shaft is discontinuous. (b) The derived optimal signal transmission map function $US_A(x, y)$. The map gives an estimation of the signal density in the US image, and thus displays higher intensity values in more attenuated and scattered regions towards the bottom of the image. (c) Result of needle tip enhancement. The red arrow points to the conspicuous tip along the trajectory path.

(MLESAC) [6] algorithm for outlier rejection and geometric optimization for connecting the extracted inliers [6]. The image intensities at this stage, lying along a line $L$ in a point cloud, are distributed into a set of line segments, each defined by a set of points or knots denoted as $t_1 \ldots t_n$. The needle tip is extracted using:

$$
US_{needle}(US_B, L(t)) = \frac{\int_{t_i}^{t_{i+1}} US_B(L(t))\, dt}{\lVert L(t_{i+1}) - L(t_i) \rVert_2}; \quad t \in [t_i, t_{i+1}].
\tag{3}
$$
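The segment averaging of Eq. (3) can be sketched in Python as follows. This is an illustrative discretization under stated assumptions: the trajectory is given as a list of sampled 2D points with one band-pass-filtered intensity per point, the knots are indices into that list, the integral is approximated with the trapezoidal rule, and the tip is taken as the most distal segment that attains the maximum averaged intensity. All names are hypothetical and none of this is taken from the authors' implementation.

```python
import math


def segment_means(profile, points, knots):
    """Averaged intensity per segment, in the spirit of Eq. (3).

    profile[k] is the filtered intensity US_B at points[k] on the line L;
    knots is an increasing list of indices into points (t_1 ... t_n).
    """
    means = []
    for a, b in zip(knots[:-1], knots[1:]):
        # Trapezoidal approximation of the integral of US_B along L.
        integral = 0.0
        for k in range(a, b):
            step = math.dist(points[k], points[k + 1])
            integral += 0.5 * (profile[k] + profile[k + 1]) * step
        # Normalize by the Euclidean distance between successive knots.
        length = math.dist(points[a], points[b])
        means.append(integral / length)
    return means


def tip_segment(means):
    """Index of the farthest segment that reaches the maximum mean."""
    peak = max(means)
    return max(i for i, m in enumerate(means) if m == peak)


# Straight trajectory sampled at unit spacing, brighter towards the distal end.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
profile = [1.0, 1.0, 1.0, 3.0, 3.0]
means = segment_means(profile, points, knots=[0, 2, 4])
```

In this toy example the second segment has the larger averaged intensity, so `tip_segment(means)` selects it.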

Here, $US_B$ is the result after band-pass filtering the tip-enhanced US image, while $t_i$ and $t_{i+1}$ are successive knots. $US_{needle}$ consists of averaged pixel intensities, and the needle tip is localized as the farthest maximum intensity pixel of $US_{needle}$ at the distal end of the needle trajectory. One advantage of using the tip-enhanced image for localization instead of the original image is minimization of interference from soft tissue. In the method of [6], if a high intensity region other than that emanating from the tip were encountered along the trajectory beyond the needle region, the likelihood of inaccurate localization was high. In our case, the enhanced tip has a conspicuously higher intensity than soft tissue interfaces and other interfering artifacts (Fig. 1(c)).

Shaft Enhancement: For shaft enhancement (Fig. 2), we use the regularization framework previously explained. However, with reference to Eq. (1), since our objective is to enhance the shaft, we construct a new patch-wise transmission function $US_T(x, y)$ using trajectory and tip information calculated in the needle tip localization section. Consequently, we model the patch-wise transmission function as $US_T(x, y) = US_{DM}(x, y)$, which represents the Euclidean distance map of the trajectory constrained region. Knowledge of the initial trajectory, from the previous section, enables us to model an extended region which includes the entire trajectory of the needle. Incorporating the needle tip location, calculated in the previous step, we limit this region to the trajectory depth

Enhancement of Needle Tip and Shaft from 2D Ultrasound


Fig. 2. The process of needle shaft enhancement. (a) B-mode US image. (b) Trajectory constrained region obtained from local phase information, indicated by the red line. Line thickness can be adjusted to suit different needle diameters and bending insertions. (c) The optimal signal transmission function $US_A(x, y)$ for needle shaft enhancement. (d) Result of shaft enhancement. Enhancement does not take place for features along the trajectory that may lie beyond the needle tip.

so as to minimize enhancement of soft tissue interfaces beyond the tip (Fig. 2(c)). Investigating Fig. 2(c), we can see that the signal transmission map calculated for the needle shaft has low density values for the local regions confining the needle shaft and high density values for local regions away from the estimated needle trajectory. The difference of the signal transmission map for needle shaft enhancement compared to the tip enhancement is that the signal transmission map for shaft enhancement is limited to the geometry of the needle. This translates into the enhanced needle shaft image, where the shaft will be represented by a local average of the surrounding points, thus giving a uniform intensity region with a high intensity feature belonging to the needle shaft in the enhanced image $US_E(x, y)$.

2.2 Data Acquisition and Analysis

The US images used to evaluate the proposed methods were obtained using a SonixGPS system (Analogic Corporation, Peabody, MA, USA) with a C5-2 curvilinear probe. A standard 17 gauge SonixGPS vascular access needle was inserted at varying insertion angles (30°–70°) and depths up to 8 cm. The image resolutions varied from 0.13 mm to 0.21 mm for different depth settings. Freshly excised ex vivo porcine, bovine, liver and kidney tissue samples, obtained from a local butcher, were used as the imaging medium. A total of 100 images were analyzed (25 images per tissue sample). The proposed method was implemented using the MATLAB 2014a software package and run on a Windows PC with a 3.6 GHz Intel(R) Core™ i7 CPU and 16 GB RAM. In order to quantify the accuracy, we compared the localized tip against the gold standard tip location obtained by an expert user. The Euclidean Distance (ED) between the ground truth and the estimated tip locations was calculated. We also report the Root Mean Square Error (RMS) and 95 % Confidence Interval (CI) for the quantitative localization results. Finally, we also provide comparison results against the method proposed in [6]. For calculating the US confidence map, $US_{CM}(x, y)$, the constant values were chosen as: η = 2, β = 90, γ = 0.03. For Eq. (1), λ = 2 and the Log-Gabor filter parameters were calculated using [6]. α, the constant related to


tissue echogenicity, was chosen as 90 % of the maximum intensity value of the patch-wise transmission function. Throughout the experimental validation these values were not changed.
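For readers who want to experiment with the Log-Gabor function defined in Sect. 2.1, the following NumPy sketch builds its frequency-domain transfer function on a square grid. It is an assumption-laden illustration: the grid construction, the handling of the zero-frequency term, the wrapping of the angular difference, and the example parameter values are all choices made here, whereas the paper selects the filter parameters automatically using the framework of [6].

```python
import numpy as np


def log_gabor(size, kappa, sigma_omega, theta_m, sigma_theta):
    """2D Log-Gabor transfer function LG(omega, theta) on a size x size grid.

    kappa: center frequency (cycles/pixel); sigma_omega: frequency bandwidth
    term; theta_m: filter orientation (rad); sigma_theta: angular bandwidth.
    """
    freqs = np.fft.fftfreq(size)
    fx, fy = np.meshgrid(freqs, freqs)
    omega = np.hypot(fx, fy)
    omega[0, 0] = 1.0  # placeholder to avoid log(0); DC is zeroed below
    theta = np.arctan2(fy, fx)

    radial = np.exp(-(np.log(omega / kappa) ** 2)
                    / (2.0 * np.log(sigma_omega) ** 2))
    radial[0, 0] = 0.0  # a Log-Gabor filter has no DC component

    # Wrap the angular difference into [-pi, pi] before applying the Gaussian.
    dtheta = np.arctan2(np.sin(theta - theta_m), np.cos(theta - theta_m))
    angular = np.exp(-(dtheta ** 2) / (2.0 * sigma_theta ** 2))
    return radial * angular


# Hypothetical parameter values, chosen only for illustration.
lg = log_gabor(size=64, kappa=8 / 64, sigma_omega=0.55,
               theta_m=0.0, sigma_theta=0.5)
```

The response peaks at 1 where the frequency equals κ along the orientation θ_m and falls off in both the radial and angular directions.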

3 Experimental Results

Qualitative and quantitative results obtained from the proposed method are provided in Fig. 3. It is observed that the proposed method gives clear image detail for the tip and shaft, even in instances where shaft information is barely visible (Fig. 3(a) middle column). Using the method proposed in [6], incorrect tip localization arises from a soft tissue interface that manifests a higher intensity than the tip along the needle trajectory in the B-mode image (Fig. 3(a) right column). In the proposed method, the tip is enhanced but the soft tissue interface is not, thus improving localization as shown in Fig. 3(b). The tip and shaft enhancement take 0.6 s and 0.49 s, respectively, for a 370 × 370 2D image. Figure 3(b) shows a summary of the quantitative results. The overall localization error from the proposed method was 0.3 ± 0.06 mm, while that from [6] under an error cap of 2 mm (73 % of the dataset had an error of less than 2 mm) was 0.53 ± 0.07 mm.

Fig. 3. Qualitative and quantitative results for the proposed method. (a) Left column: B-mode US images for bovine and porcine tissue respectively. Middle column: Respective localized tip, red dot, overlaid on shaft enhanced image. Right column: Tip localization results, red dot, from the method of Hacihaliloglu et al. [6]. (b) Quantitative analysis of needle tip localization for bovine, porcine, liver and kidney tissue. Top: Proposed method. Bottom: Using the method of Hacihaliloglu et al. [6]. For the method in [6], only 73 % of the localization results had an error value under 2 mm and were used during validation. Overall, the new method improves tip localization.

4 Discussions and Conclusions

We presented a method for needle shaft and tip enhancement from curvilinear US images. The proposed method is based on the incorporation of US signal


modeling into an L1-norm based contextual regularization framework by taking into account US-specific signal propagation constraints. The improvement achieved in terms of needle tip localization compared to the previously proposed state-of-the-art method [6] was 43 %. The method is validated on epidural needles with minimal bending. For enhancement of bending needles, the proposed model can be updated by incorporating the bending information into the framework. As part of future work, we will incorporate an optical tracking system in validation to minimize variability associated with an expert user. We will also explore shaft and tip enhancement at steeper angles (>70°) and deeper insertion depths (>8 cm). The achieved tip localization accuracy and shaft enhancement make our method appropriate for further investigation in vivo, and valuable to all previously proposed state-of-the-art needle localization methods.
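The evaluation quantities used in this paper are simple to compute. The sketch below shows the Euclidean distance between an estimated and a ground-truth tip, an RMS error over a set of errors, and the relative improvement between two mean errors. The function names and the pixel-coordinate convention are assumptions made for illustration; only the mean errors 0.53 mm and 0.3 mm are taken from Sect. 3.

```python
import math


def tip_error_mm(estimated_px, truth_px, resolution_mm):
    """Euclidean distance between two pixel locations, converted to mm."""
    return math.dist(estimated_px, truth_px) * resolution_mm


def rms(errors):
    """Root mean square of a list of errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def relative_improvement(baseline_error, proposed_error):
    """Fractional reduction of the error relative to the baseline."""
    return (baseline_error - proposed_error) / baseline_error


# Mean localization errors reported in Sect. 3: 0.53 mm for [6], 0.3 mm here.
improvement = relative_improvement(0.53, 0.30)
print(f"{improvement:.0%}")  # prints 43%
```

With the reported means, (0.53 − 0.30) / 0.53 ≈ 0.434, which matches the 43 % improvement quoted above.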

References

1. Hakime, A., Deschamps, F., De Carvalho, E.G., Barah, A., Auperin, A., De Baere, T.: Electromagnetic-tracked biopsy under ultrasound guidance: preliminary results. Cardiovasc. Intervent. Radiol. 35(4), 898–905 (2012)
2. Zhou, H., Qiu, W., Ding, M., Zhang, S.: Automatic needle segmentation in 3D ultrasound images using 3D improved Hough transform. In: SPIE Medical Imaging, vol. 6918, pp. 691821-1–691821-9 (2008)
3. Elif, A., Jaydev, P.: Optical flow-based tracking of needles and needle-tip localization using circular Hough transform in ultrasound images. Ann. Biomed. Eng. 43(8), 1828–1840 (2015)
4. Uhercik, M., Kybic, J., Liebgott, H., Cachard, C.: Model fitting using RANSAC for surgical tool localization in 3D ultrasound images. IEEE Trans. Biomed. Eng. 57(8), 1907–1916 (2010)
5. Zhao, Y., Cachard, C., Liebgott, H.: Automatic needle detection and tracking in 3D ultrasound using an ROI-based RANSAC and Kalman method. Ultrason. Imaging 35(4), 283–306 (2013)
6. Hacihaliloglu, I., Beigi, P., Ng, G., Rohling, R.N., Salcudean, S., Abolmaesumi, P.: Projection-based phase features for localization of a needle tip in 2D curvilinear ultrasound. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9349, pp. 347–354. Springer, Heidelberg (2015). doi:10.1007/978-3-319-24553-9_43
7. Wu, Q., Yuchi, M., Ding, M.: Phase grouping-based needle segmentation in 3-D trans-rectal ultrasound-guided prostate trans-perineal therapy. Ultrasound Med. Biol. 40(4), 804–816 (2014)
8. Hatt, C.R., Ng, G., Parthasarathy, V.: Enhanced needle localization in ultrasound using beam steering and learning-based segmentation. Comput. Med. Imaging Graph. 14, 45–54 (2015)
9. Harmat, A., Rohling, R.N., Salcudean, S.: Needle tip localization using stylet vibration. Ultr. Med. Biol. 32(9), 1339–1348 (2006)
10. Karamalis, A., Wein, W., Klein, T., Navab, N.: Ultrasound confidence maps using random walks. Med. Image Anal. 16(6), 1101–1112 (2012)
11. Meng, G., Wang, Y., Duan, J., Xiang, S., Pan, C.: Efficient image dehazing with boundary constraint and contextual regularization. In: IEEE International Conference on Computer Vision, pp. 617–624 (2013)

Plane Assist: The Influence of Haptics on Ultrasound-Based Needle Guidance

Heather Culbertson1(B), Julie M. Walker1, Michael Raitor1, Allison M. Okamura1, and Philipp J. Stolka2

1 Stanford University, Stanford, CA 94305, USA {hculbert,jwalker4,mraitor,aokamura}@stanford.edu
2 Clear Guide Medical, Baltimore, MD 21211, USA [email protected]

Abstract. Ultrasound-based interventions require experience and good hand-eye coordination. Especially for non-experts, correctly guiding a handheld probe towards a target, and staying there, poses a remarkable challenge. We augment a commercial vision-based instrument guidance system with haptic feedback to keep operators on target. A user study shows significant improvements across deviation, time, and ease-of-use when coupling standard ultrasound imaging with visual feedback, haptic feedback, or both.

Keywords: Ultrasound · Haptic feedback · Instrument guidance · Image guidance

1 Introduction

The use of ultrasound for interventional guidance has expanded significantly over the past decade. With research showing that ultrasound guidance improves patient outcomes in procedures such as central vein catheterizations and peripheral nerve blocks [3,7], the relevant professional certification organizations began recommending ultrasound guidance as the gold standard of care, e.g. [1,2]. Some ultrasound training is now provided in medical school, but often solely involves the visualization and identification of anatomical structures – a very necessary skill, but not the only one required [11].

Simultaneous visualization of targets and instruments (usually needles) with a single 2D probe is a significant challenge. The difficulty of maintaining alignment (between probe, instrument, and target) is a major reason for extended intervention duration [4]. Furthermore, if target or needle visualization is lost due to probe slippage or tipping, the user has no direct feedback to find them again. Prior work has shown that bimanual tasks are difficult if the effects of movements of both hands are not visible in the workspace; when there is a lack of visual alignment, users must rely on their proprioception, which has an error of up to 5 cm in position and 10° of orientation at the hands [9]. This is a particular challenge for novice or infrequent ultrasound users, as this is on the order of the range of unintended motion during ultrasound scans. Clinical accuracy limits (e.g. deep biopsies to lesions) are greater than 10 mm in diameter. With US beam thickness at depth easily greater than 2 cm, correct continuous target/needle visualization and steady probe position is a critical challenge. Deviations less than 10 mm practically cannot be confirmed by US alone. One study [13] found that the second most common error of anesthesiology novices during needle block placement (occurring in 27 % of cases) was unintentional probe movement.

One solution to this problem is to provide corrective guidance to the user. Prior work in haptic guidance used vibrotactile displays effectively in tasks where visual load is high [12]. The guiding vibrations can free up cognitive resources for more critical task aspects. Combined visual and haptic feedback has been shown to decrease error [10] and reaction time [16] over visual feedback alone, and has been shown to be most effective in tasks with a high cognitive load [6]. Handheld ultrasound scanning systems with visual guidance or actuated feedback do exist [8], but are either limited to just initial visual positioning guidance when using camera-based local tracking [15], or offer active position feedback only for a small range of motion and require external tracking [5].

To improve this situation, we propose a method for intuitive, always-available, direct probe guidance relative to a clinical target, with no change to standard workflows. The innovation we describe here is Plane Assist: ungrounded haptic (tactile) feedback signaling which direction the user should move to bring the ultrasound imaging plane into alignment with the target. Ergonomically, such feedback helps to avoid information overload while allowing for full situational awareness, making it particularly useful for less experienced operators.

Fig. 1. Guidance system used in this study (Clear Guide ONE), including a computer and handheld ultrasound probe with mounted cameras, connected to a standard ultrasound system.

© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 370–377, 2016. DOI: 10.1007/978-3-319-46720-7_43

2 Vision-Based Guidance System and Haptic Feedback

Image guidance provides the user with information to help align instruments, targets, and possibly imaging probes to facilitate successful instrument handling relative to anatomical targets. This guidance information can be provided visually, haptically, or auditorily. In this study we consider visual guidance, haptic guidance, and their combinations, for ultrasound-based interventions.

2.1 Visual Guidance

For visual guidance, we use a Clear Guide ONE (Clear Guide Medical, Inc., Baltimore MD; Fig. 1), which adds instrument guidance capabilities to regular ultrasound machines for needle-based interventions. Instrument and ultrasound probe tracking is based on computer vision, using wide-spectrum stereo cameras mounted on a standard clinical ultrasound transducer [14]. Instrument guidance is displayed as a dynamic overlay on live ultrasound imaging. Fiducial markers are attached to the patient skin in the cameras' field of view to permit dynamic target tracking. The operator defines a target initially by tapping on the live ultrasound image. If the cameras observe a marker during this target definition, further visual tracking of that marker allows continuous 6-DoF localization of the probe. This target tracking enhances the operator's ability to maintain probe alignment with a chosen target. During the intervention, as (inadvertent) deviations from this reference pose relative to the target – or vice versa in the case of actual anatomical target motion – are tracked, guidance to the target is indicated through audio and on-screen visual cues (needle lines, moving target circles, and targeting crosshairs; Fig. 1). From an initial target pose ${}^{US}P$ in the ultrasound ($US$) coordinate frame and the camera/ultrasound calibration transformation matrix ${}^{C}T_{US}$, one determines the pose of the target in the original camera frame:

$$
{}^{C}P = {}^{C}T_{US}\; {}^{US}P
\tag{1}
$$

In a subsequent frame, where the same marker is observed in the new camera coordinate frame $(C, t)$, one finds the transformation between the two camera frames (${}^{C,t}T_{C}$) by simple rigid registration of the two marker corner point sets. Now the target is found in the new ultrasound frame $(US, t)$:

$$
{}^{US,t}P = {}^{US,t}T_{C,t}\; {}^{C,t}T_{C}\; {}^{C}P
\tag{2}
$$

Noting that the ultrasound and camera frames are fixed relative to each other (${}^{US,t}T_{C,t} = {}^{US}T_{C}$), and expanding, we get the continuously updated target positions in the ultrasound frame:

$$
{}^{US,t}P = ({}^{C}T_{US})^{-1}\; {}^{C,t}T_{C}\; {}^{C}T_{US}\; {}^{US}P
\tag{3}
$$
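The transformation chain of Eq. (3) can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions: poses are reduced to 3D target points, the transforms are 4 × 4 homogeneous matrices, and the ultrasound image plane is taken to be the plane z = 0 of the ultrasound frame so that the out-of-plane deviation is simply |z|. The function names and the axis convention are hypothetical and not taken from the Clear Guide system.

```python
import numpy as np


def update_target(T_c_us, T_ct_c, p_us):
    """Eq. (3): target position in the current ultrasound frame (US, t).

    T_c_us: calibration transform from ultrasound to camera frame (4x4).
    T_ct_c: transform from the original to the current camera frame (4x4),
            e.g. obtained by rigid registration of marker corner points.
    p_us:   initial target position in the ultrasound frame (3-vector).
    """
    p_h = np.append(np.asarray(p_us, dtype=float), 1.0)  # homogeneous point
    T_us_c = np.linalg.inv(T_c_us)  # probe and cameras are rigidly coupled
    return (T_us_c @ T_ct_c @ T_c_us @ p_h)[:3]


def out_of_plane_deviation(p_us_t):
    """Distance of the target from the image plane, assumed to be z = 0."""
    return abs(float(p_us_t[2]))


# Toy example with pure translations (units: mm).
T_c_us = np.eye(4)
T_c_us[:3, 3] = [5.0, 0.0, 0.0]
T_ct_c = np.eye(4)
T_ct_c[:3, 3] = [0.0, 0.0, 3.0]
p_now = update_target(T_c_us, T_ct_c, [10.0, 20.0, 0.0])
```

In this toy case the apparent camera motion moves the target 3 mm out of the assumed image plane, which is the kind of deviation the feedback is meant to signal.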

This information can be used for both visual and haptic (see below) feedback.

2.2 Haptic Guidance

To add haptic cues to this system, two C-2 tactors (Engineering Acoustics, Inc., Casselberry, FL) were embedded in a silicone band that was attached to the ultrasound probe, as shown in Fig. 2. Each tactor is 3 cm wide, 0.8 cm tall, and has a mass of 17 g. The haptic feedback band adds 65 g of mass and 2.5 cm of thickness to the ultrasound probe. The tactors were located on the probe sides to provide feedback to correct unintentional probe tilting. Although other degrees


of freedom (i.e. probe translation) will also result in misalignment between the US plane and target, we focus this initial implementation on tilting because our pilot study showed that tilting is one of the largest contributors to error between US plane and target. Haptic feedback is provided to the user if the target location is farther than 2 mm away from the ultrasound plane. This ±2 mm deadband thus corresponds to different amounts of probe tilt for different target depths.¹ The tactor on the side corresponding to the direction of tilt is vibrated with an amplitude proportional to the amount of deviation.
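The deadband and proportional vibration rule described above can be sketched as follows. The 2 mm deadband is taken from the text; the gain, the amplitude saturation, the sign convention for the two probe sides, and the simple geometric model used to convert the deadband into a tilt angle (rotation about the probe face, ideal geometry) are all assumptions made for illustration.

```python
import math

DEADBAND_MM = 2.0  # from the text: no feedback within +/- 2 mm of the plane


def tactor_command(deviation_mm, gain=0.1, max_amplitude=1.0):
    """Map a signed out-of-plane deviation to (side, amplitude).

    gain and max_amplitude are hypothetical; the sign-to-side mapping
    ("left"/"right") is an arbitrary convention chosen for this sketch.
    """
    if abs(deviation_mm) <= DEADBAND_MM:
        return None, 0.0
    side = "left" if deviation_mm < 0 else "right"
    amplitude = min(max_amplitude, gain * abs(deviation_mm))
    return side, amplitude


def deadband_tilt_deg(depth_mm):
    """Probe tilt that moves a target at depth_mm by exactly the deadband,
    assuming the probe pivots about its face (ideal geometry)."""
    return math.degrees(math.atan2(DEADBAND_MM, depth_mm))


side, amplitude = tactor_command(5.0)
shallow_tilt = deadband_tilt_deg(30.0)   # target at 3 cm depth
deep_tilt = deadband_tilt_deg(120.0)     # target at 12 cm depth
```

Under this simple model, a deep target leaves the deadband at a smaller tilt angle than a shallow one, which is why a fixed ±2 mm deadband corresponds to different amounts of tilt at different depths.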

3 Experimental Methods

We performed a user study to test the effectiveness of haptic feedback in reducing unintended probe motion during a needle insertion task. All procedures were approved by the Stanford University Institutional Review Board. Eight right-handed novice non-medical students were recruited for the study (five male, three female, 22–43 years old). Novice subjects were used as an approximate representation of medical residents' skills to evaluate the effect of both visual and haptic feedback on the performance of inexperienced users and to assess the efficacy of this system for use in training. (Other studies indicate that the system shows the greatest benefit with non-expert operators.)

3.1 Experiment Set-Up

In the study, the participants used the ultrasound probe to image a synthetic homogeneous gelatin phantom (Fig. 2(b)) with surface-attached markers for probe pose tracking. After target definition, the participants used the instrument guidance of the Clear Guide system to adjust a needle to be in-plane with ultrasound, and its trajectory to be aligned with the target. After appropriate alignment, they then inserted the needle into the phantom until reaching the target, and the experimenter recorded success or failure of each trial. The success of a trial was determined by watching the needle on the ultrasound image; if it intersected the target, the trial was a success, otherwise a failure. The system continuously recorded the target position in ultrasound coordinates (${}^{US,t}P$) for all trials.

3.2 Guidance Conditions

Each participant was asked to complete sixteen needle insertion trials. At the beginning of each trial, the experimenter selected one of four pre-specified target locations ranging from 3 cm to 12 cm in depth. When the experimenter defined a target location on the screen, the system saved the current position and orientation of the ultrasound probe as the reference pose.

¹ Note that we ignore the effects of ultrasound beam physics resulting in varying resolution cell widths (beam thickness), and instead consider the ideal geometry.


Fig. 2. (a) Ultrasound probe, augmented with cameras for visual tracking of probe and needle, and a tactor band for providing haptic feedback. (b) Participant performing needle insertion trial into a gelatin phantom using visual needle and target guidance on the screen, and haptic target guidance through the tactor band.

During each trial, the system determines the current position and orientation of the ultrasound probe, and calculates its deviation from the reference pose. Once the current probe/target deviation is computed, the operator is informed of required repositioning using two forms of feedback: (1) Standard visual feedback (by means of graphic overlays on the live US stream shown on-screen) indicates the current target location as estimated by visual tracking and the probe motion necessary to re-visualize the target in the US view port. The needle guidance is also displayed as blue/green lines on the live imaging stream. (2) Haptic feedback is presented as vibration on either side of the probe to indicate the direction of probe tilt from its reference pose. The participants were instructed to tilt the probe away from the vibration to correct for the unintended motion. Each participant completed four trials under each of four feedback conditions: no feedback (standard US imaging with no additional guidance), visual feedback only, both visual and haptic feedback, and haptic feedback only. The conditions and target locations were randomized and distributed across all sixteen trials to mitigate learning effects and differences in difficulty between target locations. Participants received each feedback and target location pair once.
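The trial design described above (four feedback conditions crossed with four target locations, each pair presented once in random order) can be sketched as follows. The condition and target labels are placeholders chosen for illustration, and the shuffling procedure is an assumption, since the text does not state how the randomization was performed.

```python
import itertools
import random

CONDITIONS = ["none", "visual", "visual+haptic", "haptic"]
TARGETS = ["target_1", "target_2", "target_3", "target_4"]  # placeholder labels


def trial_schedule(seed=None):
    """Return the 16 (condition, target) pairs in random order.

    Every pair occurs exactly once, so each condition is used four times
    and each target location is visited four times per participant.
    """
    rng = random.Random(seed)
    trials = list(itertools.product(CONDITIONS, TARGETS))
    rng.shuffle(trials)
    return trials


schedule = trial_schedule(seed=0)
for number, (condition, target) in enumerate(schedule, start=1):
    print(number, condition, target)
```

Using a separate seed per participant would give each participant a different order while keeping the design balanced.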

4 Results

In our analysis, we define the amount of probe deviation as the perpendicular distance between the ultrasound plane and the target location at the depth of the target. In the no-feedback condition, participants had an uncorrected probe deviation larger than 2 mm for longer than half of the trial time in 40 % of


the trials. This deviation caused these trials to be failures as the needle did not hit the original 3D target location. This poor performance highlights the prevalence of unintended probe motion and the need for providing feedback to guide the user. We focus the remainder of our analysis on the comparison of the effectiveness of the visual and haptic feedback, and do not include the results from the no-feedback condition in our statistical analysis.

[Fig. 3 plots: (a) Probe Deviation (mm) and (b) Correction Time (s), each shown for the conditions Vision On / Haptics Off, Vision On / Haptics On, and Vision Off / Haptics On.]

Fig. 3. (a) Probe deviation, and (b) time to correct probe deviation, averaged across each trial. Statistically significant differences in probe deviation and correction time marked (∗∗∗ ≡ p ≤ 0.001, ∗∗ ≡ p ≤ 0.01, ∗ ≡ p ≤ 0.05).

[Fig. 4 plot: Rated Difficulty (Very Easy, Easy, Moderate, Hard, Very Hard) for the conditions Vision On / Haptics Off, Vision On / Haptics On, and Vision Off / Haptics On.]

Fig. 4. Rated difficulty for the three feedback conditions (see below). Statistically significant differences in rated difficulty marked (∗∗ ≡ p ≤ 0.01, ∗ ≡ p ≤ 0.05).

The probe deviation was averaged in each trial. A three-way ANOVA was run on the average deviation with participant, condition, and target location as factors (Fig. 3(a)). Feedback condition and target locations were found to be significant factors (p < 0.001). No significant difference was found between the average probe deviations across participants (p > 0.1). A multiple-comparison test between the three feedback conditions indicated that the average probe deviation for the condition including visual feedback only (1.12 ± 0.62 mm) was significantly greater than that for the conditions with both haptic and visual feedback (0.80 ± 0.38 mm; p < 0.01) and haptic feedback only (0.87 ± 0.48 mm; p < 0.05).

Additionally, the time it took for participants to correct probe deviations larger than the 2 mm deadband was averaged in each trial. A three-way ANOVA was run on the average correction time with participant, condition, and target location as factors (Fig. 3(b)). Feedback condition was found to be a significant factor (p < 0.0005). No significant difference was found between the average probe deviations across participants or target locations (p > 0.4). A multiple-comparison test between the three feedback conditions indicated that the average probe correction time for the condition including visual feedback only (2.15 ± 2.40 s) was significantly greater than that for the conditions with both haptic and visual feedback (0.61 ± 0.36 s; p < 0.0005) and haptic feedback only (0.77 ± 0.59 s; p < 0.005).

These results indicate that the addition of haptic feedback resulted in less undesired motion of the probe and allowed participants to more quickly correct any deviations. Several participants indicated that the haptic feedback was especially beneficial because of the high visual-cognitive load of the needle alignment portion of the task.

The participants were asked to rate the difficulty of the experimental conditions on a five-point Likert scale. The difficulty ratings (Fig. 4) support our other findings. The condition including both haptic and visual feedback was rated as significantly easier (2.75 ± 0.76) than the conditions with visual feedback only (3.38 ± 0.92; p < 0.05) and haptic feedback only (3.5 ± 0.46; p < 0.01).

5 Conclusion

We described a method to add haptic feedback to a commercial, vision-based navigation system for ultrasound-guided interventions. In addition to conventional on-screen cues (target indicators, needle guides, etc.), two vibrating pads on either side of a standard handheld transducer indicate deviations from the plane containing a locked target. A user study was performed under simulated conditions which highlight the central problems of clinical ultrasound imaging – namely difficult visualization of intended targets, and distraction caused by task focusing and information overload, both of which contribute to inadvertent target-alignment loss. Participants executed a dummy needle-targeting task, while probe deviation from the target plane, reversion time to return to plane, and perceived targeting difficulty were measured. The experimental results clearly show (1) that both visual and haptic feedback are extremely helpful at least in supporting inexperienced or overwhelmed operators, and (2) that adding haptic feedback (presumably because of its intuitiveness and independent sensation modality) improves performance over both static and dynamic visual feedback. The considered metrics map directly to clinical precision (in the case of probe deviation) or efficacy of the feedback method (in the case of reversion time). Since the addition of haptic feedback resulted in significant improvement for novice users, the system shows promise for use in training. Although this system was implemented using a Clear Guide ONE, the haptic feedback can in principle be implemented with any navigated ultrasound guidance system. In the future, it would be interesting to examine the benefits of haptic feedback in a clinical study, across a large cohort of diversely-skilled operators, while directly measuring the intervention outcome (instrument placement accuracy). 
Future prototypes would be improved by including haptic feedback for additional degrees of freedom such as translation and rotation of the probe.


References

1. Emergency ultrasound guidelines. Ann. Emerg. Med. 53(4), 550–570 (2009)
2. Revised statement on recommendations for use of real-time ultrasound guidance for placement of central venous catheters. Bull. Am. Coll. Surg. 96(2), 36–37 (2011)
3. Antonakakis, J.G., Ting, P.H., Sites, B.: Ultrasound-guided regional anesthesia for peripheral nerve blocks: an evidence-based outcome review. Anesthesiol. Clin. 29(2), 179–191 (2011)
4. Banovac, F., Wilson, E., Zhang, H., Cleary, K.: Needle biopsy of anatomically unfavorable liver lesions with an electromagnetic navigation assist device in a computed tomography environment. J. Vasc. Interv. Radiol. 17(10), 1671–1675 (2006)
5. Becker, B.C., Maclachlan, R.A., Hager, G.D., Riviere, C.N.: Handheld micromanipulation with vision-based virtual fixtures. In: IEEE International Conference of Robotics Automation, vol. 2011, pp. 4127–4132 (2011)
6. Burke, J.L., Prewett, M.S., Gray, A.A., Yang, L., Stilson, F.R., Coovert, M.D., Elliot, L.R., Redden, E.: Comparing the effects of visual-auditory and visual-tactile feedback on user performance: a meta-analysis. In: Proceedings of the 8th International Conference on Multimodal Interfaces, pp. 108–117. ACM (2006)
7. Cavanna, L., Mordenti, P., Bertè, R., Palladino, M.A., Biasini, C., Anselmi, E., Seghini, P., Vecchia, S., Civardi, G., Di Nunzio, C.: Ultrasound guidance reduces pneumothorax rate and improves safety of thoracentesis in malignant pleural effusion: report on 445 consecutive patients with advanced cancer. World J. Surg. Oncol. 12(1), 1 (2014)
8. Courreges, F., Vieyres, P., Istepanian, R.: Advances in robotic tele-echography services - the OTELO system. In: 26th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, IEMBS 2004, vol. 2, pp. 5371–5374. IEEE (2004)
9. Gilbertson, M.W., Anthony, B.W.: Ergonomic control strategies for a handheld force-controlled ultrasound probe. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1284–1291. IEEE (2012)
10. Oakley, I., McGee, M.R., Brewster, S., Gray, P.: Putting the feel in 'look and feel'. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI 2000, pp. 415–422. ACM (2000)
11. Shapiro, R.S., Ko, P.P., Jacobson, S.: A pilot project to study the use of ultrasonography for teaching physical examination to medical students. Comput. Biol. Med. 32(6), 403–409 (2002)
12. Sigrist, R., Rauter, G., Riener, R., Wolf, P.: Augmented visual, auditory, haptic, and multimodal feedback in motor learning: a review. Psychon. Bull. Rev. 20(1), 21–53 (2013)
13. Sites, B.D., Spence, B.C., Gallagher, J.D., Wiley, C.W., Bertrand, M.L., Blike, G.T.: Characterizing novice behavior associated with learning ultrasound-guided peripheral regional anesthesia. Reg. Anesth. Pain Med. 32(2), 107–115 (2007)
14. Stolka, P.J., Wang, X.L., Hager, G.D., Boctor, E.M.: Navigation with local sensors in handheld 3D ultrasound: initial in-vivo experience. In: SPIE Medical Imaging, p. 79681J. International Society for Optics and Photonics (2011)
15. Sun, S.Y., Gilbertson, M., Anthony, B.W.: Computer-guided ultrasound probe realignment by optical tracking. In: 2013 IEEE 10th International Symposium on Biomedical Imaging (ISBI), pp. 21–24. IEEE (2013)
16. Van Erp, J.B., Van Veen, H.A.: Vibrotactile in-vehicle navigation system. Transp. Res. Part F: Traffic Psychol. Behav. 7(4), 247–256 (2004)

A Surgical Guidance System for Big-Bubble Deep Anterior Lamellar Keratoplasty

Hessam Roodaki1(B), Chiara Amat di San Filippo1, Daniel Zapp3, Nassir Navab1,2, and Abouzar Eslami4

1 Computer Aided Medical Procedures, Technische Universität München, Munich, Germany
[email protected]
2 Computer Aided Medical Procedures, Johns Hopkins University, Baltimore, USA
3 Augenklinik rechts der Isar, Technische Universität München, Munich, Germany
4 Carl Zeiss Meditec AG, Munich, Germany

Abstract. Deep Anterior Lamellar Keratoplasty using Big-Bubble technique (BB-DALK) is a delicate and complex surgical procedure with a multitude of benefits over Penetrating Keratoplasty (PKP). Yet the steep learning curve and challenges associated with BB-DALK prevent it from becoming the standard procedure for keratoplasty. Optical Coherence Tomography (OCT) aids surgeons to carry out BB-DALK in a shorter time with a higher success rate but also brings complications of its own, such as image occlusion by the instrument, the constant need to reposition, and added distraction. This work presents a novel real-time guidance system for BB-DALK which is practically a complete tool for smooth execution of the procedure. The guidance system comprises modified 3D+t OCT acquisitions, advanced visualization, tracking of corneal layers and depth information provided using Augmented Reality. The system is tested by an ophthalmic surgeon performing BB-DALK on several ex vivo pig eyes. Results from multiple evaluations show a maximum tracking error of 8.8 micrometers.

1 Introduction

Ophthalmic anterior segment surgery is among the most technically challenging manual procedures. Penetrating Keratoplasty (PKP) is a well-established transplant procedure for the treatment of multiple diseases of the cornea. In PKP, the full thickness of the diseased cornea is removed and replaced with a donor cornea that is positioned into place and sutured with stitches. Deep Anterior Lamellar Keratoplasty (DALK) is proposed as an alternative method for corneal disorders not affecting the endothelium. The main difference of DALK compared to PKP is the preservation of the patient’s own endothelium. This advantage reduces the risk of immunologic reactions and graft failure while showing similar overall visual outcomes. However, DALK is generally more complicated and time-consuming with a steep learning curve, particularly when the host stroma is manually removed layer by layer [4]. In addition, the high rate of intraoperative perforation keeps DALK from becoming surgeons’ method of choice [7]. To overcome the long surgical time and high perforation rate of DALK, Anwar et al. [1] have proposed the big-bubble DALK technique (BB-DALK). The fundamental step of the big-bubble technique is the insertion of a needle into the deep stroma where air is injected with the goal of separating the posterior stroma and the Descemet’s Membrane (DM). The needle is intended to penetrate to a depth of more than 60% of the cornea, where the injection of air in most cases forms a bubble. However, in fear of perforating the DM, surgeons often stop the insertion before the target depth, where air injection results only in diffuse emphysema of the anterior stroma [7]. When bubble formation is not achieved, the effort to expose a deep layer as near as possible to the DM carries the risk of accidental perforation, which brings further complications to the surgical procedure.

Optical Coherence Tomography (OCT) has been shown to increase the success rate of the procedure by determining the depth of the cannula before attempting the air injection [2]. Furthermore, recent integration of Spectral Domain OCT (SD-OCT) into surgical microscopes gives the possibility of continuous monitoring of the needle insertion. However, current OCT acquisition configurations and available tools to visualize the acquired scans are insufficient for the purpose. Metallic instruments interfere with the OCT signal, leading to obstruction of deep structures. The accurate depth of the needle can only be perceived by removing the needle and imaging the created tunnel, since the image captured when the needle is in position only shows the reflection of the top segment of the metallic instrument [2]. Also, the limited field of view makes it hard to keep the OCT position over the needle when pressure is applied for insertion.

Here we propose a complete system as a guidance tool for BB-DALK. The system consists of modified 3D+t OCT acquisition using a microscope-mounted scanner, sophisticated visualization, tracking of the epithelium (top) and endothelium (bottom) layers and depth information provided using Augmented Reality (AR). The method addresses all aspects of the indicated complex procedure, hence is a practical solution to improve surgeons’ and patients’ experience.

© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 378–385, 2016. DOI: 10.1007/978-3-319-46720-7_44

2 Method

As depicted in Fig. 1, the system is based on an OPMI LUMERA 700 microscope equipped with a modified integrated RESCAN 700 OCT device (Carl Zeiss Meditec, Germany). A desktop computer with a quad-core Intel Core i7 CPU, a single NVIDIA GeForce GTX TITAN X GPU and two display screens are connected to the OCT device. Interaction with the guidance system is done by the surgeon’s assistant via a 3D mouse (3Dconnexion, Germany). The surgeon performs the procedure under the microscope while looking at the screens for both microscopic and OCT feedback. The experiments are performed on ex vivo pig eyes as shown in Fig. 3a using 27 and 30 gauge needles. For evaluations, a micromanipulator and a plastic anterior segment phantom eye are used.


Fig. 1. Experimental setup of the guidance system.

2.1 OCT Acquisition


The original configuration of the intraoperative OCT device is set to acquire B-scans consisting of either 512 or 1024 A-scans. It can be set to acquire a single B-scan, 2 orthogonal B-scans or 5 parallel B-scans. For the proposed guidance system, the OCT device is set to provide 30 B-scans each with 90 A-scan samples by reprogramming the movement of its internal mirror galvanometers. B-scans are captured in a reciprocating manner for shorter scanning time. The scan region covered by the system is 2 mm by 6 mm. The depth of each A-scan is 1024 pixels corresponding to 2 mm in tissue. The concept is illustrated in Fig. 2a. The cuboid of 30 × 90 × 1024 voxels is scanned at the rate of 10 volumes per second.

Fig. 2. (a): The modified pattern of OCT acquisition. (b): The lateral visualization of the cornea (orange) and the surgical needle (gray) in an OCT cuboid.

Since the cuboid is a 3D grid of samples from a continuous scene, it is interpolated using tricubic interpolants to the target resolution of 180 × 540 × 180 voxels (Fig. 2b). For that, frames are first averaged along the depth to obtain 30 frames of 90 × 30 pixels. Then in each cell of the grid, a tricubic interpolant which maps coordinates to intensity values is defined as follows:

$$f(x, y, z) = \sum_{i,j,k=0}^{3} c_{ijk}\, x^i y^j z^k, \qquad x, y, z \in [0, 1], \tag{1}$$

in which $c_{ijk}$ are the 64 interpolant coefficients calculated locally from the grid sample points and their derivatives. The coefficients are calculated by multiplication of a readily available 64 × 64 matrix and the vector of 64 elements consisting of 8 sample points and their derivatives [6]. The interpolation is implemented on the CPU in a parallel fashion.
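As a minimal sketch of Eq. 1, the following Python snippet evaluates the interpolant inside one grid cell, assuming the 64 coefficients are already available; the 64 × 64 matrix product of [6] that produces them is not reproduced here. The function names `eval_tricubic` and `upsampling_factors` are hypothetical and are not part of the described system.

```python
def eval_tricubic(coeffs, x, y, z):
    """Evaluate f(x, y, z) = sum_{i,j,k=0..3} c_ijk * x^i * y^j * z^k (Eq. 1).

    coeffs[i][j][k] holds c_ijk for one grid cell; x, y, z are local
    coordinates inside that cell and must lie in [0, 1].
    """
    if not all(0.0 <= v <= 1.0 for v in (x, y, z)):
        raise ValueError("local coordinates must lie in [0, 1]")
    total = 0.0
    for i in range(4):
        for j in range(4):
            for k in range(4):
                total += coeffs[i][j][k] * (x ** i) * (y ** j) * (z ** k)
    return total


def upsampling_factors(source_shape, target_shape):
    """Number of target samples generated per source sample along each axis."""
    return tuple(t // s for s, t in zip(source_shape, target_shape))
```

With the depth-averaged grid of 30 × 90 × 30 samples and the target of 180 × 540 × 180 voxels, `upsampling_factors` gives six interpolated samples per cell along every axis.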

2.2 Visualization

The achieved 3D OCT volume is visualized on both 2D monitors using GPU ray casting with 100 rays per pixel. Maximum information in OCT images is gained from high-intensity values representing boundaries between tissue layers. Hence, the Maximum Intensity Projection (MIP) technique is employed for rendering to put an emphasis on corneal layers. Many segmentation algorithms in OCT imaging are based on adaptive intensity thresholding [5]. Metallic surgical instruments including typical needles used for the BB-DALK procedure have infrared reflectivity profiles that are distinct from cellular tissues. The 3D OCT volume is segmented into the background, the cornea and the instrument by taking advantage of various reflectivity profiles and employing K-means clustering. The initial cluster mean values are set for the background to zero, the cornea to the volume mean intensity (µ) and the instrument to the volume mean intensity plus two standard deviations (µ + 2σ). The segmentation is used to dynamically alter the color and opacity transfer functions to ensure the instrument is distinctly and continuously visualized in red, the background speckle noise is suppressed and the corneal tissue opacity does not obscure the instrument (Fig. 3b, c).
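The segmentation and projection steps above can be sketched on a flat list of voxel intensities. This is only an illustration under stated assumptions: it is a plain one-dimensional K-means seeded with the three initial means given in the text (0, µ, µ + 2σ), and the names `kmeans_intensity` and `max_intensity_projection` are hypothetical. The GPU ray casting and the transfer functions of the actual system are not modelled.

```python
from statistics import mean, pstdev


def kmeans_intensity(values, iterations=20):
    """Cluster intensities into background (0), cornea (1) and instrument (2)."""
    mu, sigma = mean(values), pstdev(values)
    # Initial cluster means as described in the text: 0, mu, mu + 2*sigma.
    centers = [0.0, mu, mu + 2.0 * sigma]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each voxel joins the nearest cluster mean.
        labels = [min(range(3), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: recompute each mean; keep the old one if a cluster is empty.
        new_centers = []
        for c in range(3):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(mean(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def max_intensity_projection(ray_samples):
    """MIP keeps only the brightest sample met along one viewing ray."""
    return max(ray_samples)
```

The returned labels could then drive per-class colour and opacity, for example rendering label 2 in red and suppressing label 0.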


Fig. 3. (a): Needle insertion performed by the surgeon on the ex vivo pig eye. (b), (c): 3D visualization of the OCT cuboid with frontal and lateral viewpoints. The needle is distinctly visualized in red while endothelium (arrow) is not apparent.


The OCT cuboid could be examined from different viewpoints according to the exact need of the surgeon. For this purpose, one of the two displays could be controlled by the surgeon’s assistant using a 3D mouse with zooming, panning and 3D rotating functionalities. The proposed guidance system maintains an automatic viewpoint of the OCT volume next to the microscopic view in the second display using the tracking procedure described below.

2.3 Tracking

The corneal DM and endothelial layer are the main targets of the BB-DALK procedure. The DM must not be perforated while the needle must be guided as close as possible to it. However, the two layers combined do not have a footprint larger than a few pixels in OCT images. As an essential part of the guidance system, DM and endothelium 3D surfaces are tracked for continuous feedback by solid visualization. The advancement of the needle in a BB-DALK procedure is examined and reported by percentage of the stroma that is above the needle tip. Hence, the epithelium surface of the cornea is also tracked to assist the surgeon by the quantitative guidance of the insertion.

Tracking in each volume is initiated by detection of the topmost and bottommost 3D points in the segmented cornea of the OCT volume. Based on the spherical shape of the cornea, two half spheres are considered as models of the endothelium and epithelium surfaces. The models are then fitted to the detected point clouds using the iterative closest point (ICP) algorithm. Since the insertion of the needle deforms the cornea, ICP is utilized with a 3D affine transformation at its core [3]. If the detected and the model half sphere point clouds are respectively denoted as $P = \{p_i\}_{i=1}^{N_P} \in \mathbb{R}^3$ and $M = \{m_i\}_{i=1}^{N_M} \in \mathbb{R}^3$, each iteration of the tracker algorithm consecutively minimizes the following functions:

$$C(i) = \arg\min_{j \in \{1,\dots,N_P\}} \left\| (A_{k-1} m_i + t_{k-1}) - p_j \right\|_2^2, \quad \text{for all } i \in \{1,\dots,N_M\}. \tag{2}$$

$$(A_k, t_k) = \arg\min_{A,t} \frac{1}{N} \sum_{i=1}^{N} \left\| (A m_i + t) - p_{C(i)} \right\|_2^2. \tag{3}$$

Equation 2 finds the correspondence $C(i)$ between $N \le \min(N_P, N_M)$ detected and model points. Equation 3 minimizes the Euclidean distance between the detected points and the transformed points of the model. $A_k$ and $t_k$ are the desired affine and translation matrices at iteration $k$. For each incoming volume, ICP is initialized by the transformation that brings the centroid of the model points to the centroid of the detected points. The algorithm stops after 30 iterations.

Fig. 4. Augmented Reality is used to solidly visualize the endothelium and epithelium surfaces (yellow) using wireframes. A hypothetical surface (green) is rendered to indicate the insertion target depth.

The lateral view of the OCT volume gives a better understanding of the needle dimensions and advancement. Also, the perception of the distance between the instrument and the endothelium layer is best achieved from viewing the scene parallel to the surface. Therefore, the viewpoint of the second display is constantly kept parallel to a small plane at the center of the tracked endothelium surface (Fig. 4a). The pressure applied for insertion of the needle leads to deformation of the cornea. To keep the OCT field of view centered on the focus of the procedure despite the induced shifts, the OCT depth range is continuously centered to halfway between top and bottom surfaces. This is done automatically to take the burden of manual repositioning away from the surgeon.
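The alternation between Eqs. 2 and 3 can be sketched in plain Python as follows. This is a simplified illustration and not the implementation of [3]: every model point is given a correspondence (so N = N_M), Eq. 3 is solved through its linear least-squares normal equations, and all function names are hypothetical.

```python
def solve(a, b):
    """Solve a @ x = b (a: n x n, b: n x m) by Gauss-Jordan elimination."""
    n = len(a)
    aug = [list(a[r]) + list(b[r]) for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def apply_affine(A, t, m):
    """Return A m + t for a 3D point m."""
    return [sum(A[r][c] * m[c] for c in range(3)) + t[r] for r in range(3)]


def closest(q, points):
    """Index of the point nearest to q in squared Euclidean distance (Eq. 2)."""
    return min(range(len(points)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(q, points[j])))


def fit_affine(model, targets):
    """Least-squares A, t mapping model[i] onto targets[i] (Eq. 3)."""
    xs = [list(m) + [1.0] for m in model]  # homogeneous model points
    G = [[sum(x[r] * x[c] for x in xs) for c in range(4)] for r in range(4)]
    H = [[sum(x[r] * p[c] for x, p in zip(xs, targets)) for c in range(3)]
         for r in range(4)]
    W = solve(G, H)  # 4 x 3: rows 0..2 hold A transposed, row 3 holds t
    A = [[W[c][r] for c in range(3)] for r in range(3)]
    t = [W[3][r] for r in range(3)]
    return A, t


def affine_icp(model, detected, iterations=30):
    """Affine ICP initialised by aligning the two centroids."""
    cm = [sum(m[d] for m in model) / len(model) for d in range(3)]
    cp = [sum(p[d] for p in detected) / len(detected) for d in range(3)]
    A = [[1.0 if r == c else 0.0 for c in range(3)] for r in range(3)]
    t = [cp[d] - cm[d] for d in range(3)]
    for _ in range(iterations):
        corr = [closest(apply_affine(A, t, m), detected) for m in model]  # Eq. 2
        A, t = fit_affine(model, [detected[j] for j in corr])             # Eq. 3
    return A, t
```

The model must contain at least four non-coplanar points, otherwise the normal equations in `fit_affine` are singular; a sampled half sphere satisfies this.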

2.4 Augmented Reality

To further assist the surgeon, a hypothetical third surface is composed between the top and bottom surfaces indicating the insertion target depth (Fig. 4). The surgeon can choose a preferred percentage of penetration at which the imaginary surface would be rendered. Each point of the third surface is a linear combination of the corresponding points on the tracked epithelium and endothelium layers according to the chosen percentage. To visualize the detected surfaces, a wireframe mesh is formed on each of the three point sets. The two detected surfaces are rendered in yellow at their tracked position and the third surface is rendered in green at its hypothetical location. Visualization of each surface could be turned off if necessary. After injection, the presence of air leads to high-intensity voxels in the OCT volume. Therefore, the separation region is visualized effectively in red and could be used for validation of separation (Fig. 5).

Fig. 5. Results of air injection in multiple pig eyes visualized from various viewpoints. The concentration of air in the bottommost region of the cornea indicates the high insertion accuracy. Deep stroma is reached with no sign of perforation.
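The target surface described in this section, a per-point linear combination of the two tracked surfaces, can be sketched as below. The function names and the convention that the depth fraction is 0 at the epithelium and 1 at the endothelium are assumptions made for illustration.

```python
def target_surface(epithelium, endothelium, depth_fraction):
    """Blend corresponding top and bottom surface points.

    depth_fraction = 0 returns the epithelium, 1 returns the endothelium.
    """
    if not 0.0 <= depth_fraction <= 1.0:
        raise ValueError("depth_fraction must lie in [0, 1]")
    return [
        tuple((1.0 - depth_fraction) * a + depth_fraction * b
              for a, b in zip(top, bottom))
        for top, bottom in zip(epithelium, endothelium)
    ]


def penetration_percent(top_depth, bottom_depth, tip_depth):
    """Percentage of the corneal thickness lying above the needle tip."""
    return 100.0 * (tip_depth - top_depth) / (bottom_depth - top_depth)
```

For a 0.5 mm thick cornea and a chosen fraction of 0.9, each target point lies 0.45 mm below its epithelium point.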

3 Experiments and Results

The proposed guidance system is tested by an ophthalmic surgeon experienced in corneal transplantation procedures on several ex vivo pig eyes. According to his comment, the visualization gives a new dimension never seen before in conventional systems. His experience with the system indicates the ability of the real-time guidance solution to help in deep needle insertions with fewer perforation incidents.

For the purpose of real-time OCT acquisition, the surgical scene is sparsely sampled via a grid of A-scans and interpolated. To evaluate the accuracy of interpolation against dense sampling, four fixed regions of a phantom eye (2 mm × 6 mm × 2 mm) are scanned once with real-time sparse sampling (30 px × 90 px × 1024 px) and two times with slow dense sampling (85 px × 512 px × 1024 px). The sparse volumes are then interpolated to the size of the dense volumes. Volume pixels have intensities in the range of [0, 1]. For each of the four regions, the Mean Absolute Error (MAE) of pixel intensities is calculated once for the two dense captures and once for one of the dense volumes and the interpolated volume. A maximum pixel intensity error of 0.073 is observed for the dense-sparse comparison while a minimum pixel intensity error of 0.043 is observed for the dense-dense comparison. The error observed in the dense-dense comparison is caused by OCT speckle noise, which is a known phenomenon. The error observed for the dense-sparse comparison is comparable with the error induced by speckle noise, hence the loss in sparse sampling is insignificant.

Human corneal thickness is reported to be around 500 µm. To ensure a minimum chance of perforation when insertion is done to the depth of 90%, the tracking accuracy required is around 50 µm. To evaluate the tracking accuracy of the proposed solution, a micromanipulator with a resolution of 5 µm is used. A phantom eye and a pig eye are fixed to a micromanipulator and precisely moved upwards and downwards while the epithelium and endothelium surfaces are tracked. At each position, the change in the depth of corresponding points on the tracked surfaces is studied. Results are presented in Table 1 using box-and-whisker plots. The whiskers show the minimum and maximum recorded change of all tracked points while the start and the end of the box are the first and third quartiles. Bands and dots represent medians and means of the recorded changes respectively. The actual value against which the tracking accuracy should be compared is highlighted in red on the horizontal axis of the plots. Overall, the maximum tracking error is 8.8 µm.
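The two quantities used in this evaluation are simple to state in code. The sketch below is illustrative only: volumes are represented as flat lists of intensities in [0, 1], and both function names are hypothetical.

```python
def mean_absolute_error(volume_a, volume_b):
    """MAE between two equally sized volumes given as flat intensity lists."""
    if len(volume_a) != len(volume_b) or not volume_a:
        raise ValueError("volumes must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(volume_a, volume_b)) / len(volume_a)


def required_tracking_accuracy_um(corneal_thickness_um, insertion_depth_fraction):
    """Tissue remaining below the target depth, which bounds the tolerable error."""
    return corneal_thickness_um * (1.0 - insertion_depth_fraction)
```

With the figures quoted above, a 500 µm cornea and a 90% insertion depth leave about 50 µm of margin, against which the reported maximum tracking error of 8.8 µm can be compared.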

Table 1. Evaluation of tracking. Rows: phantom eye and pig eye, each with actual moves of 10 µm and 30 µm. Columns: experiment, actual move (µm), detected epithelium displacement (µm) and detected endothelium displacement (µm), the last two given as box-and-whisker plots.

4 Conclusion

This work presents a novel real-time guidance system for one of the most challenging procedures in ophthalmic microsurgery. The use of medical AR aims at facilitation of the BB-DALK learning process. Experiments on ex vivo pig eyes suggest the usability and reliability of the system leading to more effective yet shorter surgery sessions. Quantitative evaluations of the system indicate its high accuracy in depicting the surgical scene and tracking its changes leading to precise and deep insertions. Future work will be in the direction of adding needle tracking and navigation, further evaluations and clinical in vivo tests.

References

1. Anwar, M., Teichmann, K.D.: Big-bubble technique to bare Descemet’s membrane in anterior lamellar keratoplasty. J. Cataract Refract. Surg. 28(3), 398–403 (2002)
2. De Benito-Llopis, L., Mehta, J.S., Angunawela, R.I., Ang, M., Tan, D.T.: Intraoperative anterior segment optical coherence tomography: a novel assessment tool during deep anterior lamellar keratoplasty. Am. J. Ophthalmol. 157(2), 334–341 (2014)
3. Du, S., Zheng, N., Ying, S., Liu, J.: Affine iterative closest point algorithm for point set registration. Pattern Recogn. Lett. 31(9), 791–799 (2010)
4. Fontana, L., Parente, G., Tassinari, G.: Clinical outcomes after deep anterior lamellar keratoplasty using the big-bubble technique in patients with keratoconus. Am. J. Ophthalmol. 143(1), 117–124 (2007)
5. Ishikawa, H., Stein, D.M., Wollstein, G., Beaton, S., Fujimoto, J.G., Schuman, J.S.: Macular segmentation with optical coherence tomography. Invest. Ophthalmol. Vis. Sci. 46(6), 2012–2017 (2005)
6. Lekien, F., Marsden, J.: Tricubic interpolation in three dimensions. Int. J. Numer. Meth. Eng. 63(3), 455–471 (2005)
7. Scorcia, V., Busin, M., Lucisano, A., Beltz, J., Carta, A., Scorcia, G.: Anterior segment optical coherence tomography-guided big-bubble technique. Ophthalmology 120(3), 471–476 (2013)

Real-Time 3D Tracking of Articulated Tools for Robotic Surgery

Menglong Ye(B), Lin Zhang, Stamatia Giannarou, and Guang-Zhong Yang

The Hamlyn Centre for Robotic Surgery, Imperial College London, London, UK
[email protected]

Abstract. In robotic surgery, tool tracking is important for providing safe tool-tissue interaction and facilitating surgical skills assessment. Despite recent advances in tool tracking, existing approaches are faced with major difficulties in real-time tracking of articulated tools. Most algorithms are tailored for offline processing with pre-recorded videos. In this paper, we propose a real-time 3D tracking method for articulated tools in robotic surgery. The proposed method is based on the CAD model of the tools as well as robot kinematics to generate online part-based templates for efficient 2D matching and 3D pose estimation. A robust verification approach is incorporated to reject outliers in 2D detections, which is then followed by fusing inliers with robot kinematic readings for 3D pose estimation of the tool. The proposed method has been validated with phantom data, as well as ex vivo and in vivo experiments. The results derived clearly demonstrate the performance advantage of the proposed method when compared to the state-of-the-art.

1 Introduction

Recent advances in surgical robots have significantly improved the dexterity of the surgeons, along with enhanced 3D vision and motion scaling. Surgical robots such as the da Vinci® (Intuitive Surgical, Inc. CA) platform can allow the augmentation of preoperative data to enhance the intraoperative surgical guidance. In robotic surgery, tracking of surgical tools is an important task for applications such as safe tool-tissue interaction and surgical skills assessment.

In the last decade, many approaches for surgical tool tracking have been proposed. The majority of these methods have focused on the tracking of laparoscopic rigid tools, including using template matching [1] and combining colour-segmentation with prior geometrical tool models [2]. In [3], the 3D poses of rigid robotic tools were estimated by combining random forests with level-sets segmentation. More recently, tracking of articulated tools has also attracted a lot of interest. For example, Pezzementi et al. [4] tracked articulated tools based on an offline synthetic model using colour and texture features. The CAD model of a robotic tool was used by Reiter et al. [5] to generate virtual templates using the robot kinematics. However, thousands of templates were created by configuring the original tool kinematics, leading to time-demanding rendering and template matching. In [6], boosted trees were used to learn predefined parts of surgical tools. Similarly, regression forests have been employed in [7] to estimate the 2D pose of articulated tools. In [8], the 3D locations of robotic tools estimated with offline trained random forests were fused with robot kinematics to recover the 3D poses of the tools. Whilst there has been significant progress on surgical tool detection and tracking, none of the existing approaches have thus far achieved real-time 3D tracking of articulated robotic tools.

In this paper, we propose a framework for real-time 3D tracking of articulated tools in robotic surgery. Similar to [5], CAD models have been used to generate virtual tools and their contour templates are extracted online, based on the kinematic readings of the robot. In our work, the tool detection on the real camera image is performed via matching the individual parts of the tools rather than the whole instrument. This enables our method to deal with the changing pose of the tools due to articulated motion. Another novel aspect of the proposed framework is the robust verification approach based on 2D geometrical context, which is used to reject outlier template matches of the tool parts. The inlier 2D detections are then used for 3D pose estimation via the Extended Kalman Filter (EKF). Experiments have been conducted on phantom, ex vivo and in vivo video data, and the results verify that our approach outperforms the state-of-the-art.

© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 386–394, 2016. DOI: 10.1007/978-3-319-46720-7_45

Fig. 1. (a) Illustration of transformations; (b) Virtual rendering example of the large needle driver and its keypoint locations; (c) Extracted gradient orientations from virtual rendering. The orientations are quantised and colour-coded as shown in the pie chart.

2 Methods

Our proposed framework includes three main components. The first component is a virtual tool renderer that generates part-based templates online. After template matching, the second component performs verification to extract the inlier 2D detections. These 2D detections are finally fused with kinematic data for 3D tool pose estimation. Our framework is implemented on the da Vinci® robot. The robot kinematics are retrieved using the da Vinci® Research Kit (dVRK) [9].

2.1 Part-Based Online Templates for Tool Detection

In this work, to deal with the changing pose of articulated surgical tools, the tool detection has been performed by matching individual parts of the tools, rather than the entire instrument, similar to [6]. To avoid the limitations of offline training, we propose to generate the part models on-the-fly such that the changing appearance of tool parts can be dynamically adapted.

Fig. 2. (a) An example of part-based templates; (b) Quantised gradient orientations from a camera image; (c) Part-based template matching results of tool parts; (d) and (e) Geometrical context verification; (f) Inlier detections obtained after verification.

To generate the part-based models online, the CAD model of the tool and the robot kinematics have been used to render the tool in a virtual environment. The pose of a tool in the robot base frame B can be denoted as the transformation $T_E^B$, where E is the end-effector coordinate frame shown in Fig. 1(a). $T_E^B$ can be retrieved from dVRK (kinematics) to provide the 3D coordinates of the tool in B. Thus, to set the virtual view to be the same as the laparoscopic view, a standard hand-eye calibration [10] is used to estimate the transformation $T_B^C$ from B to the camera coordinate frame C. However, errors in the calibration can affect the accuracy of $T_B^C$, resulting in a 3D pose offset between the virtual tool and the real tool in C. In this regard, we represent the transformation found from the calibration as $T_B^{C^-}$, where $C^-$ is the camera coordinate frame that includes the accumulated calibration errors. Therefore, a correction transformation denoted as $T_{C^-}^C$ can be introduced to compensate for the calibration errors.

In this work, we have defined n = 14 keypoints $P^B = \{p_i^B\}_{i=1}^{n}$ on the tool, and the large needle driver is taken as an example. The keypoints include the points shown in Fig. 1(b) and those on the symmetric side of the tool. These keypoints represent the skeleton of the tool, which also apply to other da Vinci® tools. At time t, an image $I_t$ can be obtained from the laparoscopic camera. The keypoints can be projected in $I_t$ with the camera intrinsic matrix K via

$$P^{I_t} = \frac{1}{s} K\, T_{C^-}^{C}\, T_B^{C^-} P_t^B. \tag{1}$$

Here, s is the scaling factor that normalises the depth to the image plane. To represent the appearance of the tool parts, the Quantised Gradient Orientations (QGO) approach [11] has been used (see Fig. 1(c)). Bounding boxes are created to represent part-based models and centred at the keypoints in the virtual view (see Fig. 2(a)). The box size for each part is adjusted based on the z coordinate (from kinematics) of the keypoint with respect to the virtual camera centre. QGO templates are then extracted inside these bounding boxes. As QGO represents the contour information of the tool, it is robust to cluttered scenes and illumination changes. In addition, a QGO template is represented as a binary code by quantisation, thus template matching can be performed efficiently. Note that not all of the defined parts are visible in the virtual view, as some of them may be occluded. Therefore, the templates are only extracted for those m parts that are facing the camera. To find the correspondences of the tool parts between the virtual and real images, QGO is also computed on the real image (see Fig. 2(b)) and template matching is then performed for each part via sliding windows. Exemplar template matching results are shown in Fig. 2(c).
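The keypoint projection of Eq. 1 can be sketched in plain Python as follows. The argument names are hypothetical: `T_corr` stands for the correction transformation from C⁻ to C, `T_calib` for the hand-eye calibration result from B to C⁻, both as 4 × 4 nested lists, and `K` is a 3 × 3 intrinsic matrix. The numbers in the usage note are made up for illustration.

```python
def matmul4(a, b):
    """Product of two 4 x 4 homogeneous transformation matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def project_keypoint(K, T_corr, T_calib, p_base):
    """Project a 3D keypoint given in the robot base frame B to pixels (Eq. 1)."""
    p = [p_base[0], p_base[1], p_base[2], 1.0]
    T = matmul4(T_corr, T_calib)  # base frame B -> corrected camera frame C
    p_cam = [sum(T[r][c] * p[c] for c in range(4)) for r in range(3)]
    uvw = [sum(K[r][c] * p_cam[c] for c in range(3)) for r in range(3)]
    s = uvw[2]  # scaling factor that normalises the depth
    return uvw[0] / s, uvw[1] / s
```

For instance, with a focal length of 500 pixels, a principal point at (360, 288), an identity correction and a calibration that places the base origin 0.1 m in front of the camera, the base-frame point (0.01, -0.02, 0) projects to about (410, 188).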

2.2 Tool Part Verification via 2D Geometrical Context

To further extract the best location estimates of the tool parts, a consensus-based verification approach [12] is included. This approach analyses the geometrical context of the correspondences in a PROgressive SAmple Consensus (PROSAC) scheme [13]. For the visible keypoints $\{p_i\}_{i=1}^{m}$ in the virtual view, we denote their 2D correspondences in the real camera image as $\{p_{i,j}\}_{i=1,j=1}^{m,k}$, where $\{p_{i,j}\}_{j=1}^{k}$ represent the top k correspondences of $p_i$ sorted by QGO similarities.

For each iteration in PROSAC, we select two point pairs from $\{p_{i,j}\}_{i=1,j=1}^{m,k}$ in a sorted descending order. These two pairs represent the correspondences for two different parts, e.g., the pair of $p_1$ and $p_{1,2}$, and the pair of $p_3$ and $p_{3,1}$. The two pairs are then used to verify the geometrical context of the tool parts. As shown in Fig. 2(d) and (e), we use two polar grids to indicate the geometrical context of the virtual view and the camera image. The origins of the grids are defined as $p_1$ and $p_{1,2}$, respectively. The major axes of the grids are defined as the vectors from $p_1$ to $p_3$ and from $p_{1,2}$ to $p_{3,1}$, respectively. The scale difference between the two grids is found by comparing $d(p_1, p_3)$ and $d(p_{1,2}, p_{3,1})$, where $d(\cdot, \cdot)$ is the Euclidean distance. We can then define the angular and radial bin sizes as 30° and 10 pixels (allowing moderate out-of-plane rotation), respectively. With these, two polar grids can be created and placed on the virtual and camera images. A point pair is determined as an inlier if the two points are located in the same zone in the polar grids. Therefore, if the number of inliers is larger than a predefined value, the geometrical context of the tools in the virtual and the real camera images are considered as matched. Otherwise, the above verification is repeated until it reaches the maximum number (100) of iterations. After verification, the inlier point matches are used to estimate the correction transformation $T_{C^-}^{C}$.
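One verification pass over the polar grids can be sketched as below. How the scale difference enters the grid is not spelled out in the text, so dividing the camera-image radius by the distance ratio is an assumption, as are the function names. The PROSAC sampling loop around this check is omitted.

```python
import math


def polar_zone(point, origin, axis_to, scale=1.0, ang_bin_deg=30.0, rad_bin_px=10.0):
    """Zone (angular bin, radial bin) of a point in a polar grid.

    The grid is centred at `origin` and its major axis points to `axis_to`.
    """
    dx, dy = point[0] - origin[0], point[1] - origin[1]
    ax, ay = axis_to[0] - origin[0], axis_to[1] - origin[1]
    angle = math.degrees(math.atan2(dy, dx) - math.atan2(ay, ax)) % 360.0
    radius = math.hypot(dx, dy) / scale
    return int(angle // ang_bin_deg), int(radius // rad_bin_px)


def count_inliers(virtual_pts, real_pts, a, b):
    """Count parts that fall in the same zone of both grids.

    virtual_pts[i] and real_pts[i] are the candidate match for part i;
    parts a and b are the two sampled pairs that define the grids.
    """
    d_virtual = math.dist(virtual_pts[a], virtual_pts[b])
    d_real = math.dist(real_pts[a], real_pts[b])
    scale = d_real / d_virtual  # assumed use of the scale difference
    inliers = 0
    for i in range(len(virtual_pts)):
        if i in (a, b):
            continue
        zone_v = polar_zone(virtual_pts[i], virtual_pts[a], virtual_pts[b])
        zone_r = polar_zone(real_pts[i], real_pts[a], real_pts[b], scale)
        if zone_v == zone_r:
            inliers += 1
    return inliers
```

A hypothesis would be accepted once `count_inliers` exceeds the predefined threshold; otherwise the next pair of correspondences is sampled.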

2.3 From 2D to 3D Tool Pose Estimation

We now describe how to combine the 2D detections with 3D kinematic data to estimate $T_{C^-}^{C}$. Here the transformation matrix is represented as a vector of rotation angles and translations along each axis: $x = [\theta_x, \theta_y, \theta_z, r_x, r_y, r_z]^T$. We denote the n observations (corresponding to the tool parts defined in Sect. 2.1) as $z = [u_1, v_1, \dots, u_n, v_n]^T$, where u and v are their locations in the camera image. To estimate x on-the-fly, the EKF has been adopted to find $x_t$ given the observations $z_t$ at time t. The process model is defined as $x_t = I x_{t-1} + w_t$, where $w_t$ is the process noise at time t, and I is the transition function defined as the identity matrix. The measurement model is defined as $z_t = h(x_t) + v_t$, with $v_t$ being the noise. $h(\cdot)$ is the nonlinear function with respect to $[\theta_x, \theta_y, \theta_z, r_x, r_y, r_z]^T$:

$$h(x_t) = \frac{1}{s} K f(x_t)\, T_B^{C^-} P_t^B, \tag{2}$$

which is derived according to Eq. 1. Note here, $f(\cdot)$ is the function that composes the Euler angles and translation (in $x_t$) into the 4 × 4 homogeneous transformation matrix $T_{C^-}^{C}$. As Eq. 2 is a nonlinear function, we derive the Jacobian matrix J of $h(\cdot)$ regarding each element in $x_t$.

For iteration t, the predicted state $x_t^-$ is calculated and used to predict the measurement $z_t^-$, and also to calculate $J_t$. In addition, $z_t$ is obtained from the inlier detections (Sect. 2.2), which is used, along with $J_t$ and $x_t^-$, to derive the corrected state $x_t^+$ which contains the corrected angles and translations. These are finally used to compose the transformation $T_{C^-}^{C}$ at time t, and thus the 3D pose of the tool in C is obtained as $T_E^C = T_{C^-}^{C}\, T_B^{C^-}\, T_E^B$. Note that if no 2D detections are available at time t, the previous $T_{C^-}^{C}$ is then used.

At the beginning of the tracking process, an estimate ${}^{0}T_{C^-}^{C}$ is required to initialise EKF, and correct the virtual view to be as close as possible to the real view. Therefore, template matching is performed in multiple scales and rotations for initialisation; however, only one template is needed for matching of each tool part after initialisation. The Efficient Perspective-n-Points (EPnP) algorithm [14] is applied to estimate ${}^{0}T_{C^-}^{C}$ based on the 2D–3D correspondences of the tool parts matched between the virtual and real views and their 3D positions from kinematic data.

The proposed framework can be easily extended to track multiple tools. This only requires generating part-based templates for all the tools in the same graphic rendering and following the proposed framework. As template matching is performed on binarised templates, the computational speed is not deteriorated.
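A single EKF iteration with the identity process model described above can be sketched in plain Python. Two simplifications are assumed and should be read as such: the Jacobian is approximated by forward differences rather than derived analytically, and the example measurement model `make_translation_model` (a hypothetical helper) corrects only the three translations, whereas the state in the text also carries three rotation angles. Extending the state only changes `h`.

```python
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def add(a, b, sign=1.0):
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[r]) + list(b[r]) for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ekf_step(x, P, z, h, Q, R, eps=1e-6):
    """One predict/correct cycle: x_t = x_{t-1} + w_t, z_t = h(x_t) + v_t."""
    x_pred = list(x)            # identity transition
    P_pred = add(P, Q)
    z_pred = h(x_pred)
    # Forward-difference Jacobian of h at the predicted state.
    J = [[0.0] * len(x_pred) for _ in z_pred]
    for c in range(len(x_pred)):
        shifted = list(x_pred)
        shifted[c] += eps
        z_shift = h(shifted)
        for r in range(len(z_pred)):
            J[r][c] = (z_shift[r] - z_pred[r]) / eps
    JP = matmul(J, P_pred)
    S = add(matmul(JP, transpose(J)), R)
    gain = transpose(solve(S, JP))   # K = P J^T S^-1 (S and P are symmetric)
    innovation = [zi - zp for zi, zp in zip(z, z_pred)]
    x_new = [xi + sum(k * e for k, e in zip(row, innovation))
             for xi, row in zip(x_pred, gain)]
    P_new = add(P_pred, matmul(gain, JP), -1.0)
    return x_new, P_new


def make_translation_model(K_cam, points_cam):
    """h(x): shift keypoints (in the uncorrected camera frame) by x and project."""
    def h(x):
        z = []
        for p in points_cam:
            q = [p[d] + x[d] for d in range(3)]
            z.append(K_cam[0][0] * q[0] / q[2] + K_cam[0][2])
            z.append(K_cam[1][1] * q[1] / q[2] + K_cam[1][2])
        return z
    return h
```

Calling `ekf_step` once per frame with the inlier detections as `z` yields the corrected state; when no detections are available, the step can simply be skipped so that the previous estimate is reused.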

3 Results

The proposed framework has been implemented on an HP workstation with an Intel Xeon E5-2643v3 CPU. Stereo videos are captured at 25 Hz. In our C++ implementation, the part-based rendering and the image processing run in two separate CPU threads, enabling our framework to run in real time. The rendering part is implemented with VTK and OpenGL, and its speed is fixed at 25 Hz. As our framework only requires monocular images for 3D pose estimation, only the images from the left camera were processed. For an image size of 720 × 576, the processing speed is ≈29 Hz (without any GPU programming). The threshold on the number of inliers in the geometrical context verification is empirically set to 4. For initialisation, template matching is performed with additional scale ratios of 0.8 and 1.2, and rotations of ±15°, which does not deteriorate the run-time speed due to template binarisation.

Fig. 3. (a) and (b) Detection rate results of our online template matching and GradBoost [6] on two single-tool tracking sequences (see supplementary videos); (c) Overall rotation angle errors (mean ± std) along each axis on Seqs. 1–6.

Table 1. Translation and rotation errors (mean ± std) on Seqs. 1–6. Tracking accuracies with run-time speed in Hz (in brackets) compared to [8] on their dataset (Seqs. 7–12).

3D pose error
           Our method                      EPnP-based
Seq.       Trans. (mm)    Rot. (rads.)     Trans. (mm)    Rot. (rads.)
1          1.31 ± 1.15    0.11 ± 0.08      3.10 ± 3.89    0.12 ± 0.09
2          1.50 ± 1.12    0.12 ± 0.07      6.69 ± 8.33    0.24 ± 0.19
3          3.14 ± 1.96    0.12 ± 0.08      8.03 ± 8.46    0.23 ± 0.20
4          4.04 ± 2.21    0.19 ± 0.15      5.02 ± 5.41    0.29 ± 0.18
5          3.07 ± 2.02    0.14 ± 0.11      5.47 ± 5.63    0.26 ± 0.20
6          3.24 ± 2.70    0.12 ± 0.05      4.03 ± 3.87    0.23 ± 0.13
Overall    2.83 ± 2.19    0.13 ± 0.10      5.51 ± 6.45    0.24 ± 0.18

Tracking accuracy
Seq.       Our method      [8]
7          97.79 % (27)    97.12 % (1)
8          99.45 % (27)    96.88 % (1)
9          99.25 % (28)    98.04 % (1)
10         96.84 % (28)    97.75 % (1)
11         96.57 % (36)    98.76 % (2)
12         98.70 % (25)    97.25 % (1)
Overall    97.83 %         97.81 %

Our method was compared to the tracking approaches for articulated tools including [6,8]. To demonstrate the effectiveness of the online part-based templates for tool detection, we have compared our approach to the method proposed in [6], which is based on boosted trees for 2D tool part detection. For ease of training data generation, a subset of the tool parts was evaluated in this comparison, namely the front pin, logo, and rear pin. The classifier was trained with 6000 samples for each part. Since [6] applies to single tool tracking only, the trained classifier and our approach were both tested on two single-tool sequences (1677 and 1732 images), where ground truth data was manually labelled. A part detection is considered correct if the distance between its centre and the ground truth is smaller than a threshold. To evaluate the results with different accuracy requirements, the threshold was sequentially set to 5, 10, and 20 pixels. The detection rates of the methods were calculated among the top N detections. As shown in Fig. 3(a–b), our method significantly outperforms [6] in all accuracy requirements. This is because our templates are generated adaptively online.

To validate the accuracy of the 3D pose estimation, we manually labelled the centre locations of the tool parts on both left and right camera images on phantom (Seqs. 1–3) and ex vivo (Seqs. 4–6) video data to generate the 3D ground truth. The tool pose errors are then obtained as the relative pose between the estimated pose and the ground truth. Our approach was also compared to the 3D poses estimated by performing EPnP for every image where the tool parts are detected. However, EPnP generated unstable results and had inferior performance to our approach, as shown in Table 1 and Fig. 3(c).

We have also compared our framework to the method proposed in [8]. As their code is not publicly available, we ran our framework on the same ex vivo (Seqs. 7–10, 12) and in vivo data (Seq. 11) used in [8]. Example results are shown in Fig. 4(e–g). To achieve a fair comparison, we have evaluated the tracking accuracy as explained in their work, and presented both our results and theirs, as reported in the paper, in Table 1. Our framework achieved a slightly better overall accuracy than their approach (97.83 % versus 97.81 %), while our processing speed is significantly faster, ranging from 25 to 36 Hz, whereas theirs is approximately 1–2 Hz as reported in [8].

Fig. 4. Qualitative results. (a–c) phantom data (Seqs. 1–3); (d) ex vivo ovine data (Seq. 4); (e) and (g) ex vivo porcine data (Seqs. 9 and 12); (f) in vivo porcine data (Seq. 11). Red lines indicate the tool kinematics, and green lines indicate the tracking results of our framework with 2D detections in coloured dots.

As shown in Figs. 4(b) and (d), our proposed method is robust to occlusion due to tool intersections and specularities, thanks to the fusion of 2D part detections and kinematics. In addition, our framework is able to provide accurate tracking even when $T^{C^-}_{B}$ becomes invalid after the laparoscope has moved (Fig. 4(c), Seq. 3). This is because $T^{C}_{C^-}$ is estimated online using the 2D part detections. All the processed videos are available via https://youtu.be/oqw 9Xp qsw.
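The detection-rate criterion used in the comparison with [6] (a part detection counts as correct when its centre lies within a pixel threshold of the labelled centre, evaluated among the top N detections) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function name `detection_rate`, the data layout and the toy coordinates are all hypothetical.

```python
import math


def detection_rate(detections, ground_truth, threshold_px, top_n=1):
    """Fraction of frames in which any of the top-N candidates lies
    closer than threshold_px pixels to the labelled part centre.

    detections:   one list per frame of (u, v) candidates, assumed to be
                  sorted by decreasing detection score.
    ground_truth: one labelled (u, v) centre per frame.
    """
    if len(detections) != len(ground_truth):
        raise ValueError("one ground-truth centre is needed per frame")
    if not ground_truth:
        return 0.0
    correct = 0
    for candidates, (gu, gv) in zip(detections, ground_truth):
        # A frame is correct if at least one of the N best candidates is close enough.
        if any(math.hypot(u - gu, v - gv) < threshold_px
               for (u, v) in candidates[:top_n]):
            correct += 1
    return correct / len(ground_truth)


# Toy data (hypothetical): three frames with two ranked candidates each.
frames = [
    [(100.0, 100.0), (300.0, 300.0)],  # best candidate is 3 px from the label
    [(250.0, 250.0), (208.0, 206.0)],  # only the second candidate is near (10 px)
    [(50.0, 50.0), (400.0, 400.0)],    # both candidates are far away
]
truth = [(103.0, 100.0), (200.0, 200.0), (120.0, 120.0)]

# Detection rates at the three thresholds used in the text, with N = 2.
rates = {t: detection_rate(frames, truth, t, top_n=2) for t in (5, 10, 20)}
```

Sweeping both the threshold and N in this way would yield curves of the kind summarised in Fig. 3(a–b); the strict inequality follows the wording "smaller than a threshold".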

4 Conclusions

In this paper, we have proposed a real-time framework for 3D tracking of articulated tools in robotic surgery. Online part-based templates are generated using the tool CAD models and robot kinematics, such that efficient 2D detection can then be performed in the camera image. For rejecting outliers, a robust verification method based on 2D geometrical context is included. The inlier 2D detections are finally fused with robot kinematics for 3D pose estimation. Our framework can run in real time for multi-tool tracking, and can thus be used for imposing dynamic active constraints and for motion analysis. The results on phantom, ex vivo and in vivo experiments demonstrate that our approach can achieve accurate 3D tracking, and outperform the current state-of-the-art.

Acknowledgements. We thank Dr. DiMaio (Intuitive Surgical) for providing the CAD models, and Dr. Reiter (Johns Hopkins University) for assisting method comparisons. Dr. Giannarou is supported by the Royal Society (UF140290).

References

1. Sznitman, R., Ali, K., Richa, R., Taylor, R.H., Hager, G.D., Fua, P.: Data-driven visual tracking in retinal microsurgery. In: Ayache, N., Delingette, H., Golland, P., Mori, K. (eds.) MICCAI 2012. LNCS, vol. 7511, pp. 568–575. Springer, Heidelberg (2012)
2. Wolf, R., Duchateau, J., Cinquin, P., Voros, S.: 3D tracking of laparoscopic instruments using statistical and geometric modeling. In: Fichtinger, G., Martel, A., Peters, T. (eds.) MICCAI 2011. LNCS, vol. 6891, pp. 203–210. Springer, Heidelberg (2011)
3. Allan, M., Chang, P.-L., Ourselin, S., Hawkes, D.J., Sridhar, A., Kelly, J., Stoyanov, D.: Image based surgical instrument pose estimation with multi-class labelling and optical flow. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9349, pp. 331–338. Springer, Heidelberg (2015)
4. Pezzementi, Z., Voros, S., Hager, G.: Articulated object tracking by rendering consistent appearance parts. In: ICRA, pp. 3940–3947 (2009)
5. Reiter, A., Allen, P.K., Zhao, T.: Articulated surgical tool detection using virtually-rendered templates. In: CARS (2012)
6. Sznitman, R., Becker, C., Fua, P.: Fast part-based classification for instrument detection in minimally invasive surgery. In: Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI 2014. LNCS, vol. 8674, pp. 692–699. Springer, Heidelberg (2014)
7. Rieke, N., Tan, D.J., Alsheakhali, M., Tombari, F., San Filippo, C.A., Belagiannis, V., Eslami, A., Navab, N.: Surgical tool tracking and pose estimation in retinal microsurgery. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9349, pp. 266–273. Springer, Heidelberg (2015)
8. Reiter, A., Allen, P.K., Zhao, T.: Appearance learning for 3D tracking of robotic surgical tools. Int. J. Rob. Res. 33(2), 342–356 (2014)
9. Kazanzides, P., Chen, Z., Deguet, A., Fischer, G., Taylor, R., DiMaio, S.: An open-source research kit for the da Vinci® surgical system. In: ICRA, pp. 6434–6439 (2014)
10. Tsai, R., Lenz, R.: A new technique for fully autonomous and efficient 3D robotics hand/eye calibration. IEEE Trans. Rob. Autom. 5(3), 345–358 (1989)
11. Hinterstoisser, S., Cagniart, C., Ilic, S., Sturm, P., Navab, N., Fua, P., Lepetit, V.: Gradient response maps for real-time detection of textureless objects. IEEE Trans. Pattern Anal. Mach. Intell. 34(5), 876–888 (2012)
12. Ye, M., Giannarou, S., Meining, A., Yang, G.Z.: Online tracking and retargeting with applications to optical biopsy in gastrointestinal endoscopic examinations. Med. Image Anal. 30, 144–157 (2016)
13. Chum, O., Matas, J.: Matching with PROSAC - progressive sample consensus. In: CVPR, vol. 1, pp. 220–226 (2005)
14. Lepetit, V., Moreno-Noguer, F., Fua, P.: EPnP: an accurate O(n) solution to the PnP problem. Int. J. Comput. Vis. 81(2), 155–166 (2008)

Towards Automated Ultrasound Transesophageal Echocardiography and X-Ray Fluoroscopy Fusion Using an Image-Based Co-registration Method

Shanhui Sun1(✉), Shun Miao1, Tobias Heimann1, Terrence Chen1, Markus Kaiser2, Matthias John2, Erin Girard2, and Rui Liao1

1 Siemens Healthcare, Medical Imaging Technologies, Princeton, NJ 08540, USA
[email protected]
2 Siemens Healthcare, Advanced Therapies, 91301 Forchheim, Germany

Abstract. Transesophageal Echocardiography (TEE) and X-ray fluoroscopy are two routinely used real-time image guidance modalities for interventional procedures, and co-registering them into the same coordinate system enables advanced hybrid image guidance by providing augmented and complementary information. In this paper, we present an image-based system for co-registering these two modalities through real-time tracking of the 3D position and orientation of a moving TEE probe from 2D fluoroscopy images. The 3D pose of the TEE probe is estimated fully automatically using a detection based visual tracking algorithm, followed by intensity-based 3D-to-2D registration refinement. In addition, to provide high reliability for clinical use, the proposed system can automatically recover from tracking failures. The system is validated on over 1900 fluoroscopic images from clinical trial studies, and achieves a success rate of 93.4 % at a 2D target registration error (TRE) of less than 2.5 mm and an average TRE of 0.86 mm, demonstrating high accuracy and robustness when dealing with poor image quality caused by low radiation dose and pose ambiguity caused by probe self-symmetry.

Keywords: Visual tracking based pose detection · 3D-2D registration

1 Introduction

There is fast growth in catheter-based procedures for structural heart disease, such as transcatheter aortic valve implantation (TAVI) and transcatheter mitral valve replacement (TMVR). These procedures are typically performed under the independent guidance of two real-time imaging modalities, i.e. fluoroscopic X-ray and transesophageal echocardiography (TEE). Both imaging modalities have their own advantages: for example, X-ray is good at depicting devices, and TEE is much better at soft tissue visualization. Therefore, fusion of both modalities could provide complementary information for improved security and accuracy during the navigation and deployment of the devices. For example, an X-ray/TEE fusion system can help the physician find the correct TAVR deployment angle on the fluoroscopic image using landmarks transformed from annotations on TEE.

To enable the fusion of X-ray and TEE images, several methods have been proposed to recover the 3D pose of the TEE probe from the X-ray image [1–3,5,6], where 3D pose recovery is accomplished by 3D-2D image registration. In [1,2,5], 3D-2D image registration is fulfilled by minimizing the dissimilarity between digitally reconstructed radiographs (DRRs) and X-ray images. In [6], DRR rendering is accelerated by using a mesh model instead of a computed tomography (CT) volume. In [3], registration is accelerated using a cost function which is directly computed from the X-ray image and CT scan via splatting from a point cloud model, without the explicit generation of DRRs. The main disadvantage of these methods is that they are not fully automatic and require initialization due to their small capture range. Recently, Mountney et al. proposed a detection based method to recover the 3D pose of the TEE probe from an X-ray image [7]. The 3D translation is derived from the probe's in-plane position detector and scale detector. The 3D rotation (illustrated in Fig. 1(a)) is derived from the in-plane rotation (yaw angle), based on an orientation detector, and the out-of-plane rotations (roll and pitch angles), based on a template matching approach. They demonstrated feasibility on synthetic data.

Motivated by the detection based method, we present a new method in this paper to handle practical challenges in a clinical setup such as low X-ray dose, noise, clutter and probe self-symmetry in the 2D image. Two self-symmetry examples are shown in Fig. 1(b). To minimize appearance ambiguity, three balls (Fig. 2(a)) and three holes (Fig. 2(b)) are manufactured on the probe. Examples of ball markers and hole markers appearing in fluoroscopic images are shown in Fig. 2(c) and (d). Our algorithm explicitly detects the markers and incorporates the marker detection results into the TEE probe pose estimation for improved robustness and accuracy.

© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 395–403, 2016. DOI: 10.1007/978-3-319-46720-7 46

Fig. 1. (a) Illustration of TEE Euler angles. Yaw is an in-plane rotation. Pitch and roll are out-of-plane rotations. (b) Example of ambiguous appearance in two different poses. The green box indicates the probe's transducer array. The roll angle between the two poses is close to 90°. Without considering the markers (Fig. 2), the probe looks similar in the two X-ray images.

In addition, based on the fact that physicians acquire series of frames (a video sequence) in interventional cardiac procedures, we incorporate temporal information to boost the accuracy and speed, and we formulate our 6-DOF parameter tracking as a sequential Bayesian inference framework. To further remove discretization errors, a Kalman filter is applied to the temporal pose parameters. In addition, tracking failure is automatically detected and an automated tracking initialization method is applied. For critical time points when the measurements (e.g., annotated anatomical landmarks) from the TEE image are to be transformed to the fluoroscopic image for enhanced visualization, intensity-based 3D to 2D registration of the TEE probe is performed to further refine the estimated pose and ensure high accuracy.

Fig. 2. Illustration of probe markers circled in red. (a) 3D TEE probe front side with 3 ball markers and (b) back side with 3 hole markers. (c) Ball markers and (d) hole markers appear in X-Ray images.

2 Methods

A 3D TEE point $Q_{TEE}$ can be projected to the 2D fluoroscopic image point $Q_{Fluoro} = P_{int} P_{ext} (R^{W}_{TEE} Q_{TEE} + T^{W}_{TEE})$, where $P_{int}$ is the C-arm's internal projection matrix and $P_{ext}$ is the C-arm's external matrix, which transforms a point from the TEE world coordinate system to the C-arm coordinate system. $R^{W}_{TEE}$ and $T^{W}_{TEE}$ are the TEE probe's rotation and position in the world coordinate system. The internal and external matrices are known from calibration and the C-arm rotation angles. $R^{W}_{TEE} = P_{ext}^{-1} R^{C}_{TEE}$ and $T^{W}_{TEE} = P_{ext}^{-1} T^{C}_{TEE}$, where $R^{C}_{TEE}$ and $T^{C}_{TEE}$ are the probe's rotation and position in the C-arm coordinate system. $R^{C}_{TEE}$ is composed of three Euler angles $(\theta_z, \theta_x, \theta_y)$, which are illustrated in Fig. 1(a), and $T^{C}_{TEE} = (x, y, z)$.

The proposed tracking algorithm is formulated as finding an optimal pose on the current image $t$, constrained by the prior pose from image $t-1$. In our work, pose hypotheses with pose parameters $(u, v)$, $\theta_z$, $s$, $\theta_x$ and $\theta_y$ are generated, and the optimal pose among these hypotheses is identified in a sequential Bayesian inference framework. Figure 3 illustrates an overview of the proposed algorithm. We define two tracking stages: in-plane pose tracking for the parameters $(u, v)$, $s$ and $\theta_z$, and out-of-plane tracking for the parameters $\theta_x$ and $\theta_y$. In the context of visual tracking, the searching spaces of $(u_t, v_t, \theta_z^{t}, s_t)$ and $(\theta_x^{t}, \theta_y^{t})$ are significantly reduced by generating in-plane pose hypotheses in the region of interest $(u_{t-1} \pm \delta_T, v_{t-1} \pm \delta_T, \theta_z^{t-1} \pm \delta_z, s_{t-1} \pm \delta_s)$, and out-of-plane pose hypotheses in the region of interest $(\theta_x^{t-1} \pm \delta_x, \theta_y^{t-1} \pm \delta_y)$, where $\delta_T$, $\delta_z$, $\delta_s$, $\delta_x$ and $\delta_y$ are the searching ranges. Note that we choose these searching ranges conservatively, i.e. much larger than typical frame-to-frame probe motion.
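The projection chain at the start of this section can be illustrated with a short sketch. The text does not give the matrix shapes, so the following assumes a 4×4 homogeneous rigid transform for $P_{ext}$, a 3×4 projection matrix for $P_{int}$, and a final perspective division; the function names and all numeric values (focal length, principal point, distances) are hypothetical.

```python
def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def project_tee_point(q_tee, r_w, t_w, p_ext, p_int):
    """Project a 3D point in TEE probe coordinates to 2D pixel coordinates.

    r_w (3x3), t_w (3,): probe rotation and position in world coordinates.
    p_ext (4x4):         assumed homogeneous world -> C-arm transform.
    p_int (3x4):         assumed C-arm projection matrix.
    """
    # Probe -> world: R * Q + T
    q_world = [r + t for r, t in zip(mat_vec(r_w, q_tee), t_w)]
    # World -> C-arm, in homogeneous coordinates
    q_carm = mat_vec(p_ext, q_world + [1.0])
    # C-arm -> image plane, followed by perspective division
    x, y, w = mat_vec(p_int, q_carm)
    if w == 0:
        raise ValueError("point has zero depth along the projection axis")
    return (x / w, y / w)


# Hypothetical setup: identity rotations, probe 200 mm along z in the world,
# world origin 800 mm in front of the source, focal length 1000 px,
# principal point at the centre of a 1024 x 1024 image.
identity3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
p_ext = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 800.0],
         [0.0, 0.0, 0.0, 1.0]]
p_int = [[1000.0, 0.0, 512.0, 0.0],
         [0.0, 1000.0, 512.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]

u, v = project_tee_point([10.0, -20.0, 0.0], identity3, [0.0, 0.0, 200.0], p_ext, p_int)
```

In this toy configuration the point ends up 1000 mm from the source, so a 10 mm lateral offset maps to a 10 px offset from the principal point.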


Fig. 3. Overview of tracking framework.

2.1 In-Plane Pose Tracking

To realize tracking, we use a Bayesian inference network [9] as follows:

$$P(M_t \mid Z_t) \propto P(M_t)\,P(Z_t \mid M_t), \qquad (1a)$$

$$\hat{M}_t = \arg\max_{M_t} P(M_t \mid Z_t), \qquad (1b)$$

where $M_t$ denotes the in-plane pose parameters $(u, v, \theta_z, s)$ and $\hat{M}_t$ is the optimal solution using the maximum a posteriori (MAP) probability. $P(Z_t \mid M_t)$ is the likelihood of an in-plane hypothesis being positive. $P(M_t)$ represents the in-plane motion prior probability, which is defined as a joint Gaussian distribution with respect to the parameters $(u, v, \theta_z, s)$ with standard deviations $(\sigma_T, \sigma_T, \sigma_{\theta_z}, \sigma_s)$.

In-plane pose hypotheses are generated using the marginal space learning method, similar to the work in [10]. A series of cascaded classifiers are trained to classify the probe position $(u, v)$, size $s$, and orientation $\theta_z$. These classifiers are trained sequentially: two position detectors for $(u, v)$, an orientation detector for $\theta_z$ and a scale detector for $s$. Each detector is a Probabilistic Boosting Tree (PBT) classifier [8] using Haar-like features [9] and rotated Haar-like features [9]. The first position classifier is trained on the annotations (positive samples) and on negative samples randomly sampled away from the annotations. The second position detector performs a bootstrapping procedure: negative samples are collected from both the false positives of the first position detector and random negative samples. The orientation detector is trained on rotated images, which are rotated to 0° according to the annotated probe orientations. The Haar-like features are computed on the rotated images. During the orientation test stage, the input image is rotated every 5° in the range of $\theta_z^{t-1} \pm \delta_z$. The scale detector is trained on the rotated images. Haar-like features are computed on the rotated images and the Haar feature windows are scaled based on the probe's size. During the scale test stage, the Haar feature window is scaled and quantized in the range of $s_{t-1} \pm \delta_s$.

2.2 Out-of-Plane Pose Tracking

Out-of-plane pose tracking performs another Bayesian inference derived from Eq. 1. In this case, $M_t$ (in Eq. 1) denotes the out-of-plane pose parameters $(\theta_x, \theta_y)$, and $\hat{M}_t$ is the optimal solution using the MAP probability. $P(Z_t \mid M_t)$ is the likelihood of an out-of-plane hypothesis being positive. $P(M_t)$ is the out-of-plane motion prior probability, which is defined as a joint Gaussian distribution with respect to the parameters $(\theta_x, \theta_y)$ with standard deviations $(\sigma_x, \sigma_y)$.

Out-of-plane pose hypothesis generation is based on a K nearest neighbour search using library-based template matching. At the training stage, we generate 2D synthetic X-ray images at different out-of-plane poses while keeping the same in-plane pose. The roll angle ranges from −180° to 180°. The pitch angle ranges from −52° to 52°; angles outside this range are not considered since they are not clinically relevant. Both step sizes are 4°. All in-plane poses of these synthetic images are set to the same canonical space: probe positioned at the image center, 0° yaw angle and normalized size. A global image representation of each image is computed to represent the out-of-plane pose and is saved in a database. The image representation is derived based on the method presented in [4]. At the test stage, $L$ in-plane pose perturbations (small translations, rotations and scales) about the computed in-plane pose (Sect. 2.1) are produced. The $L$ in-plane poses are used to define $L$ probe ROIs in the same canonical space. The image representation of each ROI is computed and used to search the database (e.g. with a KD-tree), resulting in $K$ nearest neighbors.

Unfortunately, the global representation alone cannot differentiate symmetric poses. For example, Fig. 4 shows the response map of an exemplar pose to all the synthetic images. Note that there are two dominant symmetrical modes, and thus out-of-plane hypotheses are generated around these two regions. We utilize the markers (Fig. 2) to address this problem. For each synthetic image, we therefore save the marker positions in the database. The idea is to perform a visibility test at each marker position in the $L \ast K$ searching results. The updated searching score is

$$\hat{T}_{score} = T_{score} \cdot \frac{1}{N}\sum_{i=1}^{N}\bigl(\alpha + P_i(x_i, y_i)\bigr),$$

where $T_{score}$ is the searching score, $P_i$ is the $i$-th marker's visibility (in $[0.0, 1.0]$) at the marker position $(x_i, y_i)$ in the corresponding synthetic image template, $N$ is the number of markers, and $\alpha$ is a constant with value 0.5. The marker visibility test is fulfilled using two marker detectors: a ball marker detector and a hole marker detector. Both detectors are two cascaded position classifiers (PBT classifiers with Haar-like features), and the visibility maps are computed based on the detected marker locations.
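The way the marker visibility term can break the tie between two symmetric template matches is sketched below, combined with a MAP selection in the spirit of Eq. 1. Several points are assumptions made for illustration only: the rescored value is read as the template score times the mean of $(\alpha + P_i)$ over the markers, it is used directly as the likelihood term, the Gaussian prior is left unnormalised, and angle wrap-around is ignored. All names and numbers are hypothetical.

```python
import math

ALPHA = 0.5  # constant value quoted in the text


def rescored(t_score, marker_visibility, alpha=ALPHA):
    """Template score scaled by the mean of (alpha + P_i), with P_i in [0, 1]."""
    n = len(marker_visibility)
    if n == 0:
        return t_score  # no markers stored for this template: keep the raw score
    return t_score * sum(alpha + p for p in marker_visibility) / n


def gaussian_prior(pose, prev_pose, sigmas):
    """Unnormalised joint Gaussian over (roll, pitch) centred on the previous pose."""
    return math.exp(-0.5 * sum(((a - b) / s) ** 2
                               for a, b, s in zip(pose, prev_pose, sigmas)))


def select_out_of_plane(candidates, prev_pose, sigmas):
    """Return the (roll, pitch) hypothesis maximising prior * rescored score."""
    def posterior(c):
        prior = gaussian_prior(c["pose"], prev_pose, sigmas)
        return prior * rescored(c["t_score"], c["visibility"])
    return max(candidates, key=posterior)["pose"]


# Two symmetric hypotheses with identical template scores and equal prior
# support; only the marker visibilities differ (toy values).
candidates = [
    {"pose": (44.0, 0.0), "t_score": 0.9, "visibility": [0.9, 0.8, 1.0]},
    {"pose": (-44.0, 0.0), "t_score": 0.9, "visibility": [0.1, 0.0, 0.2]},
]
best = select_out_of_plane(candidates, prev_pose=(0.0, 0.0), sigmas=(30.0, 30.0))
```

Because $\alpha > 0$, a hypothesis whose markers are all undetected is down-weighted rather than discarded, which keeps the search tolerant of missed marker detections.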

Fig. 4. An example of template matching score map for one probe pose. X-axis is roll angle and Y-axis is pitch angle. Each pixel represents one template pose. Dark red color indicates a high matching score and dark blue indicates a small matching score.

2.3 Tracking Initialization and Failure Detection

The initial probe pose in the sequence is derived from detection results without considering temporal information. We detect the in-plane position, orientation and scale, and the out-of-plane roll and pitch hypotheses in the whole required searching space. We obtain the final in-plane pose via non-maximal suppression and a weighted average around the pose with the largest detection probability. The hypothesis with the largest searching score is used as the out-of-plane pose. To initialize tracking: (1) we save the poses of $N_i$ (e.g. $N_i = 5$) consecutive image frames; (2) a median pose is computed from the $N_i$ detection results; (3) a weighted mean pose is computed based on the distance to the median pose; (4) the standard deviation $\sigma_p$ to the mean pose is computed. Once $\sigma_p < \sigma_{threshold}$, tracking starts with the initial pose (i.e. the mean pose). During tracking, we identify tracking failure as follows: (1) we save $N_f$ (e.g. $N_f = 5$) consecutive tracking results; (2) the average searching score $m_{score}$ is computed. If $m_{score} < m_{threshold}$, we stop tracking and restart the tracking initialization procedure.

2.4 3D-2D Registration Based Pose Refinement

In addition, we perform 3D-2D registration of the probe at critical time points when measurements are to be transformed from TEE images to fluoroscopic images. With the known perspective geometry of the C-arm system, a DRR can be rendered for any given pose parameters. In 3D-2D registration, the pose parameters are iteratively optimized to maximize a similarity metric calculated between the DRR and the fluoroscopic image. In the proposed method, we use Spatially Weighted Gradient Correlation (SWGC) as the similarity metric, where areas around the markers in the DRR are assigned higher weights, as they are more distinct and reliable features indicating the alignment of the two images. SWGC is calculated as the Gradient Correlation (GC) of two weighted images: $SWGC = GC(I_f \cdot W, I_d \cdot W)$, where $I_f$ and $I_d$ denote the fluoroscopic image and the DRR, respectively, $W$ is a dense weight map calculated based on the projection of the markers, and $GC(\cdot, \cdot)$ denotes the GC of the two input images. With SWGC as the similarity metric, the pose parameters are optimized using the Nelder-Mead optimizer.
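A small pure-Python sketch of the SWGC metric follows. The text does not define GC, so this assumes the commonly used definition: the average of the normalised cross-correlations between the horizontal and between the vertical gradient images. Central differences stand in for whatever gradient operator the authors used, and the images and weight map are toy values.

```python
import math


def gradients(img):
    """Central-difference gradients (gx, gy) on the interior pixels of a 2D list."""
    h, w = len(img), len(img[0])
    gx = [[(img[r][c + 1] - img[r][c - 1]) / 2.0 for c in range(1, w - 1)]
          for r in range(1, h - 1)]
    gy = [[(img[r + 1][c] - img[r - 1][c]) / 2.0 for c in range(1, w - 1)]
          for r in range(1, h - 1)]
    return gx, gy


def ncc(a, b):
    """Normalised cross-correlation of two equally sized 2D lists."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    ma, mb = sum(fa) / len(fa), sum(fb) / len(fb)
    num = sum((x - ma) * (y - mb) for x, y in zip(fa, fb))
    den = math.sqrt(sum((x - ma) ** 2 for x in fa) * sum((y - mb) ** 2 for y in fb))
    return num / den if den > 0 else 0.0


def gradient_correlation(a, b):
    """Mean of the NCCs of the horizontal and of the vertical gradient images."""
    ax, ay = gradients(a)
    bx, by = gradients(b)
    return 0.5 * (ncc(ax, bx) + ncc(ay, by))


def swgc(fluoro, drr, weights):
    """SWGC = GC(I_f * W, I_d * W) with a pixel-wise weight map W."""
    def weighted(img):
        return [[p * w for p, w in zip(row, wrow)] for row, wrow in zip(img, weights)]
    return gradient_correlation(weighted(fluoro), weighted(drr))


# Toy 5x5 images: a smooth pattern and its contrast-inverted copy.
image = [[float(r * r + 2 * c * c + r * c) for c in range(5)] for r in range(5)]
inverted = [[-p for p in row] for row in image]
# Weight map emphasising a central 3x3 region, as if markers projected there.
weights = [[2.0 if 1 <= r <= 3 and 1 <= c <= 3 else 1.0 for c in range(5)]
           for r in range(5)]

same = swgc(image, image, weights)
opposite = swgc(image, inverted, weights)
```

An optimiser such as Nelder-Mead would then adjust the pose parameters used to render the DRR so as to maximise this value; that outer loop is omitted here.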

3 Experiment Setup, Results and Discussions

For our study, we trained machine learning based detectors on ~10,000 fluoroscopic images (~90 % synthetically generated images and ~10 % clinical images). We validated our methods on 34 X-ray fluoroscopic videos (1933 images) acquired from clinical experiments, and 13 videos (2232 images) from synthetic generation. The synthetic images were generated by blending DRRs of the TEE probe (including the tube) with real fluoroscopic images containing no TEE probe. For the synthetic test sequences in particular, we simulated realistic probe motions (e.g., insertion, retraction, roll etc.) in the fluoroscopic sequences. Ground truth poses for the synthetic images are derived from the 3D probe geometry and the rendering parameters. Clinical images were manually annotated by 4 experts using an interactive tool that we developed. The image size is 1024 × 1024 pixels. Computations were performed on a workstation with an Intel Xeon (E5-1620) CPU at 3.7 GHz and 8.00 GB of memory. On average, our tracking algorithm runs at 10 fps. We ran the proposed detection algorithm (discussed in Sect. 2.3, with tracking disabled), the proposed automated tracking algorithm, and the registration refinement after tracking on all test images. Algorithm accuracy was evaluated by calculating the standard target registration error (TRE) in 2D. The targets are defined at the four corners of the TEE imaging cone at 60 mm depth, and the reported TRE is the average TRE over the four targets. The 2D TRE is a target registration error in which the z axis (depth) of the projected target point is not considered when computing the distance error. Table 1 shows the success rate, average TRE and median TRE at 2D TRE < 4 mm and < 2.5 mm respectively. Figure 5 shows the success rate vs 2D TRE on all validated clinical and synthetic images.
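The evaluation protocol above can be sketched as follows: the 2D TRE of a frame is the mean in-plane distance over the four target points with depth ignored, and a frame succeeds when its TRE is below the threshold. Whether the reported mean and median are taken over the successful frames only is not stated explicitly; this sketch assumes so. Function names and numbers are hypothetical.

```python
import math
import statistics


def tre_2d(estimated_targets, true_targets):
    """Mean in-plane distance (mm) over the target points; z (depth) is ignored."""
    dists = [math.hypot(e[0] - g[0], e[1] - g[1])
             for e, g in zip(estimated_targets, true_targets)]
    return sum(dists) / len(dists)


def summarise(tre_per_frame, threshold_mm):
    """Success rate, plus mean and median TRE of the frames under the threshold."""
    ok = [t for t in tre_per_frame if t < threshold_mm]
    rate = len(ok) / len(tre_per_frame)
    if not ok:
        return rate, None, None
    return rate, statistics.mean(ok), statistics.median(ok)


# Four corner targets (x, y, z) in mm, and an estimate shifted by
# (0.3, 0.4) mm in-plane and 5 mm in depth (toy values).
true_corners = [(0.0, 0.0, 60.0), (40.0, 0.0, 60.0), (0.0, 40.0, 60.0), (40.0, 40.0, 60.0)]
est_corners = [(x + 0.3, y + 0.4, z + 5.0) for (x, y, z) in true_corners]
frame_tre = tre_2d(est_corners, true_corners)

# Per-frame TRE values for a toy sequence, summarised at both thresholds.
sequence = [0.5, 1.0, 3.0, 5.0]
at_4mm = summarise(sequence, 4.0)
at_2_5mm = summarise(sequence, 2.5)
```

The 5 mm depth offset in the toy estimate does not affect `frame_tre`, which illustrates why a 2D TRE can look small even when the depth is poorly constrained.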

Fig. 5. Result of success rate vs 2D TRE on clinical (a) and synthetic (b) validations of the proposed detection, tracking and 3D-2D registration refinement algorithms.

Due to the limited availability of clinical data, we enlarged our training data set using synthetic images. Table 1 and Fig. 5 show that our approach performs well on real clinical data when trained on hybrid data. We expect increased robustness and accuracy once a larger number of real clinical cases becomes available. The tracking algorithm improved robustness and accuracy compared with the detection-alone approach. One limitation of our tracking algorithm is that it cannot compensate for all discretization errors, although temporal smoothing is applied using a Kalman filter. This is a limitation of any detection based approach. To further enhance accuracy, refinement is applied when physicians perform the measurements. To better understand the performance of the registration refinement, in our study we applied the refinement step on all images after tracking. Note that the refinement algorithm did not bring more robustness but improved the accuracy.

Table 1. Quantitative results on validations of the proposed detection (Det), tracking (Trak) and 3D-2D registration refinement (Reg) algorithms. Each entry shows (success rate, mean TRE in mm, median TRE in mm) under different TRE error ranges.

          Clinical data                                  Synthetic data
Method    TRE < 4 mm             TRE < 2.5 mm            TRE < 4 mm             TRE < 2.5 mm
Det       (80.0 %, 2.09, 2.02)   (50.9 %, 1.47, 1.48)    (88.4 %, 1.86, 1.73)   (64.7 %, 1.38, 1.35)
Trak      (91.6 %, 1.71, 1.61)   (73.7 %, 1.38, 1.36)    (96.4 %, 1.59, 1.42)   (79.7 %, 1.28, 1.22)
Reg       (98.0 %, 0.97, 0.79)   (93.4 %, 0.86, 0.75)    (96.4 %, 0.69, 0.52)   (94.3 %, 0.63, 0.51)

4 Conclusion

In this work, we presented a fully automated method for recovering the 3D pose of the TEE probe from the X-ray image. Tracking is very important to give physicians the confidence that the probe pose recovery is working robustly and continuously. Abrupt failures of probe detection are undesirable, especially when the probe does not move. A detection-alone approach is not able to address abrupt failures due to disturbance, noise and appearance ambiguities of the probe. Our proposed visual tracking algorithm avoids abrupt failures and improves detection robustness, as shown in our experiments. In addition, our approach runs in near real time (about 10 FPS) and is fully automated, without any user interaction such as the manual pose initialization required by many state-of-the-art methods. Our proposed complete solution to the TEE and X-ray fusion problem is applicable to clinical practice due to its high robustness and accuracy.

Disclaimer: The outlined concepts are not commercially available. Due to regulatory reasons their future availability cannot be guaranteed.

References

1. Gao, G., et al.: Rapid image registration of three-dimensional transesophageal echocardiography and X-ray fluoroscopy for the guidance of cardiac interventions. In: Navab, N., Jannin, P. (eds.) IPCAI 2010. LNCS, vol. 6135, pp. 124–134. Springer, Heidelberg (2010)
2. Gao, G., et al.: Registration of 3D transesophageal echocardiography to X-ray fluoroscopy using image-based probe tracking. Med. Image Anal. 16(1), 38–49 (2012)
3. Hatt, C.R., Speidel, M.A., Raval, A.N.: Robust 5DOF transesophageal echo probe tracking at fluoroscopic frame rates. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9349, pp. 290–297. Springer, Heidelberg (2015). doi:10.1007/978-3-319-24553-9 36
4. Hinterstoisser, S., et al.: Gradient response maps for real-time detection of textureless objects. IEEE Trans. Pattern Anal. Mach. Intell. 34(5), 876–888 (2012)
5. Housden, R.J., et al.: Evaluation of a real-time hybrid three-dimensional echo and X-ray imaging system for guidance of cardiac catheterisation procedures. In: Ayache, N., Delingette, H., Golland, P., Mori, K. (eds.) MICCAI 2012. LNCS, vol. 7511, pp. 25–32. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33418-4 4
6. Kaiser, M., et al.: Significant acceleration of 2D–3D registration-based fusion of ultrasound and X-ray images by mesh-based DRR rendering. In: SPIE, p. 867111 (2013)
7. Mountney, P., et al.: Ultrasound and fluoroscopic images fusion by autonomous ultrasound probe detection. In: Ayache, N., Delingette, H., Golland, P., Mori, K. (eds.) MICCAI 2012. LNCS, vol. 7511, pp. 544–551. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33418-4 67
8. Tu, Z.: Probabilistic boosting-tree: learning discriminative models for classification, recognition, and clustering. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1589–1596 (2005)
9. Wang, P., et al.: Image-based co-registration of angiography and intravascular ultrasound images. IEEE TMI 32(12), 2238–2249 (2013)
10. Zheng, Y., et al.: Four-chamber heart modeling and automatic segmentation for 3D cardiac CT volumes using marginal space learning and steerable features. IEEE Trans. Med. Imaging 27(11), 1668–1681 (2008)

Robust, Real-Time, Dense and Deformable 3D Organ Tracking in Laparoscopic Videos Toby Collins(B) , Adrien Bartoli, Nicolas Bourdel, and Michel Canis ALCoV-ISIT, UMR 6284 CNRS/Universit´e d’Auvergne, Clermont-Ferrand, France [email protected]

Abstract. An open problem in computer-assisted surgery is to robustly track soft-tissue 3D organ models in laparoscopic videos in real-time and over long durations. Previous real-time approaches use locally-tracked features such as SIFT or SURF to drive the process, usually with KLT tracking. However this is not robust and breaks down with occlusions, blur, specularities, rapid motion and poor texture. We have developed a fundamentally different framework that can deal with most of the above challenges and in real-time. This works by densely matching tissue texture at the pixel level, without requiring feature detection or matching. It naturally handles texture distortion caused by deformation and/or viewpoint change, does not cause drift, is robust to occlusions from tools and other structures, and handles blurred frames. It also integrates robust boundary contour matching, which provides tracking constraints at the organ’s boundaries. We show that it can track over long durations and can handles challenging cases that were previously unsolvable.

1 Introduction and Background

There is much ongoing research to develop and apply Augmented Reality (AR) to improve laparoscopic surgery. One important goal is to visualise hidden subsurface structures such as tumors or major vessels by augmenting optical images from a laparoscope with 3D radiological data from e.g. MRI or CT. Solutions are currently being developed to assist various procedures including liver tumor resection [6], myomectomy [3] and partial nephrectomy [9]. To solve the problem one must register the data modalities. The general strategy is to build a deformable 3D organ model from the radiological data, then to determine the model’s 3D transformation to the laparoscope’s coordinate system at any given time. This is very challenging, and a general, automatic, robust and real-time solution does not yet exist. The problem is especially hard with monocular laparoscopes because of the lack of depth information. A crucial missing component is a way to robustly compute dense matches between the organ’s surface and the laparoscopic images. Currently, real-time results have only been achieved with sparse feature-based matches using KLT [5,10]; however, this is quite fragile, suffers from drift, and can quickly break down for a number of reasons including occlusions, sudden camera motion, motion blur and optical blur.

© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 404–412, 2016. DOI: 10.1007/978-3-319-46720-7_47


To reliably solve the problem a much more advanced, integrated framework is required, which is the focus of this paper. Our framework is fundamentally a template-driven approach which works by matching each image directly to a deformable 3D template, which in our case is a textured 3D biomechanical model of the organ. The model’s intrinsic, physical constraints are fully integrated, which allows a high level of robustness. This differs from registration using KLT tracks, where tracks are made by independently tracking points frame-to-frame without being constrained by the model. This causes a lack of robustness and drift, where over time the tracked points no longer correspond to the same physical point. We propose to solve this by densely and robustly matching the organ’s texture at the pixel level, which is designed to overcome several fundamental limitations of feature-based matching. Specifically, feature-based matches exist only at sparse, discriminative, repeatable feature points (or interest points), and for tissues with weak and/or repetitive texture it can be difficult to detect and match enough features to recover the deformation. This is especially true with blurred frames, lens smears, significant illumination changes, and distortions caused by deformations or viewpoint change. By contrast, we match the organ’s texture densely, without requiring any feature detection or feature matching, and in a way that naturally handles texture distortions and illumination change.

2 Methodology

We now present the framework, which we refer to as Robust, Real-time, Dense and Deformable (R2D2) tracking. Figure 1 gives an overview of R2D2 tracking using an in-vivo porcine kidney experiment as an example.

Model Requirements. We require three main models. The first is a geometric model of the organ’s outer surface, which we assume is represented by a closed surface mesh $S$. We denote its interior by $\Omega \subset \mathbb{R}^3$. The second is a deformation model, which has a transform function $f(p; x_t) : \Omega \to \mathbb{R}^3$ that transforms a 3D

Fig. 1. Overview of R2D2 tracking with monocular laparoscopes. Top row: modelling the organ’s texture by texture-mapping it from a set of reference laparoscopic images. Bottom row: real-time tracking of the textured model.


point $p \in \Omega$ to the laparoscope’s coordinate frame at time $t$. The vector $x_t$ denotes the model’s parameters at time $t$, and our task is to recover it. We also require the deformation model to have an internal energy function $E_{\text{internal}}$, which gives the associated energy for transforming the organ according to $x_t$. We use $E_{\text{internal}}$ to regularise the tracking problem. In the presented experiments the deformation models used are tetrahedral finite element models, generated by a regular 3D vertex grid cropped to the organ’s surface mesh (sometimes called a cage), and we compute $f$ with trilinear interpolation. Thus $x_t$ denotes the unknown 3D positions of the grid vertices in the laparoscope’s coordinate frame. For $E_{\text{internal}}$ we have used the isotropic Saint Venant-Kirchhoff (StVK) strain energy, which has been shown to work well for reconstructing deformations from 2D images [5]. Further modelling details are given in the experimental section. The third model that we require is a texture model, which models the photometric appearance of $S$. Unlike feature-based tracking, where the texture model is essentially a collection of 2D features, we will be densely tracking its texture, and so we require a dense texture model. We do this with a texture-map, which is a common model used in computer graphics. Specifically, our texture-map is a 2D colour image $T(u, v) : \mathbb{R}^2 \to [0, 255]^3$ which models the surface appearance up to changes of illumination.

Texture-Map Construction. Before tracking begins we construct $T$ through a process known as image-based texture-mapping. This requires taking laparoscopic images of the organ from several viewpoints (we call these reference images). The reference images are then used to generate $T$ through an image mosaicing process. To do this we must align $S$ to the reference images. Once done, $T$ can be constructed automatically using an existing method (we currently use the method of Agisoft Photoscan [1], with a default mosaic resolution of 4096 × 4096 pixels). The difficult part is computing the alignments. Note that this is done just once, so it does not need to be real-time. We do this using an existing semi-automatic approach based on [3], which assumes the organ does not deform when the reference images are taken. This requires a minimum of two reference images; however, more can be used to build a more complete texture model (in our experiments we use between 4 and 8 reference images), taking approximately three minutes to compute with non-optimised code.

Tracking Overview. Our solution builds on a new technique called Deformable Render-based Block Matching (DRBM) [2], which was originally proposed to track thin-shell objects such as cloth and plastic bottles, yet has great potential for our problem. It works by densely matching each image $I_t$ to a time-varying 2D photometric render $R_t$ of the deforming object. The render is generated from the camera’s viewpoint and is continuously updated to reflect the current deformation. Matching is performed by dividing $R_t$ into local pixel windows, then each window is matched to $I_t$ with an illumination-invariant score function and a fast coarse-to-fine search process. At a final stage most incorrect matches, caused by e.g. occlusions or specularities, are detected and eliminated using several consistency tests. The remaining matches are used as deformation constraints, which


are combined with the model’s internal energy, and $x_t$ is then solved by energy minimisation. Once completed, the new solution is used to update the render, the next image is acquired and the process repeats. Because this process tracks the model frame-to-frame, a mechanism is needed for initialisation (to provide an initial estimate of $x_t$ at the start) and re-initialisation (to provide an initial estimate if tracking fails). We discuss these mechanisms below. We use DRBM as a basis and extend it to our problem. Firstly, DRBM requires at least some texture variation to be present; however, tissue can be quite textureless in some regions. To deal with this, additional constraints are needed. One kind that has rarely been exploited before is organ boundary constraints. Specifically, if the organ’s boundary is visible (either partially or fully) it can be used as a tracking constraint. Organ boundaries have been used previously to semi-automatically register pre-operative models [3], but not for automatic real-time tracking. This is non-trivial because one does not know a priori which points correspond to the organ’s boundary. Secondly, we extend it to volumetric biomechanical deformable models, and thirdly we introduce semi-automatic texture map updating, which allows strong changes of the organ’s appearance, due to e.g. coagulation, to be handled.

Overview and Energy-Based Formulation. To ease readability we now drop the time index. During tracking, texture matches are found using DRBM, which outputs a quasi-dense set of texture matches $C_{\text{texture}} \overset{\text{def}}{=} \{(p_1, q_1), \ldots, (p_N, q_N)\}$ between 3D points $p_i \in \mathbb{R}^3$ on the surface mesh $S$ and points $q_i \in \mathbb{R}^2$ in the image. We also compute a dense set of boundary matches $C_{\text{bound}} \overset{\text{def}}{=} \{(\tilde{p}_1, \tilde{q}_1), \ldots, (\tilde{p}_M, \tilde{q}_M)\}$ along the model’s boundary, as described below. Note that this set can be empty if none of its boundaries are visible. The boundary matches work in an Iterative Closest Point (ICP) sense, where over time the boundary correspondences slide over the surface as it deforms. Our energy function $E(x) \in \mathbb{R}^+$ encodes tracking cues from the image ($C_{\text{texture}}$, $C_{\text{bound}}$) and the model’s internal deformation energy, and has the following form:

$$E(x) = E_{\text{match}}(x; C_{\text{texture}}) + \lambda_{\text{bound}} E_{\text{match}}(x; C_{\text{bound}}) + \lambda_{\text{internal}} E_{\text{internal}}(x) \tag{1}$$

The term $E_{\text{match}}$ is a point-match energy, which generates the energy for both texture and boundary matches. This is defined as follows:

$$E_{\text{match}}(x; C) \overset{\text{def}}{=} \sum_{(p_i, q_i) \in C} \rho\left(\lVert \pi(f(p_i; x)) - q_i \rVert_2\right) \tag{2}$$

where $\pi : \mathbb{R}^3 \to \mathbb{R}^2$ is the camera’s projection function. We assume the laparoscope is intrinsically calibrated, which means $\pi$ is known. The function $\rho : \mathbb{R} \to \mathbb{R}^+$ is an M-estimator and is crucial to achieve robust tracking. Its purpose is to align the model point $p_i$ with the image point $q_i$, but to do so robustly to account for erroneous matches, which are practically unavoidable.
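The point-match energy of Eq. (2) can be illustrated with a minimal sketch. It uses the pseudo-L1 form of ρ that the text adopts, and compares it with a plain squared loss to show how a single gross outlier affects each. The pinhole projection `project`, its focal length and principal point, and the sample points are all hypothetical stand-ins chosen for illustration; the 3D points are assumed to have already been transformed by f.

```python
import math

EPS = 1e-3  # small constant quoted in the text for the pseudo-L1 estimator


def pseudo_l1(r, eps=EPS):
    """Pseudo-L1 M-estimator: sqrt(r^2 + eps), differentiable at r = 0."""
    return math.sqrt(r * r + eps)


def project(point3d, focal=1000.0, centre=(320.0, 240.0)):
    """Hypothetical pinhole camera standing in for pi (intrinsics are assumed values)."""
    x, y, z = point3d
    return (focal * x / z + centre[0], focal * y / z + centre[1])


def match_energy(points3d, points2d, rho=pseudo_l1):
    """Sum of rho(||pi(p_i) - q_i||_2) over all matches (p_i already transformed by f)."""
    total = 0.0
    for p, q in zip(points3d, points2d):
        u, v = project(p)
        total += rho(math.hypot(u - q[0], v - q[1]))
    return total


def squared_energy(points3d, points2d):
    """Non-robust least-squares counterpart, shown only for comparison."""
    return match_energy(points3d, points2d, rho=lambda r: r * r)


# Two perfect matches and one gross outlier whose reprojection is 100 pixels off.
pts3d = [(0.0, 0.0, 100.0), (10.0, 0.0, 100.0), (0.0, 10.0, 100.0)]
pts2d = [(320.0, 240.0), (420.0, 240.0), (320.0, 440.0)]

print("pseudo-L1 energy:", match_energy(pts3d, pts2d))
print("squared energy:  ", squared_energy(pts3d, pts2d))
```

The outlier contributes roughly its residual (about 100) to the pseudo-L1 energy but its squared residual (10000) to the least-squares energy, which is why the robust estimator limits the pull of bad matches on the solution.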


When a match is erroneous the model should not align the match, and the M-estimator provides this by reducing the influence of an erroneous match on $E$. We have tested various M-estimators and found that good results are obtained with the pseudo-L1 function $\rho(x) \overset{\text{def}}{=} \sqrt{x^2 + \epsilon}$, with $\epsilon = 10^{-3}$ being a small constant that makes $E_{\text{match}}$ differentiable everywhere. The terms $\lambda_{\text{bound}}$ and $\lambda_{\text{internal}}$ are influence weights, and we discuss how they have been set in the experimental section. We follow the same procedure to minimise $E$ as described in [2]. This is done by linearising $E$ about the current estimate (which is the solution from the previous frame), then forming the associated linear system and solving its normal equations using a coarse-to-fine multi-grid Gauss-Newton optimisation with backtracking line-search.

Computing Boundary Matches. We illustrate this process in Fig. 2(k). First we take $R$ and extract all pixels $P$ on the render’s boundary. For each pixel $p_i \in P$ we denote its 3D position on the model by $\tilde{p}_i$, which is determined from the render’s depthmap. We then conduct a 1D search in $I$ for a putative match $\tilde{q}_i$. The search is centred at $p_i$ in the direction orthogonal to the render’s boundary, which we denote by the unit vector $v_i$. We search within a range $[-l, +l]$ in one pixel increments, where $l$ is a free parameter, and measure the likelihood $b(p) \in \mathbb{R}$ that a sample $p$ corresponds to the organ’s boundary. We currently compute $b$ with a hand-crafted detector, based on the fact that organ boundaries tend to occur at low-frequency intensity gradients, which correspond to a change of predominant tissue albedo. We give the precise algorithm for computing $b$ in the supplementary material. We take $\tilde{q}_i$ as the sample with the maximal $b$ beyond a detection threshold $b_\tau$. If no such sample exists then we do not have a boundary match. An important stage is then to eliminate false positives because there may be other nearby boundary structures that could cause confusion.
For this we adopt a conservative strategy and reject the match if there exists another local maximum of $b$ along the search line that also exceeds $b_\tau$.

Initialisation, Re-localisation and Texture Model Updating. There are various approaches one can use for initialisation and re-localisation. One is with an automatic wide-baseline pose estimation method such as [7]. An alternative is to have the laparoscope operator provide them, by roughly aligning the live video with an overlaid render of the organ from some canonical viewpoint (Figs. 1 and 2(a)), after which tracking is activated. The alignment does not need to be particularly precise due to the robustness of our match terms, which makes it a practical option. For the default viewpoint we use the model’s pose in one of the reference images from the texture-map construction stage. The exact choice is not too important so we simply use the one where the model centroid is closest to the image centre. During tracking, we have the option to update the texture model by re-texturing its front-facing surface regions with the current image. This is useful where the texture changes substantially during surgery. Currently this is semi-automatic to ensure the organ is not being occluded by tools or other organs in the current image, and is activated by a user notification. In future work we aim to make this automatic, but this is non-trivial.
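The 1D boundary search described under Computing Boundary Matches can be sketched as follows. This is a minimal illustration, not the authors’ implementation: the boundary likelihood b comes from a detector given only in the supplementary material, so here it is passed in as a precomputed list of samples. The function names are hypothetical, and treating a competing boundary structure as a second above-threshold peak of b is an interpretation of the rejection rule.

```python
def find_boundary_offset(b_samples, threshold):
    """Return the signed pixel offset of the boundary match, or None.

    b_samples holds b at offsets -l, ..., +l (length 2l + 1) along the
    search direction. A match is accepted only if exactly one peak of b
    exceeds the threshold; zero peaks means no match, and two or more
    peaks means the match is ambiguous and is conservatively rejected.
    """
    half_range = (len(b_samples) - 1) // 2
    peaks = []
    for k, b in enumerate(b_samples):
        left = b_samples[k - 1] if k > 0 else float("-inf")
        right = b_samples[k + 1] if k + 1 < len(b_samples) else float("-inf")
        # Strict on the left, non-strict on the right: a flat plateau counts once.
        if b > threshold and b > left and b >= right:
            peaks.append(k)
    if len(peaks) != 1:
        return None
    return peaks[0] - half_range


def boundary_match(pixel, normal, b_samples, threshold):
    """Map the accepted offset back to an image point q = p + offset * v."""
    offset = find_boundary_offset(b_samples, threshold)
    if offset is None:
        return None
    return (pixel[0] + offset * normal[0], pixel[1] + offset * normal[1])


# Search range l = 3, so seven samples per boundary pixel.
clear = [0.1, 0.2, 0.9, 0.3, 0.1, 0.2, 0.1]      # one strong peak at offset -1
ambiguous = [0.1, 0.8, 0.2, 0.1, 0.2, 0.9, 0.1]  # two peaks above threshold
weak = [0.1] * 7                                  # nothing above threshold

print(boundary_match((100.0, 50.0), (1.0, 0.0), clear, 0.5))
print(boundary_match((100.0, 50.0), (1.0, 0.0), ambiguous, 0.5))
print(boundary_match((100.0, 50.0), (1.0, 0.0), weak, 0.5))
```

Rejecting ambiguous lines trades match density for precision, which fits the conservative strategy in the text: a missing boundary constraint costs less than a wrong one, since the robust energy still has the texture matches to rely on.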


Fig. 2. Visualisations of the five test cases and tracking results. Best viewed in colour.

3 Experimental Results

We evaluate performance with five test cases, which are visualised in Fig. 2 as five columns. These are two in-vivo porcine kidneys (a,b), an in-vivo human uterus (c), an ex-vivo chicken thigh used for laparoscopy training (d) and an ex-vivo porcine kidney (e). We used the same kidney in cases (a) and (e). The models were constructed from CT (a,b,d,e) and T2-weighted MRI (c), and segmented interactively with MITK. For each case we recorded a monocular laparoscopic video (10 mm Karl Storz 1080p, 25 fps with CLARA image enhancement) of the object being moved and deformed with surgical tools (a,b,c,d) or with human hands (e). The video durations ranged from 1424 to 2166 frames (57 to 82 s). The objects never moved completely out-of-frame in the videos, so we used them to test tracking performance without re-localisation. The main challenges present are low light and high noise (c), strong motion blur (b,c), significant texture change caused by intervention (a,c), tool occlusions (a,b,c,d), specularities (a,b,c,d,e), dehydration (b), smoke (c), and partial occlusion where the organ disappears behind the peritoneum (b,c). We constructed deformable models with a 6 mm grid spacing, with the number of respective tetrahedral elements for (a–e) being 1591, 1757, 8618, 10028 and 1591. Homogeneous StVK elements were used for (a,b,c,e) using rough generic Poisson’s ratio $\nu$ values from the literature. These were $\nu = 0.43$ for (a,b,e) [4] and $\nu = 0.45$ for (c). Note that when we use homogeneous elements, the Young’s modulus $E$ is not actually a useful parameter for us. This is because if we double $E$ and halve $\lambda_{\text{internal}}$ we end up with the same internal energy. We therefore arbitrarily set $E = 1$ for (a,b,c,e). For (d) we


used two coarse element classes corresponding to bone and all other tissue, and we set their Young’s moduli using relative values of 200 and 1 respectively.

Our tracking framework has several tunable parameters, which are (i) the energy weights, (ii) the boundary search length $l$, (iii) the boundary detector parameters and (iv) the DRBM parameters. To make them independent of the image resolution, we pre-scale the images to a canonical width of 640 pixels. For all five cases we used the same values of (iii) and (iv) (their respective defaults), and the same value for (ii) of $l = 15$ pixels. For (i), we used the same value of $\lambda_{\text{bound}} = 0.7$ in all cases. For $\lambda_{\text{internal}}$ we used category-specific values, which were $\lambda_{\text{internal}} = 0.2$ for the uterus, $\lambda_{\text{internal}} = 0.09$ for kidneys and $\lambda_{\text{internal}} = 0.2$ for the chicken thigh. In the interest of space, the results presented here do not use texture model updating. This is to evaluate tracking robustness despite significant appearance change. We refer the reader to the associated videos to see texture model updating in action.

We benchmarked processing speed on a mid-range Intel i7-5960X desktop PC with a single NVidia GTX 980Ti GPU. With our current multi-threaded C++/CUDA implementation the average processing speeds were 35, 27, 22, 17 and 31 fps for cases (a–e) respectively. We also ran our framework without the boundary constraints ($\lambda_{\text{bound}} = 0$). This was to analyse their influence on tracking accuracy, and we call this version R2D2-b.

We show snapshot results from the videos in Fig. 2. In Fig. 2(f–j) we show five columns corresponding to each case. The top image is an example input image, the middle image shows DRBM matches (with coarse-scale matches in green, fine-scale matches in blue, gross outliers in red) and the boundary matches in yellow. The third image shows an overlay of the tracked surface mesh. We show three other images with corresponding overlays in Fig. 2(l–n). The light path on the uterus in Fig. 2(h) is a coagulation path used for interventional incision planning, and it significantly changed the appearance. The haze in Fig. 2(m) is a smoke plume. In Fig. 2(o) we show the overlay with and without boundary constraints (top and bottom respectively). This is an example where the boundary constraints have clearly improved tracking.

We tested how well KLT-based tracking worked by measuring how long it could sustain tracks from the first video frames. Due to the challenging conditions, KLT tracks dropped off quickly in most cases, mostly due to blur or tool occlusions. Only in case (b) did some KLT tracks persist to the end; however, they were limited to a small surface region and congregated around specularities (and therefore were drifting). By contrast, our framework sustained tracking through all videos.

It is difficult to quantitatively evaluate tracking accuracy in 3D without interventional radiological images, which were not available. We therefore measured accuracy using 2D proxies. These were (i) Correspondence Prediction Error (CPE) and (ii) Boundary Prediction Error (BPE). CPE tells us how well the tracker aligns the model with respect to a set of manually located point correspondences. We found approximately 20 per case, and located them in 30 representative video frames. We then measured the distance (in pixels) to their tracked positions. BPE tells us how well the tracker aligns the model’s boundaries to the image. This was done by manually marking any contours in


Table 1. Summary statistics of the quantitative performance evaluation (in pixels). Errors are computed using a default image width of 640 pixels.

the representative images that corresponded to the object’s boundary. We then measured the distance (in pixels) between each contour point and the model’s boundary. The results are shown in Table 1, where we give summary statistics (median, inter-quartile range, median, standard deviation and maximum). The table also includes results from R2D2-b. To show the benefits of tracking with a deformable model, we also compare with a fast feature-based baseline method using a rigid transform model. For this we used SIFT matching with HMA outlier detection [8] (using the authors’ implementation) and rigid pose estimation using OpenCV’s PnP implementation. We denote this by R-HMA. Its performance is clearly worse, both because it cannot model deformation and because HMA was sometimes unable to find any correct feature clusters, most notably in (c) due to poor texture, blur and appearance changes.
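The two 2D accuracy proxies can be sketched in a few lines. This is a minimal illustration under stated assumptions: the model boundary is approximated by a list of sampled 2D points (so BPE is a nearest-sample distance), the standard deviation is the sample standard deviation, and all function names and sample coordinates are hypothetical. Points are first rescaled to the canonical 640-pixel image width that the text uses for reporting errors.

```python
import math
import statistics


def rescale(points, image_width, canonical_width=640):
    """Scale pixel coordinates so errors are reported at the canonical image width."""
    s = canonical_width / image_width
    return [(x * s, y * s) for x, y in points]


def cpe(manual_points, tracked_points):
    """Correspondence Prediction Error: distance from each manual point to its tracked position."""
    return [math.dist(m, t) for m, t in zip(manual_points, tracked_points)]


def bpe(contour_points, model_boundary_points):
    """Boundary Prediction Error: distance from each marked contour point to the
    nearest sampled point on the model's projected boundary (sampling is an assumption)."""
    return [min(math.dist(c, b) for b in model_boundary_points) for c in contour_points]


def summarise(errors):
    """Summary statistics of a list of errors (needs at least two values)."""
    q1, _, q3 = statistics.quantiles(errors, n=4)
    return {
        "median": statistics.median(errors),
        "iqr": q3 - q1,
        "std": statistics.stdev(errors),
        "max": max(errors),
    }


# Hypothetical annotations in a 1920-pixel-wide frame.
manual = rescale([(0.0, 0.0), (9.0, 12.0)], image_width=1920)
tracked = rescale([(0.0, 0.0), (0.0, 0.0)], image_width=1920)
print(cpe(manual, tracked))
print(bpe([(0.0, 1.0)], [(0.0, 0.0), (10.0, 0.0)]))
print(summarise([1.0, 2.0, 3.0, 4.0, 5.0]))
```

Rescaling before measuring keeps errors comparable across videos of different resolution, which matches the table caption’s note that errors are computed at a default width of 640 pixels.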

4 Conclusion

We have presented a new, integrated, robust and real-time solution for dense tracking of deformable 3D soft-tissue organ models in laparoscopic videos. There are a number of possible future directions. The main three are to investigate automatic texture map updating, to investigate its performance using stereo laparoscopic images, and to automatically detect when tracking fails.

References
1. Agisoft Photoscan. http://www.agisoft.com. Accessed 30 May 2016
2. Collins, T., Bartoli, A.: Realtime shape-from-template: system and applications. In: ISMAR (2015)
3. Collins, T., Pizarro, D., Bartoli, A., Canis, M., Bourdel, N.: Computer-assisted laparoscopic myomectomy by augmenting the uterus with pre-operative MRI data. In: ISMAR (2014)


4. Egorov, V., Tsyuryupa, S., Kanilo, S., Kogit, M., Sarvazyan, A.: Soft tissue elastometer. Med. Eng. Phys. 30(2), 206–212 (2008)
5. Haouchine, N., Dequidt, J., Berger, M., Cotin, S.: Monocular 3D reconstruction and augmentation of elastic surfaces with self-occlusion handling. IEEE Trans. Vis. Comput. Graph. 21(12), 1363–1376 (2015)
6. Haouchine, N., Dequidt, J., Peterlik, I., Kerrien, E., Berger, M.-O., Cotin, S.: Image-guided simulation of heterogeneous tissue deformation for augmented reality during hepatic surgery. In: ISMAR (2013)
7. Puerto-Souza, G., Cadeddu, J.A., Mariottini, G.: Toward long-term and accurate augmented-reality for monocular endoscopic videos. Bio. Eng. 61(10), 2609–2620 (2014)
8. Puerto-Souza, G., Mariottini, G.: A fast and accurate feature-matching algorithm for minimally-invasive endoscopic images. TMI 32(7), 1201–1214 (2013)
9. Su, L.-M., Vagvolgyi, B.P., Agarwal, R., Reiley, C.E., Taylor, R.H., Hager, G.D.: Augmented reality during robot-assisted laparoscopic partial nephrectomy: toward real-time 3D-CT to stereoscopic video registration. Urology 73, 896–900 (2009)
10. Tomasi, C., Kanade, T.: Detection and tracking of point features. Technical report CMU-CS-91-132 (1991)

Structure-Aware Rank-1 Tensor Approximation for Curvilinear Structure Tracking Using Learned Hierarchical Features

Peng Chu1, Yu Pang1, Erkang Cheng1, Ying Zhu2, Yefeng Zheng3, and Haibin Ling1

1 Computer and Information Sciences Department, Temple University, Philadelphia, USA
[email protected]
2 Electrical and Computer Engineering Department, Temple University, Philadelphia, USA
3 Medical Imaging Technologies, Siemens Healthcare, Princeton, USA

Abstract. Tracking of curvilinear structures (CS), such as vessels and catheters, in X-ray images has become increasingly important in recent interventional applications. However, CS is often barely visible in low-dose X-ray due to the overlay of multiple 3D objects in a 2D projection, making robust and accurate tracking of CS very difficult. To address this challenge, we propose a new tracking method that encodes the structure prior of CS in the rank-1 tensor approximation tracking framework, and that also uses hierarchical features learned via a convolutional neural network (CNN). The three components, i.e., curvilinear prior modeling, high-order information encoding and automatic feature learning, together enable our algorithm to reduce the ambiguity arising from the complex background, and consequently improve the tracking robustness. Our proposed approach is tested on two sets of X-ray fluoroscopic sequences including vascular structures and catheters, respectively. In the tests our approach achieves a mean tracking error of 1.1 pixels for vascular structure and 0.8 pixels for catheter tracking, significantly outperforming state-of-the-art solutions on both datasets.

1 Introduction

Reliable tracking of vascular structures or intravascular devices in dynamic X-ray images is essential for guidance during interventional procedures and post-procedural analysis [1–3,8,13,14]. However, poor tissue contrast due to low radiation dose and the lack of depth information make detecting and tracking these curvilinear structures (CS) challenging. Traditional registration and alignment-based trackers depend on local image intensity or gradient. Without high-level context information, they cannot efficiently discriminate a low-contrast target structure from a complex background. On the other hand, confounding irrelevant structures bring challenges to detection-based tracking. Recently, a new solution was proposed that exploits the progress in multi-target

© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 413–421, 2016. DOI: 10.1007/978-3-319-46720-7_48


tracking [2]. After initially detecting candidate points on a CS, the idea is to model CS tracking as a multi-dimensional assignment (MDA) problem, and then to apply a tensor approximation to search for a solution. The idea encodes high-order temporal information and hence gains robustness against local ambiguity. However, it lacks a mechanism to encode the structure prior in CS, and the features used in [2] via random forests lack discrimination power.

Fig. 1. Overview of the proposed method. (Flowchart; recoverable block labels: detect using hierarchical feature, build links, unfolding, rank-1 tensor approximation, spatial interaction, likelihood neighbor on model, model likelihood.)

In this paper, we present a new method (refer to Fig. 1 for the flowchart) to detect and track CS in dynamic X-ray sequences. First, a convolutional neural network (CNN) is used to detect candidate landmarks on CS. A CNN automatically learns hierarchical representations of input images [6,7] and has recently been used in medical image analysis (e.g. [9,10]). With the detected CS candidates, CS tracking is converted to a multiple target tracking problem and then to a multi-dimensional assignment (MDA) one. In MDA, candidates are associated along motion trajectories across time, while the association is constructed according to the trajectory affinity. It has been shown in [11] that MDA can be efficiently solved via rank-1 tensor approximation (R1TA), in which the goal is to seek vectors that maximize the “joint projection” of an affinity tensor. Following a similar procedure, our solution adopts R1TA to estimate the CS motion. Specifically, a high-order tensor is first constructed from all trajectory candidates over a time span. Then, the model prior of CS is integrated into R1TA, encoding the spatial interaction between adjacent candidates in the model. Finally, CS tracking results are inferred from the model likelihood. The main contribution of our work is two-fold. (1) We propose a structure-aware tensor approximation framework for CS tracking by considering the spatial interaction between CS components. The combination of such spatial interaction and higher-order temporal information effectively reduces association ambiguity and hence improves the tracking robustness. (2) We design a discriminative CNN detector for CS candidate detection. Compared with traditional hand-crafted features, the learned CNN features show very high detection quality in identifying CS from low-visibility dynamic X-ray images. As a result, it greatly reduces the number of hypothesis trajectories and improves the tracking efficiency.
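To make the notion of rank-1 tensor approximation concrete, the sketch below runs the generic higher-order power iteration on a small dense 3-way tensor, seeking unit vectors u, v, w that maximize the joint projection Σ T[i][j][k] u[i] v[j] w[k]. This is not the authors’ algorithm: it uses the common ℓ2 normalization and no assignment constraints or structure prior, and the toy tensor and function names are assumptions made for illustration.

```python
import math


def normalise(vec):
    """Scale a vector to unit l2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]


def rank1_power_iteration(T, iters=50):
    """Rank-1 approximation T ~ lam * (u outer v outer w) by alternating power updates."""
    I, J, K = len(T), len(T[0]), len(T[0][0])
    u = normalise([1.0] * I)
    v = normalise([1.0] * J)
    w = normalise([1.0] * K)
    for _ in range(iters):
        # Each update contracts the tensor with the other two vectors, then renormalises.
        u = normalise([sum(T[i][j][k] * v[j] * w[k] for j in range(J) for k in range(K))
                       for i in range(I)])
        v = normalise([sum(T[i][j][k] * u[i] * w[k] for i in range(I) for k in range(K))
                       for j in range(J)])
        w = normalise([sum(T[i][j][k] * u[i] * v[j] for i in range(I) for j in range(J))
                       for k in range(K)])
    # The joint projection of T onto (u, v, w) is the scale factor of the approximation.
    lam = sum(T[i][j][k] * u[i] * v[j] * w[k]
              for i in range(I) for j in range(J) for k in range(K))
    return lam, u, v, w


# Toy tensor that is exactly rank-1: 2 * a (outer) b (outer) c.
a, b, c = [0.6, 0.8], [1.0, 0.0], [0.0, 1.0]
T = [[[2.0 * a[i] * b[j] * c[k] for k in range(2)] for j in range(2)] for i in range(2)]

lam, u, v, w = rank1_power_iteration(T)
print(lam, u, v, w)
```

In the tracking setting the tensor entries would be trajectory affinities and the vectors relaxed assignment variables, which is why the unit-norm constraint used here has to be replaced by a normalization suited to assignments.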


For evaluation, our method is tested on two sets of X-ray fluoroscopic sequences including vascular structures and catheters, respectively. Our approach achieves a mean tracking error of 1.1 pixels on the vascular dataset and 0.8 pixels on the catheter dataset. Both results are clearly better than other state-of-the-art solutions in comparison.

2 Candidate Detection with Hierarchical Features

Detecting CS in low-visibility dynamic X-ray images is challenging. Without color and depth information, CS shares great similarity with other anatomical structures or imaging noise. To address these problems, a four-layer CNN (Fig. 2) is designed to automatically learn hierarchical features for CS candidate detection. We employ 32 filters of size 5 × 5 in the first convolution stage, and 64 filters of the same size in the second stage. Max-pooling layers with a receptive window of 2 × 2 pixels are employed to down-sample the feature maps. Finally, two fully-connected layers are used as the classifier. Dropout is employed to reduce overfitting. The CNN framework used in our experiments is based on MatConvNet [12].
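The feature-extraction stages described above can be traced with a short shape calculation. Only the filter counts (32 and 64), the 5 × 5 filter size and the 2 × 2 pooling window come from the text. The 28 × 28 input patch, single-channel input, unpadded stride-1 convolutions, stride-2 pooling and one bias per filter are all assumptions made to keep the sketch concrete, and the helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial size after a convolution."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, window, stride=None):
    """Spatial size after max-pooling (stride defaults to the window size)."""
    stride = stride or window
    return (size - window) // stride + 1


def conv_params(in_channels, out_channels, kernel, bias=True):
    """Number of learnable parameters in a convolution layer."""
    return out_channels * (in_channels * kernel * kernel + (1 if bias else 0))


def feature_map_sizes(patch):
    """Spatial sizes after conv1, pool1, conv2 and pool2."""
    s1 = conv_out(patch, 5)
    p1 = pool_out(s1, 2)
    s2 = conv_out(p1, 5)
    p2 = pool_out(s2, 2)
    return [s1, p1, s2, p2]


PATCH = 28  # hypothetical patch size; the paper's value is not recoverable here

sizes = feature_map_sizes(PATCH)
flattened = 64 * sizes[-1] * sizes[-1]  # input length of the first fully-connected layer

print("feature map sizes:", sizes)
print("flattened features:", flattened)
print("conv1 parameters:", conv_params(1, 32, 5))
print("conv2 parameters:", conv_params(32, 64, 5))
```

Under these assumptions most of the convolutional parameters sit in the second stage, because each of its 64 filters spans all 32 channels produced by the first stage.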

Fig. 2. The CNN architecture for CS candidate detection. (Diagram; recoverable labels: input image, image patches, feature extraction in Stage 1 and Stage 2, each a convolutional layer followed by max-pooling, then a fully connected classifier producing the probability map.)

For each image in the sequence except the first, whose ground truth is annotated manually, a CS probability map is computed by the learned classifier. A threshold is set to eliminate most of the false alarms in the image. The resulting images are further processed by filtering and thinning. Typically, the binarized probability map is filtered by a distance mask in which locations too far from the model are excluded. Instead of using a ground truth bounding box, we take the tracking results from previous image batches. Based on the previously tracked model, we calculate the speed and acceleration of the target to predict its position in the next image batch. Finally, after removing isolated pixels, CS candidates are generated from the thinning results. Examples of detection results are shown in Fig. 3. For comparison, probability maps obtained by a random forests classifier with hand-crafted features [2] are also shown. Our probability maps contain fewer false alarms, which gives more accurate candidate locations after postprocessing.
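The post-processing chain above (thresholding, motion prediction and distance masking) can be sketched as follows. The text does not give the exact prediction formula, so the constant-acceleration extrapolation from the three most recent tracked positions is an assumption, as are the function names, the threshold and the distance limit. Thinning and isolated-pixel removal are omitted.

```python
import math


def detect_candidates(prob_map, threshold):
    """Return (x, y) pixels whose CS probability exceeds the threshold."""
    return [(x, y)
            for y, row in enumerate(prob_map)
            for x, p in enumerate(row)
            if p > threshold]


def predict_next(p_older, p_prev, p_curr):
    """Constant-acceleration extrapolation using finite differences.

    velocity = p_curr - p_prev, acceleration = velocity - (p_prev - p_older),
    prediction = p_curr + velocity + acceleration = 3*p_curr - 3*p_prev + p_older.
    """
    return tuple(3 * c - 3 * b + a for a, b, c in zip(p_older, p_prev, p_curr))


def distance_mask(candidates, model_points, max_dist):
    """Keep only candidates within max_dist of some predicted model point."""
    return [c for c in candidates
            if min(math.dist(c, m) for m in model_points) <= max_dist]


# A model point that moved along x with positions 0, 1, 4 over three batches.
predicted = predict_next((0, 0), (1, 0), (4, 0))

prob_map = [
    [0.1, 0.9, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.0, 0.7],
]
candidates = detect_candidates(prob_map, threshold=0.5)
kept = distance_mask(candidates, model_points=[(0, 0)], max_dist=1.5)

print(predicted)
print(candidates)
print(kept)
```

Gating candidates by distance to the predicted model removes detections on unrelated structures before association, which reduces the number of trajectory hypotheses that the tracker has to consider.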


Fig. 3. Probability maps and detected candidates of a vessel (left) and catheter (right). For each example, from left to right are groundtruth, random forests result, and CNN result, respectively. Red indicates region with high possibility, while green dots show resulting candidates.

3 Tracking with Model Prior

To encode the structure prior in a CS model, we use an energy maximization scheme that combines temporal energy of individual candidate and spatial interaction energy of multiple candidates into a united optimization framework. Here, we consider the pairwise interactions of two candidates on neighboring frames. The assignment matrix between two consecutive sets O(k−1) and O(k) (i.e. detected candidate CS landmarks) can be written as X(k) = (xik−1 ik )(k) , (k) where k = 1, 2, . . . , K, and oik ∈ O(k) is the ik -th landmark candidate of CS. For notation convenience, we use a single subscript jk to represent the entry (k) . (k) (k) index (ik−1 , ik ), such as xjk = xik−1 ik , i.e., vec(X(k) ) = (xjk ) for vectorized X(k) . Then our objective function can be written as f (X ) =



(1) (2)

(K)

cj1 j2 ...jK xj1 xj2 . . . xjK +

K  

(k)

(k)

(k) (k)

(1)

wlk jk elk jk xlk xjk ,

k=1 lk ,jk (k)

where cj1 j2 ...jK is the affinity measuring trajectory confidence; wlk jk the likeli(k)

(k)

(k)

hood that candidates xjk and xlk are neighboring on the model; and elk jk the spatial interaction of two candidates on two consecutive frames. The affinity has two parts as (2) ci0 i1 ,...iK = appi0 i1 ,...iK × kini0 i1 ,...iK , where appi0 i1 ,...iK describes the appearance consistency of the trajectory, and kini0 i1 ,...iK the kinetic affinity modeling the higher order temporal affinity as detailed in [2]. Model Prior. CS candidates share two kinds of spatial constrains. First, trajectories of two neighboring elements should have similar direction. Second, relative order of two neighboring elements should not change so that re-composition of CS is prohibited. Thus inspired, we formulate the spatial interaction of two candidates as . (3) elk jk = emk−1 mk ik−1 ik = Epara + Eorder ,


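where

\[
E_{\mathrm{para}} = \frac{\bigl(o^{(k-1)}_{i_{k-1}} - o^{(k)}_{i_k}\bigr) \cdot \bigl(o^{(k-1)}_{m_{k-1}} - o^{(k)}_{m_k}\bigr)}{\bigl\| o^{(k-1)}_{i_{k-1}} - o^{(k)}_{i_k} \bigr\| \cdot \bigl\| o^{(k-1)}_{m_{k-1}} - o^{(k)}_{m_k} \bigr\|},
\qquad
E_{\mathrm{order}} = \frac{\bigl(o^{(k-1)}_{i_{k-1}} - o^{(k-1)}_{m_{k-1}}\bigr) \cdot \bigl(o^{(k)}_{i_k} - o^{(k)}_{m_k}\bigr)}{\bigl\| o^{(k-1)}_{i_{k-1}} - o^{(k-1)}_{m_{k-1}} \bigr\| \cdot \bigl\| o^{(k)}_{i_k} - o^{(k)}_{m_k} \bigr\|},
\]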

such that $E_{\mathrm{para}}$ models the angle between two neighboring trajectories, which also penalizes large distance change between them; and $E_{\mathrm{order}}$ models the relative order of two adjacent candidates by the inner product of the vectors between two neighboring candidates.

Maximizing Eq. 1 is closely related to the rank-1 tensor approximation (R1TA) [4], which aims to approximate a tensor by the tensor product of unit vectors up to a scale factor. After relaxing the integer constraint on the assignment variables, once a real-valued solution of $X^{(k)}$ is obtained, it can be binarized using the Hungarian algorithm [5]. The key issue here is to accommodate the row/column $\ell_1$ normalization of a general assignment problem, which differs from the commonly used $\ell_2$ norm constraint in tensor factorization. We develop an approach similar to [11], which is a tensor power iteration solution with $\ell_1$ row/column normalization.

Model Likelihood. The coefficient $w^{(k)}_{l_k j_k} \doteq w^{(k)}_{m_{k-1} m_k i_{k-1} i_k}$ measures the likelihood that two candidates $o^{(k-1)}_{i_{k-1}}$ and $o^{(k-1)}_{m_{k-1}}$ are neighboring on the model. In order to get the association of each candidate pair in each frame, or in other words, to measure the likelihood of a candidate $o^{(k)}_{i_k}$ matching a model element $o^{(0)}_{i_0}$, we maintain a "soft assignment". In particular, we use $\theta^{(k)}_{i_0 i_k}$ to indicate the likelihood that $o^{(k)}_{i_k}$ corresponds to $o^{(0)}_{i_0}$. It can be estimated by

\[
\Theta^{(k)} = \Theta^{(k-1)} X^{(k)}, \quad k = 1, 2, \dots, K, \tag{4}
\]

where $\Theta^{(k)} = (\theta^{(k)}_{i_0 i_k}) \in \mathbb{R}^{I_0 \times I_k}$ and $\Theta^{(0)}$ is fixed as the identity matrix. The model likelihood is updated in each step of the power iteration. After the update of the first term in Eq. 1, a pre-likelihood $\Theta'^{(k)}$ is estimated for computing $w^{(k)}_{l_k j_k}$. Since $\Theta^{(k)}$ associates candidates directly with the model, the final tracking result of the matching between $o^{(0)}$ and $o^{(k)}$ can be derived from $\Theta^{(k)}$.

With $\Theta'^{(k)}$, the approximated distance on the model between $o^{(k-1)}_{i_{k-1}}$ and $o^{(k-1)}_{m_{k-1}}$ can be calculated as follows:

\[
d^{(k)}_{i_k m_k} = \frac{\sum_{i_0} \bigl\| o^{(0)}_{i_0} - o^{(0)}_{i_0+1} \bigr\|\, \theta^{(k)}_{i_0 i_k}\, \theta^{(k)}_{i_0+1\, m_k}}{\sum_{i_0} \theta^{(k)}_{i_0 i_k}\, \theta^{(k)}_{i_0+1\, m_k}}. \tag{5}
\]

Thereby, $w^{(k)}_{l_k j_k}$ can be simply calculated as

\[
w^{(k)}_{l_k j_k} \doteq w^{(k)}_{m_{k-1} m_k i_{k-1} i_k} = \frac{2\, d^{(k-1)}_{i_{k-1} m_{k-1}}\, \bar{d}}{\bigl(d^{(k-1)}_{i_{k-1} m_{k-1}}\bigr)^2 + (\bar{d})^2}, \tag{6}
\]

where $\bar{d}$ is the average distance between two neighboring elements on the model $O^{(0)}$. The proposed tracking method is summarized in Algorithm 1.
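The quantities in Eqs. 3, 5 and 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: candidates are 2D points given as tuples, $E_{\mathrm{para}}$ and $E_{\mathrm{order}}$ are read as normalized inner products, and the function names (`spatial_interaction`, `model_distance`, `neighbor_weight`) as well as the zero-denominator fallbacks are hypothetical choices made for this example.

```python
import math


def _sub(a, b):
    """Difference of two 2D points."""
    return (a[0] - b[0], a[1] - b[1])


def _cos(a, b, eps=1e-12):
    """Normalized inner product of two 2D vectors.
    Assumption: returns 0 if either vector is (numerically) zero."""
    na, nb = math.hypot(*a), math.hypot(*b)
    if na < eps or nb < eps:
        return 0.0
    return (a[0] * b[0] + a[1] * b[1]) / (na * nb)


def spatial_interaction(o_i_prev, o_i_cur, o_m_prev, o_m_cur):
    """E_para + E_order (Eq. 3) for candidates i and m on frames k-1 and k."""
    # E_para: do the two trajectories point in a similar direction?
    e_para = _cos(_sub(o_i_prev, o_i_cur), _sub(o_m_prev, o_m_cur))
    # E_order: is the relative order of i and m preserved across frames?
    e_order = _cos(_sub(o_i_prev, o_m_prev), _sub(o_i_cur, o_m_cur))
    return e_para + e_order


def model_distance(model_pts, theta, i, m, eps=1e-12):
    """Eq. 5: likelihood-weighted distance on the model between candidates i and m.
    theta[i0][c] is the soft assignment of candidate c to model element i0."""
    num = den = 0.0
    for i0 in range(len(model_pts) - 1):
        weight = theta[i0][i] * theta[i0 + 1][m]
        num += math.dist(model_pts[i0], model_pts[i0 + 1]) * weight
        den += weight
    # Assumption: no support for this pair means distance 0 (and thus weight 0).
    return num / den if den > eps else 0.0


def neighbor_weight(d, d_bar):
    """Eq. 6: equals 1 when d == d_bar and decreases as d departs from d_bar."""
    return 2.0 * d * d_bar / (d * d + d_bar * d_bar)
```

With this reading, two candidates that move in parallel and keep their order score 2, while a pair that swaps order is penalized by a negative $E_{\mathrm{order}}$; the weight of Eq. 6 peaks when the estimated model distance matches the average neighbor spacing $\bar{d}$.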


Algorithm 1. Power iteration with model prior
 1: Input: Global affinity $C = (c_{j_1 j_2 \dots j_K})$, spatial interaction $e^{(k)}_{l_k j_k}$, $k = 1 \dots K$, and CS candidates $O^{(k)}$, $k = 0 \dots K$.
 2: Output: CS Matching.
 3: Initialize $X^{(k)}$, $k = 1 \dots K$, $CS^{(0)} = O^{(0)}$ and $\Theta^{(0)} = I$.
 4: repeat
 5:   for $k = 1, \dots, K$ do
 6:     for $j_k = 1, \dots, J$ do
 7:       update $x^{(k)}_{j_k} \doteq x^{(k)}_{i_{k-1} i_k}$ by $x^{(k)}_{j_k} \propto x^{(k)}_{j_k} \sum_{j_f : f \neq k} c_{j_1 \dots j_k \dots j_K}\, x^{(1)}_{j_1} \cdots x^{(f)}_{j_f} \cdots x^{(K)}_{j_K}$
 8:     end for
 9:     row/column normalize $X^{(k)}$
10:     update model pre-likelihood: $\Theta'^{(k)} = \Theta^{(k-1)} X^{(k)}$
11:     for $j_k = 1, \dots, J$ do
12:       update $x^{(k)}_{j_k} \doteq x^{(k)}_{i_{k-1} i_k}$ by $x^{(k)}_{j_k} \propto x^{(k)}_{j_k} \sum_{l_k} w^{(k)}_{l_k j_k}\, e^{(k)}_{l_k j_k}\, x^{(k)}_{l_k}$
13:     end for
14:     update model likelihood: $\Theta^{(k)} = \Theta^{(k-1)} X^{(k)}$
15:   end for
16: until convergence
17: discretize $\Theta^{(k)}$ to CS Matching.

4

Experiments

We evaluate the proposed CS tracking algorithm using two groups of clinical X-ray data collected from liver and cardiac interventions. The first group consists of six sequences of liver vessel images and the second of 11 sequences of catheter images, each with around 20 frames. The data are acquired with 512 × 512 pixels and a physical resolution of 0.345 or 0.366 mm. The groundtruth of each image is manually annotated (Fig. 4(a)).

Vascular Structure Tracking. We first evaluate the proposed algorithm on the vascular sequences. The first frame of each sequence is used to generate training samples for the CNN. Specifically, 800 vascular structure patches and 1500 negative patches are generated from each image. From the six images, a total of 2300 × 6 = 13,800 samples are extracted and split into 75% training and 25% validation. All patches have the same size of 28 × 28 pixels. The distance threshold of the predictive bounding box is set to 60 pixels to allow enough error tolerance. Finally, around 200 vascular structure candidates are left in each frame. The number of points on the model is around 50 for each sequence. In our work, K = 3 is used so that every four consecutive frames are associated. During tracking, the tensor kernel takes around 10 s and 100 MB (peak value) of RAM to process one frame with 200 candidates in our setting, running on a single Intel [email protected] core. The tracking error is defined as the shortest distance between tracked pixels and the groundtruth annotation. For each performance metric, we compute its mean and standard deviation. For comparison, the registration-based (RG) approach [14], bipartite graph matching (BM) [2] and the pure tensor based method (TB) [2] are applied to the same sequences. For BM and TB, the same tracking algorithms combined with the CNN detector are also tested and reported. The first block of Fig. 4 illustrates the tracking results of vascular structures. A B-spline is used to connect all tracked candidates to represent the tracked vascular structure. The zoom-in view of a selected region (blue rectangle) in each tracking result is presented below it, where portions with large errors are colored red. The quantitative evaluation for each sequence is listed in Table 1.

Catheter Tracking. Similar procedures and parameters are applied to the 11 sequences of catheter images. The second block of Fig. 4 shows examples of catheter tracking results. The numerical comparisons are listed in Table 1.

The results show that our method clearly outperforms the other three approaches. Candidates in our approach are detected by a highly accurate CNN detector, ensuring that most extracted candidates lie on the CS, while the registration-based method depends on the first frame as a reference to identify targets. Our approach is also better than bipartite graph matching, where K = 1. The reason is that our proposed method incorporates higher-order temporal information from multiple frames; by contrast, bipartite matching is computed from two frames only. Compared with the pure tensor based algorithm, the proposed method incorporates the model prior, which provides more powerful clues for tracking the whole CS. As confirmed by the zoom-in views, with the model prior our proposed method is less affected by neighboring confounding structures.

Table 1. Curvilinear structure tracking errors (in pixels)

| Dataset | Seq ID | RG [14] | BM [2] | TB [2] | BM+CNN | TB+CNN | Proposed |
|---|---|---|---|---|---|---|---|
| Vascular structures | VAS1 | 2.77 ± 3.25 | 1.54 ± 1.59 | 1.33 ± 1.08 | 1.44 ± 2.37 | 1.15 ± 0.91 | 1.14 ± 0.84 |
| | VAS2 | 2.02 ± 3.10 | 1.49 ± 1.14 | 1.49 ± 1.74 | 1.11 ± 0.83 | 1.30 ± 2.48 | 1.09 ± 0.83 |
| | VAS3 | 3.25 ± 7.64 | 1.65 ± 2.40 | 1.41 ± 1.54 | 1.19 ± 0.91 | 1.17 ± 0.92 | 1.17 ± 0.91 |
| | VAS4 | 2.16 ± 2.52 | 1.61 ± 2.25 | 1.99 ± 3.02 | 1.12 ± 1.00 | 1.95 ± 5.00 | 1.17 ± 1.53 |
| | VAS5 | 3.04 ± 5.46 | 2.71 ± 4.36 | 1.36 ± 1.44 | 1.95 ± 3.94 | 1.14 ± 1.55 | 1.09 ± 1.42 |
| | VAS6 | 2.86 ± 5.60 | 1.40 ± 1.94 | 1.32 ± 1.68 | 1.39 ± 2.53 | 1.09 ± 1.70 | 1.11 ± 1.90 |
| | Overall | 2.69 ± 5.03 | 1.75 ± 2.60 | 1.49 ± 1.86 | 1.37 ± 2.26 | 1.30 ± 2.64 | 1.13 ± 1.30 |
| | 75 %ile, 100 %ile | – | 2.00, 31.2 | 2.00, 26.8 | 1.40, 32.6 | 1.40, 56.9 | 1.40, 23.2 |
| Catheters | CAT1 | 2.86 ± 3.83 | 1.47 ± 1.57 | 1.29 ± 1.06 | 1.13 ± 1.19 | 1.08 ± 0.85 | 1.00 ± 0.77 |
| | CAT2 | 1.98 ± 2.66 | 2.38 ± 5.33 | 1.11 ± 1.58 | 1.77 ± 4.11 | 0.77 ± 1.06 | 0.56 ± 0.89 |
| | CAT3 | 2.20 ± 1.56 | 1.55 ± 1.98 | 1.39 ± 1.70 | 0.99 ± 1.52 | 0.72 ± 0.66 | 0.74 ± 0.65 |
| | CAT4 | 1.07 ± 0.76 | 2.12 ± 3.35 | 1.15 ± 1.33 | 0.94 ± 1.37 | 0.92 ± 1.34 | 0.76 ± 0.77 |
| | CAT5 | 2.54 ± 3.65 | 2.02 ± 4.85 | 1.04 ± 0.88 | 1.65 ± 5.36 | 0.84 ± 1.01 | 0.83 ± 0.97 |
| | CAT6 | 1.93 ± 2.15 | 2.06 ± 3.92 | 1.14 ± 0.95 | 1.19 ± 2.03 | 0.96 ± 0.92 | 0.93 ± 0.89 |
| | CAT7 | 1.39 ± 2.18 | 1.86 ± 3.79 | 1.00 ± 0.78 | 0.76 ± 0.72 | 0.76 ± 0.72 | 0.73 ± 0.63 |
| | CAT8 | 2.74 ± 4.32 | 2.30 ± 5.53 | 1.31 ± 2.21 | 1.22 ± 2.21 | 1.74 ± 3.81 | 0.96 ± 1.37 |
| | CAT9 | 1.74 ± 1.25 | 2.80 ± 4.78 | 2.00 ± 2.74 | 1.54 ± 3.44 | 1.18 ± 2.02 | 0.99 ± 1.33 |
| | CAT10 | 3.17 ± 5.26 | 2.86 ± 4.33 | 2.48 ± 3.59 | 0.86 ± 1.26 | 0.81 ± 1.12 | 0.86 ± 1.29 |
| | CAT11 | 3.96 ± 5.89 | 2.68 ± 4.36 | 1.17 ± 0.97 | 3.50 ± 11.3 | 1.35 ± 3.72 | 0.80 ± 0.74 |
| | Overall | 2.40 ± 3.62 | 2.17 ± 4.14 | 1.38 ± 1.90 | 1.39 ± 4.16 | 1.01 ± 1.93 | 0.83 ± 0.98 |
| | 75 %ile, 100 %ile | – | 2.00, 47.7 | 1.40, 24.0 | 1.00, 70.5 | 1.00, 48.4 | 1.00, 19.2 |

Fig. 4. Curvilinear structure tracking results. (a) groundtruth, (b) registration, (c) bipartite matching, (d) tensor based, and (e) proposed method. Red indicates regions with large errors, while green indicates small errors.

5

Conclusion

We presented a new method that combines hierarchical features learned by a CNN with an encoded model prior to estimate the motion of CS in X-ray image sequences. Experiments on two groups of CS demonstrate the effectiveness of our proposed approach. Achieving a tracking error of around one pixel (or smaller than 0.5 mm), it clearly outperforms the other state-of-the-art algorithms. For future work, we plan to adopt a pyramid detection strategy in order to accelerate the pixel-wise probability map calculation in our current approach.

Acknowledgement. We thank the anonymous reviewers for valuable suggestions. This work was supported in part by NSF grants IIS-1407156 and IIS-1350521.


References

1. Baert, S.A., Viergever, M.A., Niessen, W.J.: Guide-wire tracking during endovascular interventions. IEEE Trans. Med. Imaging 22(8), 965–972 (2003)
2. Cheng, E., Pang, Y., Zhu, Y., Yu, J., Ling, H.: Curvilinear structure tracking by low rank tensor approximation with model propagation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3057–3064 (2014)
3. Cheng, J.Z., Chen, C.M., Cole, E.B., Pisano, E.D., Shen, D.: Automated delineation of calcified vessels in mammography by tracking with uncertainty and graphical linking techniques. IEEE Trans. Med. Imaging 31(11), 2143–2155 (2012)
4. De Lathauwer, L., De Moor, B., Vandewalle, J.: On the best rank-1 and rank-(r1, r2, ..., rn) approximation of higher-order tensors. SIAM J. Matrix Anal. Appl. 21(4), 1324–1342 (2000)
5. Frank, A.: On Kuhn's Hungarian method – a tribute from Hungary. Nav. Res. Logistics 52(1), 2–5 (2005)
6. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
7. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
8. Palti-Wasserman, D., Brukstein, A.M., Beyar, R.P.: Identifying and tracking a guide wire in the coronary arteries during angioplasty from X-ray images. IEEE Trans. Biomed. Eng. 44(2), 152–164 (1997)
9. Prasoon, A., Petersen, K., Igel, C., Lauze, F., Dam, E., Nielsen, M.: Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network. In: Mori, K., Sakuma, I., Sato, Y., Barillot, C., Navab, N. (eds.) MICCAI 2013, Part II. LNCS, vol. 8150, pp. 246–253. Springer, Heidelberg (2013)
10. Roth, H.R., Wang, Y., Yao, J., Lu, L., Burns, J.E., Summers, R.M.: Deep convolutional networks for automated detection of posterior-element fractures on spine CT. In: SPIE Medical Imaging, p. 97850 (2016)
11. Shi, X., Ling, H., Xing, J., Hu, W.: Multi-target tracking by rank-1 tensor approximation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2387–2394 (2013)
12. Vedaldi, A., Lenc, K.: MatConvNet – convolutional neural networks for MATLAB. In: Proceedings of the ACM International Conference on Multimedia (2015)
13. Wang, P., Chen, T., Zhu, Y., Zhang, W., Zhou, S.K., Comaniciu, D.: Robust guidewire tracking in fluoroscopy. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 691–698 (2009)
14. Zhu, Y., Tsin, Y., Sundar, H., Sauer, F.: Image-based respiratory motion compensation for fluoroscopic coronary roadmapping. In: Jiang, T., Navab, N., Pluim, J.P.W., Viergever, M.A. (eds.) MICCAI 2010, Part III. LNCS, vol. 6363, pp. 287–294. Springer, Heidelberg (2010)

Real-Time Online Adaption for Robust Instrument Tracking and Pose Estimation

Nicola Rieke1(B), David Joseph Tan1, Federico Tombari1,4, Josué Page Vizcaíno1, Chiara Amat di San Filippo3, Abouzar Eslami3, and Nassir Navab1,2

1 Computer Aided Medical Procedures, Technische Universität München, Munich, Germany
[email protected]
2 Computer Aided Medical Procedures, Johns Hopkins University, Baltimore, USA
3 Carl Zeiss MEDITEC München, Munich, Germany
4 DISI, University of Bologna, Bologna, Italy

Abstract. We propose a novel method for instrument tracking in Retinal Microsurgery (RM) which is apt to withstand the challenges of RM visual sequences in terms of varying illumination conditions and blur. At the same time, the method is general enough to deal with different background and tool appearances. The proposed approach relies on two random forests to, respectively, track the surgery tool and estimate its 2D pose. Robustness to photometric distortions and blur is provided by a specific online refinement stage of the offline trained forest, which makes our method also capable of generalizing to unseen backgrounds and tools. In addition, a peculiar framework for merging together the predictions of tracking and pose is employed to improve the overall accuracy. Remarkable advantages in terms of accuracy over the state-of-the-art are shown on two benchmarks.

1

Introduction and Related Work

Retinal Microsurgery (RM) is a challenging task wherein a surgeon has to handle anatomical structures at micron-scale dimension while observing targets through a stereo-microscope. Novel imaging modalities such as intraoperative Optical Coherence Tomography (iOCT) [1] aid the physician in this delicate task by providing anatomical sub-retinal information, but lead to an increased workload due to the required manual positioning to the region of interest (ROI). Recent research has aimed at introducing advanced computer vision and augmented reality techniques within RM to increase safety during surgical maneuvers and to simplify the surgical workflow. A key step for most of these methods is an accurate and real-time localization of the instrument tips, which allows the iOCT to be positioned automatically according to it. This further makes it possible to calculate the distance of the instrument tip to the retina and to provide real-time feedback to the physician. In addition, the trajectories performed by the instrument during surgery can be compared with other surgeries, thus paving the way to objective quality assessment for RM.

Surgical tool tracking has been investigated in different medical specialties: nephrectomy [2], neurosurgery [3], laparoscopy/endoscopy [4,5]. However, RM presents specific challenges such as strong illumination changes, blur and variability in the appearance of surgical instruments, which make the aforementioned approaches not directly applicable in this scenario. Among the several works recently proposed in the field of tool tracking for RM, Pezzementi et al. [6] suggested performing the tracking in two steps: first appearance modeling, which computes a pixel-wise probability of class membership (foreground/background), then filtering, which estimates the current tool configuration. Richa et al. [7] employ mutual information for tool tracking. Sznitman et al. [8] introduced a joint algorithm which performs tool detection and tracking simultaneously. The tool configuration is parametrized and tracking is modeled as a Bayesian filtering problem. Subsequently, in [9], they propose to use a gradient-based tracker to estimate the tool's ROI, followed by foreground/background classification of the ROI's pixels via a boosted cascade. In [10], a gradient boosted regression tree is used to create a multi-class classifier which is able to detect different parts of the instrument. Li et al. [11] present a multi-component tracker, i.e. a gradient-based tracker able to capture the movements and an online detector to compensate for tracking losses.

In this paper, we introduce a robust closed-loop framework to track and localize the instrument parts in in-vivo RM sequences in real time, based on the dual-random forest approach for tracking and pose estimation proposed in [12]. A fast tracker directly employs the pixel intensities in a random forest to infer the tool tip bounding box in every frame. To cope with the strong illumination changes affecting the RM sequences, one of the main contributions of our paper is to adapt the offline model to online information while tracking, so as to combine the appearance changes learned by the trees with the real photometric distortions witnessed at test time. This offline learning and online adaption leads to a substantial capability to generalize to unseen sequences. Secondly, within the estimated bounding box, another random forest predicts the locations of the tool joints based on gradient information. Unlike [12], we enforce spatial-temporal constraints by means of a Kalman filter [13]. As a third contribution of this work, we propose to "close the loop" between the tracking and the 2D pose estimation by obtaining a joint prediction of the template position, acquired by merging the outcomes of the two separate forests through the confidence of their estimation. Such a cooperative prediction will in turn provide pose information for the tracker, improving its robustness and accuracy. The performance of the proposed approach is quantitatively evaluated on two different in-vivo RM datasets, and demonstrates remarkable advantages with respect to the state-of-the-art in terms of robustness and generalization.

© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 422–430, 2016. DOI: 10.1007/978-3-319-46720-7_49

2

Method

In this section, we discuss the proposed method, an overview of which is depicted in Fig. 1. First, a fast intensity-based tracker locates a template around the instrument tips using an offline trained model based on a random forest (RF) and the location of the template in the previous frame. Within this ROI, a pose estimator based on HOG recovers the three joints employing another offline learned RF and filters the result by temporal-spatial constraints. To close the loop, the output is propagated to an integrator, which merges the intensity-based and gradient-based predictions in a synergic way in order to provide the tracker with an accurate template location for the prediction in the next frame. Simultaneously, the refined result is propagated to a separate thread which adapts the model of the tracker to the current data characteristics via online learning.

Fig. 1. Framework: The description of the tracker, sampling and online learning can be found in Sect. 2.1. The pose estimator and Kalman filter are presented in Sect. 2.2. Details on the integrator are given in Sect. 2.3.

A central element in this approach is the definition of the tracked template, which we define by the landmarks of the forceps. Let $(L, R, C)^\top \in \mathbb{R}^{2 \times 3}$ be the left, right and central joint of the instrument. Then the midpoint between the tips is given by $M = \frac{L+R}{2}$, and the 2D similarity transform from the patch coordinate system to the frame coordinate system can be defined as

\[
H = \begin{bmatrix} s \cos(\theta) & -s \sin(\theta) & C_x \\ s \sin(\theta) & s \cos(\theta) & C_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 30 \\ 0 & 0 & 1 \end{bmatrix}
\]

with $s = \frac{b}{100} \cdot \max\{\|L - C\|_2, \|R - C\|_2\}$ and $\theta = \cos^{-1}\!\left(\frac{M_y - C_y}{\|M - C\|_2}\right)$ for a fixed patch size of 100 × 150 pixels and $b \in \mathbb{R}$ defining the relative size. In this way, the entire instrument tip is enclosed by the template and aligned with the tool's direction. In the following, details of the different components are presented.

2.1

Tracker – Offline Learning, Online Adaption

Derived from image registration, tracking aims to determine a transformation parameter that minimizes the similarity measure to a given template. In contrast to attaining a single template, the tool undergoes an articulated motion and a variety of lighting changes, which is difficult to minimize as an energy function. Thus, the tracker learns a generalized model of the tool based on multiple templates, taken as the tool undergoes different movements in a variety of environmental settings, and predicts the translation parameter from the intensity values at n random points $\{x_p\}_{p=1}^{n}$ within the template, similar to [12]. In addition, we assume a piecewise constant velocity between consecutive frames. Therefore, given the image $I_t$ at time $t$ and the translation vector of the template from $t-2$ to $t-1$ as $v_{t-1} = (v_x, v_y)^\top$, the input to the forest is a feature vector concatenating the intensity values at the current location of the template, $I_t(x_p)$, with the velocity vector $v_{t-1}$, assuming a constant time interval. In order to learn the relation between the feature vector and the transformation update, we use a random forest that follows a dimension-wise splitting of the feature vector such that the translation vectors on the leaves point to a similar location.

The cost of generalization is the inadequacy to describe the conditions that are specific to a particular situation, such as the type of tool used in the surgery. As a consequence, the robustness of the tracker is affected, since it cannot confidently predict the location of the template for challenging frames with high variations from the generalized model. Hence, in addition to the offline learning for a generalized tracker, we propose an online learning strategy that considers the current frames and learns the relation of the translation vector with respect to the feature vector. The objective is to stabilize the tracker by adapting its forest to the specific conditions at hand. In particular, we propose to incrementally add new trees to the forest by using the predicted template location on the current frames of the video sequence. To achieve this goal, we impose random synthetic transformations on the bounding boxes that enclose the templates to build the learning dataset with pairs of feature and translation vectors, such that the transformations emulate the motion of the template between two consecutive frames. Thereafter, the resulting trees are added to the existing forest, and the prediction for the succeeding frames includes both the generalized and the environment-specific trees. Notably, our online learning approach does not learn from all the incoming frames; rather, Sect. 2.3 introduces a confidence measure to evaluate and accumulate templates.

2.2

2D Pose Estimation with Temporal-Spatial Constraints

During pose estimation, we model a direct mapping between image features and the location of the three joints in the 2D space of the patch. Similar to [12], we employ HOG features around a pool of randomly selected pixel locations within the provided ROI as an input to the trees in order to infer the pixel offsets to the joint positions. Since the HOG feature vector is extracted as in [14], the splitting function of the trees considers only one dimension of the vector and is optimized by means of information gain. The final vote is aggregated by a dense-window algorithm. The predicted offsets to the joints in the reference frame of the patch are back-warped onto the frame coordinate system. Up to now, the forest considers every input as a still image. However, the surgical movement is usually continuous. Therefore, we enforce a temporal-spatial relationship for all joint locations via a Kalman filter [13] by employing the 2D location of the joints in the frame coordinate system and their frame-to-frame velocity.

2.3

Closed Loop via Integrator

Although the combination of the pose estimation with the Kalman filter would already define a valid instrument tracking for all three joints, it completely relies on the gradient information, which may be unreliable in case of blurred frames. In these scenarios, the intensity information is still a valid source for predicting the movement. On the other hand, gradient information tends to be more reliable for precise localization in focused images. Due to the definition of the template, the prediction of the joint positions can directly be connected to the expected prediction of the tracker via the similarity transform. Depending on the confidence for the current prediction of the separate random forests, we define the scale $s_F$ and the translation $t_F$ of the joint similarity transform as the weighted average

\[
s_F = \frac{s_T \cdot \sigma_P + s_P \cdot \sigma_T}{\sigma_T + \sigma_P}
\quad \text{and} \quad
t_F = \frac{t_T \cdot \sigma_P + t_P \cdot \sigma_T}{\sigma_T + \sigma_P},
\]

where $\sigma_T$ and $\sigma_P$ are the average standard deviation of the tracking prediction and pose prediction, respectively, and $t_F$ is set to be greater than or equal to the initial translation. In this way, the final template is biased towards the more reliable prediction. If $\sigma_T$ is higher than a threshold $\tau_\sigma$, the tracker transmits the previous location of the template, which is subsequently corrected by the similarity transform of the predicted pose. Furthermore, the prediction of the pose can also correct for the scale of the 2D similarity transform which is actually not captured by the tracker, leading to a scale adaptive tracking. This is an important improvement because an implicit assumption of the pose algorithm is that the size of the bounding box corresponds to the size of the instrument due to the HOG features. The refinement also guarantees that only reliable templates are used for the online learning thread.
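The confidence-weighted merge described above can be sketched as follows. This is a minimal illustration rather than the authors' C++ implementation: the function names, the tuple representation of translations, and the handling of the degenerate case in which both standard deviations are zero are assumptions made for this example.

```python
def fuse(value_t, sigma_t, value_p, sigma_p):
    """Weighted average of a tracker value and a pose value.
    Each value is weighted by the *other* predictor's standard deviation,
    so the prediction with the smaller spread dominates the result."""
    total = sigma_t + sigma_p
    if total == 0.0:
        # Assumption: if both predictors report zero spread, use the plain mean.
        return 0.5 * (value_t + value_p)
    return (value_t * sigma_p + value_p * sigma_t) / total


def fuse_translation(t_t, sigma_t, t_p, sigma_p):
    """Apply the same weighting to each component of a 2D translation."""
    return tuple(fuse(a, sigma_t, b, sigma_p) for a, b in zip(t_t, t_p))


def tracker_translation(t_t, sigma_t, t_prev, tau_sigma):
    """Fall back to the previous template location when the tracker's
    spread exceeds the threshold tau_sigma."""
    return t_prev if sigma_t > tau_sigma else t_t
```

For instance, with a tracker scale of 1.0 (spread 1.0) and a pose scale of 3.0 (spread 3.0), the fused scale is 1.5, which lies closer to the more confident tracker estimate.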

3

Experiments and Results

We evaluated our approach on two different datasets ([9,12]), which we refer to as Szn- and Rie-dataset, respectively. We considered both datasets because of their intrinsic difference: the first one presents a strong coloring of the sequences and a well-focused ocular of the microscope; the second presents different types of instruments, changing zoom factor, presence of light source and presence of detached epiretinal membrane. Further information on the dataset can be found in Table 1 and in [9,12]. Analogously to baseline methods, we evaluate the performance of our method by means of a threshold measure [9] for the separate joint predictions and the strict PCP score [15] for evaluating the parts connected by the joints. The proposed method is implemented in C++ and runs at 40 fps on a Dell Alienware Laptop, Intel Core i7-4720HQ @ 2.6 GHz and 16 GB RAM.
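The two evaluation measures can be sketched as below. The definitions are the commonly used ones and are stated here as assumptions, since the text does not spell them out: the threshold measure is taken as the fraction of frames in which a predicted joint lies within a given pixel distance of the annotation, and strict PCP counts a part as correct only if both of its endpoints lie within a fraction `alpha` of the annotated part length (`alpha = 0.5` is a common choice, not a value given in this paper). Function names are hypothetical.

```python
import math


def threshold_accuracy(pred, gt, threshold):
    """Fraction of frames whose predicted joint lies within `threshold`
    pixels of the groundtruth joint."""
    hits = sum(math.dist(p, g) <= threshold for p, g in zip(pred, gt))
    return hits / len(gt)


def strict_pcp(pred_parts, gt_parts, alpha=0.5):
    """Fraction of parts for which BOTH predicted endpoints lie within
    alpha * (groundtruth part length) of their groundtruth endpoints.
    Each part is a pair of 2D endpoints, e.g. (central joint, left tip)."""
    correct = 0
    for (pa, pb), (ga, gb) in zip(pred_parts, gt_parts):
        limit = alpha * math.dist(ga, gb)
        if math.dist(pa, ga) <= limit and math.dist(pb, gb) <= limit:
            correct += 1
    return correct / len(gt_parts)
```

Sweeping `threshold` over a range of pixel values yields an accuracy-versus-threshold curve of the kind typically plotted for this measure.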

Table 1. Summary of the datasets.

| Set | # Frames (I) | # Frames (II) | # Frames (III) | # Frames (IV) | Resolution |
|---|---|---|---|---|---|
| Szn [9] | 402 | 222 | 547 | – | 640×480 |
| Rie [12] | 200 | 200 | 200 | 200 | 1920×1080 |

Fig. 2. Component evaluation.

In the offline learning for the tracker, we trained 100 trees per parameter, employed 20 random intensity values and velocity as feature vectors, and used 500 sample points. For the pose estimation, we used 15 trees and the HOG features are set to a bin size of 9 and pixel size resolution of 50×50.

3.1

Evaluation of Components

To analyze the influence of the different proposed components, we evaluate the algorithm with different settings on the Rie-dataset, whereby the sequences I, II and III are used for the offline learning and sequence IV is used as the test sequence. Figure 2 shows the threshold measure for the left tip in (a) and the strict PCP for the left fork in (b). Individually, each component improves the performance and contributes to a robust result when combined. Among them, the most prominent improvement comes from the weighted averaging of the templates from Sect. 2.3.

3.2

Comparison to State-of-the-Art

We compare the performance of our method against the state-of-the-art methods DDVT [9], MI [7], ITOL [11] and POSE [12]. Throughout the experiments on the Szn-dataset, the proposed method can compete with state-of-the-art methods, as depicted in Fig. 3. In the first experiment, in which the forests are learned on the first half of a sequence and evaluated on the second half, our method reaches an accuracy of at least 94.3 % by means of threshold distance for the central joint. In the second experiment, all the first halves of the sequences are included in the learning database and the method is tested on the second halves. In contrast to the Szn-dataset, the Rie-dataset is not as saturated in terms of accuracy and therefore the benefits of our method are more evident. Figure 4 illustrates the results for the cross-validation setting, i.e. the offline training is performed on three sequences and the method is tested on the remaining one. In this case, our method outperforms POSE for all test sequences. Notably, there is a significant improvement in accuracy for the Rie-Set IV, which demonstrates the generalization capacity of our method for unseen illumination and instruments. Table 2 also reflects this improvement in the strict PCP scores, which indicate that on Set IV our method is nearly twice as accurate as the baseline method [12].


Fig. 3. Szn-dataset: Sequential and combined evaluation for sequence 1–3. For over 93 %, the results are so close that the single graphs are not distinguishable.

Fig. 4. Rie-dataset: Cross validation evaluation – the offline forests are learned on three sequences and tested on the unseen one.

Table 2. Strict PCP for cross validation of Rie-dataset for Left and Right fork.

| Methods | Set I (L/R) | Set II (L/R) | Set III (L/R) | Set IV (L/R) |
|---|---|---|---|---|
| Our work | 89.0/88.5 | 98.5/99.5 | 99.5/99.5 | 94.5/95.0 |
| POSE [12] | 69.7/58.5 | 93.94/93.43 | 94.47/94.47 | 46.46/57.71 |

4

Conclusion
In this work, we propose a closed-loop framework for tool tracking and pose estimation, which runs at 40 fps. A combination of separate predictors yields robustness which is able to withstand the challenges of RM sequences. The work further shows the method’s capability to generalize to unseen instruments and illumination changes by allowing an online adaption. These key drivers allow our method to outperform state-of-the-art on two benchmark datasets.

References 1. Ehlers, J.P., Kaiser, P.K., Srivastava, S.K.: Intraoperative optical coherence tomography using the rescan 700: preliminary results from the discover study. Br. J. Ophthalmol. 98, 1329–1332 (2014) 2. Reiter, A., Allen, P.K.: An online learning approach to in-vivo tracking using synergistic features. In: IROS, pp. 3441–3446 (2010) 3. Bouget, D., Benenson, R., Omran, M., Riffaud, L., Schiele, B., Jannin, P.: Detecting surgical tools by modelling local appearance and global shape. IEEE Trans. Med. Imaging 34(12), 2603–2617 (2015) 4. Allan, M., Chang, P.L., Ourselin, S., Hawkes, D., Sridhar, A., Kelly, J., Stoyanov, D.: Image based surgical instrument pose estimation with multi-class labelling and optical flow. In: MICCAI, pp. 331–338 (2015) 5. Wolf, R., Duchateau, J., Cinquin, P., Voros, S.: 3D tracking of laparoscopic instruments using statistical and geometric modeling. In: Fichtinger, G., Martel, A., Peters, T. (eds.) MICCAI 2011, Part I. LNCS, vol. 6891, pp. 203–210. Springer, Heidelberg (2011) 6. Pezzementi, Z., Voros, S., Hager, G.D.: Articulated object tracking by rendering consistent appearance parts. In: ICRA, pp. 3940–3947 (2009) 7. Richa, R., Balicki, M., Meisner, E., Sznitman, R., Taylor, R., Hager, G.: Visual tracking of surgical tools for proximity detection in retinal surgery. In: Taylor, R.H., Yang, G.-Z. (eds.) IPCAI 2011. LNCS, vol. 6689, pp. 55–66. Springer, Heidelberg (2011) 8. Sznitman, R., Basu, A., Richa, R., Handa, J., Gehlbach, P., Taylor, R.H., Jedynak, B., Hager, G.D.: Unified detection and tracking in retinal microsurgery. In: Fichtinger, G., Martel, A., Peters, T. (eds.) MICCAI 2011, Part I. LNCS, vol. 6891, pp. 1–8. Springer, Heidelberg (2011) 9. Sznitman, R., Ali, K., Richa, R., Taylor, R.H., Hager, G.D., Fua, P.: Data-driven visual tracking in retinal microsurgery. In: Ayache, N., Delingette, H., Golland, P., Mori, K. (eds.) MICCAI 2012, Part II. LNCS, vol. 7511, pp. 568–575. Springer, Heidelberg (2012)


10. Sznitman, R., Becker, C., Fua, P.: Fast part-based classification for instrument detection in minimally invasive surgery. In: Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI 2014, Part II. LNCS, vol. 8674, pp. 692– 699. Springer, Heidelberg (2014) 11. Li, Y., Chen, C., Huang, X., Huang, J.: Instrument tracking via online learning in retinal microsurgery. In: Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI 2014, Part I. LNCS, vol. 8673, pp. 464–471. Springer, Heidelberg (2014) 12. Rieke, N., Tan, D.J., Alsheakhali, M., Tombari, F., Amat di San Filippo, C., Belagiannis, V., Eslami, A., Navab, N.: Surgical tool tracking and pose estimation in retinal microsurgery. In: MICCAI, pp. 266–273 (2015) 13. Haykin, S.S.: Kalman Filtering and Neural Networks. Wiley, Hoboken (2001) 14. Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. PAMI 32(9), 1627–1645 (2010) 15. Ferrari, V., Marin-Jimenez, M., Zisserman, A.: Progressive search space reduction for human pose estimation. In: CVPR, pp. 1–8 (2008)

Integrated Dynamic Shape Tracking and RF Speckle Tracking for Cardiac Motion Analysis

Nripesh Parajuli1(B), Allen Lu2, John C. Stendahl3, Maria Zontak4, Nabil Boutagy3, Melissa Eberle3, Imran Alkhalil3, Matthew O’Donnell4, Albert J. Sinusas3,5, and James S. Duncan1,2,5

1 Departments of Electrical Engineering, Yale University, New Haven, CT, USA
[email protected]
2 Biomedical Engineering, Yale University, New Haven, CT, USA
3 Internal Medicine, Yale University, New Haven, CT, USA
4 Department of Bioengineering, University of Washington, Seattle, WA, USA
5 Radiology and Biomedical Imaging, Yale University, New Haven, CT, USA

Abstract. We present a novel dynamic shape tracking (DST) method that solves for Lagrangian motion trajectories originating at the left ventricle (LV) boundary surfaces using a graphical structure and Dijkstra’s shortest path algorithm. These trajectories, which are temporally regularized and accrue minimal drift, are augmented with radio-frequency (RF) speckle tracking based mid-wall displacements, and dense myocardial deformation fields and strains are then calculated. We applied this method to 4D echocardiography (4DE) images acquired from 7 canine subjects and validated the strains using a cuboidal array of 16 sonomicrometric crystals implanted in the LV wall. The 4DE based strains correlated well with the crystal based strains. We also induced ischemia in the LV wall and evaluated how strain values change across ischemic, non-ischemic remote and border regions (with the crystals placed accordingly) during baseline, severe occlusion, and severe occlusion with dobutamine stress conditions. The strain patterns observed under the different physiological conditions were in good agreement with the crystal based strains.

1 Introduction

Characterization of left ventricular myocardial deformation is useful for the detection and diagnosis of cardiovascular diseases. Conditions such as ischemia and infarction undermine the contractile property of the LV and analyzing Lagrangian strains is one way of identifying such abnormalities. Numerous methods calculate dense myocardial motion fields and then compute strains using echo images. Despite the substantial interest in Lagrangian motion and strains, and some recent contributions in spatio-temporal tracking [1,2], most methods typically calculate frame-to-frame or Eulerian displacements first and then obtain Lagrangian trajectories.

© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 431–438, 2016. DOI: 10.1007/978-3-319-46720-7_50


Therefore, we propose a novel dynamic shape tracking (DST) method that first provides Lagrangian motion trajectories of points in the LV surfaces and then computes dense motion fields to obtain Lagrangian strains. This approach aims to reduce the drift problem which is prevalent in many frame-to-frame tracking methods. We first segment our images and obtain point clouds of LV surfaces, which are then set up as nodes in a directed acyclic graph. Nearest neighbor relationships define the edges, which have weights based on the Euclidean distance and difference in shape properties between neighboring and starting points. Finding the trajectory is then posed as a shortest path problem and solved using Dijkstra’s algorithm and dynamic programming. Once we obtain trajectories, we calculate dense displacement fields for each frame using radial basis functions (RBFs), which are regularized using sparsity and incompressibility constraints. Since the trajectories account for motion primarily in the LV boundaries, we also include mid-wall motion vectors from frame-to-frame RF speckle tracking. This fusion strategy was originally proposed in [3] and expanded in [4]. Ultimately, we calculate Lagrangian strains and validate our results by comparing with sonomicrometry based strains.

2 Methods

2.1 Dynamic Shape Tracking (DST)

We first segment our images to obtain endocardial and epicardial surfaces using a level-set segmentation method [5] that is automated except for the first frame. There are surfaces from $N$ frames, each with $K$ points, and each point has $M$ neighbors (Fig. 1a, with $M = 3$). $x \in \mathbb{R}^{N \times K \times 3}$ is the point matrix, $F \in \mathbb{R}^{N \times K \times S}$ is the shape descriptor matrix (where $S$ is the descriptor length; the shape context feature is described in [6]) and $\eta \in \mathbb{R}^{N \times K \times M}$ is the neighborhood matrix. The $j$th point of the $i$th frame is indexed as $x_{i,j}$ ($i \in [1 : N]$ and $j \in [1 : K]$).

Let $A_j : \mathbb{N}_N \to \mathbb{N}_K$ be the map that indexes the points in the trajectory starting at point $x_{1,j}$, so that $A_j(i)$ is the index of the trajectory’s point in frame $i$. For any point $x_{i,A_j(i)}$ in the trajectory, we assume that:

1. It will not move too far away from the previous point $x_{i-1,A_j(i-1)}$ and the starting point $x_{1,j}$. The same applies to its shape descriptor $F_{i,A_j(i)}$.
2. Its displacement will not differ substantially from that of the previous point. The same applies to its shape descriptors. We call these the 2nd order weights.

Trajectories satisfying the above conditions will have closely spaced consecutive points with similar shape descriptors (Fig. 1b). Their points will also stay close to the starting point, with shape descriptors that remain similar to the starting shape descriptor, which makes the trajectories more nearly closed. The 2nd order weights enforce smoothness and shape consistency.
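The first assumption above can be turned into additive per-edge costs and minimized frame by frame. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `track_trajectory`, the unit default weights, the zero-based indexing, and the toy data are all hypothetical, and the 2nd order weights are deliberately left out.

```python
import numpy as np

def track_trajectory(x, F, eta, J, lambdas=(1.0, 1.0, 1.0, 1.0)):
    """Minimum-cost trajectory starting at point J of the first frame.

    x:   (N, K, 3) point coordinates
    F:   (N, K, S) shape descriptors
    eta: (N, K, M) integer indices; eta[i, j] lists the candidate
         successors of point j (frame i) among the points of frame i + 1
    Indices are zero-based here (the text uses one-based indices).
    Only first-order terms are used; 2nd order weights are omitted.
    """
    l1, l2, l3, l4 = lambdas
    N, K, _ = x.shape
    E = np.full((N, K), np.inf)        # best accumulated cost per node
    P = np.zeros((N, K), dtype=int)    # back-pointers to the previous frame
    E[0, J] = 0.0
    for i in range(N - 1):
        for j in range(K):
            if not np.isfinite(E[i, j]):
                continue               # node not reachable from the start
            for l in eta[i, j]:
                cost = (l1 * np.linalg.norm(x[i + 1, l] - x[i, j])
                        + l2 * np.linalg.norm(x[i + 1, l] - x[0, J])
                        + l3 * np.linalg.norm(F[i + 1, l] - F[i, j])
                        + l4 * np.linalg.norm(F[i + 1, l] - F[0, J]))
                if E[i, j] + cost < E[i + 1, l]:
                    E[i + 1, l] = E[i, j] + cost
                    P[i + 1, l] = j
    # Backtrack from the cheapest node in the last frame.
    A = np.zeros(N, dtype=int)
    A[-1] = int(np.argmin(E[-1]))
    for i in range(N - 2, -1, -1):
        A[i] = P[i + 1, A[i + 1]]
    return A, float(E[-1, A[-1]])

# Toy data (hypothetical): 4 frames of 5 static points on a line.
N, K = 4, 5
x_demo = np.zeros((N, K, 3))
x_demo[:, :, 0] = np.arange(K)
F_demo = np.zeros((N, K, 1))
idx = np.arange(K)
eta_one = np.stack([np.clip(idx - 1, 0, K - 1), idx,
                    np.clip(idx + 1, 0, K - 1)], axis=1)
eta_demo = np.tile(eta_one, (N, 1, 1))
traj, cost = track_trajectory(x_demo, F_demo, eta_demo, J=2)
```

Because the toy points do not move, the cheapest trajectory stays on the starting point in every frame. Unreachable nodes keep an infinite cost and are skipped, which is what makes the frame-by-frame search causal.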

Fig. 1. Points, shape descriptors and trajectories. (a) Endocardial surfaces, derived points and the graphical grid structure with a trajectory. (b) Abstraction of points (circles) and shape descriptors (dotted curves) in a trajectory.

Let $\mathcal{A}$ represent the set of all possible trajectories originating from $x_{1,j}$. The optimal trajectory $\hat{A}_j$ is the member of $\mathcal{A}$ that minimizes the following energy:

$$\hat{A}_j = \operatorname*{argmin}_{A_j \in \mathcal{A}} \sum_{i=2}^{N} \lambda_1 \|x_{i,A_j(i)} - x_{i-1,A_j(i-1)}\| + \lambda_2 \|x_{i,A_j(i)} - x_{1,j}\| + \lambda_3 \|F_{i,A_j(i)} - F_{i-1,A_j(i-1)}\| + \lambda_4 \|F_{i,A_j(i)} - F_{1,j}\| + \text{2nd order weights} \qquad (1)$$

Graph based techniques have been used in particle tracking applications such as [7]. We set each point $x_{i,j}$ as a node in our graph. Directed edges exist between a point $x_{i,j}$ and its neighbors $\eta_{i,j}$ in frame $i + 1$. Each edge has an associated cost of traversal (defined by Eq. 1). The optimal trajectory $\hat{A}_j$ is the one that accrues the smallest cost in traversing from the starting point $x_{1,j}$ to the last frame. This can be solved using Dijkstra’s shortest path algorithm.

Algorithm. Because our search path is causal, we do not compute all edge weights in advance. We start at $x_{1,j}$ and proceed frame by frame, calculating edge costs between points and their neighbors and dynamically updating a cost matrix $E \in \mathbb{R}^{N \times K}$ and a correspondence matrix $P \in \mathbb{R}^{N \times K}$. The search for the trajectory $A_J$ stemming from a point $j = J$ in frame 1 is described in Algorithm 1.

2.2 Speckle Tracking

We use speckle information, which is consistent in a small temporal window (given sufficient frequency), from the raw radio frequency (RF) data of our echo acquisitions, and correlate it from frame to frame to provide mid-wall displacement values. A kernel of one speckle length, around a pixel in the complex signal, is correlated with neighboring kernels in the next frame [8]. The peak correlation value is used to determine the matching kernel in the next frame and to calculate the displacement and a confidence measure. The velocity in the echo beam direction is further refined using zero-crossings of the phase of the complex correlation function.

Algorithm 1. DST Algorithm
 1: Inputs: $x$, $F$ and $\eta$
 2: Initialize: $E_{i,j} = \infty$ and $P_{i,j} = 0$ $\forall (i, j) \in \mathbb{N}_{N \times K}$; $E_{1,J} = 0$
 3: for $i = 1, \ldots, N - 1$ do
 4:   for $j = 1, \ldots, K$ do
 5:     for $l \in \eta_{i,j}$ do
 6:       $e_{temp} \leftarrow E_{i,j} + \lambda_1 \|x_{i+1,l} - x_{i,j}\| + \lambda_2 \|x_{i+1,l} - x_{1,J}\| + \lambda_3 \|F_{i+1,l} - F_{i,j}\| + \lambda_4 \|F_{i+1,l} - F_{1,J}\|$ + 2nd order weights
 7:       if $e_{temp} < E_{i+1,l}$ then
 8:         $E_{i+1,l} \leftarrow e_{temp}$
 9:         $P_{i+1,l} \leftarrow j$
10: $A_J(N) = \operatorname{argmin}_k E_{N,k}$
11: for $i = N - 1, \ldots, 1$ do
12:   $A_J(i) = P_{i+1,A_J(i+1)}$

2.3 Integrated Dense Displacement Field

We use an RBF based representation to solve for a dense displacement field $U$ that adheres to the dynamic shape ($U^{sh}$) and speckle tracking ($U^{sp}$) results, and regularize the field in the same manner as [4]:

$$\hat{w} = \operatorname*{argmin}_{w} f_{adh}(Hw, U^{sh}, U^{sp}) + \lambda_1 \|w\|_1 + \lambda_2 f_{biom}(U),$$
$$f_{biom}(U) = \|\nabla \cdot U\|_2^2 + \alpha \|\nabla U\|_2^2. \qquad (2)$$

Here, $U$ is parametrized by $Hw$, where $H$ represents the RBF matrix and $w$ represents the weights on the bases. $f_{adh}$ is the squared loss function and $f_{biom}$ is the penalty on divergences and derivatives, which along with the $\ell_1$ norm penalty results in smooth and biomechanically consistent displacement fields. This is a convex optimization problem and can be solved efficiently. The λ’s here and in Eq. 1 are chosen heuristically and scaled based on the number of frames in the images.
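To make the structure of Eq. 2 concrete, the sketch below solves a one-dimensional analogue with proximal gradient descent (ISTA). Everything specific here is an assumption made for illustration: the text does not name a solver, the Gaussian RBFs, the 1D grid, the parameter values, and the names `fit_rbf_field` and `soft_threshold` are hypothetical, and a finite-difference penalty stands in for $f_{biom}$ (in 1D the divergence and the gradient coincide, so $\alpha$ is folded into `lam2`).

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fit_rbf_field(H, obs_idx, u_obs, D, lam1=1e-3, lam2=1e-2, n_iter=500):
    """ISTA for 0.5*||H[obs] w - u_obs||^2 + lam1*||w||_1 + lam2*||D H w||^2."""
    H_obs = H[obs_idx]
    B = D @ H
    Q = H_obs.T @ H_obs + 2.0 * lam2 * (B.T @ B)   # Hessian of the smooth part
    step = 1.0 / np.linalg.eigvalsh(Q)[-1]         # 1 / Lipschitz constant

    def objective(w):
        r = H_obs @ w - u_obs
        return 0.5 * r @ r + lam1 * np.abs(w).sum() + lam2 * np.sum((B @ w) ** 2)

    w = np.zeros(H.shape[1])
    history = [objective(w)]
    for _ in range(n_iter):
        grad = H_obs.T @ (H_obs @ w - u_obs) + 2.0 * lam2 * (B.T @ (B @ w))
        w = soft_threshold(w - step * grad, step * lam1)
        history.append(objective(w))
    return w, np.array(history)

# Toy problem (hypothetical): sparse samples of a 1D displacement profile.
grid = np.linspace(0.0, 1.0, 101)
centers = np.linspace(0.0, 1.0, 15)
H = np.exp(-((grid[:, None] - centers[None, :]) ** 2) / (2 * 0.1 ** 2))
D = np.diff(np.eye(grid.size), axis=0)             # forward differences
obs_idx = np.arange(0, 101, 10)
u_obs = np.sin(2 * np.pi * grid[obs_idx])
w_hat, history = fit_rbf_field(H, obs_idx, u_obs, D)
U_dense = H @ w_hat                                # dense field on the grid
```

The adherence and derivative terms are both quadratic in $w$, so only the $\ell_1$ term needs the proximal step. With a step size of one over the largest Hessian eigenvalue the objective does not increase between iterations.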

3 Experiment and Results

4DE Data (Acute Canine Studies). We acquired 4DE images from 7 acute canine studies (open chested, transducer suspended in a water bath) in the following conditions: baseline (BL), severe occlusion of the left anterior descending (LAD) coronary artery (SO), and severe LAD occlusion with low dose dobutamine stress (SODOB, dosage: 5 µg/kg/min). All BL images were used, while 2 SO and SODOB images were not used due to lack of good crystal data. A Philips iE33 ultrasound system with the X7-2 probe and a hardware attachment that provided RF data was used for acquisition. All experiments were conducted in compliance with the Institutional Animal Care and Use Committee policies.

Sonomicrometry Data. We utilized an array of sonomicrometry crystals (Sonometrics Corp) to validate strains calculated via echo. 16 crystals were implanted in the anterior LV wall. They were positioned with respect to the LAD occlusion and the perfusion characteristics within the crystal area, which define three regions: ischemic (ISC), border (BOR) and remote (REM) (see Fig. 2). Three crystals were implanted in the apical (1) and basal (2) regions (similar to [9]). We adapted the 2D sonomicrometry based strain calculation method outlined in [10] to our 3D case. Sonomicrometric strains were calculated for each cube and used for the validation of echo based strains.

Fig. 2. Crystals and their relative position in the LV. (a) 3 cuboidal lattices. (b) Crystals aligned to the LV.

Agreement with Crystal Data. In Fig. 3a, we show, for one baseline image, strains calculated using our method (echo) and using sonomicrometry (crys). The strain values correlate well and drift error is minimal. In Fig. 3b, we present bar graphs that quantify the final frame drift as the distance between the actual crystal position and the tracking result. We compare the results from this method (DST) against those of GRPM (described in [4,11]), which is a frame-to-frame tracking method, in BL and SO conditions for 5 canine datasets. The last frame errors for crystal position were lower for DST, and the difference was statistically significant for both BL and SO conditions (p < .01).

Strain Correlation. Pearson correlations of echo based strain curves (calculated using the shape based tracking only (SHP) and the shape and speckle tracking combined (COMB)) with corresponding crystal based strain curves, across the cubic array regions (ISC, BOR, REM) for all conditions, are summarized in Table 1. We see slightly improved correlations from the SHP to the COMB method. Correlation values were generally lower for the ischemic region and for longitudinal strains with both methods. Since we only had a few data points to carry out statistical analysis in this format, we also calculated overall correlations (with strain values included for all time points and conditions together, n > 500) and computed statistical significance using Fisher’s transformation. The change in radial strains (SHP r = .72 to COMB r = .82) was statistically significant (p < .01), while the changes in circumferential (SHP r = .73 to COMB r = .75) and longitudinal (SHP r = .44 to COMB r = .41) strains were not.

Table 1. Mean correlation values across regions for SHP and COMB methods.

        Radial            Circ              Long
        ISC   BOR   REM   ISC   BOR   REM   ISC   BOR   REM
SHP     .75   .85   .82   .77   .86   .87   .30   .66   .80
COMB    .72   .85   .86   .77   .87   .92   .35   .71   .75
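The significance test on the overall correlations can be illustrated with Fisher's z-transformation. This is a hedged sketch and not the authors' analysis: it treats the two correlations as coming from independent samples (the SHP and COMB values were computed on the same data, so a dependent-correlation test may be more appropriate), it assumes n = 500 for both because the text only states n > 500, and the function name `fisher_z_test` is hypothetical.

```python
import math

def fisher_z_test(r1, r2, n1, n2):
    """Two-sided test for a difference between two Pearson correlations.

    Assumes independent samples; returns the z statistic and the p value.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)          # Fisher transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # standard error of z2 - z1
    z = (z2 - z1) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))           # two-sided normal p value
    return z, p

# Overall correlations reported in the text; n = 500 is an assumption.
z_rad, p_rad = fisher_z_test(0.72, 0.82, 500, 500)
z_circ, p_circ = fisher_z_test(0.73, 0.75, 500, 500)
z_long, p_long = fisher_z_test(0.44, 0.41, 500, 500)
```

Under these assumptions the radial change falls below the .01 threshold while the circumferential and longitudinal changes do not, which matches the pattern reported above.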

Fig. 3. Peak strain bar graphs (with mean and standard deviations) for radial, circumferential and longitudinal strains - shown across ISC, BOR and REM regions for echo and crystal based strains.


Physiological Response. Changes in the crystal and echo based (using the combined method) strain magnitudes across the physiological testing conditions (BL, SO and SODOB) are shown in Fig. 3. Both echo and crystal strain magnitudes generally decreased with severe occlusion and increased beyond baseline levels with low dose dobutamine stress. The fact that functional recovery was observed with dobutamine stress indicates that, at the dose given, ischemia was not enhanced. Rather, it appears that the vasodilatory and inotropic effects of dobutamine were able to overcome the effects of the occlusion. However, on average, the strain magnitude recovery was smaller in the ISC region than in the BOR and REM regions for both echo and crystals. For echo, the overall physiological response was more pronounced for radial strains.

4 Conclusion

The DST method has provided improved temporal regularization, and drift errors have therefore been reduced, especially in the diastolic phase. A combined dense field calculation method that integrates the DST results with RF speckle tracking results provided good strains, which we validated by comparison with sonomicrometry based strains. The correlation values were particularly good for radial and circumferential strains. We also studied how strains vary across the ISC, BOR and REM regions (defined by the cuboidal array of crystals in the anterior LV wall) during the BL, SO and SODOB conditions. Strain magnitudes (particularly radial) varied in keeping with the physiological conditions and were in good agreement with the crystal based strains.

We seek to improve our methods, as the longitudinal strains and the strains in the ischemic region correlated less well with the crystal data. Also, the DST algorithm occasionally resulted in higher error at end systole. Therefore, in the future, we will enforce spatial regularization directly by solving for neighboring trajectories together, where the edge weights will be influenced by the neighboring trajectories. We would also like to extend the method to work with point sets generated by feature generation processes other than segmentation.

Acknowledgment. Several members of Dr. Albert Sinusas’s lab, including Christi Hawley and James Bennett, were involved in the image acquisitions. Dr. Xiaojie Huang provided code for image segmentation. We would like to sincerely thank everyone for their contributions. This work was supported in part by the National Institutes of Health (NIH) grant number 5R01HL121226.

References

1. Craene, M., Piella, G., Camara, O., Duchateau, N., Silva, E., Doltra, A., Dhooge, J., Brugada, J., Sitges, M., Frangi, A.F.: Temporal diffeomorphic freeform deformation: application to motion and strain estimation from 3D echocardiography. Med. Image Anal. 16(2), 427–450 (2012)
2. Ledesma-Carbayo, M.J., Kybic, J., Desco, M., Santos, A., Sühling, M., Hunziker, P., Unser, M.: Spatio-temporal nonrigid registration for ultrasound cardiac motion estimation. IEEE Trans. Med. Imaging 24(9), 1113–1126 (2005)
3. Compas, C.B., Wong, E.Y., Huang, X., Sampath, S., Lin, B.A., Pal, P., Papademetris, X., Thiele, K., Dione, D.P., Stacy, M., et al.: Radial basis functions for combining shape and speckle tracking in 4D echocardiography. IEEE Trans. Med. Imaging 33(6), 1275–1289 (2014)
4. Parajuli, N., Compas, C.B., Lin, B.A., Sampath, S., O’Donnell, M., Sinusas, A.J., Duncan, J.S.: Sparsity and biomechanics inspired integration of shape and speckle tracking for cardiac deformation analysis. In: van Assen, H., Bovendeerd, P., Delhaas, T. (eds.) FIMH 2015. LNCS, vol. 9126, pp. 57–64. Springer, Heidelberg (2015)
5. Huang, X., Dione, D.P., Compas, C.B., Papademetris, X., Lin, B.A., Bregasi, A., Sinusas, A.J., Staib, L.H., Duncan, J.S.: Contour tracking in echocardiographic sequences via sparse representation and dictionary learning. Med. Image Anal. 18(2), 253–271 (2014)
6. Belongie, S., Malik, J., Puzicha, J.: Shape context: a new descriptor for shape matching and object recognition. In: Leen, T.K., Dietterich, T.G., Tresp, V. (eds.) Advances in Neural Information Processing Systems, pp. 831–837. MIT Press, Cambridge (2001)
7. Shafique, K., Shah, M.: A noniterative greedy algorithm for multiframe point correspondence. IEEE Trans. Pattern Anal. Mach. Intell. 27(1), 51–65 (2005)
8. Chen, X., Xie, H., Erkamp, R., Kim, K., Jia, C., Rubin, J., O’Donnell, M.: 3-D correlation-based speckle tracking. Ultrason. Imaging 27(1), 21–36 (2005)
9. Dione, D., Shi, P., Smith, W., DeMan, P., Soares, J., Duncan, J., Sinusas, A.: Three-dimensional regional left ventricular deformation from digital sonomicrometry. In: Proceedings of the 19th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 2, pp. 848–851. IEEE (1997)
10. Waldman, L.K., Fung, Y., Covell, J.W.: Transmural myocardial deformation in the canine left ventricle. Normal in vivo three-dimensional finite strains. Circ. Res. 57(1), 152–163 (1985)
11. Lin, N., Duncan, J.S.: Generalized robust point matching using an extended freeform deformation model: application to cardiac images. In: 2004 IEEE International Symposium on Biomedical Imaging: Nano to Macro, pp. 320–323. IEEE (2004)

The Endoscopogram: A 3D Model Reconstructed from Endoscopic Video Frames

Qingyu Zhao1(B), True Price1, Stephen Pizer1,2, Marc Niethammer1, Ron Alterovitz1, and Julian Rosenman1,2

1 Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, USA
[email protected]
2 Radiation Oncology, University of North Carolina at Chapel Hill, Chapel Hill, USA

Abstract. Endoscopy enables high resolution visualization of tissue texture and is a critical step in many clinical workflows, including diagnosis and radiation therapy treatment planning for cancers in the nasopharynx. However, an endoscopic video does not provide explicit 3D spatial information, making it difficult to use in tumor localization, and it is inefficient to review. We introduce a pipeline for automatically reconstructing a textured 3D surface model, which we call an endoscopogram, from multiple 2D endoscopic video frames. Our pipeline first reconstructs a partial 3D surface model for each individual input 2D frame. In the next step (which is the focus of this paper), we generate a single high-quality 3D surface model using a groupwise registration approach that fuses multiple, partially overlapping, incomplete, and deformed surface models together. We generate endoscopograms from synthetic, phantom, and patient data and show that our registration approach can account for tissue deformations and reconstruction inconsistency across endoscopic video frames.

1 Introduction

Modern radiation therapy treatment planning relies on imaging modalities like CT for tumor localization. For throat cancer, an additional kind of medical imaging, called endoscopy, is also taken at treatment planning time. Endoscopic videos provide direct optical visualization of the pharyngeal surface and provide information, such as a tumor’s texture and superficial (mucosal) spread, that is not available on CT due to CT’s relatively low contrast and resolution. However, the use of endoscopy for treatment planning is significantly limited by the fact that (1) the 2D frames from the endoscopic video do not explicitly provide 3D spatial information, such as the tumor’s 3D location; (2) reviewing the video is time-consuming; and (3) the optical views do not provide the full geometric conformation of the throat.

In this paper, we introduce a pipeline for reconstructing a 3D textured surface model of the throat, which we call an endoscopogram, from 2D video frames. The model provides (1) more complete 3D pharyngeal geometry; (2) efficient visualization; and (3) the opportunity to register endoscopy data with the CT, thereby enabling transfer of the tumor contours and texture into the CT space.

© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 439–447, 2016. DOI: 10.1007/978-3-319-46720-7_51

State-of-the-art monocular endoscopic reconstruction techniques have been applied in applications like colonoscopy inspection [1], laparoscopic surgery [2] and orthopedic surgeries [3]. However, most existing methods cannot simultaneously deal with the following three challenges: (1) non-Lambertian surfaces; (2) non-rigid deformation of tissues across frames; and (3) poorly known shape or motion priors. Our proposed pipeline deals with these problems using (1) a Shape-from-Motion-and-Shading (SfMS) method [4] incorporating a new reflectance model for generating single-frame-based partial reconstructions; and (2) a novel geometry fusion algorithm for non-rigid fusion of multiple partial reconstructions. Since our pipeline does not assume any prior knowledge on environments, motion and shapes, it can be readily generalized to other endoscopic applications in addition to our nasopharyngoscopy reconstruction problem.

In this paper we focus on the geometry fusion step mentioned above. The challenge here is that all individual reconstructions are only partially overlapping due to the constantly changing camera viewpoint, may have missing data (holes) due to camera occlusion, and may be slightly deformed since the tissue may have deformed between 2D frame acquisitions. Our main contribution in this paper is the design of a novel groupwise surface registration algorithm that can deal with these limitations. An additional contribution is an outlier geometry trimming algorithm based on robust regression. We generate endoscopograms and validate our registration algorithm with data from synthetic CT surface deformations and endoscopic video of a rigid phantom and real patients.

2 Endoscopogram Reconstruction Pipeline

The input to our system (Fig. 1) is a video sequence of hundreds of consecutive frames $\{F_i \mid i = 1 \ldots N\}$. The output is an endoscopogram, which is a textured 3D surface model derived from the input frames. We first generate for each frame $F_i$ a reconstruction $R_i$ by the SfMS method. We then fuse multiple single-frame reconstructions $\{R_i\}$ into a single geometry $R$. Finally, we texture $R$ by pulling color from the original frames $\{F_i\}$. We will focus on the geometry fusion step in Sect. 3 and briefly introduce the other techniques in the rest of this section.

Fig. 1. The endoscopogram reconstruction pipeline.


Shape from Motion and Shading (SfMS). Our novel reconstruction method [4] has been shown to be efficient in single-camera reconstruction of live endoscopy data. The method leverages sparse geometry information obtained by Structure-from-Motion (SfM), Shape-from-Shading (SfS) estimation, and a novel reflectance model to characterize non-Lambertian surfaces. In summary, it iteratively estimates the reflectance model parameters and a SfS reconstruction surface for each individual frame under sparse SfM constraints derived within a sliding time window. One drawback of this method is that large tissue deformation and lighting changes across frames can induce inconsistent individual SfS reconstructions. Nevertheless, our experiments show that this kind of error can be well compensated in the subsequent geometry fusion step. In the end, for each frame $F_i$, a reconstruction $R_i$ is produced as a triangle mesh and transformed into the world space using the camera position parameters estimated from SfM. Mesh faces that are nearly tangent to the camera viewing ray are removed because they correspond to occluded regions. As a result, the reconstructions $\{R_i\}$ have missing patches and different topology and only partially overlap each other.

Texture Mapping. The goal of texture mapping is to assign a color to each vertex $v^k$ (superscripts refer to vertex index) in the fused geometry $R$, which is estimated by the geometry fusion (Sect. 3) of all the registered individual frame surfaces $\{R'_i\}$. Our idea is to find a corresponding point of $v^k$ in a registered surface $R'_i$ and to trace back its color in the corresponding frame $F_i$. Since $v^k$ might have correspondences in multiple registered surfaces, we formulate this procedure as a labeling problem and optimize a Markov Random Field (MRF) energy function. In general, the objective function prefers pulling color from non-boundary nearby points in $\{R'_i\}$, while encouraging regional label consistency.

3 Geometry Fusion

This section presents the main methodological contributions of this paper: a novel groupwise surface registration algorithm based on N-body interaction, and an outlier-geometry trimming algorithm based on robust regression.

Related Work. Given the set of partial reconstructions $\{R_i\}$, our goal is to non-rigidly deform them into a consistent geometric configuration, thus compensating for tissue deformation and minimizing reconstruction inconsistency among different frames. Current groupwise surface registration methods often rely on having or iteratively estimating the mean geometry (template) [5]. However, in our situation, the topology change and partially overlapping data render initial template geometry estimation almost impossible. Large missing patches also pose serious challenges to the currents metric [6] for surface comparison. Template-free methods have been studied for images [7], but it has not been shown that such methods can be generalized to surfaces. The joint spectral graph framework [8] can match a group of surfaces without estimating the mean, but it does not explicitly compute deformation fields for geometry fusion.


Zhao et al. [9] proposed a pairwise surface registration algorithm, Thin Shell Demons, that can handle topology change and missing data. We have extended this algorithm into our groupwise situation.

Thin Shell Demons. Thin Shell Demons is a physics-motivated method that uses geometric virtual forces and a thin shell model to estimate surface deformation. The so-called forces $\{f\}$ between two surfaces $\{R_1, R_2\}$ are vectors connecting automatically selected corresponding vertex pairs, i.e. $\{f(v^k) = u^k - v^k \mid v^k \in R_1, u^k \in R_2\}$ (with some abuse of notation, we use $k$ here to index correspondences). The algorithm regards the surfaces as elastic thin shells and produces a non-parametric deformation vector field $\phi : R_1 \to R_2$ by iteratively minimizing the energy function $E(\phi) = \sum_{k=1}^{M} c(v^k)(\phi(v^k) - f(v^k))^2 + E_{shell}(\phi)$. The first part penalizes inconsistency between the deformation vector and the force vector applied on a point and uses a confidence score $c$ to weight the penalization. The second part minimizes the thin shell deformation energy, which is defined as the integral of local bending and membrane energy:

$$E_{shell}(\phi) = \int_{R} \lambda_1 W(\sigma_{mem}(p)) + \lambda_2 W(\sigma_{bend}(p)), \qquad (1)$$

$$W(\sigma) = \frac{Y}{1 - \tau^2}\big((1 - \tau)\,\mathrm{tr}(\sigma^2) + \tau\,\mathrm{tr}(\sigma)^2\big), \qquad (2)$$

where $Y$ and $\tau$ are the Young’s modulus and Poisson’s ratio of the shell. $\sigma_{mem}$ is the tangential Cauchy-Green strain tensor characterizing local stretching. The bending strain tensor $\sigma_{bend}$ characterizes local curvature change and is computed as the shape operator change.

3.1 N-Body Surface Registration
Our main observation is that the virtual force interaction is still valid among N partial shells even without the mean geometry. Thus, we propose a groupwise deformation scenario as an analog to the N-body problem: N surfaces are deformed under the influence of their mutual forces. This groupwise attraction can bypass the need of a target mean and still deform all surfaces into a single geometric configuration. The deformation of a single surface is independent and fully determined by the overall forces exerted on it. With the physical thin shell model, its deformation can be topology-preserving and not influenced by its partial-ness. With this notion in mind, we now have to define (1) mutual forces among N partial surfaces; (2) an evolution strategy to deform the N surfaces. Mutual Forces. In order to derive mutual forces, correspondences should be credibly computed among N partial surfaces. It has been shown that by using the geometric descriptor proposed in [10], a set of correspondences can be effectively computed between partial surfaces. Additionally, in our application, each surface Ri has an underlying texture image Fi . Thus, we also compute texture correspondences between two frames by using standard computer vision techniques. To improve matching accuracy, we compute inlier SIFT correspondences

The Endoscopogram

443

only between frame pairs that are at most T seconds apart. Finally, these SIFT matchings can be directly transformed to 3D vertex correspondences via the SfSM reconstruction procedure. In the end, any given vertex vik ∈ Ri will have Mik corresponding vertices in other surfaces {Rj |j = i}, given as vectors {f β (vik ) = uβ − vik , β = 1...Mik }, where uβ is the β th correspondence of vik in some other surface. These correspondences are associated with confidence scores {cβ (vik )} defined by

β

c

(vik )

 δ(uβ , vik ) = c¯

if uβ , vik  is a geometric correspondence, if uβ , vik  is a texture correspondence,

(3)

where δ is the geometric feature distance defined in [10]. Since we only consider inlier SIFT matchings using RANSAC, the confidence score for texture correspondences is a constant c¯. We then define the overall force exerted on vik as the Mik β k β k Mik β k weighted average: f¯(v k ) = c (v )f (v )/ c (v ). i

β=1

i

i

β=1

i
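The confidence-weighted average of the individual forces on one vertex can be written in a few lines. This is only an illustrative sketch: the function name `overall_force` and the toy numbers are hypothetical, and the confidences are assumed to be positive so that the normalization is well defined.

```python
import numpy as np

def overall_force(v, targets, confidences):
    """Confidence-weighted mean of the forces acting on one vertex.

    v:           (3,) vertex position
    targets:     (M, 3) corresponding vertices u^beta on other surfaces
    confidences: (M,) positive confidence scores c^beta
    """
    forces = np.asarray(targets, dtype=float) - np.asarray(v, dtype=float)
    c = np.asarray(confidences, dtype=float)
    return (c[:, None] * forces).sum(axis=0) / c.sum()

# Toy example (hypothetical): two correspondences with confidences 1 and 3.
v = np.zeros(3)
targets = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
conf = np.array([1.0, 3.0])
f_bar = overall_force(v, targets, conf)
```

In this example the second correspondence carries three quarters of the total confidence, so the resulting force points mostly along the second direction.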

Deformation Strategy. With mutual forces defined, we can solve for the group deformation fields {φi } by optimizing independently for each surface E(φi ) =

Mi 

c(vik )(φ(vik ) − f¯(vik ))2 + Eshell (φi ),

(4)

k=1

where Mi is the number of vertices that have forces applied. Then, a groupwise deformation scenario is to evolve the N surfaces by iteratively estimating the mutual forces {f} and solving for the deformations {φi}. However, a potential hazard of our algorithm is that without a common target template, the N surfaces could oscillate, especially in the early stage when the force magnitudes are large and tend to overshoot the deformation. To address this, we observe that the thin shell energy regularization weights λ1, λ2 control the deformation flexibility. Thus, to avoid oscillation, we design the strategy shown in Algorithm 1.

Algorithm 1. N-body Groupwise Surface Registration
1: Start with large regularization weights: λ1(0), λ2(0)
2: In iteration p, compute {f} from the current N surfaces {Ri(p)}
3: Optimize Eq. 4 independently for each surface to obtain {Ri(p + 1)}
4: λ1(p + 1) = σ · λ1(p), λ2(p + 1) = σ · λ2(p), with σ < 1
5: Go to step 2 until reaching the maximum number of iterations.
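The annealing idea of Algorithm 1 can be sketched on a toy problem. The sketch below is not the authors' implementation: vertices are scalars, correspondences are assumed to be index-wise, and the thin shell energy is replaced by a simple quadratic penalty lam * phi^2, so that the per-vertex minimiser of (phi - force)^2 + lam * phi^2 is force / (1 + lam). The names `groupwise_register`, `lam0` and `max_iters`, and the starting weight of 10, are hypothetical.

```python
def groupwise_register(surfaces, lam0=10.0, sigma=0.95, max_iters=200):
    """Toy N-body registration of N lists of scalar vertices.

    Assumptions (not from the paper): index-wise correspondences and a
    quadratic penalty lam * phi**2 standing in for the thin shell energy.
    """
    surfaces = [list(s) for s in surfaces]
    n = len(surfaces)
    lam = lam0                                   # step 1: large regularisation
    for _ in range(max_iters):                   # step 5: fixed iteration budget
        updated = []
        for i, surface in enumerate(surfaces):
            moved = []
            for k, v in enumerate(surface):
                # step 2: mean force from the corresponding vertices elsewhere
                force = sum(surfaces[j][k] - v for j in range(n) if j != i) / (n - 1)
                # step 3: closed-form minimiser of (phi - force)^2 + lam * phi^2
                moved.append(v + force / (1.0 + lam))
            updated.append(moved)
        surfaces = updated
        lam *= sigma                             # step 4: relax the regularisation
    return surfaces


# Three toy "surfaces" with two corresponding vertices each.
result = groupwise_register([[0.0, 1.0], [3.0, 4.0], [6.0, 1.0]])
print(result)
```

In this toy setting every surface moves toward the per-vertex mean of the group although no mean is ever computed as a target; the large initial weight keeps the early steps small.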

3.2 Outlier Geometry Trimming

The final step of geometry fusion is to estimate a single geometry R from the registered surfaces {R′i } [11]. However, this fusion step can be seriously harmed by the outlier geometry created by SfMS. Outlier geometries are local surface parts

Fig. 2. (a) 5 overlaying registered surfaces, one of which (pink) has a piece of outlier geometry (circled) that does not correspond to anything else. (b) Robust quadratic fitting (red grid) to normalized N(v^k). The outlier scores are indicated by the color. (c) Color-coded W on L. (d) Fused surface after outlier geometry removal.

that are wrongfully estimated by SfMS under bad lighting conditions (insufficient lighting, saturation, or specularity) and are drastically different from all other surfaces (Fig. 2a). The sub-surfaces do not correspond to any part in other surfaces and thereby are carried over by the deformation process to {R′i}. Our observation is that outlier geometry changes a local surface's topology (branching) and violates many differential geometry properties. We know that the local surface around a point in a smooth 2-manifold can be approximately represented by a quadratic Monge patch $h : U \to \mathbb{R}^3$, where U defines a 2D open set in the tangent plane, and h is a quadratic height function. Our idea is that if we robustly fit a local quadratic surface at a branching place, the surface points on the wrong branch of outlier geometry will be counted as outliers (Fig. 2b).

We define the 3D point cloud $L = \{v^1, \ldots, v^P\}$ of P points as the ensemble of all vertices in {R′i}, $N(v^k)$ as the set of points in the neighborhood of $v^k$, and W as the set of outlier scores of L. For a given $v^k$, we transform $N(v^k)$ by taking $v^k$ as the origin and the normal direction of $v^k$ as the z-axis. Then, we use Iteratively Reweighted Least Squares to fit a quadratic polynomial to the normalized $N(v^k)$ (Fig. 2b). The method produces outlier scores for each of the points in $N(v^k)$, which are then accumulated into W (Fig. 2c). We repeat this robust regression process for all $v^k$ in L. Finally, we remove the outlier branches by thresholding the accumulated scores W, and the remaining largest point cloud is used to produce the final single geometry R [11] (Fig. 2d).
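The robust fit for a single neighborhood can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the points are assumed to be already expressed in the local frame (origin at the vertex, z along its normal), the robust weight 1 / (1 + (r / s)^2) with a median-based scale s is an assumed choice because the text does not name one, and the outlier score is taken as one minus the final weight. The names `solve`, `basis` and `irls_quadratic` are hypothetical, and accumulation of scores into W across neighborhoods is left out.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def basis(x, y):
    """Monomials of the quadratic height function z = h(x, y)."""
    return [x * x, x * y, y * y, x, y, 1.0]


def irls_quadratic(points, iterations=10):
    """Robustly fit z = h(x, y); return (coefficients, outlier scores)."""
    weights = [1.0] * len(points)
    for _ in range(iterations):
        # Weighted normal equations (B^T W B) a = B^T W z.
        A = [[0.0] * 6 for _ in range(6)]
        rhs = [0.0] * 6
        for (x, y, z), w in zip(points, weights):
            phi = basis(x, y)
            for i in range(6):
                rhs[i] += w * phi[i] * z
                for j in range(6):
                    A[i][j] += w * phi[i] * phi[j]
        coeffs = solve(A, rhs)
        residuals = [abs(z - sum(a * p for a, p in zip(coeffs, basis(x, y))))
                     for x, y, z in points]
        # Assumed robust scale: median absolute residual with a small floor.
        scale = max(sorted(residuals)[len(residuals) // 2], 1e-6)
        weights = [1.0 / (1.0 + (r / scale) ** 2) for r in residuals]
    return coeffs, [1.0 - w for w in weights]


# Toy neighborhood: a 5 x 5 grid on z = x^2 + y^2 plus one point far off the patch.
points = [(i / 2, j / 2, (i / 2) ** 2 + (j / 2) ** 2)
          for i in range(-2, 3) for j in range(-2, 3)]
points.append((0.25, 0.25, 5.0))
coeffs, scores = irls_quadratic(points)
print(max(range(len(scores)), key=scores.__getitem__))
```

On this toy data the stray point receives the highest score while the fitted coefficients stay close to those of the underlying paraboloid.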

4 Results

We validate our groupwise registration algorithm by generating and evaluating endoscopograms from synthetic data, phantom data, and real patient endoscopic videos. We selected algorithm parameters by tuning on a test patient's data (separate from the datasets presented here). We set the thin shell elastic parameters Y = 2, τ = 0.05, the energy weighting parameters λ1 = λ2 = 1, σ = 0.95, the frame interval T = 0.5 s, and the texture confidence score c̄ = 1.

Synthetic Data. We produced synthetic deformations to 6 patients' head-and-neck CT surfaces. Each surface has 3500 vertices and a 2–3 cm cross-sectional


Fig. 3. Left to right: error plot of synthetic data for 6 patients; a phantom endoscopic video frame; the fused geometry with color-coded deviation (in millimeters) from the ground truth CT.

diameter, covering from the pharynx down to the vocal cords. We created deformations typically seen in real data, such as the stretching of the pharyngeal wall and the bending of the epiglottis. We generated for each patient 20 partial surfaces by taking depth maps from different camera positions in the CT space. Only geometric correspondences were used in this test. We measured the registration error as the average Euclidean distance of all pairs of corresponding vertices after registration (Fig. 3). Our method significantly reduced error and performed better than a spectral-graph-based method [10], which is another potential framework for matching partial surfaces without estimating the mean.

Phantom Data. To test our method on real-world data in a controlled environment, we 3D-printed a static phantom model (Fig. 3) from one patient's CT data and then collected endoscopic video and high-resolution CT for the model. We produced SfMS reconstructions for 600 frames in the video, among which 20 reconstructions were uniformly selected for geometry fusion (using more surfaces for geometry fusion does not further increase accuracy and is computationally slower). The SfMS results were downsampled to ∼2500 vertices and rigidly aligned to the CT space. Since the phantom is rigid, the registration plays the role of unifying inconsistent SfMS estimation. No outlier geometry trimming was performed in this test. We define a vertex's deviation as its distance to the nearest point in the CT surface. The average deviation of all vertices is 1.24 mm for the raw reconstructions and 0.94 mm for the fused geometry, which shows that the registration can help filter out inaccurate SfMS geometry estimation. Figure 3 shows that the fused geometry resembles the ground truth CT surface except in the farther part, where less data was available in the video.

Patient Data. We produced endoscopograms for 8 video sequences (300 frames per sequence) extracted from 4 patient endoscopies.
Outlier geometry trimming was used since lighting conditions were often poor. We computed the overlap distance (OD) defined in [12], which measures the average surface deviation between all pairs of overlapping regions. The average OD of the 8 cases is 1.6 ± 0.13 mm before registration, 0.58 ± 0.05 mm after registration, and 0.24 ± 0.09 mm after outlier geometry trimming. Figure 4 shows one of the cases.
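The vertex deviation used in the phantom test can be approximated in a few lines. The sketch below is a simplification under a stated assumption: the nearest point on the CT surface is approximated by the nearest CT vertex, which overestimates the deviation on coarse meshes. The function name `mean_deviation` and the toy coordinates are hypothetical.

```python
import math


def mean_deviation(vertices, reference_points):
    """Average distance from each vertex to its nearest reference point."""
    return sum(
        min(math.dist(v, p) for p in reference_points) for v in vertices
    ) / len(vertices)


# Toy example (hypothetical coordinates, in millimetres).
fused = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ct = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
print(mean_deviation(fused, ct))
```

This brute-force version is quadratic in the number of points; a spatial index would be the usual choice for meshes with thousands of vertices.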


Fig. 4. OD plot on the point cloud of 20 surfaces. Left to right: before registration, after registration, after outlier geometry trimming, the final endoscopogram.

5 Conclusion

We have described a pipeline for producing an endoscopogram from a video sequence. We proposed a novel groupwise surface registration algorithm and an outlier-geometry trimming algorithm. We have demonstrated via synthetic and phantom tests that the N-body scenario is robust for registering partially overlapping surfaces with missing data. Finally, we produced endoscopograms for real patient endoscopic videos. A current limitation is that the video sequence is at most 3–4 s long for robust SfM estimation. Future work involves fusing multiple endoscopograms from different video sequences.

Acknowledgements. This work was supported by NIH grant R01 CA158925.

References

1. Hong, D., Tavanapong, W., Wong, J., Oh, J., de Groen, P.C.: 3D reconstruction of virtual colon structures from colonoscopy images. Comput. Med. Imaging Graph. 38(1), 22–23 (2014)
2. Maier-Hein, L., Mountney, P., Bartoli, A., Elhawary, H., Elson, D., Groch, A., Kolb, A., Rodrigues, M., Sorger, J., Speidel, S., Stoyanov, D.: Optical techniques for 3D surface reconstruction in computer-assisted laparoscopic surgery. Med. Image Anal. 17(8), 974–996 (2013)
3. Wu, C., Narasimhan, S.G., Jaramaz, B.: A multi-image shape-from-shading framework for near-lighting perspective endoscopes. Int. J. Comput. Vis. 86(2), 211–228 (2010)
4. Price, T., Zhao, Q., Rosenman, J., Pizer, S., Frahm, J.M.: Shape from motion and shading in uncontrolled environments. Under submission, To appear. http://midag.cs.unc.edu/
5. Durrleman, S., Prastawa, M., Korenberg, J.R., Joshi, S., Trouvé, A., Gerig, G.: Topology preserving atlas construction from shape data without correspondence using sparse parameters. In: Ayache, N., Delingette, H., Golland, P., Mori, K. (eds.) MICCAI 2012, Part III. LNCS, vol. 7512, pp. 223–230. Springer, Heidelberg (2012)
6. Durrleman, S., Pennec, X., Trouvé, A., Ayache, N.: Statistical models of sets of curves and surfaces based on currents. Med. Image Anal. 13(5), 793–808 (2009)
7. Balci, S.K., Golland, P., Shenton, M., Wells, W.M.: Free-form B-spline deformation model for groupwise registration. In: MICCAI, pp. 23–30 (2007)
8. Arslan, S., Parisot, S., Rueckert, D.: Joint spectral decomposition for the parcellation of the human cerebral cortex using resting-state fMRI. In: Ourselin, S., Alexander, D.C., Westin, C.-F., Cardoso, M.J. (eds.) IPMI 2015. LNCS, vol. 9123, pp. 85–97. Springer, Heidelberg (2015)
9. Zhao, Q., Price, J.T., Pizer, S., Niethammer, M., Alterovitz, R., Rosenman, J.: Surface registration in the presence of topology changes and missing patches. In: Medical Image Understanding and Analysis, pp. 8–13 (2015)
10. Zhao, Q., Pizer, S., Niethammer, M., Rosenman, J.: Geometric-feature-based spectral graph matching in pharyngeal surface registration. In: Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI 2014, Part I. LNCS, vol. 8673, pp. 259–266. Springer, Heidelberg (2014)
11. Curless, B., Levoy, M.: A volumetric method for building complex models from range images. In: SIGGRAPH, pp. 303–312 (1996)
12. Huber, D.F., Hebert, M.: Fully automatic registration of multiple 3D data sets. Image Vis. Comput. 21(7), 637–650 (2003)

Robust Image Descriptors for Real-Time Inter-Examination Retargeting in Gastrointestinal Endoscopy

Menglong Ye1(✉), Edward Johns2, Benjamin Walter3, Alexander Meining3, and Guang-Zhong Yang1

1 The Hamlyn Centre for Robotic Surgery, Imperial College London, London, UK
[email protected]
2 Dyson Robotics Laboratory, Imperial College London, London, UK
3 Centre of Internal Medicine, Ulm University, Ulm, Germany

Abstract. For early diagnosis of malignancies in the gastrointestinal tract, surveillance endoscopy is increasingly used to monitor abnormal tissue changes in serial examinations of the same patient. Despite successes with optical biopsy for in vivo and in situ tissue characterisation, biopsy retargeting for serial examinations is challenging because tissue may change in appearance between examinations. In this paper, we propose an inter-examination retargeting framework for optical biopsy, based on an image descriptor designed for matching between endoscopic scenes over significant time intervals. Each scene is described by a hierarchy of regional intensity comparisons at various scales, offering tolerance to long-term change in tissue appearance whilst remaining discriminative. Binary coding is then used to compress the descriptor via a novel random forests approach, providing fast comparisons in Hamming space and real-time retargeting. Extensive validation conducted on 13 in vivo gastrointestinal videos, collected from six patients, shows that our approach outperforms state-of-the-art methods.

1 Introduction

In gastrointestinal (GI) endoscopy, serial surveillance examinations are increasingly used to monitor recurrence of abnormalities, and detect malignancies in the GI tract in time for curative therapy. In addition to direct visualisation of the mucosa, serial endoscopic examinations involve the procurement of histological samples from suspicious regions, for diagnosis and assessment of pathologies. Recent advances in imaging modalities such as confocal laser endomicroscopy and narrow band imaging (NBI) allow for in vivo and in situ tissue characterisation with optical biopsy. Despite the advantages of optical biopsy, the required retargeting of biopsied locations for tissue monitoring, during intra- or inter-examination of the same patient, is challenging. For intra-examination, retargeting techniques using local image features have been proposed, which include feature matching [1], geometric transformations [2],

© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 448–456, 2016. DOI: 10.1007/978-3-319-46720-7_52


Fig. 1. A framework overview. Grey arrows represent the training phase using diagnosis video while black arrows represent the querying phase in the surveillance examination.

tracking [3,4], and mapping [5]. However, when applied over successive examinations, these often fail due to the long-term variation in appearance of the tissue surface, which causes difficulty in detecting the same local features. For inter-examination, endoscopic video manifolds (EVM) [6] were proposed, with retargeting achieved by projecting query images into manifold space using locality preserving projections. In [7], an external positioning sensor was used for retargeting, but this requires manual trajectory registration, which interferes with the clinical workflow and increases the complexity and duration of the procedure.

In this work, we propose an inter-examination retargeting framework (see Fig. 1) for optical biopsy. This enables recognition of biopsied locations in the surveillance (second) examination, based on targets defined in the diagnosis (first) examination, whilst not interfering with the clinical workflow. Rather than relying on feature detection, a global image descriptor is designed based on regional image comparisons computed at multiple scales. At the higher scale, this offers robustness to small variations in tissue appearance across examinations, whilst at the lower scale, this offers discrimination in matching those tissue regions which have not changed. Inspired by [8], efficient descriptor matching is achieved by compression into binary codes, with a novel mapping function based on random forests, allowing for fast encoding of a query image and hence real-time retargeting. Validation was performed on 13 in vivo GI videos, obtained from successive endoscopies of the same patient, with 6 patients in total. Extensive comparisons to state-of-the-art methods have been conducted to demonstrate the practical clinical value of our approach.

2 Methods

2.1 A Global Image Descriptor for Endoscopic Scenes

Visual scene recognition is often addressed using keypoint-based methods such as SIFT [9], typically made scalable with Bag-of-Words (BOW) [10]. However, these approaches rely on consistent detection of the same keypoint on different observations of the same scene, which is often not possible when applied to endoscopic scenes undergoing long-term appearance changes of the tissue surface.


Fig. 2. (a) Obtaining an integer from one location; (b) creating the global image descriptor from all locations using spatial pyramid pooling.

In recent years, the use of local binary patterns (LBP) [11] has proved popular for recognition due to its fast computational speed, and robustness to image noise and illumination variation. Here, pairs of pixels within an image patch are compared in intensity to create a sequence of binary numbers. We propose a novel, symmetric version of LBP which performs 4 diagonal comparisons within a patch to yield a 4-bit string for each patch, representing an integer from 0 to 15. This comparison mask acts as a sliding window over the image, and a 16-bin histogram is created from the full set of integers. To offer tolerance to camera translation, we extend LBP by comparing local regions rather than individual pixels, with each region the average of its underlying pixels, as shown in Fig. 2(a).

To encode global geometry such that retargeting ensures similarity at multiple scales, we adopt the spatial pyramid pooling method [12] which divides an image into a set of coarse-to-fine levels. As shown in Fig. 2(b), we perform pooling with three levels, where the second and third levels are divided into 2 × 2 and 4 × 4 partitions, respectively, with each partition assigned its own histogram based on the patches it contains. For the second and third levels, further overlapped partitions of 1 × 1 and 3 × 3 are created to allow for deformation and scale variance. For patches of 3 × 3 regions, we use patches of 24 × 24, 12 × 12 and 6 × 6 pixels for the first, second and third levels, respectively. The histograms for all partitions over all levels are then concatenated to create a 496-d descriptor.
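The per-patch code and the descriptor length can be illustrated with a short sketch. The exact pairing of regions is not spelled out in the text, so the four centre-symmetric pairs of a 3 × 3 patch used below are an assumption, as are the names `region_means`, `PAIRS`, `patch_code` and `descriptor_length`. The length check simply counts 16 bins for each of the 1 + (4 + 1) + (16 + 9) = 31 partitions.

```python
def region_means(patch, cells=3):
    """Average a square pixel patch (list of rows) into cells x cells regions."""
    size = len(patch) // cells
    means = []
    for r in range(cells):
        row = []
        for c in range(cells):
            block = [patch[y][x]
                     for y in range(r * size, (r + 1) * size)
                     for x in range(c * size, (c + 1) * size)]
            row.append(sum(block) / len(block))
        means.append(row)
    return means


# Assumed pairing: the four centre-symmetric region pairs of a 3 x 3 patch.
PAIRS = [((0, 0), (2, 2)), ((0, 1), (2, 1)), ((0, 2), (2, 0)), ((1, 0), (1, 2))]


def patch_code(means):
    """Pack four regional intensity comparisons into an integer in 0..15."""
    code = 0
    for bit, ((r1, c1), (r2, c2)) in enumerate(PAIRS):
        if means[r1][c1] > means[r2][c2]:
            code |= 1 << bit
    return code


def descriptor_length(bins=16):
    """Histogram bins times the number of pyramid partitions."""
    partitions = 1 + (2 * 2 + 1 * 1) + (4 * 4 + 3 * 3)
    return bins * partitions


# A 6 x 6 patch whose intensity increases from left to right.
patch = [[x for x in range(6)] for _ in range(6)]
print(patch_code(region_means(patch)), descriptor_length())
```

Comparing region averages rather than single pixels is what gives the code its tolerance to small camera translations; the histogram over all patch codes in a partition is then built by counting.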

2.2 Compressing the Descriptor into a Compact Binary Code

Recent advances in large-scale image retrieval propose compressing image descriptors into compact binary codes (known as Hashing [8,13–15]), to allow for efficient descriptor matching in Hamming space. To enable real-time retargeting, and hence application without affecting the existing clinical workflow, we similarly compress our descriptor via a novel random forests hash. Furthermore, we propose to learn the hash function with a loss function, which maps to a new space where images from the same scene have a smaller descriptor distance, compared with the original descriptor.

Let us consider a set of training image descriptors $\{x_i\}_{i=1}^n$ from the diagnosis sequence, each assigned to a scene label representing its topological location, where each scene is formed of a cluster of adjacent images. We now aim to infer a binary code of m bits for each descriptor, by encouraging the Hamming distance between the codes of two images to be small for images of the same scene, and large for images of different scenes, as in [8]. Let us now denote Y as an affinity matrix, where $y_{ij} = 1$ if images $x_i$ and $x_j$ have the same scene label, and $y_{ij} = 0$ if not. We now sequentially optimise each bit in the code, such that for the r-th bit optimisation, we have the objective function:

$$\min_{b_{(r)}} \sum_{i=1}^{n} \sum_{j=1}^{n} l_r(b_{r,i}, b_{r,j}; y_{ij}), \quad \text{s.t. } b_{(r)} \in \{0, 1\}^n. \tag{1}$$

Here, $b_{r,i}$ is the r-th bit of image $x_i$, $b_{(r)}$ is a vector of the r-th bits for all n images, and $l_r(\cdot)$ is the loss function for the assignment of bits $b_{r,i}$ and $b_{r,j}$ given the image affinity $y_{ij}$. As proved in [8], this objective can be optimised by formulating a quadratic hinge loss function as follows:

$$l_r(b_{r,i}, b_{r,j}; y_{ij}) = \begin{cases} \bigl(0 - D(b_i^r, b_j^r)\bigr)^2, & \text{if } y_{ij} = 1 \\ \bigl(\max(0.5m - D(b_i^r, b_j^r),\ 0)\bigr)^2, & \text{if } y_{ij} = 0 \end{cases} \tag{2}$$

Here, $D(b_i^r, b_j^r)$ denotes the Hamming distance between $b_i$ and $b_j$ for the first r bits. Note that during binary code inference, the optimisation of each bit uses the results of the optimisation of the previous bits, and hence this is a series of local optimisations due to the intractability of global optimisation.
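To make the quadratic hinge loss concrete, the following sketch evaluates it for a small set of candidate codes. It is only an evaluation of the loss, not the bit-by-bit optimisation of [8]: the Hamming distance is taken over whatever bits are passed in, and the names `hamming`, `pair_loss` and `total_loss` as well as the toy codes are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def pair_loss(code_i, code_j, same_scene, m):
    """Quadratic hinge loss for one image pair and a target code length m."""
    d = hamming(code_i, code_j)
    if same_scene:
        return float(d) ** 2               # pull codes of the same scene together
    return max(0.5 * m - d, 0.0) ** 2      # push other scenes at least m / 2 apart


def total_loss(codes, labels, m):
    """Sum of the pairwise losses over all ordered pairs (i, j)."""
    return sum(
        pair_loss(codes[i], codes[j], labels[i] == labels[j], m)
        for i in range(len(codes))
        for j in range(len(codes))
    )


# Toy example: two images of scene 0 and one image of scene 1, with 4-bit codes.
codes = [[0, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0]]
labels = [0, 0, 1]
print(total_loss(codes, labels, m=4))
```

Pairs from different scenes stop contributing once their codes differ in at least half of the bits, so the loss does not reward pushing them arbitrarily far apart.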

2.3 Learning Encoding Functions with Random Forests

With each training image descriptor assigned a compact binary code, we now propose a novel method for mapping $\{x_i\}_{i=1}^n$ to $\{b_i\}_{i=1}^n$, such that the binary code for a new query image may be computed. We denote this function $\Phi(x)$, and represent it as a set of independent hashing functions $\{\phi_i(x)\}_{i=1}^m$, one for each bit. To learn the hashing function $\phi_i$ of the i-th bit in b, we treat this as a binary classifier which is trained on input data $\{x_i\}_{i=1}^n$ with labels $b_{(i)}$. Rather than using boosted trees as in [8], we employ random forests [16], which are faster for training and less susceptible to overfitting. Our approach allows for fast hashing which enables encoding to be achieved without slowing down the clinical workflow.

We create a set of random forests, one for each hashing function $\{\phi_i(x)\}_{i=1}^m$. Each tree in one forest is independently trained with a random subset of $\{x_i\}_{i=1}^n$, and comparisons of random pairs of descriptor elements as the split functions. We grow each binary decision tree by maximising the information gain to optimally split the data X into left $X_L$ and right $X_R$ subsets at each node. This information gain I is defined as:

$$I = \pi(X) - \frac{1}{|X|} \sum_{k \in \{L,R\}} |X_k|\, \pi(X_k) \tag{3}$$

Table 1. Mean average precision for recognition, both for the descriptor and the entire framework. Note that the results of hashing-based methods are at 64-bit.

         Descriptor                     Framework
Methods  BOW    GIST   SPACT  Ours     EVM    AGH    ITQ    KSH    Fasthash  Ours
Pat.1    0.227  0.387  0.411  0.488    0.238  0.340  0.145  0.686  0.802     0.920
Pat.2    0.307  0.636  0.477  0.722    0.304  0.579  0.408  0.921  0.925     0.956
Pat.3    0.321  0.576  0.595  0.705    0.248  0.501  0.567  0.903  0.911     0.969
Pat.4    0.331  0.495  0.412  0.573    0.274  0.388  0.289  0.889  0.923     0.957
Pat.5    0.341  0.415  0.389  0.556    0.396  0.435  0.342  0.883  0.896     0.952
Pat.6    0.201  0.345  0.315  0.547    0.273  0.393  0.298  0.669  0.812     0.895

where $\pi(X)$ is the Shannon entropy: $\pi(X) = -\sum_{y \in \{0,1\}} p_y \log(p_y)$. Here, $p_y$ is the fraction of data in X assigned to label y. Tree growth terminates when the tree reaches a defined maximum depth, or I is below a certain threshold ($e^{-10}$). With T trained trees, each returning a value $\alpha_t(x)$ between 0 and 1, the hashing function for the i-th bit then averages the responses from all trees and rounds this accordingly to either 0 or 1:

$$\phi_i(x) = \begin{cases} 0 & \text{if } \frac{1}{T} \sum_{t=1}^{T} \alpha_t(x) < 0.5 \\ 1 & \text{otherwise} \end{cases} \tag{4}$$

Finally, to generate the m-bit binary code, the mapping function $\Phi(x)$ concatenates the output bits from all hashing functions $\{\phi_i(x)\}_{i=1}^m$ into a single binary string. Therefore, to achieve retargeting, the binary string assigned to a query image from the surveillance sequence is compared, via Hamming distance, to the binary strings of scenes captured in a previous diagnosis sequence.
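The split criterion of Eq. 3, the voting rule of Eq. 4 and the final Hamming lookup can be sketched together. This is an illustrative fragment rather than a trained forest: tree responses are passed in as plain numbers, the natural logarithm is assumed because the text does not state the base, and the names `entropy`, `information_gain`, `hash_bit` and `retarget` are hypothetical.

```python
import math


def entropy(labels):
    """Shannon entropy (natural log, an assumed base) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    total = 0.0
    for y in (0, 1):
        p = labels.count(y) / len(labels)
        if p > 0:
            total -= p * math.log(p)
    return total


def information_gain(labels, left, right):
    """Eq. 3: entropy of a node minus the size-weighted entropy of its children."""
    weighted = len(left) * entropy(left) + len(right) * entropy(right)
    return entropy(labels) - weighted / len(labels)


def hash_bit(tree_outputs):
    """Eq. 4: average the per-tree responses and threshold at 0.5."""
    return 0 if sum(tree_outputs) / len(tree_outputs) < 0.5 else 1


def retarget(query_code, scene_codes):
    """Index of the stored scene code closest to the query in Hamming distance."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    return min(range(len(scene_codes)),
               key=lambda i: hamming(query_code, scene_codes[i]))


# Toy values: a perfect split, three tree responses, and three stored scene codes.
gain = information_gain([0, 0, 1, 1], [0, 0], [1, 1])
bit = hash_bit([0.2, 0.9, 0.7])
best = retarget([1, 0, 1, 1], [[0, 0, 0, 0], [1, 0, 1, 0], [0, 1, 0, 0]])
print(gain, bit, best)
```

A split that separates the two labels perfectly recovers the full parent entropy as gain, and the lookup reduces to counting differing bits, which is why comparisons in Hamming space are cheap.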

3 Experiments and Results

For validation, in vivo experiments were performed on 13 GI videos (≈17,700 images) obtained from six patients. For each of patients 1–5, two videos were recorded in two separate endoscopies of the same examination, resulting in ten videos. For patient 6, three videos were collected in three serial examinations, with each consecutive examination 3–4 months apart. All videos were collected using standard Olympus endoscopes, with NBI-mode on for image enhancement. The videos were captured at 720 × 576 pixels, and the black borders in the images were cropped out.

We used leave-one-video-out cross validation, where one surveillance video (O1) and one diagnosis video (O2) are selected for each experiment, for a total of 16 experiments (two for each of patients 1–5, and six for patient 6). Intensity-based k-means clustering was used to divide O2 into clusters, with the number of clusters defined empirically and proportional to the video length (10–34 clusters). To assign ground truth labels to test images, videos O1 and O2 were observed


Fig. 3. (a) Means and standard deviations of recognition rates (precisions @ 1-NN) and (b) precision values @ 50-NN with different binary code lengths; (c-h) precision-recall curves of individual experiments using 64-bit codes.

side-by-side manually by an expert, moving simultaneously from start to end. For each experiment, we randomly selected 50 images from O1 (testing) as queries. Our framework has been implemented using Matlab and C++, and runs on an HP workstation (Intel X5650 CPU).

Recognition results for our original descriptor before binary encoding were compared to a range of standard image descriptors, including a BOW vector [10] based on SIFT features, a global descriptor GIST based on frequency response [17], and SPACT [11], a global descriptor based on pixel comparisons. We used the publicly-available code of GIST, and implemented a 10,000-d BOW descriptor and a 1,240-d SPACT descriptor. Descriptor similarity was computed using the L2 distance for all methods. Table 1 shows the mAP results, with our descriptor significantly outperforming all competitors. As expected, BOW offers poor results due to the inconsistency of local keypoint detection over long time intervals. We also outperform SPACT as the latter is based on pixel-level comparisons, while our regional comparisons are more robust to illumination variation and camera translation. Whilst GIST typically offers good tolerance to scene deformation, it lacks local texture encoding, whereas the multi-level nature of our novel descriptor ensures that similar descriptors suggest image similarity across a range of scales.

Our entire framework was compared to the EVM method [6] and hashing-based methods, including ITQ [15], AGH [13], KSH [14] and Fasthash [8]. For the competitors based on hashing, our descriptor was used as input. For our framework, the random forest consisted of 100 trees, each with a stopping criterion of a maximum tree depth of 10 or a minimum information gain of $e^{-10}$. Figure 3(a) shows the recognition rate if the best-matched image is correct (average precision at 1-nearest-neighbour (NN)). We compare across a range of binary string lengths, with our framework consistently outperforming others and with the highest mean recognition rate {0.87, 0.86, 0.82, 0.75}. We also show the precision values at 50-NN in Fig. 3(b). Precision-recall curves (at 64-bit length) for each patient's data are shown in Fig. 3(c-h), with mAP values in Table 1. As well as surpassing the original descriptor, our full framework outperforms all other hashing methods, with the highest mAP scores and graceful fall-offs in precision-recall. Our separation of encoding and hashing achieves strong discrimination through a powerful independent classifier compared to the single-stage approaches of [13–15] and the less flexible classifier of [8].

We also found that the performance of EVM is inferior to ours (Table 1), and significantly lower than that presented in [6]. This is because in their work, training and testing data were from the same video sequence. In our experiments, however, two different sequences were used for training and testing, yielding a more challenging task that fully evaluates the performance on inter-examination retargeting. The current average querying time using 64-bit strings (including descriptor extraction, binary encoding and Hamming distance calculation) is around 19 ms, which demonstrates its real-time capability, compared to 490 ms for querying with the original descriptor.

Finally, images for example retargeting attempts are provided for our framework in Fig. 4. Note that our descriptor currently does not explicitly address rotation invariance. However, from the experiments, we do observe that the framework allows for a moderate degree of rotation, due to the inclusion of global geometry in the descriptor. A simple way to achieve full rotation invariance would be to augment the training data with images from rotated versions of the diagnosis video.

Fig. 4. Example top-ranked images for the proposed framework on six patients. Yellow-border images are queries from a surveillance sequence; green- and red-border images are the correct and incorrect matches from a diagnosis sequence, respectively.

4 Conclusions

In this paper, we have proposed a retargeting framework for optical biopsy in serial endoscopic examinations. A novel global image descriptor with regional


comparisons over multiple scales deals with tissue appearance variation across examinations, whilst binary encoding with a novel random forest-based mapping function adds discrimination and speeds up recognition. The framework can be readily incorporated into the existing endoscopic workflow because it retargets in real time and requires no manual calibration. Validation on in vivo videos of serial endoscopies from six patients shows that both our descriptor and hashing scheme are consistently state-of-the-art.

References

1. Atasoy, S., Glocker, B., Giannarou, S., Mateus, D., Meining, A., Yang, G.-Z., Navab, N.: Probabilistic region matching in narrow-band endoscopy for targeted optical biopsy. In: Yang, G.-Z., Hawkes, D., Rueckert, D., Noble, A., Taylor, C. (eds.) MICCAI 2009, Part I. LNCS, vol. 5761, pp. 499–506. Springer, Heidelberg (2009)
2. Allain, B., Hu, M., Lovat, L.B., Cook, R.J., Vercauteren, T., Ourselin, S., Hawkes, D.J.: Re-localisation of a biopsy site in endoscopic images and characterisation of its uncertainty. Med. Image Anal. 16(2), 482–496 (2012)
3. Ye, M., Giannarou, S., Patel, N., Teare, J., Yang, G.-Z.: Pathological site retargeting under tissue deformation using geometrical association and tracking. In: Mori, K., Sakuma, I., Sato, Y., Barillot, C., Navab, N. (eds.) MICCAI 2013, Part II. LNCS, vol. 8150, pp. 67–74. Springer, Heidelberg (2013)
4. Ye, M., Johns, E., Giannarou, S., Yang, G.-Z.: Online scene association for endoscopic navigation. In: Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI 2014, Part II. LNCS, vol. 8674, pp. 316–323. Springer, Heidelberg (2014)
5. Mountney, P., Giannarou, S., Elson, D., Yang, G.-Z.: Optical biopsy mapping for minimally invasive cancer screening. In: Yang, G.-Z., Hawkes, D., Rueckert, D., Noble, A., Taylor, C. (eds.) MICCAI 2009, Part I. LNCS, vol. 5761, pp. 483–490. Springer, Heidelberg (2009)
6. Atasoy, S., Mateus, D., Meining, A., Yang, G.Z., Navab, N.: Endoscopic video manifolds for targeted optical biopsy. IEEE Trans. Med. Imag. 31(3), 637–653 (2012)
7. Vemuri, A.S., Nicolau, S.A., Ayache, N., Marescaux, J., Soler, L.: Inter-operative trajectory registration for endoluminal video synchronization: application to biopsy site re-localization. In: Mori, K., Sakuma, I., Sato, Y., Barillot, C., Navab, N. (eds.) MICCAI 2013, Part I. LNCS, vol. 8149, pp. 372–379. Springer, Heidelberg (2013)
8. Lin, G., Shen, C., van den Hengel, A.: Supervised hashing using graph cuts and boosted decision trees. IEEE Trans. Pattern Anal. Mach. Intell. 37(11), 2317–2331 (2015)
9. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
10. Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Object retrieval with large vocabularies and fast spatial matching. In: CVPR, pp. 1–8 (2007)
11. Wu, J., Rehg, J.: Centrist: a visual descriptor for scene categorization. IEEE Trans. Pattern Anal. Mach. Intell. 33(8), 1489–1501 (2011)
12. Lazebnik, S., Schmid, C., Ponce, J.: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: CVPR, pp. 2169–2178 (2006)
13. Liu, W., Wang, J., Kumar, S., Chang, S.F.: Hashing with graphs. In: ICML, pp. 1–8 (2011)
14. Liu, W., Wang, J., Ji, R., Jiang, Y.G., Chang, S.F.: Supervised hashing with kernels. In: CVPR, pp. 2074–2081 (2012)
15. Gong, Y., Lazebnik, S., Gordo, A., Perronnin, F.: Iterative quantization: a procrustean approach to learning binary codes for large-scale image retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 35(12), 2916–2929 (2013)
16. Criminisi, A., Shotton, J.: Decision Forests for Computer Vision and Medical Image Analysis. Springer, Heidelberg (2013)
17. Oliva, A., Torralba, A.: Modeling the shape of the scene: a holistic representation of the spatial envelope. Int. J. Comput. Vis. 42(3), 145–175 (2001)

Kalman Filter Based Data Fusion for Needle Deflection Estimation Using Optical-EM Sensor

Baichuan Jiang1,3(✉), Wenpeng Gao2,3, Daniel F. Kacher3, Thomas C. Lee3, and Jagadeesan Jayender3

1 Department of Mechanical Engineering, Tianjin University, Tianjin, China
[email protected]
2 School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
3 Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA

Abstract. In many clinical procedures involving needle insertion, such as cryoablation, accurate navigation of the needle to the desired target is of paramount importance to optimize the treatment and minimize the damage to the neighboring anatomy. However, the force interaction between the needle and tissue may lead to needle deflection, resulting in considerable error in the intraoperative tracking of the needle tip. In this paper, we propose a Kalman filter-based formulation that fuses data from two sensors, an optical sensor at the base and a magnetic resonance (MR) gradient-field driven electromagnetic (EM) sensor placed 10 cm from the needle tip, to estimate the needle deflection online. Tip estimates based on an angular springs model and a model-free EM-based estimate are used to form the measurement vector in the Kalman filter. Static tip bending experiments show that the fusion method can reduce the error of the tip estimation from 29.23 mm to 3.15 mm and from 39.96 mm to 6.90 mm at the MRI isocenter and 650 mm from the isocenter, respectively.

Keywords: Sensor fusion · Needle deflection · Kalman filter · Surgical navigation

1 Introduction

Minimally invasive therapies such as biopsy, brachytherapy, radiofrequency ablation and cryoablation involve the insertion of multiple needles into the patient [1–3]. Accurate placement of the needle tip can result in reliable acquisition of diagnostic samples [4], effective drug delivery [5] or target ablation [2]. When clinicians maneuver the needle to the target location, the needle is likely to bend due to tissue-needle or hand-needle interaction, resulting in suboptimal placement of the needle. Mala et al. [3] reported that in nearly 28 % of cases, cryoablation of liver metastases was inadequate due to improper placement of the needles, among other reasons. We propose to develop a real-time navigation system for better guidance while accounting for the needle bending caused by needle-tissue interactions.

Many methods have been proposed to estimate needle deflection. The most popular class of methods is model-based estimation [6–8]. Roesthuis et al. proposed the virtual springs model, which considers the needle as a cantilever beam supported by a series of springs, and used the Rayleigh-Ritz method to solve for the needle deflection [8]. Dorileo et al. merged needle-tissue properties, tip asymmetry and needle tip position updates from images to estimate the needle deflection as a function of insertion depth [7]. However, since model-based estimation is sensitive to model parameters and the needle-tissue interaction is stochastic in nature, needle deflection and insertion trajectory are not completely repeatable. The second type of estimation uses an optical fiber based sensor. Park et al. designed an MRI-compatible biopsy needle instrumented with optical fiber Bragg gratings to track needle deviation [4]. However, the design and functionality of certain needles, such as cryoablation and radiofrequency ablation needles, do not allow an optical fiber based sensor to be placed in the lumen of the needle. The third kind of estimation strategy was proposed in [9], where a Kalman filter was employed to combine a needle bending model with needle base and tip position measurements from two electromagnetic (EM) trackers to estimate the true tip position. This approach can effectively compensate for the quantification uncertainties of the needle model and is therefore more reliable. However, this method is not feasible in the MRI environment due to the use of MRI-unsafe sensors.

In this work, we present a new fusion method using an optical tracker at the needle's base and an MRI gradient field driven EM tracker attached to the shaft of the needle. By integrating the sensor data with the angular springs model presented in [10], the Kalman filter-based fusion model can significantly reduce the estimation error in the presence of needle bending.

© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 457–464, 2016. DOI: 10.1007/978-3-319-46720-7_53

2 Methodology

2.1 Sensor Fusion

Needle Configuration. In this study, we used a cone-tip IceRod® 1.5 mm MRI Cryoablation Needle (Galil Medical, Inc.), as shown in Fig. 1. A frame with four passive spheres (Northern Digital Inc. and a tracking system from Symbow Medical Inc.) is mounted on the base of the needle, and an MRI-safe EndoScout® EM sensor (Robin Medical, Inc.) is attached to the needle's shaft at a 10 cm offset from the tip, set by a depth stopper. Through pivot calibration, the optical tracking system provides the needle base position $P_{Opt}$ and the orientation of the straight needle $O_{Opt}$. The EM sensor provides its location $P_{EM}$ and its orientation with respect to the magnetic field of the MR scanner, $O_{EM}$.

Fig. 1. Cryoablation needle mounted with Optical and EM sensor and a depth stopper.

Kalman Filter Formulation. The state vector is set as $x_k = [P_{tip(k)}, \dot{P}_{tip(k)}]^T$. The insertion speed during the cryoablation procedure is slow enough to be considered constant. Therefore, the process model can be formulated in the form $x_k = A x_{k-1} + w_{k-1}$ as follows:

$$\begin{bmatrix} P_{tip(k)} \\ \dot{P}_{tip(k)} \end{bmatrix} = \underbrace{\begin{bmatrix} I_3 & T_s I_3 \\ 0_3 & I_3 \end{bmatrix}}_{\text{transition matrix } A} \begin{bmatrix} P_{tip(k-1)} \\ \dot{P}_{tip(k-1)} \end{bmatrix} + \begin{bmatrix} \frac{T_s^2}{2} I_3 \\ T_s I_3 \end{bmatrix} \ddot{P}_{tip(k)} \tag{1}$$

where $T_s$, $I_3$ and $0_3$ stand for the time step, the 3-order identity matrix and the 3-order null matrix, and $P_{tip(k)}$, $\dot{P}_{tip(k)}$ and $\ddot{P}_{tip(k)}$ represent the tip position, velocity and acceleration, respectively. The acceleration element $[\frac{T_s^2}{2} I_3,\; T_s I_3]^T \ddot{P}_{tip(k)}$ is taken as the process noise, denoted by $w_{k-1} \sim N(0, Q)$, where $Q$ is the process noise covariance matrix.

When considering the needle as straight, the tip position was estimated using three sets of data: $TIP_{Opt}$ (using $P_{Opt}$, $O_{Opt}$ and the needle length offset), $TIP_{EM}$ (using $P_{EM}$, $O_{EM}$ and the EM offset), and $TIP_{OptEM}$ (drawing a straight line through $P_{Opt}$ and $P_{EM}$, and using the needle length offset). When taking the needle bending into account, we can estimate the needle tip position using the angular springs model with either the combination of $P_{EM}$, $P_{Opt}$ and $O_{Opt}$ ($TIP_{EMOptOpt}$) or the combination of $P_{Opt}$, $P_{EM}$ and $O_{EM}$ ($TIP_{OptEMEM}$), which are formulated in (2) and (3).

$$P_{EMOptOpt} = g_1(P_{EM}, P_{Opt}, O_{Opt}) \tag{2}$$

$$P_{OptEMEM} = g_2(P_{Opt}, P_{EM}, O_{EM}) \tag{3}$$
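As a rough illustration of the process model in (1), the sketch below builds the transition matrix $A$ and the acceleration input matrix for a chosen time step and propagates a state by one step. The function names (`transition_matrix`, `noise_input_matrix`, `predict_state`) and all numerical values are illustrative assumptions and are not taken from the original system.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def transition_matrix(ts: float) -> Matrix:
    """Build the 6x6 matrix A = [[I3, Ts*I3], [0_3, I3]] of Eq. (1)."""
    a = [[0.0] * 6 for _ in range(6)]
    for i in range(3):
        a[i][i] = 1.0          # position carries over
        a[i][i + 3] = ts       # position gains Ts * velocity
        a[i + 3][i + 3] = 1.0  # velocity is modelled as constant
    return a


def noise_input_matrix(ts: float) -> Matrix:
    """Build the 6x3 matrix [Ts^2/2 * I3; Ts * I3] that maps acceleration into the state."""
    g = [[0.0] * 3 for _ in range(6)]
    for i in range(3):
        g[i][i] = 0.5 * ts * ts
        g[i + 3][i] = ts
    return g


def mat_vec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def predict_state(x: Vector, ts: float) -> Vector:
    """Noise-free prediction x_k = A x_{k-1}; the acceleration term is treated as noise."""
    return mat_vec(transition_matrix(ts), x)


# Hypothetical state: tip at the origin moving at (1, 2, 3) mm/s, with Ts = 0.5 s.
x_prev = [0.0, 0.0, 0.0, 1.0, 2.0, 3.0]
x_pred = predict_state(x_prev, 0.5)
```

In this sketch the state is ordered as three position components followed by three velocity components, matching the block structure of $A$.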

In our measurement equation $z_k = H x_k + v_k$, the choice of $z_k$ is of crucial importance for stability and accuracy; later experiments suggest $z_k = [g_1(P_{EM}, P_{Opt}, O_{Opt}),\; g_2(P_{Opt}, P_{EM}, O_{EM}),\; TIP_{EM}]^T$. Accordingly, $H$ is defined as in (4):

$$H = \begin{bmatrix} I_3 & 0_3 \\ I_3 & 0_3 \\ I_3 & 0_3 \end{bmatrix} \tag{4}$$
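To illustrate how stacking three tip estimates in $z_k$ fuses them, the sketch below performs a Kalman measurement update for a single coordinate axis, processing the three position measurements one at a time. This sequential form matches the batch update only when the measurement noise covariance is diagonal, which is an assumption made here for simplicity. The per-axis decoupling, the function name `fuse_axis` and all numbers are likewise hypothetical.

```python
from typing import List, Sequence, Tuple


def fuse_axis(
    p: float,
    v: float,
    cov: List[List[float]],
    z: Sequence[float],
    r: Sequence[float],
) -> Tuple[float, float, List[List[float]]]:
    """Update one axis of the state [position, velocity] with several position measurements.

    Each measurement observes the position only (row [1, 0] of H) with variance r[i].
    """
    for zi, ri in zip(z, r):
        s = cov[0][0] + ri            # innovation variance H P H^T + r
        k0 = cov[0][0] / s            # Kalman gain for the position
        k1 = cov[1][0] / s            # Kalman gain for the velocity
        innovation = zi - p           # z - H x
        p += k0 * innovation
        v += k1 * innovation
        # Covariance update P <- (I - K H) P, written out for the 2x2 case.
        c00, c01 = cov[0]
        c10, c11 = cov[1]
        cov = [
            [(1.0 - k0) * c00, (1.0 - k0) * c01],
            [c10 - k1 * c00, c11 - k1 * c01],
        ]
    return p, v, cov


# Hypothetical example: a vague prior and three equally trusted tip estimates
# (standing in for the g1, g2 and TIP_EM entries of z_k along one axis).
p_fused, v_fused, cov_post = fuse_axis(
    0.0, 0.0, [[1e6, 0.0], [0.0, 1.0]], z=[1.0, 2.0, 3.0], r=[1.0, 1.0, 1.0]
)
```

With equal variances the fused position is close to the average of the three measurements; giving one entry a much smaller variance pulls the result toward that entry.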

The measurement noise is denoted as $v_k \sim N(0, R)$, where $R$ is the measurement noise covariance matrix. To find the optimal noise covariance estimates, we used the Nelder-Mead simplex method [11].

2.2 Bending Model

In order to estimate the flexible needle deflection from the sensor data, an efficient and robust bending model is needed. In [10], three different models are presented, including two models based on the finite element method (FEM) and one angular springs model. In [8] and [12], a virtual springs model was proposed, which takes the needle-tissue force interaction into consideration. In [5] and [9], a kinematic quadratic polynomial model is implemented to estimate the needle tip deflection. We investigated multiple models; since we assume that the deflection is planar and caused by an orthogonal force acting on the needle tip, we present here the angular springs formulation to model the needle.

Angular Springs Model. In this method, the needle is modeled as $n$ rigid rods connected by angular springs with the same spring constant $k$, which can be identified experimentally. The orthogonal force acting on the needle tip deflects the needle, causing the springs to extend. The insertion process is slow enough to be considered quasi-static, so the rods and springs are in equilibrium at each time step. Additionally, in the elastic range of deformations the springs behave linearly, i.e., $s_i = k \cdot q_i$, where $s_i$ is the spring torque at each joint. The implementation of this method is illustrated in Fig. 2, and the mechanical relations are expressed in (5).

$$\begin{cases} k q_5 = F_{tip}\, l \\ k q_4 = F_{tip}\, l\,[1 + \cos q_5] \\ k q_3 = F_{tip}\, l\,[1 + \cos q_5 + \cos(q_5 + q_4)] \\ k q_2 = F_{tip}\, l\,[1 + \cos q_5 + \cos(q_5 + q_4) + \cos(q_5 + q_4 + q_3)] \\ k q_1 = F_{tip}\, l\,[1 + \cos q_5 + \cos(q_5 + q_4) + \cos(q_5 + q_4 + q_3) + \cos(q_5 + q_4 + q_3 + q_2)] \end{cases} \tag{5}$$

Equation (5) can be written in the form $k \cdot U = F_{tip} \cdot J(U)$, where $U = [q_1, q_2, \ldots, q_n]$ and $J$ is the parameter function calculating the force-deflection relationship vector. In order to use this model in the tip estimation methods (2) and (3), one more equation is needed to relate the sensor input data to (5). As the data $P_{EM}, P_{Opt}, O_{Opt}$ and $P_{Opt}, P_{EM}, O_{EM}$ are received during insertion, the deflection of the needle can be estimated as:

$$d_{EM} = l \cdot [\sin q_1 + \sin(q_1 + q_2)] \tag{6}$$

$$d_{base} = l \cdot [\sin q_3 + \sin(q_3 + q_2)] \tag{7}$$

where $d_{EM}$ represents the deviation of the EM sensor from the optically measured straight needle orientation and $d_{base}$ stands for the relative deviation of the needle base from the EM-measured direction.

Fig. 2. Angular springs model, taking 5 rods as an example

To estimate the needle deflection from $P_{EM}, P_{Opt}, O_{Opt}$ or $P_{Opt}, P_{EM}, O_{EM}$, a set of nonlinear equations consisting of either (5) and (6) or (5) and (7) needs to be solved. As proposed in [10], the nonlinear system (5) can be solved iteratively using Picard's method, which is expressed in (8). Given the needle configuration $U_t$, we can use the function $J$ to estimate the needle posture at the next iteration.

$$U_{t+1} = k^{-1} J(U_t) F_{tip} \tag{8}$$
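The fixed-point iteration in (8) can be sketched as follows for a known tip force. This is a minimal illustration under stated assumptions: $l$ is taken to be the length of one rod (the text does not define it), the iteration starts from a straight needle, and the function names and numerical values are hypothetical. The helper `lateral_deviation` has the same form as (6) when evaluated over the first two rods.

```python
import math
from typing import List, Tuple


def moment_arms(q: List[float], l: float) -> List[float]:
    """Evaluate J(U) of Eq. (5): J_i = l * [1 + cos(q_n) + cos(q_n + q_{n-1}) + ...]."""
    n = len(q)
    arms = [0.0] * n
    total = 1.0   # bracketed sum for the joint nearest the tip
    angle = 0.0   # running sum q_n + q_{n-1} + ...
    for i in range(n - 1, -1, -1):
        arms[i] = l * total
        angle += q[i]
        total += math.cos(angle)
    return arms


def solve_picard(f_tip: float, k: float, l: float, n: int = 5,
                 tol: float = 1e-12, max_iter: int = 50) -> Tuple[List[float], int]:
    """Iterate U_{t+1} = J(U_t) * F_tip / k (Eq. (8)), starting from a straight needle."""
    q = [0.0] * n
    for it in range(1, max_iter + 1):
        q_next = [f_tip * a / k for a in moment_arms(q, l)]
        change = max(abs(a - b) for a, b in zip(q_next, q))
        q = q_next
        if change < tol:
            return q, it
    return q, max_iter


def lateral_deviation(q: List[float], l: float, m: int) -> float:
    """Lateral offset after the first m rods: l * sum_j sin(q_1 + ... + q_j)."""
    angle, dev = 0.0, 0.0
    for j in range(m):
        angle += q[j]
        dev += l * math.sin(angle)
    return dev


# Hypothetical values: five 30 mm rods, k = 1 N*m/rad, 0.5 N tip force.
angles, iterations = solve_picard(f_tip=0.5, k=1.0, l=0.03)
d_em = lateral_deviation(angles, 0.03, 2)
```

Because each $J_i$ in (5) depends only on the joints nearer the tip, the iteration in this sketch settles after roughly $n$ passes.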

For minor deflections, fewer than 10 iterations are needed to solve these nonlinear equations, which is efficient enough for real-time estimation. However, Picard's method requires $F_{tip}$ to be known. In order to find $F_{tip}$ from the sensor inputs, a series of simulation experiments was conducted in which a linearly increasing simulated tip force $F_{tip}$ and the corresponding $d_{EM}$ and $d_{base}$ were collected. The simulation results are shown in Fig. 3 (left). A least-squares method is used to fit the force-deviation data with a cubic polynomial. Thereafter, to solve for the needle configuration using $P_{EM}, P_{Opt}, O_{Opt}$ and $P_{Opt}, P_{EM}, O_{EM}$, the fitted cubic polynomial is first used to estimate the tip force from the measured $d_{EM}$ and $d_{base}$, and then (5) is solved iteratively using (8).
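The force-deviation fit can be sketched as an ordinary least-squares cubic fit solved through the normal equations. The sketch assumes a single deviation variable as input, which may differ from the original fit, and it uses synthetic data generated from a known cubic rather than the simulation results. All names and values are hypothetical.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_cubic(d: Sequence[float], f: Sequence[float]) -> List[float]:
    """Least-squares coefficients [c0, c1, c2, c3] of f ~ c0 + c1*d + c2*d^2 + c3*d^3."""
    powers = [[di ** p for p in range(4)] for di in d]
    normal = [[sum(row[i] * row[j] for row in powers) for j in range(4)] for i in range(4)]
    rhs = [sum(row[i] * fi for row, fi in zip(powers, f)) for i in range(4)]
    return solve_linear(normal, rhs)


def eval_cubic(coeffs: Sequence[float], d: float) -> float:
    """Evaluate the fitted polynomial at deviation d."""
    return sum(c * d ** p for p, c in enumerate(coeffs))


# Synthetic, noise-free data from a known cubic (illustrative only).
true_coeffs = [0.1, 0.5, -1.0, 2.0]
deviations = [0.25 * i for i in range(9)]
forces = [eval_cubic(true_coeffs, d) for d in deviations]
coeffs = fit_cubic(deviations, forces)
```

At run time, `eval_cubic(coeffs, d)` would then return the estimated tip force for a measured deviation `d`.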

Fig. 3. Left: Tip force and deflection relation: tip force increases with 50 mN intervals. Right: Static tip bending experiment setup at MRI entrance.


3 Experiments

In order to validate our proposed method, we designed a static tip bending experiment, which was performed at the isocenter and at a 650 mm offset along the z-axis from the isocenter (the entrance) of the MRI scanner, as shown in Fig. 3 (right). The experiment was conducted in two steps. First, the needle tip was placed at a particular point (such as inside a phantom marker) and kept static without bending the needle, and the optical and EM sensor data were recorded for 10 s. Second, with the needle tip remaining at the same point, the needle was bent by maneuvering the needle base, with a mean tip deviation of about 40 mm for large bending validation and 20 mm for small bending validation, and data were recorded from both sensors for an additional 20 s. In addition, the needle was bent in three patterns (in the x-y plane of the MRI, in the y-z plane, and in all directions) to evaluate the relationship between the EM sensor orientation and its accuracy.

From the data collected in the first step, the mean estimated needle tip position without needle deflection compensation is taken as the gold standard reference point $TIP_{gold}$. In the second step, the proposed fusion method, together with the other tip estimation methods, was used to estimate the static tip position, which was compared with $TIP_{gold}$. The results are shown in Fig. 4. For large bending, the errors of $TIP_{Opt}$, $TIP_{EM}$ and $TIP_{fused}$ are 29.23 mm, 6.29 mm and 3.15 mm at the isocenter, and 39.96 mm, 9.77 mm and 6.90 mm at the MRI entrance, respectively. For small bending, they are 21.00 mm, 3.70 mm and 2.20 mm at the isocenter, and 16.54 mm, 5.41 mm and 4.20 mm at the entrance, respectively.
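The error computation described above can be sketched as follows: the reference point is the mean of the tip estimates recorded with the straight needle, and the error of a method is the mean Euclidean distance between its tip estimates and that reference. The function names and the sample coordinates are illustrative assumptions, not recorded data.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def mean_point(points: Sequence[Point]) -> List[float]:
    """Component-wise mean of a set of 3D points (used here as TIP_gold)."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(3)]


def mean_tip_error(estimates: Sequence[Point], gold: Point) -> float:
    """Mean Euclidean distance (same unit as the inputs) between estimates and the reference."""
    return sum(math.dist(e, gold) for e in estimates) / len(estimates)


# Hypothetical samples in mm: two static recordings and two estimates made during bending.
static_tips = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
bent_tips = [(1.0, 3.0, 4.0), (1.0, 0.0, 0.0)]
tip_gold = mean_point(static_tips)
error_mm = mean_tip_error(bent_tips, tip_gold)
```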

4 Discussion

We compare $TIP_{fused}$ with $TIP_{Opt}$ rather than with $TIP_{EM}$ because the EM sensor is primarily used to augment the measurements of the optical sensor and to compensate for its line-of-sight problem. Although the EM sensor better estimates the needle tip position in the presence of needle bending, it is sensitive to the MR gradient field nonlinearity and noise. Therefore, its performance is less reliable when the needle insertion procedure is performed at the MRI entrance.

Although quantifying the range of bending during therapy is difficult, our initial insertion experiments in a homogeneous spine phantom using the same needle demonstrated needle bending of over 10 mm. Therefore, we simulated a larger bending (40 mm tip deviation) that could be anticipated when the needle is inserted through heterogeneous tissue. However, as small bending will be more commonly observed, validation experiments with small bending were also conducted, and they demonstrated consistently better estimation using the data fusion method. From Fig. 4 (bottom), we find that the green dots, which represent bending in the x-y plane, exhibit higher accuracy of the EM sensor and thus a better fusion result. For the large bending experiment in the x-y plane at the entrance, the mean errors of $TIP_{Opt}$, $TIP_{EM}$ and $TIP_{fused}$ are 28.22 mm, 5.76 mm and 3.40 mm, respectively. This result suggests that the estimation accuracy can be further improved by maneuvering the needle in the x-y plane.


Fig. 4. Top: Single experiment result. Each scattered point represents a single time step record. The left-side points represent the estimated tip positions using different methods. The light blue points in the middle and the dark blue points to the right represent the raw data of the EM sensor locations and the needle base positions, respectively. The black sphere is centered at the gold standard point and encompasses 90 % of the fused estimation points (black). Lines connect the raw data and the estimated tip positions of a single time step. Bottom: From left to right: large bending experiment at the isocenter, large bending at the entrance, small bending at the isocenter, small bending at the entrance. The X axis values 1 to 6 stand for $TIP_{fused}$, $TIP_{EM}$, $TIP_{OptEMEM}$, $TIP_{EMOptOpt}$, $TIP_{OptEM}$ and $TIP_{Opt}$, respectively. The Y axis indicates the mean estimation error (mm), and each dot represents a single experiment result.

It should be noted that the magnitude of the estimation errors using the fusion method still appears large because of the significant bending introduced in the needle. When the actual bending is less pronounced, the estimation error can be much smaller. In addition, the estimation error is not equal to the overall targeting error; it only represents the real-time tracking error in the presence of needle bending. By integrating the data fusion algorithm with the 3D Slicer-based navigation system [13], clinicians can be provided with better real-time guidance and maneuverability of the needle.


5 Conclusion

In this work, we proposed a Kalman filter based optical-EM sensor fusion method to estimate the flexible needle deflection. The data fusion method exhibits consistently smaller mean error than the methods without fusion. The EM sensor used in our method is MR-safe, and the method requires no other force or insertion-depth sensor, making it easy to integrate with the clinical workflow. In the future, we will improve the robustness of the needle bending model and integrate it with our navigation system.

References

1. Abolhassani, N., Patel, R., Moallem, M.: Needle insertion into soft tissue: a survey. Med. Eng. Phys. 29(4), 413–431 (2007)
2. Dupuy, D.E., Zagoria, R.J., Akerley, W., Mayo-Smith, W.W., Kavanagh, P.V., Safran, H.: Percutaneous radiofrequency ablation of malignancies in the lung. AJR Am. J. Roentgenol. 174(1), 57–59 (2000)
3. Mala, T., Edwin, B., Mathisen, Ø., Tillung, T., Fosse, E., Bergan, A., Søreide, Ø., Gladhaug, I.: Cryoablation of colorectal liver metastases: minimally invasive tumour control. Scand. J. Gastroenterol. 39(6), 571–578 (2004)
4. Park, Y.L., Elayaperumal, S., Daniel, B., Ryu, S.C., Shin, M., Savall, J., Black, R.J., Moslehi, B., Cutkosky, M.R.: Real-time estimation of 3-D needle shape and deflection for MRI-guided interventions. IEEE/ASME Trans. Mechatron. 15(6), 906–915 (2010)
5. Wan, G., Wei, Z., Gardi, L., Downey, D.B., Fenster, A.: Brachytherapy needle deflection evaluation and correction. Med. Phys. 32(4), 902–909 (2005)
6. Asadian, A., Kermani, M.R., Patel, R.V.: An analytical model for deflection of flexible needles during needle insertion. In: 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2551–2556 (2011)
7. Dorileo, E., Zemiti, N., Poignet, P.: Needle deflection prediction using adaptive slope model. In: 2015 IEEE International Conference on Advanced Robotics (ICAR), pp. 60–65 (2015)
8. Roesthuis, R.J., Van Veen, Y.R.J., Jahya, A., Misra, S.: Mechanics of needle-tissue interaction. In: 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2557–2563 (2011)
9. Sadjadi, H., Hashtrudi-Zaad, K., Fichtinger, G.: Fusion of electromagnetic trackers to improve needle deflection estimation: simulation study. IEEE Trans. Biomed. Eng. 60(10), 2706–2715 (2013)
10. Goksel, O., Dehghan, E., Salcudean, S.E.: Modeling and simulation of flexible needles. Med. Eng. Phys. 31(9), 1069–1078 (2009)
11. Lagarias, J.C., Reeds, J.A., Wright, M.H., Wright, P.E.: Convergence properties of the Nelder–Mead simplex method in low dimensions. SIAM J. Optim. 9(1), 112–147 (1998)
12. Du, H., Zhang, Y., Jiang, J., Zhao, Y.: Needle deflection during insertion into soft tissue based on virtual spring model. Int. J. Multimedia Ubiquit. Eng. 10(1), 209–218 (2015)
13. Jayender, J., Lee, T.C., Ruan, D.T.: Real-time localization of parathyroid adenoma during parathyroidectomy. N. Engl. J. Med. 373(1), 96–98 (2015)

Bone Enhancement in Ultrasound Based on 3D Local Spectrum Variation for Percutaneous Scaphoid Fracture Fixation

Emran Mohammad Abu Anas1(✉), Alexander Seitel1, Abtin Rasoulian1, Paul St. John2, Tamas Ungi3, Andras Lasso3, Kathryn Darras4, David Wilson5, Victoria A. Lessoway6, Gabor Fichtinger3, Michelle Zec2, David Pichora2, Parvin Mousavi3, Robert Rohling1,7, and Purang Abolmaesumi1

1 Electrical and Computer Engineering, University of British Columbia, Vancouver, BC, Canada
2 Kingston General Hospital, Kingston, ON, Canada
3 School of Computing, Queen's University, Kingston, ON, Canada
4 Vancouver General Hospital, Vancouver, BC, Canada
5 Orthopaedics and Centre for Hip Health and Mobility, University of British Columbia, Vancouver, BC, Canada
6 BC Women's Hospital, Vancouver, BC, Canada
7 Mechanical Engineering, University of British Columbia, Vancouver, BC, Canada

Abstract. This paper proposes a 3D local phase-symmetry-based bone enhancement technique to automatically identify weak bone responses in 3D ultrasound images of the wrist. The objective is to enable percutaneous fixation of scaphoid bone fractures, which account for 90 % of all carpal bone fractures. For this purpose, we utilize 3D frequency spectrum variations to design a set of 3D band-pass Log-Gabor filters for phase symmetry estimation. Shadow information is also incorporated to further enhance the bone surfaces relative to the soft-tissue response. The proposed technique is then used to register a statistical wrist model to intraoperative ultrasound in order to derive a patient-specific 3D model of the wrist bones. We perform a cadaver study of 13 subjects to evaluate our method. Our results demonstrate average mean surface and Hausdorff distance errors of 0.7 mm and 1.8 mm, respectively, showing better performance than two state-of-the-art approaches. This study demonstrates the potential of the proposed technique to be included in an ultrasound-based percutaneous scaphoid fracture fixation procedure.

Keywords: Bone enhancement · Scaphoid fracture · Phase symmetry · Log-Gabor filters · Shadow map

1 Introduction

Scaphoid fracture is the most probable outcome of wrist injury and often occurs due to a sudden fall on an outstretched arm. To heal the fracture, casting is usually recommended, which immobilizes the wrist in a short arm cast. The typical healing time is 10–12 weeks; however, it can be longer, especially for a fracture located at the proximal pole of the scaphoid bone [8]. Better outcomes and faster recovery are normally achieved through an open (for displaced fractures) or percutaneous (for non-displaced fractures) surgical procedure, where a surgical screw is inserted along the longest axis of the fractured scaphoid bone within a clinical accuracy of 2 mm [7].

In the percutaneous surgical approach for scaphoid fractures, fluoroscopy is usually used to guide the screw along its desired drill path. The major drawbacks of fluoroscopic guidance are that only a 2D projection view of the 3D anatomy is available and that the patient and the personnel working in the operating room are exposed to radiation. To reduce the X-ray radiation exposure, a camera-based augmentation technique [10] can be used. As an alternative to fluoroscopy, a 3D ultrasound (US)-based procedure [2,3] has been suggested, mainly to provide real-time 3D data for navigation. However, the main challenge of using US in orthopaedics lies in the enhancement of weak, disconnected, blurry and noisy US bone responses.

The detection/enhancement of US bone responses can be broadly categorized into two groups: intensity-based [4] and phase-based approaches [2,5,6]. A review of the literature on these two approaches suggests that phase-based approaches have an advantage where there are low-contrast or variable bone responses, as often observed in 3D US data. Hacihaliloglu et al. [5,6] proposed a number of phase-based bone enhancement approaches using a set of quadrature band-pass (Log-Gabor) filters at different scales and orientations. These filters assumed isotropic frequency responses across all orientations. However, bone responses in US have a highly directional nature, which in turn produces anisotropic responses in the frequency domain. Most recently, Anas et al. [2] presented an empirical wavelet-based approach to design a set of 2D anisotropic band-pass filters. For bone enhancement of a 3D US volume, that 2D approach could be applied to the individual 2D frames of the volume. However, as a 2D approach, it cannot take advantage of correlations between adjacent US frames. As a result, the enhancement is affected by spatial compounding errors and by errors resulting from beam thickness effects [5].

In this work, we propose to utilize local 3D Fourier spectrum variations to design a set of Log-Gabor filters for 3D local phase symmetry estimation, applied to enhance the wrist bone response in 3D US. In addition, information from the shadow map [4] is used to further enhance the bone response. Finally, a statistical wrist model is registered to the enhanced response to derive a patient-specific 3D model of the wrist bones. A study of 13 cadaver wrists is performed to determine the accuracy of the registration, and the results are compared with two previously published bone enhancement techniques [2,5].

© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 465–473, 2016. DOI: 10.1007/978-3-319-46720-7_54

2 Methods

Bone responses in US are highly directional with respect to the direction of scanning, i.e., the width of the bone response along the scanning direction is significantly narrower than along other directions. As a result, the magnitude spectrum of a US volume has a wider bandwidth along the scanning direction than along other directions. Most of the existing phase-based approaches [5,6] employ isotropic filters (having the same bandwidths and center frequencies) across different directions for the phase symmetry estimation. However, an isotropic filter bank may not be appropriate for extracting the phase symmetry accurately from an anisotropic magnitude spectrum. In contrast to those approaches, we account for the spectrum variations in different directions to design an anisotropic 3D Log-Gabor filter bank for improved phase symmetry estimation.

2.1 Phase Symmetry Estimation

The 3D local phase symmetry estimation starts with dividing a 3D frequency spectrum into different orientations (example in Fig. 1(a)). A set of orientational filters is used for this purpose, where the frequency response of each filter is defined as the product of azimuthal ($\Phi(\phi)$) and polar ($\Theta(\theta)$) filters:

$$O(\phi, \theta) = \Phi(\phi) \times \Theta(\theta) = \exp\left(-\frac{(\phi - \phi_0)^2}{2\sigma_\phi}\right) \times \exp\left(-\frac{(\theta - \theta_0)^2}{2\sigma_\theta}\right), \tag{1}$$

where the azimuthal angle φ (0 ≤ φ ≤ 2π) measures the angle in the xy-plane from the positive x-axis in counter-clockwise direction, and the polar angle θ (0 ≤ θ ≤ π) indicates the angle from the positive z-axis. φ0 and θ0 indicate the center of the orientation, and σφ and σθ represent the span/bandwidth of the orientation (Fig. 1(a)). The purpose of the polar orientational filter is to divide the 3D spectrum into different cones, and the azimuthal orientational filter further divides each cone into different sub-spectrums/orientations (Fig. 1(a)).
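A minimal sketch of the orientational filter response in Eq. (1) is given below. It follows the equation as printed, with $2\sigma_\phi$ and $2\sigma_\theta$ in the denominators, and does not handle the wrap-around of the azimuthal angle at $0$ and $2\pi$. The function name and the sample values are assumptions for illustration.

```python
import math


def orientation_filter(phi: float, theta: float, phi0: float, theta0: float,
                       sigma_phi: float, sigma_theta: float) -> float:
    """Evaluate O(phi, theta) = Phi(phi) * Theta(theta) from Eq. (1), angles in radians."""
    azimuthal = math.exp(-((phi - phi0) ** 2) / (2.0 * sigma_phi))
    polar = math.exp(-((theta - theta0) ** 2) / (2.0 * sigma_theta))
    return azimuthal * polar


# Hypothetical orientation centred at phi0 = 0, theta0 = 0.5 with spans of 0.5.
at_centre = orientation_filter(0.0, 0.5, 0.0, 0.5, 0.5, 0.5)
off_centre = orientation_filter(1.0, 0.5, 0.0, 0.5, 0.5, 0.5)
```

The response is 1 at the orientation centre and decays as either angle moves away from it.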

Fig. 1. Utilization of the spectrum variation in local phase symmetry estimation. (a) A 3D frequency spectrum is divided into different cones, and each segmented cone is further partitioned into different orientations. (b) The variation of spectrum strength over the polar angle. (c) The variation of spectrum strength over the angular frequency.

After selection of a particular orientation, band-pass Log-Gabor filters are applied at different scales. Mathematically, the frequency response of a Log-Gabor filter is defined as:

$$R(\omega) = \exp\left(-\frac{(\ln(\omega / \omega_0))^2}{2(\ln(\kappa))^2}\right), \tag{2}$$

where $\omega$ ($0 \le \omega \le \sqrt{3}\pi$) represents the angular frequency, $\omega_0$ represents the peak tuning frequency, and $0 < \kappa < 1$ is related to the octave bandwidth. Finally, the frequency response of a band-pass Log-Gabor filter at a particular orientation can be expressed as $F(\omega, \phi, \theta) = R(\omega) O(\phi, \theta)$.

2.2 Enhancement of Bone Responses in US

The bone enhancement starts with the estimation of the parameters of the orientational filter ($\phi_0$, $\theta_0$, $\sigma_\phi$, $\sigma_\theta$) for each orientation (Sect. 2.2.1). The estimation of the Log-Gabor filter parameters for each orientation is presented in Sect. 2.2.2, and the subsequent bone surface extraction is described in Sect. 2.2.3.

2.2.1 Parameters for Orientational Filter

The first step is to compute the spherical Fourier transform (FT) $P(\omega, \phi, \theta)$ of a given US volume. To do so, the conventional 3D FT is calculated in rectangular coordinates and then transformed into spherical coordinates. For segmentation of the spectrum into different cones, we compute the strength of the spectrum along the polar angle coordinate as $u(\theta) = \sum_{\omega=0}^{\sqrt{3}\pi} \sum_{\phi=0}^{2\pi} \log(|P(\omega, \phi, \theta)|)$. An example $u(\theta)$ is shown in Fig. 1(b). The locations $\theta_m$ of the maxima of $u(\theta)$ are detected, where $m = 1, 2, \ldots, M$ and $M$ is the total number of detected maxima. For each $\theta_m$, the detected left and right minima are denoted ${}^{-}\theta_m$ and ${}^{+}\theta_m$ (shown in Fig. 1(b)), and the difference between these two minima positions is estimated as $\sigma_{\theta,m} = {}^{+}\theta_m - {}^{-}\theta_m$. Note that each detected maximum corresponds to a cone in the 3D frequency spectrum, i.e., the total number of cones is $M$, and the center and the bandwidth of the $m$-th cone are $\theta_m$ and $\sigma_{\theta,m}$, respectively.

Subsequently, each segmented cone is further divided into different sub-spectrums. To do so, the strength of the spectrum is calculated along the azimuthal angle within a particular cone (say, the $m$-th cone), followed by detection of the maxima and the corresponding two neighboring minima as before. Then, the center $\phi_n^m$ and the bandwidth $\sigma_{\phi,n}^m$ of the $n$-th sub-spectrum within the $m$-th cone are calculated.

2.2.2 Parameters for Log-Gabor Filters

For estimation of the Log-Gabor filter parameters at each orientation, the spectrum strength is calculated within that orientation as $w^{m,n}(\omega) = \sum_{\theta} \sum_{\phi} 20 \log(|P(\omega, \phi, \theta)|)$ dB. A segmentation of $w^{m,n}(\omega)$ is performed [2] to estimate the parameters of the Log-Gabor filters at different scales. The lower $\omega_{s,l}^{m,n}$ and upper $\omega_{s,u}^{m,n}$ cut-off frequencies for a scale $s$ are determined from $w^{m,n}(\omega)$ (an example is shown in Fig. 1(c)), where $1 \le s \le S^{m,n}$ and $S^{m,n}$ is the total number of scales at the $n$-th orientation within the $m$-th cone. The right subscripts 'l' and 'u' indicate the lower and upper cut-off frequencies. The parameters of the Log-Gabor filters ($\omega_0$ and $\kappa$) can be directly calculated from the lower and upper cut-off frequencies as $\omega_{s,0}^{m,n} = \sqrt{\omega_{s,l}^{m,n}\, \omega_{s,u}^{m,n}}$ and $\kappa_s^{m,n} = \exp\left(-0.25 \log_2\left(\frac{\omega_{s,u}^{m,n}}{\omega_{s,l}^{m,n}}\right) \sqrt{2 \ln 2}\right)$.
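The mapping from a pair of cut-off frequencies to the Log-Gabor parameters, and the resulting response of Eq. (2), can be sketched as follows. The function names and the example cut-offs are assumptions. Returning zero at $\omega = 0$ is a choice made in this sketch to avoid taking the logarithm of zero.

```python
import math
from typing import Tuple


def log_gabor_params(omega_l: float, omega_u: float) -> Tuple[float, float]:
    """Return (omega_0, kappa) from the lower and upper cut-off frequencies of one scale."""
    omega0 = math.sqrt(omega_l * omega_u)        # geometric mean of the cut-offs
    octaves = math.log2(omega_u / omega_l)       # bandwidth in octaves
    kappa = math.exp(-0.25 * octaves * math.sqrt(2.0 * math.log(2.0)))
    return omega0, kappa


def log_gabor(omega: float, omega0: float, kappa: float) -> float:
    """Evaluate R(omega) from Eq. (2)."""
    if omega <= 0.0:
        return 0.0   # the Log-Gabor response has no DC component
    return math.exp(-(math.log(omega / omega0) ** 2) / (2.0 * math.log(kappa) ** 2))


# Hypothetical cut-offs spanning two octaves.
omega0, kappa = log_gabor_params(0.5, 2.0)
peak = log_gabor(omega0, omega0, kappa)
at_upper_cutoff = log_gabor(2.0, omega0, kappa)
```

With these definitions the response equals 1 at $\omega_0$ and 0.5 at both cut-off frequencies, so the cut-offs act as half-magnitude points of the filter.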

2.2.3 Bone Surface Extraction

The filter parameters estimated above are then used to compute the frequency responses of the orientational and Log-Gabor filters using Eqs. (1)-(2). These filters are subsequently used in 3D phase symmetry estimation [5]. As local phase symmetry also enhances other anatomical interfaces with symmetrical responses, shadow information is used to suppress the responses from other anatomies. A shadow map is estimated for each voxel by weighted summation of the intensity values of all voxels beneath it [4]. The product of the shadow map and the phase symmetry is defined as the bone response (BR) in this work, which ranges from 0 to 1. To construct a target bone surface, we apply a simple threshold $T_{bone}$ to the BR volume to detect the bones in the US volume. An optimized selection of the threshold is not possible due to the small sample size (13) in this work; therefore, an empirical threshold value of 0.15 is chosen.
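The last two steps of the bone surface extraction can be sketched as a voxel-wise product followed by thresholding. The phase symmetry and shadow map volumes are taken as given inputs here, since their computation is not reproduced. The strict inequality in the threshold test, the function names and the tiny example volume are assumptions.

```python
from typing import List

Volume = List[List[List[float]]]


def bone_response(phase_symmetry: Volume, shadow_map: Volume) -> Volume:
    """Voxel-wise product of phase symmetry and shadow map (both assumed to lie in [0, 1])."""
    return [
        [[ps * sm for ps, sm in zip(ps_row, sm_row)]
         for ps_row, sm_row in zip(ps_slice, sm_slice)]
        for ps_slice, sm_slice in zip(phase_symmetry, shadow_map)
    ]


def bone_mask(br: Volume, t_bone: float = 0.15) -> List[List[List[bool]]]:
    """Mark voxels whose bone response exceeds the threshold T_bone."""
    return [[[value > t_bone for value in row] for row in sl] for sl in br]


# Tiny hypothetical 1 x 1 x 3 volume.
ps = [[[0.9, 0.5, 0.2]]]
sm = [[[0.5, 0.2, 0.9]]]
mask = bone_mask(bone_response(ps, sm))
```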

2.3 Registration of a Statistical Wrist Model

A multi-object statistical wrist shape+scale+pose model is developed based on the idea in [1] to capture the main modes of shape, scale and pose variations of the wrist bones across a group of subjects at different wrist positions. For the training during the model development, we use a publicly available wrist database [9]. For registration of the model to a target point cloud, a multi-object probabilistic registration is used [11]. The sequential registration is carried out in two steps: (1) the statistical model is registered to a preoperative CT acquired at neutral wrist position, and (2) a subsequent registration of the model to the extracted bone surface in US acquired at a non-neutral wrist position. Note that in the second step only pose coefficients are optimized to capture the pose differences between CT and US. Note that the pose model in [1] captures both the rigid-body and scale variations; however, in this work we use two different models (pose and scale, respectively) to capture those variations. The key idea behind separation of the scale from the rigid-body motion is to avoid the scale optimization during the US registration, as the scale estimation from a limited view of the bony anatomy in US may introduce additional registration error.

3 Experiments, Evaluation and Results

A cadaver experiment including 13 cadaver wrists was performed to evaluate our proposed approach and to compare it with two state-of-the-art techniques: a 2D empirical wavelet-based local phase symmetry (EWLPS) method [2] and a 3D local phase symmetry (3DLPS) method [5].

3.1 Experimental Setup

For acquisition of US data from each cadaver wrist, a motorized linear probe (Ultrasonix 4D L14-5/38, Ultrasonix, Richmond, BC, Canada) was used with a frequency of 10 MHz, a depth of 40 mm and a field-of-view of 30°, focusing mainly on the scaphoid bone. A custom-built wrist holder was used to keep the wrist fixed in the extension position (suggested by expert hand surgeons) during scanning. To obtain a preoperative image and a ground truth of wrist US bone responses, CTs were acquired at the neutral and extension positions, respectively, for all 13 cadaver wrists. An optical tracking system equipped with six fiducial markers was used to track the US probe.

3.2 Evaluation

To generate the ground truth wrist bone surfaces, CTs were segmented manually using the Medical Imaging Interaction Toolkit. Fiducial-based registration was used to align the segmented CT with the wrist bone responses in US. A manual adjustment was also needed afterward to compensate for the movement of the wrist bones during US acquisition caused by the US probe's pressure on the wrist. The manual translational adjustment was mainly performed along the direction of the US scanning axis by registering the CT bone surfaces to the US bone responses. For evaluation, we measured the mean surface distance error (mSDE) and maximum surface (Hausdorff) distance error (mxSDE) between the registered and reference wrist bone surfaces. The surface distance error (SDE) at each point in the registered bone surface is defined as its Euclidean distance to the closest neighboring point in the reference surface. mSDE and mxSDE are defined as the average and maximum of SDEs across all vertices, respectively. We also recorded the run-times of the three bone enhancement techniques from unoptimized MATLAB™ (Mathworks, Natick, MA, USA) code on an Intel Core i7-2600M CPU at 3.40 GHz for a US volume of size 57.3 × 36.45 × 32.7 mm³ with a pixel spacing of 0.4 mm in all dimensions.

3.3 Results

Table 1 reports a comparative result of our approach with respect to the EWLPS and 3DLPS methods. For each bone enhancement technique, a consistent threshold value that provides the least error is used across 13 cadaver cases.

Table 1. Comparative results of the proposed approach.

Method   mSDE (mm)   mxSDE (mm)   Run-time (sec)
Our      0.7 ± 0.2   1.8 ± 0.3    11
EWLPS    0.8 ± 0.2   2.5 ± 0.5    4
3DLPS    0.9 ± 0.3   2.3 ± 0.4    10
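The mSDE and mxSDE values in Table 1 follow the definitions given in Sect. 3.2. As a minimal sketch of those definitions (not the authors' code), the following pure-Python example computes both metrics for two small point sets. The function names and the toy coordinates are assumptions made for illustration, and the closest-point search is brute force, which would be too slow for full-resolution surfaces.

```python
import math

def surface_distance_errors(registered, reference):
    """SDE of every registered vertex: Euclidean distance to the closest
    reference point (brute-force nearest-neighbor search)."""
    return [min(math.dist(p, q) for q in reference) for p in registered]

def msde_mxsde(registered, reference):
    """Mean (mSDE) and maximum (mxSDE) of the per-vertex SDEs.
    The maximum is a one-sided, registered-to-reference distance."""
    sde = surface_distance_errors(registered, reference)
    return sum(sde) / len(sde), max(sde)

# Toy point sets in mm (made-up values, not data from the paper).
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
registered = [(0.0, 0.3, 0.0), (1.0, 0.0, 0.4), (2.5, 0.0, 0.0)]

msde, mxsde = msde_mxsde(registered, reference)
print(f"mSDE = {msde:.2f} mm, mxSDE = {mxsde:.2f} mm")
```

For real surface meshes, a spatial index such as a k-d tree would normally replace the inner loop.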

[Figure 2 panel labels: US frame (a, e); EWLPS (b, f, i); 3DLPS (c, g, j); our method (d, h, k).]

Fig. 2. Results of the proposed, EWLPS and 3DLPS methods. (a-h) Example sagittal US frames are shown in (a), (e). The corresponding bone enhancements are shown in (b-d), (f-h). The differences in the enhancement are prominent in the surfaces marked by arrows. (i-k) Example registration results of the statistical model to US for the three different methods.

Figure 2 demonstrates the significant improvement in bone enhancement quality obtained with our proposed method over the two competing techniques. Two example sagittal US slices are shown in Figs. 2(a), (e), and the corresponding bone enhancements are shown below them. The registered atlases superimposed on the US volume are displayed in Figs. 2(i–k). The 2D EWLPS method is applied across the axial slices to obtain the bone enhancement of the given US volume; a better bone enhancement is therefore expected across the axial slices than across the other directions. Figure 2(i) shows better registration accuracy in the axial direction than in the sagittal one (solid vs. dashed arrow) for the EWLPS method. The 3DLPS method mainly fails to enhance curved surfaces (an example in Fig. 2(c)); as a result, it registers less accurately to the curved surfaces (Fig. 2(j)).

4 Discussion and Conclusion

We have presented a bone enhancement method for 3D US volumes based on local 3D spectrum variation. Introducing the spectrum variation into the filter design allows us to estimate the 3D local phase symmetry more accurately, which in turn better enhances the expected bone locations. The improved bone enhancement allows a better statistical model registration to the US volume. We have applied our technique to 13 cadaver wrists and obtained an average mSDE of 0.7 mm and an average mxSDE of 1.8 mm between the registered and reference scaphoid bone surfaces. Though our mxSDE improvement of 0.5 mm is small in absolute magnitude, it is significant at about 25 % of the clinical surgical accuracy (2 mm).

The appearance of neighboring bones in the US volume has a significant impact on the registration accuracy. We have observed better registration accuracies where the scaphoid and all of its four neighboring bones (lunate, trapezium, capitate, part of radius) are included in the field of view of the US scans. The tuning parameter (Tbone) acts as a trade-off between the appearance of the bony anatomy and the outliers in the extracted surface. We have selected Tbone so that more outliers are allowed in exchange for increased bone visibility. The effect of the outliers has been compensated for by using a probabilistic registration approach that is robust to noise and outliers.

One limitation of the proposed approach is that it also enhances symmetrical noise. This type of noise mainly appears as scattered objects (marked by circles in Figs. 2(d), (h)) in the bone enhanced volumes. Another limitation is the ineffective use of shadow information: the shadow map used in this work was not able to reduce the non-bony responses substantially. Future work includes the development of a post-filtering approach on the bone enhanced volume to remove the scattered outliers. We also aim to integrate the proposed technology in a clinical workflow and compare it with a fluoroscopic guidance-based technique. Further improvement of the run-time is also needed for clinical implementation.

Acknowledgements. We would like to thank the Natural Sciences and Engineering Research Council, and the Canadian Institutes of Health Research for funding this project.

References

1. Anas, E.M.A., et al.: A statistical shape+pose model for segmentation of wrist CT images. In: SPIE Medical Imaging, vol. 9034, pp. T1–8. International Society for Optics and Photonics (2014)
2. Anas, E.M.A., et al.: Bone enhancement in ultrasound using local spectrum variations for guiding percutaneous scaphoid fracture fixation procedures. IJCARS 10(6), 959–969 (2015)
3. Beek, M., et al.: Validation of a new surgical procedure for percutaneous scaphoid fixation using intra-operative ultrasound. Med. Image Anal. 12(2), 152–162 (2008)
4. Foroughi, P., Boctor, E., Swartz, M.: 2-D ultrasound bone segmentation using dynamic programming. In: IEEE Ultrasonics Symposium, pp. 2523–2526 (2007)
5. Hacihaliloglu, I., et al.: Automatic bone localization and fracture detection from volumetric ultrasound images using 3-D local phase features. UMB 38(1), 128–144 (2012)
6. Hacihaliloglu, I., et al.: Local phase tensor features for 3D ultrasound to statistical shape+pose spine model registration. IEEE TMI 33(11), 2167–2179 (2014)
7. Menapace, K.A., et al.: Anatomic placement of the Herbert-Whipple screw in scaphoid fractures: a cadaver study. J. Hand Surg. 26(5), 883–892 (2001)
8. van der Molen, M.A.: Time off work due to scaphoid fractures and other carpal injuries in the Netherlands in the period 1990 to 1993. J. Hand Surg.: Br. Eur. 24(2), 193–198 (1999)
9. Moore, D.C., et al.: A digital database of wrist bone anatomy and carpal kinematics. J. Biomech. 40(11), 2537–2542 (2007)
10. Navab, N., Heining, S.M., Traub, J.: Camera augmented mobile C-arm (CAMC): calibration, accuracy study, and clinical applications. IEEE TMI 29(7), 1412–1423 (2010)
11. Rasoulian, A., Rohling, R., Abolmaesumi, P.: Lumbar spine segmentation using a statistical multi-vertebrae anatomical shape+pose model. IEEE TMI 32(10), 1890–1900 (2013)

Bioelectric Navigation: A New Paradigm for Intravascular Device Guidance

Bernhard Fuerst1,2(B), Erin E. Sutton1,3, Reza Ghotbi4, Noah J. Cowan3, and Nassir Navab1,2

1 Computer Aided Medical Procedures, Johns Hopkins University, Baltimore, MD, USA {be.fuerst,esutton5}@jhu.edu
2 Computer Aided Medical Procedures, Technische Universität München, Munich, Germany
3 Department of Mechanical Engineering, Johns Hopkins University, Baltimore, MD, USA
4 Department of Vascular Surgery, HELIOS Klinikum München West, Munich, Germany

Abstract. Inspired by the electrolocalization behavior of weakly electric fish, we introduce a novel catheter guidance system for interventional vascular procedures. Impedance measurements from electrodes on the catheter form an electric image of the internal geometry of the vessel. That electric image is then mapped to a pre-interventional model to determine the relative position of the catheter within the vessel tree. The catheter’s measurement of its surroundings is unaffected by movement of the surrounding tissue, so there is no need for deformable 2D/3D image registration. Experiments in a synthetic vessel tree and ex vivo biological tissue are presented. We employed dynamic time warping to map the empirical data to the pre-interventional simulation, and our system correctly identified the catheter’s path in 25/30 trials in a synthetic phantom and 9/9 trials in biological tissue. These first results demonstrated the capability and potential of Bioelectric Navigation as a non-ionizing technique to guide intravascular devices.

1 Introduction

As common vascular procedures become less invasive, the need for advanced catheter navigation techniques grows. These procedures depend on accurate navigation of endovascular devices, but the clinical state of the art presents significant challenges. In practice, the interventionalist plans the path to the area of interest based on pre-interventional images, inserts guide wires and catheters, and navigates to the area of interest using multiple fluoroscopic images. However, it is difficult and time-consuming to identify bifurcations for navigation, and the challenge is compounded by anatomic irregularities.

B. Fuerst and E.E. Sutton are joint first authors, having contributed equally.

© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 474–481, 2016. DOI: 10.1007/978-3-319-46720-7_55


Previous work has focused on registering interventional and pre-interventional images [2,3,9], but adapting those techniques to endovascular navigation requires deformable registration, a task that has proven challenging due to significant vessel deformation [1]. From a clinical standpoint, the interventionalist is interested in the catheter's position within the vessel tree, and deformation is irrelevant to navigation since the catheter is constrained to stay within the tree. In fact, to guide the catheter, one only needs a series of consecutive local measurements of its surroundings to identify the branch and the excursion into the branch. This suggests that a sensor directly on the catheter could be employed for navigation.

In this paper, we propose a radical new solution, Bioelectric Navigation. It was inspired by the weakly electric fish, which generates an electric field to detect subtle features of nearby objects [4]. Our technique combines such local impedance measurements with estimates from pre-interventional imaging to determine the position of the catheter within the vascular tree (Fig. 1). Instead of the interventionalist relying on fluoroscopy, the catheter itself is equipped with electrodes to provide feedback. One or more of the electrodes on the catheter emits a weak electric signal and measures the change to the resulting electric field as the catheter advances through the vessel tree. The impedance of blood is much lower than that of vessel walls and surrounding tissue [5], so the catheter detects local vessel geometry from measured impedance changes. For instance, as the device passes a bifurcation, it detects a significant disturbance to the electric field caused by the dramatic increase in vessel cross-sectional area. Bioimpedance analysis has been proposed for plaque classification [8] and vessel lumen measurement [7,12] but, to our knowledge, has not been applied to navigation. The relative ordering and amplitude of the features (e.g. bifurcations, stenoses) used for matching the live signal to the pre-interventional estimate are unchanged under deformation, so the system is unaffected by movement and manipulation of the surrounding tissue and does not require 2D/3D deformable registration.

[Figure 1 pipeline blocks: Diagnostic Images, Vessel Model, Signal Simulation, Signal Registration, Live Signal Acquisition, Signal Processing, Visualization.]

Fig. 1. In Bioelectric Navigation, live bioelectric measurements are registered to simulated signals from a pre-interventional image to identify the catheter’s position.

In our novel system, the local measurement from the catheter is compared to predicted measurements from a pre-interventional image to identify the global position of the catheter relative to the vessel tree. It takes advantage of high-resolution pre-interventional images and live voltage measurement for improved device navigation. Its primary benefit would be the reduction of radiation exposure for the patient, interventionalist, and staff. Experiments in a synthetic vessel tree and ex vivo biological tissue show the potential of the proposed technology.

2 Materials and Methods

2.1 Modeling Bioimpedance as a Function of Catheter Location

The first step in Bioelectric Navigation is the creation of a bioimpedance model from the pre-interventional image. A complete bioimpedance model requires solution of the 3D Poisson equation, assuming known permittivities of blood and tissue. Given a relatively simple geometry, one can employ finite element analysis to numerically solve for the electric potential distribution. For our first feasibility experiments, we designed an eight-path vessel phantom with two stenoses and one aneurysm. We imported the 3D CAD model into COMSOL Multiphysics (COMSOL, Inc., Stockholm, Sweden) and simulated the signal as a two-electrode catheter passed through the six primary branches (Fig. 2A). The simulation yielded six distinct models, one for each path.

2.2 Cross-Sectional Area to Parameterize Vessel Tree

It is infeasible to import the geometry of an entire human cardiovascular system and simulate every path from a given insertion site. However, for catheter navigation, we are only interested in the temporal variation of the measured signal as the catheter travels through the vascular tree. This is why sensing technologies such as intravascular ultrasound and optical coherence tomography are excessive for catheter navigation. Instead of an exact characterization of the vessel wall at each location, we use a simpler parameter to characterize the vessel geometry: cross-sectional area along the centerline of a vessel. We model the blood between the emitting electrode and the ground electrode as an RC circuit, so the voltage magnitude at the emitting electrode is inversely proportional to the cross-sectional area of the vessel between the two electrodes, greatly simplifying our parameterization of the vessel tree.

There are many methods for the segmentation of the vascular tree in CT images, and selecting the optimal method is not a contribution of this work. In fact, our system is largely invariant to the segmentation algorithm chosen. It uses the relative variation between segments to guide the catheter, so as long as the segmentation captures major geometric features, the extracted model need not have high resolution. Here, we selected segmentation parameters specific to the imaging modality (e.g. threshold, shape, background suppression) based on published techniques [6,10]. After manual initialization at an entry point, the algorithm detected the centerline and the shortest path between two points in a vessel-like segmentation. It generated the vessel model and computed the cross-sectional area at each segment for each possible path. For the synthetic phantom, the simulated voltage at the emitting electrode was proportional to the inverse of the cross-sectional area extracted from the cone-beam CT (CBCT) (Fig. 2B). We conclude that cross-sectional area is adequate for localization with a two-electrode catheter, the minimum required for Bioelectric Navigation.
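The inverse-area parameterization described above can be illustrated with a short sketch. It is not the authors' implementation: the function names, the z-score normalization step and the area values are assumptions introduced here, chosen only to show how a per-path reference signal could be derived from cross-sectional areas sampled along a centerline.

```python
import statistics

def reference_signal(areas_mm2):
    """Predicted voltage profile along one path, up to an unknown scale:
    the inverse of the cross-sectional area at each centerline sample."""
    if any(a <= 0 for a in areas_mm2):
        raise ValueError("cross-sectional areas must be positive")
    return [1.0 / a for a in areas_mm2]

def zscore(signal):
    """Remove offset and scale so that signals in different units
    (1/mm^2 versus volts) can be compared by shape alone."""
    mu = statistics.fmean(signal)
    sigma = statistics.pstdev(signal)
    return [(s - mu) / sigma for s in signal]

# Hypothetical areas (mm^2) along one path: uniform vessel, a stenosis
# at index 2 and a bifurcation (enlarged lumen) at index 4.
areas = [20.0, 20.0, 8.0, 20.0, 45.0, 20.0]
ref = reference_signal(areas)
ref_normalized = zscore(ref)
```

In this toy profile the reference peaks at the stenosis and dips at the bifurcation, which is the qualitative behavior the caption of Fig. 2 describes.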

[Figure 2 panels: (A) simulation of the vessel phantom; (B) plot of simulated voltage (µV) and 1/area (1/mm²) versus position (mm).]

Fig. 2. (A) Simulation of synthetic vessel phantom from imported CAD geometry. The electrodes (black) span the left-most stenosis in this image. The voltage decreases at a bifurcation (blue star) and increases at a stenosis (pink star). (B) Simulated voltage magnitude (green) and the inverse of the cross-sectional area (purple) from the segmented CBCT.

2.3 Bioimpedance Acquisition

Like the fish, the catheter measures changes to its electric field to detect changes in the geometry of its surroundings. The bioimpedance acquisition consists of three main components: the catheter, the electronics, and the signal processing. Almost any commercially available catheter equipped with electrodes can be used to measure bioimpedance. A function generator supplies a sinusoidal input to a custom-designed constant current source. The current source supplies a constant low-current signal to the emitting electrode on the catheter, creating a weak electric field in its near surroundings. As such, the voltage measured by the catheter is a function of the change in impedance. Our software simply extracts the voltage magnitude at the input frequency as the catheter advances.

2.4 Modeled and Empirical Signal Matching

The bioimpedance signal is a scaled and time-warped version of the inverse cross-sectional area of the vessel, so the alignment of measured bioimpedance from the catheter with the modeled vessel tree is the foundation of our technique. While we are investigating other alignment methods, in these initial experiments we used open-ended dynamic time warping (OE-DTW) [11]. OE-DTW was chosen because it can be adapted to provide feedback to the interventionalist during a procedure. OE-DTW enables the alignment of incomplete test time series with complete references. The incomplete voltage series during a procedure is incrementally compared to each of the complete references from the model to obtain constant feedback about the predicted location of the catheter. See [11] for details. In our implementation, the inverse cross-sectional area along each path formed the reference dataset, and the voltage magnitude from the catheter was the test time series. The algorithm estimated the most likely position of the catheter in the vessel model by identifying the reference path with the highest similarity measure: the normalized cross-correlation with the test signal. In these initial experiments, we advanced the catheter through approximately 90 % of each path, so our analysis did not take advantage of the open-ended nature of the algorithm.
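To make the matching step concrete, the following is a minimal sketch of open-end DTW followed by a correlation-based similarity, under several assumptions that are not taken from the paper: signals are assumed to be normalized to a common scale beforehand, the local cost is the absolute difference, the open end is chosen by a path-length-normalized cost, and the similarity is the zero-lag normalized cross-correlation of the warped signals. All function names and the toy signals are hypothetical; see [11] for the actual algorithm.

```python
import math

def oe_dtw(measured, ref):
    """Open-end DTW: align all of `measured` to the best prefix of `ref`.
    Returns (end index in ref, warping path of (measured_idx, ref_idx))."""
    n, m = len(measured), len(ref)
    inf = math.inf
    D = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            cost = abs(measured[i] - ref[j])
            if i == 0 and j == 0:
                D[i][j] = cost
                continue
            D[i][j] = cost + min(
                D[i - 1][j] if i > 0 else inf,
                D[i][j - 1] if j > 0 else inf,
                D[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
    # Open end: the measured series may stop anywhere along the reference.
    # Dividing by (n + j + 1) is an assumed normalization.
    end = min(range(m), key=lambda j: D[n - 1][j] / (n + j + 1))
    # Backtrack from (n - 1, end) to (0, 0).
    i, j = n - 1, end
    path = [(i, j)]
    while i > 0 or j > 0:
        steps = []
        if i > 0 and j > 0:
            steps.append((D[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            steps.append((D[i - 1][j], i - 1, j))
        if j > 0:
            steps.append((D[i][j - 1], i, j - 1))
        _, i, j = min(steps)
        path.append((i, j))
    path.reverse()
    return end, path

def ncc(a, b):
    """Zero-lag normalized cross-correlation of two equal-length signals."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0

def similarity(measured, ref):
    """Similarity between the measured signal and one reference path."""
    _, path = oe_dtw(measured, ref)
    return ncc([measured[i] for i, _ in path], [ref[j] for _, j in path])

def most_likely_path(measured, references):
    """Name of the reference path with the highest similarity."""
    return max(references, key=lambda name: similarity(measured, references[name]))

# Toy references (hypothetical): +1 marks a stenosis, -1 a bifurcation.
references = {
    "path_A": [0, 0, 1, 0, 0, -1, 0, 0],
    "path_B": [0, -1, 0, 0, 0, 1, 0, 0],
}
measured = [0, 0, 0, 1, 1, 0, 0, 0, -1, -1, 0]  # time-stretched path_A
best = most_likely_path(measured, references)
```

In an online setting the same comparison would be repeated as new samples arrive; this sketch recomputes the full cost matrix each time and is meant only to show the structure of the computation.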

3 Experiments and Results

3.1 Experimental Setup

The prototype was kept constant for the two experiments presented here (Fig. 3). We used a 6 F cardiac electrophysiology catheter (MutliCath 10J, Biotronik, Berlin, Germany). Its ten ring electrodes were 2 mm wide with 5 mm spacing. The input to the current source was ±5 mV at 430 Hz, and the current source supplied a constant 18 µA to the emitting electrode. A neighboring electrode was grounded. The voltage between the two electrodes was amplified and filtered by a low-power biosignal acquisition system (RHD2000, Intan Technologies, Los Angeles, USA). The Intan software (Intan Interface 1.4.2, Intan Technologies, Los Angeles, USA) logged the signal from the electrodes. A windowed discrete Fourier transform converted the signal into the frequency domain, and the magnitude at the input frequency was extracted from each window. The most likely path was identified as described in Sect. 2.4. While real-time implementation is crucial to navigation, these first experiments involved only post hoc analyses.
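The windowed extraction of the magnitude at the input frequency can be sketched as follows. This is an illustrative example rather than the authors' software: the sampling rate, window length, non-overlapping windows and the synthetic test tone are all assumptions, and only the 430 Hz input frequency comes from the text above.

```python
import cmath
import math

def magnitude_at(window, fs_hz, f0_hz):
    """Magnitude of the DFT of `window` evaluated at f0_hz (single bin),
    scaled so that a sinusoid of amplitude A returns approximately A."""
    n = len(window)
    acc = sum(x * cmath.exp(-2j * math.pi * f0_hz * k / fs_hz)
              for k, x in enumerate(window))
    return 2.0 * abs(acc) / n

def voltage_magnitude_trace(samples, fs_hz, f0_hz, window_len):
    """One magnitude value per non-overlapping window of the recording."""
    return [magnitude_at(samples[s:s + window_len], fs_hz, f0_hz)
            for s in range(0, len(samples) - window_len + 1, window_len)]

# Synthetic recording: a 430 Hz tone whose amplitude doubles halfway.
fs = 4300.0   # assumed sampling rate in Hz (not from the paper)
f0 = 430.0    # input frequency reported in the text
n_total, win = 860, 430   # each window spans a whole number of periods
samples = [(1.0 if k < n_total // 2 else 2.0)
           * math.sin(2 * math.pi * f0 * k / fs)
           for k in range(n_total)]
trace = voltage_magnitude_trace(samples, fs, f0, win)
```

The resulting trace has one entry per window and tracks the amplitude change of the synthetic tone, which is the role the voltage magnitude plays as the test time series in Sect. 2.4.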

Fig. 3. A function generator supplied a sinusoidal signal to the current source, creating a weak electric field around the catheter tip. Electrodes on the catheter recorded the voltage as it was pulled through six paths of the phantom. Inset: catheter in phantom.

3.2 Synthetic Vessel Tree

We performed the first validation experiments in the synthetic vessel tree immersed in 0.9 % saline (Fig. 4). A camera recorded the trajectory of the catheter through the phantom as it advanced through the six main paths at 1–2 mm/s. The OE-DTW algorithm correctly identified the path taken in 25/30 trials. The similarity measure was 0.5245 ± 0.0683 for misidentified trials and 0.6751 ± 0.1051 for correctly identified trials.
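As a quick consistency check, the statistics for the misidentified trials can be recomputed from the five similarity scores listed in Fig. 4B. The sketch below assumes that the reported "±" value is the sample standard deviation (n − 1 in the denominator); the paper does not state this. Under that assumption the scores give a mean of about 0.5245 and a standard deviation of about 0.0683, matching the values in the text.

```python
import statistics

# Similarity scores of the five misidentified trials, as listed in Fig. 4B.
misidentified = [0.4639, 0.4396, 0.5841, 0.5838, 0.5509]

mean = statistics.fmean(misidentified)
std = statistics.stdev(misidentified)  # sample standard deviation (assumed)
accuracy = 25 / 30                     # correctly identified trials / all trials

print(f"similarity = {mean:.4f} ± {std:.4f}, accuracy = {accuracy:.1%}")
```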

[Figure 4 panels: (A) phantom with labeled paths; (B) table of misidentified trials; (C) catheter voltage magnitude versus time (sec) and simulated voltage (µV) versus position (mm); (D) correspondence; (E) warped signals.]

Panel (B):

Actual   Predict   Similarity
1        6         0.4639
2        1         0.4396
5        4         0.5841
5        4         0.5838
5        4         0.5509

Fig. 4. (A) Synthetic phantom with labeled paths. The two halves of the phantom were machined from acrylic and sealed with a thin layer of transparent waterproof grease. When assembled, it measured 10 cm × 25.4 cm × 5 cm. (B) Trials for which OE-DTW incorrectly predicted catheter position. (C) The measured voltage (blue) and the simulated signal (green) identify the two stenoses and four bifurcations. The signals appear correlated but misaligned. (D) The OE-DTW algorithm found a correspondence path between the two signals. (E) OE-DTW aligned the simulated data to the measured data and calculated the cross-correlation between the two signals.

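[Figure 5: biological tissue experiment (left); plots of 1/area (1/mm²) and catheter voltage magnitude versus position (mm) and time (sec), with correspondence and warped-signal panels (right).]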

Fig. 5. Biological tissue experiment (left) and results from one trial in the long path (right). The stenosis and bifurcation are visible in both the inverse of the cross-sectional area and voltage magnitude.

3.3 Ex Vivo Aorta

The impedance difference between saline and vessel is less dramatic than between saline and acrylic, so we expected lower amplitude signals in biological tissue. We sutured two porcine aortas into a Y-shaped vessel tree and simulated a stenosis in the trunk with a cable tie. We embedded the vessel in a 20 % gelatin solution and filled the vessel with 0.9 % saline. The ground truth catheter position was recorded from fluoroscopic image series collected simultaneously with the voltage measurements (Fig. 5). The catheter was advanced six times through the long path and three times through the short path. The algorithm correctly identified the path 9/9 times with similarity measure 0.6081 ± 0.1614.

4 Discussion

This preliminary investigation suggests that the location of the catheter in a blood vessel can be estimated by comparing a series of local measurements to simulated bioimpedance measurements from a pre-interventional image. Our technology will benefit from further integration of sensing and imaging before clinical validation. While OE-DTW did not perfectly predict the location of the catheter, the trials for which the algorithm misclassified the path also had the lowest similarity scores. In practice, the system would prompt the interventionalist to take a fluoroscopic image when similarity is low. Because it measures local changes in bioimpedance, we expect the highest accuracy in feature-rich environments, those most relevant to endovascular procedures. The estimate is least accurate in low-feature environments like a long, uniform vessel, but as soon as the catheter reaches the next landmark, the real-time location prediction is limited only by the resolution of the electric image from the catheter. A possible source of uncertainty is the catheter's position in the vessel cross-section relative to the centerline, but according to our simulations and the literature [12], it does not significantly impact the voltage measurement.

To display the real-time position estimate, our next step is to compare techniques that match simulated and live data in real time (e.g. OE-DTW, Hidden Markov Models, random forests, and particle filters). A limitation of these matching algorithms is that they fail when the catheter changes direction (insertion vs. retraction). One way we plan to address this is by attaching a simple encoder to the introducer sheath to detect the catheter's heading and prompting our software to analyze only data recorded while the catheter is being inserted. We recently validated Bioelectric Navigation in biologically relevant flow in the synthetic phantom and performed successful renal artery detection in the abdominal aorta of a sheep cadaver model. Currently, we are evaluating the prototype's performance in vivo, navigating through the abdominal vasculature of swine.

5 Scientific and Clinical Context

The generation and measurement of bioelectrical signals within vessels and their mapping to a patient-specific vessel model has never been proposed for catheter navigation. This work is complementary to the research done within the MICCAI community and has the potential to advance image-guided intervention. Bioelectric Navigation circumvents many clinical imaging challenges such as catheter detection, motion compensation, and catheter tracking. Significantly, deformable registration for global 3D localization becomes irrelevant; the interventionalist may move a vessel, but the catheter remains in the same vascular branch. Once incorporated into the clinical workflow, Bioelectric Navigation has the potential to significantly reduce fluoroscope use during common endovascular procedures. In addition, it could ease the positioning of complex grafts, for instance a graft to repair an abdominal aortic aneurysm. These custom grafts incorporate holes such that the visceral arterial ostia are not occluded. Angiographic imaging is of limited use for positioning the device because the operator must study the graft markers and arterial anatomy simultaneously. In contrast, when the bioelectric catheter passes a bifurcation, the electric impedance changes dramatically. Bioelectric Navigation's inside-out sensing could change the current practice for device deployment by providing real-time feedback about device positioning from inside the device itself.

References

1. Ambrosini, P., Ruijters, D., Niessen, W.J., Moelker, A., van Walsum, T.: Continuous roadmapping in liver TACE procedures using 2D–3D catheter-based registration. Int. J. CARS 10, 1357–1370 (2015)
2. Aylward, S.R., Jomier, J., Weeks, S., Bullitt, E.: Registration and analysis of vascular images. Int. J. Comput. Vis. 55(2), 123–138 (2003)
3. Dibildox, G., Baka, N., Punt, M., Aben, J., Schultz, C., Niessen, W., van Walsum, T.: 3D/3D registration of coronary CTA and biplane XA reconstructions for improved image guidance. Med. Phys. 41(9), 091909 (2014)
4. Von der Emde, G., Schwarz, S., Gomez, L., Budelli, R., Grant, K.: Electric fish measure distance in the dark. Nature 395(6705), 890–894 (1998)
5. Gabriel, S., Lau, R., Gabriel, C.: The dielectric properties of biological tissues: II. Measurements in the frequency range 10 Hz to 20 GHz. Phys. Med. Biol. 41(11), 2251–2269 (1996)
6. Groher, M., Zikic, D., Navab, N.: Deformable 2D–3D registration of vascular structures in a one view scenario. IEEE Trans. Med. Imaging 28(6), 847–860 (2009)
7. Hettrick, D., Battocletti, J., Ackmann, J., Linehan, J., Waltier, D.: In vivo measurement of real-time aortic segmental volume using the conductance catheter. Ann. Biomed. Eng. 26, 431–440 (1998)
8. Metzen, M., Biswas, S., Bousack, H., Gottwald, M., Mayekar, K., von der Emde, G.: A biomimetic active electrolocation sensor for detection of atherosclerotic lesions in blood vessels. IEEE Sens. J. 12(2), 325–331 (2012)
9. Mitrovic, U., Spiclin, Z., Likar, B., Pernus, F.: 3D–2D registration of cerebral angiograms: a method and evaluation on clinical images. IEEE Trans. Med. Imaging 32(8), 1550–1563 (2013)
10. Pauly, O., Heibel, H., Navab, N.: A machine learning approach for deformable guide-wire tracking in fluoroscopic sequences. In: Jiang, T., Navab, N., Pluim, J.P.W., Viergever, M.A. (eds.) MICCAI 2010, Part III. LNCS, vol. 6363, pp. 343–350. Springer, Heidelberg (2010)
11. Tormene, P., Giorgino, T., Quaglini, S., Stefanelli, M.: Matching incomplete time series with dynamic time warping: an algorithm and an application to post-stroke rehabilitation. Artif. Intell. Med. 45, 11–34 (2009)
12. Choi, H.W., Zhang, Z., Farren, N., Kassab, G.: Implications of complex anatomical junctions on conductance catheter measurements of coronary arteries. J. Appl. Physiol. 114(5), 656–664 (2013)

Process Monitoring in the Intensive Care Unit: Assessing Patient Mobility Through Activity Analysis with a Non-Invasive Mobility Sensor

Austin Reiter1(B), Andy Ma1, Nishi Rawat2, Christine Shrock2, and Suchi Saria1

1 The Johns Hopkins University, Baltimore, MD, USA [email protected]
2 Johns Hopkins Medical Institutions, Baltimore, MD, USA

Abstract. Throughout a patient's stay in the Intensive Care Unit (ICU), accurate measurement of patient mobility, as part of routine care, is helpful in understanding the harmful effects of bedrest [1]. However, mobility is typically measured through observation by a trained and dedicated observer, which is extremely limiting. In this work, we present a video-based automated mobility measurement system called NIMS: Non-Invasive Mobility Sensor. Our main contributions are: (1) a novel multi-person tracking methodology designed for complex environments with occlusion and pose variations, and (2) an application of human-activity attributes in a clinical setting. We demonstrate NIMS on data collected from an active patient room in an adult ICU and show a high inter-rater reliability using a weighted Kappa statistic of 0.86 for automatic prediction of the highest level of patient mobility as compared to clinical experts.

Keywords: Activity recognition · Tracking · Patient safety

1 Introduction

Monitoring human activities in complex environments is attracting increasing interest [2,3]. Our current investigation is driven by automated hospital surveillance, specifically for critical care units that house the sickest and most fragile patients. In 2012, the Institute of Medicine released their landmark report [4] on developing digital infrastructures that enable rapid learning health systems; one of their key postulates is the need for improvement technologies for measuring the care environment. Currently, simple measures such as whether the patient has moved in the last 24 h, or whether the patient has gone unattended for several hours, require manual observation by a nurse, which is highly impractical to scale. Early mobilization of critically ill patients has been shown to reduce physical impairments and decrease length of stay [5]; however, the reliance on direct observation limits the amount of data that may be collected [6].

To automate this process, non-invasive low-cost camera systems have begun to show promise [7,8], though current approaches are limited due to the unique challenges common to complex environments. First, though person detection in images is an active research area [9,10], significant occlusions present limitations because the expected appearances of people do not match what is observed in the scene. Part-based deformable methods [11] do somewhat address these issues and also provide support for articulation; however, when deformation is combined with occlusion, these too suffer for similar reasons.

This paper presents two main contributions towards addressing challenges common to complex environments. First, using an RGB-D sensor, we demonstrate a novel methodology for human tracking systems which accounts for variations in occlusion and pose. We combine multiple detectors and model their deformable spatial relationship with a temporal consistency so that individual parts may be occluded at any given time, even through articulation. Second, we apply an attribute-based framework to supplement the tracking information in order to recognize activities, such as mobility events in a complex clinical environment. We call this system NIMS: A Non-Invasive Mobility Sensor.

© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 482–490, 2016. DOI: 10.1007/978-3-319-46720-7_56

1.1 Related Work

Currently, few techniques exist to automatically and accurately monitor ICU patients' mobility. Accelerometry is one method that has been validated [12], but it has limited use in critically ill inpatient populations [6]. Related to multi-person tracking, methods have been introduced to leverage temporal cues [13,14]; however, hand-annotated regions are typically required at the onset, limiting automation. To avoid manual initializations, techniques such as [15,16] employ a single per-frame detector with temporal constraints. Because single detectors are limited in handling appearance variations, [15] proposes to make use of multiple detectors; however, this assumes that the spatial configuration between the detectors is fixed, which does not scale to address significant pose variations.

Much activity analysis research has approached action classification with bag-of-words approaches. Typically, spatio-temporal features, such as Dense Trajectories [17], are used with a histogram of dictionary elements or a Fisher Vector encoding [17]. Recent work has applied Convolutional Neural Network (CNN) models to the video domain [18,19] by utilizing both spatial and temporal information within the network topology. Other work uses Recurrent Neural Networks with Long Short Term Memory [20] to model sequences over time. Because the "activities" addressed in this paper are more high-level in nature, traditional spatio-temporal approaches often suffer. Attributes describe high-level properties that have been demonstrated for activities [21], but these tend to ignore contextual information.

The remainder of this paper is organized as follows: first, we describe our multi-person tracking framework, followed by our attributes, and motivate their use in the clinical setting to predict mobility. We then describe our data collection protocol and experimental results, and conclude with discussions and future directions.

A. Reiter et al.

2 Methods

Figure 1 shows an overview of our NIMS system. People are localized, tracked, and identified using an RGB-D sensor. We predict the pose of the patient and identify nearby objects to serve as context. Finally, we analyze in-place motion and train a classifier to determine the highest level of patient mobility.

[Figure 1 panels: Input Video → 1. Person Localization → 2. Patient Identification (Caregiver, Patient) → 3. Patient Pose Classification and Context Detection (e.g. Patient Sitting in Chair, Bed up without Patient) → 4. Motion Analysis (motion in pixels/s over frames, with a motion vs. non-motion threshold separating in-bed activity from no in-bed activity) → 5. Mobility Classification → Mobility Level]

Fig. 1. Flowchart of our mobility prediction framework. Our system tracks people in the patient’s room, identifies the “role” of each (“patient”, “caregiver”, or “family member”), relevant objects, and builds attribute features for mobility classification.

2.1 Multi-person Tracking by Fusing Multiple Detectors

Our tracking method works by formulating an energy functional comprising spatial and temporal consistency over multiple part-based detectors (see Fig. 2). We model the relationship between detectors within a single frame using a deformable spatial model and then track in an online setting.

Fig. 2. Full-body (red) and Head (green) detectors trained by [11]. The head detector may fail with (a) proximity or (d) distance. The full-body detector may also struggle with proximity [(b) and (c)]. (To protect privacy, all images are blurred). (Color figure online)

Modeling Deformable Spatial Configurations - For objects that exhibit deformation, such as humans, there is an expected spatial structure between regions of interest (ROIs) (e.g., head, hands, etc.) across pose variations. Within each pose (e.g. lying, sitting, or standing), we can speculate about an ROI (e.g. head) based on other ROIs (e.g. full-body). To model such relationships, we assume that there is a projection matrix $A^c_{ll'}$ which maps the location of ROI $l$ to that of $l'$ for a given pose $c$. With a training dataset, $C$ types of poses are determined automatically by clustering location features [10], and the projection matrix $A^c_{ll'}$ can be learnt by solving a regularized least-squares optimization problem.

To derive the energy function of our deformable model, we denote the number of persons in the $t$-th frame as $M^t$. For the $m$-th person, the set of corresponding bounding boxes from $L$ ROIs is defined by $X^t = \{X^t_1(m), \cdots, X^t_L(m)\}$. For any two proposed bounding boxes $X^t_{l'}(m)$ and $X^t_l(m)$ at frame $t$ for individual $m$, the deviation from the expected spatial configuration is quantified as the error between the expected location of the bounding box for the second ROI conditioned on the first. The total cost is computed by summing, for each of the $M^t$ individuals, the minimum cost over the $C$ subcategories:

$$E_{spa}(X^t, M^t) = \sum_{m=1}^{M^t} \min_{1 \le c \le C} \sum_{l \ne l'} \left\| A^c_{ll'} X^t_l(m) - X^t_{l'}(m) \right\|^2 \qquad (1)$$
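The spatial cost in Eq. (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: bounding boxes are taken to be 4-vectors (x, y, width, height), each projection is a plain 4x4 matrix, and the names `spatial_energy`, `projections` and `diag` are hypothetical.

```python
from itertools import permutations


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def squared_distance(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def spatial_energy(people, projections):
    """Eq. (1) for one frame.

    people: one entry per person, each a list of L boxes (x, y, w, h).
    projections: projections[c][l][lp] is the assumed 4x4 matrix that maps
    ROI l to ROI lp under pose c (entries with l == lp are unused).
    """
    total = 0.0
    for boxes in people:
        roi_pairs = list(permutations(range(len(boxes)), 2))  # all l != l'
        pose_costs = [
            sum(squared_distance(matvec(pose[l][lp], boxes[l]), boxes[lp])
                for l, lp in roi_pairs)
            for pose in projections
        ]
        total += min(pose_costs)  # best-fitting pose for this person
    return total


def diag(values):
    """Diagonal matrix helper used only to build the toy projections."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


# Toy setup (made-up numbers): ROI 0 is the full body, ROI 1 is the head.
scaled = [[None, diag([1, 1, 0.5, 0.25])], [diag([1, 1, 2, 4]), None]]
identity = [[None, diag([1, 1, 1, 1])], [diag([1, 1, 1, 1]), None]]
projections = [scaled, identity]
people = [
    [(10, 20, 40, 80), (10, 20, 20, 20)],  # fits the "scaled" pose exactly
    [(0, 0, 40, 80), (0, 0, 20, 22)],      # head is slightly too tall
]
energy = spatial_energy(people, projections)
```

The first person contributes no cost because the head box is exactly where the first pose predicts it; the second person contributes only the small mismatch in head height.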

Grouping Multiple Detectors - Next we wish to automate the process of detecting people to track using a combination of multiple part-based detectors. A collection of existing detection methods [11] can be employed to train $K$ detectors; each detector is geared towards detecting an ROI. Let us consider two bounding boxes $D^t_k(n)$ and $D^t_{k'}(n')$ from any two detectors $k$ and $k'$, respectively. If these are from the same person, the overlapped region is large when they are projected to the same ROI using our projection matrix. In this case, the average depths in these two bounding boxes are similar. We calculate the probability that these are from the same person as:

$$p = a\, p_{over} + (1 - a)\, p_{depth} \qquad (2)$$

where $a$ is a positive weight, and $p_{over}$ and $p_{depth}$ measure the overlapping ratio and depth similarity between two bounding boxes, respectively. These scores are:

$$p_{over} = \max\left( \frac{\left| A^c_{\ell(k)\ell(k')} D^t_k(n) \cap D^t_{k'}(n') \right|}{\min\left( \left| A^c_{\ell(k)\ell(k')} D^t_k(n) \right|, \left| D^t_{k'}(n') \right| \right)},\; \frac{\left| D^t_k(n) \cap A^c_{\ell(k')\ell(k)} D^t_{k'}(n') \right|}{\min\left( \left| D^t_k(n) \right|, \left| A^c_{\ell(k')\ell(k)} D^t_{k'}(n') \right| \right)} \right) \qquad (3)$$

$$p_{depth} = \frac{1}{2} \exp\left( \frac{-\left( v^t_k(n) - v^t_{k'}(n') \right)^2}{2\, \sigma^t_k(n)^2} \right) + \frac{1}{2} \exp\left( \frac{-\left( v^t_k(n) - v^t_{k'}(n') \right)^2}{2\, \sigma^t_{k'}(n')^2} \right) \qquad (4)$$

where $\ell$ maps the $k$-th detector to the $l$-th region-of-interest, and $v$ and $\sigma$ denote the mean and standard deviation of the depth inside a bounding box, respectively. By the proximity measure given by (2), we group the detection outputs into $N^t$ sets of bounding boxes. In each group $G^t(n)$, the bounding boxes are likely from the same person. Then, we define a cost function that represents the matching relationships between the true positions of our tracker and the candidate locations suggested by the individual detectors as:

$$E_{det}(X^t, M^t) = \sum_{n=1}^{N^t} \min_{1 \le m \le M^t} \sum_{D^t_k(n') \in G^t(n)} w^t_k(n) \left\| D^t_k(n') - X^t_{\ell(k)}(m) \right\|^2 \qquad (5)$$

where $w^t_k(n)$ is the detection score as a penalty for each detected bounding box.
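To make the proximity measure of Eqs. (2) to (4) concrete, the following Python sketch computes it for a pair of detections. It rests on several assumptions that are not in the text: boxes are axis-aligned corner tuples (x1, y1, x2, y2), the projections are passed in as plain callables standing in for the learned matrices, and the default weight `a=0.5` is an arbitrary placeholder.

```python
import math


def area(box):
    """Area of an axis-aligned box (x1, y1, x2, y2); zero if it is empty."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def intersection_area(a, b):
    """Area of the overlap between two boxes."""
    overlap = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return area(overlap)


def overlap_ratio(a, b):
    """Intersection area divided by the smaller of the two box areas."""
    smaller = min(area(a), area(b))
    return intersection_area(a, b) / smaller if smaller > 0 else 0.0


def p_over(box_k, box_kp, project_k_to_kp, project_kp_to_k):
    """Eq. (3): best overlap after projecting either box onto the other ROI."""
    return max(
        overlap_ratio(project_k_to_kp(box_k), box_kp),
        overlap_ratio(box_k, project_kp_to_k(box_kp)),
    )


def p_depth(mean_k, std_k, mean_kp, std_kp):
    """Eq. (4): average of two Gaussian-style depth similarity terms."""
    diff_sq = (mean_k - mean_kp) ** 2
    return 0.5 * math.exp(-diff_sq / (2 * std_k ** 2)) + \
        0.5 * math.exp(-diff_sq / (2 * std_kp ** 2))


def same_person_probability(overlap_score, depth_score, a=0.5):
    """Eq. (2): weighted combination; a = 0.5 is an assumed value."""
    return a * overlap_score + (1 - a) * depth_score


def same_roi(box):
    """Trivial projection used when both detectors target the same ROI."""
    return box


# Toy example (made-up numbers): half-overlapping boxes at the same depth.
overlap = p_over((0, 0, 10, 10), (5, 0, 15, 10), same_roi, same_roi)
depth = p_depth(2.0, 0.1, 2.0, 0.2)
probability = same_person_probability(overlap, depth)
```

Detections would then be grouped whenever this probability is high enough; the grouping threshold is not stated in the text, so it is left out here.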


Tracking Framework - We initialize our tracker at time $t = 1$ by aggregating the spatial (Eq. 1) and detection matching (Eq. 5) cost functions. To determine the best bounding box locations at time $t$ conditioned on the inferred bounding box locations at time $t - 1$, we extend the temporal trajectory $E_{dyn}$ and appearance $E_{app}$ energy functions from [16] and solve the joint optimization (definitions of $E_{exc}$, $E_{reg}$, $E_{dyn}$, $E_{app}$ are left out for space considerations) as:

$$\min_{X^t, M^t} \; \lambda_{det} E_{det} + \lambda_{spa} E_{spa} + \lambda_{exc} E_{exc} + \lambda_{reg} E_{reg} + \lambda_{dyn} E_{dyn} + \lambda_{app} E_{app} \qquad (6)$$

We refer the interested reader to [22] for more details on our tracking framework.

2.2 Activity Analysis by Contextual Attributes

We describe the remaining steps for our NIMS system here.

Patient Identification - We fine-tune a pre-trained CNN [24] based on the architecture in [25], which is initially trained on ImageNet (http://image-net.org/). From our RGB-D sensor, we use the color images to classify images of people into one of the following categories: patient, caregiver, or family-member. Given each track from our multi-person tracker, we extract a small image according to the tracked bounding box to be classified. By understanding the role of each person, we can tune our activity analysis to focus on the patient as the primary "actor" in the scene and treat the caregivers as playing supplementary roles.

Patient Pose Classification and Context Detection - Next, we seek to estimate the pose of the patient, and so we fine-tune a pre-trained network to classify our depth images into one of the following categories: lying-down, sitting, or standing. We choose depth over color as this is a geometric decision. To supplement our final representation, we apply a real-time object detector [24] to localize important objects that supplement the state of the patient, such as: bed upright, bed down, and chair. By combining bounding boxes identified as people with bounding boxes of objects, the NIMS may better ascertain if a patient is, for example, "lying-down in a bed down" or "sitting in a chair".

Motion Analysis - Finally, we compute in-place body motion. For example, if a patient is lying in bed for a significant period of time, clinicians are interested in how much in-bed exercise occurs [23]. To achieve this, we compute the mean magnitude of motion with a dense optical flow field within the bounding box of the tracked patient between successive frames in the sequence. This statistic indicates how much frame-to-frame in-place motion the patient is exhibiting.
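The motion statistic described under Motion Analysis can be sketched as follows. This is a minimal illustration that assumes the dense flow field has already been computed by some optical flow method (the text does not name one) and is stored as `flow[y][x] = (dx, dy)`; the function name and the box convention with exclusive upper bounds are assumptions.

```python
import math


def mean_flow_magnitude(flow, box):
    """Mean optical-flow magnitude inside a tracked bounding box.

    flow: 2D grid where flow[y][x] is the (dx, dy) displacement of a pixel
    between two successive frames.
    box: (x1, y1, x2, y2) in pixel coordinates, upper bounds exclusive.
    """
    x1, y1, x2, y2 = box
    magnitudes = [
        math.hypot(*flow[y][x])
        for y in range(y1, y2)
        for x in range(x1, x2)
    ]
    return sum(magnitudes) / len(magnitudes) if magnitudes else 0.0


# Toy 4x4 flow field (made-up numbers): two moving pixels inside the box.
flow = [[(0.0, 0.0)] * 4 for _ in range(4)]
flow[1][1] = (3.0, 4.0)
flow[2][2] = (3.0, 4.0)
patient_motion = mean_flow_magnitude(flow, (1, 1, 3, 3))
```

Restricting the average to the patient's box keeps motion from caregivers elsewhere in the room out of the statistic, provided their boxes do not overlap the patient's.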
Mobility Classification - [23] describes a clinically accepted 11-point mobility scale (the ICU Mobility Scale), shown on the right of Table 1. We collapsed this scale into our Sensor Scale (left), which has 4 discrete categories. The motivation for this collapse was that when a patient walks, this is often performed outside the room, where our sensors cannot see. By aggregating the different sources of information described in the preceding steps, we construct our attribute feature $F_t$ with:

Table 1. Table comparing our Sensor Scale, containing the 4 discrete levels of mobility that the NIMS is trained to categorize from a video clip of a patient in the ICU, to the standardized ICU Mobility Scale [23], used by clinicians in practice today.

Sensor Scale             ICU Mobility Scale
A. Nothing in bed        0. Nothing (lying in bed)
B. In-bed activity       1. Sitting in bed, exercises in bed
C. Out-of-bed activity   2. Passively moved to chair (no standing)
                         3. Sitting over edge of bed
                         4. Standing
                         5. Transferring bed to chair (with standing)
                         6. Marching in place (at bedside) for short duration
D. Walking               7. Walking with assistance of 2 or more people
                         8. Walking with assistance of 1 person
                         9. Walking independently with a gait aid
                         10. Walking independently without a gait aid

1. Was a patient detected in the image? (0 for no; 1 for yes)
2. What was the patient's pose? (0 for sitting; 1 for standing; 2 for lying-down; 3 for no patient found)
3. Was a chair found? (0 for no; 1 for yes)
4. Was the patient in a bed? (0 for no; 1 for yes)
5. Was the patient in a chair? (0 for no; 1 for yes)
6. Average patient motion value
7. Number of caregivers present in the scene

We chose these attributes because their combination describes the "state" of the activity. Given a video segment of length $T$, all attributes $F = [F_1, F_2, \ldots, F_T]$ are extracted and the mean $F_\mu = \sum_{t=1}^{T} F_t / T$ is used to represent the overall video segment (the mean is used to account for spurious errors that may occur). We then train a Support Vector Machine (SVM) to automatically map each $F_\mu$ to the corresponding Sensor Scale mobility level from Table 1.
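The two bookkeeping steps described here, collapsing the ICU Mobility Scale into the Sensor Scale of Table 1 and averaging the per-frame attribute vectors, can be sketched as below. The function names are hypothetical and the SVM training itself is left out; only the label mapping and the mean feature are shown.

```python
def sensor_level(icu_score):
    """Collapse an ICU Mobility Scale score (0 to 10) into Sensor Scale A to D."""
    if not 0 <= icu_score <= 10:
        raise ValueError("ICU Mobility Scale scores range from 0 to 10")
    if icu_score == 0:
        return "A"  # nothing in bed
    if icu_score == 1:
        return "B"  # in-bed activity
    if icu_score <= 6:
        return "C"  # out-of-bed activity
    return "D"      # walking


def mean_attribute_feature(frame_features):
    """Average the per-frame attribute vectors F_1..F_T into F_mu."""
    count = len(frame_features)
    return [sum(column) / count for column in zip(*frame_features)]


# Two made-up frames with the 7 attributes in the order listed above:
# detected, pose, chair found, in bed, in chair, motion, caregivers.
frames = [
    [1, 2, 0, 1, 0, 0.5, 1],
    [1, 2, 0, 1, 0, 1.5, 0],
]
segment_feature = mean_attribute_feature(frames)
segment_label = sensor_level(1)
```

Each video segment then yields one (`segment_feature`, `segment_label`) pair, which is the kind of input a standard multi-class classifier such as an SVM expects.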

3 Experiments and Discussions

Video data was collected from a surgical ICU at a large tertiary care hospital. All ICU staff and patients consented to participate in our IRB-approved study. A Kinect sensor was mounted on the wall of a private patient room and was connected to a dedicated encrypted computer where data was de-identified and encrypted. We recorded 362 h of video and manually curated 109 video segments covering 8 patients. Of these 8 patients, we use 3 as training data for the NIMS components (Sect. 2) and the remaining 5 for evaluation.

Training - To train the person, patient, pose, and object detectors we selected 2000 images from the 3 training patients to cover a wide range of appearances.


We manually annotated: (1) head and full body bounding boxes; (2) person identification labels; (3) pose labels; and (4) chair, upright, and down beds. To train the NIMS Mobility classifier, 83 of the 109 video segments covering the 5 left-out patients were selected, each containing 1000 images. For each clip, a senior clinician reviewed and reported the highest level of patient mobility, and we trained our mobility classifier through leave-one-out cross validation.

Tracking, Pose, and Identification Evaluation - We quantitatively compared our tracking framework to the current state of the art. We evaluate with the widely used MOTA metric (Multiple Object Tracking Accuracy) [26], which is defined as 100 % minus three types of errors: false positive rate, missed detection rate, and identity switch rate. On our ICU dataset, we achieved a MOTA of 29.14 % compared to −18.88 % with [15] and −15.21 % with [16]. On a popular RGB-D Pedestrian Dataset [27], we achieved a MOTA of 26.91 % compared to 20.20 % [15] and 21.68 % [16]. We believe the larger margin of improvement on the ICU data is due to there being many more occlusions in our ICU data than in [27]. For person identification and pose classification, we achieved 99 % and 98 % test accuracy, respectively, over 1052 samples. Our tracking framework requires an average runtime of 10 s per frame; speeding this up to real time is a point of future work.

Table 2. Confusion matrix demonstrating clinician and sensor agreement.

                A. Nothing   B. In-Bed   C. Out-of-Bed   D. Walking
A. Nothing      18           4           0               0
B. In-Bed       3            25          2               0
C. Out-of-Bed   0            1           25              1
D. Walking      0            0           0               4
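The MOTA definition used in the tracking evaluation above can be written out as a short function, which also shows how negative scores such as those reported for [15] and [16] on the ICU data can arise. This sketch follows the usual CLEAR MOT convention of dividing the summed error counts by the number of ground-truth objects; the counts in the example are made up and are not taken from the paper.

```python
def mota(false_positives, misses, id_switches, num_ground_truth):
    """MOTA in percent: 100 % minus the three error rates.

    All counts are accumulated over every frame of the sequence, and
    num_ground_truth is the total number of ground-truth object instances.
    """
    if num_ground_truth <= 0:
        raise ValueError("need at least one ground-truth object")
    errors = false_positives + misses + id_switches
    return 100.0 * (1.0 - errors / num_ground_truth)


# Hypothetical counts for illustration only.
moderate_tracker = mota(false_positives=30, misses=40, id_switches=5,
                        num_ground_truth=100)
noisy_tracker = mota(false_positives=80, misses=35, id_switches=5,
                     num_ground_truth=100)
```

Because false positives are not bounded by the number of ground-truth objects, a tracker that hallucinates many detections can accumulate more errors than there are objects, which pushes MOTA below zero.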

Mobility Evaluation - Table 2 shows a confusion matrix for the 83 video segments to demonstrate the inter-rater reliability between the NIMS and clinician ratings. We evaluated the NIMS using a weighted Kappa statistic with a linear weighting scheme [28]. The strength of agreement for the Kappa score was qualitatively interpreted as: 0.0–0.20 slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, and 0.81–1.0 almost perfect [28]. Our weighted Kappa was 0.8616 with a 95 % confidence interval of (0.72, 1.0). To compare to a popular technique, we computed features using Dense Trajectories [17] and trained an SVM (using Fisher Vector encodings with 120 GMMs), achieving a weighted Kappa of 0.645 with a 95 % confidence interval of (0.43, 0.86). The main source of disagreement was in differentiating "A" from "B". This disagreement highlights a major difference between human and machine observation: the NIMS distinguishes activities containing motion from those that do not using a quantitative, repeatable approach.
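As a worked check, the linearly weighted Kappa can be recomputed directly from the counts in Table 2. The sketch below uses the standard definition with weights $1 - |i - j| / (k - 1)$; the function name is hypothetical, and the confidence interval is not reproduced here.

```python
def linear_weighted_kappa(matrix):
    """Cohen's Kappa with linear agreement weights for a square count matrix."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    observed = 0.0
    expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = 1.0 - abs(i - j) / (k - 1)  # 1 on the diagonal, 0 at the corners
            observed += weight * matrix[i][j] / total
            expected += weight * row_totals[i] * col_totals[j] / (total * total)
    return (observed - expected) / (1.0 - expected)


# Counts from Table 2 (levels A, B, C, D along both axes).
TABLE_2 = [
    [18, 4, 0, 0],
    [3, 25, 2, 0],
    [0, 1, 25, 1],
    [0, 0, 0, 4],
]
kappa = linear_weighted_kappa(TABLE_2)
```

With these counts the result should land close to the reported 0.8616. Linear weighting gives partial credit to the 11 segments that are off by one level, and Table 2 contains no disagreement larger than one level.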

4 Conclusions

In this paper, we demonstrated a video-based activity monitoring system called NIMS. With respect to the main technical contributions, our multi-person tracking methodology addresses the real-world problem of tracking humans in complex environments where occlusions and rapidly changing visual information occur. We will continue to develop our attribute-based activity analysis for more general activities, work to apply this technology to rooms with multiple patients, and explore the possibility of quantifying patient/provider interactions.

References

1. Brower, R.: Consequences of bed rest. Crit. Care Med. 37(10), S422–S428 (2009)
2. Corchado, J., Bajo, J., De Paz, Y., Tapia, D.: Intelligent environment for monitoring Alzheimer patients, agent technology for health care. Decis. Support Syst. 44(2), 382–396 (2008)
3. Hwang, J., Kang, J., Jang, Y., Kim, H.: Development of novel algorithm and real-time monitoring ambulatory system using bluetooth module for fall detection in the elderly. In: IEEE EMBS (2004)
4. Smith, M., Saunders, R., Stuckhardt, K., McGinnis, J.: Best Care at Lower Cost: the Path to Continuously Learning Health Care in America. National Academies Press, Washington, DC (2013)
5. Hashem, M., Nelliot, A., Needham, D.: Early mobilization and rehabilitation in the intensive care unit: moving back to the future. Respir. Care 61, 971–979 (2016)
6. Berney, S., Rose, J., Bernhardt, J., Denehy, L.: Prospective observation of physical activity in critically ill patients who were intubated for more than 48 hours. J. Crit. Care 30(4), 658–663 (2015)
7. Chakraborty, I., Elgammal, A., Burd, R.: Video based activity recognition in trauma resuscitation. In: International Conference on Automatic Face and Gesture Recognition (2013)
8. Lea, C., Facker, J., Hager, G., et al.: 3D sensing algorithms towards building an intelligent intensive care unit. In: AMIA Joint Summits Translational Science Proceedings (2013)
9. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE CVPR (2005)
10. Chen, X., Mottaghi, R., Liu, X., et al.: Detect what you can: detecting and representing objects using holistic models and body parts. In: IEEE CVPR (2014)
11. Felzenszwalb, P., Girshick, R., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part-based models. PAMI 32(9), 1627–1645 (2010)
12. Verceles, A., Hager, E.: Use of accelerometry to monitor physical activity in critically ill subjects: a systematic review. Respir. Care 60(9), 1330–1336 (2015)
13. Babenko, D., Yang, M., Belongie, S.: Robust object tracking with online multiple instance learning. PAMI 33(8), 1619–1632 (2011)
14. Lu, Y., Wu, T., Zhu, S.: Online object tracking, learning and parsing with and-or graphs. In: IEEE CVPR (2014)
15. Choi, W., Pantofaru, C., Savarese, S.: A general framework for tracking multiple people from a moving camera. PAMI 35(7), 1577–1591 (2013)
16. Milan, A., Roth, S., Schindler, K.: Continuous energy minimization for multi-target tracking. TPAMI 36(1), 58–72 (2014)
17. Wang, H., Schmid, C.: Action recognition with improved trajectories. In: IEEE ICCV (2013)
18. Karpathy, A., Toderici, G., Shetty, S., et al.: Large-scale video classification with convolutional neural networks. In: IEEE CVPR (2014)
19. Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recognition in videos. In: NIPS (2014)
20. Wu, Z., Wang, X., Jiang, Y., Ye, H., Xue, X.: Modeling spatial-temporal clues in a hybrid deep learning framework for video classification. In: ACMMM (2015)
21. Liu, J., Kuipers, B., Savarese, S.: Recognizing human actions by attributes. In: IEEE CVPR (2011)
22. Ma, A.J., Yuen, P.C., Saria, S.: Deformable distributed multiple detector fusion for multi-person tracking (2015). arXiv:1512.05990 [cs.CV]
23. Hodgson, C., Needham, D., Haines, K., et al.: Feasibility and inter-rater reliability of the ICU mobility scale. Heart Lung 43(1), 19–24 (2014)
24. Girshick, R.: Fast R-CNN (2015). arXiv:1504.08083
25. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: NIPS (2012)
26. Keni, B., Rainer, S.: Evaluating multiple object tracking performance: the CLEAR MOT metrics. EURASIP J. Image Video Proces. 2008, 1–10 (2008)
27. Spinello, L., Arras, K.O.: People detection in RGB-D data. In: IROS (2011)
28. McHugh, M.: Interrater reliability: the Kappa statistic. Biochemia Med. 22(3), 276–282 (2012)

Patient MoCap: Human Pose Estimation Under Blanket Occlusion for Hospital Monitoring Applications

Felix Achilles1,2(B), Alexandru-Eugen Ichim3, Huseyin Coskun1, Federico Tombari1,4, Soheyl Noachtar2, and Nassir Navab1,5

1 Computer Aided Medical Procedures, Technische Universität München, Munich, Germany [email protected]
2 Department of Neurology, Ludwig-Maximilians-University of Munich, Munich, Germany
3 Graphics and Geometry Laboratory, EPFL, Lausanne, Switzerland
4 DISI, University of Bologna, Bologna, Italy
5 Computer Aided Medical Procedures, Johns Hopkins University, Baltimore, USA

Abstract. Motion analysis is typically used for a range of diagnostic procedures in the hospital. While automatic pose estimation from RGB-D input has entered the hospital in the domain of rehabilitation medicine and gait analysis, no such method is available for bed-ridden patients. However, patient pose estimation in the bed is required in several fields such as sleep laboratories, epilepsy monitoring and intensive care units. In this work, we propose a learning-based method that automatically infers 3D patient pose from depth images. To this end we rely on a combination of a convolutional neural network and a recurrent neural network, which we train on a large database that covers a range of motions in the hospital bed. We compare to a state-of-the-art pose estimation method which is trained on the same data and show the superior result of our method. Furthermore, we show that our method can estimate the joint positions under a simulated occluding blanket with an average joint error of 7.56 cm.

Keywords: Pose estimation · Motion capture · Occlusion · CNN · RNN · Random forest

1 Introduction

Human motion analysis in the hospital is required in a broad range of diagnostic procedures. While gait analysis and the evaluation of coordinated motor functions [1,2] allow the patient to move around freely, the diagnosis of sleep-related motion disorders and movement during epileptic seizures [3] requires hospitalization and a long-term stay of the patient. In specialized monitoring units, the movements of hospitalized patients are visually evaluated in order to detect critical events and to analyse parameters such as lateralization, movement extent or the occurrence of pathological patterns. As the analysis of patient movements can be highly subjective [4], several groups have developed semi-automatic methods in order to provide quantified analysis. However, none of the above works has attempted a full body joint regression, which would be necessary for automatic and objective quantification of patient movement.

In this work, we propose a new system for fully automatic continuous pose estimation of hospitalized patients, purely based on visual data. In order to capture the constrained body movements in the hospital bed, we built up a large motion database that comprises synchronized data from a motion capture system and a depth sensor. We use a novel combination of a deep convolutional neural network and a recurrent network in order to discriminatively predict the patient body pose in a temporally smooth fashion. Furthermore, we augment our dataset with blanket occlusion sequences, and show that our approach can learn to infer body pose even under an occluding blanket.

Our contributions can be summarized as follows: (1) proposing a novel framework based on deep learning for real-time regression of 3D human pose from depth video, (2) collecting a large dataset of movement sequences in a hospital bed, consisting of synchronized depth video and motion capture data, (3) developing a method for synthetic occlusion of the hospital bed frames with a simulated blanket model, (4) evaluating our new approach against a state-of-the-art pose estimation method based on Random Forests.

© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 491–499, 2016. DOI: 10.1007/978-3-319-46720-7_57

2 Related Work

Human pose estimation in the hospital bed has only been approached as a classification task, which allows estimation of a rough pose or of the patient status [5,6]. Li et al. [5] use the Kinect sensor SDK in order to retrieve the patient pose and estimate the corresponding status. However, they are required to leave the test subjects uncovered by a blanket, which reduces the practical value for real hospital scenarios. Yu et al. [6] develop a method to extract torso and head locations and use it to measure breathing motion and to differentiate sleeping positions. No attempt was made to infer precise body joint locations, and blanket occlusion was reported to decrease the accuracy of the torso detection.

While the number of previous works that aim at human pose estimation for bed-ridden subjects is limited, the popularity of depth sensors has pushed research on background-free 3D human pose estimation. Shotton et al. [7] and Girshick et al. [8] train Random Forests on a large non-public synthetic dataset of depth frames in order to capture a diverse range of human shapes and poses. In contrast to their method, we rely on a realistic dataset that was specifically created to evaluate methods for human pose estimation in bed. Furthermore, we augment the dataset with blanket occlusions and aim at making it publicly available.

More recently, deep learning has entered the domain of human pose estimation. Belagiannis et al. [9] use a convolutional neural network (CNN) and devise a robust loss function to regress 2D joint positions in RGB images. Such one-shot estimations, however, do not leverage temporal consistency. In the work of Fragkiadaki et al. [10], the authors rely on a recurrent neural network (RNN) to improve pose prediction on RGB video. However, in their setting the task is formulated as a classification problem for each joint, which results in a coarse detection on a 12 × 12 grid. Our method, in contrast, produces accurate 3D joint predictions in the continuous domain and is able to handle blanket occlusions that occur in hospital monitoring settings.

3 Methods

3.1 Convolutional Neural Network

A convolutional neural network is trained for the objective of one-shot pose estimation in 3D. The network directly predicts all 14 joint locations $y$ which are provided by the motion capture system. We use an L2 objective during stochastic gradient descent training. Incorrect joint predictions $\hat{y}$ result in a gradient $g = 2 \cdot (\hat{y} - y)$, which is used to optimize the network weights via backpropagation. An architecture of three convolutional layers followed by two fully connected layers proved successful for this task. The layers are configured as [9-9-64]/[3-3-128]/[3-3-128]/[13-5-1024]/[1024-42] in terms of [height-width-channels]. A [2x2] max pooling is applied after each convolution. In order to achieve better generalization of our network, we use a dropout function before the second and before the fifth layer during training, which randomly switches off features with a probability of 50 %. Rectified linear units are used after every learned layer in order to allow for non-linear mappings of input and output. In total, the CNN has 8.8 million trainable weights. After convergence, we use the 1024-element feature of the 4th layer and pass it to a recurrent neural network in order to improve the temporal consistency of our joint estimations. An overview of the full pipeline of motion capture and depth video acquisition as well as the combination of convolutional and recurrent neural network is shown in Fig. 1.

3.2 Recurrent Neural Network

While convolutional neural networks are capable of learning and exploiting local spatial correlations of data, their design does not allow them to learn temporal dependencies. Recurrent neural networks, on the other hand, are specifically modeled to process time-series data and can hence complement convolutional networks. Their cyclic connections allow them to capture long-range dependencies by propagating a state vector. Our RNN is built as a Long Short Term Memory (LSTM) network and its implementation closely follows the one described in Graves et al. [11]. We use the 1024-element input vector of the CNN and train 128 hidden LSTM units to predict the 42-element output consisting of the x-, y- and z-coordinates of each of the 14 joints. The number of trainable weights of our RNN is around 596,000. During training, backpropagation through time is limited to 20 frames.
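The weight counts quoted for the two networks can be sanity-checked with a few lines of arithmetic. The sketch below makes several assumptions that the text does not spell out: the depth input has a single channel, the fourth layer acts as a 13 × 5 kernel over the 128 channels of the third layer, CNN biases are ignored, and the LSTM is a plain four-gate cell without peephole connections followed by a linear output layer.

```python
def conv_weights(height, width, in_channels, out_channels):
    """Number of kernel weights in a convolutional layer (biases excluded)."""
    return height * width * in_channels * out_channels


# CNN configured as [9-9-64]/[3-3-128]/[3-3-128]/[13-5-1024]/[1024-42].
cnn_layers = {
    "conv1": conv_weights(9, 9, 1, 64),        # single-channel depth input (assumed)
    "conv2": conv_weights(3, 3, 64, 128),
    "conv3": conv_weights(3, 3, 128, 128),
    "fc4": conv_weights(13, 5, 128, 1024),     # fully connected over a 13x5x128 map
    "fc5": 1024 * 42,                          # 14 joints x 3 coordinates
}
cnn_total = sum(cnn_layers.values())


def lstm_weights(num_inputs, num_hidden):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * num_hidden * (num_inputs + num_hidden + 1)


# LSTM with 1024 inputs and 128 hidden units, plus a linear 128 -> 42 output layer.
rnn_total = lstm_weights(1024, 128) + (128 * 42 + 42)
```

Under these assumptions the totals come out at roughly 8.79 million and 0.596 million, in line with the figures quoted above; the fourth layer alone accounts for almost all of the CNN's weights.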

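[Figure 1 diagram: the motion capture system provides ground truth joint positions $y_1, \ldots, y_N$; each depth sensor frame 1, ..., N is passed through the CNN, whose [1024]-element feature feeds the RNN; each prediction $\hat{y}_1, \ldots, \hat{y}_N$ is compared with the ground truth through the loss L.]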

Fig. 1. Data generation and training pipeline. Motion capture (left) provides ground truth joint positions $y$, which are used to train a CNN-RNN model on depth video. A simulation tool was used to occlude the input (blue) with a blanket (grey), such that the system can learn to infer joint locations $\hat{y}$ even under blanket occlusion.

3.3 Patient MoCap Dataset

Our dataset consists of a balanced set of easier sequences (no occlusion, little movement) and more difficult sequences (high occlusion, extreme movement) with ground truth pose information. Ground truth is provided through five calibrated motion capture cameras which track 14 rigid targets attached to each subject. The system infers the location of 14 body joints (head, neck, shoulders, elbows, wrists, hips, knees and ankles). All test subjects (5 female, 5 male) performed 10 sequences, with a duration of one minute per sequence. Activities include getting out of and into the bed, sleeping on a horizontal or elevated bed, eating with and without clutter, using objects, reading, clonic movement and a calibration sequence. During the clonic movement sequence, the subjects were asked to perform rapid twitching movements of arms and legs, so as to display motions that occur during the clonic phase of an epileptic seizure. A calibrated and synchronized Kinect sensor was used to capture depth video at 30 fps. In total, the dataset consists of 180,000 video frames. For training, we select a bounding box that only contains the bed. To ease adaptation to different hospital environments, all frames are rendered from a consistent camera viewpoint, fixed at 2 m distance from the center of the bed at a 70° inclination.

3.4 Blanket Simulation

Standard motion capture technologies make it impossible to track bodies under blankets, because the physical markers must be visible to the tracking cameras. For this reason, we captured the ground truth data of each person lying on the bed without being covered. We turned to physics simulation in order to generate depth maps with the person under a virtual blanket. Each RGB-D frame is used as a collision body for a moving simulated blanket, represented as a regular triangle mesh. At the beginning of a sequence, the blanket is added to the scene at about 2 m above the bed. For each frame of the sequence, gravity acts upon the blanket vertices. Collisions are handled by using a sparse signed distance function representation of the depth frame, implemented in OpenVDB [12]. See Fig. 2 for an example rendering. In order to optimize for the physical energies, we employ a state-of-the-art projection-based dynamics solver [13]. The geometric energies used in the optimization are triangle area preservation, triangle strain and edge bending constraints for the blanket and closeness constraints for the collisions, which results in realistic bending and folding of the simulated blanket.

Fig. 2. Snapshots of iterations of the physics simulation that was used to generate depth maps occluded by a virtual blanket.
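The basic loop of such a simulation, gravity acting on blanket vertices followed by a collision projection onto the scene, can be illustrated with a deliberately reduced Python sketch. It is not the projection-based dynamics solver of [13]: the strain, area and bending energies are omitted entirely, the scene is assumed to be a simple height map (in metres above the bed plane) rather than a signed distance function, and all names and default values are hypothetical.

```python
def drop_blanket(surface, start_height=2.0, dt=1.0 / 30.0, steps=60, gravity=9.81):
    """Let independent blanket vertices fall onto a height map.

    surface: 2D grid of scene heights, one blanket vertex per cell.
    Returns the vertex heights after the given number of time steps.
    """
    rows, cols = len(surface), len(surface[0])
    height = [[start_height] * cols for _ in range(rows)]
    velocity = [[0.0] * cols for _ in range(rows)]
    for _ in range(steps):
        for r in range(rows):
            for c in range(cols):
                # Gravity step (semi-implicit Euler).
                velocity[r][c] -= gravity * dt
                height[r][c] += velocity[r][c] * dt
                # Collision projection: vertices may not sink below the scene.
                if height[r][c] <= surface[r][c]:
                    height[r][c] = surface[r][c]
                    velocity[r][c] = 0.0
    return height


# Toy scene (made-up numbers): a flat bed with two raised cells.
scene = [[0.0, 0.3], [0.1, 0.0]]
resting_blanket = drop_blanket(scene)
falling_blanket = drop_blanket(scene, steps=1)
```

Without the cloth constraints every vertex simply comes to rest on the cell beneath it, so the result copies the height map exactly; it is the strain and bending terms described above that make a simulated blanket drape and fold instead.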

4

Experiments

As to validate our method, we compare to the regression forest (RF) method introduced by Girshick et al. [8]. The authors used an RF to estimate the body pose from depth data. At the training phase, random pixels in the depth image are taken as training samples. A set of relative offset vectors from each sample’s 3D location to the joint positions is stored. At each branch node, a depthdifference feature is evaluated and compared to a threshold, which determines if the sample is passed to the left or the right branch. Threshold and the depthdifference feature parameters are jointly optimized to provide the maximum information gain at the branch node. The tree stops growing after a maximum depth has been reached or if the information gain is too low. At the leaves, the sets of offsets vectors are clustered and stored as vote vectors. During test time, body joint locations are inferred by combining the votes of all pixels via mean shift. The training time of an ensemble of trees on >100 k images is prohibitively long, which is why the original authors use a 1000-core computational cluster to achieve state-of-the-art results [7]. To circumvent this requirement, we randomly

F. Achilles et al.

sample 10 k frames per tree. By evaluating the gain of using 20 k and 50 k frames for a single tree, we found that the accuracy saturates quickly (compare Fig. 6 of [8]), such that using 10 k samples retains sufficient performance while cutting down the training time from several days to hours.

4.1 Comparison on the Patient MoCap Dataset

We fix the training and test set by using all sequences of 4 female and 4 male subjects for training, and the remaining subjects (1 female, 1 male) for testing. A grid search over batch sizes B and learning rates η provided $B = 50$ and $\eta = 3 \cdot 10^{-2}$ as the best choice for the CNN and $\eta = 10^{-4}$ for the RNN. The regression forest was trained on the same distribution of training data, from which we randomly sampled 10,000 images per tree. We observed a saturation of the RF performance after training 5 trees with a maximum depth of 15. We compare the CNN, RNN and RF methods with regard to their average joint error (see Table 1) and with regard to their worst case accuracy, which is the percentage of frames for which all joint errors satisfy a maximum distance constraint D (see Fig. 3). While the RNN reaches the lowest average error at 12.25 cm, the CNN appears to produce fewer outlier estimates, which results in the best worst case accuracy curve. At test time, the combined CNN and RNN block takes 8.87 ms to infer the joint locations (CNN: 1.65 ms, RNN: 7.25 ms), while the RF algorithm takes 36.77 ms per frame.

Fig. 3. Worst case accuracy computed on 36,000 test frames of the original dataset. On the y-axis we plot the ratio of frames in which all estimated joints are closer to the ground truth than a threshold D, which is plotted on the x-axis.
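The two evaluation measures used here, average joint error and worst case accuracy, can be written down compactly. The following is a minimal sketch under the assumption that predictions and ground truth are stored as arrays of shape (frames, joints, 3) in the same unit; the function names are illustrative and not taken from the authors' code.

```python
import numpy as np

def joint_errors(pred, gt):
    """Per-frame, per-joint Euclidean error for arrays of shape (frames, joints, 3)."""
    return np.linalg.norm(pred - gt, axis=-1)

def average_joint_error(pred, gt):
    """Mean Euclidean error over all frames and joints."""
    return float(joint_errors(pred, gt).mean())

def worst_case_accuracy(pred, gt, thresholds):
    """Fraction of frames in which every joint is closer to the ground truth than D."""
    worst = joint_errors(pred, gt).max(axis=1)        # largest joint error per frame
    thresholds = np.asarray(thresholds, dtype=float)  # candidate values of D
    return (worst[None, :] < thresholds[:, None]).mean(axis=1)

# Two frames, two joints; ground truth at the origin.
gt = np.zeros((2, 2, 3))
pred = np.array([[[3.0, 0.0, 0.0], [0.0, 4.0, 0.0]],
                 [[1.0, 0.0, 0.0], [0.0, 0.0, 2.0]]])
curve = worst_case_accuracy(pred, gt, thresholds=[1.0, 3.0, 5.0])
```

Because the worst case measure takes the maximum over joints, a single badly estimated joint fails the whole frame, which is why it is more sensitive to outliers than the average error.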

4.2 Blanket Occlusion

A blanket was simulated on a subset of 10,000 frames of the dataset (as explained in Sect. 3.4). This set was picked from the clonic movement sequence, as it is most relevant to clinical applications and allows us to compare one-shot (CNN and RF) and time series methods (RNN) on repetitive movements under occlusion. The three methods were trained on the new mixed dataset consisting of all

Patient MoCap: Human Pose Estimation Under Blanket Occlusion

Fig. 4. Average per joint error on the blanket occluded sequence.

Table 1. Euclidean distance errors in [cm]. Error on the occluded test set decreases after retraining the models on blanket occluded sequences (+r).

Sequence      CNN     RNN     RF
All           12.69   12.25   28.10
Occluded       9.05    9.23   21.30
Occluded+r     8.61    7.56   19.80

Fig. 5. Examples of estimated (red) and ground truth skeletons (green). Pose estimations work without (a,b) and underneath (c,d) the blanket (blue).

other sequences (not occluded by a blanket) and the new occluded sequence. For the RF, we added a 6th tree which was trained on the occluded sequence. Figure 4 shows a per joint comparison of the average error that was reached on the occluded test set. Especially for hips and legs, the RF approach at over 20 cm error performs worse than CNN and RNN, which achieve errors lower than 10 cm except for the left foot. However, the regression forest manages to identify the head and upper body joints very well and even beats the best method (RNN) for head, right shoulder and right hand. In Table 1 we compare the average error on the occluded sequence before and after retraining each method with blanket data. Without retraining on the mixed dataset, the CNN performs best at 9.05 cm error, while after retraining the RNN clearly learns to infer a better joint estimation for occluded joints, reaching the lowest error at 7.56 cm. Renderings of the RNN predictions on unoccluded and occluded test frames are shown in Fig. 5.
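To make the effect of retraining easier to compare across methods, the occluded-sequence errors from Table 1 can be turned into relative reductions. The short calculation below only re-uses the numbers reported in the table; the dictionary layout and function name are illustrative.

```python
# Mean errors in cm on the occluded sequence, copied from Table 1.
errors_cm = {
    "CNN": {"before": 9.05, "after": 8.61},
    "RNN": {"before": 9.23, "after": 7.56},
    "RF": {"before": 21.30, "after": 19.80},
}

def relative_reduction(before, after):
    """Error reduction after retraining, in percent of the error before retraining."""
    return 100.0 * (before - after) / before

reductions = {
    method: relative_reduction(e["before"], e["after"])
    for method, e in errors_cm.items()
}
for method, value in reductions.items():
    print(f"{method}: {value:.1f}% lower error after retraining")
```

This gives roughly 4.9% for the CNN, 18.1% for the RNN and 7.0% for the RF, so the RNN gains by far the most from seeing occluded sequences during training.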

5 Conclusions

In this work we presented a unique hospital-setting dataset of depth sequences with ground truth joint position data. Furthermore, we proposed a new scheme for 3D pose estimation of hospitalized patients. Training a recurrent neural network on CNN features reduced the average error both on the original dataset and on the augmented version with an occluding blanket. Interestingly, the RNN benefits a lot from seeing blanket occluded sequences during training, while the CNN can only improve very little. It appears that temporal information helps to determine the location of limbs which are not directly visible but do interact with the blanket. The regression forest performed well for arms and the head, but was not able to deal with occluded legs and hip joints that are typically close to the bed surface, resulting in a low contrast. The end-to-end feature learning of our combined CNN-RNN model enables it to better adapt to the low contrast of occluded limbs, which makes it a valuable tool for pose estimation in realistic environments.

Acknowledgments. The authors would like to thank Leslie Casas and David Tan from TUM and Marc Lazarovici from the Human Simulation Center Munich for their support. This work has been funded by the German Research Foundation (DFG) through grants NA 620/23-1 and NO 419/2-1.

References

1. Stone, E.E., Skubic, M.: Unobtrusive, continuous, in-home gait measurement using the Microsoft Kinect. IEEE Trans. Biomed. Eng. 60(10), 2925–2932 (2013)
2. Kontschieder, P., Dorn, J.F., Morrison, C., Corish, R., Zikic, D., Sellen, A., D’Souza, M., Kamm, C.P., Burggraaff, J., Tewarie, P., Vogel, T., Azzarito, M., Glocker, B., Chin, P., Dahlke, F., Polman, C., Kappos, L., Uitdehaag, B., Criminisi, A.: Quantifying progression of multiple sclerosis via classification of depth videos. In: Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI 2014, Part II. LNCS, vol. 8674, pp. 429–437. Springer, Heidelberg (2014)
3. Cunha, J., Choupina, H., Rocha, A., Fernandes, J., Achilles, F., Loesch, A., Vollmar, C., Hartl, E., Noachtar, S.: NeuroKinect: a novel low-cost 3Dvideo-EEG system for epileptic seizure motion quantification. PLOS ONE 11(1), e0145669 (2015)
4. Benbadis, S.R., LaFrance, W., Papandonatos, G., Korabathina, K., Lin, K., Kraemer, H., et al.: Interrater reliability of EEG-video monitoring. Neurology 73(11), 843–846 (2009)
5. Li, Y., Berkowitz, L., Noskin, G., Mehrotra, S.: Detection of patient’s bed statuses in 3D using a Microsoft Kinect. In: EMBC. IEEE (2014)
6. Yu, M.-C., Wu, H., Liou, J.-L., Lee, M.-S., Hung, Y.-P.: Multiparameter sleep monitoring using a depth camera. In: Schier, J., Huffel, S., Conchon, E., Correia, C., Fred, A., Gamboa, H., Gabriel, J. (eds.) BIOSTEC 2012. CCIS, vol. 357, pp. 311–325. Springer, Heidelberg (2013)
7. Shotton, J., Sharp, T., Kipman, A., Fitzgibbon, A., Finocchio, M., Blake, A., Cook, M., Moore, R.: Real-time human pose recognition in parts from single depth images. Commun. ACM 56(1), 116–124 (2013)
8. Girshick, R., Shotton, J., Kohli, P., Criminisi, A., Fitzgibbon, A.: Efficient regression of general-activity human poses from depth images. In: ICCV. IEEE (2011)
9. Belagiannis, V., Rupprecht, C., Carneiro, G., Navab, N.: Robust optimization for deep regression. In: ICCV. IEEE (2015)
10. Fragkiadaki, K., Levine, S., Felsen, P., Malik, J.: Recurrent network models for human dynamics. In: ICCV. IEEE (2015)
11. Graves, A.: Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850 (2013)
12. Museth, K., Lait, J., Johanson, J., Budsberg, J., Henderson, R., Alden, M., Cucka, P., Hill, D., Pearce, A.: OpenVDB: an open-source data structure and toolkit for high-resolution volumes. In: ACM SIGGRAPH 2013 Courses. ACM (2013)
13. Bouaziz, S., Martin, S., Liu, T., Kavan, L., Pauly, M.: Projective dynamics: fusing constraint projections for fast simulation. ACM Trans. Graph. (TOG) 33(4), 154 (2014)

Numerical Simulation of Cochlear-Implant Surgery: Towards Patient-Specific Planning

Olivier Goury1,2(B), Yann Nguyen2, Renato Torres2, Jeremie Dequidt1, and Christian Duriez1

1 Inria Lille - Nord Europe, Université de Lille 1, Villeneuve-d’Ascq, France [email protected]
2 Inserm, UMR-S 1159, Université Paris VI Pierre et Marie Curie, Paris, France

Abstract. During cochlear implant surgery, the correct placement of the implant and the minimization of surgical trauma to the inner ear are important issues, and failures are recurrent. In this study, we reproduced, using simulation, the mechanical insertion of the implant during the surgery. This simulation provides a better understanding of the failing cases: excessive contact force, buckling of the implant inside and outside the cochlea. Moreover, using a patient-specific geometric model of the cochlea in the simulation, we show that the insertion angle is a clinical parameter that has an influence on the forces endured by both the cochlea walls and the basilar membrane, and hence on post-operative trauma. The paper presents the mechanical models used for the implant, for the basilar membrane and the boundary conditions (contact, friction, insertion, etc.) and discusses the obtained results in the perspective of using the simulation for planning and robotization of the implant insertion.

Keywords: Cochlear implant surgery · Cochlea modeling · FEM

1 Introduction

Cochlear implant surgery can be used for profoundly deaf patients, for whom hearing aids are not satisfactory. An electrode array is inserted into the tympanic ramp of the patient’s cochlea (scala tympani). When well inserted, this array can then stimulate the auditory nerve and provide a substitute way of hearing (Fig. 2). However, as of today, the surgery is performed manually and the surgeon has only little perception of what happens in the cochlea during the insertion [1]. Yet, it is often the case that the implant gets blocked in the cochlea before being completely inserted (Fig. 1). Another issue is that the insertion can create trauma on the wall of the cochlea and damage the basilar membrane. This can lead to poor postoperative speech performance or loss of the remaining acoustic hearing in lower frequencies that can be combined with electric stimulation. Simulating the insertion procedure could bring substantial benefits. Indeed, it can be used for surgery planning, where the surgeon wishes to predict the quality of the insertion depending on various parameters (such as the insertion angle or the type of implant used) for a specific patient, or for surgery assistance in the longer term (where the procedure would be robot-based). Cochlear implant surgery was simulated in [2,3], respectively in 2 and 3 dimensions, on simplified representations of the cochlea. These works provided first predictions about the forces endured by the cochlea walls. In this contribution, we develop a framework able to accurately simulate, in three dimensions, the whole process of the implant insertion into a patient-specific cochlea, including the basilar membrane deformation. The simulation is done using the finite element method and the SOFA framework¹. The implant is modelled using beam theory, while shell elements are used to define a computational model of the basilar membrane. The cochlea walls are modelled as rigid, which is a common assumption [4] due to the bony nature of the cochlea.

Fig. 2. Cross-section of a cochlea with implant inserted.

© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 500–507, 2016. DOI: 10.1007/978-3-319-46720-7_58

Fig. 1. Examples of 3 insertions with different outcomes, from left to right: successful insertion, failed insertion (folding tip), incomplete insertion.

¹ www.sofa-framework.org

2 Numerical Models and Algorithms

In this section, we describe the numerical models used to capture the mechanical behavior and the specific shapes of the cochlear implant and the basilar membrane. Moreover, the computation of the boundary conditions (contacts with the cochlea walls, insertion of the implant) is also described, as they play an important role in this simulation.

Implant Model: The implant is made of silicone and has about 20 electrodes (depending on the manufacturer) spread along its length. It is about half a millimetre thick and about two to three centimetres long. Its thin shape makes it possible to use beam elements to capture its motion (see Fig. 3). Its dynamics can be modelled as follows:

$$M\dot{v} = p - F(q, v) + H^{T}\lambda, \qquad (1)$$

where M is the mass matrix, q is the vector of generalised coordinates (each node at the extremity of a beam contains three spatial degrees of freedom and three angular degrees of freedom), and v is the vector of velocities. F represents the internal forces of the beams while p gathers the external forces. λ is the vector of contact force magnitudes with either the cochlea wall or the basilar membrane, and H gathers the contact directions. The computation of the internal forces F relies on the assumption of an elastic behavior, which brings the electrode back to its rest shape when external forces are released. In practice, we defined a Young’s modulus of around 250 MPa as in [5] and we rely on the assumption of a straight rest shape to model the electrode we used for the experiments. However, some pre-shaped electrodes exist, and our implementation of the beam model supports the use of a curved reference shape.
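To show how the terms of Eq. (1) fit together, here is a toy time step written with NumPy. It is only a sketch under strong assumptions: a semi-implicit Euler scheme is used for simplicity (the source does not state the integrator), the contact magnitudes λ are prescribed rather than solved for, and the two-degree-of-freedom system with a linear elastic force is invented for illustration. It is not SOFA code.

```python
import numpy as np

def euler_step(q, v, M, p, F, H, lam, dt):
    """One semi-implicit Euler step of  M dv/dt = p - F(q, v) + H^T lam."""
    rhs = p - F(q, v) + H.T @ lam              # net generalised force
    v_next = v + dt * np.linalg.solve(M, rhs)  # velocity update
    q_next = q + dt * v_next                   # position update with new velocity
    return q_next, v_next

# Hypothetical 2-DOF system: linear elastic force pulling back to the rest shape q = 0.
K = np.diag([4.0, 4.0])

def elastic_force(q, v):
    return K @ q

M = np.diag([2.0, 2.0])      # mass matrix
p = np.zeros(2)              # no external force
H = np.array([[0.0, 1.0]])   # one contact acting along the second DOF
lam = np.array([1.0])        # prescribed contact force magnitude

q1, v1 = euler_step(np.zeros(2), np.zeros(2), M, p, elastic_force, H, lam, dt=0.1)
```

Starting from rest, only the contact term contributes, so the second degree of freedom accelerates by 1.0 / 2.0 while the first stays at zero. In the actual simulation λ is an unknown determined together with the contact constraints.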

Fig. 3. (Left) The implant is modeled using beam elements and (middle) its motion is constrained by contact and friction response to collision with cochlear walls. (Right) Contact forces induce strain on the basilar membrane.

Basilar Membrane Model: The basilar membrane separates two liquid-filled tunnels that run along the coil of the cochlea: scala media and scala tympani (through which the implant is inserted). It is made of a stiff material but is very thin (about 4 µm) and thus very sensitive to contact with the electrodes. During the insertion, even if the electrode is soft, the membrane will deform to comply with its local shape. In case of excessive contact force, the membrane will rupture: the electrode could then freely go into the scala media or scala vestibuli. This will lead to loss of remaining hearing, damage to auditory nerve dendrites and fibrosis. To represent the basilar membrane, we use a shell model [6] that derives from a combination of a triangular in-plane membrane element and a triangular thin plate in bending. The nodes of the membrane that are connected with the walls of the cochlea are fixed, as in the real case.

Implant Motion: During the procedure, the implant is pushed (using pliers) through the round window, which marks the entrance of the cochlea. To simplify the implant model, we only simulate the portion of the implant which is inside the cochlea. The length of the beam model is thus increased progressively during the simulation to simulate the insertion made by the surgeon. Fortunately, our beam model relies on continuum equations, and we can adapt the sampling of beam elements at each simulation step while keeping the continuity of the values of F. The position and orientation of the implant body may play an important role (see Sect. 4), so these are not fixed. Conversely, we consider that the implant is pushed at constant velocity, as a motorized tool for pushing the implant was used in the experiments.

Contact Response on Cochlear Walls: The motion of the implant is constrained by contact and friction forces that appear when it collides with the walls of the cochlea. To obtain an accurate simulation, the modeling of both the geometry of the cochlea walls and the physics of the collision response is important. To reproduce the geometry of the cochlea, we rely on images obtained from cone-beam CT. The images are segmented using ITK-Snap and the surface of the obtained mesh is smoothed to remove sampling noise. Compared to previous work [2,3], our simulations do not use a simplified geometric representation of the cochlear walls. The contact points between implant and cochlea walls are detected using an algorithm that computes the closest distances (proximity queries) between the mesh and the centerline of the implant model. The algorithm is derived from [7]. At each contact point, the signed distance $\delta_n(q)$ between the centerline and the corresponding point on the collision surface (along the normal direction of the surface) must be larger than the radius r of the implant ($\delta_n(q) \ge r$). It should be noted that this collision formulation creates a round shape at the tip of the implant, which is realistic but poorly rendered in the visual display of the simulation. The contact force $\lambda_n$ follows Signorini’s law:

$$0 \le \lambda_n \perp \delta_n(q) - r \ge 0 \qquad (2)$$

In addition to its precision, one advantage of this law is that there are no additional parameters other than the relative compliance of the deformable structure in contact. In the tangential direction, $\lambda_t$ follows Coulomb’s friction law to reproduce the stick/slip transitions that are observed in clinical practice. At each contact point, the collision response is based on Signorini’s law and Coulomb’s friction using the solvers available in SOFA. Unfortunately, the friction coefficient µ is one of the missing parameters of the simulation. Several studies, such as [8] or [9], have tried to estimate the frictional conditions between the electrode array of the implant and the endosteum lining and the wall of the tympani. However, the experiments were performed ex vivo on a relatively small set of samples and exhibit considerable variability and heterogeneity. As a consequence, in Sect. 4, we perform a sensitivity analysis of this parameter.
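The two contact laws can be stated as small checks on a single contact point. The sketch below is illustrative only and does not reproduce the SOFA solvers: it tests whether a pair (gap, normal force) satisfies the complementarity condition of Eq. (2), with gap standing for $\delta_n(q) - r$, and projects a tangential force onto the Coulomb cone $\|\lambda_t\| \le \mu \lambda_n$. Function names and the tolerance are assumptions.

```python
import numpy as np

def signorini_holds(gap, lam_n, tol=1e-9):
    """Check 0 <= lam_n  ⊥  gap >= 0, where gap = delta_n(q) - r."""
    no_penetration = gap >= -tol
    no_adhesion = lam_n >= -tol
    complementary = abs(lam_n * gap) <= tol  # force only when the gap is closed
    return bool(no_penetration and no_adhesion and complementary)

def coulomb_project(lam_t, lam_n, mu):
    """Project a tangential force onto the friction cone |lam_t| <= mu * lam_n.

    Returns the admissible force and True if the contact sticks, False if it slips.
    """
    lam_t = np.asarray(lam_t, dtype=float)
    limit = mu * max(lam_n, 0.0)
    norm = np.linalg.norm(lam_t)
    if norm <= limit:
        return lam_t, True                 # inside the cone: stick
    return lam_t * (limit / norm), False   # on the cone boundary: slip

stick_force, sticks = coulomb_project([0.3, 0.4], lam_n=1.0, mu=0.8)
slip_force, slips_stick = coulomb_project([3.0, 4.0], lam_n=1.0, mu=0.5)
```

The only material parameter entering these checks is µ, which is consistent with the remark that the friction coefficient is the quantity that has to be estimated or varied.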

3 Experimental Validation

As mentioned in the introduction, it is difficult to have immediate feedback on how the implant deploys in the cochlea due to the very limited workspace and visibility. This poor feedback prevents the surgeon from adapting and updating his/her gesture to improve the placement of the implant. To gain a better understanding of the behaviors and to simplify the measurements, we have conducted experiments of implant placement on temporal bones from cadavers. In this section, these experiments are presented as well as a comparison between the measurements and the simulation results.

Material: A custom experimental setup was built to evaluate the forces endured by the scala tympani during the insertion of an electrode array at constant velocity. This setup is described in Fig. 4.

Recorded data: This setup makes it possible to compare forces when performing a manual insertion and a motorized, more regular, insertion. With this setup, we are able to reproduce failure cases such as incomplete insertion or so-called folding tip insertion, as displayed in Fig. 1.

Ability to Reproduce Incomplete Insertions: The goal of this first comparison is to show whether we can reproduce with simulation what is observed in practice. Due to contact and friction conditions and the fact that we work with living structures, it is never possible to reproduce the same insertion, even if the insertion is motorized. So we do not expect the simulation to be predictive. However, we show that the simulation is able to reproduce different scenarios of insertion (complete/incomplete insertion or folding tip). As in practice, the first important resistance to the insertion of the implant appears in the turn at the bottom of the cochlea (as in the middle picture of Fig. 3). This resistance creates a buckling of the implant that limits the transmission in the longitudinal direction until the implant presses against the cochlear walls and manages to advance again. If the resistance to motion is too large, the implant stays blocked. This differentiates a complete from an incomplete insertion and is captured by the simulation.

Evolution of the implant forces while performing the insertion: An indicator of the smoothness of the insertion is the force applied on the implant by the surgeon during the surgery. To minimise trauma, that force should typically remain low. Experimental data show that this force generally increases as the insertion progresses. This is explained by the fact that as the implant is inserted, its surface of contact with the cochlea walls and the basilar membrane increases, leading to more and more friction. The force has a peak near the first turn of the cochlea wall (the basal turn). We see that the simulation reproduces this behaviour (see Figs. 1 and 6).

Fig. 4. Experimental setup. Microdissected cochleae are molded into resin (a) and fixed to a 6-axis force sensor (c). A motorized uniaxial insertion tool (b) is used to push the electrode array into the scala tympani at a constant velocity. The whole setup is shown schematically in (d).

4 Sensitivity of the Results to Mechanical and Clinical Parameters

Many parameters can influence the results of the simulation. We distinguish the mechanical parameters (such as friction on the cochlea walls, stiffness of the implant, elasticity of the membrane, etc.) and the clinical parameters, which the surgeon can control to improve the success of the surgery. In this first study, among all the mechanical parameters, we chose to study the influence of the friction, which is complex to measure. We show that the coefficient of friction has an influence on the completeness of the insertion but has less influence on the force that is applied on the basilar membrane (see Fig. 7). For the clinical parameters, we focus on the angle of insertion (see Fig. 5). The position and orientation of the implant relative to the cochlea tunnels play an important role in the ease of inserting the implant. The anatomy makes it difficult to have a perfect alignment, but the surgeon still has a certain freedom in the placement of the tube tip. Furthermore, the surgeon’s mental representation of the optimal insertion axis is related to his or her experience, and even experts have a 7° error of alignment [1]. We test the simulation with various insertion angles, from an aligned case with θ = 0° to a case where the implant is almost orthogonal to the wall entrance with θ = 85°, and compare the outcome of the insertion, as well as the forces induced on the basilar membrane and the implant. Findings are displayed in Fig. 7.

Fig. 5. (Left) Forces when performing motorized versus manual insertion using the setup presented in Fig. 4. (Right) Dissected temporal bone used during experiments with the definition of the insertion angle θ: the angle formed by the implant and the wall of the cochlea’s entrance.

Fig. 6. Comparison between experiments and simulation in 3 cases. We can see that the simulation can reproduce cases met in real experiments (see Fig. 1). Regarding forces on the cochlea walls, the general trend of the simulation is similar to the experiments. To reproduce the folding tip case in the simulation, which is rare in practice, the array was preplaced with a folded tip at the round window region, which is why the curve does not start from 0 length. In the incomplete insertion case, the force increases greatly when the implant reaches the first turn. The simulation curve then stops. This is because we did not include the real anatomy outside the entrance of the cochlea that would normally constrain the implant and lead the force to keep increasing.

Fig. 7. Forces applied on the cochlea wall (left) and the basilar membrane (center) at the first turn of the cochlea. We can see that larger forces are generated when inserting the implant at a wide angle. Regarding the forces on the basilar membrane, there are two distinct groups of angles: small angles lead to much smaller forces than wider ones. Changing the friction generally increases the forces (right). This leads to an early buckle of the implant outside the cochlea and hence to an incomplete insertion.

5 Conclusion and Future Work

In this paper, we propose the first mechanical simulation tool that reproduces the insertion of the cochlear implant in 3D, using patient data. Several scenarios are considered, and the results we obtained show that several failures in the surgery can be reproduced in the simulator. Moreover, similar patterns of forces against the cochlea’s wall are measured in experimental scenarios and their corresponding simulations. From a quantitative standpoint, an analysis has been conducted to estimate the influence of the main parameters reported by clinicians. This preliminary study could be extended with the following perspectives: first, we need to enrich our experimental study by considering several patients and different implants; second, a (semi-)automated framework should be considered to generate patient-specific data from medical images, so that a virtual planning of the surgery can be carried out within a clinically acceptable time. This work could be a first step towards the use of simulation in the planning of cochlear implant surgery or even robot-assisted surgery. This objective would require the use of accurate and validated bio-mechanical simulations of the whole procedure (anatomical structures and implant). In-vivo experiments may be necessary.

Acknowledgements. The authors thank the foundation “Agir pour l’audition” which funded this work and Oticon Medical.

References

1. Torres, R., Kazmitcheff, G., Bernardeschi, D., De Seta, D., Bensimon, J.L., Ferrary, E., Sterkers, O., Nguyen, Y.: Variability of the mental representation of the cochlear anatomy during cochlear implantation. European Archives of ORL, pp. 1–10 (2015)
2. Chen, B.K., Clark, G.M., Jones, R.: Evaluation of trajectories and contact pressures for the straight nucleus cochlear implant electrode array - a two-dimensional application of finite element analysis. Med. Eng. Phys. 25(2), 141–147 (2003)
3. Todd, C.A., Naghdy, F.: Real-time haptic modeling and simulation for prosthetic insertion, vol. 73, pp. 343–351 (2011)
4. Ni, G., Elliott, S.J., Ayat, M., Teal, P.D.: Modelling cochlear mechanics. BioMed. Res. Int. 2014, 42 p. (2014). Article ID 150637, doi:http://dx.doi.org/10.1155/2014/150637
5. Kha, H.N., Chen, B.K., Clark, G.M., Jones, R.: Stiffness properties for nucleus standard straight and contour electrode arrays. Med. Eng. Phys. 26(8), 677–685 (2004)
6. Comas, O., Cotin, S., Duriez, C.: A shell model for real-time simulation of intraocular implant deployment. In: Bello, F., Cotin, S. (eds.) ISBMS 2010. LNCS, vol. 5958, pp. 160–170. Springer, Heidelberg (2010)
7. Johnson, D., Willemsen, P.: Six degree-of-freedom haptic rendering of complex polygonal models. In: Haptic Interfaces for Virtual Environment and Teleoperator Systems, HAPTICS 2003, pp. 229–235. IEEE (2003)
8. Tykocinski, M., Saunders, E., Cohen, L., Treaba, C., Briggs, R., Gibson, P., Clark, G., Cowan, R.: The contour electrode array: safety study and initial patient trials of a new perimodiolar design. Otol. Neurotol. 22(1), 33–41 (2001)
9. Kha, H.N., Chen, B.K.: Determination of frictional conditions between electrode array and endosteum lining for use in cochlear implant models. J. Biomech. 39(9), 1752–1756 (2006)

Meaningful Assessment of Surgical Expertise: Semantic Labeling with Data and Crowds

Marzieh Ershad1, Zachary Koesters1, Robert Rege2, and Ann Majewicz1,2(B)

1 The University of Texas at Dallas, Richardson, TX, USA [email protected]
2 UT Southwestern Medical Center, Dallas, TX, USA
http://www.utdallas.edu/hero/

Abstract. Many surgical assessment metrics have been developed to identify and rank surgical expertise; however, some of these metrics (e.g., economy of motion) can be difficult to understand and do not coach the user on how to modify behavior. We aim to standardize assessment language by identifying key semantic labels for expertise. We chose six pairs of contrasting adjectives and associated a metric with each pair (e.g., fluid/viscous correlated to variability in angular velocity). In a user study, we measured quantitative data (e.g., limb accelerations, skin conductivity, and muscle activity), for subjects (n = 3, novice to expert) performing tasks on a robotic surgical simulator. Task and posture videos were recorded for each repetition and crowd-workers labeled the videos by selecting one word from each pair. The expert was assigned more positive words and also had better quantitative metrics for the majority of the chosen word pairs, showing feasibility for automated coaching.

Keywords: Surgical training and evaluation · Crowdsourced assessment · Semantic descriptors

1 Introduction

A great musician, an all-star athlete, and a highly skilled surgeon share one thing in common: the casual observer can easily recognize their expertise, simply by observing their movements. These movements, or rather, the appearance of the expert in action, can often be described by words such as fluid, effortless, swift, and decisive. Given that our understanding of expertise is so innate and ingrained in our vocabulary, we seek to develop a lexicon of surgical expertise through combined data analysis (e.g., user movements and physiological response) and crowd-sourced labeling [1,2].

In recent years, the field of data-driven identification of surgical skill has grown significantly. Methods now exist to accurately classify expert vs. novice users based on motion analysis [3], eye tracking [4], and theories from motor control literature [5], to name a few. Additionally, it is also possible to rank several users in terms of expertise through pairwise comparisons of surgical videos [2]. While all these methods present novel ways for determining and ranking expertise, an open question remains: how can observed skill deficiencies translate into more effective training programs? Leveraging prior work showing the superiority of verbal coaching for training [6], we aim to develop and validate a mechanism for translating conceptually difficult, but quantifiable, differences between novice and expert surgeons (e.g. “more directed graphs on the known state transition diagrams” [7] and “superior exploitation of kinematic redundancy” [5]) into actionable, connotation-based feedback that a novice can understand and employ.

The central hypothesis of this study is that human perception of surgical expertise is not so much a careful, rational evaluation, but rather an instinctive, impulsive one. Prior work has proposed that surgical actions, or surgemes (e.g. knot tying, needle grasping, etc.) are ultimately the building blocks of surgery [8]. While these semantic labels describe the procedural flow of a surgical task, we believe that surgical skill identification is more fundamental than how to push a needle. It is about the quality of movement that one can observe from a short snapshot of data. Are the movements smooth? Do they look fluid? Does the operator seem natural during the task? The hypothesis that expertise is a universal, instinctive assessment is supported by recent work in which crowd-workers from the general population identified surgical expertise with high accuracy [1]. Thus, the key to developing effective training strategies is to translate movement qualities into universally understandable, intuitive, semantic descriptors.

© Springer International Publishing AG 2016
S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 508–515, 2016. DOI: 10.1007/978-3-319-46720-7_59

2 Semantic Descriptors of Expertise

Inspired by studies showing the benefit of verbal coaching for surgical training [6], we selected a set of semantic labels which could be used to both describe surgical expertise and coach a novice during training. The choice of adjectives was informed by metrics commonly found in the literature (e.g., time, jerk, acceleration), as well as through discussions with surgeon educators and training staff. As this is a preliminary study, we selected adjective pairs that were commonly used and also had some logical data metric that could be associated with them. For example, “crisp/jittery” can be matched with a jerk measurement, and “relaxed/tense” can be matched to some metric from electromyography (EMG) recordings. Galvanic skin response (GSR) is another physiological measurement that is useful for counting stressful events, which correlate to increased anxiety [9], thus serving as a basis for a “calm/anxious” word pair. The choice of word pairs and the corresponding metric is not unique; however, for the purpose of this paper, we simply aim to determine whether or not these word pairs have some relevance in terms of surgical skill evaluation. The six preliminary word pairs chosen and their corresponding data metrics are listed in Table 1.

Table 1. Semantic labeling lexicon

Positive adjective   Positive metric           Negative metric            Negative adjective
Crisp                Low mean jerk             High mean jerk             Jittery
Fluid                Low ang. velocity var     High ang. velocity var     Viscous
Smooth               Low acceleration var      High acceleration var      Rough
Swift                Short completion time     Long completion time       Sluggish
Relaxed              Low normalized EMG        High normalized EMG        Tense
Calm                 Low GSR event count       High GSR event count       Anxious

3

Experimental Setup and Methods

Our hypothesis is that crowd-chosen words, which simultaneously correlate to expertise level and measurable data metrics, will be good choices for an automated coaching system. Many studies have recently investigated the effectiveness of crowd-sourcing for the assessment of technical skills and have shown correlations to expert evaluations [1,10–12]. These studies support the hypothesis that the identification of expertise is somewhat instinctive, regardless of whether or not the evaluator is a topic-expert in his or her evaluation area. The goal of this portion of our study is to see if the crowd can identify which of the chosen words for expertise are most important or relevant for surgical skill. Therefore, we conducted an experimental evaluation of our word pairs and metrics through a human subjects study, using the da Vinci Surgical Simulator (on loan from Intuitive Surgical, Sunnyvale, CA). Users were outfitted with a variety of sensors used to collect metric data while performing tasks on the simulator. Video recordings of the user performing tasks were used for crowd-sourced identification of relevant semantic descriptors.

3.1

Data Collection System

To quantify task movements and physiological response for our semantic label metrics, we chose to measure joint positions (elbow, wrist, shoulder), limb accelerations (hand, forearms), forearm muscle activity with EMG, and GSR. Joint positions were recorded using an electromagnetic tracker (trakSTAR, Model 180 sensors, Northern Digital Inc., Ontario, Canada) with an elbow estimation method as described in [5]. Limb accelerations, EMG and GSR were measured using sensor units from Shimmer Sensing, Inc. (Dublin, Ireland). Several muscles were selected for EMG measurement, including bilateral (left and right) extensors and a flexor on the left arm, which are important for wrist rotation, as well as the abductor pollicis, which is important for pinch grasping with the thumb [13]. These muscles were recommended by a surgeon educator. We also recorded videos of the user posture and simulated surgical training task with CCD cameras (USB 3.0, Point Grey, Richmond, Canada). The Robot Operating System (ROS) was used to synchronize all data collection. The experimental setup and sensor locations are shown in Fig. 1(a,c).

Meaningful Assessment of Surgical Expertise Using Data and Crowds

Fig. 1. Experimental setup and sensor positioning. Panels: (a) human subject trial; (b) Ring and Rail task; (c) Shimmer sensor placement; (d) suturing task. Labeled components: skills simulator, limb inertial measurement units with EMG + GSR, electromagnetic joint position tracking.

3.2

Simulated Surgical Tasks and Human Subject Study

The simulated surgical tasks chosen for this study were used to evaluate endowrist manipulation and needle control and driving skills (Fig. 1(b,d)). Endowrist instruments provide surgeons with a range of motion greater than that of a human hand; thus, the endowrist manipulation task evaluates the subject’s ability to manipulate these instruments. The needle driving task evaluates the subject’s ability to effectively hand off and position needles for different types of suture throws (forehand and backhand) and while using different hands (left and right). Three subjects were recruited to participate in this study, approved by both UTD and UTSW IRB offices (UTD #14-57, UTSW #STU 032015-053). The subjects (right handed, 25–45 years old) consisted of an expert (6+ years of clinical robotic cases), an intermediate (PGY-4 surgical resident) and a novice (PGY-1 surgical resident). All subjects had limited to no training using the da Vinci simulator; however, the expert and intermediate had exposure to the da Vinci clinical robots. All subjects first conducted two non-recorded warm-up tasks (i.e., Matchboard 3 for endowrist manipulation warm-up and Suture Sponge 2 for needle driving warm-up). After the warm-up, the subjects underwent baseline data collection including arm measurements and maximum voluntary isometric muscle contractions (MVIC) for normalization and cross-subject comparison [14]. Subjects then conducted the recorded experimental tasks for endowrist manipulation (Ring and Rail 2) and needle driving (Suture Sponge 3). For the purposes of data analysis, each task was subdivided into three repeated trials, corresponding to a single pass of a different colored ring (i.e., red, blue or yellow), or two consecutive suture throws.

3.3

Crowd-Worker Recruitment and Tasks

For each trial, side-by-side, time-synchronized videos of the simulated surgical task and user posture were posted on Amazon Mechanical Turk. The videos ranged in length from 15 s to 3 min and 40 s. Anonymous crowd-workers (n = 547) were recruited to label the videos using one word from each of the six contrasting adjective pairs (Fig. 4(a)). Crowd-workers received $0.10 for each video and were not allowed to evaluate the same video more than once.

3.4

Data Analysis Methods

For each word pair, many options exist for correlation to a desired metric (e.g., sensor location, muscle type, summary statistic, etc.). In this paper, we selected metrics based on logical reasoning and feedback from surgical collaborators. To measure crisp vs. jittery hand movement, we calculated the standard deviation of the trial average mean value of jerk from the inertial measurement unit (IMU) mounted on the subject’s right hand. Similarly, fluid/viscous was measured by variability in angular velocity of the same IMU. Smooth/rough was measured by the variability in acceleration magnitude of an IMU mounted on the right forearm. The acceleration magnitude was calculated from the time-sampled x, y, and z accelerations to eliminate the effects due to gravity [15]. To measure the calmness vs. anxiousness of the user, the GSR signal was processed using the Ledalab EDA data analysis toolbox in Matlab to count stressful events [16]. Mean EMG levels were recorded through electrodes placed on the forearm extensor as a measure of relaxedness vs. tenseness of the subject. In order to compare EMG levels between different subjects, these signals were normalized using the maximum of three repeated EMG signals during a maximal voluntary isometric contraction (MVIC) for each muscle. All EMG signals were high-pass filtered using a fourth-order Butterworth filter with a 20 Hz cut-off frequency to remove motion artifacts and were detrended, rectified, and smoothed [14]. The EM tracker data was used to visualize user movements and was not correlated to any word pairs. Finally, a Pearson’s R correlation was used to compare the crowd-worker results and data metrics for each word pair.
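The metric pipeline described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, variability is taken to be a population standard deviation, jerk is approximated by a finite difference of acceleration magnitude, and the EMG envelope is assumed to have already been filtered, detrended, rectified, and smoothed upstream.

```python
import math

def magnitudes(samples):
    """Euclidean norm of each time-sampled (x, y, z) triple."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]

def variability(values):
    """Population standard deviation (assumed here as the variability measure)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def mean_jerk(accel_magnitude, dt):
    """Mean absolute finite-difference jerk for a fixed sampling interval dt."""
    steps = [abs(b - a) / dt for a, b in zip(accel_magnitude, accel_magnitude[1:])]
    return sum(steps) / len(steps)

def normalize_emg(envelope, mvic_trials):
    """Scale a processed EMG envelope by the largest peak of the repeated MVIC trials."""
    reference = max(max(trial) for trial in mvic_trials)
    return [v / reference for v in envelope]

def pearson_r(xs, ys):
    """Pearson correlation between a data metric and the crowd ratings."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

In use, one metric value per video would be paired with the corresponding crowd rating (for example, the fraction of workers choosing the positive adjective) and passed to `pearson_r`; the exact rating definition is not specified in the text and is an assumption here.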

4

Results and Discussion

The trajectories for each subject’s wrist movements, as measured by the EM tracker, are shown in Fig. 2. As expected, the expert movements are tighter and smoother than those of the intermediate and novice. Figure 3 compares the mean and standard deviation of each chosen metric through all trials among the three subjects. Of the 547 crowd-workers recruited, 7 jobs were rejected due to incomplete labeling assignments, resulting in 30 complete jobs for each of the 18 videos posted. The results of the analysis can be seen in Fig. 4(b). An ANOVA was conducted to identify significant groups in terms of expertise level, type of task, and repetition for the data metrics, as well as the crowd-sourced data. Additionally, the crowd-sourced data was evaluated for significant differences in terms of word assignment rates. Table 2 summarizes the significant statistical results (p ≤ 0.05), significant groups (post-hoc Scheffe test), and the correlation between the crowd ratings and data metrics. For nearly all metrics, there was no significant effect due to task or repetition. The expert exhibited better performance on all metrics, with the exception of smooth/rough and relaxed/tense. This could be due to a poor choice of metrics, or data collection errors with the expert EMG signal or baseline. The crowd assigned significantly better semantic labels to the expert than to the intermediate and novice. Additionally, the crowd rated the ring and rail tasks with significantly lower ratings than the suturing task, and evaluated the second repetition across all subjects as worse than the first and last. The magnitude of the data metric to crowd rating correlation ranged from 0.25 to 0.99. The best correlated metric to word-pair was swift/sluggish and the worst correlated metric was smooth/rough followed by crisp/jittery.

Fig. 2. Wrist trajectory of subjects performing Ring and Rail 2 (red ring): (a) novice, (b) intermediate, (c) expert. Axes: x, y, z (m).

Fig. 3. Mean and standard deviation of all metrics for all trials and subjects. Panels: fluid (vs. viscous), wrist angular velocity variability (rad/sec); rough (vs. smooth), forearm acceleration variability (mm/s^2); crisp (vs. jittery), mean hand jerk (mm/s^3); sluggish (vs. swift), completion time (sec); anxious (vs. calm), number of GSR events; tense (vs. relaxed), normalized mean EMG activation (%MVIC), left arm extensor. Each panel shows the Ring & Rail and Suture Sponge tasks for the novice, intermediate, and expert.

Table 2. Statistical analysis summary

Source

Metric/crowd Subject (E, I, N) correlation p

Task (RR, SS)

Repetition (1–3)

Significance p

Significance p

0.82

0.0005

E > I &N

RR > SS

Smooth/rough −0.25

0.0001

I < E &N

0.1240

n/a

0.3366 n/a

Crisp/jittery

0.073

n/a

0.7521

n/a

0.9128 n/a

Fluid/viscous

0.63

Calm/anxious −0.98

0.0374

Significance

0.1134 n/a

0.035

E SS

0.6291 n/a

Swift/sluggish −0.99

0.0028

EI>N

0.01). ADMTP found trajectories that were safer, in terms of reduced risk score and increased distance to the closest critical structure in 145/186 trajectories (p < 0.01).

R. Sparks et al.

Fig. 4. Manual (pink) and ADMTP (blue) trajectories are shown with veins (cyan), skull (opaque white), and with (a) the cortex (peach) and (b) no cortex.

3.3

Implantation Plan Suitability

Suitability of implantation plans was assessed by (1) distance between trajectories and (2) the ratio of unique gyri sampled to total number of electrodes. Ideally each electrode samples a unique gyrus, corresponding to a ratio of 1. ADMTP has a median distance between trajectories of 35.5 mm (5.2–124.2 mm) compared to manual plans with a median of 34.2 mm (1.3–117.5 mm). Manual plans and ADMTP have 12 trajectory pairs separated by less than dtraj (5 min) safe trajectories.

Acknowledgments. This publication represents in part independent research commissioned by the Health Innovation Challenge Fund (HICF-T4-275, WT097914, WT106882), a parallel funding partnership between the Wellcome Trust and the Department of Health, and the National Institute for Health Research University College London Hospitals Biomedical Research Centre (NIHR BRC UCLH/UCL High Impact Initiative). This work was undertaken at University College London Hospitals, which received a proportion of funding from the Department of Health’s NIHR BRC funding scheme. The views expressed in this publication are those of the authors and not necessarily those of the Wellcome Trust or NIHR.

References

1. Beare, R., Lehmann, G.: Finding regional extrema - methods and performance (2005). http://hdl.handle.net/1926/153
2. Bériault, S., Subaie, F.A., Collins, D.L., Sadikot, A.F., Pike, G.B.: A multi-modal approach to computer-assisted deep brain stimulation trajectory planning. IJCARS 7(5), 687–704 (2012)
3. Cardoso, M.J., Modat, M., Wolz, R., Melbourne, A., Cash, D., Rueckert, D., Ourselin, S.: Geodesic information flows: Spatially-variant graphs and their application to segmentation and fusion. IEEE TMI 34(9), 1976–1988 (2015)
4. De Momi, E., Caborni, C., Cardinale, F., Casaceli, G., Castana, L., Cossu, M., Mai, R., Gozzo, F., Francione, S., Tassi, L., Lo Russo, G., Antiga, L., Ferrigno, G.: Multi-trajectories automatic planner for StereoElectroEncephaloGraphy (SEEG). IJCARS, 1–11 (2014)
5. Essert, C., Haegelen, C., Lalys, F., Abadie, A., Jannin, P.: Automatic computation of electrode trajectories for deep brain stimulation: a hybrid symbolic and numerical approach. IJCARS 7(4), 517–532 (2012)
6. Shamir, R.R., Joskowicz, L., Tamir, I., Dabool, E., Pertman, L., Ben-Ami, A., Shoshan, Y.: Reduced risk trajectory planning in image-guided keyhole neurosurgery. Med. Phys. 39(5), 2885–2895 (2012)
7. Zelmann, R., Beriault, S., Marinho, M.M., Mok, K., Hall, J.A., Guizard, N., Haegelen, C., Olivier, A., Pike, G.B., Collins, D.L.: Improving recorded volume in mesial temporal lobe by optimizing stereotactic intracranial electrode implantation planning. IJCARS 10(10), 1599–1615 (2015)
8. Zombori, G., Rodionov, R., Nowell, M., Zuluaga, M.A., Clarkson, M.J., Micallef, C., Diehl, B., Wehner, T., Miserocchi, A., McEvoy, A.W., Duncan, J.S., Ourselin, S.: A computer assisted planning system for the placement of sEEG electrodes in the treatment of epilepsy. In: Stoyanov, D., Collins, D.L., Sakuma, I., Abolmaesumi, P., Jannin, P. (eds.) IPCAI 2014. LNCS, vol. 8498, pp. 118–127. Springer, Heidelberg (2014)
9. Zuluaga, M.A., Rodionov, R., Nowell, M., Achhala, S., Zombori, G., Mendelson, A.F., Cardoso, M.J., Miserocchi, A., McEvoy, A.W., Duncan, J.S., Ourselin, S.: Stability, structure and scale: improvements in multi-modal vessel extraction for seeg trajectory planning. IJCARS 10(8), 1227–1237 (2015)

Recognizing Surgical Activities with Recurrent Neural Networks

Robert DiPietro1(B), Colin Lea1, Anand Malpani1, Narges Ahmidi1, S. Swaroop Vedula1, Gyusung I. Lee2, Mija R. Lee2, and Gregory D. Hager1

1 Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
[email protected]
2 Department of Surgery, Johns Hopkins University, Baltimore, MD, USA

Abstract. We apply recurrent neural networks to the task of recognizing surgical activities from robot kinematics. Prior work in this area focuses on recognizing short, low-level activities, or gestures, and has been based on variants of hidden Markov models and conditional random fields. In contrast, we work on recognizing both gestures and longer, higher-level activities, or maneuvers, and we model the mapping from kinematics to gestures/maneuvers with recurrent neural networks. To our knowledge, we are the first to apply recurrent neural networks to this task. Using a single model and a single set of hyperparameters, we match state-of-the-art performance for gesture recognition and advance state-of-the-art performance for maneuver recognition, in terms of both accuracy and edit distance. Code is available at https://github.com/rdipietro/miccai-2016-surgical-activity-rec.

1

Introduction

Automated surgical-activity recognition is a valuable precursor for higher-level goals such as objective surgical-skill assessment and for providing targeted feedback to trainees. Previous research on automated surgical-activity recognition has focused on gestures within a surgical task [9,10,13,15]. Gestures are atomic segments of activity that typically last for a few seconds, such as grasping a needle. In contrast, maneuvers are composed of a sequence of gestures and represent higher-level segments of activity, such as tying a knot. We believe that targeted feedback for maneuvers is meaningful and consistent with the subjective feedback that faculty surgeons currently provide to trainees.

Here we focus on jointly segmenting and classifying surgical activities. Other work in this area has focused on variants of hidden Markov models (HMMs) and conditional random fields (CRFs) [9,10,13,15]. HMM and CRF based methods often define unary (label-input) and pairwise (label-label) energy terms, and during inference find a global label configuration that minimizes overall energy. Here we put emphasis on the unary terms and note that defining unaries that are both general and meaningful is a difficult task. For example, of the works above, the unaries of [10] are perhaps most general: they are computed using learned convolutional filters. However, we note that even these unaries depend only on inputs from fairly local neighborhoods in time.

In this work, we use recurrent neural networks (RNNs), and in particular long short-term memory (LSTM), to map kinematics to labels. Rather than operating only on local neighborhoods in time, LSTM maintains a memory cell and learns when to write to memory, when to reset memory, and when to read from memory, forming unaries that in principle depend on all inputs. In fact, we will rely only on these unary terms, or in other words assume that labels are independent given the sequence of kinematics. Despite this, we will see that predicted labels are smooth over time with no post-processing. Further, using a single model and a single set of hyperparameters, we match state-of-the-art performance for gesture recognition and improve over state-of-the-art performance for maneuver recognition, in terms of both accuracy and edit distance.

Fig. 1. Example images from the JIGSAWS and MISTIC datasets.

© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 551–558, 2016. DOI: 10.1007/978-3-319-46720-7_64

2

Methods

The goal of this work is to use $n_x$ kinematic signals over time to label every time step with one of $n_y$ surgical activities. An individual sequence of length $T$ is composed of kinematic inputs $\{x_t\}$, with each $x_t \in \mathbb{R}^{n_x}$, and a collection of one-hot encoded activity labels $\{y_t\}$, with each $y_t \in \{0, 1\}^{n_y}$. (For example, if we have classes 1, 2, and 3, then the one-hot encoding of label 2 is $(0, 1, 0)^T$.) We aim to learn a mapping from $\{x_t\}$ to $\{y_t\}$ in a supervised fashion that generalizes to users that were absent from the training set. In this work, we use recurrent neural networks to discriminatively model $p(y_t \mid x_1, x_2, \ldots, x_t)$ for all $t$ when operating online and $p(y_t \mid x_1, x_2, \ldots, x_T)$ for all $t$ when operating offline.

2.1

Recurrent Neural Networks

Though not yet as ubiquitous as their feedforward counterparts, RNNs have been applied successfully to many diverse sequence-modeling tasks, from text-to-handwriting generation [6] to machine translation [14]. A generic RNN is shown in Fig. 2a. An RNN maintains a hidden state $\tilde{h}_t$, and at each time step $t$, the nonlinear block uses the previous hidden state $\tilde{h}_{t-1}$ and the current input $x_t$ to produce a new hidden state $\tilde{h}_t$ and an output $\tilde{m}_t$.

Fig. 2. A recurrent neural network. (a) A recurrent neural network. (b) A vanilla RNN block.

If we use the nonlinear block shown in Fig. 2b, we end up with a specific and simple model: a vanilla RNN with one hidden layer. The recursive equation for a vanilla RNN, which can be read off precisely from Fig. 2b, is

$$h_t = \tanh(W_x x_t + W_h h_{t-1} + b) \tag{1}$$

Here, $W_x$, $W_h$, and $b$ are free parameters that are shared over time. For the vanilla RNN, we have $\tilde{m}_t = \tilde{h}_t = h_t$. The height of $h_t$ is a hyperparameter and is referred to as the number of hidden units. In the case of multiclass classification, we use a linear layer to transform $\tilde{m}_t$ to appropriate size $n_y$ and apply a softmax to obtain a vector of class probabilities:

$$\hat{y}_t = \mathrm{softmax}(W_{ym} \tilde{m}_t + b_y) \tag{2}$$

$$p(y_{tk} = 1 \mid x_1, x_2, \ldots, x_t) = \hat{y}_{tk} \tag{3}$$

where $\mathrm{softmax}(x) = \exp(x) / \sum_i \exp(x_i)$. RNNs traditionally propagate information forward in time, forming predictions using only past and present inputs. Bidirectional RNNs [12] can improve performance when operating offline by using future inputs as well. This essentially consists of running one RNN in the forward direction and one RNN in the backward direction, concatenating hidden states, and computing outputs jointly.

2.2

Long Short-Term Memory

Vanilla RNNs are very difficult to train because of what is known as the vanishing gradient problem [1]. LSTM [8] was specifically designed to overcome this problem and has since become one of the most widely-used RNN architectures. The recursive equations for the LSTM block used in this work are

$$\tilde{x}_t = \tanh(W_{\tilde{x}x} x_t + W_{\tilde{x}m} m_{t-1} + b_{\tilde{x}}) \tag{4}$$

$$i_t = \sigma(W_{ix} x_t + W_{im} m_{t-1} + W_{ic} c_{t-1} + b_i) \tag{5}$$

$$f_t = \sigma(W_{fx} x_t + W_{fm} m_{t-1} + W_{fc} c_{t-1} + b_f) \tag{6}$$

$$c_t = i_t \odot \tilde{x}_t + f_t \odot c_{t-1} \tag{7}$$

$$o_t = \sigma(W_{ox} x_t + W_{om} m_{t-1} + W_{oc} c_t + b_o) \tag{8}$$

$$m_t = o_t \odot \tanh(c_t) \tag{9}$$

where $\odot$ represents element-wise multiplication and $\sigma(x) = 1/(1 + \exp(-x))$. All matrices $W$ and all biases $b$ are free parameters that are shared across time. LSTM maintains a memory over time and learns when to write to memory, when to reset memory, and when to read from memory [5]. In the context of the generic RNN, $\tilde{m}_t = m_t$, and $\tilde{h}_t$ is the concatenation of $c_t$ and $m_t$. $c_t$ is the memory cell and is updated at each time step to be a linear combination of $\tilde{x}_t$ and $c_{t-1}$, with proportions governed by the input gate $i_t$ and the forget gate $f_t$. $m_t$, the output, is a nonlinear version of $c_t$ that is filtered by the output gate $o_t$. Note that all elements of the gates $i_t$, $f_t$, and $o_t$ lie between 0 and 1. This version of LSTM, unlike the original, has forget gates and peephole connections, which let the input, forget, and output gates depend on the memory cell. Forget gates are a standard part of modern LSTM [7], and we include peephole connections because they have been found to improve performance when precise timing is required [4]. All weight matrices are full except the peephole matrices $W_{ic}$, $W_{fc}$, and $W_{oc}$, which by convention are restricted to be diagonal.

Loss. Because we assume every $y_t$ is independent of all other $y_{t'}$ given $x_1, \ldots, x_t$, maximizing the log likelihood of our data is equivalent to minimizing the overall cross entropy between the true labels $\{y_t\}$ and the predicted labels $\{\hat{y}_t\}$. The global loss for an individual sequence is therefore

$$l_{seq}(\{y_t\}, \{\hat{y}_t\}) = \sum_t l_t(y_t, \hat{y}_t) \quad \text{with} \quad l_t(y_t, \hat{y}_t) = -\sum_k y_{tk} \log \hat{y}_{tk}$$
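To make Eqs. (4)–(9) and the sequence loss concrete, the following is a minimal pure-Python sketch of one forward pass. It is not the authors' released code: the parameter dictionary and its key names, the zero initial state, and the `zero_params` helper are all assumptions made for illustration. The diagonal peephole matrices are stored as vectors and applied element-wise.

```python
import math

def matvec(W, v):
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def add(*vectors):
    return [sum(parts) for parts in zip(*vectors)]

def mul(u, v):
    return [a * b for a, b in zip(u, v)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def tanh(v):
    return [math.tanh(a) for a in v]

def softmax(v):
    top = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - top) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def lstm_step(p, x, m_prev, c_prev):
    """One time step of Eqs. (4)-(9); wic, wfc, woc hold the peephole diagonals."""
    x_tilde = tanh(add(matvec(p["Wxx"], x), matvec(p["Wxm"], m_prev), p["bx"]))  # (4)
    i = sigmoid(add(matvec(p["Wix"], x), matvec(p["Wim"], m_prev),
                    mul(p["wic"], c_prev), p["bi"]))                             # (5)
    f = sigmoid(add(matvec(p["Wfx"], x), matvec(p["Wfm"], m_prev),
                    mul(p["wfc"], c_prev), p["bf"]))                             # (6)
    c = add(mul(i, x_tilde), mul(f, c_prev))                                     # (7)
    o = sigmoid(add(matvec(p["Wox"], x), matvec(p["Wom"], m_prev),
                    mul(p["woc"], c), p["bo"]))                                  # (8)
    m = mul(o, tanh(c))                                                          # (9)
    return m, c

def sequence_loss(p, xs, ys):
    """Sum over time of the cross entropy between one-hot ys and the softmax outputs."""
    n_hidden = len(p["bx"])
    m, c = [0.0] * n_hidden, [0.0] * n_hidden  # assumed zero initial state
    total = 0.0
    for x, y in zip(xs, ys):
        m, c = lstm_step(p, x, m, c)
        y_hat = softmax(add(matvec(p["Wym"], m), p["by"]))  # Eq. (2)
        total -= sum(yk * math.log(pk) for yk, pk in zip(y, y_hat))
    return total

def zero_params(n_x, n_h, n_y):
    """Hypothetical helper: all-zero parameters, useful only for sanity checks."""
    mat = lambda rows, cols: [[0.0] * cols for _ in range(rows)]
    p = {name: mat(n_h, n_x) for name in ("Wxx", "Wix", "Wfx", "Wox")}
    p.update({name: mat(n_h, n_h) for name in ("Wxm", "Wim", "Wfm", "Wom")})
    p.update({name: [0.0] * n_h
              for name in ("wic", "wfc", "woc", "bx", "bi", "bf", "bo")})
    p["Wym"], p["by"] = mat(n_y, n_h), [0.0] * n_y
    return p
```

With all parameters at zero the predicted distribution is uniform, so the loss of a length-T sequence with $n_y$ classes should equal $T \log n_y$, which gives a quick sanity check of the implementation.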

Training. All experiments in this paper use standard stochastic gradient descent to minimize loss. Although the loss is non-convex, it has repeatedly been observed empirically that ending up in a poor local optimum is unlikely. Gradients can be obtained efficiently using backpropagation [11]. In practice, one can build a computation graph out of fundamental operations, each with known local gradients, and then apply the chain rule to compute overall gradients with respect to all free parameters. Frameworks such as Theano and Google TensorFlow let the user specify these computation graphs symbolically and relieve the user of computing overall gradients manually. Once gradients are obtained for a particular free parameter $p$, we take a small step in the direction opposite to that of the gradient: with $\eta$ being the learning rate,

$$p' = p - \eta \frac{\partial l_{seq}}{\partial p} \quad \text{with} \quad \frac{\partial l_{seq}}{\partial p} = \sum_t \frac{\partial l_t}{\partial p}$$
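As a small worked instance of this update rule, the sketch below applies one gradient step to the output bias $b_y$ only. It relies on the standard result that, for a softmax followed by cross entropy, $\partial l_t / \partial b_y = \hat{y}_t - y_t$. Restricting the update to a single parameter and the function names are simplifications assumed here; the full method backpropagates through all LSTM parameters.

```python
import math

def softmax(v):
    top = max(v)
    exps = [math.exp(a - top) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def sequence_loss(b_y, pre_bias_logits, labels):
    """Cross entropy summed over time steps, given logits before the bias is added."""
    total = 0.0
    for z, y in zip(pre_bias_logits, labels):
        y_hat = softmax([a + b for a, b in zip(z, b_y)])
        total -= sum(yk * math.log(pk) for yk, pk in zip(y, y_hat))
    return total

def sgd_step_output_bias(b_y, pre_bias_logits, labels, eta):
    """One update p' = p - eta * dl_seq/dp for p = b_y, summing gradients over time."""
    grad = [0.0] * len(b_y)
    for z, y in zip(pre_bias_logits, labels):
        y_hat = softmax([a + b for a, b in zip(z, b_y)])
        # Per-step gradient of softmax cross entropy with respect to the bias.
        grad = [g + (pk - yk) for g, pk, yk in zip(grad, y_hat, y)]
    return [b - eta * g for b, g in zip(b_y, grad)]
```

For a single time step with two equally likely classes and true label 1, the gradient is (-0.5, 0.5), so a step with $\eta = 1$ moves the bias to (0.5, -0.5) and lowers the loss.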

3

Experiments

3.1

Datasets

The JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS) [2] is a public benchmark surgical activity dataset recorded using the da Vinci. JIGSAWS contains synchronized video and kinematic data from a standard 4-throw suturing task performed by eight subjects with varying skill levels. All subjects performed about 5 trials, resulting in a total of 39 trials. We use the same measurements and activity labels as the current state-of-the-art method [10]. Measurements are position (x, y, z), velocity (vx, vy, vz), and gripper angle (θ) for each of the left and right slave manipulators, and the surgical activity at each time step is one of ten different gestures.

The Minimally Invasive Surgical Training and Innovation Center - Science of Learning (MISTIC-SL) dataset, also recorded using the da Vinci, includes 49 right-handed trials performed by 15 surgeons with varying skill levels. We follow [3] and use a subset of 39 right-handed trials for all experiments. All trials consist of a suture throw followed by a surgeon’s knot, eight more suture throws, and another surgeon’s knot. We used the same kinematic measurements as for JIGSAWS, and the surgical activity at each time step is one of 4 maneuvers: suture throw (ST), knot tying (KT), grasp pull run suture (GPRS), and intermaneuver segment (IMS). It is not possible for us to release this dataset at this time, though we hope we will be able to release it in the future.

3.2

Experimental Setup

JIGSAWS has a standardized leave-one-user-out evaluation setup: for the i-th run, train using all users except i and test on user i. All results in this paper are averaged over the 8 runs, one per user. We follow the same strategy for MISTIC-SL, averaging over 11 runs, one for each user that does not appear in the validation set, as explained below. We include accuracy and edit distance (Levenshtein distance) as performance metrics. Accuracy is the percentage of correctly-classified frames, measuring performance without taking temporal consistency into account. In contrast, edit distance is the number of operations needed to transform predicted segment-level labels into ground-truth segment-level labels, here normalized for each dataset using the maximum number (over all sequences) of segment-level labels.

3.3

Hyperparameter Selection and Training

Here we include the most relevant details regarding hyperparameter selection and training; other details are fully specified in code, available at https://github.com/rdipietro/miccai-2016-surgical-activity-rec. For each run we train for a total of approximately 80 epochs, maintaining a learning rate of 1.0 for the first 40 epochs and then halving the learning rate every 5 epochs for the rest of training. Using a small batch size is important; we found that otherwise the lack of stochasticity let us converge to bad local optima. We use a batch size of 5 sequences for all experiments. Because JIGSAWS has a fixed leave-one-user-out test setup, with all users appearing in the test set exactly once, it is not possible to use JIGSAWS for hyperparameter selection without inadvertently training on the test set. We therefore choose all hyperparameters using a small MISTIC-SL validation set consisting of 4 users (those with only one trial each), and we use the resulting hyperparameters for both JIGSAWS experiments and MISTIC-SL experiments.
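The learning-rate schedule and the leave-one-user-out protocol mentioned above can be sketched as follows. Both functions are hypothetical illustrations rather than the released code. In particular, the text does not say whether the first halving takes effect at epoch 40 or epoch 45; the sketch assumes 0-indexed epochs with the first halving at epoch 40.

```python
def learning_rate(epoch, base=1.0, hold_epochs=40, halve_every=5):
    """Constant rate for the first `hold_epochs` epochs, then halved every `halve_every`."""
    if epoch < hold_epochs:
        return base
    n_halvings = (epoch - hold_epochs) // halve_every + 1  # assumed: first halving at hold_epochs
    return base * 0.5 ** n_halvings

def leave_one_user_out(trials_by_user):
    """Yield (held_out_user, training_trials, test_trials), one run per user."""
    for held_out in sorted(trials_by_user):
        train = [trial
                 for user, trials in sorted(trials_by_user.items())
                 if user != held_out
                 for trial in trials]
        yield held_out, train, list(trials_by_user[held_out])
```

Under these assumptions the rate at the final epoch of an 80-epoch run is $0.5^8$ of the initial value.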

We performed a grid search over the number of RNN hidden layers (1 or 2), the number of hidden units per layer (64, 128, 256, 512, or 1024), and whether dropout [16] is used (with p = 0.5). One hidden layer of 1024 units, with dropout, resulted in the lowest edit distance and simultaneously yielded high accuracy. These hyperparameters were used for all experiments. Using a modern GPU, training takes about 1 h for any particular JIGSAWS run and about 10 h for any particular MISTIC-SL run (MISTIC-SL sequences are approximately 10x longer than JIGSAWS sequences). We note, however, that RNN inference is fast, with a running time that scales linearly with sequence length. At test time, it took the bidirectional RNN approximately 1 s of compute time per minute of sequence (300 time steps).

3.4

Results

Table 1 shows results for both JIGSAWS (gesture recognition) and MISTIC-SL (maneuver recognition). A forward LSTM and a bidirectional LSTM are compared to the Markov/semi-Markov conditional random field (MsM-CRF), Shared Discriminative Sparse Dictionary Learning (SDSDL), Skip-Chain CRF (SC-CRF), and Latent-Convolutional Skip-Chain CRF (LC-SC-CRF). We note that the LC-SC-CRF results were computed by the original author, using the same MISTIC-SL validation set for hyperparameter selection. We include standard deviations where possible, though we note that they largely describe the user-to-user variations in the datasets. (Some users are exceptionally challenging, regardless of the method.) We also carried out statistical-significance testing using a paired-sample permutation test (p-value of 0.05). This test suggests that the accuracy and edit-distance differences between the bidirectional LSTM and LC-SC-CRF are insignificant in the case of JIGSAWS but are significant in the case of MISTIC-SL. We also remark that even the forward LSTM is competitive here, despite being the only algorithm that can run online. Qualitative results are shown in Fig. 3 for the trials with highest, median, and lowest accuracies for each dataset. We note that the predicted label sequences are smooth, despite the fact that we assumed that labels are independent given the sequence of kinematics.

Table 1. Quantitative results and comparisons to prior work.

Method | JIGSAWS Accuracy (%) | JIGSAWS Edit dist. (%) | MISTIC-SL Accuracy (%) | MISTIC-SL Edit dist. (%)
MsM-CRF [15] | 72.6 | - | - | -
SDSDL [13] | 78.7 | - | - | -
SC-CRF [9] | 80.3 | - | - | -
LC-SC-CRF [10] | 82.5 ± 5.4 | 14.8 ± 9.4 | 81.7 ± 6.2 | 29.7 ± 6.8
Forward LSTM | 80.5 ± 6.2 | 19.8 ± 8.7 | 87.8 ± 3.7 | 33.9 ± 13.3
Bidir. LSTM | 83.3 ± 5.7 | 14.6 ± 9.6 | 89.5 ± 4.0 | 19.5 ± 5.2
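The two metrics reported in Table 1 can be sketched as below. This is an illustrative reading of the metric definitions, not the authors' evaluation code: frame labels are first collapsed into segment-level labels, Levenshtein distance is computed between the two segment sequences, and the normalizer `max_segments` is passed in by the caller because the text does not specify exactly how the per-dataset maximum is obtained.

```python
def to_segments(frame_labels):
    """Collapse runs of identical consecutive frame labels into segment-level labels."""
    segments = []
    for label in frame_labels:
        if not segments or segments[-1] != label:
            segments.append(label)
    return segments

def levenshtein(a, b):
    """Number of insertions, deletions, and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        current = [i]
        for j, y in enumerate(b, 1):
            current.append(min(previous[j] + 1,             # deletion
                               current[j - 1] + 1,          # insertion
                               previous[j - 1] + (x != y))) # substitution
        previous = current
    return previous[-1]

def accuracy(predicted, truth):
    """Percentage of correctly classified frames."""
    correct = sum(p == t for p, t in zip(predicted, truth))
    return 100.0 * correct / len(truth)

def normalized_edit_distance(predicted, truth, max_segments):
    """Segment-level edit distance as a percentage of `max_segments`."""
    distance = levenshtein(to_segments(predicted), to_segments(truth))
    return 100.0 * distance / max_segments
```

The example of a prediction that briefly flips back to an earlier label shows why both metrics are reported: a single wrong frame costs little accuracy but adds a spurious segment, which the edit distance penalizes.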


Fig. 3. Qualitative results for JIGSAWS (top) and MISTIC-SL (bottom) using a bidirectional LSTM. For each dataset, we show results from the trials with highest accuracy (top), median accuracy (middle), and lowest accuracy (bottom). In all cases, ground truth is displayed above predictions.

4 Summary

In this work we performed joint segmentation and classification of surgical activities from robot kinematics. Unlike prior work, we focused on high-level maneuver prediction in addition to low-level gesture prediction, and we modeled the mapping from inputs to labels with recurrent neural networks instead of with HMM- or CRF-based methods. Using a single model and a single set of hyperparameters, we matched state-of-the-art performance for JIGSAWS (gesture recognition) and advanced state-of-the-art performance for MISTIC-SL (maneuver recognition), in the latter case increasing accuracy from 81.7 % to 89.5 % and decreasing normalized edit distance from 29.7 % to 19.5 %.
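The two reported metrics can be illustrated with a short sketch. It is written under stated assumptions rather than taken from the paper: accuracy is taken as the percentage of correctly labeled frames, and the edit distance is a Levenshtein distance between segment-level label sequences, normalized here by the longer of the two sequences (the normalization choice is an assumption). All function names are hypothetical.

```python
from itertools import groupby


def frame_accuracy(pred, true):
    """Percentage of frames whose predicted label matches the ground truth."""
    return 100.0 * sum(p == t for p, t in zip(pred, true)) / len(true)


def segments(labels):
    """Collapse a frame-wise label sequence into its sequence of segments."""
    return [label for label, _ in groupby(labels)]


def levenshtein(a, b):
    """Classic edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred, true):
    """Segment-level edit distance in percent (assumed normalization)."""
    p, t = segments(pred), segments(true)
    return 100.0 * levenshtein(p, t) / max(len(p), len(t), 1)
```

Under these definitions, a prediction that shifts segment boundaries slightly keeps an edit distance of zero, while one that oscillates between labels is penalized even if most frames are correct.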

References

1. Bengio, Y., Simard, P., Frasconi, P.: Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 5(2), 157–166 (1994)
2. Gao, Y., Vedula, S.S., Reiley, C.E., Ahmidi, N., Varadarajan, B., Lin, H.C., Tao, L., Zappella, L., Bejar, B., Yuh, D.D., Chen, C.C.G., Vidal, R., Khudanpur, S., Hager, G.D.: Language of surgery: a surgical gesture dataset for human motion modeling. In: Modeling and Monitoring of Computer Assisted Interventions (M2CAI) 2014. Springer, Boston, USA (2014)
3. Gao, Y., Vedula, S., Lee, G.I., Lee, M.R., Khudanpur, S., Hager, G.D.: Unsupervised surgical data alignment with application to automatic activity annotation. In: 2016 IEEE International Conference on Robotics and Automation (ICRA) (2016)


4. Gers, F.A., Schmidhuber, J.: Recurrent nets that time and count. In: IEEE Conference on Neural Networks, vol. 3 (2000)
5. Graves, A.: Supervised Sequence Labelling. Springer, Heidelberg (2012)
6. Graves, A.: Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850 (2013)
7. Greff, K., Srivastava, R.K., Koutník, J., Steunebrink, B.R., Schmidhuber, J.: LSTM: A search space odyssey. arXiv preprint arXiv:1503.04069 (2015)
8. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
9. Lea, C., Hager, G.D., Vidal, R.: An improved model for segmentation and recognition of fine-grained activities with application to surgical training tasks. In: 2015 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1123–1129. IEEE (2015)
10. Lea, C., Vidal, R., Hager, G.D.: Learning convolutional action primitives for fine-grained action recognition. In: 2016 IEEE International Conference on Robotics and Automation (ICRA) (2016)
11. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Cogn. Model. 5(3), 1 (1988)
12. Schuster, M., Paliwal, K.K.: Bidirectional recurrent neural networks. IEEE Trans. Sig. Process. 45(11), 2673–2681 (1997)
13. Sefati, S., Cowan, N.J., Vidal, R.: Learning shared, discriminative dictionaries for surgical gesture segmentation and classification. In: Modeling and Monitoring of Computer Assisted Interventions (M2CAI) 2015. Springer, Heidelberg (2015)
14. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems (2014)
15. Tao, L., Zappella, L., Hager, G.D., Vidal, R.: Surgical gesture segmentation and recognition. In: Mori, K., Sakuma, I., Sato, Y., Barillot, C., Navab, N. (eds.) MICCAI 2013, Part III. LNCS, vol. 8151, pp. 339–346. Springer, Heidelberg (2013)
16. Zaremba, W., Sutskever, I., Vinyals, O.: Recurrent neural network regularization. arXiv preprint arXiv:1409.2329 (2014)

Two-Stage Simulation Method to Improve Facial Soft Tissue Prediction Accuracy for Orthognathic Surgery

Daeseung Kim(1), Chien-Ming Chang(1), Dennis Chun-Yu Ho(1), Xiaoyan Zhang(1), Shunyao Shen(1), Peng Yuan(1), Huaming Mai(1), Guangming Zhang(2), Xiaobo Zhou(2), Jaime Gateno(1,3), Michael A.K. Liebschner(4), and James J. Xia(1,3)(✉)

1 Department of Oral and Maxillofacial Surgery, Houston Methodist Research Institute, Houston, TX, USA
[email protected]
2 Department of Radiology, Wake Forest School of Medicine, Winston-Salem, NC, USA
3 Department of Surgery, Weill Medical College, Cornell University, New York, NY, USA
4 Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA

Abstract. It is clinically important to accurately predict facial soft tissue changes prior to orthognathic surgery. However, the current simulation methods are problematic, especially in clinically critical regions. We developed a two-stage finite element method (FEM) simulation model with realistic tissue sliding effects. In the 1st stage, the facial soft-tissue-change following bone movement was simulated using FEM with a simple sliding effect. In the 2nd stage, the tissue sliding effect was improved by reassigning the bone-soft tissue mapping and boundary condition. Our method has been quantitatively and qualitatively evaluated using 30 patient datasets. The two-stage FEM simulation method showed significant accuracy improvement in the whole face and the critical areas (i.e., lips, nose and chin) in comparison with the traditional FEM method.

© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 559–567, 2016. DOI: 10.1007/978-3-319-46720-7_65

1 Introduction

Facial appearance significantly impacts human social life. Orthognathic surgery is a bone-only surgical procedure to treat patients with dentofacial deformity, in which the deformed jaws are cut into pieces and repositioned to a desired position (osteotomy). Currently, only osteotomies can be accurately planned presurgically. Facial soft tissue changes, a direct result of osteotomies, cannot be accurately predicted due to the complex nature of facial anatomy. Traditionally, soft tissue simulation is based on bone-to-soft tissue movement ratios, which have been proven inaccurate. Among the published reports, the finite element method (FEM) [1] is reported to be the most common, accurate and biomechanically relevant method [1, 2]. Nonetheless, the predicted results are still less than ideal, especially in the nose, lip and chin regions, which are extremely important for orthognathic surgery. Therefore, there is an urgent clinical need to develop a reliable method of accurately predicting facial changes following osteotomies.

Traditional FEM for facial soft tissue simulation assumes that the FEM mesh nodes move together with the contacting bone surfaces. However, this assumption can lead to significant errors when large bone movements and occlusion changes are involved. In human anatomy, cheek and lip mucosa are not directly attached to the bone and teeth; they slide over each other. The traditional FEM does not consider this sliding, which we believe is the main reason for inaccurate prediction in the lips and chin. Implementing the realistic sliding effect in FEM is technically challenging. The 1st challenge is that it requires considerable computation time and effort, because the sliding mechanism in the human mouth is a dynamic interaction between two surfaces. The 2nd challenge is that even if the sliding movement with a force constraint is implemented, the simulation results may still be inaccurate, because there is no strict nodal displacement boundary condition applied to the sliding areas. The soft tissues at sliding surfaces follow the buccal surface profile of the bones and teeth. Thus, it is necessary to consider the displacement boundary condition for sliding movement. The 3rd challenge is that the mapping between the bone surface and FEM mesh nodes needs to be reestablished after the bony segments are moved to a desired planned position. This is because the bone and soft tissue relationship is not constant before and after the bone movement, e.g., a setback or advancement surgery may either decrease or increase the soft tissue area in contact with the bones and teeth. This mismatch may lead to distortion of the resulting mesh. The 4th challenge is that occlusal changes, e.g., from a preoperative cross-bite to a postoperative Class I (normal) bite, may cause mesh distortion in the lip region where the upper and lower teeth meet.
Therefore, a simulation method with more advanced sliding effects is required to increase the prediction accuracy in critical regions such as the lips and chin. We solved these technical problems. In this study, we developed a two-stage FEM simulation method. In the first stage, the facial soft tissue changes following the bony movements were simulated with an extended sliding boundary condition to overcome the mesh distortion problem in traditional FEM simulations. The nodal force constraint was applied to simulate the sliding effect of the mucosa. In the second stage, nodal displacement boundary conditions were implemented in the sliding areas to accurately reflect the postoperative bone surface geometry. The corresponding nodal displacement for each node was recalculated after reassigning the mapping between the mesh and bone surface in order to achieve a realistic sliding movement. Finally, our simulation method was evaluated quantitatively and qualitatively using 30 sets of preoperative and postoperative patient computed tomography (CT) datasets.

2 Two-Stage FEM Simulation Algorithm

Our two-stage approach for simulating facial soft tissue changes following osteotomies is described below in detail. In the 1st stage, a patient-specific FEM model with a homogeneous linear elastic material property is generated using a FEM template model (a total of 38280 elements and 48593 nodes) [3]. The facial soft tissue changes are predicted using FEM with the simple sliding effect of the mucosa around the teeth and partial maxillary and mandibular regions. Only the parallel nodal force is considered on the corresponding areas. In the 2nd stage, explicit boundary conditions are applied to improve the tissue sliding effect by exactly reflecting the bone surface geometry, thus ultimately improving the prediction accuracy.

2.1 The First Stage of FEM Simulation with Simple Sliding Effect

The patient-specific volume mesh is generated from an anatomically detailed FEM template mesh, which was previously developed from a Visible Female dataset [3]. Both the inner and outer surfaces of the template mesh are registered to the patient's skull and facial surfaces, respectively, using an anatomical landmark-based thin-plate spline (TPS) technique. Finally, the total mesh volume is morphed to the patient data by interpolating the surface registration result using TPS again [3]. Although there have been studies investigating optimal tissue properties, the effect of using different linear elastic material properties on the simulation results was negligible [4]. Furthermore, shape deformation patterns are independent of Young's modulus for an isotropic material under displacement boundary conditions, as long as the loading that causes the deformation is irrelevant for the study. Therefore, in our study, we assign 3000 Pa for Young's modulus and 0.47 for Poisson's ratio [4]. Surface nodes of the FEM mesh are divided into boundary nodes and free nodes (Fig. 1). The displacements of the free nodes (GreenBlue in Fig. 1b and c) are determined by the displacements of the boundary nodes using FEM. Boundary nodes are further divided into static, moving and sliding nodes. The static nodes do not move in the surgery (red in Fig. 1). Note that the lower posterior regions of the soft tissue mesh (orange in Fig. 1b) are assigned as free nodes in the first stage. This is important because, together with the ramus sliding boundary condition, it maintains the soft tissue integrity, flexibility and smoothness in the posterior and inferior mandibular regions when an excessive mandibular advancement or setback occurs.
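The claim that deformation under pure displacement boundary conditions does not depend on Young's modulus can be checked on a toy problem. The sketch below is an illustration under simplifying assumptions, not the authors' model: it uses a hypothetical one-dimensional chain of identical linear springs instead of a facial mesh, prescribes the displacements of the two end nodes, and leaves the interior nodes force-free. Because a uniform modulus scales the whole stiffness matrix by the same factor, that factor cancels when solving for the free nodes.

```python
import numpy as np


def chain_stiffness(n_nodes, youngs_modulus, area=1.0, length=1.0):
    """Assemble the stiffness matrix of a 1D chain of identical bar elements."""
    k = youngs_modulus * area / length
    K = np.zeros((n_nodes, n_nodes))
    for e in range(n_nodes - 1):
        K[e:e + 2, e:e + 2] += k * np.array([[1.0, -1.0], [-1.0, 1.0]])
    return K


def free_displacements(K, boundary, d1):
    """Solve for force-free nodes given prescribed boundary displacements d1."""
    free = np.setdiff1d(np.arange(K.shape[0]), boundary)
    K12 = K[np.ix_(boundary, free)]  # boundary-to-free coupling block
    K22 = K[np.ix_(free, free)]      # free-to-free block
    # Zero nodal force on the free nodes: K12^T d1 + K22 d2 = 0.
    return -np.linalg.solve(K22, K12.T @ d1)


prescribed = np.array([0.0, 1.0])  # end-node displacements (arbitrary units)
soft = free_displacements(chain_stiffness(5, 3000.0), [0, 4], prescribed)
stiff = free_displacements(chain_stiffness(5, 3.0e6), [0, 4], prescribed)
```

Both `soft` and `stiff` describe the same interior displacements, although the moduli differ by three orders of magnitude. The toy chain has no Poisson effect, so it says nothing about the role of Poisson's ratio.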

Fig. 1. Mesh nodal boundary condition. (a) Mesh inner surface boundary condition (illustrated on bones for better understanding) for the 1st stage only; (b) Posterior and superior surface boundary condition for both 1st and 2nd stages; (c) Mesh inner surface boundary condition (illustrated on bones for better understanding) for the 2nd stage only. Static nodes: red, and orange (2nd stage only); Moving nodes: Blue; Sliding nodes: pink; Free nodes: GreenBlue, and orange (1st stage only); Scar tissue: green.


The moving nodes on the mesh are the ones moving in sync with the bones (blue in Fig. 1a). The correspondences between the vertices of the STL bone segments and the moving nodes of the mesh are determined by a closest-point search algorithm. The movement vector (magnitude and direction) of each bone segment is then applied to the moving nodes as a nodal displacement boundary condition. In addition, the areas where the two bone segments (proximal and distal) collide with each other after the surgical movements are excluded from the moving boundary nodes. These are designated as free nodes to further solve the mesh distortion at the mandibular inferior border. Moreover, scar tissue is considered as a moving boundary (green in Fig. 1a). This is because the soft tissues in these regions are degloved intraoperatively, causing scars postoperatively, which subsequently affect the facial soft tissue geometry. The scar tissue is added onto the corresponding moving nodes by shifting them an additional 2 mm in the anterior direction as the displacement boundary condition. In the first stage, the sliding boundary conditions are applied to the sliding nodes (pink in Fig. 1a) of the mouth, including the cheek and lips, and extended to the mesh inner surface corresponding to a partial maxilla and mandible (including partial ramus). The sliding boundary conditions in the mucosa area are adopted from [2]. Movement of the free nodes (Fig. 1b) is determined by FEM with the aforementioned boundary conditions (Fig. 1a and b). An iterative FEM solving algorithm is developed to calculate the movement of the free nodes and to solve the global FEM equation $Kd = f$, where $K$ is the global stiffness matrix, $d$ is the global nodal displacement, and $f$ is the global nodal force. This equation can be rewritten as:

$$\begin{pmatrix} K_{11} & K_{12} \\ K_{12}^{T} & K_{22} \end{pmatrix} \begin{pmatrix} d_1 \\ d_2 \end{pmatrix} = \begin{pmatrix} f_1 \\ f_2 \end{pmatrix} \qquad (1)$$

where $d_1$ is the displacement of the moving and static nodes, and $d_2$ is the displacement of the free and sliding nodes, which is to be determined. The parameter $f_1$ is the nodal force on the moving and static nodes, and $f_2$ is the nodal force acting on both free and sliding nodes. The nodal force of the free nodes is assumed to be zero, and only tangential nodal forces along the contacting bone surface are considered for the sliding nodes [2]. The final value of $d_2$ is calculated by iteratively updating $d_2$ using Eq. (2) until the convergence condition is satisfied (described later).

$$d_2^{(k+1)} = d_2^{(k)} + d_{2,\mathrm{update}}^{(k)}, \quad (k = 1, 2, \ldots, n) \qquad (2)$$

$d_{2,\mathrm{update}}$ is calculated as follows. First, $f_2$ is calculated by substituting the current $d_2$ into Eq. (3), which is derived from Eq. (1). At the start of the iteration ($k = 1$), the initial $d_2$ is randomly assigned and substituted into Eq. (3). $f_2$ is composed of the nodal force of the sliding nodes ($f_{2,\mathrm{sliding}}$) and of the free nodes ($f_{2,\mathrm{free}}$).

$$f_2 = K_{12}^{T} d_1 + K_{22} d_2 \qquad (3)$$

Second, $f_{2t}$ is calculated by transforming the nodal force of the sliding nodes in $f_2$ so that it has only a tangential component [2]. Now, $f_{2t}$ is composed of the nodal force of the free nodes ($f_{2,\mathrm{free}}$) and only the tangential component of the nodal force of the sliding nodes ($f_{2t,\mathrm{sliding}}$). In the final step of the iteration, $f_{2,\mathrm{update}}$ is acquired to determine the required nodal displacement ($d_{2,\mathrm{update}}$). The nodal force $f_{2,\mathrm{update}}$ is the difference between $f_{2t}$ and $f_2$. $d_{2,\mathrm{update}}$ is finally calculated as follows: $d_{2,\mathrm{update}} = K_{22}^{-1}\left(f_{2,\mathrm{update}} + K_{12}^{T} d_1\right)$, which is derived from Eq. (1). Then, $d_2^{(k+1)}$ is calculated using Eq. (2). The iteration continues until the maximal absolute value of $f_{2,\mathrm{update}}$ converges below 0.01 N ($k = n$). The final values of $d$ ($d_1$ and $d_2$) represent the displacement of the mesh nodes after applying the bone movements and the simple sliding effect. The algorithm was implemented in MATLAB. The final $d$ in this first-stage simulation is designated as $d_{\mathrm{first}}$.
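The step that keeps only the tangential force on the sliding nodes can be sketched as a vector projection. This is a minimal illustration, not the authors' MATLAB implementation: it assumes that each sliding node has a known normal of the contacting bone surface (the `normals` array is hypothetical input) and removes the normal component of the nodal force. The convergence check mirrors the 0.01 N threshold quoted in the text.

```python
import numpy as np


def tangential_forces(f2, normals, sliding):
    """Return a copy of f2 with the normal force removed on sliding nodes.

    f2:      (n, 3) nodal forces on free and sliding nodes
    normals: (n, 3) non-zero bone-surface normals (assumed input)
    sliding: (n,) boolean mask, True for sliding nodes
    """
    unit = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    normal_part = np.sum(f2 * unit, axis=1, keepdims=True) * unit
    f2t = f2.copy()
    f2t[sliding] -= normal_part[sliding]  # free nodes are left unchanged
    return f2t


def has_converged(f2t, f2, tolerance=0.01):
    """Check the stopping rule: max |f2_update| below the tolerance (in N)."""
    f2_update = f2t - f2
    return bool(np.max(np.abs(f2_update)) < tolerance)


f2 = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
normals = np.array([[0.0, 0.0, 2.0], [0.0, 0.0, 1.0]])
sliding = np.array([True, False])
f2t = tangential_forces(f2, normals, sliding)
```

In this example the first node slides on a surface whose normal points along z, so only its z force is removed, and the iteration would not yet count as converged.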

2.2 The Second Stage of FEM Simulation with Advanced Sliding Effect

The predicted facial soft tissue changes in the first stage are further refined in the second stage by adding an advanced sliding effect. This is necessary because the first stage only accounts for the nodal force constraint, which may result in a mismatch between the simulated mesh inner surface and the bone surface (Fig. 2).

Fig. 2. Assignment of nodal displacement in the second stage of FEM. (a) Mismatch between the simulated mesh inner surface and the bone surface. (b) Description of nodal displacement boundary condition assignment.

Based on real clinical situations, the geometries of the teeth and bone buccal surface and its contacting surface on the inner side of the soft tissue mesh should match exactly, even though the relationship between the vertices of the bones and the nodes of the soft tissue mesh changes after the bony segments are moved. Therefore, the boundary mapping and condition between the bone surface and soft tissue mesh nodes need to be reestablished in the sliding areas in order to properly reflect the above realistic sliding effect. First, the nodes of the inner mesh surface corresponding to the maxilla and mandible are assigned as the moving nodes in the second stage (blue in Fig. 1c). The nodal displacements of the moving nodes are calculated by finding the closest point from each mesh node to the bone surface, instead of searching from the bone to the mesh as in the first stage. The assignment is processed from the superior to the inferior direction, ensuring an appropriate boundary condition implementation without mesh distortion (Fig. 2). This is because clinically the postoperative lower teeth are always inside of the upper teeth (as a normal bite) regardless of the preoperative condition. This procedure prevents the same nodal displacement from being counted twice for a node, thus solving the mismatch problem between the bone surface and its contacting surface on the inner side of the simulated mesh. Once computed, the vector between each node and its corresponding closest vertex on the bone surface is assigned as the nodal displacement for the FEM simulation. The free nodes at the inferoposterior surface of the soft tissue mesh in the first stage are now assigned as static nodes in this stage (orange in Fig. 1b). The rest of the nodes are assigned as free nodes (GreenBlue in Fig. 1b and c). The global stiffness matrix ($K$), the nodal displacement ($d$) and the nodal force ($f$) are reorganized according to the new boundary conditions. The 2nd-stage results are calculated by solving Eq. (1). Based on the assumption that the nodal force of the free nodes, $f_2$, is zero (note that there are no sliding nodes in the second stage), the nodal displacement of the free nodes, $d_2$, can be calculated as follows: $d_2 = -K_{22}^{-1} K_{12}^{T} d_1$ (from Eq. (1)). Then, the final $d$ ($d_1$ and $d_2$) is designated as $d_{\mathrm{second}}$. Finally, the overall nodal displacement is calculated by combining the resulting nodal displacements of the first ($d_{\mathrm{first}}$) and the second ($d_{\mathrm{second}}$) FEM simulations.
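The node-to-bone displacement assignment described above can be sketched with a brute-force nearest-vertex search. This is only an illustration under stated assumptions: the bone surface is represented by its vertices alone, the superior-to-inferior processing order and the handling of duplicate assignments are omitted, and the function name `closest_vertex_displacements` is hypothetical.

```python
import numpy as np


def closest_vertex_displacements(nodes, bone_vertices):
    """Displacement from each mesh node to its closest bone-surface vertex.

    nodes:         (n, 3) coordinates of inner-surface mesh nodes
    bone_vertices: (m, 3) coordinates of postoperative bone-surface vertices
    Returns (displacements, indices of the matched vertices).
    """
    # Pairwise difference vectors, shape (n, m, 3).
    diff = bone_vertices[None, :, :] - nodes[:, None, :]
    squared_dist = np.einsum("ijk,ijk->ij", diff, diff)
    nearest = np.argmin(squared_dist, axis=1)
    return bone_vertices[nearest] - nodes, nearest


nodes = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
bone = np.array([[1.0, 0.0, 0.0], [9.0, 1.0, 0.0], [5.0, 5.0, 5.0]])
displacements, matched = closest_vertex_displacements(nodes, bone)
```

The returned vectors would play the role of the prescribed displacements of the moving nodes. The pairwise search needs memory proportional to the number of nodes times the number of vertices, so a spatial index such as a k-d tree would likely be preferable for a mesh of the size quoted in the text.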

3 Quantitative and Qualitative Evaluations and Results

The evaluation was completed using 30 randomly selected datasets of patients who had dentofacial deformity and underwent orthognathic surgery [IRB0413-0045]. Each patient had complete preoperative and postoperative CT scans. The soft tissue prediction was completed using 3 methods: (1) the traditional FEM without considering the sliding effect [1]; (2) the FEM with the first-stage (simple) sliding effect, considering only the nodal force constraint; and (3) our novel FEM with two-stage sliding effects. All FEM meshes were generated by adapting our FEM template to the patient's individual 3D model [3]. In order to determine the actual movement vector of each bony segment, the patient's postoperative bone and soft tissue 3D CT models were registered to the preoperative ones at the cranium (surgically unmoved). The movement vector of each bony segment was calculated by moving the osteotomized segment from its preoperative original position to the postoperative position. Finally, the simulated results were evaluated quantitatively and qualitatively. In the quantitative evaluation, displacement errors (absolute mean Euclidean distances) were calculated between the nodes on the simulated facial mesh and their corresponding points on the postoperative model. The evaluation was completed for the whole face and 8 sub-regions (Fig. 3). Repeated-measures analysis of variance and its post-hoc tests were used to detect statistically significant differences. In the qualitative evaluation, two maxillofacial surgeons who are experienced in orthognathic surgery evaluated the results together, based on their clinical judgement and consensus. They were also blinded to the methods used for the simulation. The predicted results were compared to the postoperative ones using a binary visual analog scale (Unacceptable: the predicted result was not clinically realistic; Acceptable: the predicted result was clinically realistic and very similar to the postoperative outcome). A chi-square test was used to detect statistically significant differences.
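The quantitative error measure can be sketched as follows. This is an illustration under stated assumptions: the simulated nodes and their corresponding postoperative points are assumed to be already paired, and the relative improvement over a baseline is computed as a simple percentage reduction of the error, which is an assumed definition since the text does not spell out the formula. Both function names are hypothetical.

```python
import math


def mean_displacement_error(simulated, postoperative):
    """Mean Euclidean distance between paired 3D points (same units as input)."""
    distances = [math.dist(p, q) for p, q in zip(simulated, postoperative)]
    return sum(distances) / len(distances)


def improvement_percent(baseline_error, method_error):
    """Percentage reduction of the error relative to a baseline (assumed definition)."""
    return 100.0 * (baseline_error - method_error) / baseline_error


# Toy example with two paired points (mm).
error = mean_displacement_error([(0, 0, 0), (1, 0, 0)], [(3, 4, 0), (1, 0, 0)])
```

Restricting the point pairs to one of the sub-regions in Fig. 3 would give a per-region error in the same way.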


Fig. 3. Sub-regions (automatically divided using anatomical landmarks)

The results of the quantitative evaluation showed that our two-stage sliding effects FEM method significantly improved the accuracy of the whole face, as well as the critical areas (i.e., lips, nose and chin) in comparison with the traditional FEM method. The chin area also showed a trend of improvement (Table 1). Finally, the malar region showed a significant improvement due to the scar tissue modeling. The results of the qualitative evaluation showed that 73 % (22/30) of the predicted results achieved with the two-stage FEM method were clinically acceptable. The prediction accuracy of the whole face and the critical regions (e.g., lips and nose) was significantly improved (Table 1). However, only 43 % (13/30) were acceptable with both the traditional and simple sliding FEMs. This was mainly due to the poor lower lip prediction. Even though the cheek prediction was significantly improved in the simple sliding FEM, inaccurately predicted lower lips severely impacted the whole facial appearance.

Table 1. Improvement of the simple and two-stage sliding over the traditional FEM method (%) for 30 patients.

                   Quantitative evaluation               Qualitative evaluation
Region             Simple sliding   Two-stage sliding    Simple sliding   Two-stage sliding
Entire face          1.9              4.5*                 0.0             30.0*
1. Nose              7.2*             8.4*                 0.0              0.0
2. Upper lip        -1.3              9.2*                13.3             20.0*
3. Lower lip       -12.0             10.2                 -6.7             23.3*
4. Chin             -2.0              3.6                  3.3             10.0
5. Right malar       6.1*             6.2*                 0.0              0.0
6. Left malar        9.2*             8.8*                 0.0              0.0
7. Right cheek       0.1              1.3                 23.3*            23.3*
8. Left cheek        3.0              1.4                 30.0*            30.0*

* Significant difference compared to the traditional method (P < 0.05).

Figure 4 illustrates the predicted results of a typical patient. Using the traditional FEM, the upper and lower lip moved together with the underlying bone segments without considering the sliding movement (1.4 mm of displacement error for the upper lip; 1.6 mm for the lower), resulting in large displacement errors (clinically unacceptable, Fig. 4(a)). The predicted upper lip using the simple sliding FEM was moderately improved (1.1 mm of error), while the lower lip showed a larger error (3.1 mm). The upper and lower lips were in a wrong relation (clinically unacceptable, Fig. 4(b)).


Fig. 4. An example of quantitative and qualitative evaluation results. The predicted mesh (red) is superimposed to the postoperative bone (blue) and soft tissue (grey). (a) Traditional FEM simulation (1.6 mm of error for the whole face, clinically not acceptable). (b) Simple sliding FEM simulation (1.6 mm of error, clinically not acceptable). (c) Two-stage FEM simulation (1.4 mm of error, clinically acceptable).

The mesh inner surface and the bone and teeth geometries, which should match perfectly in the clinical situation, were also mismatched. Finally, our two-stage FEM simulation achieved the best results, accurately predicting clinically important facial features with a correct lip relation (upper lip error: 0.9 mm; lower lip error: 1.3 mm; clinically acceptable, Fig. 4(c)).

4 Discussion and Future Work

We developed a novel two-stage FEM simulation method to accurately predict facial soft tissue changes following osteotomies. Our approach was quantitatively and qualitatively evaluated using 30 patient datasets. The clinical contribution of this method is significant. Our approach allows doctors to understand preoperatively how the bony movements affect the facial soft tissue changes, and subsequently to revise the plan as needed. In addition, it also allows patients to foresee their postoperative facial appearance prior to the surgery (patient education). The technical contributions include: (1) Efficient two-stage sliding effects are implemented in the FEM simulation model to predict realistic facial soft tissue changes following the osteotomies. (2) The extended definition of the boundary condition and the ability to change node types during the simulation solve the mesh distortion problem, not only in the sliding regions, but also in the bone collision areas where the proximal and distal segments meet. (3) The patient-specific soft tissue FEM model can be efficiently generated by deforming our FEM template, without the need to build a FEM model for each patient. This makes the FEM simulation feasible for clinical use. There are still some limitations in the current approach. A preoperatively strained lower lip is not considered in the simulation. In surgery, it can be automatically corrected to a reposed status by a pure horizontal surgical movement, but the same is not true in the simulation. The 8 clinically unacceptable results using our two-stage FEM method were all due to this reason. We are working on solving this clinically observed phenomenon. In addition, we are also improving the error evaluation method. The quantitative results in this study do not necessarily reflect the qualitative results, as shown in Table 1 and Fig. 4. Nonetheless, our two-stage FEM simulation is the first step towards achieving a realistic facial soft-tissue-change prediction following osteotomies. In the near future, it will be fully tested in a larger clinical study.


References

1. Pan, B., et al.: Incremental kernel ridge regression for the prediction of soft tissue deformations. Med. Image Comput. Comput. Assist. Interv. 15(Pt 1), 99–106 (2012)
2. Kim, H., Jürgens, P., Nolte, L.-P., Reyes, M.: Anatomically-driven soft-tissue simulation strategy for cranio-maxillofacial surgery using facial muscle template model. In: Jiang, T., Navab, N., Pluim, J.P., Viergever, M.A. (eds.) MICCAI 2010, Part I. LNCS, vol. 6361, pp. 61–68. Springer, Heidelberg (2010)
3. Zhang, X., et al.: An eFace-template method for efficiently generating patient-specific anatomically-detailed facial soft tissue FE models for craniomaxillofacial surgery simulation. Ann. Biomed. Eng. 44, 1656–1671 (2016)
4. Mollemans, W., Schutyser, F., Nadjmi, N., Maes, F., Suetens, P.: Parameter optimisation of a linear tetrahedral mass tensor model for a maxillofacial soft tissue simulator. In: Harders, M., Székely, G. (eds.) ISBMS 2006. LNCS, vol. 4072, pp. 159–168. Springer, Heidelberg (2006)

Hand-Held Sound-Speed Imaging Based on Ultrasound Reflector Delineation

Sergio J. Sanabria(✉) and Orcun Goksel

Computer-assisted Applications in Medicine, ETH Zurich, Zurich, Switzerland
[email protected]

Abstract. A novel hand-held speed-of-sound (SoS) imaging method is proposed, which requires only minor hardware extensions to conventional ultrasound (US) B-mode systems. A hand-held reflector is used as a timing reference for US signals. A robust reflector-detection algorithm, based on dynamic programming (DP), achieves unambiguous timing even with 10 dB signal-to-noise ratio in real tissues, successfully detecting delays < 100 ns introduced by SoS heterogeneities. An Anisotropically-Weighted Total-Variation (AWTV) regularization based on L1-norm smoothness reconstruction is shown to achieve significant improvements in the delineation of focal lesions. The Contrast-to-noise-ratio (CNR) is improved from 15 dB to 37 dB, and the axial resolution loss from > 300 % to < 15 %. Experiments with breast-mimicking phantoms and ex-vivo liver samples showed, for hard hypoechogenic inclusions not visible in B-mode US, a high SoS contrast (2.6 %) with respect to cystic inclusions (0.9 %) and the background SoS noise (0.6 %). We also tested our method on a healthy volunteer in a preliminary in-vivo test. The proposed technique demonstrates potential for low-cost and non-ionizing screening, as well as for diagnostics in daily clinical routine.

© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 568–576, 2016. DOI: 10.1007/978-3-319-46720-7_66

1 Introduction

Breast cancer is a high-prevalence disease affecting one in eight women in the USA. Current routine screening consists of X-ray mammography, which, however, shows low sensitivity to malignant tumors in dense breasts, for which a large number of false positives leads to an unnecessary number of breast biopsies. Also, the use of ionizing radiation advises against frequent utilization, for instance, to monitor the progress of a tumor. Finally, the compression of the breast down to a few centimeters may cause patient discomfort. For these reasons, the latest recommendations restrict the general use of X-ray mammography to biennial examinations in women over 50 years old [13]. Ultrasound (US) is a safe, pain-free, and widely available medical imaging modality, which can complement routine mammographies. Conventional screening breast US (B-mode), which measures reflectivity and scattering from tissue structures, showed significantly higher sensitivity when combined with mammography (97 %) than mammography alone (74 %) [8]. However, B-mode US shows poor specificity. A novel US modality, Ultrasound Computed Tomography (USCT), aims at mapping other tissue parameters, such as the speed-of-sound (SoS), which shows a high potential for tumor differentiation (e.g., fibroadenoma, carcinoma, cysts) [2]. However, this method requires dedicated and complex systems consisting of a large number of transducer elements located around the breast in order to measure US wave propagation paths along multiple trajectories, from which the SoS-USCT image is reconstructed [3,5,10,11]. Low-cost extensions of conventional B-mode systems that only require a single multi-element array transducer are desirable to bring SoS-USCT into daily clinical routine. There have been some early attempts to combine B-mode systems with X-ray mammography, using the back compression plate as a timing reference. Yet, the reconstruction suffers from strong limited-angle artifacts, which provide unsatisfactory image quality, unless detailed prior information on the screened inclusion geometry is available [6,9]. In this work we propose a novel SoS-USCT method, hand-held sound-speed imaging, which overcomes the above listed limitations. By transmitting US waves through tissue between a B-mode transducer and a hand-held reflector, a SoS-USCT image of sufficient quality for tumor screening is obtained (Fig. 1). A specific reflector design combined with dedicated image processing provides unambiguous measurement of the US time-of-flight (ToF) between different transmitter/receiver elements of a transducer, from which local tissue SoS is derived as an image. Total-variation regularization overcomes the previously reported limited-angle artifacts and enables prior-less SoS imaging and precise delineation of piece-wise homogeneous inclusions. The proposed method only requires a small and localized breast compression, while allowing for flexible access to arbitrary imaging planes within the breast.

Fig. 1. SoS imaging setup.

2 Methods

A 128-element 5 MHz linear ultrasound array (L14/5-38) was operated in multistatic mode (a), with each element sequentially firing (Tx) and the rest receiving (Rx). For this purpose, a custom acquisition sequence was implemented on a research ultrasound machine (SonixTouch, Ultrasonix, Richmond, Canada). In a first implementation, a conventional ultrasound beamformer is adapted to the application by beamforming only a single element pair in Tx and Rx at a time, which requires the acquisition of 128 × 128 RF lines for 40 mm depth in about 8 s. To keep the measurement scene stable during acquisition, a positioning frame was introduced to keep the orientation of transducer and reflector fixed with respect to each other (Fig. 5b). For each line, the raw (unmodulated) ultrasound data (RF lines) are recorded. Computations are then performed in Matlab®.

S.J. Sanabria and O. Goksel

[Fig. 2 image. Panel labels: a), b), c), d). Annotations: Transducer, Inclusion, Reflector, First echo, Secondary echo, Fading, Timing ambiguities, Lost tracking. Methods compared: Independent RF line analysis, Adaptive amplitude-tracking, Dynamic programming (DP). Axes: Line l (Tx = Rx), Time (µs), Tracked delays t_{i,o} (µs), Δt_{i,o} (µs).]

Fig. 2. Reflector identification for ex-vivo liver test (Fig. 5c). a) Setup details; b) RF lines acquired with overlapped DP delineation for the case of same Tx and Rx; c) the measured ToF matrix ti,o ; and d) the relative path delays Δti,o after compensating for geometric effects. The proposed DP method outperforms independent RF line analysis and adaptive amplitude-tracking [12].

2.1 Reflector Delineation

The reflector consists of a thin Plexiglas strip (50 mm × 7 mm × 5 mm), which limits the reflected echoes to the desired imaging plane and allows for flexible access to different breast locations. The material choice ensures a coherent wave reflection over the tested angular range. The flat reflector geometry is simple to manufacture and easy to identify in US data. Secondary echoes corresponding to wave reflections between reflector boundaries are well separated from the main echo and are filtered out (Fig. 2a, b). In a real tissue scenario, a modulated ultrasound waveform with an oscillatory pressure pattern is recorded. The recorded signal shows multiple local maxima, with varying amplitudes depending on the wave path. Simply picking the peak response in each RF line yields incorrect ToF values, since different peaks may be selected for different transmit-receive (Tx-Rx) pairs. An adaptive amplitude-tracking measurement, which uses the current timing measurement as prior information for the adjacent Tx-Rx pairs, was shown for non-destructive testing of heterogeneous materials [12]. However, it requires manual initialization, which is impractical for in-vivo scenarios, and it fails when, due to wave interference and scattering effects, the reflected wave-front falls below the system noise level (fading), as frequently observed in real tissue samples (Fig. 2c, d).

In this work a global optimization is introduced, which simultaneously considers the full Tx-Rx dataset. Based on Dynamic Programming (DP), which has been applied in US for the segmentation of bones [4] and vessel walls [1], an algorithm for detecting oscillatory patterns in RF lines is proposed. It consists of a global cost matrix $C(l, t_l)$, which is cumulatively built along successive RF lines l (adjacent Tx-Rx pairs) for a list of N timing candidates $t_l = t_l^0, t_l^1, \ldots, t_l^N$, i.e., a list of possible time samples in the current RF line l, among which the optimum reflector timing can be found. Also, a memory matrix $M(l, t_l)$ records discrete timing decisions for each line and candidate. The optimum reflector timing that minimizes the cumulative cost is then found, and by following $M(l, t_l)$ backwards the optimum reflector delineation $T(l)$ is drawn:

\[
\begin{aligned}
C(l, t_l) &= \min_{t_{l-1}} \{ C(l-1, t_{l-1}) + f_1(t_l, t_{l-1}) \} + f_0(t_l) \\
M(l, t_l) &= \operatorname*{argmin}_{t_{l-1}} \{ C(l-1, t_{l-1}) + f_1(t_l, t_{l-1}) \} \\
T(l) &= \begin{cases} \operatorname*{argmin}_{t_l} C(l, t_l), & l = L \\ M(l+1, T(l+1)), & l = 1 \ldots L-1 \end{cases}
\end{aligned}
\tag{1}
\]
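The recursion of Eq. 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper does not specify f0 and f1 here, so the unary cost is passed in as a plain table `f0[l][k]` over candidate indices and the pairwise cost as a caller-supplied function `f1(k, j)`; the name `track_reflector` is hypothetical.

```python
def track_reflector(f0, f1):
    """Dynamic-programming delineation following the structure of Eq. 1.

    f0[l][k] : assumed unary cost of timing candidate k in RF line l
    f1(k, j) : assumed pairwise cost between candidate k in the current
               line and candidate j in the previous line
    Returns the list T of selected candidate indices, one per line.
    """
    num_lines = len(f0)
    num_cands = len(f0[0])

    # Cumulative cost C and memory matrix M; the first line has no predecessor.
    C = [list(f0[0])]
    M = [[0] * num_cands]

    for l in range(1, num_lines):
        cost_row, memory_row = [], []
        for k in range(num_cands):
            # argmin over the candidates of the previous line
            best_j = min(range(num_cands),
                         key=lambda j: C[l - 1][j] + f1(k, j))
            cost_row.append(C[l - 1][best_j] + f1(k, best_j) + f0[l][k])
            memory_row.append(best_j)
        C.append(cost_row)
        M.append(memory_row)

    # Backtracking: start from the cheapest candidate in the last line.
    T = [0] * num_lines
    T[-1] = min(range(num_cands), key=lambda k: C[-1][k])
    for l in range(num_lines - 2, -1, -1):
        T[l] = M[l + 1][T[l + 1]]
    return T


# Toy example: line 1 has a slightly cheaper but isolated candidate (index 2).
unary = [[5, 0, 5], [5, 1, 0.5], [5, 0, 5]]
smooth = lambda k, j: abs(k - j)  # assumed smoothness penalty
path = track_reflector(unary, smooth)
```

In the toy example, picking the cheapest candidate in each line independently would jump to index 2 in the middle line, whereas the cumulative cost keeps the delineation on index 1 throughout, which is the kind of timing ambiguity the global optimization is meant to resolve.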

Here, f0 and f1 are non-linear functions that incorporate ToF for the current ($t_l$) and neighbouring ($t_{l-1}$) RF lines. The general formulation of Eq. 1 introduces regularization into the reflector timing problem, enabling the natural incorporation of available prior information (oscillatory pattern, smoothness, multiple echoes, path geometry) into the optimization. Moreover, the delineation does not require manual initialization and is parallelizable linewise. The currently not optimized Matlab code runs on a single core of an Intel Core i7-4770K CPU in […]

[…] $m > 0$ (related to the local concentration of scatterers) and $\Omega > 0$ (related to the local backscattered energy) are called the shape and scaling parameters, respectively. Similarly to the Rayleigh distribution, the envelope of the RF signal $x^2$ follows a gamma distribution. By tuning the shape parameter m of the distribution, other statistical distributions can be modeled, such as an approximation of the Rician distribution (i.e., post-Rayleigh) for m > 1, a Rayleigh distribution for the special case m = 1, and a K-distribution (i.e., pre-Rayleigh) for m < 1. The envelope-detected RF signal based on the Nakagami m parameter was used subsequently for investigating tissue heterogeneity.

2.2 Circular Harmonic Wavelets

A natural way of assessing the echo signal f(x, y) is to analyse its statistical properties at different spatial scales. An efficient way to systematically decompose f(x, y) into successive dyadic scales is to use a redundant isotropic wavelet transform. The recursive projection of f on Simoncelli's isotropic wavelet provides such a frame representation [9]. The radial profile of the wavelet function is defined in polar coordinates in the Fourier domain as […] > 0 is a constant.

Given a volume set V of constructed envelope-detected RF tumor regions $f_i^\mu(x, y)$, where μ stands for the Nakagami shape parameter and i is a certain slice in the acquired volume, tissue fractal characteristics from the backscattered envelope are investigated. A fractal texture map F, having a size m × n and for k dimensions, can be defined as in (5) based on the CHW frames for all corresponding voxels $v^k_{xy}$ of $f_i^\mu(x, y)$, f ∈ V. The k value empirically specifies the maximum convolution kernel size I used in estimating Δv of (4). The slope of the linear regression line of the log-log plot of (4) gives H, from which the localized fractal dimension is estimated (FD = 3 − H). This procedure is iterated covering all $v^k_{xy}$, which yields a set of multi-dimensional fractal texture maps $M_f$ to be constructed for each V, where $M_f = \{F_1, F_2, \ldots, F_z\}$ and z is the total number of F in $M_f$.

Multidimensional Texture Analysis for Improved Prediction

\[
F^{(N,J)}\{f\}(x, y) =
\begin{pmatrix}
v^k_{11} & v^k_{12} & \cdots & v^k_{1y} & \cdots & v^k_{1n} \\
v^k_{21} & v^k_{22} & \cdots & v^k_{2y} & \cdots & v^k_{2n} \\
\vdots & \vdots & \ddots & \vdots & & \vdots \\
v^k_{x1} & v^k_{x2} & \cdots & v^k_{xy} & \cdots & v^k_{xn} \\
\vdots & \vdots & & \vdots & \ddots & \vdots \\
v^k_{m1} & v^k_{m2} & \cdots & v^k_{my} & \cdots & v^k_{mn}
\end{pmatrix}
\tag{5}
\]
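The step from the log-log regression slope to the fractal dimension can be illustrated as follows. This is a sketch only: Eq. (4) is not reproduced in this text, so the code assumes the usual fractional Brownian motion power law, in which the mean intensity increment Δv grows as Δr to the power H; the function names are hypothetical.

```python
import math


def hurst_from_increments(delta_r, delta_v):
    """Least-squares slope of log(delta_v) against log(delta_r).

    Assumes a power law delta_v = c * delta_r**H, so the slope estimates H.
    """
    xs = [math.log(r) for r in delta_r]
    ys = [math.log(v) for v in delta_v]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def fractal_dimension(delta_r, delta_v):
    """Localized fractal dimension FD = 3 - H, as stated in the text."""
    return 3.0 - hurst_from_increments(delta_r, delta_v)


# Synthetic increments that follow the assumed power law with H = 0.5.
radii = [1, 2, 4, 8]
increments = [2.0 * r ** 0.5 for r in radii]
fd = fractal_dimension(radii, increments)
```

For the synthetic data the estimate is 2.5 up to rounding; in practice the list of radii would be bounded by the kernel size I discussed above.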

The integration of the fBm model at different CHW orders N and scales J can contribute towards a better separability between the mixtures of speckle patterns within tissue, such that Δv and Δr locally estimate the FD of each $v^k_{xy}$ up to the resolution limits of $f_i^\mu$ specified by k that best characterizes the speckle patterns for different scales and circular frequencies. This approach enables further probing of the resolution of CHW frames, and hence facilitates assessing the speckle pattern heterogeneity.

Lacunarity Analysis. To further differentiate between textures having similar FD values, the lacunarity (L) – a coefficient of variation that can measure the sparsity of the fractal texture – can assist in quantifying aspects of patterns that exhibit scale-dependent changes in structure [15]. Namely, it measures the heterogeneity of the fractal pattern, providing meta-information about the dimension of F. The lower the L value, the more heterogeneous the examined tumor region $f_i^\mu$ represented by F, and vice versa. L can be defined in terms of the relative variance of the size distribution in F as

\[
L = \frac{\frac{1}{mn}\sum_x \sum_y F^2 - \left(\frac{1}{mn}\sum_x \sum_y F\right)^2}{\left(\frac{1}{mn}\sum_x \sum_y F\right)^2} = \frac{\mathrm{Var}[F]}{E^2[F]} = \frac{E[F^2] - E^2[F]}{E^2[F]}. \tag{6}
\]
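Eq. 6 is straightforward to evaluate on a fractal texture map. The following minimal sketch treats F as a list of rows of numbers; the function name `lacunarity` and the toy maps are illustrative assumptions, not taken from the paper.

```python
def lacunarity(texture_map):
    """Relative variance of an m x n map: (E[F^2] - E[F]^2) / E[F]^2 (Eq. 6)."""
    values = [v for row in texture_map for v in row]
    count = len(values)  # m * n
    mean = sum(values) / count
    mean_of_squares = sum(v * v for v in values) / count
    return (mean_of_squares - mean ** 2) / mean ** 2


# A constant map has zero relative variance; a two-valued map does not.
uniform_map = [[1, 1], [1, 1]]
gapped_map = [[0, 2], [0, 2]]
```

Here `lacunarity(uniform_map)` evaluates to 0.0 and `lacunarity(gapped_map)` to 1.0, so maps with the same mean can still be told apart by L.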

3 Results and Discussion

3.1 Clinical Tumor Cross-Sectional Dataset

The approach has been validated on RF ultrasound data acquired using a diagnostic ultrasound system (Zonare Medical Systems, Mountain View, CA, USA) with a 4 MHz curvilinear transducer and 11 MHz sampling. The output 2-D image size was 65 × 160 mm with a resolution of 225 × 968 pixels. A total of 287 cross-sectional images of 33 manually segmented volumetric liver tumors were obtained from the stacks of 2-D images; 117 were responsive and 170 did not respond to chemotherapy treatment. Response to treatment was determined from conventional computed tomography follow-up imaging, as part of the patient's standard clinical care, based on the response evaluation criteria in solid tumors (RECIST) [16]. The baseline cross-sectional imaging was compared against that performed at the end of treatment according to the RECIST criteria to determine response to treatment for each target tumor. A tumor was classified as responsive if categorized as partial response, and as non-responsive if it showed no change or the disease demonstrated progression.

O.S. Al-Kadi et al.

3.2 Statistical Analysis

To quantitatively assess the robustness of our approach, 2 (N + 1) × J features, where 2 stands for both the average F and L estimated at each N and J per slice of each of the acquired volumes, were fed into a support vector machine classifier to compare the overall modeling performance of classifying responsive versus non-responsive cases. Cross-validation was performed based on a leave-one-tumor-out (loo) approach, and further validated on an independent test set of 107 cross-sectional images (69 responsive versus 38 non-responsive images). The convolution kernel size I used in estimating the localized FD of F was initially optimized while keeping N and J fixed, see Fig. 1. Then the classification performance for different N and J values of the CHW representation was investigated in order to quantify L extracted from F. When the optimized values of N, J and I are employed, the best classification accuracy of 97.91 % (97.2 % for unseen data) is achieved, compared to 92.1 % in the work of [6]; the same applies to the 5- and 10-fold cross-validation results, reported as mean ± standard deviation of the performance over 60 runs (Table 1). Figure 2 shows the F and corresponding L for a non-responsive vs responsive case. A less heterogeneous texture (i.e. higher L values colored in red in Fig. 2) is witnessed in the responsive case. This indicates that the tumor tissue texture is becoming more sparse, which could be a sign of necrotic regions, and hence of response to treatment.
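The feature count and the leave-one-tumor-out splitting described above can be sketched as follows. This is an illustration under assumptions: the helper names are hypothetical, the reading of (N + 1) as the CHW orders 0 to N is a guess, and the classifier itself (a support vector machine in the paper) is left out.

```python
def feature_count(order_n, scales_j):
    """Number of features 2 (N + 1) x J: the average F and L at each
    order and scale (orders 0..N is an assumed reading of N + 1)."""
    return 2 * (order_n + 1) * scales_j


def leave_one_tumor_out(tumor_ids):
    """Yield (held_out_tumor, train_indices, test_indices).

    All cross-sectional images of one tumor are held out together, so
    slices of the same tumor never appear in both training and test sets.
    """
    for held_out in sorted(set(tumor_ids)):
        train = [i for i, t in enumerate(tumor_ids) if t != held_out]
        test = [i for i, t in enumerate(tumor_ids) if t == held_out]
        yield held_out, train, test


# Five slices belonging to three hypothetical tumors.
slice_tumor_ids = ["a", "a", "b", "c", "c"]
folds = list(leave_one_tumor_out(slice_tumor_ids))
```

With N = 2 and J = 6, `feature_count(2, 6)` gives 36 features per slice. Grouping the folds by tumor rather than by image avoids an optimistic bias from neighbouring slices of the same tumor.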

Fig. 1. Classification accuracies for varying convolution kernel size (I) in pixels with fixed order (N = 2) and scale (J = 6)

Characterizing the speckle patterns in terms of multi-scale circular harmonics representation could assist in better characterization of the backscattered signal, which adapts according to the varying nature of tissue structure. As changes in the scatterers’ spatial distribution and number density reflect in the ultrasound backscattering, the sensitivity of response to treatment of the envelope RF signal is implicitly linked to changes in FD and associated L on the CHW frames. Finally, tumors with varying depth would decrease the amplitude of the RF data, and quantifying the tumor response to chemotherapy treatment under such conditions is planned for future work.


Table 1. Classification performance for the multidimensional heterogeneity analysis of clinical liver tumor dataset

Statistical measures    Cross-validation
                        loo      5-fold           10-fold
Accuracy                97.91    93.30 ± 0.017    95.70 ± 0.009
Sensitivity             98.80    96.40 ± 0.888    97.50 ± 0.931
Specificity             96.60    88.80 ± 0.964    93.10 ± 0.975
ROC-AUC                 97.70    92.60 ± 0.020    95.30 ± 0.009

Fig. 2. (1st column) Tumor B-mode images, (2nd column) fractal texture maps and (3rd column) corresponding tissue heterogeneity representation for a (1st row) non-responsive vs (2nd row) responsive case, respectively. Red regions in (c) and (f) indicate response to treatment according to RECIST criteria [16]. CHW decomposition was based on a 2nd order and up to the 8-th scale.

4 Conclusion

A novel approach has been presented for quantifying liver tumor response to chemotherapy treatment with three main contributions: (a) ultrasound liver tumor texture analysis is based on a Nakagami distribution model of the envelope RF data, which is important to retain enough information; (b) a set of CHW frames is used to define a new tumor heterogeneity descriptor that is characterized at multi-scale circular harmonics of the ultrasound RF envelope data; (c) the heterogeneity is specified by the lacunarity measure, which is viewed as the size distribution of gaps on the fractal texture of the decomposed CHW coefficients. Finally, the measurement of heterogeneity for the proposed representation model is realized by means of support vector machines.

Acknowledgments. We would like to thank Dr. Daniel Y.F. Chung for providing the ultrasound dataset. This work was partially supported by the Swiss National Science Foundation (grant PZ00P2_154891) and the Arab Fund (grant 2015-02-00627).


References

1. Bae, Y.H., Mrsny, R., Park, K.: Cancer Targeted Drug Delivery: An Elusive Dream, pp. 689–707. Springer, New York (2013)
2. Sadeghi-Naini, A., Papanicolau, N., Falou, O., Zubovits, J., Dent, R., Verma, S., Trudeau, M., Boileau, J.F., Spayne, J., Iradji, S., Sofroni, E., Lee, J., Lemon-Wong, S., Yaffe, M., Kolios, M.C., Czarnota, G.J.: Quantitative ultrasound evaluation of tumor cell death response in locally advanced breast cancer patients receiving chemotherapy. Clin. Cancer Res. 19(8), 2163–2174 (2013)
3. Tadayyon, H., Sadeghi-Naini, A., Wirtzfeld, L., Wright, F.C., Czarnota, G.: Quantitative ultrasound characterization of locally advanced breast cancer by estimation of its scatterer properties. Med. Phys. 41, 012903 (2014)
4. Gangeh, M.J., Sadeghi-Naini, A., Diu, M., Kamel, M.S., Czarnota, G.J.: Categorizing extent of tumour cell death response to cancer therapy using quantitative ultrasound spectroscopy and maximum mean discrepancy. IEEE Trans. Med. Imaging 33(6), 268–272 (2014)
5. Wachinger, C., Klein, T., Navab, N.: The 2D analytic signal for envelope detection and feature extraction on ultrasound images. Med. Image Anal. 16(6), 1073–1084 (2012)
6. Al-Kadi, O.S., Chung, D.Y., Carlisle, R.C., Coussios, C.C., Noble, J.A.: Quantification of ultrasonic texture intra-heterogeneity via volumetric stochastic modeling for tissue characterization. Med. Image Anal. 21(1), 59–71 (2015)
7. Al-Kadi, O.S., Watson, D.: Texture analysis of aggressive and non-aggressive lung tumor CE CT images. IEEE Trans. Bio-med. Eng. 55(7), 1822–1830 (2008)
8. Shankar, P.M.: A general statistical model for ultrasonic backscattering from tissues. IEEE T Ultrason. Ferroelectr. Freq. Control 47(3), 727–736 (2000)
9. Portilla, J., Simoncelli, E.P.: A parametric texture model based on joint statistics of complex wavelet coefficients. Int. J. Comput. Vis. 40, 49–70 (2000)
10. Ojala, T., Pietikäinen, M., Mäenpää, T.: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 24, 971–987 (2002)
11. Unser, M., Chenouard, N.: A unifying parametric framework for 2D steerable wavelet transforms. SIAM J. Imaging Sci. 6(1), 102–135 (2013)
12. Unser, M., Van De Ville, D.: Wavelet steerability and the higher-order Riesz transform. IEEE Trans. Image Process. 19(3), 636–652 (2010)
13. Depeursinge, A., Püspöki, Z., et al.: Steerable wavelet machines (SWM): learning moving frames for texture classification. IEEE Trans. Image Process. (submitted)
14. Lopes, R., Betrouni, N.: Fractal and multifractal analysis: a review. Med. Image Anal. 13(4), 634–649 (2009)
15. Plotnick, R.E., Gardner, R.H., Hargrove, W.W., Prestegaard, K., Perlmutter, M.: Lacunarity analysis: a general technique for the analysis of spatial patterns. Phys. Rev. E 53(5), 5461–5468 (1996)
16. Eisenhauer, E.A., Therasse, P., et al.: New response evaluation criteria in solid tumours: revised RECIST guideline. Eur. J. Cancer 45(2), 228–247 (2009)

Classification of Prostate Cancer Grades and T-Stages Based on Tissue Elasticity Using Medical Image Analysis

Shan Yang, Vladimir Jojic, Jun Lian, Ronald Chen, Hongtu Zhu, and Ming C. Lin

University of North Carolina at Chapel Hill, Chapel Hill, USA
[email protected]
http://gamma.cs.unc.edu/CancerClass

Abstract. In this paper, we study the correlation of tissue (i.e. prostate) elasticity with the spread and aggression of prostate cancers. We describe an improved, in-vivo method that estimates the individualized, relative tissue elasticity parameters directly from medical images. Although elasticity reconstruction, or elastography, can be used to estimate tissue elasticity, it is less suited for in-vivo measurements or deeply-seated organs like the prostate. We develop a non-invasive method to estimate tissue elasticity values based on pairs of medical images, using a finite-element based biomechanical model derived from an initial set of images, local displacements, and an optimization-based framework. We demonstrate the feasibility of a statistically-based multi-class learning method that classifies a clinical T-stage and Gleason score using the patient's age and relative prostate elasticity values reconstructed from computed tomography (CT) images.

1 Introduction

Currently, screening of prostate cancers is usually performed through routine prostate-specific antigen (PSA) blood tests and/or a rectal examination. Based on a positive PSA indication, a biopsy of randomly sampled areas of the prostate can then be considered to diagnose the cancer and assess its aggressiveness. Biopsy may miss sampling cancerous tissues, resulting in missed or delayed diagnosis, and may miss areas with aggressive cancers, thus under-staging the cancer and leading to under-treatment. Studies have shown that the tissue stiffness described by the tissue properties may indicate an abnormal pathological process. Ex-vivo, measurement-based methods, such as [1,11] using magnetic resonance imaging (MRI) and/or ultrasound, were proposed for the study of prostate cancer tissue. However, previous works in material property reconstruction often have limitations with respect to their genericity, applicability, efficiency and accuracy [22]. More recent techniques, such as inverse finite-element methods [6,13,17,21,22], stochastic finite-element methods [18], and image-based ultrasound [20] have been developed for in-vivo soft tissue analysis.

© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 627–635, 2016. DOI: 10.1007/978-3-319-46720-7_73


In this paper, we study the possible use of tissue (i.e. prostate) elasticity to help evaluate the prognosis of prostate cancer patients given at least two sets of CT images. The clinical T-stage of a prostate cancer is a measure of how much the tumor has grown and spread, while a Gleason score based on the biopsy of cancer cells indicates the aggressiveness of the cancer. They are commonly used for cancer staging and grading. We present an improved method that uses geometric and physical constraints to deduce the relative tissue elasticity parameters. Although elasticity reconstruction, or elastography, can be used to estimate tissue elasticity, it is less suited for in-vivo measurements or deeply seated organs like the prostate. We describe a non-invasive method to estimate tissue elasticity values based on pairs of CT images, using a finite-element based biomechanical model derived from an initial set of images, local displacements, and an optimization-based framework. Given the tissue properties recovered from analysis of medical images and the patients' ages, we develop a multiclass classification system for classifying clinical T-stage and Gleason scores for prostate cancer patients. We demonstrate the feasibility of a statistically-based multiclass classifier that provides a supplementary assessment of cancer T-stages and cancer grades using the computed elasticity values from medical images, as an additional clinical aid for physicians and patients to make more informed decisions (e.g. more strategic biopsy locations, less/more aggressive treatment, etc.). Concurrently, extracted image features [8–10] using dynamic contrast enhanced (DCE) MRI have also been suggested for prostate cancer detection. These methods are complementary to ours and can be used in conjunction with ours as a multimodal classification method to further improve the overall classification accuracy.

2 Method

Our iterative simulation-optimization-identification framework consists of two alternating phases: the forward simulation, which estimates the tissue deformation, and the inverse process, which refines the tissue elasticity parameters to minimize the error in a given objective function. The inputs to our framework are two sets of 3D images. After iterations of the forward and inverse processes, we obtain the best set of elasticity parameters. Below we provide a brief overview of the key steps in this framework, and we refer interested readers to the supplementary document at http://gamma.cs.unc.edu/CancerClass/ for the detailed mathematical formulations and algorithmic process to extract the tissue elasticity parameters from medical images.

2.1 Forward Simulation: BioTissue Modeling

In our system, we apply the Finite Element Method (FEM) and adopt a Mooney-Rivlin material for bio-tissue modeling [3]. After discretization using FEM, we arrive at a linear system,

\[ K u = f \tag{1} \]

with K as the stiffness matrix, u as the displacement field and f as the external forces. The stiffness matrix K is not always symmetric positive definite due to complicated boundary conditions. The boundary condition we apply is the traction forces (shown in Fig. 7(a) of the supplementary document) computed based on the displacement of the surrounding tissue (overlapping surfaces shown in Fig. 7(b) of the supplementary document). We choose the Generalized Minimal Residual (GMRES) [16] solver to solve the linear system instead of the Generalized Conjugate Gradient (GCG) [14], as GMRES can better cope with non-symmetric, positive-definite linear systems. The computation of the stiffness matrix K in Eq. 1 depends on the energy function Ψ of the Mooney-Rivlin material model [15,19]:

\[ \Psi = \frac{1}{2}\mu_1\left(\frac{I_1^2 - I_2}{I_3^{2/3}} - 6\right) + \mu_2\left(\frac{I_1}{I_3^{1/3}} - 3\right) + v_1\left(I_3^{1/2} - 1\right)^2, \tag{2} \]

where µ1, µ2 and v1 are the material parameters. In this paper, we recover parameters µ1 and µ2. Since prostate soft tissue (without tumors) tends to be homogeneous, we use the average $\bar{\mu}$ of µ1 and µ2 as our recovered elasticity parameter. To model incompressibility, we set v1 to a very large value (1e+7 was used in our implementation). v1 is linearly related to the bulk modulus. The larger the bulk modulus, the more incompressible the object.
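As a sanity check on Eq. 2, the energy can be evaluated directly from the three invariants. The sketch below passes the invariants in as numbers, since their definitions are given only in the supplementary document; the rest-state check assumes that I2 denotes tr(C²), so that I1 = I2 = 3 and I3 = 1 for an undeformed body. Function names are hypothetical.

```python
def mooney_rivlin_energy(i1, i2, i3, mu1, mu2, v1):
    """Energy density of Eq. 2 for given invariants and material parameters."""
    distortional_1 = 0.5 * mu1 * ((i1 ** 2 - i2) / i3 ** (2.0 / 3.0) - 6.0)
    distortional_2 = mu2 * (i1 / i3 ** (1.0 / 3.0) - 3.0)
    volumetric = v1 * (i3 ** 0.5 - 1.0) ** 2  # penalizes volume change
    return distortional_1 + distortional_2 + volumetric


def average_elasticity(mu1, mu2):
    """Recovered elasticity parameter: the average of mu1 and mu2."""
    return 0.5 * (mu1 + mu2)


# Undeformed state under the assumed invariant definitions: zero energy.
rest_energy = mooney_rivlin_energy(3.0, 3.0, 1.0, mu1=1.0, mu2=1.0, v1=1e7)

# Volume-preserving example (I3 = 1) with I1 = 5 and I2 = 16.5.
stretched_energy = mooney_rivlin_energy(5.0, 16.5, 1.0, mu1=1.0, mu2=1.0, v1=1e7)
```

Because I3 = 1 in both examples, the large v1 term vanishes and only the two distortional terms contribute; any deviation of I3 from 1 would be penalized heavily, which is how the large v1 enforces near-incompressibility.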

Relative Elasticity Value: In addition, we divide the recovered absolute elasticity parameter $\bar{\mu}$ by that of the surrounding tissue to compute the relative elasticity parameter $\hat{\mu}$. This individualized relative value helps to remove the variation in mechanical properties of tissues between patients, normalizing the per-patient fluctuation in absolute elasticity values due to varying degrees of hydration and other temporary factors. We refer readers to our supplementary document for details regarding non-linear material models.

2.2 Inverse Process: Optimization for Parameter Identification

To estimate the patient-specific relative elasticity, our framework minimizes the error due to approximated parameters in an objective function. Our objective function, as defined in Eq. 3, consists of two components. The first is the difference between two surfaces: one reconstructed from the reference (initial) set of images and deformed toward the target surface using FEM simulation with the estimated parameters, and one target surface reconstructed from the second set of images. This difference is measured by the Hausdorff distance [4]. In addition, we add a Tikhonov regularization [5,7] term, which improves the conditioning of a possibly ill-posed problem. With regularization, our objective function is given as:

\[ \mu = \operatorname*{argmin}_{\mu} \left( d(S_l, S_t)^2 + \lambda \Gamma S_l \right), \tag{3} \]

with $d(S_l, S_t)$ as the distance between the deformed surface and the target surface, λ as the regularization weight, and Γ as the second-order differential operator.
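The data term of Eq. 3 can be illustrated with a brute-force Hausdorff distance between two sets of surface vertices. This is a sketch under assumptions: surfaces are represented as plain lists of 3D points, the regularization term is passed in as a precomputed scalar (`curvature_penalty`, a hypothetical stand-in for Γ applied to the deformed surface), and the function names are not from the paper.

```python
import math


def directed_hausdorff(points_a, points_b):
    """Largest distance from a point in A to its nearest point in B."""
    return max(min(math.dist(p, q) for q in points_b) for p in points_a)


def hausdorff(points_a, points_b):
    """Symmetric Hausdorff distance between two point sets."""
    return max(directed_hausdorff(points_a, points_b),
               directed_hausdorff(points_b, points_a))


def objective(deformed, target, curvature_penalty, lam):
    """Value of the bracket in Eq. 3 for one candidate parameter set."""
    return hausdorff(deformed, target) ** 2 + lam * curvature_penalty


# Toy surfaces: the target has one extra vertex 2 units away.
deformed_surface = [(0, 0, 0), (1, 0, 0)]
target_surface = [(0, 0, 0), (1, 0, 0), (1, 2, 0)]
value = objective(deformed_surface, target_surface,
                  curvature_penalty=0.5, lam=0.1)
```

The brute-force search is quadratic in the number of vertices, which is acceptable for a small illustration; a spatial index would be the natural choice for full organ meshes.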


The second-order differential operator Γ on a continuous surface (2-manifold) S is the curvature at a point on the surface. The curvature is defined through the tangent plane passing through that point. We denote the normal vector of the tangent plane as n and the unit direction in the tangent plane as $e_\theta$. The curvature related to the unit direction $e_\theta$ is κ(θ). The mean curvature $\kappa_{mean}$ for a continuous surface is defined as the average curvature over all directions, $\kappa_{mean} = \frac{1}{2\pi}\int_0^{2\pi} \kappa(\theta)\, d\theta$. In our implementation, we use a triangle mesh to approximate a continuous surface. We use the 1-ring neighborhood as the region for computing the mean curvature normal on our discrete surface $S_l$. We treat each triangle of the mesh as a local surface with two conformal space parameters u and v. With these two parameters, the second-order differential operator Γ on vertex x is $\Delta_{u,v} x = x_{uu} + x_{vv}$.

2.3 Classification Methods

For classification of cancer prognostic scores, we develop a learning method to classify patient cancer T-Stage and Gleason score based on the relative elasticity parameters recovered from CT images. Both the prostate cancer T-stage and the Gleason score are generally considered ordinal responses. We study the effectiveness of ordinal logistic regression [2] and multinomial logistic regression [12] in the context of prostate cancer staging and grading. In both cases we use an RBF kernel to project our features to a higher-dimensional space. We refer readers to the supplementary document for method details and the comparison with the Random Forests method.
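One practical difference between the two models is the number of coefficients they fit. The sketch below counts them under the usual textbook parameterizations, which is an assumption here (the paper's own definitions are in its supplementary document) and ignores the RBF projection: multinomial regression fits one weight vector and intercept per non-reference class, while ordinal (proportional-odds) regression shares one weight vector and adds one threshold per class boundary.

```python
def multinomial_coefficients(num_classes, num_features):
    """(classes - 1) weight vectors, each with an intercept."""
    return (num_classes - 1) * (num_features + 1)


def ordinal_coefficients(num_classes, num_features):
    """One shared weight vector plus (classes - 1) ordered thresholds."""
    return num_features + (num_classes - 1)


# T-stage: 3 classes (T1 to T3); features: relative elasticity and age.
t_stage_counts = (multinomial_coefficients(3, 2), ordinal_coefficients(3, 2))

# Gleason score: 5 classes (6 to 10) with the same two features.
gleason_counts = (multinomial_coefficients(5, 2), ordinal_coefficients(5, 2))
```

For the T-stage case this gives 6 versus 4 coefficients, which matches the dimensions quoted for the two methods in Sect. 3.2; the extra freedom of the multinomial model is one plausible reason for its higher accuracy on this dataset.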

3 Patient Data Study

3.1 Preprocessing and Patient Dataset

Given the CT images (shown in Fig. 1a) of the patient, the prostate, bladder and rectum are first segmented in the images. Then the 3D surfaces (shown in Fig. 1b) of these organs are reconstructed using VTK, and these surfaces are the input to our elasticity parameter reconstruction algorithm. Our patient dataset contains 113 (29 as the reference and 84 as target) sets of CT images from 29 patients, each patient having 2 to 15 sets of CT images. Every patient in the dataset has prostate cancer with cancer T-stage ranging from T1 to T3, Gleason score ranging from 6 to 10, and age from 50 to 85. Gleason scores are usually used to assess the aggressiveness of the cancer.

Fig. 1. Real Patient CT Image and Reconstructed Organ Surfaces. (a) shows one slice of the patient CT images with the bladder, prostate and rectum segmented. (b) shows the reconstructed organ surfaces.

3.2 Cancer Grading/Staging Classification Based on Prostate Elasticity Parameters

We further study the feasibility of using recovered elasticity parameters as a cancer prognostic indicator using our classifier based on relative tissue elasticity values and ages. Two classification methods, ordinal logistic regression and multinomial logistic regression, were tested in our study. We test each method with two sets of features. The first set contains only the relative tissue elasticity values $\hat{\mu}$; the resulting feature vector is one-dimensional. The second set contains both the relative tissue elasticity values and the age; the resulting feature vector is two-dimensional. Our cancer staging has C = 3 classes: T1, T2 and T3. The cancer grading has G = 5 classes, from 6 to 10. In our patient dataset, each patient has at least 2 sets of CT images. The elasticity parameter reconstruction algorithm needs 2 sets of CT images as input. We fix one set of CT images as the initial (reference) image and use the other M images, forming the set T with |T| = M, as the target (deformed) images. By registering the initial image to the target images, we obtain one elasticity parameter $\hat{\mu}_i$, i = 1 … M, for each image in T. We perform both per-patient and per-image cross validation.

Per-Image Cross Validation: We treat all the target images (N = 84) of all the patients as data points of equal importance. The elasticity feature for each target image is the recovered elasticity parameter $\hat{\mu}$. In this experiment, we train our classifier using the elasticity feature of 83 images and then cross validate with the one left out. Then, we add the patient's age as another feature to the classifier and perform the validation. The results for cancer staging (T-Stage) classification are shown in Fig. 2a and those for cancer grading (Gleason score) classification are shown in Fig. 2b. The error metric is measured as the absolute difference between the classified cancer T-Stage and the actual cancer T-Stage. Zero error-distance means our classifier accurately classifies the cancer T-Stage. The multinomial method outperforms the ordinal method for both cancer staging (T-Stage) and cancer aggression (Gleason score) classification. The main reason is the difference in the dimension of the optimization weights, i.e. the unknown regression coefficients β (refer to the supplementary document for the definition), between the multinomial and ordinal logistic regression methods. The dimension of the unknown regression coefficients of the multinomial logistic regression for cancer staging classification (with elasticity parameter and age as features) is 6, while that of ordinal logistic regression is 4. With the 'age' feature, we obtain up to 91 % accuracy for predicting cancer T-Stage using the multinomial logistic regression method and 89 % using the ordinal logistic regression method. For Gleason score classification we achieve up to 88 % accuracy using the multinomial logistic regression method and 81 % using the ordinal logistic regression method.

Fig. 2. Error Distribution of Cancer Grading/Staging Classification for Per-Image Study. (a) shows the error distribution of our cancer staging classification using the recovered prostate elasticity parameter and the patient's age. For our patient dataset, the multinomial classifier (shown in royal blue and sky blue) outperforms the ordinal classifier (shown in crimson and coral). We achieve up to 91 % accuracy using multinomial logistic regression and 89 % using ordinal logistic regression for classifying cancer T-Stage based on recovered elasticity parameter and age. (b) shows the correlation between the recovered relative elasticity parameter and the Gleason score with/without the patient's age. We achieve up to 88 % accuracy using multinomial logistic regression and 81 % using ordinal logistic regression for classifying Gleason score based on recovered elasticity parameter and age.

Per-Patient Cross Validation: For patients with more than 2 sets of images, we apply Gaussian sampling to $\hat{\mu}_i$, i = 1 … M, to compute the sampled elasticity parameter as the elasticity feature of the patient. We first train our classifier using the elasticity feature of 28 patients and then test the trained classifier on the remaining patient not in the training set. We repeat this process for each of the 29 patients. Then we include the patient age as another feature in the classifier. The error distribution for cancer staging (T-Stage) classification is shown in Fig. 3a and the error distribution for cancer grading (Gleason score) classification is shown in Fig. 3b. We observe that the multinomial method in general outperforms the ordinal method. More interestingly, the age feature helps to increase the classification accuracy by 2 % for staging classification and 7 % for Gleason score classification. With the age feature, our multinomial classifier achieves up to 84 % accuracy for classifying cancer T-Stage and up to 77 % accuracy for classifying Gleason scores. Our ordinal classifier achieves up to 82 % for cancer T-Stage classification and 70 % for Gleason score classification. The drop in accuracy for per-patient experiments compared with per-image ones is primarily due to the decrease in data samples.


Fig. 3. Error Distribution of Cancer Aggression/Staging Classification for Per-Patient Study. (a) shows the accuracy and error distribution of our recovered prostate elasticity parameter and cancer T-Stage. For our patient dataset, the multinomial classifier (shown in royal blue and sky blue) outperforms the ordinal classifier (shown in crimson and coral). We achieve up to 84 % accuracy using multinomial logistic regression and 82 % using ordinal logistic regression for classifying cancer T-Stage based on our recovered elasticity parameter and patient age information. (b) shows the correlation between the recovered relative elasticity parameter and the Gleason score. We achieve up to 77 % accuracy using multinomial logistic regression and 70 % using ordinal logistic regression for classifying Gleason score based on our recovered elasticity parameter and patient age information.

Among the 16 % of failure cases for cancer staging classification, 15 % of our multinomial classification results with the age feature are only 1 stage away from the ground truth. For the failure cases in Gleason score classification, 10 % of the classified Gleason scores are 1 away from the ground truth and 13 % are 2 away from the ground truth.

4 Conclusion and Future Work

In this paper, we present an improved, non-invasive tissue elasticity parameter reconstruction framework using CT images. We further studied the correlation of the recovered relative elasticity parameters with prostate cancer T-Stage and Gleason score for multiclass classification of cancer T-stages and grades. The classification accuracy on our patient dataset using the multinomial logistic regression method is up to 84 % for cancer T-stages and up to 77 % for Gleason scores. This study further demonstrates the effectiveness of our algorithm for recovering (relative) tissue elasticity parameters in vivo and its promising potential for correct classification in cancer screening and diagnosis.

Future Work: This study is performed on 113 sets of images from 29 prostate cancer patients all treated in the same hospital. More image data from more patients across multiple institutions can provide a much richer set of training data, thus further improving the classification results and testing/validating its classification power for cancer diagnosis. With more data, we could also apply our learned model for cancer stage/score prediction. Other features, such as the volume of the prostate, can also be included in the larger study. Another possible direction is to perform the same study on normal subjects and increase the patient diversity from different locations. A large-scale study can enable more complete analysis and lead to more insights on the impact of variability due to demographics and hospital practice on the study results. Similar analysis and derivation could also be performed using other image modalities, such as MR and ultrasound, and shown to be applicable to other types of cancers.

Acknowledgments. This project is supported in part by NIH R01 EB020426-01.


Automatic Determination of Hormone Receptor Status in Breast Cancer Using Thermography

Siva Teja Kakileti, Krithika Venkataramani, and Himanshu J. Madhu

Xerox Research Centre India, Bangalore, India
{SivaTeja.Kakileti,Krithika.Venkataramani,Himanshu.Madhu2}@xerox.com

Abstract. Estrogen and progesterone hormone receptor status plays a role in the treatment planning and prognosis of breast cancer. It is typically determined through Immuno-Histo-Chemistry (IHC) analysis of the tumor tissue after surgery. Since breast cancer and hormone receptor status affect thermographic images, we attempt to estimate the hormone receptor status before surgery through non-invasive thermographic imaging. We automatically extract novel features from the thermographic images that would differentiate hormone receptor positive tumors from hormone receptor negative tumors, and classify them through machine learning. We obtained a good accuracy of 82 % and 79 % in classification of HR+ and HR− tumors, respectively, on a dataset consisting of 56 subjects with breast cancer. This shows a novel application of automatic thermographic classification in breast cancer prognosis.

Keywords: Thermography · Breast cancer prognosis · Hormone receptor status

1 Introduction

Breast cancer has the highest incidence among cancers in women [1]. Breast cancer also has wide variations in the clinical and pathological features [2], which are taken into account for treatment planning [3], and to predict survival rates or treatment outcomes [2,4]. Thermography offers a radiation-free and non-contact approach to breast imaging and is being re-investigated in recent times [5–8] with the availability of high resolution thermal cameras. Thermography detects the temperature increase in malignancy due to the increased metabolism of cancer [9] and due to the additional blood flow generated for feeding the malignant tumors [6]. Thermography may also be sensitive to hormone receptor status as these hormones release Nitric Oxide, which causes vasodilation and temperature increase [6,10]. Both these effects could potentially lead to evaluation of hormone receptor status of malignant tumors using thermography. If this is possible, it provides a non-invasive way of predicting the hormone receptor status of malignancies through imaging, before going through Immuno-Histo-Chemistry (IHC) analysis on the tumor samples after surgery. This paper investigates this possibility and the prediction accuracy.

Most other breast imaging techniques, including mammography, are not able to detect hormone receptor status changes. Though the paper by Chaudhury et al. [11] claims that Dynamic Contrast Enhanced (DCE) MRI can be used for prediction of Estrogen status, it is invasive, and has been tested only on a small dataset of 20 subjects with leave-one-out cross-validation. There has been a study analyzing the effect of hormone receptor status of malignant tumors on thermography [12] through quantitative analysis of average or maximum temperatures of the tumor, the mirror tumor site and the breasts. That study [12] reports a significant difference in these temperature measurements for hormone receptor positive and negative status using thermography.

In this paper, we automatically extract features from the thermographic images in the region of interest (ROI), i.e. the breast tissue, using image processing and attempt to classify the hormone receptor status of malignant tumors using machine learning techniques. The determination of whether or not a subject has breast cancer using thermography, i.e. screening for cancer, is out of scope for this paper. There are other algorithms for breast cancer screening using thermography [8,13], to which the interested reader may refer.

The paper is organized as follows. Section 2 provides details on the effect of hormone receptor positive and negative breast cancers on thermography from the existing literature. Section 3 describes our approach to automatic feature extraction from the ROI for HR+ and HR− malignant tumor classification. Section 4 describes the dataset used for our experiments and our classification results are provided in Sect. 5. Conclusions and future work are given in Sect. 6.

© Springer International Publishing AG 2016. S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 636–643, 2016. DOI: 10.1007/978-3-319-46720-7_74

2 Effect of Hormone Receptor Status on Thermography

Readily available tumor markers such as Estrogen Receptor (ER), Progesterone Receptor (PR), Human Epidermal growth factor Receptor 2 (HER2) and the tumor cell growth protein marker Ki-67 are used for treatment planning [3,14] and survival rate prediction [2,4], especially in resource-constrained developing countries like India. The study in [2] uses ER, PR and HER2 for estimating breast cancer mortality risk from a large dataset of more than 100,000 patients with invasive breast cancer. They find that there is variability in the 8 different ER/PR/HER2 subtypes, and the ER status has the largest importance. ER+ tumors have a lower risk than ER− tumors. PR status has a lesser importance than ER status and PR+ tumors have lower risk than PR− tumors. HER2 status has variations in risk across the different hormone receptor subtypes, depending on the stage of the cancer, with the lowest risk for the ER+/PR+/HER2− tumors, and the highest risk for ER−/PR−/HER2− tumors. The Ki-67 marker indicates the rate of tumor cell growth [14]. More aggressive tumors may have higher temperatures due to their increased metabolism [9] and so the Ki-67 marker status may play a role in thermography, but it has not been formally investigated in any study yet.

Estrogen leads to an increase in vasodilation due to the production of Nitric Oxide, with a resultant temperature increase [6,15]. Progesterone is also associated with locally high concentrations of Nitric Oxide generation [10] for prolonged periods of time. The authors of [12] find a significant difference in the average and maximum temperature of the tumor site between PR+ and PR− tumors, with the PR− tumors being hotter. The same pattern holds for ER status, although in a non-significant manner. Their study showed that the more aggressive ER−/PR− tumors were hotter than the less aggressive ER+/PR+ tumors. Their study also indicates that the difference in average temperatures of the tumor and its mirror sites in contra-lateral breasts is higher in ER− tumors than in ER+ tumors, although in a non-significant manner. The same pattern holds for the PR status too. Since the hormone sensitivity of both breast tissues is similar, it is probable that there is a thermal increase on both breasts for estrogen or progesterone positive cases. The study in [12] does not specifically analyze the four different subtypes of ER/PR status, probably because the differences in temperature are small for just one hormone receptor status. Using these medical reasons and empirical observations, in the next section, we design a set of novel features along with a few existing features that would either extract these observations automatically or would correlate with these findings for classifying hormone receptor positive and negative tumors.

3 Automatic Feature Extraction for Hormone Receptor Status

We attempt to distinguish all combinations of Hormone Receptor (HR) positive (ER+/PR+, ER+/PR−, ER−/PR+) tumors from the HR negative (ER−/PR−) tumors. We extract features from the elevated temperature regions in the ROI and from the overall ROI. The elevated temperature regions, i.e., the hot-spots, are extracted as described below.

3.1 Abnormal Region Extraction

The entire ROI is divided into abnormal regions and normal regions based on their regional temperatures. The malignant tumor region is typically an abnormal region with an elevated temperature. The abnormal regions have the highest regional temperature in the Region of Interest (ROI). To segment an abnormal region, we used an algorithm proposed in [16], where segmentation areas are combined from multiple features defined by Eqs. 1 and 2 using a decision rule.

$$T_1 = \mathrm{Mode}(\mathrm{ROI}) + \rho \, (T_{\max} - \mathrm{Mode}(\mathrm{ROI})) \tag{1}$$

$$T_2 = T_{\max} - \tau \tag{2}$$
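A minimal sketch of how the two thresholds of Eqs. 1 and 2 might be applied to a list of ROI temperatures. Several details are assumptions made only for illustration: the function name `segment_hot_spots` is hypothetical, the histogram bin width used to find the mode is arbitrary, pixels are compared with `>=`, and the fusion rule is reduced to a pixel-wise AND or OR. The values ρ = 0.2, τ = 3 °C and the AND rule are the ones reported in Sect. 5; the temperatures are made up.

```python
from collections import Counter


def segment_hot_spots(roi_temps, t_max, rho, tau, rule="AND", bin_width=0.5):
    """Label each ROI pixel as abnormal (True) or normal (False).

    roi_temps: flat list of ROI pixel temperatures in degrees Celsius.
    t_max:     overall maximum temperature over all views.
    bin_width: histogram bin used to locate the mode (assumed, not from the paper).
    """
    bins = Counter(round(t / bin_width) for t in roi_temps)
    mode = max(bins, key=bins.get) * bin_width   # Mode(ROI)
    t1 = mode + rho * (t_max - mode)             # Eq. 1
    t2 = t_max - tau                             # Eq. 2
    above_t1 = [t >= t1 for t in roi_temps]
    above_t2 = [t >= t2 for t in roi_temps]
    if rule == "AND":
        mask = [a and b for a, b in zip(above_t1, above_t2)]
    else:  # "OR"
        mask = [a or b for a, b in zip(above_t1, above_t2)]
    return mask, t1, t2


# Hypothetical ROI: mostly 30 degrees C with a small warmer patch.
temps = [30.0] * 6 + [31.0, 33.5, 34.0]
mask, t1, t2 = segment_hot_spots(temps, t_max=max(temps), rho=0.2, tau=3.0)
```

With the AND rule a pixel is kept only if it exceeds both thresholds, which is the stricter of the two simple fusion choices sketched here.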

In the above equations, $T_{\max}$ represents the overall maximum temperature in all views and Mode(ROI) represents the mode of the temperature histogram obtained using temperature values of pixels from the ROIs of all views. The parameters ρ, τ and the decision fusion rule are selected based on the accuracy of classification on a training/cross-validation subset and diversity in the segmentation decisions. Decision fusion results in better hot-spot detection than simple thresholding techniques [16]. Heat transmission from deep tumors results in diffused lower temperatures on the surface and these parameters play a large role in the deep tumor detection. Research on determining the combined depth and size of tumors that can be detected remains to be done.

As discussed in [12], HR− tumors are hotter compared to HR+ tumors, while a temperature increase on both sides is observed for HR+ tumors due to the presence of similar hormone sensitive tissues. To capture these properties, we extract the following features from these detected abnormal regions.

Distance Between Regions. The malignant tumor region is hotter than the surrounding region, but the relative difference is higher for HR− tumors. In the case of HR+ tumors, the entire breast region is warmed up, and so this difference is smaller. We use the normalized histogram of temperatures, or probability mass function (PMF), to represent each region, and find the distance between regions using a distance measure between PMFs. Here, the Jensen-Shannon Divergence (JSD) is used as the measure, as it is symmetric. The JSD is defined as

$$\mathrm{JSD}(P\|Q) = \frac{1}{2}\sum_i \log\!\left(\frac{P(i)}{M(i)}\right) P(i) + \frac{1}{2}\sum_i \log\!\left(\frac{Q(i)}{M(i)}\right) Q(i), \tag{3}$$

where $M = \frac{1}{2}(P + Q)$. The value of JSD(P‖Q) tends to zero when P and Q have identical distributions and takes a high value when the distributions are very different. To include a measure of distance between multiple regions, one or more of the PMFs of one region is modified by the mean temperature of another region. The JSD between $P - \mu_2$ and $Q - \mu_1$, where P is the PMF of the abnormal region on the malignant side, Q is the PMF of the normal region on the malignant side, $\mu_1$ is the mean of the contra-lateral side abnormal region and $\mu_2$ is the mean of the contra-lateral side normal region, is taken as a feature. In the absence of an abnormal region on the contra-lateral side, $\mu_1$ is taken to be equal to $\mu_2$. Subtracting the contra-lateral region means corresponds to a relative increase in the heat with respect to the contra-lateral regions. For HR− tumors, there may be no abnormal regions on the contra-lateral side, due to which this JSD will be higher.

Relative Hotness to the Mirror Site. HR+ tumors have a lower temperature difference between the tumor site and the mirror tumor site on the contra-lateral side. To capture this, we use the mean squared distance between the temperature of the malignant side abnormal region pixels and the mean temperature of the contra-lateral side abnormal region, as defined in Eq. 4.

$$RH = \frac{1}{|A|} \sum_{(x,y) \in A} \| T(x,y) - \mu \|^2 \tag{4}$$


Fig. 1. Subjects with malignant tumors having (a) ER+/PR+ status, (b) ER−/PR− status, (c) ER+/PR+ status with asymmetrical thermal response, (d) ER−/PR− status with some symmetrical thermal response.

where T(x, y) represents the temperature of the malignant side abnormal region pixels at location (x, y) in the image, µ represents the mean temperature of the contra-lateral side abnormal region and |A| represents the cardinality of the abnormal region A on the malignant side. This value is lower for HR+ tumors compared to HR− tumors, as hormone sensitive tissues will be present on both sides. As shown in Fig. 1a and b, we see thermal responses on both sides for HR+ tumors and no thermal response on the normal breast for HR− tumors. However, there might be outliers like Fig. 1c and d.

Thermal Distribution Ratio. In addition to the temperature change, the areas of the abnormal regions on both sides are also considered as features. We used the ratio of the area of the abnormal regions on the contra-lateral side to that on the malignant side. This value tends to be zero for HR− tumors, as there may be no abnormal region on the contra-lateral side, and is higher for HR+ tumors.

3.2 Entire ROI Features

Textural features are used here to extract the features from the entire ROI. However, instead of using the original temperature map of the ROI, a modified temperature map is used. The thermal map formed by subtracting the contra-lateral side mean temperature from the malignant side ROI, i.e. the temperature relative to the contra-lateral side, is used to determine the textural features. The Run Length Matrix (RLM) is computed from the thermal map, after quantizing the temperature into l bins. Gray level non-uniformity and Energy features from the RLM are computed, as mentioned in [7]. The non-uniformity feature would be higher for HR− tumors as their tumors have more focal temperatures.
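The run-length computation described above can be sketched as follows. This is an illustrative sketch, not the implementation of [7]: it counts horizontal runs only, uses the common definition of gray-level non-uniformity (squared per-level run counts divided by the total number of runs), and takes the 0.25 °C bin step from Sect. 5. The function names and the tiny temperature map are hypothetical.

```python
from collections import Counter
from itertools import groupby


def relative_temperature_map(roi, contralateral_mean):
    """Subtract the contra-lateral mean temperature from every ROI pixel."""
    return [[t - contralateral_mean for t in row] for row in roi]


def run_length_matrix(image, bin_width=0.25):
    """Count horizontal runs: (quantized level, run length) -> number of runs."""
    runs = Counter()
    for row in image:
        levels = [int(t // bin_width) for t in row]
        for level, group in groupby(levels):
            runs[(level, len(list(group)))] += 1
    return runs


def gray_level_non_uniformity(runs):
    """GLN = sum over levels of (runs at that level)^2 / total number of runs."""
    per_level = Counter()
    for (level, _length), count in runs.items():
        per_level[level] += count
    total = sum(per_level.values())
    return sum(c * c for c in per_level.values()) / total


# Hypothetical 2 x 3 malignant-side ROI and contra-lateral mean (degrees C).
roi = [[33.0, 33.0, 34.0],
       [33.0, 34.0, 34.0]]
relative = relative_temperature_map(roi, contralateral_mean=33.0)
runs = run_length_matrix(relative)
gln = gray_level_non_uniformity(runs)
```

A full implementation would typically accumulate runs over several directions (0°, 45°, 90°, 135°) rather than rows only; that choice is left open here because the text does not specify it.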

4 Dataset Description

We obtained an anonymized dataset of 56 subjects with biopsy-confirmed breast cancer, with ages varying from 27 to 76 years, through our collaboration with Manipal University. A FLIR E60 camera with a spatial resolution of 320 × 240 pixels was used to capture the initial 20 subjects and a high-resolution FLIR T650Sc camera with an image resolution of 640 × 480 pixels was used for the remaining subjects. A video is captured for each subject, and the acquisition protocol involved asking the subject to rotate from the right lateral to the left lateral view. The data for each subject included the mammography, sono-mammography and biopsy reports and the ER/PR status values of the tumors, with surgery reports and HER2 Neu status values where available. In this data, 32 subjects have HR+ malignant tumors and the remaining 24 have HR− tumors.

5 Classification Results

From the obtained videos, we manually selected five frames that correspond to frontal, right & left oblique and lateral views, and manually cropped the ROIs in these. Consideration of multiple views helps in better tumor detection since the tumor might not be seen in a fixed view. From these multiple views, the view corresponding to the maximum abnormal region area with respect to the ROI area is considered the best view. This best view along with its contra-lateral side view is used to calculate the features from the abnormal regions and the entire ROI as mentioned in Sect. 3. The training set and testing set comprise randomly chosen subsets of 26 and 30 subjects, respectively, with an internal division of 14 HR+ & 12 HR− and 18 HR+ & 12 HR− tumors, respectively. The abnormal region is located using ρ = 0.2, τ = 3 °C and the AND decision rule, to optimize for the accuracy in classification. All 11 deep tumors of size 0.9 cm and above have been detected in this dataset. The bin width of the PMFs used is 0.5 °C. The step size of the temperature bins in the RLM computation is 0.25 °C.

A two-class Random Forest ensemble classifier is trained using the features obtained. The Random Forest (RF) randomly chooses a training sub-set and a feature sub-set for training a decision tree, and combines the decisions from multiple such trees to get more accuracy in classification. The mode of all trees is taken as the final classification decision. RFs with an increasing number of trees have a lower standard deviation in the accuracies over multiple iterations. The standard deviation in (HR−, HR+) accuracies of the RFs using all features with 5, 25 and 100 trees over 20 iterations is (9.1 %, 11.1 %), (6.4 %, 4.8 %) and (2.5 %, 2.0 %), respectively, and hence a large number of 100 trees is chosen. Table 1 shows the maximum accuracies over 20 iterations of RFs with 100 trees using the individual and combined features proposed in our approach. We tested different textural features obtained from both the RLM and the Gray Level Co-occurrence Matrix, and found that gray-level non-uniformity from the RLM has better accuracy than the others. Using an optimal combined set of region based features and textural features, we obtained an accuracy of 82 % and 79 % in classification of HR+ and HR− tumors, respectively.

Table 1. Accuracies with different features obtained using our approach

Feature set               Features                                                       HR− accuracy   HR+ accuracy
Abnormal region features  Distance between regions                                       74 %           56 %
                          Relative hotness                                               79 %           73 %
                          Thermal distribution ratio                                     63 %           27 %
                          Combination of above three features                            84 %           73 %
Entire ROI features       Gray-level non-uniformity                                      68 %           64 %
Overall features          Combination of features from abnormal and entire ROI regions   79 %           82 %
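The way a forest's final decision and the separate HR+ and HR− accuracies are obtained can be sketched as below. This is not a Random Forest implementation: the per-tree votes are supplied by hand, the function names are hypothetical, and the labels are invented purely to show the bookkeeping.

```python
from collections import Counter


def majority_vote(tree_votes):
    """Final forest decision: the mode of the individual tree decisions."""
    return Counter(tree_votes).most_common(1)[0][0]


def per_class_accuracy(y_true, y_pred, label):
    """Fraction of subjects whose true class is `label` that are predicted correctly."""
    hits = [p == t for t, p in zip(y_true, y_pred) if t == label]
    return sum(hits) / len(hits)


# Hypothetical votes of a 3-tree forest for four test subjects.
votes = [
    ["HR+", "HR+", "HR-"],
    ["HR-", "HR-", "HR+"],
    ["HR+", "HR-", "HR-"],
    ["HR+", "HR+", "HR+"],
]
y_true = ["HR+", "HR-", "HR+", "HR+"]
y_pred = [majority_vote(v) for v in votes]
acc_pos = per_class_accuracy(y_true, y_pred, "HR+")
acc_neg = per_class_accuracy(y_true, y_pred, "HR-")
```

Using an odd number of trees avoids ties in a two-class vote; with an even number, a tie-breaking rule would have to be chosen explicitly.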

From Table 1, it is clear that the Abnormal Region features play a more important role than the textural features. Among these abnormal region features, the features corresponding to relative temperatures, i.e., Relative Hotness and Distance Between Regions, have an important role in the classification of HR+ and HR− tumors, thus validating the findings of [12].
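The two relative-temperature features named above (Eqs. 3 and 4 of Sect. 3.1) can be sketched as follows. The helper names, the natural logarithm, and the example temperatures are assumptions made for illustration; the 0.5 °C bin width is the one reported in this section, and the pairing of P with µ2 and Q with µ1 follows the text of Sect. 3.1 as written.

```python
from collections import Counter
from math import log


def pmf(temps, shift=0.0, bin_width=0.5):
    """Normalized histogram of (temperature - shift) with the given bin width."""
    counts = Counter(int((t - shift) // bin_width) for t in temps)
    n = len(temps)
    return {b: c / n for b, c in counts.items()}


def jsd(p, q):
    """Jensen-Shannon divergence of Eq. 3 (natural log), with M = (P + Q) / 2."""
    total = 0.0
    for b in set(p) | set(q):
        pb, qb = p.get(b, 0.0), q.get(b, 0.0)
        mb = 0.5 * (pb + qb)
        if pb > 0:
            total += 0.5 * pb * log(pb / mb)
        if qb > 0:
            total += 0.5 * qb * log(qb / mb)
    return total


def relative_hotness(abnormal_temps, contralateral_mean):
    """Eq. 4: mean squared difference to the contra-lateral abnormal-region mean."""
    return sum((t - contralateral_mean) ** 2 for t in abnormal_temps) / len(abnormal_temps)


# Hypothetical temperatures in degrees Celsius.
abnormal = [35.0, 35.5, 36.0, 36.0]   # malignant side, abnormal region (P)
normal = [33.0, 33.0, 33.5, 34.0]     # malignant side, normal region (Q)
mu1, mu2 = 34.5, 33.0                 # contra-lateral abnormal / normal region means
distance_feature = jsd(pmf(abnormal, shift=mu2), pmf(normal, shift=mu1))
rh_feature = relative_hotness(abnormal, mu1)
```

With the natural logarithm the divergence is bounded above by log 2, which it reaches when the two histograms share no bin, as in this toy example.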

6 Conclusions and Future Work

We have presented a novel application that automatically classifies breast cancer tumors into HR+ and HR− tumors using thermography with a reasonably good accuracy of around 80 %. This is a first approach through image processing features and machine learning algorithms for such automatic classification. This also presents an advantage of thermography over other imaging modalities in estimating prognosis and treatment planning of breast cancer without invasive surgery. In future work, we will test our algorithm on larger datasets with more variation in data and modify the algorithm to detect sub-classes within HR+ tumors. Additionally, we will try to determine the role of Ki-67 status in thermography to refine the automatic classification.

Acknowledgement. We thank Manipal University and Dr. L. Ramachandra, Dr. S. S. Prasad and Dr. Vijayakumar for sharing the data and assisting us in thermographic image interpretation.


References

1. Fitzmaurice, C., et al.: The global burden of cancer 2013. JAMA Oncol. 1(4), 505–527 (2015)
2. Parise, C.A., Caggiano, V.: Breast cancer survival defined by the ER/PR/HER2 subtypes and a surrogate classification according to tumor grade and immunohistochemical biomarkers. J. Cancer Epidemiol. 2014, 11 p. (2014). Article ID 469251
3. Alba, E., et al.: Chemotherapy (CT) and hormonotherapy (HT) as neoadjuvant treatment in luminal breast cancer patients: results from the GEICAM/2006-03, a multicenter, randomized, phase-II study. Ann. Oncol. 23(12), 3069–3074 (2012)
4. Cheang, M., Chia, S.K., Voduc, D., et al.: Ki67 index, HER2 status, and prognosis of patients with luminal B breast cancer. J. Nat. Cancer Inst. 101(10), 736–750 (2009)
5. Keyserlingk, J., Ahlgren, P., Yu, E., Belliveau, N., Yassa, M.: Functional infrared imaging of the breast. Eng. Med. Biol. Mag. 19(3), 30–41 (2000)
6. Kennedy, D.A., Lee, T., Seely, D.: A comparative review of thermography as a breast cancer screening technique. Integr. Cancer Ther. 8(1), 9–16 (2009)
7. Acharya, U.R., Ng, E., Tan, J.H., Sree, S.V.: Thermography based breast cancer detection using texture features and support vector machine. J. Med. Syst. 36(3), 1503–1510 (2012)
8. Borchartt, T.B., Conci, A., Lima, R.C., Resmini, R., Sanchez, A.: Breast thermography from an image processing viewpoint: a survey. Signal Process. 93(10), 2785–2803 (2013)
9. Gautherie, M.: Thermobiological assessment of benign and malignant breast diseases. Am. J. Obstet. Gynecol. 147(8), 861–869 (1983)
10. Vakkala, M., Kahlos, K., Lakari, E., Paakko, P., Kinnula, V., Soini, Y.: Inducible nitric oxide synthase expression, apoptosis, and angiogenesis in in-situ and invasive breast carcinomas. Clin. Cancer Res. 6(6), 2408–2416 (2000)
11. Chaudhury, B., et al.: New method for predicting estrogen receptor status utilizing breast MRI texture kinetic analysis. In: Proceedings of the SPIE Medical Imaging (2014)
12. Zore, Z., Boras, I., Stanec, M., Oresic, T., Zore, I.F.: Influence of hormonal status on thermography findings in breast cancer. Acta Clin. Croat. 52, 35–42 (2013)
13. Madhu, H., Kakileti, S.T., Venkataramani, K., Jabbireddy, S.: Extraction of medically interpretable features for classification of malignancy in breast thermography. In: 38th Annual IEEE International Conference on Engineering in Medicine and Biology Society (EMBC) (2016)
14. Urruticoechea, A.: Proliferation marker Ki-67 in early breast cancer. J. Clin. Oncol. 23(28), 7212–7220 (2005)
15. Ganong, W.F.: Review of Medical Physiology. McGraw-Hill Medical, New York (2005)
16. Venkataramani, K., Mestha, L.K., Ramachandra, L., Prasad, S., Kumar, V., Raja, P.J.: Semi-automated breast cancer tumor detection with thermographic video imaging. In: 37th Annual International Conference on Engineering in Medicine and Biology Society, pp. 2022–2025 (2015)

Prostate Cancer: Improved Tissue Characterization by Temporal Modeling of Radio-Frequency Ultrasound Echo Data

Layan Nahlawi1, Farhad Imani2, Mena Gaed4, Jose A. Gomez4, Madeleine Moussa4, Eli Gibson3, Aaron Fenster4, Aaron D. Ward4, Purang Abolmaesumi2, Hagit Shatkay1,5, and Parvin Mousavi1

1 School of Computing, Queen’s University, Kingston, Canada
[email protected]
2 Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, Canada
3 Centre for Medical Image Computing, University College London, London, UK
4 Department of Medical Biophysics, Pathology and Robarts Institute, Western University, London, Canada
5 Department of Computer and Information Sciences, University of Delaware, Newark, USA

Abstract. Despite recent advances in clinical oncology, prostate cancer remains a major health concern in men, where current detection techniques still lead to both over- and under-diagnosis. More accurate prediction and detection of prostate cancer can improve disease management and treatment outcome. Temporal ultrasound is a promising imaging approach that can help identify tissue-specific patterns in time-series of ultrasound data and, in turn, differentiate between benign and malignant tissues. We propose a probabilistic-temporal framework, based on hidden Markov models, for modeling ultrasound time-series data obtained from prostate cancer patients. Our results show improved prediction of malignancy compared to previously reported results, where we identify cancerous regions with over 88 % accuracy. As our models directly represent temporal aspects of the data, we expect our method to be applicable to other types of cancer in which temporal-ultrasound can be captured.

1 Introduction

Prostate cancer is the most widely diagnosed form of cancer in men [1]. The American Cancer Society predicts that one in seven men will be diagnosed with prostate cancer during their lifetime. Initial assessment includes measuring Prostate Specific Antigen level in blood serum and digital rectal examination. If either test is abnormal, core needle biopsy is performed under Trans-Rectal Ultrasound (TRUS) guidance. Disease prognosis and treatment decisions are then based on H. Shatkay and P. Mousavi—These authors have contributed equally to the manuscript. c Springer International Publishing AG 2016  S. Ourselin et al. (Eds.): MICCAI 2016, Part I, LNCS 9900, pp. 644–652, 2016. DOI: 10.1007/978-3-319-46720-7 75

Prostate Cancer: Improved Tissue Characterization by Temporal Modeling

645

grading, i.e., assessing the degree of cancer-aggressiveness in the biopsy cores. TRUS-guided biopsy often leads to a high rate (∼30 %) of false negatives for cancer diagnosis as well as to over- and under-estimation of the cancer grade. Extensive heterogeneity in morphology and pathology of prostate adenocarcinoma are additional challenging factors for making an accurate diagnosis. While improved prostate-cancer screening has reduced mortality rates by 45 % over the past two decades [3], inaccurate diagnosis and grading have resulted in a surge in over-treatment. Notably, radical over-aggressive treatment of prostate-cancer patients leads to a decline in their quality of life. For indolent prostate cancer, such aggressive treatment should be avoided as active surveillance has proven to be an effective disease management course [13]. Accurate identification and grading of lesions and their extent – especially using low cost, readily accessible technology such as ultrasound – can, therefore, significantly contribute to appropriate effective treatment. To achieve this, methods must be developed to guide TRUS biopsies to target regions likely to be malignant. The task of differentiating malignant tissue from its surrounding tissue is referred to in the literature as tissue-typing or characterization. In this paper we propose a new method that utilizes ultrasound time-series data to characterize malignant vs. benign tissue obtained from prostate-cancer patients. Most of the research on ultrasound-based tissue characterization focuses on analysis of texture- [5] and spectral-features [4] within single ultrasound frames. Elastography [8], another ultrasound technique, aims to distinguish tissue types based on their measured stiffness in response to external vibrations. 
A different way of utilizing ultrasound is by acquiring radio-frequency (rf) time series, i.e., a sequence of ultrasound frames captured from sonication of tissue over time, without moving the tissue or the ultrasound probe. Frequency-domain analysis of rf time series has shown promising results for tissue characterization in breast and prostate cancer. Moradi et al. [10] used the fractal dimension of rf time series as features and employed Bayesian and neural network classifiers for ex-vivo characterization of prostate tissue. More recently, Imani et al. [7] combined wavelet features and mean central frequency of rf time series to characterize in-vivo prostate tissue using SVMs. Neither of these lines of work has explicitly modeled the temporal aspect of the time-series data or utilized it for tissue characterization.

In a recent study [11] we suggested that the temporal aspect of the data may carry useful information if directly captured. Here we carry the idea forward, presenting a new method for analyzing rf time series using a probabilistic temporal model, namely a hidden Markov model (hmm) [12], which specifically represents and makes use of the temporal aspect of the data. We apply the method to differentiate between malignant and benign prostate tissue and demonstrate its utility, showing improved performance compared to previous methods.

Probabilistic temporal modeling, and hmms in particular, have been applied to a wide range of clinical data. They are typically used to model a time-dependent physiological process (e.g. heartbeats [2]) or the progression of disease-risk over time [6]. hmms are also used within and outside the biomedical domain to model sequence-data such as text [9], proteins, DNA

L. Nahlawi et al.

sequences and others. Here we use them to model rf time series where time does have an impact on the ultrasound data being recorded. We next describe our rf time-series data and its representation, followed by a tissue-characterization framework. We then present experiments and results demonstrating the effectiveness of the method.

2 RF Time Series Data

rf time series record tissue-response to prolonged sonication. These responses consist of reflected ultrasound echo intensity values. Figure 1 shows ultrasound image-frames collected from prostate sonication over time (each such frame is referred to as an rf frame). The boundary of the prostate is encircled in white. The solid red dots indicate the same location within the prostate over time, while the dotted blue arrows point to the corresponding echo intensity values. The sequence of echo intensities obtained from the same point in the prostate over time makes up an rf time series (bottom right of the figure). Due to the scattering phenomenon in ultrasound imaging, very small objects such as individual cells cannot be identified using single rf values. As such, we partition each rf frame using a grid into small regions, where each window in the grid is referred to as a Region of Interest (roi) and comprises multiple rf values. In this work, we use the same dataset as Imani et al. [7] and adopt the same roi size of 1.7 × 1.7 mm², which corresponds to 44 × 2 rf values. The 88 rf values within each grid-window in a single frame recorded at time t are averaged to produce a single value representing each roi at the corresponding time-point t.
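The roi averaging described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: each rf frame is assumed to be a list of rows of echo intensities, the 44 × 2 window is assumed to span 44 rows and 2 columns, incomplete windows at the frame border are assumed to be dropped, and the function names `roi_means` and `roi_time_series` are hypothetical.

```python
from typing import Dict, List, Tuple

Frame = List[List[float]]  # one rf frame: rows of echo-intensity values


def roi_means(frame: Frame, roi_rows: int = 44, roi_cols: int = 2) -> List[List[float]]:
    """Average every roi_rows x roi_cols grid window of one frame into one value."""
    # Assumption: windows that do not fit completely inside the frame are dropped.
    n_grid_rows = len(frame) // roi_rows
    n_grid_cols = len(frame[0]) // roi_cols
    means = []
    for r in range(n_grid_rows):
        row_means = []
        for c in range(n_grid_cols):
            total = 0.0
            for i in range(r * roi_rows, (r + 1) * roi_rows):
                for j in range(c * roi_cols, (c + 1) * roi_cols):
                    total += frame[i][j]
            row_means.append(total / (roi_rows * roi_cols))
        means.append(row_means)
    return means


def roi_time_series(frames: List[Frame], roi_rows: int = 44,
                    roi_cols: int = 2) -> Dict[Tuple[int, int], List[float]]:
    """Map each grid position to its sequence of per-frame roi averages."""
    per_frame = [roi_means(f, roi_rows, roi_cols) for f in frames]
    series = {}
    for r in range(len(per_frame[0])):
        for c in range(len(per_frame[0][0])):
            series[(r, c)] = [m[r][c] for m in per_frame]
    return series
```

With 128 frames per patient, each entry of the returned dictionary would be a 128-long series, one value per frame.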

Fig. 1. Ultrasound rf-frames collected from a prostate-cancer patient over time. Solid red dots indicate the same location across multiple frames. The time series for this location is shown at the bottom right. A grid dividing each frame into rois is shown on the left-most frame. Pathology labels for malignant/benign rois are also shown.

The image data consists of in-vivo rf frames gathered from 9 prostate-cancer patients who have undergone radical prostatectomy (the study was approved by the institutional research ethics board and the patients provided consent to participate). Prior to the surgery, 128
rf frames, recorded over a time-period of 4 s, were gathered from each patient. A grid is overlaid on each of the frames, rois are obtained as described above, and for each roi, $R_k$, a 128-long time series $R_k = R_k^1, \ldots, R_k^{128}$ is created. Each point $R_k^t$ in the series corresponds to the average intensity of that roi in the rf-frame recorded at time $t$, where $1 \le t \le 128$. While the number of patients is relatively low, the total number of rois per patient is quite high (see Table 1), thus providing sufficient data to support effective model-learning.

As commonly done in time series analysis, we map the series associated with each roi, $R_k$, to its first-order difference series, i.e. the sequence of differences between pairs of consecutive time-points. To simplify the modeling task, we further discretize the difference series by placing the values into 10 equally spaced bins, where the values in the lowest bin are all mapped to 1 and those in the top-most bin are mapped to 10. We denote the sequence obtained by discretizing the difference series of $R_k$ as $O_k^1, \ldots, O_k^{127}$. Our experiments suggest that 10 bins are sufficient for obtaining good classification performance.

To create a gold-standard of labeled malignant vs. benign regions we use whole-mount histopathology information. To obtain such information, following prostatectomy, the tissues are formalin-fixed and imaged using mri. The tissues are then cut into ∼4 mm slices and further processed to enable high-resolution microscopy. Two clinicians then assign (in consensus) the appropriate labels to the malignant and to the benign areas within each slice. A rigorous multi-step registration process, in which mri images are used as an intermediate step, is employed to overlay the labeled histopathology images on the in-vivo ultrasound frames (see [7] for additional details). This registration process results in an assignment of a pathology label to each roi, indicating whether it is malignant or benign.
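The differencing and discretization step described above can be illustrated as follows. This is a sketch under assumptions: the text does not state over which range the 10 equally spaced bins are laid out, so the bounds `lo` and `hi` are passed in explicitly (they could, for instance, be the smallest and largest difference values seen in the training data), and values outside that range are clipped to the first or last bin. The function names are hypothetical.

```python
import math
from typing import List


def first_difference(series: List[float]) -> List[float]:
    """Differences between consecutive time-points (128 values give 127 differences)."""
    return [b - a for a, b in zip(series, series[1:])]


def discretize(values: List[float], lo: float, hi: float, n_bins: int = 10) -> List[int]:
    """Map each value to a symbol in 1..n_bins using equally spaced bins on [lo, hi]."""
    width = (hi - lo) / n_bins
    symbols = []
    for v in values:
        # Assumption: a degenerate range puts everything in bin 1.
        idx = 1 if width == 0 else math.floor((v - lo) / width) + 1
        # Assumption: out-of-range values are clipped to the lowest or top-most bin.
        symbols.append(min(max(idx, 1), n_bins))
    return symbols
```

For example, the series 0, 1, 3, 6, 10 has differences 1, 2, 3, 4, which fall into bins 2, 3, 4, 5 when the range [0, 10] is split into 10 bins of width 1.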
Figure 1 shows several examples of such labeled rois. We use the same 570 labeled rois as in [7], of which 286 are malignant and 284 benign. Table 1 summarizes the data. The rf time series associated with the labeled rois are used as training and test data for building a probabilistic model for distinguishing between benign and malignant tissues, as described in the next section.

Table 1. The distribution of malignant and benign rois over the 9 patients.

Patient         P1  P2  P3  P4  P5  P6  P7  P8  P9  Total
Malignant rois  42  29  18  64  35  28  23  30  17  286
Benign rois     42  29  18  61  35  29  23  30  17  284

3 Probabilistic Modeling Using Hidden Markov Models

hmms are often used to model time series where the generating process is unknown or prone to variation and noise. The process is viewed as a sequence of stochastic transitions between unobservable (hidden) states; some aspects of each state are observed and recorded. As such, the states may be estimated from the observation-sequence [12]. A simplifying assumption underlying the
use of these models is the Markov property, namely, that the state at a given time-point depends only on the state at the preceding point, conditionally independent of all other time points. In this work we view a tissue response value recorded in an rf frame, discretized as discussed above, as an observation; employing the Markov property, we assume each such value depends only on the response recorded at the frame directly preceding it, independent of any earlier responses.

Formally, an hmm $\lambda$ consists of five components: a set of $N$ states, $S = \{s_1, \ldots, s_N\}$; a set of $M$ observation symbols, $V = \{v_1, \ldots, v_M\}$; an $N \times N$ stochastic matrix $A$ governing the state-transition probability, where $A_{ij} = \Pr(\mathrm{state}_{t+1} = s_i \mid \mathrm{state}_t = s_j)$, $1 \le i, j \le N$, and $\mathrm{state}_t$ is the state at time $t$; an $N \times M$ stochastic emission matrix $B$, where $B_{ik} = \Pr(\mathrm{ob}_t = v_k \mid \mathrm{state}_t = s_i)$, $1 \le i \le N$, $1 \le k \le M$, denoting the probability of observing $v_k$ at state $s_i$; and an $N$-dimensional stochastic vector $\pi$, where for each state $s_i$, $\pi_i = \Pr(\mathrm{state}_1 = s_i)$ denotes the probability of starting the process at state $s_i$.

Learning a model $\lambda$ from a sequence of observations $O = o_1, o_2, \ldots, o_{127}$ amounts to estimating the model parameters (namely, $A$, $B$ and $\pi$) to maximize $\log \Pr(O \mid \lambda)$, i.e. the log-probability of the observations given the model $\lambda$. In practice, $\pi$ is fixed such that $\pi_1 = \Pr(\mathrm{state}_1 = s_1) = 1$ and $\pi_j = 0$ for $j \neq 1$, i.e. $s_1$ is always the first state. In the experiments reported here, we also fix the matrix $A$ to an initial estimate based on clustering (as described below), while the matrix $B$ is learned using the Baum-Welch algorithm [12].

The hmms we develop, as illustrated in Fig. 2, are ergodic models consisting of 5 states and 10 observations. A small number of states allows for a computationally efficient model while typically leading to good generalization beyond the training set.
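To make the quantity $\log \Pr(O \mid \lambda)$ concrete, the following sketch evaluates it with the standard scaled forward algorithm [12]. It is an illustration, not the authors' code. It follows the indexing given in the text, where `A[i][j]` is the probability of moving to state $s_i$ from state $s_j$ (so each column of `A` sums to 1), and it assumes observation symbols are coded 1..M as produced by the discretization. The function name `log_likelihood` is hypothetical.

```python
import math
from typing import List


def log_likelihood(obs: List[int], A: List[List[float]],
                   B: List[List[float]], pi: List[float]) -> float:
    """Scaled forward algorithm: returns log Pr(obs | A, B, pi).

    obs     : observation symbols coded 1..M
    A[i][j] : Pr(state_{t+1} = s_i | state_t = s_j), as indexed in the text
    B[i][k] : Pr(ob_t = v_{k+1} | state_t = s_i) (0-based column for symbol k+1)
    pi[i]   : Pr(state_1 = s_i)
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0] - 1] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step, then weight by the emission probability.
            alpha = [B[i][obs[t] - 1] * sum(A[i][j] * alpha[j] for j in range(n))
                     for i in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the sequence is impossible under this model
        alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
        log_p += math.log(scale)
    return log_p
```

As a small check, for a two-state model with `A = [[0.7, 0.4], [0.3, 0.6]]`, `B = [[0.9, 0.1], [0.2, 0.8]]` and `pi = [1.0, 0.0]`, the sequence `[1, 2]` has probability 0.9 × (0.7 × 0.1 + 0.3 × 0.8) = 0.279.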
We determined the number of states by experimenting with 2–6 state models (and a few larger ones with >10 states). The classification performance of 5-state models was higher than that of others. Moreover, each of the 5 states is associated with a distinct emission probability distribution, which is not the case when using additional/fewer states. The observation set, as discussed in Sect. 2, consists of 10 observation symbols $v_1, \ldots, v_{10}$, each of which corresponds to a discretized interval of first-order difference values of the rf time series.
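The setting in which $A$ and $\pi$ are held fixed and only the emission matrix $B$ is learned can be sketched as a single Baum-Welch re-estimation step restricted to $B$. The text does not give implementation details, so this is a generic illustration of the textbook update [12] rather than the authors' procedure. It assumes the same conventions as the definitions above (`A[i][j]` is the probability of moving to $s_i$ from $s_j$, symbols coded 1..M), and the function name `reestimate_emissions` is hypothetical. In practice the step would be repeated until the likelihood stops improving.

```python
from typing import List


def reestimate_emissions(sequences: List[List[int]], A: List[List[float]],
                         B: List[List[float]], pi: List[float]) -> List[List[float]]:
    """One Baum-Welch update of the emission matrix B, keeping A and pi fixed."""
    n, m = len(pi), len(B[0])
    num = [[0.0] * m for _ in range(n)]  # expected count of symbol k emitted in state i
    den = [0.0] * n                      # expected number of visits to state i
    for obs in sequences:
        T = len(obs)
        # Forward pass, normalized at every step.
        alphas = []
        for t in range(T):
            if t == 0:
                alpha = [pi[i] * B[i][obs[0] - 1] for i in range(n)]
            else:
                prev = alphas[-1]
                alpha = [B[i][obs[t] - 1] * sum(A[i][j] * prev[j] for j in range(n))
                         for i in range(n)]
            s = sum(alpha)
            if s == 0.0:
                raise ValueError("sequence has zero probability under the model")
            alphas.append([a / s for a in alpha])
        # Backward pass, normalized at every step.
        betas = [[1.0] * n for _ in range(T)]
        for t in range(T - 2, -1, -1):
            beta = [sum(A[i][j] * B[i][obs[t + 1] - 1] * betas[t + 1][i] for i in range(n))
                    for j in range(n)]
            s = sum(beta)
            betas[t] = [b / s for b in beta]
        # State posteriors gamma_t(i), accumulated into expected counts.
        for t in range(T):
            gamma = [alphas[t][i] * betas[t][i] for i in range(n)]
            s = sum(gamma)
            for i in range(n):
                num[i][obs[t] - 1] += gamma[i] / s
                den[i] += gamma[i] / s
    # States that are never visited keep their previous emission distribution.
    return [[num[i][k] / den[i] if den[i] > 0 else B[i][k] for k in range(m)]
            for i in range(n)]
```

Because the per-step normalization constants cancel when the posteriors are normalized, the forward and backward passes can be rescaled independently without affecting the result.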

Fig. 2. Example of hmms learned from (A) malignant rois, and (B) benign rois. Nodes represent states. Edges are labeled by transition probabilities; Emission probabilities are shown to the right of each model. Edges with probability